I’ve long admired the work of the Open Source Malaria Project. Unfortunately time and “day job” constraints prevent me from being as involved as I’d like.
So: I was happy to make a small contribution recently in response to this request for help:
Can anyone help @O_S_M to convert this spreadsheet ( malaria.ourexperiment.org/biological_dat…) into chemical structures with data? #openscience #realtimechem—
Alice Williamson (@all_isee) June 24, 2014
Note – this all works fine under Linux; there seem to be some issues with Open Babel library files under OSX.
First step: make that data usable by rescuing it from the spreadsheet 😉 We’ll clean up a column name too.
library(XLConnect)

mmv <- readWorksheetFromFile("TP compounds with solid amounts 14_3_14.xlsx", sheet = "Sheet1")
# Rename only the fifth column (the EC50 values; see the head() output below)
colnames(mmv)[5] <- "EC50"
head(mmv)

  COMPOUND_ID                                                      Smiles     MW QUANTITY.REMAINING..mg.
1   MMV668822 c1[n+](cc2n(c1OCCc1cc(c(cc1)F)F)c(nn2)c1ccc(cc1)OC(F)F)[O-] 434.35                     0.0
2   MMV668823      c1nc(c2n(c1OCCc1cc(c(cc1)F)F)c(nn2)c1ccc(cc1)OC(F)F)Cl 452.79                     0.0
3   MMV668824                        c1ncc2n(c1CCO)c(nn2)c1ccc(cc1)OC(F)F 306.27                    29.6
4   MMV668955                        C1NCc2n(C1CCO)c(nn2)c1ccc(cc1)OC(F)F 310.30                    18.5
5   MMV668956    C1(CN(C1)c1cc(c(cc1)F)F)Oc1cncc2n1c(nn2)c1ccc(cc1)OC(F)F 445.38                   124.2
6   MMV668957          c1ncc2n(c1N1CCC(C1)c1ccccc1)c(nn2)c1ccc(cc1)OC(F)F 407.42                    68.5
   EC50 New.quantity.remaining
1  4.01                      0
2  0.16                      0
3 10.00                     29
4  8.37                     18
5  0.43                    124
6  2.00                     62
What OSM would like: an output file in Chemical Markup Language, containing the Compound ID and properties (MW and EC50).
The ChemmineR package makes conversion of SMILES strings to other formats pretty straightforward; we start by converting to Structure Data Format (SDF):
library(ChemmineR)
library(ChemmineOB)

mmv.sdf <- smiles2sdf(mmv$Smiles)
That will throw a warning, because every molecule in the resulting SDF object has the same compound ID (CID): currently an empty string. We set the CIDs from the compound ID column, then use datablock() to add the properties:
cid(mmv.sdf) <- mmv$COMPOUND_ID
datablock(mmv.sdf) <- data.frame(MW = mmv$MW, EC50 = mmv$EC50)
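Since the CIDs end up identifying each molecule in the output, it may be worth checking that the compound IDs are unique and that no property values are missing before assigning them. The following is a minimal base R sketch of such a check, not something from the original workflow: `check_ids()` is a hypothetical helper, and `mmv_toy` is a stand-in built from the first three rows of the head() output above.

```r
# Toy stand-in for `mmv`: first three rows copied from the head() output.
mmv_toy <- data.frame(
  COMPOUND_ID = c("MMV668822", "MMV668823", "MMV668824"),
  MW          = c(434.35, 452.79, 306.27),
  EC50        = c(4.01, 0.16, 10.00),
  stringsAsFactors = FALSE
)

# check_ids() is a hypothetical helper, not part of ChemmineR.
# It returns a named list of logical flags, one per check.
check_ids <- function(df, id_col = "COMPOUND_ID", prop_cols = c("MW", "EC50")) {
  ids <- df[[id_col]]
  list(
    ids_unique    = anyDuplicated(ids) == 0L,          # no repeated IDs
    ids_present   = !any(is.na(ids) | ids == ""),      # no missing or empty IDs
    props_present = !any(is.na(df[, prop_cols]))       # no missing property values
  )
}

checks <- check_ids(mmv_toy)
stopifnot(all(unlist(checks)))
```

On the real data, the same call would be `check_ids(mmv)`, assuming the columns are named as shown in the head() output.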
Now we can write out to an SDF file. We could also use a loop or an apply function to write individual files per molecule.
write.SDF(mmv.sdf, "mmv-all.sdf", cid = TRUE)
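The per-molecule alternative could look something like the sketch below. `write_each_sdf()` is a hypothetical helper (not a ChemmineR function), and it assumes the CIDs are already set, unique and safe to use as file names. To keep the example self-contained it runs on `sdfsample`, the example SDFset that ships with ChemmineR, rather than on `mmv.sdf`.

```r
library(ChemmineR)

# write_each_sdf() is a hypothetical helper, not part of ChemmineR.
# Assumes `sdfset` is an SDFset with unique compound IDs already assigned.
write_each_sdf <- function(sdfset, outdir = "sdf") {
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  ids   <- cid(sdfset)
  files <- file.path(outdir, paste0(ids, ".sdf"))
  for (i in seq_len(length(sdfset))) {
    # sdfset[i] is a one-molecule SDFset; cid = TRUE writes its ID as the header
    write.SDF(sdfset[i], file = files[i], cid = TRUE)
  }
  invisible(files)
}

# Demonstration on the first three molecules of ChemmineR's bundled sample data
data(sdfsample)
files <- write_each_sdf(sdfsample[1:3], outdir = file.path(tempdir(), "sdf"))
```

For the data in this post, the equivalent call would be `write_each_sdf(mmv.sdf)`, giving one file per compound named after its compound ID.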
It would be nice to do the conversion to CML in the same R script, but for now I just run Open Babel from the command line. Note that the -xp flag is required to include the properties in the CML output:
babel -xp mmv-all.sdf mmv-all.cml
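One way to keep everything in a single R script, without relying on any R bindings, is simply to call the same command with base R's system2(). This is only a sketch: `babel_args()` and `sdf_to_cml()` are hypothetical helpers, and it assumes the `babel` executable is on the PATH.

```r
# babel_args() and sdf_to_cml() are hypothetical helpers for illustration.

# Build the same arguments as the command line above.
babel_args <- function(infile, outfile) {
  c("-xp", infile, outfile)  # -xp: include the properties in the CML output
}

# Run Open Babel if it can be found; otherwise warn and return NA.
sdf_to_cml <- function(infile, outfile, babel = unname(Sys.which("babel"))) {
  if (!nzchar(babel)) {
    warning("'babel' executable not found on the PATH; nothing was converted")
    return(invisible(NA_integer_))
  }
  # system2() returns the exit status of the command (0 usually means success)
  status <- system2(babel, args = shQuote(babel_args(infile, outfile)))
  invisible(status)
}
```

After write.SDF() has produced mmv-all.sdf, calling `sdf_to_cml("mmv-all.sdf", "mmv-all.cml")` would run the conversion from within R.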