RRegrs: exploring the space of possible regression models

November 22, 2015

(This article was first published on chem-bla-ics, and kindly contributed to R-bloggers)

Machine learning is a field of science that focuses on mathematically describing patterns in data. Chemometrics does this for chemical data. An example is (nano)QSAR, where structural information is related to biological activity. During my PhD, I studied how the choice of statistical and machine learning method interacts with how you computationally (numerically) represent the question. The right combination is not obvious, and it has become common to try various modelling methods, with support vector machines (SVM/SVR) and, more recently, neural networks (deep learning) becoming especially popular. A simpler model, however, has its benefits too, and is frequently not significantly worse than a more complex one. That said, exploring all machine learning methods manually takes a lot of time, as each comes with its own parameters that need tuning.

Georgia Tsiliki (NTUA, a partner in eNanoMapper), Cristian Munteanu (former postdoc in our group), and others developed RRegrs, an R package that explores various regression models and automatically calculates a number of statistics for comparing them (doi:10.1186/s13321-015-0094-2). Still, in line with my thesis, you must never rely on performance statistics alone; the output of RRegrs can, however, help you explore the full set of models.

Tsiliki, G., Munteanu, C. R., Seoane, J. A., Fernandez-Lozano, C., Sarimveis, H., Willighagen, E. L., Sep. 2015. RRegrs: an R package for computer-aided model selection with multiple regression models. Journal of Cheminformatics 7 (1), 46. http://dx.doi.org/10.1186/s13321-015-0094-2
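The basic idea of fitting several kinds of regression model and comparing them with the same statistics can be illustrated with a small sketch. It is written in Python rather than R, does not use RRegrs, and is not its API. The toy data, the three model types (a mean-only baseline, least-squares linear regression, and k-nearest-neighbour regression), and all function names are hypothetical choices for illustration. Each model is scored with 5-fold cross-validated RMSE and R², then the models are ranked.

```python
import math
import random
from statistics import mean

# Hypothetical toy data: one descriptor x and a noisy linear response y.
def make_data(n=60, seed=1):
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys

# Each hypothetical "model" is fit(train_x, train_y) -> predict(x).
def fit_mean(xs, ys):
    # Baseline: always predict the training mean.
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    # Ordinary least squares for y = a + b*x (closed form).
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(xs, ys, k=5):
    # k-nearest-neighbour regression: average y of the k closest training x.
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict

MODELS = {"mean": fit_mean, "linear": fit_linear, "knn": fit_knn}

def cross_validate(fit, xs, ys, folds=5):
    # Collect out-of-fold predictions, then compute RMSE and R^2 on them.
    idx = list(range(len(xs)))
    preds = [0.0] * len(xs)
    for f in range(folds):
        test = [i for i in idx if i % folds == f]
        train = [i for i in idx if i % folds != f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = model(xs[i])
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    my = mean(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return {"RMSE": math.sqrt(ss_res / len(ys)), "R2": 1 - ss_res / ss_tot}

def compare_models(xs, ys):
    # Evaluate every model the same way and rank by cross-validated RMSE.
    results = {name: cross_validate(fit, xs, ys) for name, fit in MODELS.items()}
    return sorted(results.items(), key=lambda kv: kv[1]["RMSE"])

if __name__ == "__main__":
    xs, ys = make_data()
    for name, stats in compare_models(xs, ys):
        print(f"{name:7s} RMSE={stats['RMSE']:.3f} R2={stats['R2']:.3f}")
```

Scoring every candidate with the same folds and the same statistics is what makes the ranking comparable. As the post stresses, though, such a table is a starting point for exploring the models, not a substitute for judging them.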
