Why R for Mass Spectrometrists and Computational Proteomics
Why R:
Installing R on Windows or Linux:
Windows: You can download the latest version from https://www.r-project.org/. Select a mirror, then on the base page select the latest release of R. The remaining steps are straightforward, as with any Windows application.
Linux: You can download the latest precompiled release from the same page (https://www.r-project.org/) for SUSE, Debian, Ubuntu or Red Hat, or the source files as R-XXX.tar.gz.
Here you can find some tips if you have problems installing R: http://cran.r-project.org/doc/manuals/R-admin.html.
First MS Example in Three Lines:
“I want to know the mass distribution of my identified peptides”
First create a peptide-histogram.txt file with the list of masses, one per line, as follows:
1576.7609
1809.956
1653.8549
1929.0003
then
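A minimal sketch of this step, assuming the file holds one mass per line with no header. The names `mass_file`, `masses` and `h` are illustrative, and the file is written to a temporary directory here only so the example runs on its own:

```r
# Write the example masses to a file so the sketch is self-contained;
# in practice peptide-histogram.txt would already exist.
mass_file <- file.path(tempdir(), "peptide-histogram.txt")
writeLines(c("1576.7609", "1809.956", "1653.8549", "1929.0003"), mass_file)

# Read one mass per line into a numeric vector
masses <- scan(mass_file, what = numeric(), quiet = TRUE)

# Plot the mass distribution; hist() also returns the binned counts invisibly
h <- hist(masses, main = "Identified peptide masses", xlab = "Mass")
```

With a real file, only the `scan()` and `hist()` calls are needed, pointed at the path of your own peptide-histogram.txt.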
The hist() function can be customized with different options (remember you can always see the help for each function using ?, for example: ?hist):
http://msenux.redwoods.edu/math/R/hist.php
http://stat.ethz.ch/R-manual/R-patched/library/graphics/html/hist.html
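As an illustration of those options, the sketch below sets explicit bin edges, a fill colour and axis labels. The bin width of 100 and the colour are arbitrary choices made for this example, not recommendations:

```r
# The four example masses from the text
masses <- c(1576.7609, 1809.956, 1653.8549, 1929.0003)

# Explicit 100-unit bins from 1500 to 2000, a fill colour, and axis labels
h <- hist(
  masses,
  breaks = seq(1500, 2000, by = 100),
  col    = "steelblue",
  main   = "Identified peptide masses",
  xlab   = "Mass",
  ylab   = "Number of peptides"
)

# The returned object holds the bin edges and counts behind the plot
print(h$counts)
```

Passing a vector to `breaks` fixes the bin edges exactly, which keeps histograms from different runs comparable.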
Generating Venn Diagrams for Search Engines (Mascot, XTandem, Sequest)
“I want a Venn diagram of the proteins shared between the Sequest, XTandem and Mascot identifications”
Each file (mascot.txt, xtandem.txt, sequest.txt) is a list of protein IDs.
* You can use the UniProt (www.uniprot.org) mapping service or PICR (http://www.ebi.ac.uk/Tools/picr/) to convert different protein IDs to a unique representation.
The venn() function for drawing Venn diagrams is part of the gplots library. Venn diagrams are really useful for showing all possible logical relations between a finite collection of sets.
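A hedged sketch of how the three lists might be compared, assuming each file holds one protein ID per line. The IDs below (`PROT_A` and so on) are hypothetical placeholders, and `read_ids`, `ids` and `shared_all` are names introduced only for this example. The overlap is computed with base R, and the diagram is drawn only if gplots is installed:

```r
# Hypothetical protein IDs, written to temporary files so the sketch runs
# on its own; real mascot.txt, xtandem.txt and sequest.txt would replace these.
data_dir <- tempdir()
writeLines(c("PROT_A", "PROT_B", "PROT_C", "PROT_D"), file.path(data_dir, "mascot.txt"))
writeLines(c("PROT_B", "PROT_C", "PROT_E"), file.path(data_dir, "xtandem.txt"))
writeLines(c("PROT_C", "PROT_D", "PROT_E", "PROT_F"), file.path(data_dir, "sequest.txt"))

# Read each file as a character vector of unique protein IDs
read_ids <- function(name) unique(readLines(file.path(data_dir, name)))
ids <- list(
  Mascot  = read_ids("mascot.txt"),
  XTandem = read_ids("xtandem.txt"),
  Sequest = read_ids("sequest.txt")
)

# Proteins identified by all three search engines (base R only)
shared_all <- Reduce(intersect, ids)
print(shared_all)

# Draw the Venn diagram if the gplots package is available
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::venn(ids)
}
```

Naming the list elements matters, because `venn()` uses those names to label the circles.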
When I read “Five statistical things I wished I had been taught 20 years ago” (Ewan Birney) for the first time, my first thought was: which R packages would be useful for mass spectrometrists, as they are for biologists?
- ggplot2, for data visualization, provides a set of functions to represent your data, such as scatterplots. (Basic Introduction to ggplot2)
- The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process of creating predictive models. It is a complete package for regression and classification techniques. (caret)
- FactoMineR is an R package dedicated to multivariate exploratory data analysis. It performs classical methods such as Principal Component Analysis (PCA), Correspondence Analysis (CA) and Multiple Correspondence Analysis (MCA), as well as more advanced methods. A GUI is available. (FactoMineR)
- mzR provides a unified API to the common file formats and parsers available for mass spectrometry data. It comes with a wrapper for the ISB random access parser for mass spectrometry mzXML, mzData and mzML files. (mzR)
- Bioconductor provides tools for the analysis and comprehension of high-throughput biology data. Bioconductor has two releases each year, 554 software packages, and an active user community. (bioconductor)
- msProcess provides tools for protein mass spectra processing including data preparation, denoising, noise estimation, baseline correction, intensity normalization, peak detection, peak alignment, peak quantification, and various functionalities for data ingestion/conversion, mass calibration, data quality assessment, and protein mass spectra simulation. (msProcess)
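Assuming a working R installation, a sketch like the following checks which of the packages named above are already available. The install commands are left commented out because they need network access. mzR is distributed through Bioconductor, so it goes through BiocManager rather than plain install.packages(). The variable names are illustrative:

```r
# Packages from the list above, grouped by repository
cran_pkgs <- c("ggplot2", "caret", "FactoMineR")
bioc_pkgs <- c("mzR")

# TRUE for each package whose namespace can be loaded
is_installed <- vapply(
  c(cran_pkgs, bioc_pkgs),
  requireNamespace,
  logical(1),
  quietly = TRUE
)
print(is_installed)

# Names of the packages that still need installing
missing_cran <- cran_pkgs[!is_installed[cran_pkgs]]
missing_bioc <- bioc_pkgs[!is_installed[bioc_pkgs]]

# Uncomment to install whatever is missing (needs network access):
# install.packages(missing_cran)
# install.packages("BiocManager"); BiocManager::install(missing_bioc)
```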
Learning R is an ongoing process, and once researchers have mastered the basics, they should be encouraged to explore the wealth of contributed packages on the Comprehensive R Archive Network (CRAN) (http://cran.r-project.org) and Bioconductor (http://www.bioconductor.org). If we start to use R in our labs, we can share our scripts with the community alongside our manuscripts and papers, which means others can check the statistical analysis and the results. R is a leading tool for statistics, data analysis, and machine learning in the research community. Time to begin!
Some references:
- Statistics Using R with Biological Examples (http://cran.r-project.org/doc/contrib/Seefeld_StatsRBio.pdf)
- Biological Data Analysis Using R (http://dyerlab.bio.vcu.edu/downloads/Dyer_Data_Analysis_Using_R.pdf)
- R-bloggers (https://www.r-bloggers.com/)