# Why R for Mass Spectrometrists and Computational Proteomics

[This article was first published on **Computational Proteomics**, and kindly contributed to R-bloggers.]


Why R:

*in silico* predictions of the data generated in your manuscript and your daily research. Mass spectrometrists, biologists and bioinformaticians commonly use programs like Excel, Calc or other office tools to generate their charts and statistical analyses. In recent years many computational biologists, especially those from the genomics field, have come to regard R and Bioconductor as fundamental tools for their research.

**Installing R on Windows or Linux:**

*Windows*: You can download the latest version from https://www.r-project.org/. You need to select a mirror; then, on the **base** page, you can select the latest release of R. The remaining steps are as straightforward as installing any other Windows application.

*Linux*: You can download the latest precompiled release from the same page (https://www.r-project.org/) for SUSE, Debian, Ubuntu and Red Hat, as well as the source files in R-XXX.tar.gz.

Here you can find some tips if you have problems installing **R**: http://cran.r-project.org/doc/manuals/R-admin.html.

**First MS Example in Three lines**:

*“I want to know the mass distribution of my identified peptides”*

First create a peptide-histogram.txt file with the list of masses as follows:

1576.7609

1809.956

1653.8549

1929.0003

then

**>** peptides.txt <- read.table("peptide-histogram.txt", header=FALSE)
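
The read-and-plot steps can be sketched end to end as follows. This is only an illustration: the temporary file, the object name `peptides` and the plot labels are assumptions rather than part of the original example, and the column name `V1` is simply what `read.table()` assigns to an unnamed first column.

```r
# Write the four example masses to a temporary file so the sketch runs on its own.
# (In practice this would be your own peptide-histogram.txt.)
mass_file <- file.path(tempdir(), "peptide-histogram.txt")
writeLines(c("1576.7609", "1809.956", "1653.8549", "1929.0003"), mass_file)

# Read the single headerless column; read.table() names it V1
peptides <- read.table(mass_file, header = FALSE)

# Plot the mass distribution; hist() also returns the binning it used
h <- hist(peptides$V1,
          main = "Peptide mass distribution",
          xlab = "Peptide mass")
```

With only four values the histogram is sparse, but the same lines work unchanged on a file with thousands of identified peptides.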

**If you want to compute the mean of the masses, it's simple:**

The hist() function can be customized with different options (remember you can always see the help for each function using ?, for example: ?hist):
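
A minimal sketch of that computation, assuming the masses sit in a single-column data frame as `read.table()` would return them (the object names `peptides` and `mass_mean` are illustrative, not from the original post):

```r
# Same four masses as in peptide-histogram.txt, held in the column V1
# that read.table() would create for a headerless file
peptides <- data.frame(V1 = c(1576.7609, 1809.956, 1653.8549, 1929.0003))

# Arithmetic mean of the peptide masses
mass_mean <- mean(peptides$V1)
print(mass_mean)

# summary() reports the minimum, quartiles, median, mean and maximum in one call
summary(peptides$V1)
```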

http://msenux.redwoods.edu/math/R/hist.php

http://stat.ethz.ch/R-manual/R-patched/library/graphics/html/hist.html
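
As an illustration of such options, the sketch below sets explicit bins, colours and labels. The bin width, colour choices and label texts are arbitrary assumptions picked for the four example masses; `breaks`, `col`, `border`, `main`, `xlab` and `ylab` are standard `hist()` arguments.

```r
# The four example masses from the text
masses <- c(1576.7609, 1809.956, 1653.8549, 1929.0003)

# Explicit 100-unit bins, a fill colour, and descriptive labels
h <- hist(masses,
          breaks = seq(1500, 2000, by = 100),
          col    = "steelblue",
          border = "white",
          main   = "Identified peptides",
          xlab   = "Peptide mass",
          ylab   = "Number of peptides")

# The returned object holds the bin edges and the counts used for the plot
h$counts
```

Choosing the breaks yourself keeps the bins comparable when you plot several peptide lists side by side.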

*“Mass spectrometrists are not IT people.”*

**Generating Venn Diagrams for Search Engines (Mascot, XTandem, Sequest)**

*“I want a Venn diagram with the shared proteins identified by Sequest, XTandem and Mascot”*

Each file (mascot.txt, xtandem.txt, sequest.txt) is a list of protein IDs. You can use the UniProt (www.uniprot.org) mapping service or PICR (http://www.ebi.ac.uk/Tools/picr/) to convert different protein IDs to a unique representation.

Venn diagrams are part of the **gplots** library, and they are really useful to show all possible logical relations between a finite collection of sets.
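
A hedged sketch of the idea is shown below. The protein IDs are made-up placeholders; in practice each vector would be read from the corresponding file, for example with `readLines("mascot.txt")`. The overlap is computed with base R, and the diagram is drawn with `gplots::venn()` only if the gplots package happens to be installed.

```r
# Toy protein ID lists (hypothetical values, one vector per search engine)
ids <- list(
  Mascot  = c("P1", "P2", "P3", "P4"),
  XTandem = c("P2", "P3", "P5"),
  Sequest = c("P3", "P4", "P5", "P6")
)

# Proteins identified by all three search engines
shared_all <- Reduce(intersect, ids)

# Proteins identified by at least one search engine
identified_any <- Reduce(union, ids)

# Draw the Venn diagram only when gplots is available
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::venn(ids)
}
```

Computing the intersections explicitly is useful alongside the picture, because it gives you the actual list of shared proteins and not only their count.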

When I first read *“Five statistical things I wished I had been taught 20 years ago”* (Ewan Birney), my first thought was: *“…which R packages would be useful for mass spectrometrists and biologists?”*

- The **ggplot2** package for data visualization provides a set of functions to represent your data, such as scatterplots (Basic Introduction to ggplot2).
- The **caret** package (short for **C**lassification **A**nd **RE**gression **T**raining) is a set of functions that attempt to streamline the process of creating predictive models. It is a complete package for regression and classification techniques (caret).
- **FactoMineR** is an **R** package dedicated to multivariate exploratory data analysis. It performs classical methods such as Principal Components Analysis (PCA), Correspondence Analysis (CA) and Multiple Correspondence Analysis (MCA), as well as more advanced methods. A GUI is available (FactoMineR).
- The **mzR** package provides a unified API to the common file formats and parsers available for mass spectrometry data. It comes with a wrapper for the ISB random access parser for mass spectrometry mzXML, mzData and mzML files (mzR).
- **Bioconductor** provides tools for the analysis and comprehension of high-throughput biology data. Bioconductor has two releases each year, 554 software packages, and an active user community (bioconductor).
- The **msProcess** package provides tools for protein mass spectra processing, including data preparation, denoising, noise estimation, baseline correction, intensity normalization, peak detection, peak alignment and peak quantification, as well as various functionalities for data ingestion/conversion, mass calibration, data quality assessment, and protein mass spectra simulation (msProcess).
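
Before trying these packages, it can help to check which of them are already available in your R library. The sketch below does only that; the commented installation lines are the usual routes for CRAN and Bioconductor packages, and whether each package is still distributed there is an assumption you should verify for your R version.

```r
# Packages mentioned in the list above (Bioconductor itself is a repository, not a package)
pkgs <- c("ggplot2", "caret", "FactoMineR", "mzR", "msProcess")

# TRUE for every package that can be loaded from the current library
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Typical installation routes (not run here, they need network access):
# install.packages(c("ggplot2", "caret", "FactoMineR"))   # CRAN
# install.packages("BiocManager")                          # helper for Bioconductor
# BiocManager::install("mzR")                              # Bioconductor
```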

Learning R is an ongoing process, and once researchers have mastered the basics, they should be encouraged to explore the wealth of contributed packages on the Comprehensive R Archive Network (CRAN) (http://cran.r-project.org) and Bioconductor (http://www.bioconductor.org). If we start to use R in our labs, **we can provide our scripts to the community along with our manuscripts and papers, which means that others can check the statistical analysis and the results.** R is one of the leading tools for statistics, data analysis, and machine learning in the research community. It is time to begin!

Some references:

- Statistics Using R with Biological Examples (http://cran.r-project.org/doc/contrib/Seefeld_StatsRBio.pdf)
- Biological Data Analysis Using R (http://dyerlab.bio.vcu.edu/downloads/Dyer_Data_Analysis_Using_R.pdf)
- R-bloggers (https://www.r-bloggers.com/)
