# What is your favorite R feature? (part 2)

(This article was first published on cloudnumbers.com » R-project, and kindly contributed to R-bloggers)

This week on our blog we started a list of great R (www.r-project.org) code snippets: http://cloudnumbers.com/what-is-your-favorite-r-feature

Descriptive statistics:
A huge set of tools to describe and explore data is available in R. The built-in data set "attenu" gives the peak accelerations measured at various observation stations for 23 earthquakes in California. For example, try the command summary(), which in this case gives a compact descriptive overview of every variable:

```
attenu
dim(attenu)
attenu[1:10,]
summary(attenu, digits = 4)
pairs(attenu, main = "attenu data")
```

This code example prints the data set "attenu", its dimensions, and its first 10 rows. Finally, it presents a summary table and a scatterplot matrix of all variables.
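Beyond summary(), a few base R calls help to look at the data by group. The following is a minimal sketch, not part of the original snippet list; the column names (event, dist, accel) are taken from the data set's help page (?attenu), and the variable name max_accel is only illustrative.

```r
data(attenu)   # ships with R in the datasets package

str(attenu)              # column types at a glance
colSums(is.na(attenu))   # number of missing values per column

# Largest recorded acceleration for each earthquake (column "event")
max_accel <- tapply(attenu$accel, attenu$event, max)
head(sort(max_accel, decreasing = TRUE))

# Association between distance and acceleration on log scales
cor(log(attenu$dist), log(attenu$accel))
```

tapply() splits one column by the values of another and applies a function to each group, which is often the quickest route from a summary of all observations to a per-group summary.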

R programming:
There are many ways to program in R. To write your first R function, see chapter 10 of the R manual "An Introduction to R".
For example, this code creates a function that adds two numbers:

```
addfunc <- function(x, y) {
  z <- x + y
  return(z)
}
addfunc(3, 2)
```
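Functions in R can also have default arguments and work on whole vectors. The sketch below is a hypothetical variant of the function above (the name add_checked and the input check are assumptions for illustration, not part of the original post).

```r
# Hypothetical variant of addfunc: validates its inputs and gives y a default.
add_checked <- function(x, y = 0) {
  stopifnot(is.numeric(x), is.numeric(y))
  x + y   # the value of the last expression is returned automatically
}

add_checked(3, 2)      # 5
add_checked(1:3, 10)   # 11 12 13, because + is vectorized
add_checked(4)         # 4, using the default y = 0
```

Because arithmetic in R is vectorized, the same function handles single numbers and vectors without any explicit loop.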

You can also write simple for loops:

```
for(i in 1:5) print(1:i)

for(n in c(2,5,10,20,50)) {
  x <- stats::rnorm(n)
  cat(n, ":", sum(x^2), "\n")
}
```
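When the loop results are needed later, rather than only printed, it is often convenient to collect them in a vector. The following sketch does this with vapply(); the variable names sizes and ss and the seed are illustrative choices, not part of the original example.

```r
set.seed(42)   # makes the random draws reproducible
sizes <- c(2, 5, 10, 20, 50)

# One sum of squares per sample size, returned as a numeric vector
ss <- vapply(sizes, function(n) sum(rnorm(n)^2), numeric(1))
names(ss) <- sizes
ss

# seq_len() is a safer loop range than 1:n: seq_len(0) is empty,
# whereas 1:0 counts down and yields c(1, 0).
for (i in seq_len(0)) print(i)   # the body never runs
```

vapply() checks that every result has the declared type and length (here numeric(1)), so mistakes in the inner function surface early.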

A bioinformatics example: Normalization of Microarray Data
The Bioconductor repository (http://www.bioconductor.org/) provides many packages for the analysis of genomic data. The affydata package is a simple data package. It provides an example dataset drawn from an actual dilution experiment done by Gene Logic (http://www.genelogic.com/support/scientific-studies/). A standard preprocessing step for microarray data is normalization, which reduces technical variation between arrays so that they can be compared.

```
library(affydata)
data(Dilution)
Dilution
phenoData(Dilution)
pData(Dilution)

# first plot
boxplot(Dilution, col=c(2,2,3,3))

## pick only a few genes to reduce calculation time
gn <- sample(geneNames(Dilution), 100)
pms <- pm(Dilution[,3:4], gn)
mva.pairs(pms)

# normalization
normalized.Dilution <- Biobase::combine(normalize(Dilution[, 1:2]), normalize(Dilution[, 3:4]))
normalize.methods(Dilution)

# second plot
boxplot(normalized.Dilution, col=c(2,2,3,3), main="Normalized Arrays")
pms <- pm(normalized.Dilution[, 3:4], gn)
mva.pairs(pms)
```

Compare the plots before and after normalization!
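To get a feeling for what normalization does without installing Bioconductor, here is a minimal base R sketch of quantile normalization, one widely used method. It runs on simulated data and is a simplified illustration, not the implementation used by the affy package; the function name quantile_normalize, the matrix sizes, and the artificial per-array shifts are all assumptions made for the example.

```r
set.seed(1)

# Simulated log-intensities: 1000 hypothetical probes on 4 hypothetical arrays,
# with an artificial per-array shift standing in for technical variation.
x <- matrix(rnorm(4000, mean = 8), ncol = 4)
x <- sweep(x, 2, c(0, 0.5, 1, 1.5), "+")

quantile_normalize <- function(m) {
  ranks  <- apply(m, 2, rank, ties.method = "first")  # rank within each array
  sorted <- apply(m, 2, sort)                         # each array sorted
  target <- rowMeans(sorted)                          # mean of each quantile
  out <- apply(ranks, 2, function(r) target[r])       # map ranks to targets
  dimnames(out) <- dimnames(m)
  out
}

xn <- quantile_normalize(x)

apply(x, 2, median)    # medians differ between arrays before normalization
apply(xn, 2, median)   # medians agree afterwards
```

After this step every array has the same distribution of values, while the ordering of probes within each array is preserved.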

For more details, see the affydata documentation (http://www.bioconductor.org/packages/2.8/data/experiment/html/affydata.html)
