(This article was first published on **cloudnumbers.com » R-project**, and kindly contributed to R-bloggers)

This week on our blog we started a list of great R (www.r-project.org) code snippets: http://cloudnumbers.com/what-is-your-favorite-r-feature

We are going to extend this list with several more nice R features. Please feel free to add comments with your favorite R code snippets.

**Descriptive statistics**:

R offers a large set of tools for describing and exploring data. The built-in data set ‘attenu’ gives the peak accelerations measured at various observation stations for 23 earthquakes in California. For example, try the command **summary()**, which in this case gives a compact descriptive overview of every variable:

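    data(attenu)                         # load the built-in data set
    dim(attenu)                          # number of rows and columns
    attenu[1:10, ]                       # first 10 rows
    summary(attenu, digits = 4)          # descriptive summary of each variable
    pairs(attenu, main = "attenu data")  # scatterplot matrix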

This code example loads the data set ‘attenu’, then prints its dimensions and its first 10 rows. Finally, it presents a summary table and a scatterplot matrix of all variables.
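
Beyond **summary()**, it often helps to look at the column types and to aggregate the measurements per earthquake. The following is a minimal sketch in base R; the variable names `n_missing` and `peak_by_event` are our own and chosen only for illustration.

```r
# Structure of the data frame: column names and types
str(attenu)

# Number of missing values in each column
n_missing <- colSums(is.na(attenu))
print(n_missing)

# Largest recorded peak acceleration for each earthquake (column 'event')
peak_by_event <- tapply(attenu$accel, attenu$event, max)
print(head(peak_by_event))
```

`tapply()` splits the first argument by the groups in the second and applies the function to each group, so the result has one entry per earthquake.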

**R programming**:

There are many ways to program in R. To write your first R function, see chapter 10 of the R manual “An Introduction to R”.

For example, this code creates a first function that adds two numbers:

    addfunc <- function(x, y) {
      z <- x + y
      return(z)
    }
    addfunc(3, 2)
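
Function arguments can have default values, and because `+` is vectorized in R, the same kind of function also works on whole vectors. A small sketch of both ideas follows; the name `addfunc_checked` is hypothetical and the input check is just one possible choice.

```r
# Add two numeric inputs; y defaults to 0 when it is omitted
addfunc_checked <- function(x, y = 0) {
  # Stop with an error if either argument is not numeric
  stopifnot(is.numeric(x), is.numeric(y))
  x + y  # the value of the last expression is returned
}

addfunc_checked(3, 2)     # 5
addfunc_checked(1:3, 10)  # 11 12 13
addfunc_checked(4)        # 4
```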

Or create simple for loops:

    for (i in 1:5) print(1:i)

    for (n in c(2, 5, 10, 20, 50)) {
      x <- stats::rnorm(n)
      cat(n, ":", sum(x^2), "\n")
    }
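
Many loops of this kind can also be written with the apply family of functions. The sketch below computes the same sums of squares once with a loop and once with `vapply()`. The call to `set.seed()` is added here only so that both versions draw the same random numbers, and the names `sizes`, `loop_result` and `vapply_result` are illustrative.

```r
sizes <- c(2, 5, 10, 20, 50)

# Loop version: fill a preallocated result vector
set.seed(42)
loop_result <- numeric(length(sizes))
for (k in seq_along(sizes)) {
  x <- stats::rnorm(sizes[k])
  loop_result[k] <- sum(x^2)
}

# vapply version: one call, with the return type declared as a single number
set.seed(42)
vapply_result <- vapply(sizes, function(n) sum(stats::rnorm(n)^2), numeric(1))

all.equal(loop_result, vapply_result)  # TRUE
```

Which form to use is mostly a matter of taste here; `vapply()` has the advantage that it checks the type and length of every result.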

**A bioinformatics example: Normalization of Microarray Data**

The Bioconductor repository (http://www.bioconductor.org/) provides many packages for the analysis of genomic data. The **affydata** package is a simple data package. It provides an example dataset drawn from an actual Dilution experiment done by Gene Logic (http://www.genelogic.com/support/scientific-studies/). A standard preprocessing step for microarray data is normalization, which removes systematic differences between arrays that stem from production and handling rather than from biology.

    library(affydata)
    data(Dilution)
    Dilution
    phenoData(Dilution)
    pData(Dilution)

    # first plot
    boxplot(Dilution, col = c(2, 2, 3, 3))

    ## pick only a few genes to reduce calculation time
    gn <- sample(geneNames(Dilution), 100)
    pms <- pm(Dilution[, 3:4], gn)
    mva.pairs(pms)

    # normalization
    normalized.Dilution <- Biobase::combine(normalize(Dilution[, 1:2]),
                                            normalize(Dilution[, 3:4]))
    normalize.methods(Dilution)

    # second plot
    boxplot(normalized.Dilution, col = c(2, 2, 3, 3), main = "Normalized Arrays")
    pms <- pm(normalized.Dilution[, 3:4], gn)
    mva.pairs(pms)

Compare the plots before and after normalization!
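
To get a feeling for what normalization does without installing Bioconductor, here is a minimal base R sketch of one common technique, quantile normalization, applied to simulated data. This is a generic illustration under our own assumptions: the function `quantile_normalize` and the simulated matrix `raw` are hypothetical, and we do not claim that this is exactly what `normalize()` does internally for the Dilution data.

```r
# Quantile-normalize the columns of a numeric matrix (assumes no missing values)
quantile_normalize <- function(m) {
  ranks  <- apply(m, 2, rank, ties.method = "first")    # rank within each column
  sorted <- apply(m, 2, sort)                           # each column sorted ascending
  target <- rowMeans(sorted)                            # mean of the k-th smallest values
  normalized <- apply(ranks, 2, function(r) target[r])  # replace each value by the target at its rank
  dimnames(normalized) <- dimnames(m)
  normalized
}

# Simulated log-intensities for three arrays with different location and spread
set.seed(1)
raw <- cbind(array1 = rnorm(100, mean = 8,   sd = 1),
             array2 = rnorm(100, mean = 9,   sd = 1.5),
             array3 = rnorm(100, mean = 7.5, sd = 0.8))
normalized <- quantile_normalize(raw)

apply(raw, 2, median)         # medians differ between arrays
apply(normalized, 2, median)  # medians are identical after normalization
```

Calling `boxplot(raw)` and `boxplot(normalized)` shows the same effect graphically: after normalization all columns share the same distribution, while the ordering of values within each column is kept.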

For more details see the affydata documentation (http://www.bioconductor.org/packages/2.8/data/experiment/html/affydata.html).

In upcoming blog posts we will present more R features, especially for high-performance computing with R. Please feel free to add comments with your favorite R code snippets.
