What is your favorite R feature? (part 2)

(This article was first published on cloudnumbers.com » R-project, and kindly contributed to R-bloggers)

Earlier this week we started a list of great R (www.r-project.org) code snippets on our blog: http://cloudnumbers.com/what-is-your-favorite-r-feature
In this post we extend the list with several more nice R features. Please feel free to add comments with your favorite R code snippets.

Descriptive statistics:
R offers a huge set of tools to describe and explore data. The built-in data set "attenu" gives the peak accelerations measured at various observation stations for 23 earthquakes in California. For example, try the command summary(), which gives you a compact descriptive overview of every variable in the data set:

attenu
dim(attenu)
attenu[1:10,]
summary(attenu, digits = 4)
pairs(attenu, main = "attenu data")

This code example prints the data set "attenu", its dimensions, and its first 10 rows. Finally, it presents a summary table and a scatterplot matrix of all variables.
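
As a small extension of the commands above, the following sketch looks at the structure of the data and at the largest peak acceleration recorded per earthquake. The column names (event, dist, accel) come from the attenu data set itself; the variable name max_accel is our own choice for illustration.

```r
# attenu ships with base R in the 'datasets' package
data(attenu)

# Compact overview of column types and the first few values
str(attenu)

# head() is an alternative to attenu[1:10, ]
head(attenu, 10)

# Largest peak acceleration per earthquake (grouped by the 'event' column)
max_accel <- tapply(attenu$accel, attenu$event, max)
head(sort(max_accel, decreasing = TRUE), 3)

# Correlation between distance to the station and peak acceleration
cor(attenu$dist, attenu$accel)
```

tapply() splits the accelerations by event and applies max() to each group, which is often the quickest way to get a per-group statistic without writing a loop.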

R programming:
There are many ways to program in R. To write your first R function, see chapter 10 of the R manual "An Introduction to R".
For example, this code creates a first function that adds two numbers:

addfunc <- function(x, y) {
  z <- x + y
  return(z)
}
addfunc(3,2)
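
A possible next step is a variant with a default argument and a basic input check. This is only a sketch: the name add_checked and the default y = 0 are our own assumptions, not part of the example above.

```r
# Hypothetical variant of addfunc: 'y' defaults to 0 and inputs must be numeric
add_checked <- function(x, y = 0) {
  stopifnot(is.numeric(x), is.numeric(y))
  x + y  # the last evaluated expression is returned, so return() is optional
}

add_checked(3, 2)     # two single numbers
add_checked(1:3, 10)  # arithmetic in R is vectorized, so vectors work as well
add_checked(5)        # 'y' falls back to its default
```

Because + is vectorized, the same function works on whole vectors without any loop.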

Or create simple for loops:

for(i in 1:5) print(1:i)
for(n in c(2,5,10,20,50)) {
  x <- stats::rnorm(n)
  cat(n,":", sum(x^2),"\n")
}
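
The second loop can also be written without an explicit for loop. The sketch below uses sapply() to collect the results in a named vector instead of printing them; the fixed seed and the names sizes and sum_sq are assumptions made for illustration.

```r
set.seed(42)  # assumption: fixed seed so that repeated runs give the same numbers

sizes <- c(2, 5, 10, 20, 50)

# For each sample size n, draw n standard normal values and sum their squares
sum_sq <- sapply(sizes, function(n) sum(stats::rnorm(n)^2))
names(sum_sq) <- sizes

sum_sq
```

Keeping the results in a vector makes it easy to reuse them later, for example in a plot.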

A bioinformatics example: Normalization of Microarray Data
The Bioconductor repository (http://www.bioconductor.org/) provides many packages for the analysis of genomic data. The affydata package is a simple data package. It provides an example dataset drawn from an actual Dilution experiment done by Gene Logic (http://www.genelogic.com/support/scientific-studies/). A standard preprocessing step for microarray data is normalization, which reduces technical variation between arrays (for example, from array production) so that the arrays can be compared.

library(affydata)
data(Dilution)
Dilution
phenoData(Dilution)
pData(Dilution)
 
# first plot
boxplot(Dilution,col=c(2,2,3,3))
##pick only a few genes to reduce calculation time
gn <- sample(geneNames(Dilution),100)
pms <- pm(Dilution[,3:4], gn)
mva.pairs(pms)
 
#normalization
normalized.Dilution <- Biobase::combine(normalize(Dilution[, 1:2]),
                                        normalize(Dilution[, 3:4]))
normalize.methods(Dilution)
 
#second plot
boxplot(normalized.Dilution, col=c(2,2,3,3), main="Normalized Arrays")
pms <- pm(normalized.Dilution[, 3:4],gn)
mva.pairs(pms)

Compare the plots before and after normalization!
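
To get a feeling for what normalization does without installing Bioconductor, here is a base R sketch of quantile normalization on a tiny matrix. Quantile normalization is, as far as we know, one of the methods reported by normalize.methods(Dilution), but this is not the affy implementation: the function quantile_normalize and the toy numbers are made up for illustration.

```r
# Toy intensity matrix (made-up numbers): rows are probes, columns are arrays
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8),
            nrow = 4,
            dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))

# Hypothetical helper: force all columns to share the same distribution
quantile_normalize <- function(m) {
  # Rank of every value within its own column (ties broken by position)
  ranks <- apply(m, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest values across columns
  reference <- unname(rowMeans(apply(m, 2, sort)))
  # Replace every value by the reference value that has the same rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(m)
  normalized
}

x_norm <- quantile_normalize(x)

# Before normalization the columns differ; afterwards their boxplots coincide
boxplot(x, main = "Raw toy data")
boxplot(x_norm, main = "Quantile-normalized toy data")
```

After this step every column contains the same set of values, only in a different order, which is why the boxplots line up.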

For more details, see the affydata documentation (http://www.bioconductor.org/packages/2.8/data/experiment/html/affydata.html).

 

In the next blog posts we will present more nice R features, especially for high-performance computing with R. Please feel free to add comments with your favorite R code snippets.
