# Commonly used R commands (statistics)

February 2, 2012

(This article was first published on manio » R, and kindly contributed to R-bloggers)

When I say ease of use is improved, I mean you can simply copy, paste, and run the code in this post without referring to other places, and without downloading a data file and reading it into R. This is how I like a blog article to be. You don’t need to read the whole article. Just Ctrl+F what you need, copy the code there, and run it.

I use R on Windows and sometimes Linux. The version is 2.13.0. The following scripts should work in other versions as well.

### Read a File to a Table

Hmm... You can’t copy and run this one on your system, since you don’t have that file. congold is the resulting table, and the first argument of read.table() is the path of the file. In Windows, you should use “/” in the path instead of “\” (or escape the backslash as “\\”).
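To keep with the copy-and-run spirit of this post, here is a minimal sketch that first writes a small file and then reads it back. The temporary file, the column names `id` and `price`, and the variable `mytable` are made up for illustration; they are not the original `congold` data.

```r
# Hypothetical stand-in for your own data file: write a tiny table to a temp file.
path <- tempfile()
write.table(data.frame(id = 1:3, price = c(10.5, 11.2, 9.8)),
            file = path, row.names = FALSE, quote = FALSE)

# The first argument of read.table() is the path; header = TRUE takes
# the column names from the first line of the file.
mytable <- read.table(path, header = TRUE)
mytable

# On Windows, a real path would be written with forward slashes,
# e.g. "C:/data/myfile.txt" (hypothetical path).
```

Replace `path` with the location of your own file to read real data.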

### Boxplot

d = rnorm(10)
t = rep(c(1,2),c(5,5))
boxplot(d~t)

### Get subset

df = data.frame(col1=c(1,2,3,4),col2=c(1,1,2,2))
subset(df,col2==2)

### Find Out How Many Unique Items Are in a Vector

a = c(5,5,6,6,6)
length(unique(a))

### Open a New Plot Window

In Windows

windows()

In Linux

X11()

In Mac

quartz()

### Delete Columns by Names

df <- data.frame(x=rep(1,3), y=rep(2,3), z=rep(3,3), t=rep(4,3))
df <- df[,-which(names(df) %in% c("z","t"))]

An easier way:

df <- data.frame(x=rep(1,3), y=rep(2,3), z=rep(3,3), t=rep(4,3))
df <- subset(df, select=-c(z,t))

Actually, it is done by selecting the columns you want. So we have the following:

### Select Columns by Names

df <- data.frame(x=rep(1,3), y=rep(2,3), z=rep(3,3), t=rep(4,3))
df[, c("x","y")]
df <- data.frame(x=rep(1,3), y=rep(2,3), z=rep(3,3), t=rep(4,3))
subset(df, select=c(x,y))

### Print out Column Names

df <- data.frame(x=rep(1,3), y=rep(2,3), z=rep(3,3), t=rep(4,3))
names(df)

### Change Column Names

df <- data.frame(x=rep(1,3), y=rep(2,3), z=rep(3,3), t=rep(4,3))
names(df)[1]="newNameForColumn1"
df <- data.frame(x=rep(1,3), y=rep(2,3), z=rep(3,3), t=rep(4,3))
names(df)=c("newNameForColumn1", "newNameForColumn2", "newNameForColumn3","newNameForColumn4")
df <- data.frame(x=rep(1,3), y=rep(2,3), z=rep(3,3), t=rep(4,3))
names(df)[which(names(df)=="y")]= "NewNameOf_y"

### Regression Plot

library(lattice)
x = 1:100
y = rnorm(100)
xyplot(x~y, type=c("r","p"))

### Finding the 95th and 99th Percentiles of Each Category

library(doBy)
df = data.frame(x=rep(c(1,2),50), y=rnorm(100))
summaryBy(y~x, data=df, FUN=function(x){quantile(x,c(0.95,0.99))})
df = data.frame(x=rep(c(1,2),50), y=rnorm(100))
aggregate(y~x, data = df, function(x){quantile(x,0.95)})
aggregate(y~x, data = df, function(x){quantile(x,0.99)})

### Get the Median of Each Factor Level in a Data Frame (each level has many rows)

df = data.frame(x=rep(c(1,2),50), y=rnorm(100))
aggregate(y~x, data = df, median)

### To count rows or columns

df <- data.frame(x=rep(1,3), y=rep(2,3), z=rep(3,3), t=rep(4,3))
nrow(df)
df <- data.frame(x=rep(1,3), y=rep(2,3), z=rep(3,3), t=rep(4,3))
ncol(df)

### Create empty matrix or vector

mymatrix <- mat.or.vec(2,3)

### Replace data in data frame

tmp = data.frame("a"=c(1,2,3,4))
selected = tmp == 2
selected
tmp[selected] = 22
tmp

### Convert Factor to Number

size <- factor(c(55,44,33,22,11))
size
as.numeric(size)
levels(size)[size]
as.numeric(levels(size)[size])

### Change the Order of Columns

df = data.frame("a"=c(1,1), "b"=c(2,2), "c"=c(3,3))
df
df = subset(df, select=c(c,b,a))
df

### Order Data Frame

df = data.frame(a=c(4,5,6),b=c(9,8,7))

df = df[order(df$b),]

df = data.frame(a=c(4,5,6),b=c(9,8,7),c=c(11,12,12))
df
df[order(df$c,df$b),]

There is too much to organize from my notes…

Maybe I’ll pick it up later, or not…
