This article was first published on Milano R net and kindly contributed to R-bloggers.

This article gives a brief overview of the data.table package, written by M. Dowle, T. Short and S. Lianoglou.

A data.table is an extension of a data.frame, created to reduce the user's working time in two ways:

1. programming time
2. compute time

The data.table syntax is inspired by the R matrix-indexing syntax A[B], where A is a matrix and B is a 2-column matrix.
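For readers who have not met this base R idiom, here is a minimal sketch. The names A and B follow the text; the values are invented purely for illustration.

```r
# Base R only: index a matrix with a 2-column matrix of (row, column) pairs.
A <- matrix(1:12, nrow = 3)      # 3 x 4 matrix, filled column by column
B <- cbind(c(1, 3), c(2, 4))     # each row of B is one (row, column) pair
picked <- A[B]                   # elements A[1, 2] and A[3, 4]
print(picked)
```

Each row of B selects one element of A, which is the "look up by values" idea that data.table generalizes to keyed tables.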

Because a data.table is a data.frame, it is compatible with all R functions and packages that accept a data.frame.
The big advantage of a data.table over a data.frame is that it treats tables as if they were tables in a database, with remarkably fast data access.
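As a quick check of the compatibility claim, the sketch below inspects the class of a small data.table and passes it to a base R modelling function. The use of lm() is only an illustrative choice of "a function that accepts a data.frame", not something taken from the original post.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

cls   <- class(DT)            # data.table inherits from data.frame
is_df <- is.data.frame(DT)    # so data.frame checks succeed

# A base R function written for data.frames accepts DT unchanged
fit <- lm(v ~ y, data = DT)
```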

A data.table is created exactly like a data.frame; the syntax is the same.
 library(data.table)
 DF = data.frame(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
 DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)

DF and DT hold identical data, but on DT we can create an index by defining a key.
 setkey(DT,x)
 tables()
      NAME NROW MB COLS  KEY
 [1,] DT      9  1 x,y,v x
 Total: 1MB
DT has been re-ordered according to the values of the x column.

A key consists of one or more columns which may be integer, factor, character or some other class.
A data.table does not have rownames, but it may instead have a key of one or more columns, set with setkey. This key may be used for row indexing instead of rownames.
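A small sketch of a multi-column key, reusing the example table from above. The two-column key and the looked-up values are illustrative choices, not taken from the original post.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

setkey(DT, x, y)          # two-column key; DT is sorted by x, then y, in place
k   <- key(DT)            # the key columns, in order
has <- haskey(DT)         # TRUE once a key is set

# Row lookup by key values, playing the role rownames play in a data.frame
hit <- DT[.("b", 3)]
print(hit)
```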

Now we can subset the data:
 DT["b",]   # extract the rows where the key column equals "b"
 DT[,v]     # extract the v column
On large tables, the keyed lookup can be 100+ times faster than a vector scan with ==.
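To make the comparison concrete, the sketch below runs both forms on the small example table and checks that they select the same rows. On nine rows no speed difference is measurable, so no timings are shown; the point is only that the two calls are interchangeable in result.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
setkey(DT, x)

by_key  <- DT["b"]         # lookup through the key
by_scan <- DT[x == "b"]    # logical comparison against the column

same_rows <- identical(by_key$v, by_scan$v)
print(same_rows)
```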

A data.table is like a data.frame, but in DT[i, j] both i and j can be expressions that use column names directly.
Furthermore, i may itself be a data.table, which invokes a fast table join using binary search in O(log n).
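A minimal sketch of passing a data.table as i. The table `lookup` and its `label` column are hypothetical, made up for this illustration.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
setkey(DT, x)

# Hypothetical lookup table; its first column is matched against DT's key
lookup <- data.table(x = c("a", "c"), label = c("first", "third"))

# For each row of lookup, find the matching rows of DT by key
joined <- DT[lookup]
print(joined)
```

Each row of `lookup` matches three rows of DT, so the result has six rows and carries the extra `label` column along.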

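We can easily add new data:
 DT[,w:=rep(1:3,3)]   # add a w column (recent versions require explicit rep() rather than silent recycling)
Adding or updating a column with := can be 500+ times faster than DF[i,j] = value.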
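The speed comes from := modifying the table in place rather than copying it. The sketch below illustrates that behaviour; the names `alias` and `snapshot` are invented for this example.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

alias    <- DT           # a second name for the same table, no copy is made
snapshot <- copy(DT)     # an explicit, independent copy

DT[, w := rep(1:3, times = 3)]   # add column w in place
DT[x == "b", v := 0L]            # update v only where x is "b"

names(alias)       # includes "w": the alias sees the in-place change
names(snapshot)    # still x, y, v: the copy is unaffected
```

When an unmodified version of the table is needed later, copy() is the safe choice, since plain assignment only creates another reference.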

or join data.tables (the key must cover every column being matched):
 setkey(DT,x,y)               # key on both x and y
 DT[J("a",3:6), nomatch=0]    # inner join (J is an alias of data.table)
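The difference between keeping and dropping unmatched rows can be sketched as follows, assuming the two-column key on x and y that the join needs. The variable names are illustrative.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
setkey(DT, x, y)

# Default: one result row per lookup row, with NA in v where nothing matches
all_rows <- DT[J("a", 3:6)]

# nomatch = 0 drops the unmatched lookup rows, giving an inner join
matched <- DT[J("a", 3:6), nomatch = 0]

print(all_rows)
print(matched)
```

Only y = 3 and y = 6 exist for x = "a", so the inner join returns two rows, while the default form returns four with two NA values in v.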

or group quickly:
 DT[,sum(v),by=x]
 DT[,list(vSum=sum(v), vMin=min(v), vMax=max(v)), by=list(x,y)]
Grouping with by can be 10+ times faster than tapply(), with a much simpler syntax than the data.frame equivalent.
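For comparison, the sketch below computes the same grouped sum with by and with tapply() and checks that the numbers agree. No timings are shown, since the example table is far too small for a meaningful benchmark.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# data.table: the result is itself a data.table with a named column
by_dt <- DT[, list(vSum = sum(v)), by = x]

# base R: the result is a named array
by_tapply <- tapply(DT$v, DT$x, sum)

print(by_dt)
print(by_tapply)
```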

In a data.table each cell can be a different type:

• each cell can be a vector
• each cell can itself be a data.table
• list columns can be combined with i and by

 data.table(x=letters[1:3], y=list(1:10, letters[1:4], data.table(a=1:3,b=4:6))) 
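A short sketch of working with such a list column, using the same table as the line above. The variable names and the choice of length() as the per-cell summary are illustrative assumptions.

```r
library(data.table)

DT <- data.table(x = letters[1:3],
                 y = list(1:10, letters[1:4], data.table(a = 1:3, b = 4:6)))

# Each cell of y holds an object of a different class
cell_classes <- sapply(DT$y, function(cell) class(cell)[1])

# List column combined with by: length of each cell, one group per x
# (for the nested data.table, length() is its number of columns)
lens <- DT[, list(n = length(y[[1]])), by = x]

# List column combined with i: pull out the nested data.table
inner <- DT[x == "c", y][[1]]

print(cell_classes)
print(lens)
```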

In conclusion, a data.table is identical to a data.frame other than:

• it doesn’t have rownames
• selecting a single row always returns a single-row data.table, not a vector
• the comma is optional inside [], so DT[3] returns the 3rd row as a 1-row data.table
• [] is like a call to subset()
• [,…] is like a call to with()

This implies:

• up to 10 times less memory
• up to 10 times faster to create, and copy
• simpler R code
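The behavioural differences listed in the conclusion can be checked side by side on the example tables. This is a minimal sketch; the variable names are chosen for illustration.

```r
library(data.table)

DF <- data.frame(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

dt_third <- DT[3]    # 3rd row, as a 1-row data.table
df_third <- DF[3]    # 3rd column, as a 1-column data.frame

# i behaves like subset(): column names are visible inside []
dt_sub <- DT[v > 6]
df_sub <- subset(DF, v > 6)

# j behaves like with(): the expression is evaluated using the columns
dt_sum <- DT[, sum(v)]
df_sum <- with(DF, sum(v))
```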
