data.table version 1.8.1 – now allowed numeric columns and big-number (via bit64) in keys!

May 9, 2012

(This article was first published on R-statistics blog » RR-statistics blog, and kindly contributed to R-bloggers)

This is a guest post written by Branson Owen, an enthusiastic R and data.table user.

Wow, a long time desired feature of data.table finally came true in version 1.8.1! data.table now allowed numeric columns and big number (via bit64) in keys! This is quite a big thing to me and I believe to many other R users too. Now I can hardly think any weakiness of data.table. Oh, did I mention it also started to support character column in the keys (rather than coerce to factor)?

For people who are not familiar with but interested in data.table package, data.table is an enhanced data.frame for high-speed indexing, ordered joins, assignment, grouping and list columns in a short and flexible syntax. You can take a look at some task examples here:

News from datatable-help mailing list:

* New functions chmatch() and %chin%, faster versions of match() and %in% for character vectors. They are about 4 times faster than match() on the example in ?chmatch.

* New function set(DT,i,j,value) allows fast assignment to elements of DT.

M = matrix(1,nrow=100000,ncol=100)
DF =
DT =
system.time(for (i in 1:1000) DF[i,1L] <- i) # 591.000s
system.time(for (i in 1:1000) DT[i,V1:=i]) # 1.158s
system.time(for (i in 1:1000) M[i,1L] <- i) # 0.016s
system.time(for (i in 1:1000) set(DT,i,1L,i)) # 0.027s

* Numeric columns (type ‘double’) are now allowed in keys and ad hoc by. Other types which use ‘double’ (such as POSIXct and bit64) can now be fully supported.

For advanced and creative users, it also officially supported list columns awhile ago (rather than support it by accident). For example, your column could be a list of vectors, where each of the vector has different length. This can allow very flexible and creative ways to manipulate data.

The code example below use “function column”, i.e. a list of functions

> DT = data.table(ID=1:4,A=rnorm(4),B=rnorm(4),fn=list(min,max))
> str(DT)
Classes ‘data.table’ and 'data.frame': 4 obs. of 4 variables:
$ ID: int 1 2 3 4
$ A : num -0.7135 -2.5217 0.0265 1.0102
$ B : num -0.4116 0.4032 0.1098 0.0669
$ fn:List of 4
..$ :function (..., na.rm = FALSE)
..$ :function (..., na.rm = FALSE)
..$ :function (..., na.rm = FALSE)
..$ :function (..., na.rm = FALSE)
> DT[,fn[[1]](A,B),by=ID]
[1,] 1 -0.71352508
[2,] 2 0.40322625
[3,] 3 0.02648949
[4,] 4 1.01022266



To leave a comment for the author, please follow the link and comment on their blog: R-statistics blog » RR-statistics blog. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Mango solutions

plotly webpage

dominolab webpage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training




CRC R books series

Six Sigma Online Training

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)