jsonlite 0.9.13: high performance number formatting

October 24, 2014

(This article was first published on OpenCPU, and kindly contributed to R-bloggers)


The jsonlite package implements a robust, high performance JSON parser and generator for R, optimized for statistical data and the web. This week version 0.9.13 appeared on CRAN, the third release in a relatively short period to focus on performance optimization.

Fast number formatting

Versions 0.9.11 and 0.9.12 had already introduced major speedups by porting critical bottlenecks to C code and switching to a better JSON parser. The current release focuses on number formatting and incorporates C code from modp_numtoa, which is several times faster than as.character, formatC or sprintf for converting doubles and integers to strings (your mileage may vary depending on platform and precision).
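To get a feel for how much time number formatting takes, the base R functions mentioned above can be timed on a vector of doubles. This is only a rough sketch: the vector length, the scaling and the choice of 4 decimal digits are arbitrary assumptions, and the toJSON call also builds the surrounding JSON array, so the timings are not strictly like for like.

```r
# Sketch: time a few ways of turning doubles into strings.
# Assumption: 1e5 random values and 4 decimal digits are arbitrary choices.
set.seed(1)
x <- runif(1e5) * 1000

t_as_character <- system.time(s1 <- as.character(x))
t_formatC      <- system.time(s2 <- formatC(x, digits = 4, format = "f"))
t_sprintf      <- system.time(s3 <- sprintf("%.4f", x))

# jsonlite formats the numbers and wraps them in a JSON array in one call
t_jsonlite <- system.time(json <- jsonlite::toJSON(x, digits = 4))

# Collect the elapsed times (in seconds) in one named vector
timings <- c(
  as.character = t_as_character[["elapsed"]],
  formatC      = t_formatC[["elapsed"]],
  sprintf      = t_sprintf[["elapsed"]],
  toJSON       = t_jsonlite[["elapsed"]]
)
print(timings)
```

Absolute numbers will differ per machine and R version, so the ordering matters more than the values.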

library(ggplot2)
nrow(diamonds)
# [1] 53940
system.time(jsonlite::toJSON(diamonds, dataframe = "row"))
#   user  system elapsed
#  0.319   0.007   0.325
system.time(jsonlite::toJSON(diamonds, dataframe = "col"))
#   user  system elapsed
#  0.073   0.002   0.075

Using the same benchmark from previous posts, the time to convert the diamonds data to row-based JSON has gone down from 0.619s to 0.325s on my machine (about 2x speedup from jsonlite 0.9.12), and converting to column-based JSON has gone down from 0.330s to 0.075s (about 4x speedup).
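For readers unfamiliar with the two layouts being timed, here is a small illustration on a toy data frame. The data frame is made up for this example, and the argument values are spelled out in full as "rows" and "columns"; the shorter "row" and "col" used in the benchmark appear to rely on partial matching of those values.

```r
library(jsonlite)

# Hypothetical toy data frame, small enough to read the output by eye
df <- data.frame(
  carat = c(0.23, 0.21),
  cut   = c("Ideal", "Premium"),
  stringsAsFactors = FALSE
)

# Row-based: a JSON array with one object per record
row_json <- toJSON(df, dataframe = "rows")

# Column-based: a single JSON object with one array per column
col_json <- toJSON(df, dataframe = "columns")

print(row_json)
print(col_json)
```

The row-based layout repeats every column name for every record, which is one plausible reason it takes longer to generate than the column-based layout.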

Comparing to other JSON packages

When comparing JSON packages, it should be noted that the comparison is never entirely fair because different packages use different settings and defaults for missing values, number of digits, etc. Both rjson and RJSONIO only support the column-based format for encoding data frames. Using their default settings:

system.time(rjson::toJSON(diamonds))
#   user  system elapsed
#  0.279   0.004   0.281
system.time(RJSONIO::toJSON(diamonds))
#   user  system elapsed
#  0.918   0.027   0.944

For this particular dataset, jsonlite is about 3.7x faster than rjson (0.281s vs 0.075s) and about 12x faster than RJSONIO (0.944s vs 0.075s) on my machine when generating column-based JSON. These differences are relatively large because 7 out of the 10 columns in the diamonds dataset are numeric.
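The caveat about settings is easy to see within jsonlite itself. The sketch below uses the digits and na arguments of toJSON as they exist in current jsonlite releases; whether 0.9.13 behaved identically is an assumption, and the input vector is made up for illustration.

```r
library(jsonlite)

# Hypothetical input: two doubles and a missing value
x <- c(pi, NA, 1/3)

# Number of digits: fewer digits means shorter output strings
json_digits2 <- toJSON(x, digits = 2)
json_digits8 <- toJSON(x, digits = 8)

# Missing values: encoded either as JSON null or as a string
json_na_null   <- toJSON(x, na = "null")
json_na_string <- toJSON(x, na = "string")

print(json_digits2)
print(json_digits8)
print(json_na_null)
print(json_na_string)
```

Two packages that differ on choices like these produce different output for the same data, so their timings measure slightly different work.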
