jsonlite 0.9.13: high performance number formatting

[This article was first published on OpenCPU, and kindly contributed to R-bloggers.]

The jsonlite package implements a robust, high performance JSON parser and generator for R, optimized for statistical data and the web. This week version 0.9.13 appeared on CRAN. It is the third release in a relatively short period to focus on performance optimization.

Fast number formatting

Versions 0.9.11 and 0.9.12 had already introduced major speedups by porting critical bottlenecks to C code and switching to a better JSON parser. The current release focuses on number formatting and incorporates C code from modp_numtoa, which is several times faster than as.character, formatC or sprintf for converting doubles and integers to strings (your mileage may vary depending on platform and precision).
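
To get a feel for the baseline that number formatting has to beat, the three base R functions named above can be timed on the same vector of doubles. This is only a rough sketch: the test vector, its size and the choice of four decimal places are arbitrary assumptions and not part of the original benchmark, and the timings will differ per platform.

```r
# Hypothetical micro-benchmark of base R number formatting (no packages needed)
set.seed(1)
x <- runif(1e5) * 1000  # 100,000 arbitrary doubles

# Each call converts the full vector to strings and records the timing
t_aschar  <- system.time(s1 <- as.character(x))
t_formatc <- system.time(s2 <- formatC(x, digits = 4, format = "f"))
t_sprintf <- system.time(s3 <- sprintf("%.4f", x))

# Elapsed seconds per method; values vary by platform and R version
timings <- c(
  as.character = t_aschar[["elapsed"]],
  formatC      = t_formatc[["elapsed"]],
  sprintf      = t_sprintf[["elapsed"]]
)
print(timings)
```

Note that as.character prints up to 15 significant digits, whereas the other two calls are fixed at four decimals here, so the three are not doing exactly the same work.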

data(diamonds, package = "ggplot2")
nrow(diamonds)
# [1] 53940
system.time(jsonlite::toJSON(diamonds, dataframe = "row"))
#   user  system elapsed
#  0.319   0.007   0.325
system.time(jsonlite::toJSON(diamonds, dataframe = "col"))
#   user  system elapsed
#  0.073   0.002   0.075

Using the same benchmark as in previous posts, the time to convert the diamonds data to row-based JSON has gone down from 0.619s to 0.325s on my machine (about a 2x speedup over jsonlite 0.9.12), and converting to column-based JSON has gone down from 0.330s to 0.075s (about a 4x speedup).
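
For readers unfamiliar with the two layouts being timed, here is a minimal sketch on a tiny made-up data frame (the data frame is an assumption for illustration, not from the benchmark). It spells the dataframe argument out in full as "rows" and "columns".

```r
library(jsonlite)

# Tiny illustrative data frame (hypothetical, not the diamonds data)
df <- data.frame(x = c(1.5, 2.5), y = c("a", "b"))

# Row-based: one JSON object per record, so column names repeat for every row
row_json <- toJSON(df, dataframe = "rows")

# Column-based: one JSON array per column, so each column name appears once
col_json <- toJSON(df, dataframe = "columns")

print(row_json)
# [{"x":1.5,"y":"a"},{"x":2.5,"y":"b"}]
print(col_json)
# {"x":[1.5,2.5],"y":["a","b"]}
```

The row-based output repeats every key for every record, which is one plausible reason why it takes longer to generate than the column-based output.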

Comparing to other JSON packages

When comparing JSON packages, it should be noted that the comparison is never entirely fair, because different packages use different settings and defaults for missing values, number of digits, etc. Both rjson and RJSONIO only support the column-based format for encoding data frames. Using their default settings:

system.time(rjson::toJSON(diamonds))
#   user  system elapsed
#  0.279   0.004   0.281
system.time(RJSONIO::toJSON(diamonds))
#   user  system elapsed
#  0.918   0.027   0.944

For this particular dataset, jsonlite is about 3.5x faster than rjson and about 12x faster than RJSONIO (on my machine) to generate column-based JSON. These differences are relatively large because 7 out of the 10 columns in the diamonds dataset are numeric.
