# Faster arrays and matrices in jsonlite 0.9.20

*This article was first published on **OpenCPU** and kindly contributed to R-bloggers.*


Yesterday a new version of the jsonlite package was released to CRAN. This update includes no new features; it only introduces performance optimizations.

## Large Matrices

The jsonlite package was already highly optimized for converting vectors and data frames to JSON. However, Gregory Jefferis and Duncan Murdoch found that converting tall matrices, such as those used by rglwidget, was slower than expected.

It turned out this was indeed an edge case that I had overlooked. The new version of jsonlite fixes this problem, and matrix conversion should be about 200 times faster than before. Technical details follow below; first, a benchmark:

```r
# Old version!
> system.time(j <- toJSON(matrix(1L, ncol = 3, nrow = 50000)))
   user  system elapsed
  4.715   0.015   4.729
# New version!
> system.time(j <- toJSON(matrix(1L, ncol = 3, nrow = 50000)))
   user  system elapsed
  0.022   0.002   0.023
```

This artificial example (every element is the integer 1) highlights the improvement. The relative improvement may be smaller for matrices with real data, because additional time is then spent formatting the double/integer values as text (a step that was already optimized in jsonlite a while ago).

## Technical Details

So what was the problem? The previous version of jsonlite had an elegant solution that recursed through the dimensions of a matrix/array and applied JSON conversion to each of its elements. For example, for a matrix (2D array) it would convert each row to JSON and then combine the results. However, it turns out that the `apply` call below is really slow, because it invokes `asJSON` separately for every one of the 50,000 rows.

```r
# Technical example, don't use this code!
x <- matrix(1L, ncol = 3, nrow = 50000)
# apply() calls asJSON once per row: 50,000 separate calls
rows <- apply(x, 1, jsonlite:::asJSON)
# Join the per-row JSON strings into a single JSON array
json <- jsonlite:::collapse(rows, indent = NA)
```
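For readers without the jsonlite internals at hand, the per-row strategy can be mimicked in base R. This is a minimal sketch, not jsonlite's code: `row_to_json()` and `matrix_to_json_per_row()` are hypothetical helpers, numbers are formatted with plain `as.character()` instead of jsonlite's formatting rules, and missing values are not handled. What matters is the shape of the computation: the formatting function runs once per row.

```r
# Hypothetical helper (not part of jsonlite): encode one vector as a JSON
# array string, using as.character() as a stand-in for real number formatting.
row_to_json <- function(v) {
  paste0("[", paste(as.character(v), collapse = ","), "]")
}

# Per-row strategy: row_to_json() is invoked separately for every row,
# so a 50,000-row matrix means 50,000 R-level function calls.
matrix_to_json_per_row <- function(x) {
  rows <- apply(x, 1, row_to_json)
  paste0("[", paste(rows, collapse = ","), "]")
}

x <- matrix(1L, ncol = 3, nrow = 50000)
json_old <- matrix_to_json_per_row(x)
```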

The new version exploits the fact that matrices and arrays are homogeneous (i.e. all elements have the same type). It first removes the dimensions from the array using `c(x)` and converts all of the individual elements to JSON with a single call to `asJSON`. This results in a significant speedup because `asJSON` is only called once rather than `n` times. The resulting strings are then given the original dimensions again, so each row only needs cheap string concatenation.

```r
# Technical example, don't use this code!
# A single asJSON call converts every element; collapse = FALSE keeps one string per element
str <- jsonlite:::asJSON(c(x), collapse = FALSE)
# Restore the matrix shape on the character vector
dim(str) <- dim(x)
# Per row, only the already-formatted strings are joined
rows <- apply(str, 1, jsonlite:::collapse, indent = NA)
json <- jsonlite:::collapse(rows, indent = NA)
```
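The "format once, then reshape" idea can be sketched in base R as well. Again this is a hypothetical illustration: `matrix_to_json_vectorized()` is not a jsonlite function, and `as.character()` stands in for jsonlite's number formatting. All elements are formatted in one vectorized call, the dimensions are restored, and the per-row work shrinks to joining strings that already exist. Because `as.character()` is much cheaper per call than `asJSON`, this toy version will not necessarily reproduce the speedup reported above; it only shows the structure of the fix.

```r
# Hypothetical sketch (not jsonlite's code) of "format once, then reshape".
matrix_to_json_vectorized <- function(x) {
  # One vectorized call formats every element of the matrix
  str <- as.character(c(x))
  # Give the character vector the original matrix shape back
  dim(str) <- dim(x)
  # Per row, only join the pre-formatted strings
  rows <- apply(str, 1, function(r) paste0("[", paste(r, collapse = ","), "]"))
  # Join the row arrays into the outer JSON array
  paste0("[", paste(rows, collapse = ","), "]")
}

x <- matrix(1L, ncol = 3, nrow = 50000)
json_new <- matrix_to_json_vectorized(x)
```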

Things get a bit more complicated for higher-dimensional arrays, especially with `toJSON(x, pretty = TRUE)`, but this illustrates the core issue.
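For higher-dimensional arrays, the same single formatting pass can be combined with recursion over the first dimension. The sketch below is hypothetical: `array_to_json()` and `nest_json()` are illustrative names, `as.character()` again replaces jsonlite's formatting, and `pretty = TRUE` indentation is ignored. It nests the output so that element `x[i, j, k]` ends up at position `[i][j][k]` of the JSON.

```r
# Hypothetical recursive helper: turn a character array into nested JSON
# arrays, splitting on the first dimension at each level.
nest_json <- function(s) {
  if (length(dim(s)) <= 1) {
    # Plain vector (or 1-d array): join the pre-formatted elements
    return(paste0("[", paste(s, collapse = ","), "]"))
  }
  # apply() hands each slice s[i, , ...] to nest_json() with its remaining
  # dimensions intact, so each recursive call peels off one level
  inner <- apply(s, 1, nest_json)
  paste0("[", paste(inner, collapse = ","), "]")
}

array_to_json <- function(x) {
  str <- as.character(c(x))  # format all elements in a single call
  dim(str) <- dim(x)         # restore the array shape
  nest_json(str)
}

a <- array(1:8, dim = c(2, 2, 2))
array_to_json(a)  # → "[[[1,5],[3,7]],[[2,6],[4,8]]]"
```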

You might be thinking: can we avoid `apply` altogether? Yes! For the important case of 2-dimensional arrays, jsonlite has a complete C implementation, which makes `toJSON` on matrices extra fast. For higher-dimensional arrays it currently still uses the solution above, which performs quite well. We might be able to optimize this case further by porting it to C as well, but working with high-dimensional arrays in C makes my head hurt.
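jsonlite's C code for matrices is not reproduced here, but a related R-level trick shows how the per-row loop can disappear for 2D input: `paste()` is vectorized across its arguments, so passing the matrix columns as separate arguments builds every row in one call. This is a hypothetical base-R analogue (`matrix_to_json_by_columns()` is an illustrative name, not jsonlite's C implementation). It assumes a non-empty matrix and loops only over the columns, usually a handful, rather than over tens of thousands of rows.

```r
# Hypothetical base-R analogue: no apply() over rows at all.
matrix_to_json_by_columns <- function(x) {
  str <- as.character(c(x))  # format all elements once
  dim(str) <- dim(x)
  # List of columns; this loop runs ncol(x) times, not nrow(x) times
  cols <- lapply(seq_len(ncol(str)), function(j) str[, j])
  # paste() combines element i of every column, producing row i
  rows <- do.call(paste, c(cols, sep = ","))
  paste0("[", paste0("[", rows, "]", collapse = ","), "]")
}

x <- matrix(1L, ncol = 3, nrow = 50000)
json <- matrix_to_json_by_columns(x)
```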
