Reading files in JSON format – a comparison between R and Python

January 18, 2014

(This article was first published on Stat Of Mind, and kindly contributed to R-bloggers)

A file format that I am seeing more and more often is the JSON (JavaScript Object Notation) format. JSON is an open standard format in human-readable form that is used to transmit data between servers and web applications. Below is a typical example of data in JSON format.



{
  "votes": {
    "funny": 0,
    "useful": 7,
    "cool": 0
  },
  "user_id": "CR2y7yEm4X035ZMzrTtN9Q",
  "name": "Jim",
  "average_stars": 5.0,
  "review_count": 6,
  "type": "user"
}
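
To make the structure of such a record concrete, here is a minimal sketch that parses it with Python's standard `json` module. The variable names are only illustrative, and the `votes` wrapper around the three counters is an assumption about how the record is nested.

```python
import json

# The example record as a JSON string. The "votes" wrapper around the
# three counters is an assumption about the record's layout.
record_text = """
{
  "votes": {"funny": 0, "useful": 7, "cool": 0},
  "user_id": "CR2y7yEm4X035ZMzrTtN9Q",
  "name": "Jim",
  "average_stars": 5.0,
  "review_count": 6,
  "type": "user"
}
"""

# json.loads maps a JSON object to a dict, JSON numbers to int or float,
# and nested objects to nested dicts.
record = json.loads(record_text)

print(record["name"])             # Jim
print(record["votes"]["useful"])  # 7
print(record["average_stars"])    # 5.0
```

Each field is then reachable with ordinary dictionary indexing, which is what both scripts below rely on when they pull out the rating.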


In this post, I will compare the performance of R and Python when reading data in JSON format. More specifically, I will conduct an extremely simple analysis of the well-known Yelp Houston-based user ratings file (~216 MB), which consists of reading the data and plotting a histogram of the ratings given by users. I tried to keep the workload in both scripts as similar as possible, so that I can establish which language is quicker.

In R:

# import required packages
library(rjson)  # assumed here; provides fromJSON()

# define function read_json
read_json <- function() {
  # read json file, one JSON record per line
  json.file <- sprintf("%s/data/yelp_academic_dataset_review.json", getwd())
  raw.json <- scan(json.file, what = character(), sep = "\n")

  # parse each line of json text into an R list
  json.data <- lapply(raw.json, function(x) fromJSON(x))

  # extract user rating information
  user.rating <- unlist(lapply(json.data, function(x) x$stars))

  # plot histogram of user ratings
  # not shown
}

# compute total time needed
elapsed <- system.time(read_json())
print(elapsed)
   user  system elapsed 
 32.295   0.509  38.172 

In Python:

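# import modules
import json
import os
import time

# start timer (time.clock() was removed in Python 3.8;
# time.perf_counter() measures elapsed wall-clock time)
start = time.perf_counter()

# read in yelp data, one JSON record per line
yelp_files = "%s/data/yelp_academic_dataset_review.json" % os.getcwd()
yelp_data = []
with open(yelp_files) as f:
    for line in f:
        yelp_data.append(json.loads(line))

# extract user rating information
user_rating = []
for item in yelp_data:
    user_rating.append(item['stars'])

elapsed = time.perf_counter() - start
print(elapsed)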
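
The histogram step itself is not shown in either script. As a rough illustration of that step, the sketch below tallies the `stars` field from line-delimited JSON and prints a text histogram. The helper names `count_ratings` and `text_histogram` and the sample lines are made up for this example; a plotting library would normally draw the bars instead.

```python
import json
from collections import Counter

def count_ratings(lines):
    """Tally the "stars" field across an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        counts[json.loads(line)["stars"]] += 1
    return counts

def text_histogram(counts, width=40):
    """Render the counts as one bar of '#' characters per rating."""
    if not counts:
        return ""
    largest = max(counts.values())
    rows = []
    for stars in sorted(counts):
        # Scale each bar relative to the most frequent rating.
        bar = "#" * max(1, round(width * counts[stars] / largest))
        rows.append("%s | %s %d" % (stars, bar, counts[stars]))
    return "\n".join(rows)

# Made-up sample lines standing in for the review file.
sample = [
    '{"stars": 5, "type": "review"}',
    '{"stars": 4, "type": "review"}',
    '{"stars": 5, "type": "review"}',
    '{"stars": 1, "type": "review"}',
]
counts = count_ratings(sample)
print(text_histogram(counts, width=10))
```

Because `count_ratings` accepts any iterable of lines, an open file handle can be passed to it directly, which avoids holding every parsed record in memory at once.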

As expected, Python was about three times faster than R (12.5 s vs. 38.2 s) when reading this JSON file. In fact, experience tells me that this will be the case for almost any file format… 🙂
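
One caveat when comparing such numbers: R's `system.time()` reports user, system and elapsed time separately, so it matters which of these the Python timer corresponds to. The sketch below, with a hypothetical `timed` helper and a stand-in workload, records both CPU time (`time.process_time()`, the sum of user and system time for the current process) and elapsed time (`time.perf_counter()`) so that like can be compared with like.

```python
import time

def timed(func, *args):
    """Run func(*args) and return (result, cpu_seconds, wall_seconds)."""
    cpu_start = time.process_time()   # user + system CPU time of this process
    wall_start = time.perf_counter()  # elapsed wall-clock time
    result = func(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, cpu, wall

def busy_sum(n):
    # Stand-in workload; the JSON-reading routine would go here instead.
    return sum(i * i for i in range(n))

result, cpu, wall = timed(busy_sum, 200000)
print("cpu: %.3f s, elapsed: %.3f s" % (cpu, wall))
```

For an I/O-heavy task such as reading a large file, elapsed time can be noticeably larger than CPU time, which is visible in the R output above (32.3 s user vs. 38.2 s elapsed).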
