Using Reddit’s JSON API to analyze post popularity

September 15, 2014

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Graduate student Clay McLeod decided to find out what makes a post on the social-sharing site Reddit popular. These are the questions he set out to answer:

What’s in a post? Reddit pulls in around 115 million unique visitors each month, amassing a staggering 5 billion page views per month. For a long time, I’ve wondered what factors draw people to certain Reddit posts while shunning others – does it have to do with the time of day a post is submitted? Do certain users have a monopoly on the most viewed posts? What about text posts vs. links?

Reddit provides a JSON API for downloading its data, and Clay wrote a Python script that uses it to produce a CSV file with one record per post, including the post's domain, subreddit, upvotes, downvotes, number of comments, and so on. This file can then easily be analyzed with R.
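Clay's script itself is not reproduced here. As a rough idea of how such a download step could work, the following minimal Python sketch fetches one page of a Reddit listing (Reddit serves JSON when `.json` is appended to a listing URL) and flattens it into CSV rows. The function names, the `User-Agent` string, and the column selection in `FIELDS` are hypothetical choices for illustration, not taken from Clay's script.

```python
import csv
import io
import json
import urllib.request

# Post attributes to keep. These are keys found in each post's "data" object
# in Reddit's listing JSON; the selection itself is an assumption.
FIELDS = ["domain", "subreddit", "author", "ups", "downs", "score",
          "num_comments", "over_18", "is_self", "created_utc"]


def fetch_listing(url, user_agent="post-popularity-sketch/0.1"):
    """Download one page of a Reddit listing and parse the JSON response.

    Reddit's API rules ask clients to send a descriptive User-Agent header.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def listing_to_rows(listing):
    """Flatten a listing into one dict per post, keeping only FIELDS.

    Missing keys become None, which the CSV writer emits as an empty cell.
    """
    return [{field: child["data"].get(field) for field in FIELDS}
            for child in listing["data"]["children"]]


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, `rows_to_csv(listing_to_rows(fetch_listing("https://www.reddit.com/r/all/top.json?limit=100")))` would return CSV text for up to 100 posts. To collect more, the listing's `data["after"]` value can be passed back as the `after` query parameter to request the next page.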


If you've spent any time on Reddit, none of the analysis will be very surprising: images generate a lot of votes, NSFW posts are more popular, and so on. But I'm interested to see what others can do with data from Reddit.
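The R analysis from the original post is not reproduced here. The same kind of comparison can be sketched in Python with only the standard library: group posts by a column and average their scores. This assumes a CSV with `domain`, `over_18`, and `score` columns, with values compared as raw strings; `IMAGE_DOMAINS` is a hypothetical, deliberately crude heuristic for spotting image posts, not how Clay classified them.

```python
import csv
from collections import defaultdict
from statistics import mean

# Hypothetical heuristic: posts hosted on these domains count as image posts.
IMAGE_DOMAINS = {"i.imgur.com", "imgur.com"}


def load_rows(path):
    """Read the per-post CSV into a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def add_image_flag(rows):
    """Add an 'is_image' column ("True"/"False") based on the post's domain."""
    for row in rows:
        row["is_image"] = str(row["domain"] in IMAGE_DOMAINS)
    return rows


def mean_score_by(rows, key):
    """Average the 'score' column within each distinct value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        if row["score"] == "":
            continue  # skip posts with no recorded score
        groups[row[key]].append(float(row["score"]))
    return {value: mean(scores) for value, scores in groups.items()}
```

A call such as `mean_score_by(add_image_flag(load_rows("reddit_posts.csv")), "is_image")` (file name hypothetical) gives the average score for image and non-image posts. Because a handful of viral posts can dominate a mean, swapping in `statistics.median` may give a more robust comparison.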

Clay McLeod: What's in a Post, Part 1 (via KDnuggets)
