The three types of Reddit posts, and how they make it to the front page

November 19, 2014
By

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Todd Schneider's blog post on solving the traveling salesman problem with R hit the front page of reddit.com. This is a big deal: front-page placement on the popular social news site can drive a ton of traffic (in Todd's case, 1.3 million pageviews). But what factors determine which of reddit's contributed links make it to the front page? (There are 25 front-page slots, but more than 100,000 reddit posts on an average day.)

Todd set out to answer this question using the statistical language R, and reported his results on Mashable. He collected 6 weeks of data including 1.2 million rankings for about 15,000 posts, and looked for commonalities amongst those posts that made the top 25.

Now, you might expect that a post's front-page ranking is determined by two things: its score (its net vote count, i.e. upvotes minus downvotes, most likely earned after readers saw it in the special-topic "subreddit" where it was posted) and how long ago it was posted (reddit's front page generally contains recent posts). But it turns out that not all subreddits are treated equally. Todd discovered that, when it comes to how posts are promoted to the front page, subreddits fall into three distinct types:
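To make that naive expectation concrete, here is a minimal Python sketch of a ranking driven only by score and age. It is a hypothetical illustration, not reddit's actual ranking code: the `Post` type, the `naive_priority` decay rule (score divided by a power of age in hours) and the `gravity` parameter are all assumptions made for this example. Notice that it ignores the subreddit entirely, which is exactly what Todd's analysis suggests reddit does not do.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    title: str
    subreddit: str
    score: int        # net votes (hypothetical field for this sketch)
    age_hours: float  # hours since submission


def naive_priority(post: Post, gravity: float = 1.5) -> float:
    """Hypothetical rule, NOT reddit's algorithm: higher score helps, more age hurts.

    The +2 offset avoids dividing by zero for brand-new posts.
    """
    return post.score / (post.age_hours + 2) ** gravity


def naive_rank(posts: List[Post], slots: int = 25) -> List[Post]:
    """Return the top `slots` posts by naive priority (page 1 has 25 slots).

    The subreddit field is deliberately ignored.
    """
    return sorted(posts, key=naive_priority, reverse=True)[:slots]


if __name__ == "__main__":
    # Made-up posts for illustration only.
    sample = [
        Post("A", "funny", 4000, 5.0),
        Post("B", "sports", 3500, 3.0),
        Post("C", "personalfinance", 1000, 1.0),
    ]
    print([p.title for p in naive_rank(sample)])  # -> ['B', 'A', 'C']
```

With these made-up numbers, the newer `sports` post outranks the higher-scoring but older `funny` post, and category plays no role at all; a rule like this cannot reproduce the three-way split Todd found.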

  • "Viral Candy" subreddits like funny, gifs and todayilearned. Posts from this category dominate page one.
  • "Page Two" subreddits, which include Documentaries, Fitness and personalfinance. As the name suggests, posts in these subreddits almost never make it to page 1, but are often promoted to page 2.
  • "The Rest", which includes food, LifeProTips and sports. Todd's post was in this category, in the subreddit dataisbeautiful. Posts in these subreddits make up a small but significant fraction of page 1 posts.

It seems that reddit's front page (and pages 2, 3 and 4, which follow it) is filled with a well-defined mix of posts from each of the three categories, as you can see in the chart below:

[Chart: Subreddits, share of posts from each category at each rank on pages 1-4]

Starting from the left of the chart above, you can see that the #1 post (on page 1) comes from one of the "Viral Candy" subreddits about 97% of the time, although a post from "The Rest" occasionally takes top billing. By contrast, posts from the "Page Two" subreddits almost never appear above #10, but they dominate page 2 (ranks 26-50). The mix on pages 3 and 4 is fairly consistent: about 65% "Viral Candy", about 15% "Page Two" and about 25% "The Rest".
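The chart's breakdown boils down to counting, for each rank or page, how often posts from each category appear across many ranking snapshots. The Python sketch below shows that tally on made-up toy data; it is not Todd's R code, and the `(rank, category)` input format and the function names `page_of` and `category_mix` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def page_of(rank: int, per_page: int = 25) -> int:
    """Map a 1-based rank to its page (ranks 1-25 -> page 1, 26-50 -> page 2, ...)."""
    return (rank - 1) // per_page + 1


def category_mix(observations: Iterable[Tuple[int, str]],
                 per_page: int = 25) -> Dict[int, Dict[str, float]]:
    """Share of observations from each category within each group of ranks.

    `observations` holds hypothetical (rank, category) pairs, one per post per
    ranking snapshot. Shares within each group sum to 1.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for rank, category in observations:
        counts[page_of(rank, per_page)][category] += 1

    mix: Dict[int, Dict[str, float]] = {}
    for group, counter in sorted(counts.items()):
        total = sum(counter.values())
        mix[group] = {cat: n / total for cat, n in counter.items()}
    return mix


if __name__ == "__main__":
    # Toy data, invented for illustration; not from Todd's dataset.
    toy = [(1, "Viral Candy"), (1, "Viral Candy"), (1, "Viral Candy"),
           (1, "The Rest"), (30, "Page Two"), (31, "Page Two"),
           (40, "Viral Candy"), (45, "Page Two")]
    print(category_mix(toy))
```

Passing `per_page=1` makes each rank its own group, which corresponds to the per-rank view in the chart; the default of 25 matches reddit's 25 slots per page.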

As for post scores, Todd noted that posts from "Viral Candy" and "The Rest" subreddits need high scores to reach the top slot on page 1: about 3500-4500 and 3000-4000, respectively. By contrast, posts in "Page Two" subreddits need scores of only 500-1500 to reach the lower ranks of page 1 (though they are much more likely to appear on page 2).

If you're interested in the details of what gets a post onto reddit's front page, Todd's blog post has much more information. And if you're an R user and want to do a similar analysis, Todd's data and R code are available on GitHub.

Todd W Schneider: The reddit Front Page is Not a Meritocracy
