# Hacker News Analysis

March 13, 2011
(This article was first published on Edwin Chen's Blog » r, and kindly contributed to R-bloggers)

I was playing around with the Hacker News database Ronnie Roller made (thanks!), so I thought I’d post some of my findings.

# Activity on the Site

My first question was: how has activity on the site increased over time? I looked at number of posts, points on posts, and comments on posts.

## Posts

The number of posts per month looks like a strong linear fit, with the fitted monthly count increasing by 292 posts every month.
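As a rough illustration of this kind of fit, here is a minimal Python sketch that counts posts per month and fits a straight line by ordinary least squares. The function names and the tiny data set are hypothetical; the real Hacker News data and the code behind the article are not reproduced here.

```python
from collections import Counter
from datetime import date


def monthly_counts(post_dates):
    """Count posts per calendar month, in chronological order.

    Months with no posts at all are skipped, which is fine for a busy site.
    """
    counts = Counter((d.year, d.month) for d in post_dates)
    return [counts[key] for key in sorted(counts)]


def linear_fit(ys):
    """Ordinary least squares of ys on the month index 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up data: month i (starting at 0) has 2 + 3 * i posts.
post_dates = [
    date(2010, month, 1)
    for month in range(1, 5)
    for _ in range(2 + 3 * (month - 1))
]
counts = monthly_counts(post_dates)
slope, intercept = linear_fit(counts)
print(counts, slope, intercept)
```

The slope is the quantity reported above: the number of additional posts the fitted line gains from one month to the next.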

## Comments

For comments, adding a quadratic term proved significant, so I used a quadratic regression to fit the number of comments by month.
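One way to check whether a quadratic term earns its place is a nested-model F test, sketched below in plain Python. This is an assumption about the kind of test involved, not a record of what was actually run; for a single added term it is equivalent to the usual t test on that coefficient. All function names and the sample data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)


def rss(xs, ys, coeffs):
    """Residual sum of squares of a fitted polynomial."""
    return sum(
        (y - sum(c * x ** p for p, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )


def quadratic_f_statistic(xs, ys):
    """F statistic for adding x^2 to a linear fit (1 and n - 3 degrees of freedom)."""
    rss_linear = rss(xs, ys, polyfit(xs, ys, 1))
    rss_quadratic = rss(xs, ys, polyfit(xs, ys, 2))
    return (rss_linear - rss_quadratic) / (rss_quadratic / (len(xs) - 3))


# Made-up monthly comment counts with curvature plus a little noise.
months = list(range(10))
comments = [5 + 2 * x + 0.5 * x * x + (1 if x % 2 == 0 else -1) for x in months]
f_stat = quadratic_f_statistic(months, comments)
print(f_stat)
```

A large F statistic, compared against an F distribution with 1 and n - 3 degrees of freedom, indicates that the quadratic term explains variation the straight line misses.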

## Points

Again, a quadratic regression was a better fit for points by month:

# Points and Comments

My next question was how points and comments related. Do, say, posts with more points also have more comments?

First, I plotted the points and comments of each individual post:

There is an overall positive correlation between points and comments (as expected), and interestingly, there are quite a few high-points posts with no comments.

Let’s try cleaning up the plot, by taking the median number of comments per points level (and removing posts at the higher end, where we have little data):

We see that posts with more points do tend to have more comments. Also, variance in number of comments is indicated by size and color, so (unsurprisingly) we see that posts with more points have larger variance in their number of comments.
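The summary behind this cleaned-up plot can be sketched as follows: group posts by their points value, then take the median and variance of the comment counts in each group. The `min_posts` cutoff is an assumed stand-in for "removing posts at the higher end, where we have little data"; the exact rule used for the plot is not stated.

```python
from collections import defaultdict
from statistics import median, pvariance


def comments_by_points(posts, min_posts=3):
    """Map each points level to (median comments, variance of comments).

    posts is an iterable of (points, comments) pairs. Levels with fewer
    than min_posts posts are dropped (hypothetical sparsity cutoff).
    """
    groups = defaultdict(list)
    for points, comments in posts:
        groups[points].append(comments)
    return {
        points: (median(counts), pvariance(counts))
        for points, counts in sorted(groups.items())
        if len(counts) >= min_posts
    }


# Made-up (points, comments) pairs.
posts = [(1, 0), (1, 1), (1, 2), (5, 2), (5, 4), (5, 9), (40, 30)]
summary = comments_by_points(posts)
for points, (med, var) in summary.items():
    print(points, med, round(var, 2))
```

In this toy data the single 40-point post is dropped for lack of data, and the 5-point level shows both a higher median and a larger variance than the 1-point level.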

# Quality of Posts

Another question was whether the quality of posts has degraded over time.

To estimate quality, I defined a “good” post as a post with points greater than $x / 10$, where $x$ is the number of points of the tenth-highest rated post in the same month. I chose the tenth-highest rated post, because it provided a fairly stable baseline (unlike choosing the highest rated post):

We see that the overall percentage of quality posts has decreased over time, while the absolute number of quality posts has increased.

So Hacker News has probably gotten worse if you like to read every single post, but better if you only like to read the front page.
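The "good" post definition and the two monthly measures above can be sketched in Python like this. The function name, the `(month, points)` input shape, and the choice to skip months with fewer than ten posts are assumptions made for illustration.

```python
from collections import defaultdict


def good_post_stats(posts, rank=10, divisor=10):
    """Return {month: (number of good posts, share of good posts)}.

    posts is an iterable of (month, points) pairs. A post is "good" when
    its points exceed x / divisor, where x is the points of the rank-th
    highest rated post in the same month.
    """
    by_month = defaultdict(list)
    for month, points in posts:
        by_month[month].append(points)

    stats = {}
    for month, pts in sorted(by_month.items()):
        if len(pts) < rank:
            continue  # assumption: no baseline for months with too few posts
        baseline = sorted(pts, reverse=True)[rank - 1]
        threshold = baseline / divisor
        good = sum(1 for p in pts if p > threshold)
        stats[month] = (good, good / len(pts))
    return stats


# Made-up month: the tenth-highest post has 20 points, so the cutoff is 2.
points = [500, 90, 80, 70, 60, 50, 40, 30, 20, 20, 3, 2, 1, 1]
stats = good_post_stats(("2010-01", p) for p in points)
print(stats)
```

Using the tenth-highest post keeps a single runaway hit (the 500-point post here) from inflating the cutoff for the whole month.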

# Company Trends

Also, I wanted to see how certain topics have trended over time, so I looked at how mentions of some of the big-name companies (Google, Facebook, Microsoft, Yahoo, Twitter) have changed. For each company, I plotted the percentage of posts with the company’s name in the title, and also made a smoothed plot comparing all five at the end. Note that Microsoft, Yahoo, and Google all seem to be trending slightly downward.
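The per-company percentages can be computed along these lines. This is a minimal sketch: the case-insensitive substring match, the function name, and the sample titles are assumptions, and the smoothing step is left out.

```python
from collections import defaultdict

COMPANIES = ["Google", "Facebook", "Microsoft", "Yahoo", "Twitter"]


def mention_share(posts, companies=COMPANIES):
    """Return {company: {month: percent of post titles naming the company}}.

    posts is an iterable of (month, title) pairs. Matching is a simple
    case-insensitive substring test (an assumption, not the original rule).
    """
    totals = defaultdict(int)
    hits = {company: defaultdict(int) for company in companies}
    for month, title in posts:
        totals[month] += 1
        lowered = title.lower()
        for company in companies:
            if company.lower() in lowered:
                hits[company][month] += 1
    return {
        company: {
            month: 100.0 * hits[company][month] / totals[month]
            for month in sorted(totals)
        }
        for company in companies
    }


# Made-up titles.
posts = [
    ("2010-01", "Google launches a new service"),
    ("2010-01", "Ask HN: how do you learn statistics?"),
    ("2010-02", "Why Facebook beat Yahoo"),
    ("2010-02", "google vs Microsoft"),
]
shares = mention_share(posts)
print(shares["Google"])
```

Dividing by the month's total post count matters here: raw mention counts would rise with overall site activity even if a company's share of attention were flat or falling.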
