Fun with infochimps: Animated Blog Post Hit Map

December 3, 2010

(This article was first published on Zero Intelligence Agents » R, and kindly contributed to R-bloggers)

In a few weeks I will be visiting Chicago, and JD Long—the organizer of the local R users group—has graciously invited me to give a presentation. Ostensibly, the presentation will be on my recently released infochimps package, so I thought it was a good time to start actually putting together some examples and documentation for the package.

If you have not visited the site in the past week or so, you will have missed my previous post on analyzing WikiLeaks data, which, judging from the traffic and comments, was at least somewhat controversial. Given this rare spotlight, I thought it would be fun to use the infochimps API to map the geo-location of everyone who visited the blog post over the last few days. Unfortunately, after nearly two years with the same web hosting service, I realized only today that I was not capturing daily log files for my domain.

The issue has since been resolved but, tragically, all of that data has been lost. Instead of analyzing the logs from the last week, I am limited to visualizing only the traffic from today. Hopefully before my presentation in Chicago I will have another post that strikes a nerve deep within the Internet; until then, I present an animated map of today's hits to my post, “Why I will Not Analyze The New WikiLeaks Data.”

Animated Blog Post Hit Map from Drew Conway on Vimeo.

The timing of the hits is significantly sped up: each second of the animation represents roughly 9.5 minutes of blog post traffic. Using the IP addresses of visitors to the blog post, I collected latitude and longitude coordinates for each hit with the infochimps package, then mapped them over time in ggplot2 and created the animation with ffmpeg.
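To make the time compression concrete, here is a minimal sketch of the arithmetic in Python. It is not the code behind the animation, which was built with R, ggplot2, and ffmpeg. The 9.5-minute figure comes from the post; the function names and the frame rate of 24 frames per second are assumptions made purely for illustration.

```python
# Roughly 9.5 minutes of real traffic per second of animation (from the post).
REAL_SECONDS_PER_ANIMATION_SECOND = 9.5 * 60  # 570 real seconds


def animation_time(elapsed_real_seconds):
    """Map seconds elapsed in the log to seconds elapsed in the animation."""
    return elapsed_real_seconds / REAL_SECONDS_PER_ANIMATION_SECOND


def frame_index(elapsed_real_seconds, fps=24):
    """Return the video frame a hit falls into.

    fps is a hypothetical frame rate; the post does not state the one used.
    """
    return int(animation_time(elapsed_real_seconds) * fps)


# A hit logged 570 seconds after the first one lands one second into the video.
one_second_in = animation_time(570)
```

Under these assumptions each frame would cover a little under 24 seconds of real traffic, which is one reason hits that were spread out in the log can appear to arrive together on screen.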

The size of each bubble represents the number of concurrent hits from the same coordinate in a given second. As such, you will notice sudden bursts in some locations. I am far from a DNS expert, so I am sure someone will tell me how I am overcounting certain IPs, but it is fun to watch the activity.
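The bubble sizing amounts to counting hits that share both a timestamp and a coordinate. The sketch below shows that aggregation in Python under stated assumptions: each hit has already been reduced to a `(second, latitude, longitude)` tuple, the function name `bubble_sizes` is hypothetical, and the sample hits are made-up values for illustration, not data from the blog's logs.

```python
from collections import Counter


def bubble_sizes(hits):
    """Count hits per (second, lat, lon); each count is one bubble's size."""
    counts = Counter()
    for second, lat, lon in hits:
        counts[(second, lat, lon)] += 1
    return counts


# Made-up example hits: (second of the log, latitude, longitude).
example_hits = [
    (0, 41.88, -87.63),
    (0, 41.88, -87.63),  # same second, same coordinate: bubble grows to 2
    (0, 51.51, -0.13),
    (1, 41.88, -87.63),  # same coordinate, later second: a separate bubble
]
sizes = bubble_sizes(example_hits)
```

Because the count is keyed on the coordinate rather than the visitor, every request that resolves to the same latitude and longitude in the same second inflates the same bubble, which is how bursts like the ones in the animation can arise.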

I will release the code and instructions for this as part of a general update to the infochimps documentation in the lead-up to my trip to Chicago. In the meantime, I am happy to answer any questions about this or about the package more generally.
