Graph a GitHub user's followers (and followers' followers). Each programming language tends to develop its own idiomatic set of data structures. In R, data frames are often the structure of choice. JSON (a subset of JavaScript) has emerged a...

A few weeks ago GitHub announced that its timeline data is available on BigQuery for analysis. Moreover, it is offering prizes for the best visualization of the data. Despite my limited art skills and minimal chances of winning a beauty contest, I decided to crunch the GitHub data and run some analysis. After an initial trial of the BigQuery service, I found hard

GitHub has made data on its code repositories, developer updates, forks etc. from the public GitHub timeline available for analysis, and is offering prizes for the most interesting visualization of the data. Sounds like a great challenge for R programmers! The R language is currently the 26th most popular on GitHub (up from #29 in December), and it would...

The common approach to estimating a binary dependent variable regression model is to use either the logit or probit model. Both are forms of generalized linear models (GLMs), which can be seen as modified linear regressions that allow the dependent variable to originate from non-normal distributions. The coefficients in a linear regression model are marginal

I added the R source code, v0.49 through v2.15.0, to a GitHub repository: r-source. Each release is tagged by version number. This is an easy and accessible way to browse the R source and diff against prior versions. I couldn’t find a suitable alternative. ...

I have recently modified the basic workflow of my lab notebook since discovering knitr. Before, I would write code files which I could track on GitHub, push figures created by the code to Flickr, and then write a notebook entry on WordPress describing what I was doing. I’d embed each figure I wanted into the

Why don’t X-Y plots of latitude and longitude data look “right” compared to traditional map views? For example, here’s an X-Y scatterplot of some of Jenson Button’s McLaren telemetry data from the 2010 Australian Formula One Grand Prix: The image was generated, from a data file hosted on Google Spreadsheets, using the following R script,

I really like git. It’s the first versioning tool I’ve ever used so I have nothing else to compare it to, but in the world of statistical model building where iteration is constant (and almost never a strict linear progression)...

After using GitHub for data mining competitions and a project on statistical language models, I found I enjoyed it so much I wanted to use it at work too. The trick is there’s a lot of overlap between what I...