Blog Archives

Connecting RStudio and MySQL Docker Containers – an example using the ergast db

January 17, 2015

Building on “Dockerising Open Data Databases – First Fumblings” and my “Book Extras – Data Files, Code Files and a Dockerised Application”, I just figured out how to get the ergast db into a MySQL docker container and then query it from RStudio: download and unzip the f1db.sql.gz file to f1db.sql; install these docker-mysql-scripts; run

Read more »

Calculating Churn in Seasonal Leagues

January 9, 2015

One of the things I wanted to explore in the production of the Wrangling F1 Data With R book was the extent to which I could draw on published academic papers for inspiration in exploring the various results and timing datasets. In a chapter published earlier this week, I explored the notion of churn,

Read more »

Book Extras – Data Files, Code Files and a Dockerised Application

January 5, 2015

Idling through the LeanPub documentation last night, I noticed that they support the ability to sell digital extras, such as bundled code files or datafiles. Along with the base book sold at one price, additional extras can be bundled into packages alongside the original book and sold at another (higher) price. As with the book

Read more »

Custom Gridlines and Line Guides in R/ggplot Charts

January 2, 2015

In the last quarter of last year, I started paying more attention to the use of custom grid lines and line guides in charts I’ve been developing for the Wrangling F1 Data With R book. The use of line guides was in part inspired by canopy views from within the cockpit of one of the

Read more »

Sketching Scatterplots to Demonstrate Different Correlations

December 17, 2014

Looking just now for an openly licensed graphic showing a set of scatterplots that demonstrate different correlations between X and Y values, I couldn’t find one. So here’s a quick R script for constructing one, based on a Cross Validated question/answer (Generate two variables with precise pre-specified correlation): And here’s an example of the result:

Read more »

Identifying Position Change Groupings in Rank Ordered Lists

December 9, 2014

The title says it all, doesn’t it?! Take the following example – it happens to show race positions by driver for each lap of a particular F1 grand prix, but it could be the evolution over time of any rank-based population. The question I had in mind was – how can I identify positions that

Read more »

Information Density and Custom Chart Designs

November 21, 2014

I’ve been doodling today with some charts for the Wrangling F1 Data With R living book, trying to see how much information I can start trying to pack into a single chart. The initial impetus came simply from thinking about a count of laps led in a particular race by each driver; this morphed

Read more »

F1 Championship Race, 2014 – Winning Combinations…

November 8, 2014

As we come up to the final two races of the 2014 Formula One season, the double points mechanism for the final race means that two drivers are still in with a shot at the Drivers’ Championship: Lewis Hamilton and Nico Rosberg. As James Allen describes in Hamilton closes in on world title: maths favour

Read more »

Wrangling F1 Data With R – F1DataJunkie Book

October 30, 2014

Earlier this year I started trying to pull some of my #f1datajunkie R-related ramblings together in book form. The project stalled, but to try to reboot it I’ve started publishing it as a living book over on Leanpub. Several of the chapters are incomplete – with TO DO items sketched in, others are

Read more »

Running “Native” Data Wrangling Applications in the Browser – IPython Notebooks (and R?) in Chrome

August 22, 2014

Using browser-based data analysis toolkits such as pandas in IPython notebooks, or R in RStudio, means you need to have access to Python or R and the corresponding application server either on your own computer, or running on a remote server that you have access to. When running occasional training sessions or workshops, this

Read more »