A few weeks ago Joshua Katz published some awesome dialect maps of the United States with the help of a web interface coded with Shiny. Now we propose a similar setup for the UK data based on our rapporter.net “R-as-a-Service” with some further add-ons, like regional comparison and dynamic textual analysis of the differences:
You may also click here instead of pressing the above Submit button for a quick demonstration generated with the above tool for the “Pop vs soda?” dataset with the default colours and number of neighbours set.
Bert Vaux and Marius L. Jøhndal (University of Cambridge, United Kingdom) recently published some exciting results from The Cambridge Online Survey of World Englishes, which we try to analyse a bit further below.
Although we are not sharing the original dataset, it is hopefully still useful to show below how the resulting reports are created from scratch in real time with the help of a statistical template.
We used rapporter.net as an online front-end to R and, of course, to our rapport package, which implements statistical templates for literate programming and reproducible reports. The above form was generated automatically as a Rapplication.
Besides the above-mentioned standard tools at Rapporter, we also loaded the dismo package, which provides an easy way to, for example, download images from Google Maps and later render those as raster. We also used the raster package to download the polygons with different regional details from GADM.
As the data sources used different map projections, we also had to transform those to a standard format with the help of the sp package. The colour palette of the map is picked by RColorBrewer that we also transform to transparent colors easily with
And not to forget about the stats back-end:
knn from the class package helped us to define the colours of the above-mentioned polygons with the k-nearest neighbour algorithm. This classification method uses the survey data to determine the most likely category for a given subdivision based on its k nearest neighbour(s). This means that setting k to 1 would find the nearest point to each subdivision centre and colour the polygons accordingly, while using a higher number for k would return a more smoothed map of colours.
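To make the effect of k a bit more tangible, here is a minimal sketch with a handful of made-up points. It is not the code of the statistical template: the `survey` and `centres` data frames and all coordinates are hypothetical, and running knn on raw longitude/latitude pairs is a simplification that ignores map projections.

```r
library(class)  # recommended package, shipped with R

set.seed(42)  # knn breaks ties at random, so fix the seed

# Hypothetical survey responses: location plus the answer given there
survey <- data.frame(
  lon    = c(-3.2, -3.0, -2.9, -0.1,  0.0,  0.2),
  lat    = c(55.9, 56.0, 55.8, 51.5, 51.6, 51.4),
  answer = factor(c("pop", "pop", "soda", "soda", "soda", "pop"))
)

# Hypothetical subdivision centres that we would like to colour
centres <- data.frame(
  lon = c(-2.92, 0.05),
  lat = c(55.82, 51.50)
)

coords <- survey[, c("lon", "lat")]

# k = 1: every centre takes the answer of its single nearest respondent
nearest <- knn(train = coords, test = centres, cl = survey$answer, k = 1)

# k = 3: majority vote among the three nearest respondents
smoothed <- knn(train = coords, test = centres, cl = survey$answer, k = 3)

print(data.frame(centres, k1 = nearest, k3 = smoothed))
```

With these made-up points the first centre sits right next to a lone "soda" respondent, so it is classified as "soda" for k = 1, but the two "pop" neighbours outvote it for k = 3. That is the smoothing effect described above on a very small scale.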
Please see the source code of the statistical template for more details or contact us at any time with any questions.
One major feature of the above-described statistical template and the resulting reports that we would like to emphasise is the last part: the list just below the “Summary” heading. Although it is pretty basic, it still serves as a proof-of-concept demo of how one could write really dynamic reports with diversified and randomized terms and expressions, so that the result reads like a real report and not just a technical one. Please also check the rather ugly and basic part of the source code as a POC demo and possibly for some further inspiration.
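The idea of randomized wording can be sketched in a few lines of base R. This is only an illustration under our own assumptions, not the code used in the template: the `pick` and `describe_region` helpers, the phrase lists and the example answers are all hypothetical.

```r
# Pick one phrase at random so that repeated reports do not read identically
pick <- function(phrases) sample(phrases, 1)

# Hypothetical helper: one sentence about the most common answer in a region
describe_region <- function(region, answers) {
  shares  <- prop.table(table(answers))          # relative frequencies
  top     <- names(shares)[which.max(shares)]    # most frequent answer
  percent <- as.integer(round(100 * max(shares)))

  opener <- pick(c("In %s,", "Looking at %s,", "As for %s,"))
  verb   <- pick(c("prefer", "favour", "tend to say"))

  # The opener carries the first %s; the rest of the sentence is fixed
  sprintf(paste(opener, "most respondents (%d%%) %s \"%s\"."),
          region, percent, verb, top)
}

set.seed(2013)  # only to make this demo reproducible
sentence <- describe_region("Scotland", c("pop", "pop", "soda"))
cat(sentence, "\n")
```

Each call may return a slightly different sentence for the same numbers, which is what makes a generated summary feel less mechanical.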
See you soon!
Oh, and warm greetings from useR! 2013 in Albacete, Spain, where we will present this application and some of the Rapporter internals from a technical point of view on a poster tonight!