A few weeks ago Joshua Katz published some awesome dialect maps of the United States with the help of a web interface coded with Shiny. Now we propose a similar setup for the UK data based on our rapporter.net “R-as-a-Service” with some further add-ons, like regional comparison and dynamic textual analysis of the differences:
You may also click here instead of pressing the above Submit button for a quick demonstration generated with the above tool for the “Pop vs soda?” dataset with the default colours and number of neighbours set.
Data sources

Bert Vaux and Marius L. Jøhndal (University of Cambridge, United Kingdom) have recently published some exciting results of the Cambridge Online Survey of World Englishes, which we try to analyse a bit further below.
Although we are not sharing the original dataset, it is hopefully still useful to show below how the resulting reports are created in real time from scratch with the help of a statistical template.
Technologies used

We used rapporter.net as an online front-end to R, along with our rapport package, which implements statistical templates for literate programming and reproducible reports. The above form was generated automatically as a Rapplication.
Besides the above-mentioned standard tools at Rapporter, we also loaded the dismo package, which provides an easy way to download images from Google Maps and later render those as raster. We also used the raster package to download the polygons with different levels of regional detail from GADM.
As the data sources used different map projections, we also had to transform those to a standard format with the help of the sp package. The colour palette of the map is picked with RColorBrewer, and we also make those colours transparent.
And not to forget about the stats back-end:
knn from the class package helped us to define the colours of the above-mentioned polygons with the k-nearest neighbour algorithm. This classification method uses the survey data to determine the most likely category for a given subdivision based on its k nearest neighbour(s). This means that setting k to 1 would find the nearest point to each subdivision centre and colour the polygons accordingly, while using a higher number for k would return a more smoothed map of colours.
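
To make the effect of k more concrete, here is a minimal sketch of the same idea in plain Python. It is not the R code behind the reports (those rely on knn from the class package); the function name knn_classify, the plain Euclidean distance on raw coordinates, the tie-breaking rule and the tiny made-up data are all assumptions for illustration only.

```python
from collections import Counter
from math import dist


def knn_classify(train_points, train_labels, query_points, k=1):
    """Label each query point by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    labels = []
    for q in query_points:
        # Order the training points by Euclidean distance to the query point
        # (an assumption; real map data may call for a geographic distance).
        order = sorted(range(len(train_points)),
                       key=lambda i: dist(train_points[i], q))
        votes = Counter(train_labels[i] for i in order[:k])
        # On a tie, the label encountered first (the nearer neighbour) wins.
        # This differs from R's class::knn, which breaks ties at random.
        labels.append(votes.most_common(1)[0][0])
    return labels


# Hypothetical survey answers at (x, y) locations; not the real dataset.
survey_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
survey_answers = ["pop", "pop", "soda", "soda", "soda"]

# Hypothetical subdivision centres whose polygons we want to colour.
centres = [(0.1, 0.9), (5.5, 5.0)]

nearest = knn_classify(survey_points, survey_answers, centres, k=1)
smoothed = knn_classify(survey_points, survey_answers, centres, k=3)
print(nearest)
print(smoothed)
```

With k set to 1, the first centre simply takes the answer of the single closest respondent; with k set to 3, that lone nearby answer is outvoted by the two other neighbours, which is the smoothing effect described above.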