UK dialect maps

[This article was first published on rapporter, and kindly contributed to R-bloggers.]

A few weeks ago Joshua Katz published some awesome dialect maps of the United States with the help of a web interface coded with Shiny. We now propose a similar setup for UK data, based on our “R-as-a-Service”, with some further add-ons such as regional comparison and dynamic textual analysis of the differences:

Instead of pressing the Submit button above, you may also click here for a quick demonstration generated with the same tool for the “Pop vs soda?” dataset, using the default colours and number of neighbours.

Data sources

Bert Vaux and Marius L. Jøhndal (University of Cambridge, United Kingdom) have recently published some exciting results of the Cambridge Online Survey of World Englishes, which we try to analyse a bit further below.

Although we are not sharing the original dataset, it is hopefully still useful to show below how the resulting reports are created in real time, from scratch, with the help of a statistical template.

Technologies used

We used Rapporter as an on-line front-end to R and, of course, our rapport package, which implements statistical templates for literate programming and reproducible reports. The above form was generated automatically as a Rapplication.

Besides the above-mentioned standard tools at Rapporter, we also loaded the dismo package, which provides an easy way to, for example, download images from Google Maps and later render them as rasters. We also used the raster package to download the polygons at different levels of regional detail from GADM.

As the data sources used different map projections, we also had to transform them to a common projection with the help of the sp package. The colour palette of the map is picked with RColorBrewer, and the colours are then easily made transparent with scales::alpha.
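The palette step described above can be sketched roughly as follows. This is a minimal illustration, not the template's actual code: the palette name ("Set1"), the number of colours and the 0.5 opacity are assumptions chosen for the example.

```r
library(RColorBrewer)
library(scales)

# Assumed for illustration: three colours from a qualitative Brewer palette
cols <- brewer.pal(3, "Set1")

# Add an alpha channel so the underlying map image stays visible
# through the coloured polygons (0.5 is an arbitrary example value)
transparent <- alpha(cols, 0.5)

print(cols)
print(transparent)
```

Each transparent colour is the original hex code with a two-digit alpha component appended, so it can be passed to any plotting call that accepts a colour.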

And not to forget the stats back-end: knn from the class package helped us to define the colours of the above-mentioned polygons with the k-nearest neighbour algorithm. This classification method uses the survey data to determine the most likely category for a given subdivision based on its k nearest neighbours. Setting k to 1 thus finds the survey point nearest to each subdivision centre and colours the polygon accordingly, while a higher k takes a majority vote among more points and returns a more smoothed map of colours.
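To illustrate the effect of k, here is a minimal sketch with class::knn. The survey points, answers and subdivision centres are made up for the example (the original dataset is not shared), and plain Euclidean distance on raw longitude/latitude is a simplification.

```r
library(class)  # recommended package shipped with R

# Hypothetical survey responses: respondent location and chosen term
survey <- data.frame(
  lon    = c(-0.10, -0.20, 0.10, -2.20, -2.30, -2.40),
  lat    = c(51.50, 51.40, 51.60, 53.50, 53.40, 53.60),
  answer = factor(c("pop", "pop", "soda", "pop", "soda", "soda"))
)

# Hypothetical subdivision centres to be coloured
centres <- data.frame(lon = c(-0.15, -2.25), lat = c(51.50, 53.50))

# k = 1: each centre takes the answer of its single nearest respondent
k1 <- knn(train = survey[, c("lon", "lat")], test = centres,
          cl = survey$answer, k = 1)

# k = 3: each centre takes the majority answer of its three nearest respondents
k3 <- knn(train = survey[, c("lon", "lat")], test = centres,
          cl = survey$answer, k = 3)

print(as.character(k1))  # "pop" "pop"
print(as.character(k3))  # "pop" "soda"
```

The second centre flips from "pop" to "soda" when k grows from 1 to 3: its nearest respondent is outvoted by the next two, which is the smoothing effect described above.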

Source code

Please see the source code of the statistical template for more details or contact us at any time with any questions.

Dynamic reporting

One major feature of the statistical template described above, and of the resulting reports, that we would like to emphasise is the last part: the list just below the “Summary” heading. Although it is pretty basic, it is still a proof-of-concept demo of how one could write really dynamic reports with diversified and randomized terms and expressions, so that the result reads like a real report rather than a purely technical one. Please also check the rather ugly and basic part of the source code as a POC demo, and possibly for some further inspiration.
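The idea of randomized wording can be sketched in a few lines of base R. This is only an illustration of the technique, not the code of the actual template: the helper names `pick` and `summary_sentence`, the phrasings and the example values are all hypothetical.

```r
# Hypothetical helper: choose one of several equivalent phrasings at random
pick <- function(options) sample(options, 1)

# Hypothetical helper: build one summary sentence with varied wording
summary_sentence <- function(region, answer, share) {
  sprintf('%s %s, %s "%s" (%d%% of responses).',
          pick(c("In", "Across", "Throughout")),
          region,
          pick(c("the most popular answer was",
                 "most respondents chose",
                 "the leading term was")),
          answer,
          as.integer(round(share * 100)))
}

# Made-up example values; the wording differs from run to run
print(summary_sentence("Scotland", "pop", 0.62))
```

Calling the function once per region yields a list of sentences that carry the same information but do not all read identically.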

See you soon!

Oh, and warm greetings from useR! 2013 in Albacete, Spain, where we will present this application and some of the Rapporter internals from a technical point of view on a poster tonight!
