Surprising stories hide in seemingly mundane data

December 20, 2017

(This article was first published on R – thinkr, and kindly contributed to R-bloggers)

Geospatial experimentation

Recently I experimented with geospatial mapping techniques in R. I looked at both static and interactive maps. Embedding a static map into a WordPress blog would be simple enough. An interactive map would require (for me) a new technique to retain the interactivity inside a blog post.

My web-site visitor log, combined with longitude and latitude data from MaxMind’s GeoLite2, offered a basis for analysis. Although less precise than the GeoIP2 database, this would be more than adequate for my purpose of getting to country and city level.  I settled on the Leaflet package for visualisation given the interactivity and pleasing choice of aesthetics.
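To make the combination step concrete, here is a minimal sketch rather than the code behind this post. It assumes a hypothetical `visitor_log` data frame with one row per page view and an `ip` column, plus a local copy of a GeoLite2 City `.mmdb` file. The lookup defaults to `rgeolocate::maxmind`, and is passed in as an argument only so the sketch can be tried with a stand-in function when the database is not to hand.

```r
# Sketch only: `visitor_log`, its `ip` column and `mmdb_file` are assumptions.
geolocate_views <- function(visitor_log, mmdb_file,
                            lookup = rgeolocate::maxmind) {
  # The lookup is expected to return one row of location fields per IP,
  # in the same order as the input vector.
  geo <- lookup(
    visitor_log$ip, mmdb_file,
    c("country_name", "city_name", "latitude", "longitude")
  )
  located <- cbind(visitor_log, geo)

  # Count page views per coordinate pair. Rows with a missing country or
  # missing coordinates are dropped by aggregate()'s default NA handling.
  located$views <- 1L
  counts <- aggregate(views ~ country_name + latitude + longitude,
                      data = located, FUN = sum)

  # Busiest locations first
  counts[order(-counts$views), ]
}
```

A call might look like `geolocate_views(visitor_log, "GeoLite2-City.mmdb")`, where the file name is a placeholder for wherever the database has been saved.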

The results however were a little puzzling.


Whiling away the hours in Kansas

The concentration of page views in central London was of no immediate surprise as this was likely to be my site building, testing, and blogging. What did strike me as odd was the high concentration of page views in the centre of the US. More curious still, when I zoomed in on Kansas I found myself in the middle of the Cheney Reservoir.

Non-interactive image of the Cheney Reservoir in Kansas, US

I imagined someone drifting in the expanse of water with laptop, flask of coffee and box of sandwiches, whiling away the hours absorbed in my blog.  Perhaps not. How could such a small number of blog pages generate in excess of 2,000 page views in less than two months?

Then I chanced upon a BBC news article from August 2016. When unable to locate IPs, MaxMind chose the geographical centre of the US as a default. This initially turned out to be a rented house in Kansas, which was rather unfortunate for the occupants, and brought upon them all kinds of unwanted attention.

MaxMind subsequently changed its default centre points to be the middle of bodies of water. And this solved another puzzle. Some of the page views in London appeared to be in the middle of the River Thames.
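One rough way to spot such default points in a log is to look for coordinates that attract many views but never come with a city name. The sketch below assumes, and this is only an assumption, that a lookup falling back to a default returns coordinates with a missing `city_name`. The `located` data frame and its values are made up for illustration.

```r
# Made-up example data: three views with no city, two resolved to London.
located <- data.frame(
  city_name = c(NA, NA, NA, "London", "London"),
  latitude  = c(37.75, 37.75, 37.75, 51.51, 51.51),
  longitude = c(-97.82, -97.82, -97.82, -0.13, -0.13)
)

summarise_fallbacks <- function(located) {
  located$views   <- 1L
  located$no_city <- as.integer(is.na(located$city_name))

  # Total views, and views lacking a city name, per coordinate pair
  out <- aggregate(cbind(views, no_city) ~ latitude + longitude,
                   data = located, FUN = sum)

  # Heuristic: a point where no view ever has a city is probably a default
  out$likely_fallback <- out$no_city == out$views
  out[order(-out$views), ]
}

fallbacks <- summarise_fallbacks(located)
print(fallbacks)
```

This is a heuristic, not a rule: a flagged point deserves a closer look on the map rather than automatic removal.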

R tools used

Package      Functions
purrr        map_df
readr        read_csv
rgeolocate   maxmind
rgdal        readOGR
dplyr        inner_join; mutate; arrange; if_else
stringr      str_c
leaflet      colorFactor; addProviderTiles; setView; addPolygons; addCircleMarkers; addLegend
htmlwidgets  saveWidget
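For a feel of how the leaflet functions listed above fit together, here is a minimal sketch, not the code linked below. It assumes a hypothetical data frame of view counts with `latitude`, `longitude` and `views` columns. The tile provider, marker sizing and default zoom are illustrative choices, and the polygon, colour and legend layers are left out to keep it short.

```r
# Made-up counts for illustration only
demo_counts <- data.frame(
  latitude  = c(37.75, 51.51),
  longitude = c(-97.82, -0.13),
  views     = c(2000, 350)
)

views_map <- function(view_counts,
                      centre_lng = mean(view_counts$longitude),
                      centre_lat = mean(view_counts$latitude),
                      zoom = 3) {
  m <- leaflet::leaflet(view_counts)
  # Light base map so the markers stand out
  m <- leaflet::addProviderTiles(m, "CartoDB.Positron")
  m <- leaflet::setView(m, lng = centre_lng, lat = centre_lat, zoom = zoom)
  # One circle per location, scaled by page views and capped at 30 pixels
  leaflet::addCircleMarkers(
    m,
    lng = ~longitude, lat = ~latitude,
    radius = ~pmin(4 + sqrt(views), 30),
    stroke = FALSE, fillOpacity = 0.6,
    label = ~paste(views, "page views")
  )
}
```

With the leaflet package installed, `views_map(demo_counts)` returns an interactive widget that can be viewed in RStudio or a browser.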

The code may be viewed here.

WordPress integration

  1. Install the WordPress plugin iframe.
  2. Upload the htmlwidget (created in R) to the WordPress media library.
  3. Embed the following shortcode in the WordPress post, replacing xxx with the path of the uploaded media file and using straight rather than curly quotes: [iframe src="xxx" width="100%" height="370"].
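The steps above can be sketched from the R side as follows. The helper names `save_for_wordpress` and `iframe_shortcode` and the default file name are hypothetical. Only `htmlwidgets::saveWidget` is a real package function, used here with `selfcontained = TRUE` so that the map ends up as a single HTML file for upload.

```r
# Write the widget to a single HTML file suitable for uploading.
# `widget` is any htmlwidget, for example a leaflet map.
save_for_wordpress <- function(widget, file = "page-views-map.html") {
  htmlwidgets::saveWidget(widget, file = file, selfcontained = TRUE)
  invisible(file)
}

# Build the shortcode text to paste into the post.
# `src` is the path of the file once uploaded to the media library.
iframe_shortcode <- function(src, width = "100%", height = 370) {
  sprintf('[iframe src="%s" width="%s" height="%s"]', src, width, height)
}

cat(iframe_shortcode("xxx"), "\n")
```

As in step 3, `xxx` stands in for the real path of the uploaded file.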

Citations / Attributions

Includes GeoLite2 data created by MaxMind, available from http://www.maxmind.com.

Map tiles by Stamen Design, CC BY 3.0 — Map data © OpenStreetMap

OpenStreetMap © CartoDB

World borders dataset provided by thematicmapping.org.

The post Surprising stories hide in seemingly mundane data appeared first on thinkr.
