Something Strange in the Neighborhood

August 4, 2016

(This article was first published on data science ish, and kindly contributed to R-bloggers)

Today I was so pleased to see a new data package hit CRAN, and it is wonderful to see such accomplished women writing R packages.

The ghostr package includes a dataset, ghost_sightings, of over 800 ghost sightings in Kentucky. For each town it records the city, state, number of sightings, latitude, and longitude, along with a URL for finding more information about the sightings there.

library(ghostr)
library(dplyr)
names(ghost_sightings)
## [1] "url"       "city"      "state"     "sightings" "lat"       "lon"
ghost_sightings %>% summarise(total = sum(sightings))
##   total
## 1   846

Getting Started with Leaflet

I’ve been wanting to get familiar with Leaflet, the popular JavaScript library for interactive maps (available in R through the leaflet package), and this seems like a perfect opportunity.

How are ghost sightings distributed across Kentucky?

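library(leaflet)

# One circle per town; addCircles() takes the radius in metres, and the
# square root makes circle area proportional to the number of sightings
m <- leaflet(ghost_sightings, width = "100%") %>%
        addProviderTiles("CartoDB.Positron") %>%
        addCircles(lng = ~lon, lat = ~lat, weight = 2.5,
                   radius = ~sqrt(sightings) * 4e3, popup = ~city,
                   color = "limegreen")
m  # print the map object to render it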

I’ve used a nice slimy green color here for the sightings. Because the radius scales with the square root of the number of sightings, the area of each circle is proportional to the number of sightings in that town.
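To see why the square root matters, here is a minimal base-R sketch. The helper names circle_radius() and circle_area() are hypothetical; the 4e3 scale factor is the one used in the map code above, and the radius is in metres as addCircles() expects.

```r
# Radius used for each circle in the ghost map: sqrt(sightings) * 4e3 metres
circle_radius <- function(sightings, scale = 4e3) {
  sqrt(sightings) * scale
}

# Area of a circle with that radius, in square metres
circle_area <- function(sightings, scale = 4e3) {
  pi * circle_radius(sightings, scale)^2
}

# Quadrupling the sightings doubles the radius but quadruples the area
circle_radius(c(1, 4, 16))         # 4000  8000 16000
circle_area(4) / circle_area(1)    # 4
```

Scaling the radius linearly instead would make the area grow with the square of the count, so towns with many sightings would look far more haunted than they are.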

Ain’t Afraid of No Ghost

That is very nice, but perhaps we would like to compare this to the populations of Kentucky cities and towns. Let’s get population figures from the U.S. Census, using American Community Survey (ACS) table B01003, total population. (If you haven’t used the acs package before, you will need to get an API key and run api.key.install() once to install your key on your system.) I’ll use msa in the call to the ACS tables, which returns metropolitan and micropolitan statistical areas; this is about the best match to cities and towns you can get from the Census.

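library(acs)
library(reshape2)  # melt()
library(stringr)   # str_extract(), str_sub()

# Every metropolitan/micropolitan statistical area in Kentucky
kentucky <- geo.make(state = "KY", msa = "*")
# Total population (table B01003) from the 2010-2014 ACS 5-year estimates
popfetch <- acs.fetch(geography = kentucky, 
                      endyear = 2014,
                      span = 5, 
                      table.number = "B01003",
                      col.names = "pretty")
# Tidy the area names: drop the last 10 characters, then keep only the text
# before " (part" for areas that cross state lines
popDF <- melt(estimate(popfetch)) %>%
        mutate(city = str_extract(str_sub(as.character(Var1), 1, -11), ".+?(?= \\(part)|.+"),
               population = value) %>%
        select(city, population)
popDF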
##                                             city population
## 1                       Bardstown, KY Micro Area      44254
## 2                   Bowling Green, KY Metro Area     162322
## 3                  Campbellsville, KY Micro Area      25059
## 4                Cincinnati, OH-KY-IN Metro Area     432535
## 5                  Clarksville, TN-KY Metro Area      88736
## 6                        Danville, KY Micro Area      53696
## 7         Elizabethtown-Fort Knox, KY Metro Area     150917
## 8                   Evansville, IN-KY Metro Area      46394
## 9                       Frankfort, KY Micro Area      71173
## 10                        Glasgow, KY Micro Area      52716
## 11       Huntington-Ashland, WV-KY-OH Metro Area      85898
## 12              Lexington-Fayette, KY Metro Area     483997
## 13                         London, KY Micro Area     126949
## 14 Louisville/Jefferson County, KY-IN Metro Area     974532
## 15                   Madisonville, KY Micro Area      46684
## 16                       Mayfield, KY Micro Area      37451
## 17                      Maysville, KY Micro Area      17398
## 18                 Middlesborough, KY Micro Area      28234
## 19                 Mount Sterling, KY Micro Area      45190
## 20                         Murray, KY Micro Area      37981
## 21                      Owensboro, KY Metro Area     115795
## 22                     Paducah, KY-IL Micro Area      83262
## 23                 Richmond-Berea, KY Micro Area     102450
## 24                       Somerset, KY Micro Area      63505
## 25                  Union City, TN-KY Micro Area       6550
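The name cleanup in the mutate() step is the fiddliest part, so here is a base-R sketch of the same two operations, swapping stringr for substr() and regexpr(). The raw names below are hypothetical: I am assuming, from the code rather than from the acs output itself, that each name ends in the 10-character suffix "; Kentucky" and that areas crossing state lines carry a " (part)" marker.

```r
# Hypothetical raw names in the form the cleanup code appears to expect
# (assumption: a "; Kentucky" suffix and an optional " (part)" marker)
raw_names <- c(
  "Bardstown, KY Micro Area; Kentucky",
  "Cincinnati, OH-KY-IN Metro Area (part); Kentucky"
)

clean_area_name <- function(x) {
  # Same as str_sub(x, 1, -11): drop the last 10 characters
  x <- substr(x, 1, nchar(x) - 10)
  # Same pattern as the post: text before " (part", otherwise the whole string
  regmatches(x, regexpr(".+?(?= \\(part)|.+", x, perl = TRUE))
}

clean_area_name(raw_names)
```

The lazy .+? stops just before " (part" when that marker is present; otherwise the second alternative, .+, keeps the whole string.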

You can see that there are fewer areas here than there are towns in the ghost sightings data; there are ghost sighting records in some very small towns. Also, the acs package is great, but working with it always involves a) lots of regex and b) lots of tidying. Anyway, now we need the latitude and longitude for these metropolitan and micropolitan areas; these are available from the Census Gazetteer files.

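library(readr)

# 2015 Census Gazetteer file for core-based statistical areas (CBSAs);
# INTPTLAT and INTPTLONG are the coordinates of each area's internal point
gazetteer <- read_tsv("./2015_Gaz_cbsa_national.txt")
popDF <- left_join(popDF, gazetteer, by = c("city" = "NAME"))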
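A left join keeps every row of popDF, so an area whose cleaned name does not exactly match a gazetteer NAME ends up with missing coordinates and cannot be placed on the map. A minimal base-R sketch of checking for that, using merge() as a stand-in for left_join(); the data frames pop and gaz and all their values are made up for illustration.

```r
# Toy stand-ins for popDF and the gazetteer; all names and values are made up
pop <- data.frame(
  city = c("Area A, KY Micro Area", "Area B, KY Micro Area"),
  population = c(1000, 2000),
  stringsAsFactors = FALSE
)
gaz <- data.frame(
  NAME = "Area A, KY Micro Area",
  INTPTLAT = 1,
  INTPTLONG = 2,
  stringsAsFactors = FALSE
)

# Base-R equivalent of left_join(): keep every row of pop, NA where no NAME matches
joined <- merge(pop, gaz, by.x = "city", by.y = "NAME", all.x = TRUE)

# Areas left without coordinates after the join
unmatched <- joined$city[is.na(joined$INTPTLAT)]
unmatched
```

Running the same is.na() check on the real popDF after the join is a quick way to catch any name that the regex cleanup left slightly different from the gazetteer.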

Now let’s make a Leaflet map for the population of these areas in Kentucky.

# Population circles; sqrt() again makes circle area track population
m <- leaflet(popDF, width = "100%") %>%
        addProviderTiles("CartoDB.Positron") %>%
        addCircles(lng = ~INTPTLONG, lat = ~INTPTLAT, weight = 1,
                   radius = ~sqrt(population) * 50, popup = ~city)
m
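The population circles use a scale factor of 50, while the ghost circles use 4e3, so the two layers are not drawn on the same scale. Setting sqrt(p) * 50 equal to sqrt(s) * 4e3 and solving gives p = s * (4e3 / 50)^2 = 6400 s: one ghost sighting gets the same circle as 6,400 residents. A small sketch of that arithmetic; the helper names are hypothetical, and the scale factors are copied from the map code.

```r
# Scale factors copied from the two map calls in the post
ghost_radius <- function(sightings) sqrt(sightings) * 4e3
pop_radius <- function(population) sqrt(population) * 50

# Solving sqrt(p) * 50 == sqrt(s) * 4e3 for p gives p = s * (4e3 / 50)^2
equivalent_population <- function(sightings) sightings * (4e3 / 50)^2

equivalent_population(1)                                # 6400
pop_radius(equivalent_population(1)) == ghost_radius(1) # TRUE
```

So when the layers are overlaid, equal-sized green and blue circles mean "one sighting per 6,400 people", which is worth keeping in mind when reading the comparison by eye.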

Actually, let’s bind these data frames together and map them at the same time to compare.

# Stack both layers into one data frame with shared columns and a type label
mapDF <- bind_rows(popDF %>%
                           mutate(lat = INTPTLAT, long = INTPTLONG, 
                                  weight = 1, radius = sqrt(population) * 50, 
                                  type = "Population") %>%
                           select(lat, long, city, weight, radius, type),
                   ghost_sightings %>% 
                           mutate(long = lon, 
                                  weight = 2.5, radius = sqrt(sightings) * 4e3, 
                                  type = "Ghost Sighting") %>%
                           select(lat, long, city, weight, radius, type))
# colorFactor() pairs the colors with the sorted levels of type:
# "Ghost Sighting" gets limegreen and "Population" gets blue
typepal <- colorFactor(c("limegreen", "blue"), mapDF$type)
m <- leaflet(mapDF, width = "100%") %>%
        addProviderTiles("CartoDB.Positron") %>%
        addCircles(lng = ~long, lat = ~lat, weight = ~weight,
                   radius = ~radius, popup = ~city, color = ~typepal(type)) %>%
        addLegend(pal = typepal, values = ~type, title = NULL)
m

Pretty nice! It looks to me like there are more ghost sightings in areas of higher population, but basically there are ghosts everywhere in Kentucky. The eastern part of Kentucky seems particularly full of ghosts relative to people.
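"Relative to people" could be made concrete by dividing sightings by population. A minimal sketch under a loud assumption: the town-level sightings would first have to be aggregated to the same metropolitan/micropolitan areas as popDF (for example with a spatial join), which this post does not do. The helper name and the inputs below are hypothetical.

```r
# Hypothetical helper: ghost sightings per 100,000 residents
sightings_per_100k <- function(sightings, population) {
  1e5 * sightings / population
}

# Illustrative inputs only, not taken from the ghostr or ACS data
sightings_per_100k(sightings = c(10, 10), population = c(50000, 500000))
# 20 2
```

The same number of sightings reads very differently in a small area than in a large one, which is the kind of contrast the eastern Kentucky observation points at.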

The End


I am glad to have figured out a few things about Leaflet; it is very nice to use. Thanks to Kyle Walker and Kent Russell, who helped me figure out how to get the maps to display at the right width on both desktop and mobile! The R Markdown file used to make this blog post is available here. I am very happy to hear feedback or questions!
