Mapping the Prevalence of Alzheimer Disease Mortality in the USA

August 18, 2018

(This article was first published on R Programming – DataScience+, and kindly contributed to R-bloggers)

Compared with other statistical software (e.g., SAS, Stata, and SPSS), I find R the strongest option for data visualization. Therefore, in all the posts I have written for DataScience+, I take advantage of R and use ggplot2 to visualize the findings. For example, I previously plotted the percentiles of body mass index in NHANES 2005-2014 and got exactly the same results as the paper published in JAMA.

In this post, I will map the Alzheimer disease mortality rate by state in the USA. The Centers for Disease Control and Prevention provides the data for download and has created a beautiful map from it. I will try to reproduce that map using several R packages.

Libraries and Datasets

Load the libraries

library(tidyverse)
library(scales)
library(maps)
library(mapproj)

Download the CSV file from the Centers for Disease Control and Prevention website (mentioned above) and read it into R

dt_ad <- read.csv("~/Downloads/ALZHEIMERS2016.csv")
head(dt_ad)
  STATE RATE DEATHS                                              URL
1    AL 45.0  2,507       /nchs/pressroom/states/alabama/alabama.htm
2    AK 25.8    111         /nchs/pressroom/states/alaska/alaska.htm
3    AZ 35.8  3,082       /nchs/pressroom/states/arizona/arizona.htm
4    AR 41.3  1,475     /nchs/pressroom/states/arkansas/arkansas.htm
5    CA 36.1 15,570 /nchs/pressroom/states/california/california.htm
6    CO 34.7  1,835     /nchs/pressroom/states/colorado/colorado.htm
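
One detail visible in the preview: DEATHS is written with thousands separators, so read.csv() will not parse it as a number. The map in this post only uses RATE, but if the counts are needed, a minimal sketch of the conversion could look like this. The values are copied from the preview above, and the variable names are only illustrative.

```r
# Values copied from the head(dt_ad) preview above
deaths_raw <- c("2,507", "111", "3,082", "1,475", "15,570", "1,835")

# Strip the thousands separator, then convert to integer
deaths <- as.integer(gsub(",", "", deaths_raw, fixed = TRUE))
deaths
```

The same call could be applied to the full column, for example inside mutate().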

Load the map data of the U.S. states

dt_states = map_data("state")
head(dt_states)
            long      lat group order               region         subregion
1      -87.46201 30.38968     1     1              alabama              
2      -87.48493 30.37249     1     2              alabama              
3      -87.52503 30.37249     1     3              alabama              
4      -87.53076 30.33239     1     4              alabama              
5      -87.57087 30.32665     1     5              alabama              
6      -87.58806 30.32665     1     6              alabama              

Now I have two datasets: one has the mortality rate from Alzheimer disease, and the other has the variables needed to draw the map. I need to merge them, but they do not share a variable to merge on. The map data identifies states by lowercase full name in region, while the CDC data only has the two-letter abbreviation in STATE. The URL variable, however, contains the full state name, so I will use separate to extract it into a new region variable in the first dataset. Because the names in the URL contain no spaces, I will use gsub to remove the spaces from region in the map data. Finally, I will merge the two datasets by region.

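# get the state name from the URL; splitting on "/" yields an empty first
# piece, so "region" is the fifth piece and the trailing file name is dropped
dt_ad2 = dt_ad %>% 
  separate(URL, c("a","b","c","d", "region"), sep="/", extra = "drop") %>% 
  select(RATE, region)
# remove white space from the map's region names for merging purposes
dt_states2 = dt_states %>%
  mutate(region = gsub(" ","", region))
# merge on the shared region column
dt_final = left_join(dt_ad2, dt_states2, by = "region")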
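
Before plotting, it can help to check that the keys on both sides of the join actually line up. The following is a minimal sketch and not part of the original workflow. It uses three URLs copied from the preview above, extracts the state with base R's dirname() and basename() instead of separate(), and compares the result against a handful of region names spelled the way map_data("state") spells them. The toy vectors and variable names are assumptions for illustration.

```r
# URLs copied from the head(dt_ad) preview above
urls <- c("/nchs/pressroom/states/alabama/alabama.htm",
          "/nchs/pressroom/states/alaska/alaska.htm",
          "/nchs/pressroom/states/arizona/arizona.htm")

# The state is the name of the directory that holds the .htm file
ad_regions <- basename(dirname(urls))

# A few region names spelled as in map_data("state") (illustrative subset)
map_regions <- c("alabama", "arizona", "new york", "north carolina")

# Same normalization as in the post: drop spaces so multi-word names match
map_keys <- gsub(" ", "", map_regions)

# Regions that have a rate but no polygons to draw
unmatched <- setdiff(ad_regions, map_keys)
unmatched
```

In this toy subset, alaska is the unmatched key. The same happens with the full "state" map, which covers only the contiguous states and the District of Columbia, so Alaska and Hawaii end up with missing coordinates after left_join() and are not drawn.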

Visualization

The dt_final dataset has all the variables I need to make the map.

ggplot(dt_final, aes(x = long, y = lat, group = group, fill = RATE)) + 
  geom_polygon(color = "white") +
  scale_fill_gradient(
    name = "Death Rate", 
    low = "#fbece3", 
    high = "#6f1873", 
    guide = "colorbar",
    na.value="#eeeeee", 
    breaks = pretty_breaks(n = 5)) +
  labs(title="Mortality of Alzheimer Disease in the U.S.", x="", y="") +
  coord_map()

In this short post, I showed how simple it is to visualize data on a map. I hope you like it, and feel free to post a comment below or send me a message.
