Recently, I became interested in scraping data from webpages, such as Wikipedia, and visualizing it with R. As in my previous post, I use the rvest package to get the data from the webpage and the ggplot2 package to visualize it.
In this post, I will map life expectancy among White and African-American people in the US, state by state.
Load the required packages.
## LOAD THE PACKAGES ####
library(rvest)
library(ggplot2)
library(dplyr)
library(scales)
Import the data from Wikipedia.
## LOAD THE DATA ####
le = read_html("https://en.wikipedia.org/wiki/List_of_U.S._states_by_life_expectancy")
le = le %>%
  html_nodes("table") %>%
  .[[2]] %>%
  html_table(fill = TRUE)
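The `.[[2]]` step picks the table by its position on the page, which can break if the page layout changes. As a minimal sketch, assuming a tiny made-up page instead of the real article, the table can also be picked by its header text. The HTML string and the names `page`, `tables`, `wanted` and `le_demo` are hypothetical and only for illustration.

```r
library(rvest)

# Hypothetical two-table page standing in for the Wikipedia article.
page <- read_html('
  <table><tr><th>Note</th></tr><tr><td>Unrelated table</td></tr></table>
  <table>
    <tr><th>State</th><th>Life Expectancy</th></tr>
    <tr><td>Hawaii</td><td>81.3</td></tr>
    <tr><td>Minnesota</td><td>81.1</td></tr>
  </table>')

# Collect every table on the page and count them.
tables <- html_nodes(page, "table")
length(tables)

# Pick the first table whose text mentions the header we need,
# instead of relying on a fixed position.
wanted <- which(grepl("Life Expectancy", html_text(tables), fixed = TRUE))[1]
le_demo <- html_table(tables[[wanted]])
le_demo
```

Checking `length(tables)` and the matched index first is a cheap way to notice when the page has changed.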
Now I have to clean the data. The comments below explain what each line of code does.
## CLEAN THE DATA ####
# check the structure of the dataset
str(le)
'data.frame': 54 obs. of 417 variables:
$ X1 : chr "" "Rank\nState\nLife Expectancy, All\n(in years)\nLife Expectancy, African American\n(in years)\nLife Expectancy, Asian American\n"| __truncated__ "Rank" "1" ...
$ X2 : chr NA "Rank" "State" "Hawaii" ...
$ X3 : chr NA "State" "Life Expectancy, All\n(in years)" "81.3" ...
$ X4 : chr NA "Life Expectancy, All\n(in years)" "Life Expectancy, African American\n(in years)" "-" ...
$ X5 : chr NA "Life Expectancy, African American\n(in years)" "Life Expectancy, Asian American\n(in years)" "82.0" ...
$ X6 : chr NA "Life Expectancy, Asian American\n(in years)" "Life Expectancy, Latino\n(in years)" "76.8" ...
$ X7 : chr NA "Life Expectancy, Latino\n(in years)" "Life Expectancy, Native American\n(in years)" "-" ...
.....
.....
# select only the columns that contain data
le = le[c(1:8)]
# use the 3rd row as the column names
names(le) = le[3,]
# delete the rows and columns I am not interested in
le = le[-c(1:3), ]
le = le[, -c(5:7)]
# rename the 4th and 5th columns
names(le)[c(4,5)] = c("le_black", "le_white")
# convert the variables to numeric ("-" entries become NA, with a warning)
le = le %>%
  mutate(
    le_black = as.numeric(le_black),
    le_white = as.numeric(le_white))
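# check the structure of the dataset (this output was captured after le_diff and region, which are created further below, had been added)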
str(le)
'data.frame': 51 obs. of 7 variables:
$ Rank : chr "1" "2" "3" "4" ...
$ State : chr "Hawaii" "Minnesota" "Connecticut" "California" ...
$ Life Expectancy, All
(in years): chr "81.3" "81.1" "80.8" "80.8" ...
$ le_black : num NA 79.7 77.8 75.1 78.8 77.4 NA NA 75.5 NA ...
$ le_white : num 80.4 81.2 81 79.8 80.4 80.5 80.4 80.1 80.3 80.1 ...
$ le_diff : num NA 1.5 3.2 4.7 1.6 ...
$ region : chr "hawaii" "minnesota" "connecticut" "california" ...
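The cleaning steps can be tried out on a small data frame before running them on the scraped table. This is only a sketch: the layout of `raw` (one junk row, one header row, two data rows) is made up and does not mirror the real Wikipedia table, and `to_number` is a hypothetical helper that is not part of the original code.

```r
# Toy stand-in for the scraped table (hypothetical layout): one junk row,
# one header row, then data rows stored as text.
raw <- data.frame(
  X1 = c("", "State", "Hawaii", "Minnesota"),
  X2 = c(NA, "le_black", "-", "79.7"),
  X3 = c(NA, "le_white", "80.4", "81.2"),
  stringsAsFactors = FALSE
)

# Promote the header row to column names, then drop the non-data rows.
names(raw) <- unname(unlist(raw[2, ]))
clean <- raw[-c(1:2), ]
rownames(clean) <- NULL

# Turn the "-" placeholder into NA explicitly, so as.numeric() does not
# have to coerce it (which would also give NA, but with a warning).
to_number <- function(x) as.numeric(ifelse(x == "-", NA, x))
clean$le_black <- to_number(clean$le_black)
clean$le_white <- to_number(clean$le_white)

str(clean)
```

Replacing the placeholder before the conversion keeps the console free of coercion warnings, so a warning that does show up points to a value I did not expect.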
Since life expectancy differs between White and African-American people, I will calculate the difference for each state and map it.
le = le %>% mutate(le_diff = (le_white - le_black))
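To see what this calculation does with missing values, here is a small sketch that uses the first four rows from the `str()` output shown earlier. The data frame `le_demo` is a hypothetical stand-in for `le`, and the rounding step is my addition, not part of the original code.

```r
# First four states and their values, taken from the str() output in this post.
le_demo <- data.frame(
  State    = c("Hawaii", "Minnesota", "Connecticut", "California"),
  le_black = c(NA, 79.7, 77.8, 75.1),
  le_white = c(80.4, 81.2, 81.0, 79.8),
  stringsAsFactors = FALSE
)

# Same calculation as above; rounded to one decimal to hide floating-point noise.
le_demo$le_diff <- round(le_demo$le_white - le_demo$le_black, 1)

# A missing value on either side makes the difference NA as well.
le_demo$State[is.na(le_demo$le_diff)]

# State with the largest gap among those that have data.
le_demo$State[which.max(le_demo$le_diff)]
```

States with an `NA` difference are the ones that end up in the grey `na.value` color on the map.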
I will load the map data and merge the two datasets together.
## LOAD THE MAP DATA ####
states = map_data("state")
str(states)
'data.frame': 15537 obs. of 6 variables:
$ long : num -87.5 -87.5 -87.5 -87.5 -87.6 ...
$ lat : num 30.4 30.4 30.4 30.3 30.3 ...
$ group : num 1 1 1 1 1 1 1 1 1 1 ...
$ order : int 1 2 3 4 5 6 7 8 9 10 ...
$ region : chr "alabama" "alabama" "alabama" "alabama" ...
$ subregion: chr NA NA NA NA ...
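# create a lowercase state name that matches the region column of the map data
le$region = tolower(le$State)
# merge the datasets
states = merge(states, le, by="region", all.x=TRUE)
# merge() does not guarantee the row order, so restore the polygon drawing order
states = states[order(states$order), ]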
str(states)
'data.frame': 15537 obs. of 12 variables:
$ region : chr "alabama" "alabama" "alabama" "alabama" ...
$ long : num -87.5 -87.5 -87.5 -87.5 -87.6 ...
$ lat : num 30.4 30.4 30.4 30.3 30.3 ...
$ group : num 1 1 1 1 1 1 1 1 1 1 ...
$ order : int 1 2 3 4 5 6 7 8 9 10 ...
$ subregion : chr NA NA NA NA ...
$ Rank : chr "49" "49" "49" "49" ...
$ State : chr "Alabama" "Alabama" "Alabama" "Alabama" ...
$ Life Expectancy, All
(in years): chr "75.4" "75.4" "75.4" "75.4" ...
$ le_black : num 72.9 72.9 72.9 72.9 72.9 72.9 72.9 72.9 72.9 72.9 ...
$ le_white : num 76 76 76 76 76 76 76 76 76 76 ...
$ le_diff : num 3.1 3.1 3.1 3.1 3.1 ...
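Before plotting, it can help to check which region names did not find a match in the merge. The sketch below uses two toy data frames, `map_regions` and `le_demo`, which are hypothetical stand-ins for `states` and `le`; the region names in them are only examples.

```r
# Toy stand-ins (hypothetical): region names as a map could list them,
# and a small life-expectancy table.
map_regions <- data.frame(
  region = c("alabama", "alabama", "district of columbia", "minnesota"),
  order  = 1:4,
  stringsAsFactors = FALSE
)
le_demo <- data.frame(
  State    = c("Alabama", "Minnesota", "Hawaii"),
  le_white = c(76.0, 81.2, 80.4),
  stringsAsFactors = FALSE
)
le_demo$region <- tolower(le_demo$State)

# Regions on the map that will get NA after merge(..., all.x = TRUE).
unmatched_map <- setdiff(map_regions$region, le_demo$region)
# States in the table that have no polygon on the map.
unmatched_le <- setdiff(le_demo$region, map_regions$region)
unmatched_map
unmatched_le

merged <- merge(map_regions, le_demo, by = "region", all.x = TRUE)
# merge() may reorder rows, so restore the drawing order before plotting.
merged <- merged[order(merged$order), ]
merged
```

An unexpected name in either `setdiff()` result usually means a spelling difference between the two sources rather than truly missing data.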
Now it's time to make the plots. First, I will plot life expectancy among African Americans in the US. For a few states we don't have the data, so I will color them grey.
## MAKE THE PLOT ####
# Life expectancy in African American
ggplot(states, aes(x = long, y = lat, group = group, fill = le_black)) +
  geom_polygon(color = "white") +
  scale_fill_gradient(name = "Years", low = "#ffe8ee", high = "#c81f49",
                      guide = "colorbar", na.value = "#eeeeee",
                      breaks = pretty_breaks(n = 5)) +
  labs(title = "Life expectancy in African American") +
  coord_map()
The code below does the same for White Americans.
# Life expectancy in White American
ggplot(states, aes(x = long, y = lat, group = group, fill = le_white)) +
  geom_polygon(color = "white") +
  scale_fill_gradient(name = "Years", low = "#ffe8ee", high = "#c81f49",
                      guide = "colorbar", na.value = "Gray",
                      breaks = pretty_breaks(n = 5)) +
  labs(title = "Life expectancy in White") +
  coord_map()
Finally, I will map the difference in life expectancy between White and African-American people in the US.
# Differences in Life expectancy between White and African American
ggplot(states, aes(x = long, y = lat, group = group, fill = le_diff)) +
  geom_polygon(color = "white") +
  scale_fill_gradient(name = "Years", low = "#ffe8ee", high = "#c81f49",
                      guide = "colorbar", na.value = "#eeeeee",
                      breaks = pretty_breaks(n = 5)) +
  labs(title = "Differences in Life Expectancy between \nWhite and African Americans by States in US") +
  coord_map()
On my previous post, I got a comment asking me to add a pop-up effect when hovering over the states. This is a simple task, as Andrea explained in his comment. You have to install the plotly package, assign the ggplot code above to an object, like map_plot <- ggplot(states, ... , and then call ggplotly(map_plot) to plot it.
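Here is a minimal sketch of that workflow. To keep it self-contained, it uses two made-up square "states" in a data frame called `toy` instead of the merged map data; `toy` and `interactive_map` are hypothetical names, and the plot is much simpler than the maps above.

```r
library(ggplot2)
library(plotly)

# Two made-up square "states" standing in for the merged map data.
toy <- data.frame(
  long    = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat     = c(0, 0, 1, 1,  0, 0, 1, 1),
  group   = rep(c(1, 2), each = 4),
  le_diff = rep(c(1.5, 3.2), each = 4)
)

# Store the ggplot object instead of printing it...
map_plot <- ggplot(toy, aes(x = long, y = lat, group = group, fill = le_diff)) +
  geom_polygon(color = "white")

# ...and hand it to ggplotly() to get an interactive version with hover info.
interactive_map <- ggplotly(map_plot)

# In an interactive session, print(interactive_map) opens it in the viewer.
```

The same two steps should apply to any of the three maps above: assign the ggplot call to an object, then pass that object to `ggplotly()`.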
That's all! Leave a comment below if you have any questions.

