Mapping homelessness in England

[This article was first published on R on R-house, and kindly contributed to R-bloggers.]

Introduction

For this blog post, I decided to try to find a dataset covering an issue I feel quite strongly about – homelessness. I managed to find a fairly large dataset from the Cambridgeshire Insight website.

For a while I’ve wanted to try out R’s mapping capabilities and, hopefully, generate a heatmap, so I deliberately looked for a dataset that would let me do this. It’s worth saying that this has been by far the most difficult and frustrating project I’ve taken on. It took me 6 or 7 sessions to produce this post, the first of which was spent trying to install gganimate (which I ended up not using) and figuring out where to start with mapping.

Data wrangling

Let’s load the required packages and read the data in:

library(tidyverse)
## -- Attaching packages ---------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.6
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(gifski)
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
data <- read_csv("http://opendata.cambridgeshireinsight.org.uk/files/ci_opendata/P1E-%20national-%20homelessness-CLG-tab784-to2016_1.csv")
## Parsed with column specification:
## cols(
##   .default = col_integer(),
##   `ONS code` = col_character(),
##   `Local authority area` = col_character(),
##   `2009/10 Numbers accepted as homeless and in priority need who are White` = col_character(),
##   `2009/10 Numbers accepted as homeless and in priority need who are Black or Black British` = col_character(),
##   `2009/10 Numbers accepted as homeless and in priority need who are Asian or Asian British` = col_character(),
##   `2009/10 Numbers accepted as homeless and in priority need who are Mixed` = col_character(),
##   `2009/10 Numbers accepted as homeless and in priority need who are Other ethnic origin` = col_character(),
##   `2009/10 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated` = col_character(),
##   `2009/10 Number per 1000 households` = col_double(),
##   `2009/10 Total decisions where eligible homeless & in priority need but intentionally` = col_character(),
##   `2009/10 Total decisions where eligible & homeless but not in priority need` = col_character(),
##   `2009/10 Total decisions where eligible but not homeless` = col_character(),
##   `2009/10 Total homelessness decisions` = col_character(),
##   `31 March 2010 Total households in B&B (including shared annex)` = col_character(),
##   `31 March 2010 Total households in hostels` = col_character(),
##   `31 March 2010 Total households in LA/HA stock` = col_character(),
##   `31 March 2010 Total households in private sector leased (by LA or HA)` = col_character(),
##   `31 March 2010 Total households in other temp (including private landlord)` = col_character(),
##   `31 March 2010 Number per 1000 households` = col_double(),
##   `2010/11 Number per 1000 households` = col_double()
##   # ... with 30 more columns
## )
## See spec(...) for full column specifications.
names(data)
##   [1] "ONS code"                                                                                 
##   [2] "Local authority area"                                                                     
##   [3] "2009/10 Thousands of households 2006 mid-year estimate"                                   
##   [4] "2009/10 Numbers accepted as homeless and in priority need who are White"                  
##   [5] "2009/10 Numbers accepted as homeless and in priority need who are Black or Black British" 
##   [6] "2009/10 Numbers accepted as homeless and in priority need who are Asian or Asian British" 
##   [7] "2009/10 Numbers accepted as homeless and in priority need who are Mixed"                  
##   [8] "2009/10 Numbers accepted as homeless and in priority need who are Other ethnic origin"    
##   [9] "2009/10 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
##  [10] "2009/10 Numbers accepted as homeless and in priority need total"                          
##  [11] "2009/10 Number per 1000 households"                                                       
##  [12] "2009/10 Total decisions where eligible homeless & in priority need but intentionally"     
##  [13] "2009/10 Total decisions where eligible & homeless but not in priority need"               
##  [14] "2009/10 Total decisions where eligible but not homeless"                                  
##  [15] "2009/10 Total homelessness decisions"                                                     
##  [16] "31 March 2010 Total households in B&B (including shared annex)"                           
##  [17] "31 March 2010 Total households in hostels"                                                
##  [18] "31 March 2010 Total households in LA/HA stock"                                            
##  [19] "31 March 2010 Total households in private sector leased (by LA or HA)"                    
##  [20] "31 March 2010 Total households in other temp (including private landlord)"                
##  [21] "31 March 2010 Total households in temporary accommodation"                                
##  [22] "31 March 2010 Number per 1000 households"                                                 
##  [23] "2009/10 Duty owed but no accommodation has been secured at end of March 2010"             
##  [24] "2010/11 Thousands of households 2008 mid-year estimate"                                   
##  [25] "2010/11 Numbers accepted as homeless and in priority need who are White"                  
##  [26] "2010/11 Numbers accepted as homeless and in priority need who are Black or Black British" 
##  [27] "2010/11 Numbers accepted as homeless and in priority need who are Asian or Asian British" 
##  [28] "2010/11 Numbers accepted as homeless and in priority need who are Mixed"                  
##  [29] "2010/11 Numbers accepted as homeless and in priority need who are Other ethnic origin"    
##  [30] "2010/11 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
##  [31] "2010/11 Numbers accepted as homeless and in priority need total"                          
##  [32] "2010/11 Number per 1000 households"                                                       
##  [33] "2010/11 Total decisions where eligible homeless & in priority need but intentionally"     
##  [34] "2010/11 Total decisions where eligible & homeless but not in priority need"               
##  [35] "2010/11 Total decisions where eligible but not homeless"                                  
##  [36] "2010/11 Total homelessness decisions"                                                     
##  [37] "31 March 2011 Total households in B&B (including shared annex)"                           
##  [38] "31 March 2011 Total households in hostels"                                                
##  [39] "31 March 2011 Total households in LA/HA stock"                                            
##  [40] "31 March 2011 Total households in private sector leased (by LA or HA)"                    
##  [41] "31 March 2011 Total households in other temp (including private landlord)"                
##  [42] "31 March 2011 Total households in temporary accommodation"                                
##  [43] "31 March 2011 Number per 1000 households"                                                 
##  [44] "2010/11 Duty owed but no accommodation has been secured at end of March 2011"             
##  [45] "2011/12 Thousands of households 2008 mid-year estimate"                                   
##  [46] "2011/12 Numbers accepted as homeless and in priority need who are White"                  
##  [47] "2011/12 Numbers accepted as homeless and in priority need who are Black or Black British" 
##  [48] "2011/12 Numbers accepted as homeless and in priority need who are Asian or Asian British" 
##  [49] "2011/12 Numbers accepted as homeless and in priority need who are Mixed"                  
##  [50] "2011/12 Numbers accepted as homeless and in priority need who are Other ethnic origin"    
##  [51] "2011/12 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
##  [52] "2011/12 Numbers accepted as homeless and in priority need total"                          
##  [53] "2011/12 Number per 1000 households"                                                       
##  [54] "2011/12 Total decisions where eligible homeless & in priority need but intentionally"     
##  [55] "2011/12 Total decisions where eligible & homeless but not in priority need"               
##  [56] "2011/12 Total decisions where eligible but not homeless"                                  
##  [57] "2011/12 Total homelessness decisions"                                                     
##  [58] "31 March 2012 Total households in B&B (including shared annex)"                           
##  [59] "31 March 2012 Total households in hostels"                                                
##  [60] "31 March 2012 Total households in LA/HA stock"                                            
##  [61] "31 March 2012 Total households in private sector leased (by LA or HA)"                    
##  [62] "31 March 2012 Total households in other temp (including private landlord)"                
##  [63] "31 March 2012 Total households in temporary accommodation"                                
##  [64] "31 March 2012 Number per 1000 households"                                                 
##  [65] "2011/12 Duty owed but no accommodation has been secured at end of March 2012"             
##  [66] "2012/13 Thousands of households 2008-based interim projections for 2012"                  
##  [67] "2012/13 Numbers accepted as homeless and in priority need who are White"                  
##  [68] "2012/13 Numbers accepted as homeless and in priority need who are Black or Black British" 
##  [69] "2012/13 Numbers accepted as homeless and in priority need who are Asian or Asian British" 
##  [70] "2012/13 Numbers accepted as homeless and in priority need who are Mixed"                  
##  [71] "2012/13 Numbers accepted as homeless and in priority need who are Other ethnic origin"    
##  [72] "2012/13 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
##  [73] "2012/13 Numbers accepted as homeless and in priority need total"                          
##  [74] "2012/13 Number per 1000 households"                                                       
##  [75] "2012/13 Total decisions where eligible homeless & in priority need but intentionally"     
##  [76] "2012/13 Total decisions where eligible & homeless but not in priority need"               
##  [77] "2012/13 Total decisions where eligible but not homeless"                                  
##  [78] "2012/13 Total homelessness decisions"                                                     
##  [79] "31 March 2013 Total households in B&B (including shared annex)"                           
##  [80] "31 March 2013 Total households in hostels"                                                
##  [81] "31 March 2013 Total households in LA/HA stock"                                            
##  [82] "31 March 2013 Total households in private sector leased (by LA or HA)"                    
##  [83] "31 March 2013 Total households in other temp (including private landlord)"                
##  [84] "31 March 2013 Total households in temporary accommodation"                                
##  [85] "31 March 2013 Number per 1000 households"                                                 
##  [86] "2012/13 Duty owed but no accommodation has been secured at end of March 2013"             
##  [87] "2013/14 Thousands of households 2012-based interim projections for 2013"                  
##  [88] "2013/14 Numbers accepted as homeless and in priority need who are White"                  
##  [89] "2013/14 Numbers accepted as homeless and in priority need who are Black or Black British" 
##  [90] "2013/14 Numbers accepted as homeless and in priority need who are Asian or Asian British" 
##  [91] "2013/14 Numbers accepted as homeless and in priority need who are Mixed"                  
##  [92] "2013/14 Numbers accepted as homeless and in priority need who are Other ethnic origin"    
##  [93] "2013/14 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
##  [94] "2013/14 Numbers accepted as homeless and in priority need total"                          
##  [95] "2013/14 Number per 1000 households"                                                       
##  [96] "2013/14 Total decisions where eligible homeless & in priority need but intentionally"     
##  [97] "2013/14 Total decisions where eligible & homeless but not in priority need"               
##  [98] "2013/14 Total decisions where eligible but not homeless"                                  
##  [99] "2013/14 Total homelessness decisions"                                                     
## [100] "31 March 2014 Total households in B&B (including shared annex)"                           
## [101] "31 March 2014 Total households in hostels"                                                
## [102] "31 March 2014 Total households in LA/HA stock"                                            
## [103] "31 March 2014 Total households in private sector leased (by LA or HA)"                    
## [104] "31 March 2014 Total households in other temp (including private landlord)"                
## [105] "31 March 2014 Total households in temporary accommodation"                                
## [106] "31 March 2014 Number per 1000 households"                                                 
## [107] "2013/14 Duty owed but no accommodation has been secured at end of March 2014"             
## [108] "2014/15 Thousands of households 2012-based interim projections for 2014"                  
## [109] "2014/15 Numbers accepted as homeless and in priority need who are White"                  
## [110] "2014/15 Numbers accepted as homeless and in priority need who are Black or Black British" 
## [111] "2014/15 Numbers accepted as homeless and in priority need who are Asian or Asian British" 
## [112] "2014/15 Numbers accepted as homeless and in priority need who are Mixed"                  
## [113] "2014/15 Numbers accepted as homeless and in priority need who are Other ethnic origin"    
## [114] "2014/15 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
## [115] "2014/15 Numbers accepted as homeless and in priority need total"                          
## [116] "2014/15 Number per 1000 households"                                                       
## [117] "2014/15 Total decisions where eligible homeless & in priority need but intentionally"     
## [118] "2014/15 Total decisions where eligible & homeless but not in priority need"               
## [119] "2014/15 Total decisions where eligible but not homeless"                                  
## [120] "2014/15 Total homelessness decisions"                                                     
## [121] "31 March 2015 Total households in B&B (including shared annex)"                           
## [122] "31 March 2015 Total households in hostels"                                                
## [123] "31 March 2015 Total households in LA/HA stock"                                            
## [124] "31 March 2015 Total households in private sector leased (by LA or HA)"                    
## [125] "31 March 2015 Total households in other temp (including private landlord)"                
## [126] "31 March 2015 Total households in temporary accommodation"                                
## [127] "31 March 2015 Number per 1000 households"                                                 
## [128] "2014/15 Duty owed but no accommodation has been secured at end of March 2015"             
## [129] "2015/16 Thousands of households 2012-based interim projections for 2015"                  
## [130] "2015/16 Numbers accepted as homeless and in priority need who are White"                  
## [131] "2015/16 Numbers accepted as homeless and in priority need who are Black or Black British" 
## [132] "2015/16 Numbers accepted as homeless and in priority need who are Asian or Asian British" 
## [133] "2015/16 Numbers accepted as homeless and in priority need who are Mixed"                  
## [134] "2015/16 Numbers accepted as homeless and in priority need who are Other ethnic origin"    
## [135] "2015/16 Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
## [136] "2015/16 Numbers accepted as homeless and in priority need total"                          
## [137] "2015/16 Number per 1000 households"                                                       
## [138] "2015/16 Total decisions where eligible homeless & in priority need but intentionally"     
## [139] "2015/16 Total decisions where eligible & homeless but not in priority need"               
## [140] "2015/16 Total decisions where eligible but not homeless"                                  
## [141] "2015/16 Total homelessness decisions"                                                     
## [142] "31 March 2016 Total households in B&B (including shared annex)"                           
## [143] "31 March 2016 Total households in hostels"                                                
## [144] "31 March 2016 Total households in LA/HA stock"                                            
## [145] "31 March 2016 Total households in private sector leased (by LA or HA)"                    
## [146] "31 March 2016 Total households in other temp (including private landlord)"                
## [147] "31 March 2016 Total households in temporary accommodation"                                
## [148] "31 March 2016 Number per 1000 households"                                                 
## [149] "2015/16 Duty owed but no accommodation has been secured at end of March 2015"

The first thing to do is to home in on the data I’d like to use. From a quick scan of the columns, “Local authority area” looks critical, and I’d like to see whether I have yearly data for “Numbers accepted as homeless and in priority need total”:

ind <- str_detect(names(data), "priority need total")
names(data)[ind]
## [1] "2009/10 Numbers accepted as homeless and in priority need total"
## [2] "2010/11 Numbers accepted as homeless and in priority need total"
## [3] "2011/12 Numbers accepted as homeless and in priority need total"
## [4] "2012/13 Numbers accepted as homeless and in priority need total"
## [5] "2013/14 Numbers accepted as homeless and in priority need total"
## [6] "2014/15 Numbers accepted as homeless and in priority need total"
## [7] "2015/16 Numbers accepted as homeless and in priority need total"

This looks to fit the bill. Now that I’ve homed in on the columns I need, let’s have a look at the structure and distribution of the data:

data_trim <- data %>% select(2, names(data)[ind])

str(data_trim, give.attr = FALSE)
## Classes 'tbl_df', 'tbl' and 'data.frame':    327 obs. of  8 variables:
##  $ Local authority area                                           : chr  "ENGLAND" "Adur" "Allerdale" "Amber Valley" ...
##  $ 2009/10 Numbers accepted as homeless and in priority need total: int  40020 71 102 30 52 42 178 93 37 232 ...
##  $ 2010/11 Numbers accepted as homeless and in priority need total: int  44160 90 104 46 79 25 194 112 46 221 ...
##  $ 2011/12 Numbers accepted as homeless and in priority need total: int  50290 58 63 53 100 16 161 126 78 199 ...
##  $ 2012/13 Numbers accepted as homeless and in priority need total: int  53770 37 41 61 129 26 199 133 100 664 ...
##  $ 2013/14 Numbers accepted as homeless and in priority need total: int  52290 10 26 64 109 85 166 116 86 853 ...
##  $ 2014/15 Numbers accepted as homeless and in priority need total: int  54430 7 30 117 191 87 152 160 86 764 ...
##  $ 2015/16 Numbers accepted as homeless and in priority need total: chr  "57740" "16" "32" "101" ...
summary(data_trim)
##  Local authority area
##  Length:327          
##  Class :character    
##  Mode  :character    
##                      
##                      
##                      
##  2009/10 Numbers accepted as homeless and in priority need total
##  Min.   :    1.0                                                
##  1st Qu.:   30.0                                                
##  Median :   63.0                                                
##  Mean   :  244.8                                                
##  3rd Qu.:  136.0                                                
##  Max.   :40020.0                                                
##  2010/11 Numbers accepted as homeless and in priority need total
##  Min.   :    1.0                                                
##  1st Qu.:   36.5                                                
##  Median :   73.0                                                
##  Mean   :  270.1                                                
##  3rd Qu.:  149.0                                                
##  Max.   :44160.0                                                
##  2011/12 Numbers accepted as homeless and in priority need total
##  Min.   :    0.0                                                
##  1st Qu.:   41.0                                                
##  Median :   85.0                                                
##  Mean   :  307.6                                                
##  3rd Qu.:  168.0                                                
##  Max.   :50290.0                                                
##  2012/13 Numbers accepted as homeless and in priority need total
##  Min.   :    0.0                                                
##  1st Qu.:   38.0                                                
##  Median :   78.0                                                
##  Mean   :  326.4                                                
##  3rd Qu.:  178.5                                                
##  Max.   :53770.0                                                
##  2013/14 Numbers accepted as homeless and in priority need total
##  Min.   :    0.0                                                
##  1st Qu.:   38.5                                                
##  Median :   82.0                                                
##  Mean   :  319.8                                                
##  3rd Qu.:  174.5                                                
##  Max.   :52290.0                                                
##  2014/15 Numbers accepted as homeless and in priority need total
##  Min.   :    0.0                                                
##  1st Qu.:   39.0                                                
##  Median :   87.0                                                
##  Mean   :  332.9                                                
##  3rd Qu.:  185.0                                                
##  Max.   :54430.0                                                
##  2015/16 Numbers accepted as homeless and in priority need total
##  Length:327                                                     
##  Class :character                                               
##  Mode  :character                                               
##                                                                 
##                                                                 
## 

I can see two issues: the column names are annoyingly long, and the first row holds the totals for the whole of England. Let’s fix both, labelling each column by the starting year of its financial year (so 2009 stands for 2009/10):

data_trim <- data_trim %>%
                 slice(-1) %>%
                 set_names("LAA", 2009:2015)

head(data_trim, 20)
## # A tibble: 20 x 8
##    LAA                    `2009` `2010` `2011` `2012` `2013` `2014` `2015`
##    <chr>                   <int>  <int>  <int>  <int>  <int>  <int> <chr> 
##  1 Adur                       71     90     58     37     10      7 16    
##  2 Allerdale                 102    104     63     41     26     30 32    
##  3 Amber Valley               30     46     53     61     64    117 101   
##  4 Arun                       52     79    100    129    109    191 228   
##  5 Ashfield                   42     25     16     26     85     87 93    
##  6 Ashford                   178    194    161    199    166    152 154   
##  7 Aylesbury Vale             93    112    126    133    116    160 177   
##  8 Babergh                    37     46     78    100     86     86 94    
##  9 Barking and Dagenham      232    221    199    664    853    764 941   
## 10 Barnet                    232    251    339    595    674    677 422   
## 11 Barnsley                   95     56     38     23     14     13 14    
## 12 Barrow-in-Furness          40     26     29     29     19     17 18    
## 13 Basildon                  191    232    255    282    302    351 208   
## 14 Basingstoke and Deane       1      1      2     11     22     54 46    
## 15 Bassetlaw                  18     27     48     75     41     91 65    
## 16 Bath and North East S~     68    100     86     86     65     48 68    
## 17 Bedford UA                141    107    211    242    174    164 287   
## 18 Bexley                    128    204    346    349    420    498 483   
## 19 Birmingham               3371   4207   3929   3957   3160   3140 3524  
## 20 Blaby                       2      7      2      1      0      6 11

That’s looking a bit better. I notice that there seems to be a stray “UA” at the end of some LAAs. From the output of the summary() function above, I can also see that the 2015/16 column seems to have been parsed as a character, so there’s probably some non-numeric character in there. Let’s see how many places these issues affect:

data_trim %>% filter(str_detect(LAA, " UA")) %>% select(LAA)
## # A tibble: 56 x 1
##    LAA                            
##    <chr>                          
##  1 Bath and North East Somerset UA
##  2 Bedford UA                     
##  3 Blackburn with Darwen UA       
##  4 Blackpool UA                   
##  5 Bournemouth UA                 
##  6 Bracknell Forest UA            
##  7 Brighton and Hove UA           
##  8 Bristol City of UA             
##  9 Central Bedfordshire UA        
## 10 Cheshire East UA               
## # ... with 46 more rows
data_trim %>% filter(str_detect(`2015`, "[^0-9]+")) %>% select(LAA, `2015`)
## # A tibble: 5 x 2
##   LAA                `2015`
##   <chr>              <chr> 
## 1 Chorley            -     
## 2 Eden               -     
## 3 Hyndburn           -     
## 4 Isles of Scilly UA -     
## 5 Waverley           -

56 place names ending in “UA” and five places without data in 2015! Let’s update our trimmed data to fix these issues, and make the data tidy by gathering the year headers into their own column:

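data_tidy <- data_trim %>%
                # strip the "UA" (unitary authority) suffix, anchored to the end of the name
                mutate(LAA = str_replace(LAA, " UA$", "")) %>%
                # "-" marks missing data: convert it to NA, then parse the column as integer
                mutate(`2015` = str_replace(`2015`, "-", NA_character_) %>% as.integer()) %>%
                # gather the year columns into year/value pairs
                gather(year, num_homeless, -LAA) %>%
                mutate(year = as.integer(year))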

str(data_tidy)
## Classes 'tbl_df', 'tbl' and 'data.frame':    2282 obs. of  3 variables:
##  $ LAA         : chr  "Adur" "Allerdale" "Amber Valley" "Arun" ...
##  $ year        : int  2009 2009 2009 2009 2009 2009 2009 2009 2009 2009 ...
##  $ num_homeless: int  71 102 30 52 42 178 93 37 232 232 ...
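With a more recent tidyverse than the versions loaded above, the same clean-up can be written a little more directly. The following is only a sketch on a small made-up table (the `toy` object, its file, and the Chorley 2014 value are illustrative, not the real data): it assumes tidyr 1.0.0 or later for `pivot_longer()`, and it handles the "-" placeholder at read time through the `na` argument of `read_csv()` so the year columns parse as integers straight away.

```r
library(readr)
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-in for the trimmed homelessness table (illustrative values only)
csv_lines <- c(
  "LAA,2014,2015",
  "Adur,7,16",
  "Bedford UA,164,287",
  "Chorley,12,-"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# Treat "-" as missing when reading, so no column falls back to character
toy <- read_csv(tmp,
                na = c("", "NA", "-"),
                col_types = cols(LAA = col_character(),
                                 .default = col_integer()))

toy_tidy <- toy %>%
  # remove the " UA" suffix only when it ends the name
  mutate(LAA = str_remove(LAA, " UA$")) %>%
  # one row per area and year
  pivot_longer(-LAA, names_to = "year", values_to = "num_homeless") %>%
  mutate(year = as.integer(year))

toy_tidy
```

Note that `pivot_longer()` keeps the rows for each area together, whereas `gather()` stacks one year after another; the content is the same, only the row order differs.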

Initial analysis

Now I have the data in a more manageable format, let’s quickly plot the top 6 homelessness figures in each year:

data_tidy %>%
  group_by(year) %>%
  arrange(year, desc(num_homeless)) %>% 
  top_n(6) %>%
  ggplot(aes(x = LAA, y = num_homeless)) +
    geom_bar(stat = "identity") +
    coord_flip() +
    facet_wrap(~ year, ncol=2, scales="free_y")
## Selecting by num_homeless
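The "Selecting by num_homeless" message appears because top_n() was not told which column to rank by and picked the last one. As a hedged alternative, dplyr 1.0.0 and later offer slice_max(), which takes the ranking column explicitly. The sketch below uses a tiny made-up table (`toy` and its values are hypothetical) to show the per-year selection on its own, without the plot.

```r
library(dplyr)

# Hypothetical miniature of data_tidy
toy <- tibble(
  LAA          = c("A", "B", "C", "A", "B", "C"),
  year         = c(2009L, 2009L, 2009L, 2010L, 2010L, 2010L),
  num_homeless = c(30L, 10L, 20L, 5L, 50L, NA)
)

# Keep the two largest figures within each year; the ranking column is explicit
top_two <- toy %>%
  group_by(year) %>%
  slice_max(num_homeless, n = 2, with_ties = FALSE) %>%
  ungroup()

top_two
```

Setting `with_ties = FALSE` guarantees exactly `n` rows per group, which keeps the facets the same size when the result is plotted.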

We can see that Birmingham is by far the worst offender. I’m not sure how accurate these figures are, but if they are true that is truly horrifying, and the situation did not seem to have improved by 2015. Which areas saw the most drastic improvement or deterioration over the 7 years?

extremes <- data_tidy %>%
                  drop_na() %>%
                  filter(year %in% c(2009, 2015)) %>%
                  group_by(LAA) %>%
                  mutate(homeless2009 = lag(num_homeless),
                         change = num_homeless - homeless2009) %>% 
                  ungroup() %>%
                  drop_na() %>%
                  arrange(change)

bind_rows(head(extremes, 8), tail(extremes, 8))
## # A tibble: 16 x 5
##    LAA                   year num_homeless homeless2009 change
##    <chr>                <int>        <int>        <int>  <int>
##  1 Sheffield             2015          421          946   -525
##  2 Coventry              2015          129          538   -409
##  3 North Tyneside        2015          149          502   -353
##  4 Derby                 2015           28          321   -293
##  5 Croydon               2015          222          425   -203
##  6 Durham                2015           70          264   -194
##  7 Cornwall              2015          250          419   -169
##  8 Tower Hamlets         2015          522          690   -168
##  9 Craven                2015          560            8    552
## 10 Milton Keynes         2015          789           84    705
## 11 Barking and Dagenham  2015          941          232    709
## 12 Bristol City of       2015         1006          285    721
## 13 Waltham Forest        2015         1087          286    801
## 14 Enfield               2015         1131          241    890
## 15 Dacorum               2015         1006           14    992
## 16 Newham                2015         1345           97   1248

Sheffield improved the most, with a reduction of over 500, while Newham saw a massive increase of over 1,200.
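The lag() calculation above relies on each area’s 2009 row sitting directly before its 2015 row. A sketch of an alternative that makes the pairing explicit is to spread the two years into their own columns and subtract. It assumes tidyr 1.0.0 or later for `pivot_wider()`; the `toy` table is a made-up extract in which the Sheffield and Newham figures come from the output above and the Chorley 2009 value is purely illustrative.

```r
library(dplyr)
library(tidyr)

# Small extract in the same shape as data_tidy
toy <- tibble(
  LAA          = c("Sheffield", "Sheffield", "Newham", "Newham", "Chorley", "Chorley"),
  year         = c(2009L, 2015L, 2009L, 2015L, 2009L, 2015L),
  num_homeless = c(946L, 421L, 97L, 1345L, 40L, NA)
)

change_tbl <- toy %>%
  filter(year %in% c(2009, 2015)) %>%
  # one column per year: y2009 and y2015
  pivot_wider(names_from = year, values_from = num_homeless,
              names_prefix = "y") %>%
  mutate(change = y2015 - y2009) %>%
  # areas missing either year cannot be compared
  drop_na(change) %>%
  arrange(change)

change_tbl
```

Because the subtraction names both columns, the result does not depend on how the rows happen to be ordered.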

The painful part

So having never done any geospatial analysis or mapping before, I tried doing some Google searches to see if I could find any code I could use. I quickly discovered that if I was going to do any mapping of UK regions, I was going to need to access some shape files.

I managed to download some from the UK Data Service website. I also had enormous trouble getting the function to read the data from within this blog post, but I managed to make it work using the here package, which I’ve since heard good things about on Twitter.

shapes <- st_read(dsn = here::here("data", "homelessness", "BoundaryData"), layer = "infuse_dist_lyr_2011") %>% arrange(name)
## Reading layer `infuse_dist_lyr_2011' from data source `C:\Users\J\Documents\r-house\data\homelessness\BoundaryData' using driver `ESRI Shapefile'
## Simple feature collection with 324 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 82643.6 ymin: 5333.602 xmax: 655989 ymax: 657599.5
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
str(shapes)
## Classes 'sf' and 'data.frame':   324 obs. of  6 variables:
##  $ name      : Factor w/ 324 levels "Adur","Allerdale",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ label     : Factor w/ 324 levels "E92000001E06000001",..: 243 64 70 244 195 136 55 220 292 293 ...
##  $ geo_labelw: Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
##  $ geo_label : Factor w/ 324 levels "Adur","Allerdale",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ geo_code  : Factor w/ 324 levels "E06000001","E06000002",..: 243 64 70 244 195 136 55 220 292 293 ...
##  $ geometry  :sfc_MULTIPOLYGON of length 324; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:2718, 1:2] 515970 515951 515901 515901 515855 ...
##   ..- attr(*, "class")= chr  "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA
##   ..- attr(*, "names")= chr  "name" "label" "geo_labelw" "geo_label" ...

With the intent of joining my dataframes together, I identified an inconsistency in the areas given in each table (setdiff() is a very handy function!):

n_distinct(data_tidy$LAA)
## [1] 326
n_distinct(shapes$name)
## [1] 324
data_diff <- setdiff(data_tidy$LAA, shapes$name)
shapes_diff <- setdiff(shapes$name, data_tidy$LAA)

# tibble() replaces the deprecated data_frame(); pad shapes_diff to the same length
tibble(data = data_diff,
       shapes = c(shapes_diff, "", ""))
## # A tibble: 11 x 2
##    data                       shapes                     
##    <chr>                      <chr>                      
##  1 Bristol City of            Bristol, City of           
##  2 City of London             City of London,Westminster 
##  3 Cornwall                   Cornwall,Isles of Scilly   
##  4 Durham                     County Durham              
##  5 Herefordshire County of    Herefordshire, County of   
##  6 Isles of Scilly            Kingston upon Hull, City of
##  7 Kingston upon Hull City of St Albans                  
##  8 St Helens                  St Edmundsbury             
##  9 St. Albans                 St. Helens                 
## 10 St. Edmundsbury            ""                         
## 11 Westminster                ""
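
As a minimal sketch of the two-way comparison used above: setdiff(x, y) only reports what is in x but missing from y, so it has to be run in both directions. The vectors here are a made-up subset standing in for data_tidy$LAA and shapes$name, and the NA padding is just one way of lining the two results up.

```r
# Toy vectors standing in for data_tidy$LAA and shapes$name (made-up subset)
data_names  <- c("Adur", "Durham", "St. Albans", "Westminster")
shape_names <- c("Adur", "County Durham", "St Albans")

# setdiff(x, y) returns the unique elements of x that are not in y,
# so the comparison is run once in each direction
only_in_data   <- setdiff(data_names, shape_names)
only_in_shapes <- setdiff(shape_names, data_names)

# Pad the shorter vector with NA so the two can sit side by side
n <- max(length(only_in_data), length(only_in_shapes))
mismatches <- data.frame(
  data   = c(only_in_data,   rep(NA, n - length(only_in_data))),
  shapes = c(only_in_shapes, rep(NA, n - length(only_in_shapes))),
  stringsAsFactors = FALSE
)
print(mismatches)
```

Padding with NA rather than hard-coded empty strings means the code does not need to know in advance how many names differ on each side.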

You can see from the output above that my homelessness data has split out Westminster from the City of London, and the Isles of Scilly from Cornwall. There are also some punctuation inconsistencies in the names (commas and full stops) that need to be sorted out. Let’s clean it up by renaming the mismatched areas and combining the rows that the shape data treats as a single area:

data_final <- data_tidy %>%
              #mutate_at(vars("year", "num_homeless"), as.numeric) %>% 
              mutate(LAA = ifelse(LAA %in% c("City of London","Westminster"),
                                   "City of London,Westminster",
                                   LAA)) %>%
              mutate(LAA = ifelse(LAA %in% c("Cornwall","Isles of Scilly"),
                                   "Cornwall,Isles of Scilly",
                                   LAA)) %>%
              mutate(LAA = ifelse(LAA == "Bristol City of","Bristol, City of",LAA)) %>% 
              mutate(LAA = ifelse(LAA == "Durham","County Durham",LAA)) %>%
              mutate(LAA = ifelse(LAA == "Herefordshire County of","Herefordshire, County of",LAA)) %>%
              mutate(LAA = ifelse(LAA == "Kingston upon Hull City of","Kingston upon Hull, City of",LAA)) %>%
              mutate(LAA = ifelse(LAA == "St Helens","St. Helens",LAA)) %>%
              mutate(LAA = ifelse(LAA == "St. Albans","St Albans",LAA)) %>%
              mutate(LAA = ifelse(LAA == "St. Edmundsbury","St Edmundsbury",LAA)) %>%
              mutate(LAA = as.factor(LAA)) %>%
              group_by(LAA, year) %>% 
              summarise(total_homeless = sum(num_homeless)) %>%
              ungroup()
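
The same renaming could probably be written more compactly with a named lookup vector. The following is only a sketch of that alternative, not the code used for this post: the lookup vector and the toy tibble (including its counts) are made up for illustration.

```r
library(dplyr)

# Hypothetical lookup: names are the spellings in the homelessness data,
# values are the spellings used in the shape data
lookup <- c(
  "Durham"         = "County Durham",
  "St. Albans"     = "St Albans",
  "City of London" = "City of London,Westminster",
  "Westminster"    = "City of London,Westminster"
)

# Made-up counts, purely for illustration
toy <- tibble(
  LAA          = c("Durham", "St. Albans", "Westminster", "City of London", "Adur"),
  year         = 2015,
  num_homeless = c(10, 20, 30, 5, 7)
)

toy_clean <- toy %>%
  # lookup[LAA] is NA for names that need no change; coalesce() then
  # falls back to the original name
  mutate(LAA = coalesce(unname(lookup[LAA]), LAA)) %>%
  group_by(LAA, year) %>%
  summarise(total_homeless = sum(num_homeless)) %>%
  ungroup()

print(toy_clean)
```

With this layout, adding another mismatched name means adding one entry to the lookup rather than another mutate() call.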

Next, I created a function to take a year and a set of regions and generate a heatmap. This function filters the homelessness data, joins it with the shape data, and then plots the data. I’ve included regions as an argument so that Birmingham can be filtered out, as it dominates the heatmap.

heatmap <- function(inp_year, regions) {

  # Keep one year and the chosen regions, then attach the boundaries;
  # right_join() keeps every shape, so excluded regions get an NA fill
  data_joined <- data_final %>%
    filter(year == inp_year) %>%
    filter(LAA %in% regions) %>%
    right_join(shapes, by = c("LAA" = "name"))

  # Upper limit of the colour scale, taken across all years so that
  # every frame of the animation uses the same scale
  max_scale <- data_final %>%
    filter(LAA %in% regions) %>%
    pull(total_homeless) %>%
    max(na.rm = TRUE)

  p <- ggplot() +
    geom_sf(data = data_joined, aes(fill = total_homeless), col = "black") +
    theme_void() + coord_sf(datum = NA) +
    scale_fill_viridis_c(name = NULL, option = "magma",
                         limits = c(0, max_scale),
                         breaks = c(0, max_scale/2, max_scale)) +
    labs(title = paste0("Total number of people accepted as homeless and in priority need in England in ", inp_year),
         caption = "Data obtained from http://opendata.cambridgeshireinsight.org.uk/dataset/homelessness-england")
  print(p)
}
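
To illustrate the filter-then-join step in the function above without any geometry, here is a small sketch using plain tibbles. The table names and the counts are made up; the point is only that right_join() keeps every row of the shape table, so a region that was filtered out still appears, just with a missing value.

```r
library(dplyr)

# Made-up counts for three areas in a single year
counts <- tibble(
  LAA            = c("Adur", "Birmingham", "Newham"),
  year           = 2015,
  total_homeless = c(50, 3000, 400)
)

# Stand-in for the shape data: one row per area, no geometry column
shapes_toy <- tibble(name = c("Adur", "Allerdale", "Birmingham", "Newham"))

# Drop the area that would dominate the colour scale
regions <- setdiff(counts$LAA, "Birmingham")

joined <- counts %>%
  filter(year == 2015, LAA %in% regions) %>%
  right_join(shapes_toy, by = c("LAA" = "name"))

# The scale limit is computed from the included regions only
max_scale <- counts %>%
  filter(LAA %in% regions) %>%
  pull(total_homeless) %>%
  max(na.rm = TRUE)

print(joined)
print(max_scale)
```

In this sketch, Birmingham and Allerdale both end up with NA for total_homeless, which on the real map would presumably be drawn in the scale's missing-value colour rather than left off the map.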

regions_to_include <- unique(setdiff(data_final$LAA, "Birmingham"))

save_gif(walk(min(data_final$year):max(data_final$year), heatmap, regions = regions_to_include), 
         delay = 0.7, gif_file = "animation.gif")

Homelessness heatmap

I certainly feel this project has been a bit of a hack job. It’s taken me over a month to write because it’s been so challenging and I’ve had to leave and come back to it so many times. I’m not proud of it, mainly because I rushed it at the end because I just wanted it done.

I’ve since used Tableau, which seems to make heatmaps a bit easier. If I were to do it again in R, however, I think I’d take the courses on DataCamp first!

To leave a comment for the author, please follow the link and comment on their blog: R on R-house.
