COVID-19 Update by London Borough

[This article was first published on R & Decision Making, and kindly contributed to R-bloggers.]

x-axis: latest 7-day average of new daily lab-confirmed cases per 10,000 people

y-axis: the 30-day increase in new cases, measured as a log ratio
The closer to the bottom left, the better the borough is doing.
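As a minimal sketch of how the two axes can be computed for a single borough, the snippet below uses base R only. The case counts and the population figure are made up for illustration, and `stats::filter` stands in here for the rolling-mean helper used in the full code.

```r
# Made-up daily case counts for one hypothetical borough:
# 10 cases a day for 30 days, then 20 cases a day for 30 days
cases <- c(rep(10, 30), rep(20, 30))
population <- 200000  # assumed population, for illustration only

# Right-aligned 7-day rolling mean; the first 6 values are NA
roll_mean <- as.numeric(stats::filter(cases, rep(1 / 7, 7), sides = 1))

n <- length(roll_mean)
# x-axis: latest 7-day average per 10,000 people
latest_per_10k <- roll_mean[n] / population * 10000
# y-axis: log ratio of the latest average to the average 30 days earlier
log_ratio_30d <- log(roll_mean[n] / roll_mean[n - 30])
```

With these numbers the 7-day average doubles over 30 days, so the log ratio is log(2), about 0.69. A log ratio of 0 means no change, and negative values mean cases are falling.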

I then created a distance metric between each borough and Greenwich as a measure of severity:
Severity = Average(Percentile Rank along x-axis, Percentile Rank along y-axis)
which I then used as the dependent variable in a regression.
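To make the severity formula concrete, here is a small sketch on five hypothetical boroughs. The names and values are invented, and `ecdf` is used to turn each axis into a percentile rank between 0 and 1.

```r
# Made-up values for five hypothetical boroughs (not real data)
boroughs <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  latest_per_10k = c(0.2, 0.5, 0.9, 0.4, 0.7),
  log_ratio_30d = c(-0.3, 0.8, 0.1, 0.5, 1.2)
)

# ecdf() returns a function giving the share of values <= its argument
x_rank <- ecdf(boroughs$latest_per_10k)
y_rank <- ecdf(boroughs$log_ratio_30d)

# Severity = average of the two percentile ranks
boroughs$severity <- (x_rank(boroughs$latest_per_10k) +
                        y_rank(boroughs$log_ratio_30d)) / 2

# Least severe first
boroughs[order(boroughs$severity), c("name", "severity")]
```

Borough A sits at the bottom left on both axes and gets the lowest severity (0.2), while E ranks high on both and gets 0.9. Because ranks are used rather than raw values, a single extreme borough does not dominate the score.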
  • Some suggest there may be clusters related to the Eid festival. However, the coefficient is actually negative and insignificant when I regress severity on ethnicity data. (1)
  • Wealth might be a proxy for overseas mobility and non-white-collar jobs. When I use house prices (link in the code) as a proxy for wealth, the result is again very weak.
  • There is no relationship with the population density of the borough either.
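Each of the checks in the bullets above amounts to regressing severity on one candidate explanatory variable and reading off the sign and p-value of its coefficient. A minimal sketch with simulated data follows; `covariate` is a hypothetical stand-in for something like house price or density, and the numbers are random rather than real.

```r
set.seed(42)
n <- 30                 # arbitrary number of hypothetical boroughs
covariate <- runif(n)   # stand-in for e.g. house price or density
severity <- runif(n)    # simulated independently of the covariate

fit <- lm(severity ~ covariate)
coefs <- summary(fit)$coefficients

# Sign of the slope and its two-sided p-value
slope <- coefs["covariate", "Estimate"]
p_value <- coefs["covariate", "Pr(>|t|)"]
```

A large `p_value` here corresponds to what the bullets call a weak or insignificant result: the data do not let us distinguish the slope from zero.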
Can you spot any pattern? 


Full code below:

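library(data.table)
library(ggplot2)
library(ggrepel)

# Source CSV of daily lab-confirmed cases by area
data_url <- ""
raw_data <- fread(data_url, check.names = TRUE)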
# Used to keep only London councils and to proxy wealth
house_prices_data <- fread("land-registry-house-prices-borough.csv")
pop_density_data <- fread("housing-density-borough.csv")
pop_density_data <- pop_density_data[Year == 2020,
                                     .(population = mean(Population),
                                       density = mean(Population_per_square_kilometre)),
                                     by = "Code"]
london_data <- raw_data[Area.code %in% house_prices_data[, unique(Code)] &
                          Area.type == "utla"]
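# Convert the date column to Date and merge in a complete daily sequence
# (the column name and the seq() bounds did not survive in this copy)
london_data[, := as.Date(]
london_data <- merge(london_data,
                     data.table( = seq(
                       by = "1 day"
                     )), all = TRUE, by = "")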
# Treat days with no record as zero cases, then take a 7-day rolling mean per area
setnafill(london_data, type = "const", fill = 0,
          cols = c("Daily.lab.confirmed.cases"))
london_data[, roll_mean := frollmean(Daily.lab.confirmed.cases, n = 7, align = "right"),
            by = "Area.code"]
# Exclude the last 3 days due to reporting lag
london_data_summary <- london_data[ <= Sys.Date() - 3,
  .(council =[1],
    latest_avg = roll_mean[.N],
    rate_of_new_cases = log(roll_mean[.N] / roll_mean[.N - 30])),
  by = "Area.code"]
london_data_summary <- merge(london_data_summary, pop_density_data, by.x = "Area.code", by.y="Code")
ggplot(london_data_summary, aes(x = latest_avg / population * 10000, y = rate_of_new_cases, label = council)) +
  geom_point(color = "#ff0000") +
  geom_text_repel() +
  labs(x = "Latest 7 day average per 10000", y = "Rate of increase (log ratio) over past 30 days") +
  theme_bw() +
  ggtitle("London: Latest 7 Day Average on New Cases and its Rate of Increase by Councils")

# Statistical test
rate_of_increase_ecdf <- ecdf(london_data_summary[, rate_of_new_cases])
daily_increase_ecdf <- ecdf(london_data_summary[, latest_avg] / london_data_summary[, population])
# Severity = average of the two percentile ranks, as defined in the text
london_data_summary[, severity := (daily_increase_ecdf(latest_avg / population) +
                                     rate_of_increase_ecdf(rate_of_new_cases)) / 2]
summary(lm(severity ~ population + density,
           data = london_data_summary))
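One step in the code above is worth a closer look: before taking the rolling mean, the data is merged with a complete daily sequence and missing counts are set to zero, so that days without a record are counted as zero cases rather than skipped. Below is a base-R sketch of the same idea for a single area. The dates and counts are invented, and the column names `date` and `cases` are hypothetical.

```r
# Made-up case records with two missing days (2020-09-03 and 2020-09-04)
obs <- data.frame(
  date = as.Date(c("2020-09-01", "2020-09-02", "2020-09-05")),
  cases = c(4, 7, 3)
)

# One row per calendar day between the first and last observation
all_days <- data.frame(date = seq(min(obs$date), max(obs$date), by = "1 day"))

# Left join onto the full calendar; days without a record get NA
filled <- merge(all_days, obs, by = "date", all.x = TRUE)

# Treat missing days as zero cases
filled$cases[is.na(filled$cases)] <- 0
```

Without this step a 7-row window could span more than 7 calendar days. With several areas in one table, the same fill would need to be done per area.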
