COVID-19 in the US: Back-of-the-Envelope Calculation of Actual Infections and Future Deaths

[This article was first published on R-Bloggers – Learning Machines, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


One of the biggest problems of the COVID-19 pandemic is that there are no reliable numbers of infections. This fact renders many model projections next to useless.

If you want to get to know a simple method how to roughly estimate the real number of infections and expected deaths in the US, read on!

As we have seen many times on this blog: simple doesn’t always mean inferior, it only means more comprehensible! The following estimation is based on a simple idea from an article in DER SPIEGEL (H. Dambeck: Was uns die Zahl der Toten verrät).

The general idea goes like this:

  • The number of people having died from COVID-19 is much more reliable than the number of infections.
  • Our best estimate of the true fatality rate of COVID-19 still is 0.7% of the number of infected persons and
  • we know that the time from infection to death is about 10 days.

With this knowledge, we can infer the people that got actually infected 10 days ago and deduce the percentage of confirmed vs. actually infected persons:

# https://en.wikipedia.org/wiki/Template:2019%E2%80%9320_coronavirus_pandemic_data/United_States_medical_cases
new_inf <- c(1, 1, 1, 2, 1, 1, 1, 3, 1, 0, 2, 0, 1, 4, 5, 18, 15, 28, 26, 64, 77, 101, 144, 148, 291, 269, 393, 565, 662, 676, 872, 1291, 2410, 3948, 5417, 6271, 8631, 10410, 9939, 12226, 17050, 19046, 20093, 19118, 20463, 25396, 26732, 28812, 32182, 34068, 25717, 29362)
deaths <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 4, 3, 2, 0, 3, 5, 2, 5, 5, 6, 4, 8, 7, 6, 14, 21, 26, 52, 55, 68, 110, 111, 162, 225, 253, 433, 447, 392, 554, 821, 940, 1075, 1186, 1352, 1175, 1214)
data <- data.frame(new_inf, deaths)
n <- length(new_inf)

shift <- function(x, n = 10){
  c(x[-(seq(n))], rep(NA, n))
}

data$real_inf <- shift(round(data$deaths / 0.007))
data$perc_real <- round(data$new_inf / data$real_inf, 4)
data
##    new_inf deaths real_inf perc_real
## 1        1      0        0       Inf
## 2        1      0        0       Inf
## 3        1      0        0       Inf
## 4        2      0        0       Inf
## 5        1      0      143    0.0070
## 6        1      0      143    0.0070
## 7        1      0      571    0.0018
## 8        3      0      429    0.0070
## 9        1      0      286    0.0035
## 10       0      0        0       NaN
## 11       2      0      429    0.0047
## 12       0      0      714    0.0000
## 13       1      0      286    0.0035
## 14       4      0      714    0.0056
## 15       5      1      714    0.0070
## 16      18      1      857    0.0210
## 17      15      4      571    0.0263
## 18      28      3     1143    0.0245
## 19      26      2     1000    0.0260
## 20      64      0      857    0.0747
## 21      77      3     2000    0.0385
## 22     101      5     3000    0.0337
## 23     144      2     3714    0.0388
## 24     148      5     7429    0.0199
## 25     291      5     7857    0.0370
## 26     269      6     9714    0.0277
## 27     393      4    15714    0.0250
## 28     565      8    15857    0.0356
## 29     662      7    23143    0.0286
## 30     676      6    32143    0.0210
## 31     872     14    36143    0.0241
## 32    1291     21    61857    0.0209
## 33    2410     26    63857    0.0377
## 34    3948     52    56000    0.0705
## 35    5417     55    79143    0.0684
## 36    6271     68   117286    0.0535
## 37    8631    110   134286    0.0643
## 38   10410    111   153571    0.0678
## 39    9939    162   169429    0.0587
## 40   12226    225   193143    0.0633
## 41   17050    253   167857    0.1016
## 42   19046    433   173429    0.1098
## 43   20093    447       NA        NA
## 44   19118    392       NA        NA
## 45   20463    554       NA        NA
## 46   25396    821       NA        NA
## 47   26732    940       NA        NA
## 48   28812   1075       NA        NA
## 49   32182   1186       NA        NA
## 50   34068   1352       NA        NA
## 51   25717   1175       NA        NA
## 52   29362   1214       NA        NA

We see that only up to 10% of actual infections are being officially registered (although fortunately this ratio is growing). Based on this percentage, we can extrapolate the number of actual infections from the number of confirmed infections and multiply it by the death rate to arrive at the number of projected deaths for the next 10 days, i.e. over the Easter weekend:

# how many are actually newly infected?
(real_inf <- round(tail(data$new_inf, 10) / mean(data$perc_real[(n-12):(n-10)])))
##  [1] 219436 208788 223477 277350 291940 314656 351460 372057 280855 320663

# how many will die in the coming 10 days?
round(real_inf * 0.007)
##  [1] 1536 1462 1564 1941 2044 2203 2460 2604 1966 2245

Unfortunately, the numbers do not bode well: this simple projection shows that, with over 300,000 new infections per day, there is a realistic possibility to break the 2,000 deaths-per-day barrier at Easter.

Remember: this is not based on some fancy model but only on the numbers of people that probably got infected already! This is why this method cannot project beyond the 10-day horizon, yet should be more accurate than many a model tossed around at the moment (which are mainly based on mostly unreliable data).

We will soon see how all of this pans out… please share your thoughts and your own calculations in the comments below.

I wish you, despite the grim circumstances, a Happy Easter!

…and heed what Jesus would do in times of social distancing!

To leave a comment for the author, please follow the link and comment on their blog: R-Bloggers – Learning Machines.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)