Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

One of the biggest problems of the COVID-19 pandemic is that there are no reliable numbers of infections. This fact renders many model projections next to useless.

If you want to get to know a simple method how to roughly estimate the real number of infections and expected deaths in the US, read on!

As we have seen many times on this blog: simple doesn’t always mean inferior, it only means more comprehensible! The following estimation is based on a simple idea from an article in DER SPIEGEL (H. Dambeck: Was uns die Zahl der Toten verrät).

The general idea goes like this:

• The number of people having died from COVID-19 is much more reliable than the number of infections.
• Our best estimate of the true fatality rate of COVID-19 still is 0.7% of the number of infected persons and
• we know that the time from infection to death is about 10 days.

With this knowledge, we can infer the people that got actually infected 10 days ago and deduce the percentage of confirmed vs. actually infected persons:

# https://en.wikipedia.org/wiki/Template:2019%E2%80%9320_coronavirus_pandemic_data/United_States_medical_cases
new_inf <- c(1, 1, 1, 2, 1, 1, 1, 3, 1, 0, 2, 0, 1, 4, 5, 18, 15, 28, 26, 64, 77, 101, 144, 148, 291, 269, 393, 565, 662, 676, 872, 1291, 2410, 3948, 5417, 6271, 8631, 10410, 9939, 12226, 17050, 19046, 20093, 19118, 20463, 25396, 26732, 28812, 32182, 34068, 25717, 29362)
deaths <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 4, 3, 2, 0, 3, 5, 2, 5, 5, 6, 4, 8, 7, 6, 14, 21, 26, 52, 55, 68, 110, 111, 162, 225, 253, 433, 447, 392, 554, 821, 940, 1075, 1186, 1352, 1175, 1214)
data <- data.frame(new_inf, deaths)
n <- length(new_inf)

shift <- function(x, n = 10){
c(x[-(seq(n))], rep(NA, n))
}

data$real_inf <- shift(round(data$deaths / 0.007))
data$perc_real <- round(data$new_inf / data$real_inf, 4) data ## new_inf deaths real_inf perc_real ## 1 1 0 0 Inf ## 2 1 0 0 Inf ## 3 1 0 0 Inf ## 4 2 0 0 Inf ## 5 1 0 143 0.0070 ## 6 1 0 143 0.0070 ## 7 1 0 571 0.0018 ## 8 3 0 429 0.0070 ## 9 1 0 286 0.0035 ## 10 0 0 0 NaN ## 11 2 0 429 0.0047 ## 12 0 0 714 0.0000 ## 13 1 0 286 0.0035 ## 14 4 0 714 0.0056 ## 15 5 1 714 0.0070 ## 16 18 1 857 0.0210 ## 17 15 4 571 0.0263 ## 18 28 3 1143 0.0245 ## 19 26 2 1000 0.0260 ## 20 64 0 857 0.0747 ## 21 77 3 2000 0.0385 ## 22 101 5 3000 0.0337 ## 23 144 2 3714 0.0388 ## 24 148 5 7429 0.0199 ## 25 291 5 7857 0.0370 ## 26 269 6 9714 0.0277 ## 27 393 4 15714 0.0250 ## 28 565 8 15857 0.0356 ## 29 662 7 23143 0.0286 ## 30 676 6 32143 0.0210 ## 31 872 14 36143 0.0241 ## 32 1291 21 61857 0.0209 ## 33 2410 26 63857 0.0377 ## 34 3948 52 56000 0.0705 ## 35 5417 55 79143 0.0684 ## 36 6271 68 117286 0.0535 ## 37 8631 110 134286 0.0643 ## 38 10410 111 153571 0.0678 ## 39 9939 162 169429 0.0587 ## 40 12226 225 193143 0.0633 ## 41 17050 253 167857 0.1016 ## 42 19046 433 173429 0.1098 ## 43 20093 447 NA NA ## 44 19118 392 NA NA ## 45 20463 554 NA NA ## 46 25396 821 NA NA ## 47 26732 940 NA NA ## 48 28812 1075 NA NA ## 49 32182 1186 NA NA ## 50 34068 1352 NA NA ## 51 25717 1175 NA NA ## 52 29362 1214 NA NA We see that only up to 10% of actual infections are being officially registered (although fortunately this ratio is growing). Based on this percentage, we can extrapolate the number of actual infections from the number of confirmed infections and multiply it by the death rate to arrive at the number of projected deaths for the next 10 days, i.e. over the Easter weekend: # how many are actually newly infected? (real_inf <- round(tail(data$new_inf, 10) / mean(data\$perc_real[(n-12):(n-10)])))
##  [1] 219436 208788 223477 277350 291940 314656 351460 372057 280855 320663

# how many will die in the coming 10 days?
round(real_inf * 0.007)
##  [1] 1536 1462 1564 1941 2044 2203 2460 2604 1966 2245

Unfortunately, the numbers do not bode well: this simple projection shows that, with over 300,000 new infections per day, there is a realistic possibility to break the 2,000 deaths-per-day barrier at Easter.

Remember: this is not based on some fancy model but only on the numbers of people that probably got infected already! This is why this method cannot project beyond the 10-day horizon, yet should be more accurate than many a model tossed around at the moment (which are mainly based on mostly unreliable data).