Testing for Covid-19 in the U.S.

Posted on April 28, 2020 by arthur charpentier in R bloggers | 0 Comments

[This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

For almost a month, on a daily basis, we are working with colleagues (Romuald, Chi and Mathieu) on modeling the dynamics of the recent pandemic. I learn of lot of things discussing with them, but we keep struggling with the tests. Paul, in Montréal, helped me a little bit, but I think we will still have to more to get a better understand. To but honest, we stuggle with two very simple questions

how many people are tested on a daily basis ?

Recently, I discovered Modelling COVID-19 exit strategies for policy makers in the United Kingdom, which is very close to what we try to do… and in the document two interesting scenarios are discussed, with, for the first one, “1 million ‘reliable’ daily tests are deployed” (in the U.K.) and “5 million ‘useless’ daily tests are deployed”. There are about 65 millions unhabitants in the U.K. so we talk here about 1.5% people tested, on a daily basis, or 7.69% people ! It could make sense, but our question was, at some point, is that realistic ? where are we today with testing ? In the U.S. https://covidtracking.com/ collects interesting data, on a daily basis, per state.

url = "https://raw.githubusercontent.com/COVID19Tracking/covid-tracking-data/master/data/states_daily_4pm_et.csv"
download.file(url,destfile="covid.csv")
base = read.csv("covid.csv")

Unfortunately, there is no information about the population. That we can find on wikipedia. But in that table, the state is given by its full name (and the symbol in the previous dataset). So we new also to match the two datasets properly,

url="https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States_by_population"
download.file(url,destfile = "popUS.html")
#pas contaminé 2/3 R=3
library(XML)
tables=readHTMLTable("popUS.html")
T=tables[[1]][3:54,c("V3","V4")]
names(T)=c("state","pop")
url="https://en.wikipedia.org/wiki/List_of_U.S._state_abbreviations"
download.file(url,destfile = "nameUS.html")
tables=readHTMLTable("nameUS.html")
T2=tables[[1]][13:63,c(1,4)]
names(T2)=c("state","symbol")
T=merge(T,T2)
T$population = as.numeric(gsub(",", "", T$pop, fixed = TRUE))
names(base)[2]="symbol"
base = merge(base,T[,c("symbol","population")])

Now our dataset is fine… and we can get a function to plot the number of people tested in the U.S. (cumulated). Here, we distinguish between the positive and the negative,

drawing = function(st ="NY"){
sbase=base[base$symbol==st,c("date","positive","negative","population")]
sbase$DATE = as.Date(as.character(sbase$date),"%Y%m%d")
sbase=sbase[order(sbase$DATE),]
par(mfrow=c(1,2))
plot(sbase$DATE,(sbase$positive+sbase$negative)/sbase$population,ylab="Proportion Test (/population of state)",type="l",xlab="",col="blue",lwd=3)
lines(sbase$DATE,sbase$positive/sbase$population,col="red",lwd=2)
legend("topleft",c("negative","positive"),lwd=2,col=c("blue","red"),bty="n")
title(st)
plot(sbase$DATE,sbase$positive/(sbase$positive+sbase$negative),ylab="Ratio of positive tests",ylim=c(0,1),type="l",xlab="",col="black",lwd=3)
title(st)}

Let us start with New York

drawing("NY")

As at now, 4% of the entiere population got tested… over 6 weeks…. The graph on the right is the proportion of people who tested positive… I won’t get back on that one here today, I keep it for our work. In New Jersey, we got about 2.5% of the entiere population tested, overall,

drawing("NJ")

Let us try a last one, Florida

drawing("FL")

As at today, it is 1.5% of the population, over 6 weeks. Overall, in the U.S. less than 0.1% people are tested, on a daily basis. Which is far from the 1.5% in the U.K. scenarios. Now, here come the second question,

what are we actually testing for ?

On that one, my experience in biology is… very limited, and Paul helped me. He mentioned this morning a nice report, from a lab in UC Berkeley

One of my question was for instance, if you get tested positive, and you do it again, can you test negative ? Or, in the context of our data, do we test different people ? are some people tested on a regular basis (perhaps every week) ? For instance, with antigen tests (Reverse Transcription Quantitative Polymerase Chain Reaction (RT-qPCR) – also called molecular or PCR – Polymerase Chain Reaction – test) we test if someone is infectious, while with antibody test (using serological immunoassays that detect viral-specific antibodies — Immunoglobin M (IgM) and G (IgG) — also called serology test), we test for immunity. Which is rather different…

I have no idea what we have in our database, to be honest… and for the past six weeks, I have seen a lot of databases, and most of the time, I don’t know how to interpret, I don’t know what is measured… and it is scary. So, so far, we try to do some maths, to test dynamics by tuning parameters “the best we can” (and not estimate them). But if anyone has good references on testing, in the context of Covid-19 (for instance on specificity, sensitivity of all those tests) I would love to hear about it !

To leave a comment for the author, please follow the link and comment on their blog: R-english – Freakonometrics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Testing for Covid-19 in the U.S.

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)