High incidence in Measles Data in Project Tycho

April 21, 2014

(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

In this third post on Measles data I want to have a look at some high incidence occasions. As described before, the data is from Project Tycho, which contains data from all weekly notifiable disease reports for the United States dating back to 1888. These data are freely available to anybody interested.


Data reading follows the posting Looking at Measles Data in Project Tycho, part II. The plot there seemed to show some incidence values over 10, which corresponds to more than 10 cases per 1000 inhabitants in a week.
r6 <- r5[complete.cases(r5),]

      YEAR abb WEEK Cases   State pop incidence
49841 1939  MT   19  5860 Montana 555  10.55856
51076 1939  WY   17  3338 Wyoming 248  13.45968
51090 1939  WY   18  2509 Wyoming 248  10.11694

Indeed, in 1939 three values are over 10. I have always thought you could only catch measles once, so this suggests that a number of years with hardly any measles must have occurred before.
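
The incidence values in the table can be checked by hand. A minimal sketch, assuming (this is inferred from the three rows shown, not stated in the post) that incidence is Cases divided by pop, with pop in thousands:

```r
# Rows copied from the table above; the column meanings are an assumption
high <- data.frame(
  YEAR  = c(1939, 1939, 1939),
  abb   = c("MT", "WY", "WY"),
  WEEK  = c(19, 17, 18),
  Cases = c(5860, 3338, 2509),
  pop   = c(555, 248, 248)   # assumed: population in thousands
)
# Cases per 1000 inhabitants per week
high$incidence <- high$Cases / high$pop
# Keep only the weeks with more than 10 cases per 1000
high[high$incidence > 10, ]
```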


To have a decent plot I need a decent time variable.

Quick and dirty

My quick and dirty approach was to add a small fraction for weeks:
r6$time <- r6$YEAR+
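
The expression above is cut off in this copy. A minimal sketch of one way to build such a decimal-year variable, assuming the fraction is (WEEK - 1)/52; the divisor and the demo data frame are assumptions, not necessarily what was used originally:

```r
# Hypothetical stand-in for r6 with only the columns needed here
demo <- data.frame(YEAR = c(1939, 1939, 1940), WEEK = c(1, 27, 1))
# Week 1 maps to the start of the year, week 27 to mid-year
demo$time <- demo$YEAR + (demo$WEEK - 1) / 52
demo
```

With this divisor a week 53 would coincide with week 1 of the following year, which is one reason to treat the approach as quick and dirty.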

Create a date using Formatting

After reading the post Date formating in R I tried a different approach. According to the manual, %U is:
Week of the year as decimal number (00–53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.

Unfortunately for me that did not work out for reading data:
> as.Date('02 1929',format='%U %Y')
[1] "1929-04-20"

20th of April is the day I am running the code, not the second week of 1929. It seems the %U is ignored in the R I compiled.
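
A possible explanation, offered as an assumption rather than something checked against that build: a week number alone does not pin down a day, and R appears to use %U on input only when a weekday is supplied as well. A minimal sketch, adding a weekday through %u (1 = Monday):

```r
# Year, week number and an explicit weekday (%u: 1 = Monday)
d <- as.Date("1929 02 1", format = "%Y %U %u")
# Round trip: format the parsed date back to year and week number
format(d, "%Y week %U")
```

If the round trip returns week 02 of 1929, the week number was honoured. Behaviour may differ between R versions.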

Reconstructing a date

I thought to start at the correct date and just add 7 for each week:
uu <- unique(data.frame(YEAR=r5$YEAR,WEEK=r5$WEEK))
uu <- uu[order(uu$YEAR,uu$WEEK),]
uu$Date <- as.Date('01-01-1928',

r7 <- merge(r6,uu)

       YEAR WEEK       Date
112196 1970   47 1970-11-25
112197 1970   48 1970-12-02
112177 1970   49 1970-12-09
112183 1970   50 1970-12-16
112176 1970   51 1970-12-23
112191 1970   52 1970-12-30
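
The as.Date call above is cut off in this copy. A minimal sketch of the add-7-per-week idea, assuming a day-month-year format string and one row per consecutive week; uu_demo is a hypothetical stand-in for uu:

```r
# Hypothetical stand-in: the first four weeks of 1928, already sorted
uu_demo <- data.frame(YEAR = 1928, WEEK = 1:4)
# Start date as in the post; the format string is an assumption
start <- as.Date("01-01-1928", format = "%d-%m-%Y")
# One date per row, each 7 days after the previous one
uu_demo$Date <- start + 7 * (seq_len(nrow(uu_demo)) - 1)
uu_demo
```

This only gives sensible dates if no week is missing from the sorted table, since a gap would shift every later date.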

Note that I cannot confirm that these dates are correct. The second day of 1963 formats to week 00, which does not match my data. The year is correct though.
format(as.Date('1963-01-02'),format='%d %b %Y week: %U')
[1] "02 Jan 1963 week: 00"


The plot is at this point easy.
library(ggplot2)
ggplot(r7[r7$State %in% c('Wyoming','Montana') &
                r7$YEAR<1945 &
                r7$YEAR>1930 &
                r7$incidence >0,],
        aes(Date, incidence,group=State,colour=State)) +
    ylab('Incidence registered Measles Cases per week per 1000') +
    theme(text=element_text(family='Arial')) +
    geom_line() +

Indeed, the years before 1939 have a lower incidence of measles. What surprised me is that the first years after 1939 also have a lower incidence.


Willem G. van Panhuis, John Grefenstette, Su Yon Jung, Nian Shong Chok, Anne Cross, Heather Eng, Bruce Y Lee, Vladimir Zadorozhny, Shawn Brown, Derek Cummings, Donald S. Burke. Contagious Diseases in the United States from 1888 to the present. NEJM 2013; 369(22): 2152-2158.


Completely unrelated, but if you are living in Amsterdam, the foto expo "Do you see me?" in Cafe Restaurant Nel might be worth a visit.
