In this third post on measles data I want to have a look at some high-incidence episodes. As described before, the data are from Project Tycho, which contains data from all weekly notifiable disease reports for the United States dating back to 1888. These data are freely available to anybody interested.
The data are read as in the post Looking at Measles Data in Project Tycho, part II. The plot there seemed to show some incidence values over 10, that is, more than 10 cases per 1000 persons in a single week.
r6 <- r5[complete.cases(r5),]
      YEAR abb WEEK Cases   State pop incidence
49841 1939  MT   19  5860 Montana 555  10.55856
51076 1939  WY   17  3338 Wyoming 248  13.45968
51090 1939  WY   18  2509 Wyoming 248  10.11694
Indeed, in 1939 three values are over 10. I have always thought you could only catch measles once, so this suggests that a number of years with hardly any measles must have occurred before.
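Judging from the output, incidence appears to be Cases divided by pop, with pop in thousands (5860 / 555 gives 10.55856). A minimal sketch of computing it and selecting the high values follows. The data frame `toy` and its last row are made up for illustration; the other three rows are copied from the output above.

```r
# Three rows copied from the output above, plus one invented low-incidence row
toy <- data.frame(
  YEAR  = c(1939, 1939, 1939, 1939),
  abb   = c('MT', 'WY', 'WY', 'WY'),
  WEEK  = c(19, 17, 18, 30),
  Cases = c(5860, 3338, 2509, 12),  # the last value is invented
  pop   = c(555, 248, 248, 248)     # population, assumed to be in thousands
)

# Cases per 1000 persons per week
toy$incidence <- toy$Cases / toy$pop

# Keep only the weeks with more than 10 cases per 1000
high <- toy[toy$incidence > 10, ]
print(high)
```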
To have a decent plot I need a decent time variable.
Quick and dirty
My quick and dirty approach was to add a small fraction for weeks:
r6$time <- r6$YEAR+
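The line above is cut off, so the exact fraction used there is not known. One possible version of the same idea, assuming weeks are numbered from 1 to at most 53, is sketched below on a made-up data frame `toy`.

```r
# Made-up year/week pairs for illustration
toy <- data.frame(YEAR = c(1939, 1939, 1940), WEEK = c(1, 27, 1))

# Assumption: WEEK runs from 1 to at most 53, so the fraction stays in [0, 1)
# and a week never spills over into the next year
toy$time <- toy$YEAR + (toy$WEEK - 1) / 53
print(toy)
```

This keeps the weeks in order and evenly spaced within a year, which is enough for plotting, but it is not a real date.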
Create a date using Formatting
After reading the post Date formating in R I tried a different approach. According to the manual:
Week of the year as decimal number (00–53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
Unfortunately for me that did not work out for reading data:
> as.Date('02 1929',format='%U %Y')
The result was the 20th of April, which is the day I am running the code, not a day in the second week of 1929. It seems that %U is ignored on input in the R I compiled.
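A possible explanation is that a week number and a year do not identify a single day, so the missing parts are filled in from the current date. As far as I know, adding a day of the week to the input (for example %w, where 0 is Sunday) lets the week number be used. The sketch below rests on that assumption and may behave differently on other platforms or R versions.

```r
# Week number, year and day of the week (0 = Sunday) together
# should determine a single date
d <- as.Date('02 1929 0', format = '%U %Y %w')
print(d)

# Round trip: formatting the result should give back the same week and year
print(format(d, '%U %Y'))
```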
Reconstructing a date
I thought to start at the correct date and just add 7 for each week:
uu <- unique(data.frame(YEAR=r5$YEAR,WEEK=r5$WEEK))
uu <- uu[order(uu$YEAR,uu$WEEK),]
uu$Date <- as.Date('01-01-1928',
r7 <- merge(r6,uu)
       YEAR WEEK       Date
112196 1970   47 1970-11-25
112197 1970   48 1970-12-02
112177 1970   49 1970-12-09
112183 1970   50 1970-12-16
112176 1970   51 1970-12-23
112191 1970   52 1970-12-30
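Since the date assignment above is cut off, here is a minimal sketch of the general idea on made-up data: number the ordered year/week pairs and add 7 days per step to a start date. The start date and the absence of any offset are assumptions, and the approach only works if no week is missing from the data.

```r
# Made-up year/week pairs, unordered and with a duplicate
toy <- data.frame(YEAR = c(1928, 1928, 1928, 1928), WEEK = c(2, 1, 3, 2))

# One row per distinct week, in chronological order
uu <- unique(toy)
uu <- uu[order(uu$YEAR, uu$WEEK), ]

# Assumption: the first week starts on 1928-01-01 and no week is skipped
start <- as.Date('1928-01-01')
uu$Date <- start + 7 * (seq_len(nrow(uu)) - 1)

# Attach the reconstructed date to every record (merges on YEAR and WEEK)
merged <- merge(toy, uu)
print(merged)
```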
Note that I cannot confirm that these dates are correct. The second day of 1963 formats as week 00, which does not match the week numbering in my data. The year is correct, though.
format(as.Date('1963-01-02'),format='%d %b %Y week: %U')
 "02 Jan 1963 week: 00"
At this point the plot is easy.
ggplot(r7[r7$State %in% c('Wyoming','Montana') &
aes(Date, incidence,group=State,colour=State)) +
ylab('Incidence registered Measles Cases per week per 1000') +
Indeed, the years before 1939 have a lower incidence of measles. What surprised me is that the first years after 1939 also have a lower incidence.
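The plotting call above is cut off, so the second filter condition and the geom are not known. A sketch of what a complete call could look like, assuming the ggplot2 package is installed, is given below. The data frame `toy`, its incidence values, the date window and the choice of geom_line() are all made up for illustration.

```r
library(ggplot2)

# Invented weekly incidence values for the two states
toy <- data.frame(
  Date      = rep(as.Date('1939-01-04') + 7 * 0:3, times = 2),
  State     = rep(c('Wyoming', 'Montana'), each = 4),
  incidence = c(0.5, 2.1, 13.5, 10.1, 0.3, 1.2, 4.8, 10.6)
)

# Hypothetical second condition: restrict the plot to a date window
sel <- toy$State %in% c('Wyoming', 'Montana') &
  toy$Date >= as.Date('1939-01-01') &
  toy$Date <= as.Date('1939-12-31')

# One line per state, coloured by state
p <- ggplot(toy[sel, ],
            aes(Date, incidence, group = State, colour = State)) +
  geom_line() +
  ylab('Incidence registered Measles Cases per week per 1000')
print(p)
```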
Willem G. van Panhuis, John Grefenstette, Su Yon Jung, Nian Shong Chok, Anne Cross, Heather Eng, Bruce Y Lee, Vladimir Zadorozhny, Shawn Brown, Derek Cummings, Donald S. Burke. Contagious Diseases in the United States from 1888 to the present. NEJM 2013; 369(22): 2152-2158.