The Flint River in Flint, Michigan, USA, in the late 1970s. By U.S. Army Corps of Engineers, photographer unknown via Wikimedia Commons
Introduction

As many have heard recently, residents of Flint, Michigan have been rightly outraged by the high levels of toxic chemicals, including lead, in their drinking water. The question arises: how did this occur, and was it a foreseeable incident? The backstory that led up to this incident can be summarized in a few main chapters.
- Flint had long sourced their water from the Detroit Water and Sewerage Department (DWSD)
- The city was under financial stress and had an incentive to reduce spending
- Flint entered into an agreement with the Karegnondi Water Authority (KWA) to draw from its Lake Huron source, which was still to be completed (end of 2016)
- The existing supplier, DWSD, gave 12 months' notice that its supply contract would end in April 2014
- The Flint River was relied on to supply water in the interim
Hypothesis

In this analysis we will explore water quality data from the US Geological Survey (USGS) to analyse the Flint incident, starting at the source (pre-treated water) as well as nearby streams in Detroit and near Lake Huron. It is not meant to serve as conclusive evidence of any kind. We will look specifically at chloride concentrations to see whether Flint has very corrosive water to begin with. Before we begin, let’s load the packages needed for this story.
    setwd("~/DSTribune/Stories/FlintWaterQuality")
    library(ggplot2)
    library(dplyr)
    library(xtable)

This is really a story of three water sources and three counties. Flint, located in Genesee County, originally sourced its water from the Detroit Water and Sewerage Department (DWSD), which draws from multiple sources including the Detroit River and Lake Huron. The Detroit River is in Wayne County and the Lake Huron intake is in Sanilac County. The KWA plant that was to replace the more expensive DWSD water is still under construction and will source its water from Lake Huron. We will download fresh water data for the Detroit River and the Flint River (Wayne and Genesee counties) and merge them into one data frame. Our goal is to compare the untreated corrosivity of the two rivers, with the hypothesis that Flint’s water might be more corrosive to begin with.
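The two county downloads use the same query and differ only in the county code. As a minimal sketch, assuming the three-digit codes in the query URLs are county FIPS codes, a small helper could build the URL; `build_chloride_url` is a hypothetical name and is not part of the original analysis.

```r
# Hypothetical helper (not part of the original analysis): build the
# water quality query URL for one Michigan county.
# county_fips is the three-digit county code, e.g. "049" for Genesee.
build_chloride_url <- function(county_fips) {
  paste0(
    "http://waterqualitydata.us/Result/search?",
    "countrycode=US&statecode=US%3A26",
    "&countycode=US%3A26%3A", county_fips,
    "&sampleMedia=Water",
    "&characteristicType=Inorganics%2C+Major%2C+Non-metals",
    "&characteristicName=Chloride",
    "&mimeType=csv&zip=yes&sorted=no"
  )
}

genesee_url <- build_chloride_url("049")  # Genesee County (Flint)
wayne_url <- build_chloride_url("163")    # Wayne County (Detroit River)
```

Either URL can then be passed to `download.file()` in place of the literal string.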
    # Genesee (Flint)
    temp <- tempfile()
    # mode = "wb" keeps the zip archive intact on Windows
    download.file("http://waterqualitydata.us/Result/search?countrycode=US&statecode=US%3A26&countycode=US%3A26%3A049&sampleMedia=Water&characteristicType=Inorganics%2C+Major%2C+Non-metals&characteristicName=Chloride&mimeType=csv&zip=yes&sorted=no", temp, mode = "wb")
    wqGen <- read.csv(unz(temp, "result.csv"))
    wqGen$County <- "Genesee"

    # Wayne (Detroit River)
    temp <- tempfile()
    download.file("http://waterqualitydata.us/Result/search?countrycode=US&statecode=US%3A26&countycode=US%3A26%3A163&sampleMedia=Water&characteristicType=Inorganics%2C+Major%2C+Non-metals&characteristicName=Chloride&mimeType=csv&zip=yes&sorted=no", temp, mode = "wb")
    wqWayne <- read.csv(unz(temp, "result.csv"))
    wqWayne$County <- "Wayne"

    # Merge the two county water measurements
    wqDf <- rbind(wqGen, wqWayne)

    # Save an offline version of the merged county water data
    write.csv(wqDf, file = "MI3CountyCountyWaterData.csv")

We filter the data to keep only high-quality measurements taken at the surface. We specifically collected data on dissolved chloride concentrations because chloride ions are a key factor contributing to the corrosion in Flint’s pipes, leading to the leaching of metals such as lead. In the second half of this story we will also cover how the addition of chlorine escalated chloride concentrations, but for now we will focus on pre-treatment water quality.
    # Keep dissolved surface-water results with an accepted status
    wqDf <- filter(wqDf,
                   ActivityMediaSubdivisionName == "Surface Water",
                   ResultSampleFractionText == "Dissolved",
                   ResultStatusIdentifier %in% c("Accepted", "Final", "Historical"))
    wqDf$MonitoringLocationIdentifier <- as.character(wqDf$MonitoringLocationIdentifier)
    wqDf$ActivityStartDate <- as.POSIXct(wqDf$ActivityStartDate)
    # Coerce to numeric (non-numeric entries become NA), then drop missing values
    wqDf$ResultMeasureValue <- suppressWarnings(as.numeric(as.character(wqDf$ResultMeasureValue)))
    wqDf <- wqDf %>% filter(!is.na(ResultMeasureValue))

We would now like to see whether there is a significant difference in pre-treatment chloride concentrations between the two counties.
    # Summary statistics of chloride concentration by county
    percentConc <- wqDf %>%
      group_by(County) %>%
      summarise(Avg = mean(ResultMeasureValue, na.rm = TRUE),
                Max = max(ResultMeasureValue, na.rm = TRUE),
                Median = median(ResultMeasureValue, na.rm = TRUE),
                LatestSample = max(ActivityStartDate, na.rm = TRUE),
                totalSamples = n(),
                # sd() returns the standard deviation, despite the column name
                stdError = sd(ResultMeasureValue, na.rm = TRUE))
    # Error-bar limits: mean plus or minus one standard deviation
    percentConc$min <- percentConc$Avg - percentConc$stdError
    percentConc$max <- percentConc$Avg + percentConc$stdError
    plot1 <- ggplot(percentConc, aes(x = County))
    plot1 <- plot1 + geom_errorbar(aes(ymin = min, ymax = max), data = percentConc, width = 0.5)
    plot1 <- plot1 + geom_boxplot(aes(y = Avg))
    plot1 <- plot1 + ggtitle("Surface Water Chloride Concentrations \n in Genesee and Wayne County MI (USGS)") + ylab("Average Chloride Concentration")
    plot1

At first glance it appears that Genesee County overall has a higher concentration of chloride in its surface water. Let’s see whether this is statistically significant, as there is overlap in the error bars (one standard deviation around each mean).
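A note on the error bars: the `stdError` column is computed with `sd()`, which is the standard deviation of the individual samples. A standard error of the mean would divide that by the square root of the sample count. The sketch below shows the difference on made-up values; `std_error` is a hypothetical helper and the numbers are not real measurements.

```r
# Standard error of the mean: sd / sqrt(n), ignoring missing values.
std_error <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Made-up chloride values (mg/l), for illustration only
toy <- c(10, 12, 14, 16, 18, NA)

toy_sd <- sd(toy, na.rm = TRUE)  # spread of the individual samples
toy_se <- std_error(toy)         # uncertainty of the mean, never larger than sd
```

With many samples per county, standard-error bars would be much narrower than the standard-deviation bars drawn in the plot.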
    Gen <- filter(wqDf, County == "Genesee")
    Way <- filter(wqDf, County == "Wayne")
    # One-sided t-test: is the Genesee mean greater than the Wayne mean?
    Gen_Way <- t.test(Gen$ResultMeasureValue, Way$ResultMeasureValue, alternative = "greater")
    Gen_Way$p.value
    ## 1.04371e-05

The p-value for this t-test indicates that Genesee County has a significantly greater mean chloride concentration in its surface water than Wayne County. Remember that Wayne County contains the Detroit River, one of the sources Flint originally obtained its water from before switching.

Fig 1: Summary of all water readings in both counties
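A t-test compares means, so it is sensitive to skew and outliers. As a cross-check that is not part of the original analysis, a rank-based Wilcoxon rank-sum test could be run alongside it. The sketch below uses made-up values with one large outlier; it only illustrates the call and says nothing about the real data.

```r
# Made-up chloride values (mg/l), for illustration only
group_a <- c(21, 25, 30, 40, 200)  # includes one large outlier
group_b <- c(5, 6, 8, 9, 10)

# One-sided Wilcoxon rank-sum test: does group_a tend to be larger than group_b?
# It uses ranks only, so the outlier at 200 counts no more than any other top value.
rank_test <- wilcox.test(group_a, group_b, alternative = "greater")
rank_p <- rank_test$p.value
```

On the real data the same call would take `Gen$ResultMeasureValue` and `Way$ResultMeasureValue` as its two arguments.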
    tapply(wqDf$ResultMeasureValue, wqDf$County, median)
    ## Genesee   Wayne
    ##    21.0     8.5

It turns out the median was nowhere near the mean. The median shows Genesee County with a chloride concentration of 21.0 mg/l and Wayne County with 8.5 mg/l, so Genesee County has about 2.5 times the pre-treatment chloride concentration of Wayne County. The discrepancy between the median and the mean could be due to outliers or a non-normal distribution. If my experience has taught me anything, it is that in these circumstances I need to look at the full distribution to see what is happening.
    ggplot(wqDf, aes(x = ResultMeasureValue, fill = County)) +
      geom_density(alpha = 0.3) +
      ggtitle("Density of Chloride Concentrations \n Genesee and Wayne County Surface Water") +
      xlab("[Chloride] (mg/l)") +
      ylab("Density")

That distribution sure doesn’t look normal. It appears Wayne County has a lot of samples with low concentrations of chloride. It could be that one sampling site has so many samples that it is warping the mean and median. Perhaps what we should be doing is computing a summary for each sample site and looking at the distribution of those per-site values.
    # Summary statistics per monitoring site, keeping the county label
    percentConc <- wqDf %>%
      group_by(MonitoringLocationIdentifier, County) %>%
      summarise(Avg = mean(ResultMeasureValue, na.rm = TRUE),
                Max = max(ResultMeasureValue, na.rm = TRUE),
                Median = median(ResultMeasureValue, na.rm = TRUE),
                LatestSample = max(ActivityStartDate, na.rm = TRUE),
                totalSamples = n(),
                stdError = sd(ResultMeasureValue, na.rm = TRUE))
    # Mean of the per-site medians in each county
    tapply(percentConc$Median, percentConc$County, mean)
    ##  Genesee    Wayne
    ##  33.2125 115.5000

At first it appeared as though Genesee County had significantly higher concentrations of chloride than Wayne County. However, once we took the median concentration at each site and then averaged those medians by county, it appears that Wayne County has roughly 3.5 times the amount of chloride in its surface water (115.5 versus 33.2 mg/l). To put this to rest we will apply one more filter and drop sites with fewer than 3 samples, to remove possible outlier measurements at unique sites. Remember that running even one water sample requires multiple labs, USGS employees sampling at a site, and tens of thousands of dollars. So 3 samples is a big deal in this world (I should know: I sampled and analyzed water for the US Geological Survey for 4 years).
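    # Keep only sites with at least 3 samples
    HighSampleSizePercentConc <- filter(percentConc, totalSamples >= 3)
    # Summarise the filtered data frame; calling tapply on percentConc here would
    # simply repeat the unfiltered result (Genesee 33.2125, Wayne 115.5000)
    tapply(HighSampleSizePercentConc$Median, HighSampleSizePercentConc$County, mean)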
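To see why pooled and per-site summaries can point in opposite directions, consider a toy example in which one heavily sampled site dominates the pooled data. The values and site names are made up for illustration and are not drawn from the USGS data.

```r
# Made-up data: site A has 8 low readings, sites B and C have one high reading each
toy <- data.frame(
  site = c(rep("A", 8), "B", "C"),
  chloride = c(rep(5, 8), 100, 120)
)

# Pooled median: every sample counts equally, so site A dominates
pooled_median <- median(toy$chloride)

# Per-site medians, then averaged: every site counts equally
site_medians <- tapply(toy$chloride, toy$site, median)
mean_of_site_medians <- mean(site_medians)
```

Here the pooled median is 5 while the mean of the site medians is 75, the same kind of reversal seen between the county-level and site-level results.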