Using the dataRetrieval Stats Service
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
This script utilizes the new dataRetrieval
package access to the USGS Statistics Web Service. We will be pulling daily mean data using the daily value service in readNWISdata
, and using the stats service data to put it in the context of the site’s history. Here we are retrieving data for July 12th in the Upper Midwest, where a major storm system had recently passed through. You can modify this script to look at other areas and dates simply by modifying the states
and storm.date
objects.
To run this code, we recommend having either dataRetreival
version 2.5.13 (currently the latest release on CRAN) or version 2.6.1 (currently the latest Github release).
Get the data
There are two separate dataRetrieval
calls here — one to retrieve the daily discharge data, and one to retrieve the historical discharge statistics. Both calls are inside loops to split them into smaller pieces, to accomodate web service restrictions. The daily values service allows only single states as a filter, so we loop over the list of states. The stats service does not allow requests of more than ten sites, so the loop iterates by groups of ten site codes. Retrieving the data can take a few tens of seconds. Once we have both the daily value and statistics data, the two data frames are joined by site number via dplyr’s left_join
function. We use a pipe to send the output of the join to na.omit()
function. Then we add a column to the final data frame to hold the color value for each station.
#example stats service map, comparing real-time current discharge to history for each site #reusable for other state(s) #David Watkins June 2016 library(maps) library(dplyr) library(lubridate) library(dataRetrieval) #pick state(s) and date states <- c("WI","MN","ND","SD","IA") storm.date <- "2016-07-12" #download each state individually for(st in states){ stDV <- renameNWISColumns(readNWISdata(service="dv", parameterCd="00060", stateCd = st, startDate = storm.date, endDate = storm.date)) if(st != states[1]){ storm.data <- full_join(storm.data,stDV) sites <- full_join(sites, attr(stDV, "siteInfo")) } else { storm.data <- stDV sites <- attr(stDV, "siteInfo") } } #retrieve stats data, dealing with 10 site limit to stat service requests reqBks <- seq(1,nrow(sites),by=10) statData <- data.frame() for(i in reqBks) { getSites <- sites$site_no[i:(i+9)] currentSites <- readNWISstat(siteNumbers = getSites, parameterCd = "00060", statReportType="daily", statType=c("p10","p25","p50","p75","p90","mean")) statData <- rbind(statData,currentSites) } statData.storm <- statData[statData$month_nu == month(storm.date) & statData$day_nu == day(storm.date),] finalJoin <- left_join(storm.data,statData.storm) finalJoin <- left_join(finalJoin,sites) finalJoin[,grep("_va",names(finalJoin))] <- sapply(finalJoin[,grep("_va",names(finalJoin))], function(x) as.numeric(x)) #remove sites without current data finalJoin <- finalJoin[!is.na(finalJoin$Flow),] #classify current discharge values finalJoin$class <- NA finalJoin$class[finalJoin$Flow > finalJoin$p75_va] <- "navy" finalJoin$class[finalJoin$Flow < finalJoin$p25_va] <- "red" finalJoin$class[finalJoin$Flow > finalJoin$p25_va & finalJoin$Flow <= finalJoin$p50_va] <- "green" finalJoin$class[finalJoin$Flow > finalJoin$p50_va & finalJoin$Flow <= finalJoin$p75_va] <- "blue" finalJoin$class[is.na(finalJoin$class) & finalJoin$Flow > finalJoin$p50_va] <- "cyan" finalJoin$class[is.na(finalJoin$class) & finalJoin$Flow < finalJoin$p50_va] <- "yellow" #take a look at the columns that we will plot later: head(finalJoin[,c("dec_lon_va","dec_lat_va","class")]) ## dec_lon_va dec_lat_va class ## 1 -92.09389 46.63333 navy ## 2 -91.59528 46.53778 navy ## 3 -90.96324 46.59439 navy ## 4 -90.59000 46.39472 navy ## 5 -90.69630 46.48661 navy ## 6 -90.90417 46.49722 navy
Make the static plot
The base map consists of two plots. The first makes the county lines with a gray background, and the second overlays the heavier state lines. After that we add the points for each stream gage, colored by the column we added to finalJoin
. In the finishing details, grconvertXY
is a handy function that converts your inputs from a normalized (0-1) coordinate system to the actual map coordinates, which allows the legend and scale to stay in the same relative location on different maps.
#convert states from postal codes to full names states <- stateCdLookup(states, outputType = "fullName") par(pty="s") map('county',regions=states,fill=TRUE, col="gray87", lwd=0.5) map('state',regions=states,fill=FALSE, lwd=2, add=TRUE) points(finalJoin$dec_lon_va, finalJoin$dec_lat_va, col=finalJoin$class, pch=19) title(paste("Daily discharge value percentile rank\n",storm.date),line=1) par(mar=c(5.1, 4.1, 4.1, 6), xpd=TRUE) legend.colors <- c("cyan","yellow", "red", "green","blue", "navy") legend.names <- c("Q > P50*","Q < P50*", "Q < P25", "P25 < Q < P50","P50 < Q < P75", "Q > P75") legend("bottomleft",inset=c(0.01,.01), legend=legend.names, pch=19,cex = 0.75,pt.cex = 1.2, col = legend.colors, ncol = 2) map.scale(ratio=FALSE,cex = 0.75, grconvertX(.07,"npc"), grconvertY(.2, "npc")) text("*Other percentiles not available for these sites", cex=0.75, x=grconvertX(0.2,"npc"), y=grconvertY(-0.08, "npc"))
Make an interactive plot
Static maps are great for papers and presentations. When possible, interactive maps allow the reader more flexibility to examine the data. The R leaflet
package makes it easy to create useful interactive maps:
library(leaflet) finalJoin$popup <- with(finalJoin, paste("<b>",station_nm,"</b></br>", "Measured Flow:",Flow,"ft3/s</br>", "25% historical:",p25_va,"ft3/s</br>", "50% historical:",p50_va,"ft3/s</br>", "75% historical:",p75_va,"ft3/s")) leafMapStat <- leaflet(data=finalJoin) %>% addProviderTiles("CartoDB.Positron") %>% addCircleMarkers(~dec_lon_va,~dec_lat_va, color = ~class, radius=3, stroke=FALSE, fillOpacity = 0.8, opacity = 0.8, popup=~popup) leafMapStat <- addLegend(leafMapStat, position = 'bottomleft', colors= legend.colors, labels= legend.names, opacity = 0.8)
Disclaimer: The NWIS stats web service that dataRetrieval
accesses here is in beta, and its output could change in the future.
Questions
Please direct any questions or comments on dataRetrieval
to: https://github.com/USGS-R/dataRetrieval/issues
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.