# The cdlTools R package

July 15, 2016
By

(This article was first published on MeanMean, and kindly contributed to R-bloggers)

This is a brief tutorial on the cdlTools package developed by Lu Chen and I to download and perform some simple analysis on USDA’s cropland data layer (CDL).
This package is not currently available on CRAN, but is available via github, and can be installed through Hadley Wickham’s devtools package.

The cropland data layer is a set of raster images classified into up-to 256 possible land uses across the coterminous 48 states in the United States. The temporal coverage extends from 2008 to present, with some states having earlier coverage. Details on the classification process can be found in (Boryan et al, 2011). This tutorial will be broken into three parts:

However, it is not always convenient to manually download the data and extract it to a directory.
This task can be easily accomplished through the cdlTools package.

To do this we first need to load the cdlTools package.
Hadley Wickham’s devtools works well for this job.

library(devtools)
install_github('jlisic/cdlTools')
library(cdlTools)


In the next step we want to download the CDL raster images in geotiff format to a local directory from USDA’s cropscape server.
The directory I’m saving these raster images to is the CDL directory in my home folder.
It is important to make sure that this directory exists first.
Note, that the CDL cache may not be populated for 2015, so if this example doesn’t work you may want to change the year range to 2008:2014.
It’s also possible to just manually download the files below through the cropscape web interface and extract them to a directory.

# years to query
years <- 2008:2015
state <- 'maryland'

cdl.raster <- getCDL(state,years,location="~/CDL")


If you already have the data in CDL_YYYY_FF.tif form, where YYYY
is the year and FF is the two digit fips code, you can just use
a file url to define the directory.
Note: This may or maynot work depending on your operating system and method

cdl.raster <- getCDL(state,years,alternativeUrl="file:///Users/jonathanlisic/CDL")


If you want to be fancy and setup an anonymous FTP of HTTP server, you can just download the data to that server.
Then, access it through the alternativeUrl parameter.
This is useful for organizations that may want to conserve bandwidth or improve performance of applications in a network.

cdl.raster <- getCDL(state,years,alternativeUrl="ftp://10.0.1.1/CDL/CDL_Maryland")


library(raster)
plot(cdl.raster$2015)  ## Zonal statistics The gdalUtils package provides several functions offered by the GDAL library. This includes rasterization of vector files. Rasterization of vector files is an effective way to quickly calculate statistics for areal units within a CDL raster image. In this example we are going to quickly calculate corn pixels per county in Maryland. To do this we are first going to use the gdal function gdal_rasterize. It is assumed that this command is in the path of the system it is run on. An alternative is to use the gdalUtils package, but I wrote this function and most of cdlTools before this package existed and haven’t had time to modify rasterExtract. To get the county boundaries we are also going to download some U.S. census county data for 2010. # rasterExtract is a quick function that polygonizes a SpatialPolygonsDataFrame 'x' # to a given raster image extent and resolution 'r' with values specified # by 'variable' in rasterExtract <- function(x,r,variable) { tempTif <- tempfile(fileext=".tif") tempShp <- tempfile(fileext=".shp") # write polygon writeOGR(x,dirname(tempShp),gsub("[.]shp","", basename(tempShp)),driver="ESRI Shapefile",overwrite=TRUE) # use gdalPolygonize if( missing(variable) ) { system( sprintf( "gdal_rasterize -burn 1 -ot UInt32 -a_nodata 0 -a_srs '%s' -tr %f %f -te %f %f %f %f %s %s", projection(r), xres(r), yres(r), xmin(r), ymin(r), xmax(r), ymax(r), tempShp, tempTif) ) } else { system( sprintf( "gdal_rasterize -a '%s' -ot UInt32 -a_nodata 0 -a_srs '%s' -tr %f %f -te %f %f %f %f %s %s", variable, projection(r), xres(r), yres(r), xmin(r), ymin(r), xmax(r), ymax(r), tempShp, tempTif) ) } # create a new raster object r.out <- raster(extent(r), ncols=ncol(r), nrows=nrow(r), crs=projection(r)) values(r.out) <- values(raster(tempTif)) # free up the data unlink(tempTif) unlink(tempShp) return(r.out) } library(rgdal) library(rgeos) # download shape files url <- sprintf("https://www2.census.gov/geo/tiger/TIGER2010/COUNTY/2010/tl_2010_%02d_county10.zip",fips(state)) download.file(url,destfile=sprintf('~/CDL/tl_2010_%02d_county10.zip',fips(state))) #unzip unzip(sprintf('~/CDL/tl_2010_%02d_county10.zip',fips(state)),exdir='~/CDL') #load county <- readOGR(path.expand('~/CDL'),sprintf('tl_2010_%02d_county10',fips(state)))  ## OGR data source with driver: ESRI Shapefile ## Source: "/Users/jonathanlisic/CDL", layer: "tl_2010_24_county10" ## with 24 features ## It has 17 fields  #re-project county <- spTransform(county,projCDL) #rasterize the shape files for maryland counties county.raster <- rasterExtract(county, cdl.raster$2015,'COUNTYFP10')

#plot out our current output to check if things look sane
plot(county.raster)


Finally, we are going to calculate the zonal statistics with the matchCount function in cdlTools. This function takes two raster images and calculates the number of pairs in each image.

library(dplyr)
# we use matchCount to get the matched pairs between the rasters
# the parameter m is set to the maximum value in either raster image
cropByCounty <- matchCount(county.raster,cdl.raster$2015,m=510) cropByCounty <- as.data.frame(cropByCounty) # get all CDL classes that contain corn, available in the cdlTools 'corn' data cornByCounty <- subset(cropByCounty,CDL_2015_24 %in% corn) tmp <- county@data tmp$layer <- as.numeric(as.character(county@data\$COUNTYFP10))
cornByCounty <- left_join(cornByCounty, tmp, by=c("layer"="layer"))
aggregate(count ~ NAME10, data=cornByCounty,sum)

##             NAME10  count
## 1         Allegany   4001
## 2     Anne Arundel  18101
## 3        Baltimore  89781
## 4          Calvert  14012
## 5         Caroline 162192
## 6          Carroll 140647
## 7            Cecil 117161
## 8          Charles  21131
## 9       Dorchester 109068
## 10       Frederick 166194
## 11         Garrett  21036
## 12         Harford 100710
## 13          Howard  28593
## 14            Kent 191487
## 15      Montgomery  40234
## 16 Prince George's  12844
## 17    Queen Anne's 221128
## 18        Somerset  58131
## 19      St. Mary's  34430
## 20          Talbot 123497
## 21      Washington 100252
## 22        Wicomico 104576
## 23       Worcester 154048


## Land cover change

Year-to-year land cover change is now available through the cropscape.
However, the API documentation does not say how to query this new data, nor is it possible to extend to longer lags, e.g. corn-soybeans-corn.

To calculate year-to-year change you first need to have raster images that match. This can be a bit of a problem with the CDL data since some earlier years are at 56 by 56 meters, instead of the current 30 by 30 meters.
To get everything to match, we will use the resample function from the raster package. Note, that this function can take quite a while to run.

# Figure out what needs to be resampled to match our latest year's projection and extent.
cdl.raster.resample <- sapply(
cdl.raster,
function(x) !identical( extent(x), extent(cdl.raster[[as.character(last(years))]]) )
)

# Resample the data, this may take a while.  It is possible to accelerate this operation
# with the snow package.
cdl.raster.resampled <- lapply(
cdl.raster[ cdl.raster.resample ],
function(x) resample( x, cdl.raster[[as.character(last(years))]],method="ngb" )
)
cdl.raster[ cdl.raster.resample ] <- cdl.raster.resampled

# Create raster brick, this also is pretty slow.
cdl.brick <- brick(cdl.raster)


After resampling, we went ahead and put all the images into a raster brick.
We will now use this brick to get the year to year change of all crops.
Because, all these operations do take a while, the resulting csv is provided

# Get year-by-year change in tidy format.
a <- subset(cdl.brick,sprintf('X%d',first(years)))
result <- c()
for(year in years[-1]) {
b <- subset(cdl.brick,sprintf('X%d',year))
d <- matchCount(a,b,m=256)
d <- cbind(d,year)
colnames(d) <- c('In','Out','Pixels','Year')
d.df <- as.data.frame(d)
result <- rbind(result,d.df)
}

write.csv(result,"md-change.csv")


If we are just interested in pixels moving from corn to corn each year, we can simply query the result using dplyr and magrittr.

# assuming we are starting from scratch
library(cdlTools)
library(dplyr)
library(magrittr)

mdChangeCorn <- mdChange %>%
subset(In %in% corn & Out %in% corn) %>%
group_by(Year) %>% summarize(total = sum(Pixels))


## References

Boryan, Claire, et al. “Monitoring US agriculture: the US department of agriculture, national agricultural statistics service, cropland data layer program.” Geocarto International 26.5 (2011): 341-358.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...