Advanced Raster Data: Exercises

[This article was first published on R-exercises, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Geospatial data is increasingly used to solve numerous ‘real-life’ problems (check out some examples here). In turn, R is becoming a powerful open-source solution for handling this type of data, currently providing an exceptional range of functions and tools for GIS and remote sensing data analysis.

In particular, raster data represents spatial phenomena by dividing the surface into a grid (or matrix) of regularly sized cells. Each raster data-set has a certain number of columns and rows, and each cell contains a value for the variable of interest. Stored data can be either (i) thematic, representing a discrete variable (e.g., a land cover classification map), or (ii) continuous (e.g., elevation).
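As a minimal illustration of the grid model, the raster package can build a small 3 x 4 raster directly. The dimensions, extent and cell values below are arbitrary, not taken from the exercise data; `r` holds a continuous variable, and `lc` stands in for a thematic layer with hypothetical integer class codes.

```r
library(raster)

# A 3-row by 4-column grid over an arbitrary 4 x 3 unit extent,
# so every cell is a regular 1 x 1 unit square
r <- raster(nrows = 3, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 3)

# Continuous variable: one value per cell, filled row by row from the top-left cell
values(r) <- c(12.1, 12.8, 13.5, 14.0,
               11.7, 12.3, 13.1, 13.9,
               11.2, 11.9, 12.6, 13.4)

# Thematic variable on the same grid: hypothetical class codes
# (e.g. 1 = forest, 2 = shrubland, 3 = urban)
lc <- setValues(r, c(1, 1, 2, 3, 1, 2, 2, 3, 1, 1, 2, 2))

nrow(r)   # number of rows
ncol(r)   # number of columns
ncell(r)  # number of cells (rows x columns)
res(r)    # cell size in x and y
```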

The raster package currently provides an extensive set of functions to create, read, export, manipulate and process raster data-sets. It also provides low-level functionality for building more advanced processing chains, as well as the ability to handle large data-sets. For more information, see: vignette("functions", package = "raster"). You can also learn more about raster data in the tutorial series on this topic here.

In this exercise set, we will explore the following topics in raster data processing and geostatistical analysis (previously discussed in this tutorial series):

  • Unsupervised classification/clustering of satellite data
  • Regression-kriging (RK)

We will also show how to use the RStoolbox package to calculate:

  • Tasseled Cap Transformation (TCT)
  • PCA rotation/transformation

Both data compression techniques examined here will use spectral data from satellite imagery.

Answers to these exercises are available here.


Exercise 1

Use the data in this link (Landsat-8 surface reflectance data, bands 1-7, for Peneda-Geres National Park – PGNP, NW Portugal) to answer the next exercises (1 to 6). Download and uncompress the data, then create a raster brick. How many pixels and layers does the data have?
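A minimal loading sketch, assuming the archive unpacks into a single folder holding one GeoTIFF per band and that the file names sort in band order. The folder name and the `.tif` pattern are assumptions; adjust them to the actual download.

```r
library(raster)

# Read every single-band GeoTIFF in `dir` and combine them into one RasterBrick.
# list.files() returns names in alphabetical order, so this assumes the file
# names sort in band order (e.g. ..._B1.tif through ..._B7.tif).
load_l8_brick <- function(dir, pattern = "\\.tif$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  brick(stack(files))
}

# Pixels per layer (rows x columns) and number of layers
describe_raster <- function(rst) {
  c(pixels = ncell(rst), layers = nlayers(rst))
}
```

Usage would look like `rst <- load_l8_brick("PGNP_L8")` followed by `describe_raster(rst)`, where `"PGNP_L8"` is a hypothetical folder name. Note that `ncell()` counts cells per layer; the total number of stored values is `ncell(rst) * nlayers(rst)`.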

Exercise 2

Make an RGB plot with bands 5, 1, and 3 with linear stretching.
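One way to do this is with `plotRGB()` from the raster package. For Landsat-8, band 5 is near-infrared, band 1 is coastal/aerosol and band 3 is green, so this is a false-colour composite. The wrapper name below is only illustrative.

```r
library(raster)

# Hypothetical helper: NIR (band 5) -> red channel, coastal/aerosol (band 1) -> green,
# green (band 3) -> blue. stretch = "lin" applies a linear contrast stretch
# to each channel so low reflectance values are not rendered almost black.
plot_rgb_513 <- function(img) {
  plotRGB(img, r = 5, g = 1, b = 3, stretch = "lin")
}
```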

Exercise 3

Using the k-means algorithm, perform an unsupervised classification/clustering of the data with 5 clusters.
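A minimal sketch, not the official answer. Each pixel is treated as a point in 7-band spectral space, pixels with missing values are skipped, and the cluster labels are written back into a single-layer raster. The `iter.max`, `nstart` and seed values are assumptions; the exercise prescribes none.

```r
library(raster)

kmeans_classify <- function(img, k = 5, seed = 12345) {
  v <- getValues(img)                  # matrix: one row per pixel, one column per band
  ok <- complete.cases(v)              # kmeans() cannot handle NA values
  set.seed(seed)                       # labels depend on the random starting centres
  km <- kmeans(v[ok, ], centers = k, iter.max = 100, nstart = 5)
  labels <- rep(NA_integer_, ncell(img))
  labels[ok] <- km$cluster             # NA pixels stay NA
  setValues(raster(img), labels)       # single-layer template with the same geometry
}
```

`getValues()` loads every pixel into memory. For a full Landsat scene this can be heavy, and fitting on a random sample of pixels is a common alternative.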

Exercise 4

Use the CLARA algorithm (package cluster) to perform an unsupervised classification/clustering of the data with 5 clusters and Euclidean distance.
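A sketch with `cluster::clara()`, which runs PAM (k-medoids) on repeated random subsamples so that it scales to large pixel counts. The number of subsamples, their size and the seed are illustrative assumptions.

```r
library(raster)
library(cluster)

clara_classify <- function(img, k = 5, samples = 50, sampsize = 1000, seed = 12345) {
  v <- getValues(img)                  # one row per pixel, one column per band
  ok <- complete.cases(v)              # drop pixels with any NA band value
  set.seed(seed)
  cl <- clara(v[ok, ], k = k, metric = "euclidean", stand = FALSE,
              samples = samples, sampsize = min(sampsize, sum(ok)),
              rngR = TRUE,             # use R's RNG so set.seed() makes runs reproducible
              pamLike = TRUE)          # swap phase uses the same algorithm as pam()
  labels <- rep(NA_integer_, ncell(img))
  labels[ok] <- cl$clustering
  setValues(raster(img), labels)
}
```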

Exercise 5

Using package RStoolbox, calculate the Tasseled Cap Transformation of the data (remember it is Landsat-8 data with bands 1-7).
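The hint about bands 1-7 matters here. RStoolbox's `"Landsat8OLI"` coefficients apply to the six reflective bands 2-7 (blue to SWIR-2), so the coastal/aerosol band 1 is dropped first. As far as we know, the package documentation also states the coefficients were derived for top-of-atmosphere reflectance, so applying them to surface reflectance is an approximation. This sketch assumes the raster-era RStoolbox API; newer terra-based releases may return a `SpatRaster`.

```r
library(raster)
library(RStoolbox)

# Tasseled Cap for Landsat-8 OLI: keep bands 2-7 only, in ascending order.
# The result has three layers: brightness, greenness and wetness.
tct_landsat8 <- function(img) {
  tasseledCap(img[[2:7]], sat = "Landsat8OLI")
}
```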

Exercise 6

Using package RStoolbox, calculate the standardized PCA transform. What is the cumulative % of explained variance in the first three components?
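A sketch using `rasterPCA()` with `spca = TRUE`, which standardizes each band (a correlation-matrix PCA). The fitted model is stored as a `princomp` object, so the explained variance follows from its component standard deviations. The helper name is hypothetical, and newer terra-based RStoolbox releases may return the `$map` element as a `SpatRaster`.

```r
library(raster)
library(RStoolbox)

spca_summary <- function(img, n = 3) {
  pca <- rasterPCA(img, spca = TRUE)   # standardized PCA on all bands
  sdev <- pca$model$sdev               # princomp standard deviation of each component
  cum_var <- cumsum(sdev^2) / sum(sdev^2)
  # Cumulative % of explained variance for the first n components
  list(pca = pca, cum_pct = 100 * cum_var[seq_len(n)])
}
```

`summary(pca$model)` prints the same figures in its "Cumulative Proportion" row.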

Exercise 7

  1. Use the data in this link (annual average temperature for weather stations in Portugal; column AvgTemp) to answer the next exercises (7 to 10). Using the Lat and Lon columns from the clim_data_pt.csv table, create a SpatialPointsDataFrame object with CRS WGS 1984.
  2. Using Ordinary Kriging from package gstat, interpolate temperature values with a Spherical model fitted to the empirical variogram. Calculate the RMSE from 5-fold cross-validation (see function …) and use set.seed(12345).
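A minimal sketch with sp and gstat. The choice of `krige.cv()` for the cross-validation is ours, and letting `fit.variogram()` pick initial values when `vgm()` is given only a model name is an assumption about the installed gstat version. The helper names are hypothetical.

```r
library(sp)
library(gstat)

# Promote the data frame to a SpatialPointsDataFrame in WGS 1984 (lon/lat)
to_spatial <- function(df) {
  coordinates(df) <- ~ Lon + Lat
  proj4string(df) <- CRS("+proj=longlat +datum=WGS84 +no_defs")
  df
}

# Fit a variogram model to the empirical variogram, then return the
# RMSE of Ordinary Kriging under n-fold cross-validation
ok_cv_rmse <- function(pts, model = "Sph", nfold = 5, seed = 12345) {
  emp <- variogram(AvgTemp ~ 1, pts)          # empirical variogram
  fit <- fit.variogram(emp, vgm(model))       # fitted Spherical (or other) model
  set.seed(seed)                              # fold assignment is random
  cv <- krige.cv(AvgTemp ~ 1, pts, model = fit, nfold = nfold, verbose = FALSE)
  sqrt(mean(cv$residual^2))                   # residual = observed - predicted
}
```

Usage would be `pts <- to_spatial(read.csv("clim_data_pt.csv"))` followed by `ok_cv_rmse(pts, "Sph")`. For unprojected lon/lat data, gstat computes great-circle distances.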

Exercise 8

Following the same approach as in the previous exercise, now experiment with an Exponential model. Also calculate the RMSE from 5-fold CV. Which model was best according to RMSE?
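To compare models fairly, reset the same seed before each cross-validation run so every model should see the same random 5-fold split. This sketch assumes `pts` is a SpatialPointsDataFrame with an `AvgTemp` column, as built in Exercise 7. The function name is hypothetical.

```r
library(gstat)

# 5-fold CV RMSE of Ordinary Kriging for each variogram model in `models`
compare_vgm_models <- function(pts, models = c("Sph", "Exp"), nfold = 5, seed = 12345) {
  emp <- variogram(AvgTemp ~ 1, pts)          # the empirical variogram is shared by all models
  sapply(models, function(m) {
    fit <- fit.variogram(emp, vgm(m))
    set.seed(seed)                            # identical folds for every model
    cv <- krige.cv(AvgTemp ~ 1, pts, model = fit, nfold = nfold, verbose = FALSE)
    sqrt(mean(cv$residual^2))
  })
}
```

The result is a named vector, so `names(which.min(compare_vgm_models(pts)))` gives the lower-RMSE model.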

Exercise 9

Using the cubist regression algorithm (package Cubist), predict AvgTemp based on latitude (Lat), elevation (column Elev) and distance to the coastline (column distCoast). Calculate the RMSE for a random test set of 15 observations. Use set.seed(12345).
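A hold-out sketch with the Cubist package. Drawing the 15 test rows with `sample(nrow(clim), 15)` right after `set.seed(12345)` is an assumption, so the exact split, and therefore the RMSE, may differ from the published answers. The helper name is hypothetical.

```r
library(Cubist)

cubist_holdout <- function(clim, n_test = 15, seed = 12345,
                           predictors = c("Lat", "Elev", "distCoast")) {
  set.seed(seed)
  test_idx <- sample(nrow(clim), n_test)      # random test set
  train <- clim[-test_idx, ]
  test  <- clim[test_idx, ]
  fit <- cubist(x = train[, predictors], y = train$AvgTemp)
  pred <- predict(fit, newdata = test[, predictors])
  # Keep the model and the split so the residuals can be reused later
  list(model = fit, test_idx = test_idx,
       rmse = sqrt(mean((test$AvgTemp - pred)^2)))
}
```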

Exercise 10

From the previous exercise, extract the training residuals and interpolate them. Following a regression-kriging approach, add the interpolated residuals to the regression predictions. Calculate the RMSE for the test set (defined in Exercise 9) and check whether this further improves model performance.
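A regression-kriging sketch. The Cubist predictions act as the trend, and the training residuals are interpolated to the test stations by Ordinary Kriging and added back. It assumes `clim` has Lon/Lat columns, that `fit` is the fitted cubist model, and that `test_idx` holds the 15 test rows from Exercise 9. The Spherical residual variogram and the helper name are our own choices.

```r
library(sp)
library(gstat)
library(Cubist)

rk_test_rmse <- function(clim, fit, test_idx, vgm_model = "Sph",
                         predictors = c("Lat", "Elev", "distCoast")) {
  train <- clim[-test_idx, ]
  test  <- clim[test_idx, ]
  # Training residuals of the regression (observed - predicted)
  train$resid <- train$AvgTemp - predict(fit, newdata = train[, predictors])
  wgs84 <- CRS("+proj=longlat +datum=WGS84 +no_defs")
  train_sp <- SpatialPointsDataFrame(as.matrix(train[, c("Lon", "Lat")]), train,
                                     proj4string = wgs84, match.ID = FALSE)
  test_sp <- SpatialPoints(as.matrix(test[, c("Lon", "Lat")]), proj4string = wgs84)
  # Variogram of the residuals, then Ordinary Kriging at the test stations
  v <- fit.variogram(variogram(resid ~ 1, train_sp), vgm(vgm_model))
  ok <- krige(resid ~ 1, train_sp, test_sp, model = v, debug.level = 0)
  # Regression-kriging prediction = regression trend + kriged residual
  pred <- predict(fit, newdata = test[, predictors]) + ok$var1.pred
  sqrt(mean((test$AvgTemp - pred)^2))
}
```

Comparing this value with the Cubist-only RMSE on the same `test_idx` answers whether the kriged residuals help.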

To leave a comment for the author, please follow the link and comment on their blog: R-exercises.