rakeR v0.1.1 released on CRAN

September 19, 2016

(This article was first published on rstats – philmikejones.me, and kindly contributed to R-bloggers)

I’m proud to announce that the initial release of rakeR, v0.1.1, has been published on CRAN! It’s licensed under the GPLv3, so you can use it for any projects you wish, subject to the terms of that licence.

Purpose

The goal behind rakeR is to make performing spatial microsimulation in R as easy as possible. R is a succinct and expressive language, but spatial microsimulation previously required multiple stages: weighting, integerising, expanding, and subsetting. That is before testing the inputs and outputs and validating the results. To make matters worse, each stage required the input data to be in a slightly different format, which added to the analyst's workload and the complexity of the task, and introduced multiple opportunities for errors to creep in.

rakeR reduces this complexity, risk and time needed by:

  • Performing the data re-formatting for each stage silently and automatically: no manual processing of data between stages.
  • Accepting standardised arguments. The core functions accept either: two data frames and a character vector of variables to constrain over; or a table of weights.
  • Robust error checking. The core functions are designed to be strict and not to try to correct any inputs. This means getting the inputs in the right format can sometimes be tricky, but because of this you can be confident in the results. There are also functions to help you check your inputs to make this as pain-free as possible.
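
To give a flavour of the strict input checking described in the last point, here is a minimal base R sketch. It only illustrates the idea and is not rakeR's own code: the function name check_totals() and the groups list are hypothetical. It stops with an error, rather than silently correcting anything, if the constraint variables disagree about how many people are in each zone.

```r
# Hypothetical helper, NOT part of rakeR: a sketch of a strict input check.
# It stops if the constraint variables imply different zone populations.
check_totals <- function(cons, groups) {
  # Population of each zone according to each constraint variable
  totals <- lapply(groups, function(cols) rowSums(cons[cols]))

  # Compare every variable's totals against the first variable's totals
  ok <- vapply(
    totals,
    function(t) isTRUE(all.equal(unname(t), unname(totals[[1]]))),
    logical(1)
  )

  if (!all(ok)) {
    stop(
      "Zone totals differ from '", names(groups)[1], "' for: ",
      paste(names(groups)[!ok], collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

cons <- data.frame(
  zone   = letters[1:3],
  a0_49  = c(8, 2, 7),
  a_gt50 = c(4, 8, 4),
  f      = c(6, 6, 8),
  m      = c(6, 4, 3)
)

# Which constraint columns belong to which variable (an assumption of this sketch)
groups <- list(age = c("a0_49", "a_gt50"), sex = c("f", "m"))

check_totals(cons, groups)
```

Failing loudly like this means a mismatch is caught before any weighting is attempted.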

Installation

Install the stable version from CRAN:

install.packages("rakeR")

Alternatively install the development version with devtools:

# Obtain devtools if you don't already have it installed
# install.packages("devtools")

# Install rakeR development version from GitHub
devtools::install_github("philmikejones/rakeR")

Load the package with:

library("rakeR")
#> 
#> Attaching package: 'rakeR'
#> The following object is masked from 'package:stats':
#> 
#>     simulate

Usage

To perform the raking you should supply two data frames: one with the constraint information, giving counts per category for each zone (e.g. census counts), and one with individual-level data (i.e. one row per individual). In addition, supply a character vector of constraint variable names.

cons <- data.frame(
  "zone"   = letters[1:3],
  "a0_49"  = c(8, 2, 7),
  "a_gt50" = c(4, 8, 4),
  "f"      = c(6, 6, 8),
  "m"      = c(6, 4, 3)
)

inds <- data.frame(
  "id"     = LETTERS[1:5],
  "age"    = c("a_gt50", "a_gt50", "a0_49", "a_gt50", "a0_49"),
  "sex"    = c("m", "m", "m", "f", "f"),
  "income" = c(2868, 2474, 2231, 3152, 2473),
  stringsAsFactors = FALSE
)

vars <- c("age", "sex")

  • (Re-)weighting is done with weight() which returns a data frame of fractional weights.
  • Integerisation is performed with integerise() which returns a data frame of integerised weights.
  • simulate() takes care of creating the final microsimulated data and returns a data frame of simulated cases in zones.

These functions can be combined with pipes:

# obtain magrittr if not already installed
# install.packages("magrittr")
library("magrittr")

sim_df <- weight(cons, inds, vars) %>% integerise() %>% simulate(inds = inds)
head(sim_df)
#>     id    age sex income zone
#> 1    A a_gt50   m   2868    a
#> 2    B a_gt50   m   2474    a
#> 3    C  a0_49   m   2231    a
#> 3.1  C  a0_49   m   2231    a
#> 3.2  C  a0_49   m   2231    a
#> 3.3  C  a0_49   m   2231    a
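
For readers curious about what the weighting step does conceptually, the following is a rough base R sketch of iterative proportional fitting for a single zone. It is not rakeR's implementation (rakeR relies on compiled code for this step), and the function name ipf_one_zone() and the targets list are hypothetical. Each individual's weight is repeatedly rescaled so that the weighted counts match the zone's target count for each category in turn.

```r
inds <- data.frame(
  id  = LETTERS[1:5],
  age = c("a_gt50", "a_gt50", "a0_49", "a_gt50", "a0_49"),
  sex = c("m", "m", "m", "f", "f"),
  stringsAsFactors = FALSE
)

# Target counts for one zone (the values for zone "a" in the example constraints)
targets <- list(
  age = c(a0_49 = 8, a_gt50 = 4),
  sex = c(f = 6, m = 6)
)

# Hypothetical helper, NOT rakeR's code: basic iterative proportional fitting.
# Assumes every target category is represented by at least one individual.
ipf_one_zone <- function(inds, targets, iterations = 100) {
  w <- rep(1, nrow(inds))  # start with every individual weighted equally
  for (i in seq_len(iterations)) {
    for (v in names(targets)) {
      categories <- as.character(inds[[v]])
      # Current weighted count in each category of variable v
      current <- vapply(split(w, categories), sum, numeric(1))
      # Scale factor that moves each category onto its target
      ratio <- targets[[v]][names(current)] / current
      w <- w * ratio[categories]
    }
  }
  unname(w)
}

w <- ipf_one_zone(inds, targets)

# Weighted counts per category, for comparison with the targets
tapply(w, inds$age, sum)
tapply(w, inds$sex, sum)
```

The resulting weights are fractional, which is why a separate integerisation step is needed before individuals can be replicated into zones.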

Alternatively use the rake() function, which is a wrapper for weight() %>% integerise() %>% simulate():

sim_df <- rake(cons, inds, vars)
head(sim_df)
#>     id    age sex income zone
#> 1    A a_gt50   m   2868    a
#> 2    B a_gt50   m   2474    a
#> 3    C  a0_49   m   2231    a
#> 3.1  C  a0_49   m   2231    a
#> 3.2  C  a0_49   m   2231    a
#> 3.3  C  a0_49   m   2231    a
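
Row names such as 3.1 and 3.2 in the output above are what base R produces when the same row of a data frame is selected more than once. The sketch below shows how fractional weights could be turned into whole people and then expanded into rows. It uses one published approach, "truncate, replicate, sample"; treating this as the method behind integerise() and simulate() is an assumption, and the weights and the function name integerise_trs() are hypothetical.

```r
set.seed(42)  # the sampling step is random, so fix the seed for reproducibility

inds <- data.frame(
  id     = LETTERS[1:5],
  income = c(2868, 2474, 2231, 3152, 2473),
  stringsAsFactors = FALSE
)

# Hypothetical fractional weights for the five individuals in one zone
w <- c(1.2, 1.2, 3.6, 1.5, 4.5)

# Hypothetical helper, NOT rakeR's code: truncate, replicate, sample
integerise_trs <- function(w) {
  int_w   <- floor(w)                    # truncate to whole people
  frac    <- w - int_w                   # leftover fraction per individual
  deficit <- round(sum(w)) - sum(int_w)  # people still to allocate
  if (deficit > 0) {
    # Top up individuals at random, favouring larger leftover fractions
    top_up <- sample.int(length(w), size = deficit, prob = frac)
    int_w[top_up] <- int_w[top_up] + 1
  }
  int_w
}

int_w <- integerise_trs(w)

# Expand: repeat each row as many times as its integer weight
sim <- inds[rep(seq_len(nrow(inds)), times = int_w), ]
nrow(sim)
head(rownames(sim))
```

Because the top-up is sampled, repeated runs without a fixed seed can allocate the remaining people to different individuals while keeping the zone total the same.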

Contributing

All software is a work in progress, and rakeR is no exception. Feedback, comments, and suggestions are very welcome, as are bug/issue reports, and pull requests.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Acknowledgements

Many of the functions in this package are based on code written by Robin Lovelace and Morgane Dumont for their book Spatial Microsimulation with R (2016), Chapman and Hall/CRC Press, licensed under the terms below:

Copyright (c) 2014 Robin Lovelace

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Their book is also an excellent resource for learning about spatial microsimulation and understanding what’s going on under the hood of this package.

The reweighting (ipfp) algorithm itself was written by Andrew Blocker, in C for maximum speed and efficiency.

Thanks to Tom Broomhead for his feedback on error handling and suggestions on function naming.
