# MCHT, Closures, and R Environments: Making MCHTest Objects Self-Contained

**R – Curtis Miller's Personal Website**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

## Introduction

Last week I announced the first release of **MCHT**, an R package that facilitates bootstrap and Monte Carlo hypothesis testing. In this article, I will elaborate on some important technical details about making `MCHTest`

objects, explaining in the process how closures and R environments work.

To recap, last week I made a basic `MCHTest`

-class object. These are S3-class objects; really they are just functions with a `class`

attribute. All the work is done in the initial function call creating the object. But there’s more to the story.

We want these objects to be self-contained. Specifically, we don’t want changes in the global namespace to change how a `MCHTest`

object behaves. By default, these objects are *not* self-contained and a programmer who isn’t careful can accidentally break these objects. Here I explain how to prefent this from happening.

## Closures and Environments

I highly recommend those who want to learn more about closures and environments read [1], but I will briefly explain these critical concepts here.

A closure is a function created by another function. `MCHTest`

objects are closures, functions created by `MCHTest()`

(then given a `class`

attribute). An environment is an R object where other R objects are effectively defined. For example, there is the global environment where most R objects created by users live.

environment() ## <environment: R_GlobalEnv> globalenv() ## <environment: R_GlobalEnv>

Ever wonder why a variable defined inside a function doesn’t affect anything outside of that function and why it simply disappears? It’s because when a function is called, a new environment is created, and all assignments within the function are done within that new environment. We can see this occuring with some clever use of `print()`

.

x <- 2 u <- function() { x } u() ## [1] 2 f <- function() { x <- 1 function() { x } } g <- f() g() ## [1] 1 environment(g) ## <environment: 0x9c45d78> environment(u) ## <environment: R_GlobalEnv> parent.env(environment(g)) ## <environment: R_GlobalEnv>

`u()`

is a function and lives in the global environment so it looks for variables in the global environment. `g()`

, however, lives in an environment created by `f()`

. Normally, when a function creates an environment, it disappears the moment the function finishes execution. Closures, however, still use that environment created by the function, so the environment doesn’t disappear when the function finishes execution.

When a function looks for an object, it first looks for that object in its environment. If it doesn’t find the object there, it looks for the object in the parent environment of its environment. It will continue this process until it either finds the object or discovers that none of its environment’s ancestors has the object (prompting an error).

This means that the function is sensitive to changes in its environment or its environment’s ancestors, as we see here:

x <- 3 h <- function() { function() { x } } u() ## [1] 3 j <- h() environment(j) ## <environment: 0xa6e7cb4> parent.env(environment(j)) ## <environment: R_GlobalEnv> j() ## [1] 3

## Side Effects and MCHTest objects

One of R’s attractive features is that it promotes a style of programming that discourages side effects, where changes to one object doesn’t change the behavior of another. But the examples above show how closures can suffer side effects when objects in the global namespace are changed. The closures created above depend on the global environment in surprising ways for those not familiar with how environments in R work.

By default, `MCHTest`

objects can suffer from these side effects, and they can creep in if the functions passed to the parameters of `MCHTest()`

are carelessly defined, as we see below. (The tests being defined are effectively Monte Carlo -tests; learn about the -test here.)

library(MCHT) ## .------..------..------..------. ## |M.--. ||C.--. ||H.--. ||T.--. | ## | (\/) || :/\: || :/\: || :/\: | ## | :\/: || :\/: || (__) || (__) | ## | '--'M|| '--'C|| '--'H|| '--'T| ## `------'`------'`------'`------' v. 0.1.0 ## Type citation("MCHT") for citing this R package in publications library(doParallel) registerDoParallel(detectCores()) ts <- function(x, sigma = 1) { sqrt(length(x)) * mean(x)/sigma # z-test for mean = 0 } sg <- function(x, sigma = 1) { x <- sigma * x ts(x, sigma = sigma) # unsafe } unsafe.test.1 <- MCHTest(ts, sg, rnorm, seed = 100, N = 100, fixed_params = "sigma") unsafe.test.1(rnorm(10)) ## ## Monte Carlo Test ## ## data: rnorm(10) ## S = 1.1972, sigma = 1, p-value = 0.15 ts <- function(x) { sqrt(length(x)) * mean(x) # Effective make sigma = 1 } sg <- function(x) { ts(x) # again, unsafe } unsafe.test.2 <- MCHTest(ts, sg, rnorm, seed = 100, N = 100) unsafe.test.2(rnorm(10)) ## ## Monte Carlo Test ## ## data: rnorm(10) ## S = 0.22926, p-value = 0.46 # ERROR unsafe.test.1(rnorm(10)) ## Error in {: task 1 failed - "unused argument (sigma = sigma)"

What happened? Let’s pick it apart by looking at the `stat_gen`

parameter of `unsafe.test.1()`

.

get_MCHTest_settings(unsafe.test.1)$stat_gen ## function(x, sigma = 1) { ## x <- sigma * x ## ts(x, sigma = sigma) # unsafe ## }

This function depends on an object called `ts()`

. When the function looks for `ts()`

, it looks *in the global namespace!* This means that changes to `ts()`

in that namespace will change the behavior of the function. The most recent version of `ts()`

does not have a parameter called `sigma`

, prompting an error. *The object is not self-contained!*

## Making MCHTest Objects Self-Contained

How can we prevent side effects like this? One answer is to define the functions passed to `MCHTest()`

in a way that doesn’t depend on objects defined in the global namespace. For example, we would not call `ts()`

in `sg()`

above but instead rewrite the test statistic as we defined it in `ts()`

. (Using functions and objects defined in packages is okay, though, since these generally don’t change in an R session.)

However, this is not always practical. The test statistic written in `ts()`

could be complicated, and writing that same statistic again would not only be a lot of work but be tempting bugs to invade. Fortunately, `MCHTest()`

supports methods for making `MCHTest`

objects self-contained.

The first step is to set the `localize_functions`

parameter to `TRUE`

. This changes the environment of the `test_stat`

`stat_gen`

, `rand_gen`

, and `pval_func`

functions so that they belong to the environment the `MCHTest`

object lives in. Not only does this help make the function self-contained we may even be able to write our inputs in a more idiomatic way, like so:

ts <- function(x, sigma = 1) { sqrt(length(x)) * mean(x)/sigma } sg <- function(x, sigma = 1) { x <- sigma * x test_stat(x, sigma = 1) # Would not be able to do this if localize_functions # were FALSE } safe.test.1 <- MCHTest(ts, sg, function(n) {rnorm(n)}, seed = 100, N = 100, fixed_params = "sigma", localize_functions = TRUE) safe.test.1(rnorm(10)) ## ## Monte Carlo Test ## ## data: rnorm(10) ## S = 2.0277, sigma = 1, p-value = 0.02 ts <- function(x) { sqrt(length(x)) * mean(x) # Effective make sigma = 1 } sg <- function(x) { ts(x) } safe.test.1(rnorm(10)) # Still works ## ## Monte Carlo Test ## ## data: rnorm(10) ## S = 1.0038, sigma = 1, p-value = 0.21

(Notice how `rand_gen`

was handled; it was wrapped in a function rather than passed directly. In short, this is to prevent the function `rnorm`

from being stripped of its namespace, since it needs functions from that namespace.)

This is the first step to removing side effects. (In fact it makes our functions better written since we can anticipate the existence of `test_stat`

as a function). However, we could still have variables or functions defined outside of our input functions. We can expose these functions to our localized input functions via the `imported_objects`

parameter, a list (the doppleganger of R’s environments) containing these objects.

ts <- function(x, sigma = 1) { sqrt(length(x)) * mean(x)/sigma } sg <- function(x, sigma = 1) { x <- sigma * x ts(x) # We're going to do this safely now } safe.test.2 <- MCHTest(ts, sg, function(n) {rnorm(n)}, seed = 100, N = 100, fixed_params = "sigma", localize_functions = TRUE, imported_objects = list("ts" = ts)) safe.test.2(rnorm(10)) ## ## Monte Carlo Test ## ## data: rnorm(10) ## S = 0.57274, sigma = 1, p-value = 0.39 ts <- function(x) { sqrt(length(x)) * mean(x) # Effective make sigma = 1 } sg <- function(x) { ts(x) } safe.test.2(rnorm(10)) ## ## Monte Carlo Test ## ## data: rnorm(10) ## S = 0.24935, sigma = 1, p-value = 0.45

Both `safe.test.1()`

and `safe.test.2()`

are now immune to changes in the global namespace. They are self-contained and thus safe to use.

## Conclusion

By default, `localize_functions`

is `FALSE`

. I thought of making it `TRUE`

by default but I feared that those not familiar with the concept of environments would be bewildered by all the errors that would be thrown whenever they tried to use a function they defined. Setting the parameter to `TRUE`

makes using `MCHTest()`

more difficult.

That said, I highly recommend using the parameter in a longer script. It makes the function safer (errors are good when they’re enforcing safety), so become acquainted with it.

(Next post: maximized Monte Carlo hypothesis testing)

## References

- H. Wickham,
*Advanced R*(2015), CRC Press, Boca Raton

Packt Publishing published a book for me entitled *Hands-On Data Analysis with NumPy and Pandas*, a book based on my video course *Unpacking NumPy and Pandas*. This book covers the basics of setting up a Python environment for data analysis with Anaconda, using Jupyter notebooks, and using NumPy and pandas. If you are starting out using Python for data analysis or know someone who is, please consider buying my book or at least spreading the word about it. You can buy the book directly or purchase a subscription to Mapt and read it there.

If you like my blog and would like to support it, spread the word (if not get a copy yourself)!

**leave a comment**for the author, please follow the link and comment on their blog:

**R – Curtis Miller's Personal Website**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.