Infinite generators in R

[This article was first published on Cartesian Faith » R, and kindly contributed to R-bloggers].

This is the first in a series of posts about creating simulations in R. As a foundation, I first look at generators and how to create them in R. Note: if you are following along, all the examples rely on lambda.r, so be sure to install it from CRAN first. If you are not familiar with lambda.r, you can read the introduction.

Put simply, a generator returns a function that can produce a sequence. Generators are common in Python, where they are described as functions that “allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop.” [1] In R, for loops are generally avoided in favor of functional approaches that use the *apply suite of higher-order functions, but generators are still useful under this paradigm. Haskell has a similar concept for producing infinite data structures based on lazy evaluation [2]. In short, generators are useful because they let you construct a sequence programmatically.

Construction

Using closures, it is easy to emulate this behavior in R. There are two key ingredients: a generator function, and a new apply function that iterates over an iterator (the function returned by the generator).

The generator

At its most basic, a generator is simply a function that returns a closure: a function bound to an environment that references non-local variables. Closures are a fundamental building block in functional programming and can eliminate the need for (dangerous) global variables. Our iterator is the closure, and it references non-local variables defined in the scope of the generator.

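library(lambda.r)  # provides the %as% and %when% operators

seq.gen(start) %as%
{
  value <- start - 1      # non-local state, captured by the closure below
  function() {
    value <<- value + 1   # <<- updates 'value' in the generator's environment
    return(value)
  }
}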
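If you want to experiment without lambda.r, the same closure can be written in plain base R. This is a minimal sketch, not the author's code, and make_seq_gen is a hypothetical name. It also shows why closures remove the need for globals: each call to the generator creates its own environment, so two iterators never share state.

```r
# Hypothetical base-R equivalent of seq.gen(start); no lambda.r required.
make_seq_gen <- function(start) {
  value <- start - 1        # state stored in this call's environment
  function() {
    value <<- value + 1     # <<- updates 'value' in the enclosing environment
    value
  }
}

a <- make_seq_gen(1)
b <- make_seq_gen(100)
a()  # 1
a()  # 2
b()  # 100, unaffected by the calls to a()
```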

For simplicity, the first generator is infinite: it will continue producing values for as long as it is called. Let’s think about that for a moment. We can produce an infinitely long sequence so long as we keep calling this function. In R we typically consider sequences to be finite. We also think of them as being produced in a batch, i.e. in a single function call. For example, creating a sequence from 1 to 10 is simply seq(1,10) or 1:10. The sequence is then passed to some other function as a vector. If it is passed to one of the apply functions, it is then iterated over element by element. This is fine for data analysis or batch-oriented backtesting. However, what if, instead of a batch, we want to run a simulation as though the model or system were acting for real? This is where an iterator is useful, because it can produce inputs one at a time, just as real inputs would arrive.
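To connect an infinite iterator back to the familiar batch style, a small helper can pull a fixed number of values from it on demand. This sketch assumes a hypothetical take() helper (it is not part of lambda.r or base R) and repeats a base-R generator so the block runs on its own.

```r
# Hypothetical base-R generator, repeated so this block is self-contained.
make_seq_gen <- function(start) {
  value <- start - 1
  function() {
    value <<- value + 1
    value
  }
}

# Hypothetical helper: call the iterator n times and collect the results.
take <- function(iterator, n) {
  vapply(seq_len(n), function(i) iterator(), numeric(1))
}

it <- make_seq_gen(1)
take(it, 10)  # same values as seq(1, 10), produced one call at a time
take(it, 3)   # 11 12 13: the iterator resumes where it stopped
```

Only the requested values are ever computed, which is what makes an unbounded sequence usable in practice.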

Introducing iapply

Since the standard apply functions expect a complete sequence, they cannot be used with an iterator out of the box. Instead we need our own apply function, which we’ll call iapply (i for iterator). It acts like the other apply functions, except that it understands iterators.

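iapply(iterator, fn, simplify=TRUE, formatter=function(x) format(x,"%Y-%m-%d")) %as%
{
  out <- list()
  # The iterator signals that it is exhausted by returning NULL
  while (! is.null(input <- iterator()))
  {
    df <- data.frame(fn(input))
    # Name each result by its formatted input; keep multi-column results as
    # data frames and reduce single-column results to plain vectors
    out[[formatter(input)]] <- if (ncol(df) > 1) df else df[[1]]
  }
  # Stack the per-input results into one matrix or data frame
  if (simplify) out <- do.call(rbind, out)
  out
}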

There is no magic in iapply. As shown, it simply loops over the iterator, calls fn on each value the iterator returns, formats that value for use as a name, and collects the results. The default formatter is for dates because my primary use case is a sequence of dates: I simulate data over those dates and pass it into my system as though it were real data. The advantage is that I write the model only once, and I don’t have to worry about accidentally using data that would not yet have been available at a given date, since model testing behaves exactly like real-world operation.
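For comparison, here is a hypothetical base-R port of iapply, named iapply_base, together with an equally hypothetical finite Date iterator, make_day_iter, that returns NULL when it runs out. It is a sketch under those assumptions rather than the author's code. The default formatter assumes Date inputs, so a numeric iterator would need a different formatter, such as as.character.

```r
# Hypothetical base-R port of iapply; an iterator signals the end by returning NULL.
iapply_base <- function(iterator, fn, simplify = TRUE,
                        formatter = function(x) format(x, "%Y-%m-%d")) {
  out <- list()
  while (!is.null(input <- iterator())) {
    df <- data.frame(fn(input))
    # Name each result by its formatted input; reduce one-column results to vectors
    out[[formatter(input)]] <- if (ncol(df) > 1) df else df[[1]]
  }
  if (simplify) out <- do.call(rbind, out)
  out
}

# Hypothetical finite iterator: n consecutive Dates starting at 'from'.
make_day_iter <- function(from, n) {
  i <- 0
  function() {
    if (i >= n) return(NULL)
    i <<- i + 1
    as.Date(from) + (i - 1)
  }
}

# Two-column results are stacked into a data frame with the dates as row names
res <- iapply_base(make_day_iter("2013-01-01", 3), function(d) {
  data.frame(day = as.integer(format(d, "%d")), month = as.integer(format(d, "%m")))
})
res
```

A function that returns a single value per date instead produces a one-column matrix with the dates as row names, which matches the shape of the output in the complete example.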

Embedding control into an iterator

Since the current generator is infinite, the sequence will never stop, which means that iapply will never return. To resolve this minor detail, we need to add some control logic to the iterator. Remember that the iterator is a closure, so it is easy to add more variables to the non-local scope and use them for control. The updated function provides a stop value, a step interval, and a way to reset the iterator to its original starting value. There is also an additional clause that converts character arguments to Date objects for convenience.

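# Convenience clause: convert character arguments to Date objects
seq.gen(start, stop, step=1) %when% {
  is.character(start)
  is.character(stop)
} %as% {
  seq.gen(as.Date(start), as.Date(stop), step)
}

seq.gen(start, stop=Inf, step=1) %as%
{
  first <- value <- start - step
  function(reset=FALSE) {
    # Rewind so that the next call returns start again
    if (reset) { value <<- first; return(invisible()) }
    # Return NULL once the next value would pass stop, so that a step that
    # does not divide the range evenly never overshoots
    candidate <- value + step
    if (candidate > stop) return(NULL)

    value <<- candidate
    return(value)
  }
}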
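The bounded iterator can also be sketched in base R. make_bounded_gen is a hypothetical name; this version computes the candidate value before comparing it with stop, so with a step of 2 from 1 to 6 the sequence ends at 5 instead of going past stop.

```r
# Hypothetical base-R version of the bounded generator (no lambda.r required).
make_bounded_gen <- function(start, stop = Inf, step = 1) {
  # Mirror the character clause: convert date strings to Date objects
  if (is.character(start)) start <- as.Date(start)
  if (is.character(stop)) stop <- as.Date(stop)
  first <- value <- start - step
  function(reset = FALSE) {
    if (reset) { value <<- first; return(invisible(NULL)) }
    candidate <- value + step
    if (candidate > stop) return(NULL)  # never return a value past 'stop'
    value <<- candidate
    value
  }
}

odd <- make_bounded_gen(1, 6, step = 2)
odd()  # 1
odd()  # 3
odd()  # 5
odd()  # NULL, because 7 would pass stop = 6
odd(reset = TRUE)
odd()  # 1 again
```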

As an example, we can now call the generator to provide a Date iterator. You can pass reset=TRUE at any time to reset the iterator to the first value.

> date.fn <- seq.gen('2013-01-01', '2013-02-01')
> date.fn()
[1] "2013-01-01"
> date.fn()
[1] "2013-01-02"
> date.fn()
[1] "2013-01-03"
> date.fn(reset=TRUE)
> date.fn()
[1] "2013-01-01"

Keep in mind that once you hit the end of the iterator, it will return NULL unless you tell it to reset.
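Because exhaustion is signaled by NULL, an iterator can be drained back into an ordinary vector when a batch is needed. The sketch below assumes a hypothetical drain() helper and repeats a compact, equally hypothetical base-R bounded generator so that it runs without lambda.r. It shows the iterator staying exhausted until it is reset.

```r
# Hypothetical base-R bounded generator, repeated so this block is self-contained.
make_bounded_gen <- function(start, stop = Inf, step = 1) {
  first <- value <- start - step
  function(reset = FALSE) {
    if (reset) { value <<- first; return(invisible(NULL)) }
    if (value + step > stop) return(NULL)
    value <<- value + step
    value
  }
}

# Hypothetical helper: call the iterator until it returns NULL.
drain <- function(iterator) {
  out <- list()
  while (!is.null(x <- iterator())) out[[length(out) + 1]] <- x
  do.call(c, out)  # c() on Date objects keeps the Date class
}

days <- make_bounded_gen(as.Date("2013-01-01"), as.Date("2013-01-05"))
drain(days)          # the five dates 2013-01-01 through 2013-01-05
days()               # NULL: the iterator stays exhausted
days(reset = TRUE)
days()               # 2013-01-01 again
```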

A complete example

The iterator is now ready to use within iapply. To use the technique, simply call the generator to create an instance and pass it along with a function to iapply.

> date.fn <- seq.gen('2013-01-01', '2013-02-01')
> iapply(date.fn, function(x) rnorm(1))
                    [,1]
2013-01-01 -1.311821e+00
2013-01-02  4.112014e-01
2013-01-03  6.985409e-02
2013-01-04  3.905463e-01
...

Admittedly, using all this structure just to generate a sequence of random numbers is overkill. In the next post, I’ll describe how to simulate stock price data using this approach and how to plug it into a model. The advantages of this technique should then be clear.

Conclusion

Generators are a powerful technique for constructing a sequence programmatically. With an iterator function in hand, it is possible to build simulations that feed inputs through the system just as they would arrive in real operation. The advantage of this approach is that model development, system development, and testing can all share the same code base, which means less work for you.

References

[1] Generators
[2] Functions


To leave a comment for the author, please follow the link and comment on their blog: Cartesian Faith » R.
