I was pleased to see the announcement yesterday of simmer 3.61. a discrete-event simulation (DES) package for R. I’ve long had an interest in DES, and as I will explain below, implementing DES in R brings up interesting issues about R that transcend the field of DES. I had been planning to discuss them in the context of my own DES package for R, and the above announcement will make a good springboard for that.
First, what is DES? It is simulation of stochastic processes with discrete state space. Consider for instance the classic M/M/1 queue: exponential interarrival and service times, with a single server. The state space can be defined as the queue length, which is integer-valued and thus “discrete.” This contrasts to, say, simulating a weather system, where state such as temperature is continuous.
The key component of such a simulator, no matter which programmer world view the software takes (see below) is the event list. At any simulated time t, the event list records all the events that are supposed to happen after time t. In the M/M/1 queue example, for instance, at time 168.0, there might be a service completion scheduled for time 169.1, and an arrival at 182.2.
The main loop of a DES program then consists of:
- Remove the earliest event e in the event list; e will be represented as some kind of data structure, including event time, event type and so on.
- Update the current simulated time to the time in e.
- Simulate the execution of e.
The looping continues until the user-specified maximum simulation time is exceeded.
The M/M/1 case is simple (not to mention having a closed-form theoretical solution), but in complex systems the event list can be quite large, say thousands of events. In such cases, good performance means executing the first of the above three bulleted items efficiently. Classically, there has been much research on this (including theoretical models using renewal theory and the like). But what about doing this in R?
The simmer package handles this by…NOT doing it in R. If one needs the performance, this is the obvious route to take. (This is not quite true in a certain sense, also explained below.)
I developed my DES package entirely in R, mainly because I aimed it only as proof-of-concept. It’s been around for a few years, first on my own Web page then more recently on GitHub. I did it for use by my students, and because it seemed that periodically there have been questions on r-help along the lines of “Is there a DES package available for R?”
Most algorithms for handling event lists use some kind of priority queue, implemented a a binary tree. Since R lacks pointers, it is not easy to develop such thing, much less do it efficiently. So I just chose to implement the event list in DES as straight R vector, maintained in sorted order. But when a new event is inserted, a time-consuming operation ensues, due to the need to keep the event list in ascending order. Originally, I implemented this as binary search.
Later I realized that this was anti-R. I knew it would be slow, of course, but didn’t think much about alternatives. Then later it occurred to me:
- Just add new events at the tail end.
- Don’t keep the event list in sorted order.
- In first bullet above in the event loop, simply find the earliest event by calling R’s which.min().
True, which.min() does an inefficient linear search. But it does it at C (not sea) level! Unless the event list is larger than any one I know of in practice, this will be a win.
Now, what about my pure-R DES package vs. simmer, which does its core operations in C++? The simmer package ought to be faster, right? Actually, yes, but one must first understand that they actually are not comparable, as follows.
There are two main programming paradigms (“world views”) in DES. Let’s illustrate that with M/M/1:
- Event-oriented: Here the code explictly recognizes how one event triggers others. For a job arrival in M/M/1, the code to react to that arrival will see whether to add the job to the server queue, vs. starting it now if the queue is empty, and that same code will schedule the next job arrival.
- Process-oriented. Here each entity more or less “minds its own business,” with fewer lines of code that explicitly show interactions between entities. In M/M/1 we might have an arrival process function and a server process function. The latter function might watch the queue continually to see when it becomes nonempty, and the former function might add the newly-arriving job to the queue, but NOT start service for the job in the case of an empty queue, as would be the case for the event-oriented approach.
The pros and cons are: The event-oriented approach is much easier to implement, but arguably less clear. The process-oriented approach requires threading of some kind (not necessarily the “classical” kind), but most people, including me, consider it clearer and more elegant.
The simmer package is process-oriented, and in fact is modeled on SimPy, a popular DES library written in Python. I’m a big fan of SimPy, which is another reason why I like simmer.
HOWEVER, the process-oriented approach, ceteris paribus, tends to be slow. This is due to various reasons related to the threading, but at any rate, the event-oriented approach, for all its inelegance, does tend to excel somewhat in this regard.
My DES package is event-oriented. So, I was curious which package would be faster, the pure-R event-oriented code or the R-calls-C++ process-oriented code. So I did a small experiment. (Disclaimer: Not claimed to generalize.) Both packages include M/M/1 examples, so I ran them for various values of the mean interarrival and service rates. I won’t enumerate the results here, but generally the C++/process-oriented runs were about 25-50% faster than the R/event-oriented ones.
There may be many issues here. For instance, DES’ deletion of an event involves code like
simlist$evnts <- simlist$evnts[-i,,drop=F]
This may involve reallocating memory for the entire matrix.
I will leave all this as an exercise for the reader.