
Whenever I go to the grocery store it always seems to be a lesson in statistics. I get the things I need to buy and then try to select the checkout register that will minimize the amount of time I have to wait. Inevitably, I select the one line where there is some sort of problem, and I just sit there and wait and wait. I will often keep an eye on the lines I am not waiting in to see which one I would have made it through faster. So then the question is whether or not I should get out of line and try to find a new line that is faster. As a statistician I know that wait time is often modeled using a memoryless exponential distribution, and I could very easily choose the line with the fewest people in front of me only to find that the cash register ran out of paper or there is some other delay. So it’s a cruel statistical game trying to find the one line that minimizes my wait time.

But this game could easily be simplified, and doing so would also remove the anxiety of customers watching others who arrived after them finish their transactions sooner. By eliminating multiple lines and having all customers go through one line, then break off at the very end to the next open register, customers would generally end up getting through the line faster and with less variability. However, I do understand that sometimes there are operational limitations due to the physical store layout or otherwise.
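
For readers who want the "memoryless" property spelled out, here is a short worked statement. It assumes a transaction time $T$ that is exponential with mean $\theta$ (the symbols $T$, $s$ and $t$ are introduced here only for illustration):

$$P(T > s + t \mid T > s) = \frac{e^{-(s+t)/\theta}}{e^{-s/\theta}} = e^{-t/\theta} = P(T > t), \qquad s, t \ge 0.$$

In words: under this model, however long the person at the register has already been there, the remaining time has the same distribution as a brand-new transaction, which is part of why picking the "right" line is so hard.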

I ran a simulation of the two wait line strategies. In the first, there is only one line that everyone goes through, and customers break off at the end to the next available cash register. I have often seen this approach in places like banks, airport security, and U.S. Customs. In the second, there are multiple lines, and once you’re in a line you stay there until you complete your transaction. This approach is often seen at places like Costco, Walmart, Target, and just about every other grocery store. The assumption I made for this simulation is that each customer's time at the register follows an exponential distribution with a mean of three minutes (EXP($\theta$=3)).

The first graph compares, for the single-line strategy, the maximum and the minimum total time across the service locations (cash registers). The range graph shows the difference between that maximum and minimum for each strategy. When using multiple lines there is a wide range between the maximum total time at one of the locations and the minimum total time. However, when using only one line that range is dampened, because the process tends to equalize the workload across the locations. Ultimately, it will take roughly the same amount of time to get every last person out of the system. But with multiple lines, a whole group of people can get stuck in one of the problem lines, which creates higher variability. And anyone who is involved in process control, such as manufacturing, knows that higher variability is generally not a good thing and is less efficient.

library(reshape2) ## For use in the acast() function
service.locations = 3 ## i.e. Number of cash registers
total.obs = service.locations*10 ## Just needed a number and wanted to make it divisible by the number of service locations
nsims = 10000 ## Number of simulation replicates
x = replicate(nsims, rexp(total.obs, 1/3))

cum.sum = aggreg.multi.mat = aggreg.one.mat = NULL

for(j in 1:ncol(x)){
  ## Multiple lines: customers are assigned to registers sequentially
  ## (in rotation) and stay in that line
  wait.multiple = cbind( (seq_len(total.obs) %% service.locations)+1, x[,j] )
  aggreg.values = aggregate(wait.multiple[,2], by=list( wait.multiple[,1] ), sum)
  aggreg.multi.mat = rbind(aggreg.multi.mat, t( acast(aggreg.values, Group.1~., value.var="x") ) )

  ## One line and breaking off at the very end
  wait.bucket = matrix(NA, nrow=nrow(x), ncol=service.locations)

  ## Preload the first service locations
  for(d in 1:service.locations){
    wait.bucket[d,d] = x[d,j]
  }

  ## Running total per register (ignoring NA); the register with the
  ## smallest total frees up first and serves the next customer
  for(i in (service.locations+1):nrow(x)){
    cum.sum = colSums(wait.bucket, na.rm=TRUE)
    wait.bucket[i, which.min(cum.sum)] = x[i,j]
  }
  aggreg.one.mat = rbind(aggreg.one.mat, colSums(wait.bucket, na.rm=TRUE))
}

## View the graphs
my.hist.one.1 = hist( apply(aggreg.one.mat,1,max), nclass=100, plot=FALSE)
my.hist.one.2 = hist( apply(aggreg.one.mat,1,min), nclass=100, plot=FALSE)
max.counts = max(my.hist.one.1$counts, my.hist.one.2$counts)
par(mfrow=c(2,1))
hist( apply(aggreg.one.mat,1,max), nclass=100, xlim=c(0,60), ylim=c(0,max.counts), col=3, main=expression(paste("Single-Line -- Distribution of Max Wait Time With EXP(",theta,"=3)")), xlab="Total Minutes Per Register")
hist( apply(aggreg.one.mat,1,min), nclass=100, xlim=c(0,60), ylim=c(0,max.counts), col=2, main=expression(paste("Single-Line -- Distribution of Min Wait Time With EXP(",theta,"=3)")), xlab="Total Minutes Per Register")

my.hist.data.1 = hist( apply(aggreg.multi.mat,1,max), nclass=100, plot=FALSE)
my.hist.data.2 = hist( apply(aggreg.multi.mat,1,min), nclass=100, plot=FALSE)
max.counts = max(my.hist.data.1$counts, my.hist.data.2$counts)
par(mfrow=c(2,1))
hist( apply(aggreg.multi.mat,1,max), nclass=100, xlim=c(0,60), ylim=c(0,max.counts), col=3, main=expression(paste("Multiple Lines -- Distribution of Max Wait Time With EXP(",theta,"=3) ")), xlab="Total Minutes Per Register")
hist( apply(aggreg.multi.mat,1,min), nclass=100, xlim=c(0,60), ylim=c(0,max.counts), col=2, main=expression(paste("Multiple Lines -- Distribution of Min Wait Time With EXP(",theta,"=3) ")), xlab="Total Minutes Per Register")

par(mfrow=c(2,1))
hist( apply(aggreg.one.mat,1,max)-apply(aggreg.one.mat,1,min), nclass=100, xlim=c(0,60), col=2, main="Single Line", xlab="Range Difference Between Max and Min in Minutes of Service Locations")
hist( apply(aggreg.multi.mat,1,max)-apply(aggreg.multi.mat,1,min), nclass=100, xlim=c(0,60), col=3, main="Multiple Lines", xlab="Range Difference Between Max and Min in Minutes of Service Locations")
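
The histograms give the visual comparison; a numeric summary of the same range can be handy too. The following is only a minimal, self-contained sketch in base R under the same assumptions as above (3 registers, 30 customers, exponential service times with a mean of three minutes). The helper name `simulate.range`, its arguments, the seed, and the smaller number of replicates are my own choices for illustration and are not part of the simulation above.

```r
## Hypothetical helper (illustration only): simulate one replicate and return
## the max-minus-min register total under each strategy.
simulate.range <- function(n.customers = 30, n.registers = 3, mean.service = 3) {
  service <- rexp(n.customers, rate = 1 / mean.service)

  ## Multiple lines: customers are dealt to registers in rotation
  register.id <- (seq_len(n.customers) - 1) %% n.registers + 1
  multi.totals <- tapply(service, register.id, sum)

  ## Single line: each customer goes to the register that frees up first,
  ## i.e. the one with the smallest running total so far
  one.totals <- numeric(n.registers)
  for (s in service) {
    nxt <- which.min(one.totals)
    one.totals[nxt] <- one.totals[nxt] + s
  }

  c(single = diff(range(one.totals)), multiple = diff(range(multi.totals)))
}

set.seed(1)  ## arbitrary seed, only so the sketch is reproducible
ranges <- replicate(2000, simulate.range())  ## 2 x 2000 matrix

## Mean and standard deviation of the range for each strategy
summary.table <- rbind(mean = rowMeans(ranges), sd = apply(ranges, 1, sd))
print(round(summary.table, 2))
```

If the reasoning above holds, the `single` column should show a noticeably smaller mean range than the `multiple` column, mirroring what the range histograms show.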