**R – insightR**, and kindly contributed to R-bloggers)

**By Yuri Fonseca**

In this post we are going to make an Uber assignment simulation and calculate some metrics of waiting time through simulation.

## Setting

Suppose we live in a 100×100 block city where each block takes 1 minute to cross by car. Drivers can pick up passengers only on corners, and passengers must call Uber on corners. Inferior-left corner is (1,1) and superior-right corner is (100,100).

## Functions

In order to calculate the average waiting time of a passenger, we need a couple of functions. The first function calculates the distance of a specific passenger and a specific driver, the second function creates the passengers, the third creates the cars and finally, the last function builds the distance matrix between cars and passengers in order to assign them in a optimal way.

Since our city is a perfect square, the distance between a passenger and a car is simple the distance in the x-axis plus the distance in the y-axis. Moreover, signs don’t matter. In this simple example we just need to know the initial position of the driver, initial position of the car and final destination. Each of these positions is a 2×1 vector representation in the city map.

#### Observation: distance matrix

We are going to use an linear programming solver in order to allocate optimally cars to passangers. This allocation problem just work with square matrix. So, if we have more cars than passengers, we need to create fictionary passengers (zero distances) in order to the solver converge. Note that we are going to do just a single round of allocation, so the number of cars needs to be bigger or equal than the number of passengers asking for a UBER driver.

create_passenger = function(id){ initial.position = sample(50, 2, replace = TRUE) final.destination = sample(50, 2, replace = TRUE) return(list('number' = id, 'initial' = initial.position, 'final' = final.destination)) } create_car = function(id){ initial.position = sample(50, 2, replace = TRUE) return(list('number' = id, 'position' = initial.position)) } distance = function(x,y){ sum(abs(x-y)) } distance.matrix = function(cars, passengers){ d.matrix = matrix(0, nrow = length(cars), ncol = length(cars)) for (i in 1:length(cars)){ for (j in 1:length(passengers)){ d.matrix[i,j] = distance(cars[[i]]$position, passengers[[j]]$initial) } } return(d.matrix) }

#### Example MAP

Let’s check an example of 10 passengers and 10 cars:

library(lpSolve) library(ggplot2) set.seed(20) passengers = lapply(seq(1:10), create_passenger) cars = lapply(seq(1:10), create_car) d.matrix = distance.matrix(cars, passengers) opt.allocation = lp.assign(d.matrix) passengers.points = sapply(passengers, function(x) x$initial) cars.points = sapply(cars, function(x) x$position) points = t(cbind(passengers.points, cars.points)) assignments = apply(opt.allocation$solution, 1, which.max) #checking the assignment for each car df1 = data.frame('x.axis' = points[,1], 'y.axis' = points[,2], 'id' = c(rep('Passenger',10), rep('Car',10))) # df.assign = data.frame('x' = cars.points[1,], # 'y' = cars.points[2,], # 'xend' = passengers.points[1,assignments], # 'yend' = passengers.points[2,assignments]) df.assign1 = data.frame('x' = cars.points[1,], 'y' = cars.points[2,], 'xend' = passengers.points[1,assignments], 'yend' = cars.points[2,]) df.assign2 = data.frame('x' = passengers.points[1,assignments], 'y' = cars.points[2,], 'xend' = passengers.points[1,assignments], 'yend' = passengers.points[2,assignments]) ggplot(df1, aes(x.axis,y.axis)) + geom_point(aes(color = id, group = id), size = 3) + # car and passengers geom_segment(aes(x = x, y = y, xend = xend, yend = yend), data = df.assign1) + geom_segment(aes(x = x, y = y, xend = xend, yend = yend), data = df.assign2, arrow = arrow(length = unit(0.02, "npc"), type = 'closed')) + scale_x_continuous(minor_breaks = seq(1, 50, 1)) + scale_y_continuous(minor_breaks = seq(1, 50, 1)) + ggtitle('Optimal Allocation')

## Monte Carlo simulation

Now the waiting time… Suppose that we have 10 passengers asking for cars in the city, we are going to see how the waiting time changes with the numbers of cars. As example, we are going to change the numbers of cars from 10 up to 30 cars, with 500 hundred Monte Carlo simulation.

simulations = function(N, MC) { ncars = N times = matrix(0, nrow = MC, ncol = N) for (i in 1:MC){ passengers = lapply(seq(1:10), create_passenger) cars = lapply(seq(1:ncars), create_car) d.matrix = distance.matrix(cars, passengers) opt.allocation = lp.assign(d.matrix) times[i,] = colSums(opt.allocation$solution*opt.allocation$costs) # waiting time for each passenger } return(times) } results = lapply(seq(10,30,2), simulations, MC = 500) # MC = 500 just to save some time

Now it is possible to check how the numbers of cars affect the mean waiting time and the 10% and 90% confidence level for the waiting time.

df2 = data.frame('WaitingTime' = sapply(results, mean), 'LB' = sapply(results, quantile, probs = 0.10), 'UB' = sapply(results, quantile, probs = 0.90), 'Cars' = seq(10,30,2)) ggplot(df2, aes(x = Cars)) + geom_line(aes(y = WaitingTime), lwd = 1.2) + geom_ribbon(aes(ymin = LB, ymax = UB), fill = 'blue', alpha = .3)

**leave a comment**for the author, please follow the link and comment on their blog:

**R – insightR**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...