Kantorovich distance with the ‘ompr’ package

[This article was first published on Saturn Elephant, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Do you know my former blog? It contains four posts about the computation of the Kantorovich distance:

The Julia way using the JumP library has the most convenient syntax:

using JuMP 
mu = [1/7, 2/7, 4/7]
nu = [1/4, 1/4, 1/2]
n = length(mu)
m = Model()
@defVar(m, p[1:n, 1:n] >= 0)
@setObjective(m, Min, sum{p[i, j], i in 1:n, j in 1:n; i != j})
for k in 1:n
  @addConstraint(m, sum(p[k, :]) == mu[k])
  @addConstraint(m, sum(p[:, k]) == nu[k])

This allows to get the Kantorovich distance between the two probabilities mu and nu corresponding to the 0-1 distance (assuming mu and nu have the same support). This is totally useless because one can straightforwardly get this distance: it is one minus the total weight of the infimum measure of the two probability measures (1 - sum(pmin(mu, nu)) in R). But this is just for a simple illustration purpose. This problem is not trivial for another distance on the support of mu and nu. Encoding this distance as a matrix D, the linear programming model allowing to get the corresponding Kantorovich distance is obtained by replacing

sum{p[i, j], i in 1:n, j in 1:n; i != j}


sum{p[i, j] * D[i, j], i in 1:n, j in 1:n; i != j}

Now I want to show again how to compute the Kantorovich distance with R, but using another package I discovered yesterday: the ompr package. It allows to write the model with a convenient syntax, close to the mathematical language, similar to the one above with JumP. Here is the model:


mu <- c(1/7, 2/7, 4/7)
nu <- c(1/4, 1/4, 1/2)
n <- length(mu)

model <- MIPModel() |>
  add_variable(p[i, j], i = 1:n, j = 1:n, type = "continuous") |>
  add_constraint(p[i, j] >= 0, i = 1:n, j = 1:n) |>
  add_constraint(sum_over(p[i, j], j = 1:n) == mu[i], i = 1:n) |>
  add_constraint(sum_over(p[i, j], i = 1:n) == nu[j], j = 1:n) |>
  set_objective(sum_over(p[i, j], i = 1:n, j = 1:n, i != j), "min") 

This is nicely readable. Now we solve the problem:

optimization <- model |>
  solve_model(with_ROI(solver = "glpk"))

And we get the Kantorovich distance:

## [1] 0.1071429
To leave a comment for the author, please follow the link and comment on their blog: Saturn Elephant.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)