Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I had the opportunity recently to be asked to join a committee which had to select candidates for an upcoming election. In the candidate selection phase, the members of the committee use preferential voting to rank the candidates. Prior to the meeting, I did some research on preferential voting systems. I found two main classes of methods used to determine winners-those based on some form of Borda count and those based on some form of Condorcet method.

In previous years, this committee selected the order of the candidates purely on an average rank basis, which is functionally equivalent to a Borda count (although not a modified Borda count). Being informed that I was going to be the tabulator and calculator, I decided that this year I also wanted to use a Condorcet method and compare it with the average rank. I decided that my Condorcet method was going to take the following path:

1. Calculate if there is a pure Condorcet method
2. If there is a cycle (no Condorcet winner) use the Schulze method
3. If there is no unique member of the Schwartz set (the Schulze winner) then select one of a number of tiebreakers

I will gloss over the embarrassing details of how I missed an edge case while checking for stopping conditions and ended up in an infinite while loop, and focus on how I used Rcpp to greatly speed up the process. Condorcet methods require making a pairwise comparison between every candidate. This is slow—$$\mathcal{O}(n^2)$$. The Schulze method has a rather simple implementation , but it is even slower—$$\mathcal{O}(n^3)$$.

Here is a sample ballot:

##   Candidates Vote A Vote B Vote C Vote D Vote E Vote F Vote G Vote H
## 1     Albert      1      2      1      3      2      1      3      4
## 2      Bruce      2      4      5      4      5      4      1      2
## 3    Charles      3      1      3      1      1      2      4      5
## 4      David      4      5      2      5      4      5      2      1
## 5     Edward      5      3      4      2      3      3      5      3

Here is some simple code to calculate the average rank of the candidates:

AvgRank <- function(BallotMatrix) {
Ballots <- as.matrix(BallotMatrix[, -1], mode = "numeric")
Num_Candidates <- dim(Ballots)[1]
Names <- BallotMatrix[, 1]
Ballots[is.na(Ballots)] <- Num_Candidates + 1  #Treat blanks as one worse than min
MeanRanks <- rowMeans(Ballots)
Rankings <- data.frame(Names, MeanRanks)
Rankings <- Rankings[order(rank(Rankings[, 2], ties.method = "random")),
]  #Ties handled through random draw
Rankings <- data.frame(Rankings, seq_along(Rankings[, 1]))
names(Rankings) <- c("Names", "Average Rank", "Position")
return(Rankings)
}

The above ballot would result in the following ranking:

##     Names Average Rank Position
## 1  Albert        2.125        1
## 3 Charles        2.500        2
## 2   Bruce        3.375        3
## 5  Edward        3.500        4
## 4   David        3.500        5

Here is some (simplified) code to calculate Condorcet and Schulze winners:

VoteExtract <- function(BallotMatrix) {
Votes <- as.matrix(BallotMatrix[, -1], mode = "numeric")
Votes[is.na(Votes)] <- Num_Candidates + 1  #Treat blanks as one worse than min
}

Pairwise <- matrix(nrow = Num_Candidates, ncol = Num_Candidates)
for (CurCand in 1:Num_Candidates) {
for (Pairs in 1:Num_Candidates) {
Pairwise[CurCand, Pairs] <- sum(Pref_Cur_Cand[, Pairs] > 0)
}
}
return(Pairwise)
}

Schulze <- function(PairsMatrix) {
size <- dim(PairsMatrix)[1]
p <- matrix(nrow = size, ncol = size)
for (i in 1:size) {
for (j in 1:size) {
if (i != j) {
if (PairsMatrix[i, j] > PairsMatrix[j, i]) {
p[i, j] <- PairsMatrix[i, j]
} else {
p[i, j] <- 0
}
}
}
}
for (i in 1:size) {
for (j in 1:size) {
if (i != j) {
for (k in 1:size) {
if (i != k && j != k) {
p[j, k] <- max(p[j, k], min(p[j, i], p[i, k]))
}
}
}
}
}
diag(p) <- 0
return(p)
}

CondorcetRank <- function(BallotMatrix) {
Num_Candidates <- dim(BallotMatrix)[1]
Rankings <- matrix(nrow = Num_Candidates, ncol = 3)
CurrentBallot <- BallotMatrix
CurrentRank <- 1
while (CurrentRank <= Num_Candidates) {
CurrentNames <- as.vector(CurrentBallot[, 1])
CurrentSize <- length(CurrentNames)
Pairwise <- matrix(nrow = CurrentSize, ncol = CurrentSize)
Winner <- vector(length = CurrentSize)

# Check for Condorcet Winner

for (i in 1:CurrentSize) {
Winner[i] <- sum(Pairwise[i, ] > Pairwise[, i]) == (CurrentSize -
1)
}
if (sum(Winner == TRUE) == 1) {
# Condorcet Winner Exists
CurrentWinner <- which(Winner == TRUE)
Rankings[CurrentRank, ] <- c(CurrentNames[CurrentWinner], CurrentRank,
"Condorcet")
} else {

# Condorcet Winner does not exist, calculate Schulze beatpaths

Pairwise <- Schulze(Pairwise)
for (i in 1:CurrentSize) {
Winner[i] <- sum(Pairwise[i, ] > Pairwise[, i]) == (CurrentSize -
1)
}
if (sum(Winner == TRUE) == 1) {
# Schwartz set has unique member
CurrentWinner <- which(Winner == TRUE)
Rankings[CurrentRank, ] <- c(CurrentNames[CurrentWinner], CurrentRank,
"Schulze")
}
}
CurrentBallot <- CurrentBallot[-CurrentWinner, ]
CurrentRank = CurrentRank + 1
}
Rankings <- data.frame(Rankings)
names(Rankings) <- c("Name", "Rank", "Method")
return(Rankings)
}

The pairwise matrix would be:

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    6    5    6    6
## [2,]    2    0    3    5    3
## [3,]    3    5    0    5    7
## [4,]    2    3    3    0    4
## [5,]    2    5    1    4    0

The beatpath matrix for rank 1 (all ballots) would be:

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    6    5    6    6
## [2,]    0    0    0    5    0
## [3,]    0    5    0    5    7
## [4,]    0    0    0    0    0
## [5,]    0    5    0    5    0

and just for completeness, the full Condorcet ranking (selecting a winner, removing that entry from the ballot, and repeating on the smaller set) would be:

##      Name Rank    Method
## 1  Albert    1 Condorcet
## 2 Charles    2 Condorcet
## 3  Edward    3   Schulze
## 4   Bruce    4 Condorcet
## 5   David    5 Condorcet

When profiling this code, 81% of the time was spent in the Schulze algorithm, another 12% was spent in the PairCount algorithm, and the remaining 7% was spent on everything else (the actual ranking had multiple steps to handle cases when there was no Schulze winner). To speed up the procedure, I ported the Schulze and PairCount functions to C++ using Rcpp. Below is the C++ code:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
IntegerMatrix Schulze_C(IntegerMatrix Pairs) {
int nrow = Pairs.nrow();
IntegerMatrix Schulze(nrow, nrow);
for (int i = 0; i < nrow; i++) {
for (int j = 0; j < nrow; j++) {
if (i != j) {
if (Pairs(i, j) > Pairs(j, i)) {
Schulze(i, j) = Pairs(i, j);
} else {
Schulze(i, j) = 0;
}
}
}
}
for (int i = 0; i < nrow; i++) {
for (int j = 0; j < nrow; j++) {
if (i != j) {
for (int k = 0; k < nrow; k++) {
if ((i != k) && (j != k)) {
Schulze(j, k) = (std::max)(Schulze(j, k), (std::min)(Schulze(j, i), Schulze(i, k)));
}
}
} else {
if ((i = j)) {
Schulze(i, j) = 0;
}
}
}
}
return(Schulze);
}

// [[Rcpp::export]]
IntegerMatrix Pairwise(Num_Candidates, Num_Candidates);
for (int CurCand = 0; CurCand < Num_Candidates; CurCand++) {
IntegerMatrix Pref_Cur_Cand(Num_Candidates, Num_Ballots);
for (int i = 0; i < Num_Candidates; i++) {
for (int j = 0; j < Num_Ballots; j++) {
Pref_Cur_Cand(i, j) = Votes(i, j) - CandRank(j);
}
}
for (int i = 0; i < Num_Candidates; i++) {
int G0 = 0;
for (int j = 0; j < Num_Ballots; j++) {
if (Pref_Cur_Cand(i, j) > 0) G0 += 1;
}
Pairwise(CurCand, i) = G0;
}
}
return(Pairwise);
}

First, we need to check that the functions return the same values:

all.equal(PairCount(VoteExtract(Ballot)), PairCount_C(VoteExtract(Ballot)))
## [1] TRUE
all.equal(Schulze(PairCount(VoteExtract(Ballot))), Schulze_C(PairCount_C(VoteExtract(Ballot))))
## [1] TRUE

Now let's compare the functions' speed:

library(microbenchmark)
V <- VoteExtract(Ballot)
P <- PairCount(V)
microbenchmark(PairCount(V), PairCount_C(V), Schulze(P), Schulze_C(P), times = 100L)

## Unit: microseconds
##            expr     min      lq  median      uq      max neval
##    PairCount(V) 184.360 189.126 191.657 195.678  267.754   100
##  PairCount_C(V)   5.064   5.958   6.851   7.447   22.637   100
##      Schulze(P) 363.955 369.912 376.762 386.292 1006.682   100
##    Schulze_C(P)   1.490   2.085   2.979   3.277    7.446   100

The PairCount function is over 25 times faster in C++ and the Schulze function is over 120 times as fast! Moreover, the PairCount algorithm reads more logically in C++, as the way R handles vectors and matrices, when subtracting the current rank, the resulting matrix ended up transposed with the candidates across the columns.

So with easy-to-read code that results in speed gains of multiple orders of magnitude, what's not to like?!