“The Winner Takes It All” – Tuning and Validating R Recommendation Models Inside Tableau

May 4, 2014
By

(This article was first published on Data * Science + R, and kindly contributed to R-bloggers)

Introduction

My last blog article shows how to build an interactive recommendation engine in Tableau using a simple model utilizing the cosine similarity measure. While this can be a good way to explore unknown data, it is wise to validate any model before using it for recommendation in practice in order to get an estimate of how the model performs on unknown data. In the interactive approach the parameters were not questioned, but in reality we have the choice to modify the parameters and thereby driving the performance of the resulting model. That’s why the current blog post shows how to prepare the data, setup a simple recommendation model in R and validating its performance over different sets of parameters.

Background

Most readers familiar with data analytics know that building, tuning and validating a model is in most cases not a sequential workflow, but is carried out in an iterative manner - going through these tasks in small cycles. In each cycle we think about how we should change parameters, then modify the model and validate it at last. This means that also the tools, a data analyst uses, should support this mode of work as good as possible - especially I want something that is capable of supporting all of these steps. That’s why I want to give Tableau a trial - as a frontend for developing, tuning and validating recommendation models where the computational work is done with R in the back. If you want to start with a general overview on how R is integrated in Tableau, you can find some useful information and tutorials here, here, here and here.

If you didn’t read the other two articles (here and here) on my blog explaining the used dataset, here is a short summary: The data are about check-ins from the location based social network (LBSN) Foursquare crawled over a couple of month during 2012 (data showing a sample for the city of Cologne in Germany). Every data point tells the location/venue, user and a timestamp when the user checked in at the specified location (and quit some other “metadata” about location type, latitude, longitude, e.g.). LBSNs are a good use case for recommendation because users that share similar “movement patterns” in the geographical space may have common interests and so recommendations about locations someone haven’t visited yet, but similar uses already had can be interesting for the person.

Recommendation Models and Workflow

I don’t want to dive too deep into the theory of recommendation models because there is a waste of good literature out there - for example the tutorial of the R package “recommenderlab” (that is used in this post) is a good way to start. Here, we use two very general modeling approaches, called “User-based Collaborative Filtering” and “Item-based Collaborative Filtering”. The first one selects the recommended items based on the ones from other similar users whereas the second one selects the recommended items as the ones that are similar to the items the users already had.

Let’s think about how the overall workflow should look like:
The first step is transform the raw data into a format that can be used for recommendation modeling. A simple format that is often used, is to prepare a table that contains a line for every venue and user together with the corresponding score. In our case we could use the number of user check-ins per venue as score. But this can be misleading as well because a user may have to visit a venue every day (e.g. bus station on the way to work) but that doesn’t mean he likes it very much or would recommend it to a friend. If no explicit score is given like in this case, we should think about using a form of a “binary” score instead. Here 1 indicates that the user has visited the venue and 0 means not.
Second step is then to feed the prepared data into some framework that can be used to test different algorithms and validate the results. Therefore I choose the recommenderlab package because within these framework we get everything we need (including the right data structures and functions for predicting future outcomes). The package is expandable, meaning that everybody can include his own algorithms if needed but also comes with a set of generic algorithms.
Third step is to parameterize the chosen algorithms in a way that we can use Tableau to modify them from our dashboard.
The fourth and last step is then to validate the resulting models for every specific set of parameters and bring back the validation results to Tableau for visualization purposes. We use two visualizations for this – both well-known in the area of recommendation systems: (1) ROC plot (showing the relation between true-positive rate (hit rate or recall) and false positive rate (false alarm rate) together with the AUC value - area under curve - a single value describing model performance) and (2) Precision-Recall plot (showing the relationship between precision and hit rate).
The validation scheme itself is based on a train / test data split, where the unseen data from the test sample is used to calculate the quality measures. In every run and for every model we determine the TOP-N recommendations, where we use twelve different values for $$N\epsilon\{1,2,3,4,5,6,7,8,9,10,15,20\}$$. Based on a comparison of the TOP-N recommendations with the true venues that the user have visited, hit rate, false alarm rate and precision can be calculated.
Additionally to the “User-based Collaborative Filtering” and the “Item-based Collaborative Filtering” we use two more simple modeling approaches as benchmarks (a random model and a model that is based on the global popularity of the venues) to show if a “personalized” model can beat them.

Data Preparation

As mentioned the data preparation is done completely in R. The reason is that R offers all the capabilities and flexibility to build up some “mini ETL” to transform the raw data into the desired format. For me Tableau here is not an option because it is intended as a data visualization not a data preparation workbench. The result is a CSV file that can be fed into Tableau, where all other steps for modeling and validation take place. The code for creating the CSV looks as follows:

# --------------------------------------
# filter parameters
# --------------------------------------
minNumberOfCheckinsPerVenue = 12
minNumberOfUsersPerVenue = 4

# --------------------------------------
# --------------------------------------
require(data.table)

# --------------------------------------
# --------------------------------------
fileName <- "DATA/Foursquare_Checkins_Cologne.csv"
dataset <- read.csv2(fileName, colClasses = c(rep("character", 7),
rep("factor",2), rep("character", 2)),
dec = ",", encoding = "UTF-8")

# how the first 10 elements look like

# --------------------------------------
# data preprocessing
# --------------------------------------
dataset$CHECKIN_DATE <- as.POSIXct(dataset$CHECKIN_DATE, format = "%Y-%m-%d %H:%M:%S")
dataset$LAT <- sub(",", ".", dataset$LAT)
dataset$LNG <- sub(",", ".", dataset$LNG)
dataset$LAT <- as.numeric(dataset$LAT)
dataset$LNG <- as.numeric(dataset$LNG)
dataset$HOUR24 <- as.numeric(format(dataset$CHECKIN_DATE, "%H"))
venueDataset <- unique(dataset[, c("VENUEID", "LNG",
"LAT", "VENUE_NAME", "CATEGORY_NAME")])

# use data.table for aggregation
datasetDT <- data.table(dataset)
venueUserDataset <- datasetDT[, list(COUNT_CHECKINS = length(unique(CHECKIN_ID))),
by = list(VENUEID, USERID)]
# filter for artificial venues
tmpVenueDataset <- data.table:::merge.data.table(datasetDT[,
list(COUNT_USER_ALL = length(unique(USERID))),
by = list(VENUEID)][COUNT_USER_ALL >= minNumberOfUsersPerVenue],
datasetDT[, list(COUNT_CHECKINS_ALL = length(unique(CHECKIN_ID))),
by = list(VENUEID)][COUNT_CHECKINS_ALL >= minNumberOfCheckinsPerVenue],
by = "VENUEID")
venueUserDataset <- data.table:::merge.data.table(venueUserDataset,
tmpVenueDataset, by = "VENUEID")
venueUserDataset <- data.frame(venueUserDataset)

# --------------------------------------
# merge scoring table with venue information and save as csv
# --------------------------------------

# venueDataset contains all venues venueUserDataset contains all the
# relationships (ratings)
venueUserFullDataset <- merge(x = venueUserDataset, y = venueDataset,
by = "VENUEID")

# move IDs to 1:N range
venueUserFullDataset$USERID <- match(venueUserFullDataset$USERID,
unique(venueUserFullDataset$USERID)) venueUserFullDataset$VENUEID <- match(venueUserFullDataset$VENUEID, unique(venueUserFullDataset$VENUEID))

write.csv2(venueUserFullDataset,
file = "DATA/Venue_Recommendation_Dataset_Example.csv",
row.names = FALSE)


Integrating R-package “recommenderlab” in Tableau

The first step in Tableau is to connect to the CSV dataset. We focus on the two dimensions (USERID and VENUEID) that are absolutely necessary for creating our recommendations. Now it is time to think about the way we have to approach the R interface of Tableau. It is important to know that Tableau requires that the dimension of the input from Tableau to R is the same as the output from R to Tableau. This means that if you specify the inputs for Tableaus R table calculation functions as a couple of data vectors, each with length 100, then Tableau expects to get exactly one vector having 100 values back (one exception: a singular value is also fine). This concept has two important consequences:

1. Because we want to display the values of the defined quality measures graphically, the number of return values is significantly lower than the number of input tuples. For example, the dataset contains more than 5,000 tuples, but the output for the false alarm rate consist of only 4 x 12 values (12 values per model - 1 for every Top N list).

2. The result from calling R functions in Tableau is restricted to be exactly one measure, but we need several of them - one for each quality metric. We can solve this by calling the whole modeling procedure several times, where each iteration gives back the values for one quality metric. But this will kick down performance significantly and is, therefore, not an option.

The solution for the first problem is simple. The interesting values are written at the beginning of the result vector. One can fill the rest of the result vector with NA values and filter them out afterwards in Tableau - what’s left are the data we want.
The solution for the second issue is a little bit more complicated. An approach is, to do the whole calculation in a separate task (a calculated field) once and save the result as R object “in-memory”. After that, we create additional calculated fields, one for every quality metric we need. Their only purpose is to extract the metrics from the “in-memory” result object – vector by vector. To make this work, we need to run the R computations inside a “shared session” of Rserve (Rserve is the middleware needed for the communication between Tableau and R). This means that all the calculation share the same context (variable, functions, objects, …), which is not the case for Linux implementation. In Linux every calculation creates a new session. Therefore, the current implementation only works when Rserve is installed on a windows machine. I’m sure, there are options to bring this calculation also to a Linux system – If you have an idea, please leave a comment.

Having all these in mind the next step is to create a calculated field that transforms the input from Tableau into a suitable data structure for recommender systems. As mentioned above we want to model the score as a binary one, so starting with a rating matrix with decimal scores, we use the binarize function to yield the final 0/1 rating matrix:

# Tableau field: SCRIPT_INT('
require(recommenderlab)
rm(list = ls())
n <- length(.arg1)
userid <- .arg1
venueid <- .arg2
score <- .arg3
recMatrixTmp <- sparseMatrix(i = userid, j = venueid,
x = score, dimnames = list(as.character(sort(unique(userid))),
as.character(sort(unique(venueid)))))
recMatrix <- new("realRatingMatrix", data = recMatrixTmp)
recMatrix <- binarize(recMatrix, minRating = 1)
return(1)
# ',MAX([USERID]),MAX([VENUEID]),MAX([COUNT_CHECKINS]))


Subsequently all the parameters need to be defined, which we want to use when optimizing the recommendation model. For the user-based collaborative filtering approach these are the method to calculate the similarity between users (“Similarity Function”), the number of users for calculating the recommendation (“NN”), if we want to use weighted distances when aggregating the ratings of the most similar users (“Weighted (UBCF)”) and if we want to use only a sample of the given data (“Sample (UBCF)”). The parameters for the item-based collaborative filtering approach are the similarity method (“Similarity Function”), the number of most similar items that should be taken into account (“k”) and if we want to normalize the similarity matrix to remove user rating bias (“Normalize Similarity Matrix (ICBF)”).
We also need two parameters for the validation schema: The fraction of data we want to be used as test data (“Train / Test Split”) and how many iterations of the train/test cycle we need to calculate the quality metrics (“Evaluation Runs”). Of course more parameters are possible, but I limited the numbers because of the demo character of this implementation.

The main workhorse of the dashboard is the calculated field “RComputeRecommendation(Multi)” that does all the modeling, validation and output shaping. I will first show the code and afterwards explain parts in greater detail:

# Tableau field - SCRIPT_INT('
require(recommenderlab)

# extract parameters
seed <- 1234
trainProp <- .arg1[1]
simMethod <- .arg2[1]
evRuns <- .arg3[1]
NN <- .arg4[1]
WEIGHTED <- as.logical(.arg5[1])
if (.arg6[1] == "FALSE") {
SAMPLE <- as.logical(.arg6[1])
} else {
SAMPLE <- round(dim(recMatrix)[1] * trainProp * as.numeric(.arg6[1]))
}
kIBCF <- .arg7[1]
NORMALIZESIMMAT <- as.logical(.arg8[1])

# assemble parameterization
set.seed(seed)
es <- evaluationScheme(recMatrix, method = "split",
k = evRuns, given = 1, train = trainProp)
algorithms <- list(
"random items" = list(name = "RANDOM",
param = list(normalize = "center")),
"popular items" = list(name = "POPULAR",
param = list(normalize = "center")),
"user-based CF" = list(name = "UBCF",
param = list(method = simMethod, nn = NN,
weighted = WEIGHTED, sample = SAMPLE)),
"item-based CF" = list(name = "IBCF",
param = list(method = simMethod, k = kIBCF,
normalize_sim_matrix = NORMALIZESIMMAT)))

# run validation and grab results
set.seed(seed + 1)
listN <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20)
ev <- evaluate(es, algorithms, n = listN)
resMat <- do.call("rbind", avg(ev))
resMat <- cbind(resMat, LIST_LENGTH = as.numeric(row.names(resMat)))
row.names(resMat) <- rep(names(avg(ev)), each = length(listN))

# calculate AUC
AUC <- as.numeric(by(resMat, rep(1:length(ev),
each = length(listN)), function(x) {
tpr <- c(0, x[, "TPR"], 1)
fpr <- c(0, x[, "FPR"], 1)
i <- 2:length(tpr)
return((fpr[i] - fpr[i - 1]) %*% (tpr[i] + tpr[i - 1])/2)
}))
resMat <- cbind(resMat, AUC = rep(AUC, each = length(listN)))

# pull out quality metrics
recall <- as.numeric(c(resMat[, 6], rep(NA, n - length(resMat[, 6]))))
precision <- as.numeric(c(resMat[, 7], rep(NA, n - length(resMat[, 7]))))
tpr <- as.numeric(c(resMat[, 9], rep(NA, n - length(resMat[, 9]))))
fpr <- as.numeric(c(resMat[, 8], rep(NA, n - length(resMat[, 8]))))
listlength <- as.numeric(c(resMat[, 10], rep(NA, n - length(resMat[, 10]))))
type <- as.character(c(row.names(resMat), rep(NA, n - length(row.names(resMat)))))
config <- rep(sapply(algorithms, function(i) {
as.character(paste(unlist(i), collapse = "/"))
}), each = length(listN))
algoConf <- as.character(c(config, rep(NA, n - length(config))))
auc <- as.numeric(c(resMat[, 11], rep(NA, n - length(resMat[, 11]))))
return(1)
# ',MAX([Train / Test Split]), MAX([Similarity Function]), MAX([Evaluation
# Runs]), MAX([NN (UBCF)]), MAX([Weighted (UBCF)]), MAX([Sample (UBCF)]),
# MAX([k (ICBF)]), MAX([Normalize Similarity Matrix (ICBF)]),
# [RInitDataStructure])


The first block stores all the parameter values from Tableau into their R counterparts. The only thing special is the “Sample (UBCF)” parameter, where the absolute size of the sample has to be calculated if sampling is enabled.
We also define a “seed” here, which is quite important to get equal quality metrics across different visualizations when sampling is involved. Because there will be two sheets at the end (ROC- and precision-recall-plot), showing two perspectives of the same model, we have to make sure that the computational results they show are the same.
Next we define the parameterization of the algorithms and afterwards run our validation schema using the train/test split.
In principle that’s all - everything later on is about how we extract the results and bring them back to Tableau. As explained before, we therefore create a vector of the same length as the input data for every quality metric. The first N elements contain the true values whereas the other only contain NA values. Later in the dashboard we filter all the NA elements.

To be able to visualize the defined quality metrics, we use calculated fields for them. To be more precise - for every metric we need one calculated field: true-positive rate, false positive rate, precision, recall, AUC, length of the TOP-N list (as label), type of the algorithm (as label) and parameterization (as label). In the following you find the code for the false-positive rate as an example - this is how every calculated field is defined:

# Tableau field: SCRIPT_REAL('
cat("++++++RFieldFPR++", format(Sys.time(), "%Y/%m/%d_%H:%M:%S"), "+++++\n")
return(fpr)
# ',[RComputeRecommendation(Multi)])


It is really simple: the field just contains one line that is used for debugging purposes and then return the proper metric. But take a close look at the field ‘RComputeRecommendation(Multi)’ (our computation workhorse) that is used as input but then is never used inside the R calculation. The reason for this is that we have to force Tableau, to do the recalculation of the models (based on changed model parameters) before we extract the results. Prior to this finding I always got into some strange situations, where parts of the metrics seems to contain the results previous to the parameter change. By using the calculated field as a parameter, we make sure that the extraction of the results takes place after the evaluation (I know doesn’t look nice - maybe someone knows how to handle this in a more gentle way).

Creating Visualizations for Model Tuning

Now we have everything we need to assemble those two visualizations showing the ROC plot and the precision-recall plot for the different models. To do so we place the true-positive rate on the row and the false-positive rate on the column shelf (respectively precision and recall for the precision-recall-plot). Furthermore, we need to place the type of algorithm field on color and shape to visualize the curves for the different algorithms. The length of the TOP-N list will serve as label. Additional fields are used as part of the tooltips. Important also to take one of the calculated field as filter and exclude all NULL values as described above. Last point is to deselect “Aggregate Measures” under “Analysis” to make all the single points visible.

After we created the two different visualization, it is time to place everything onto on single dashboard, arrange all the parameter fields and configure them to trigger changes in both plots. Because every parameter change will cause Tableau to recalculate the model it can be wise to deselect the automatic update under “Dashboard” -> “Auto Update” and instead trigger an update manually by hitting F10. And then – happy tuning …

Summary

We now have a really nice dashboard that also enables users not familiar to R, to find well suited recommendation models and compare the effect of different parameterizations. Because the whole implementation is only based on three attributes (User ID, Product/Venue ID and Score), it is easy to change the dataset leaving most of the R calculations and the visualizations untouched.

A couple of words at the end about my experiences regarding the interface connecting Tableau and R. If you want the whole modeling cycle to take place in Tableau, you have to use some hacks to circumvent obstacles given by the current implementation of the interface. One is the dependence between the dimension of input and output. Especially in the scenario given here, a data analyst is often not interest into the outcomes of the prediction of the training data. Instead, he wants to know more about quality metrics that consists typically of only a small set of values. Using the “NA-padding” strategy is some way to bypass these issue.

The other point is that the output/result of an R table calculation can only contain one vector – in this case we can use the idea of storing the results as R object and accessing it with different calculated field (should only work under windows) or putting all the fields belonging to one entity into one string (together with a specific delimiter) and parsing it afterwards in Tableau (this option should also increase performance because we send less data back to Tableau).

Last point is that if I want to use the calculated fields in different visualization that are all part of one dashboard, Tableau will execute the code inside the fields for every sheet and this may introduce significant performance issues (e.g. calculates the model twice). An option to avoid those kinds of problems is to shift some logic into R, in a way such that R checks the given parameters and only recalculates the model if something changes.

As always feel free to comment on these post especially when you have good ideas/alternatives on how to solve some of the mentioned points.

Please also find the workbook with the R code from this blog post attached: Tuning_Recommendation_Models_in_Tableau.twbx