[This article was first published on Posts on Matthew Smith R Shenanigans, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Machine Learning at the Boundary:

There is nothing new in the fact that machine learning models can outperform traditional econometric models but I want to show as part of my research why and how some models make given predictions or in this instance classifications.

I wanted to show the decision boundary in which my binary classification model was making. That is, I wanted to show the partition space that splits my classification into each class. The problem and code can be split into a multi-classification problem with some tweeks.

## Initialisation:

I first load in a series of packages and initialise a logistic function to convert log-odds to a logistic probability function later on.

library(dplyr)
library(patchwork)
library(ggplot2)
library(knitr)
library(kableExtra)
library(purrr)
library(stringr)
library(tidyr)
library(xgboost)
library(lightgbm)
library(keras)
library(tidyquant)
##################### Pre-define some functions ###################################
###################################################################################

logit2prob <- function(logit){
odds <- exp(logit)
prob <- odds / (1 + odds)
return(prob)
}

# The data:

I use the iris dataset which contains information on 3 different plant variables collected by British statistican Ronald Fisher in 1936. The dataset consists of $$4$$ different characteristics of plant species which should uniquely distinguish the $$3$$ different species (Setosa, Virginica and Versicolor). However, my problem required a binary classification problem and not a multi-classifciation problem. In the following code I import the iris data and remove a type of plant Species virginica to bring it from a multi-classification to a binary classification problem.

###################################################################################
###################################################################################

data(iris)
df <- iris %>%
filter(Species != "virginica") %>%
mutate(Species = +(Species == "versicolor"))
str(df)
## 'data.frame':    100 obs. of  5 variables:
##  $Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ... ##$ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ... ##$ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  Species : int 0 0 0 0 0 0 0 0 0 0 ... ################################################################################### ################################################################################### I plot the data by first storing the ggplot objects changing only the x and y variables in each of the plt’s. plt1 <- df %>% ggplot(aes(x = Sepal.Width, y = Sepal.Length, color = factor(Species))) + geom_point(size = 4) + theme_bw(base_size = 15) + theme(legend.position = "none") plt2 <- df %>% ggplot(aes(x = Petal.Length, y = Sepal.Length, color = factor(Species))) + geom_point(size = 4) + theme_bw(base_size = 15) + theme(legend.position = "none") plt3 <- df %>% ggplot(aes(x = Petal.Width, y = Sepal.Length, color = factor(Species))) + geom_point(size = 4) + theme_bw(base_size = 15) + theme(legend.position = "none") plt3 <- df %>% ggplot(aes(x = Sepal.Length, y = Sepal.Width, color = factor(Species))) + geom_point(size = 4) + theme_bw(base_size = 15) + theme(legend.position = "none") plt4 <- df %>% ggplot(aes(x = Petal.Length, y = Sepal.Width, color = factor(Species))) + geom_point(size = 4) + theme_bw(base_size = 15) + theme(legend.position = "none") plt5 <- df %>% ggplot(aes(x = Petal.Width, y = Sepal.Width, color = factor(Species))) + geom_point(size = 4) + theme_bw(base_size = 15) + theme(legend.position = "none") plt6 <- df %>% ggplot(aes(x = Petal.Width, y = Sepal.Length, color = factor(Species))) + geom_point(size = 4) + theme_bw(base_size = 15) + theme(legend.position = "none") I also wanted to use the new patchwork package which makes displaying ggplot plots very easy. i.e the below code plots our graphics as its written (1 top plot stretching the length of the grid space, 2 middle plots, another single plot and a further 2 more plots at the bottom.)  (plt1) / (plt2 + plt3) Alternatively, we can re-arrange the plots into any way we wish and plot them in the following way:  (plt1 + plt2) / (plt5 + plt6) Which I think looks awesome. ## Objective The objective is to build a classification algorithm to distinguish between the two classes and then compute the decision boundaries in order to better see how the models made such predictions. In order to create the decision boundary plots for each variable combination we need the different combinatons of variables in the data. ################################################################################### ################################################################################### var_combos <- expand.grid(colnames(df[,1:4]), colnames(df[,1:4])) %>% filter(!Var1 == Var2) ################################################################################### ################################################################################### var_combos %>% head() %>% kable(caption = "Variable Combinations", escape = F, align = "c", digits = 2) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), font_size = 9, fixed_thead = T, full_width = F) %>% scroll_box(width = "100%", height = "200px") Table 1: Variable Combinations Var1 Var2 Sepal.Width Sepal.Length Petal.Length Sepal.Length Petal.Width Sepal.Length Sepal.Length Sepal.Width Petal.Length Sepal.Width Petal.Width Sepal.Width Next, I want to use the different variable combinations previously and create lists (one for each variable combination) and populate these lists with synthetic data - or data from the minimum to maximum value of each variable combination. This will act as our synthetically created test data in which we make predictions on and build the decision boundary. It’s important to note that the plots will eventually be 2 dimensional for illustrative purposes, therefore we only train the Machine Learning models on two variables, but for each combination of the two variables, these variables are the first two variables in the boundary_lists data frames. boundary_lists <- map2( .x = var_combosVar1,
.y = var_combos$Var2, ~select(df, .x, .y) %>% summarise( minX = min(.[[1]], na.rm = TRUE), maxX = max(.[[1]], na.rm = TRUE), minY = min(.[[2]], na.rm = TRUE), maxY = max(.[[2]], na.rm = TRUE) ) ) %>% map(., ~tibble( x = seq(.x$minX, .x$maxX, length.out = 200), y = seq(.x$minY, .x$maxY, length.out = 200), ) ) %>% map(., ~tibble( xx = rep(.x$x, each = 200),
yy = rep(.x$y, time = 200) ) ) %>% map2(., asplit(var_combos, 1), ~ .x %>% set_names(.y)) ################################################################################### ################################################################################### We can see how the first 4 observations of the first and last two lists look like: boundary_lists %>% map(., ~head(., 4)) %>% head(2) ## [[1]] ## # A tibble: 4 x 2 ## Sepal.Width Sepal.Length ## ## 1 2 4.3 ## 2 2 4.31 ## 3 2 4.33 ## 4 2 4.34 ## ## [[2]] ## # A tibble: 4 x 2 ## Petal.Length Sepal.Length ## ## 1 1 4.3 ## 2 1 4.31 ## 3 1 4.33 ## 4 1 4.34 boundary_lists %>% map(., ~head(., 4)) %>% tail(2) ## [[1]] ## # A tibble: 4 x 2 ## Sepal.Width Petal.Width ## ## 1 2 0.1 ## 2 2 0.109 ## 3 2 0.117 ## 4 2 0.126 ## ## [[2]] ## # A tibble: 4 x 2 ## Petal.Length Petal.Width ## ## 1 1 0.1 ## 2 1 0.109 ## 3 1 0.117 ## 4 1 0.126 ## Training time: Now that we have the synthetically created testing data set up, I want to train the models on the actual observed observations. I use each data point in the plots above as my training data. I apply the following models: • Logistic Model • Support Vector Machine with a linear kernel • Support Vector Machine with a polynomial kernel • Support Vector Machine with a radial kernel • Support Vector Machine with a sigmoid kernel • Random Forest • Extreme Gradiant Boosting (XGBoost) model with default parameters • Single layer Keras Neural Network (with linear components) • Deeper layer Keras Neural Network (with linear components) • Deeper’er layer Keras Neural Network (with linear components) • Light Gradient Boosting model (LightGBM) with default parameters Side note: I am not an expert in Deep learning/Keras/Tensorflow, so I am sure better models will yield better decision boundaries but it was a fun task getting the different Machine Learning models to fit inside a purrr, map call. ################################################################################### ################################################################################### # params_lightGBM <- list( # objective = "binary", # metric = "auc", # min_data = 1 # ) # To install Light GBM try the following in your RStudio terinal # git clone --recursive https://github.com/microsoft/LightGBM # cd LightGBM # Rscript build_r.R models_list <- var_combos %>% mutate(modeln = str_c('mod', row_number())) %>% pmap(~ { xname = ..1 yname = ..2 modelname = ..3 df %>% select(Species, xname, yname) %>% group_by(grp = 'grp') %>% nest() %>% mutate(models = map(data, ~{ list( # Logistic Model Model_GLM = { glm(Species ~ ., data = .x, family = binomial(link='logit')) }, # Support Vector Machine (linear) Model_SVM_Linear = { e1071::svm(Species ~ ., data = .x, type = 'C-classification', kernel = 'linear') }, # Support Vector Machine (polynomial) Model_SVM_Polynomial = { e1071::svm(Species ~ ., data = .x, type = 'C-classification', kernel = 'polynomial') }, # Support Vector Machine (sigmoid) Model_SVM_radial = { e1071::svm(Species ~ ., data = .x, type = 'C-classification', kernel = 'sigmoid') }, # Support Vector Machine (radial) Model_SVM_radial_Sigmoid = { e1071::svm(Species ~ ., data = .x, type = 'C-classification', kernel = 'radial') }, # Random Forest Model_RF = { randomForest::randomForest(formula = as.factor(Species) ~ ., data = .) }, # Extreme Gradient Boosting Model_XGB = { xgboost( objective = 'binary:logistic', eval_metric = 'auc', data = as.matrix(.x[, 2:3]), label = as.matrix(.x$Species), # binary variable
nrounds = 10)
},
# Kera Neural Network
Model_Keras = {
mod <- keras_model_sequential() %>%
layer_dense(units = 2, activation = 'relu', input_shape = 2) %>%
layer_dense(units = 2, activation = 'sigmoid')

mod %>% compile(
loss = 'binary_crossentropy',
optimizer_sgd(lr = 0.01, momentum = 0.9),
metrics = c('accuracy')
)
fit(mod,
x = as.matrix(.x[, 2:3]),
y = to_categorical(.x$Species, 2), epochs = 5, batch_size = 5, validation_split = 0 ) print(modelname) assign(modelname, mod) }, # Kera Neural Network Model_Keras_2 = { mod <- keras_model_sequential() %>% layer_dense(units = 2, activation = 'relu', input_shape = 2) %>% layer_dense(units = 2, activation = 'linear', input_shape = 2) %>% layer_dense(units = 2, activation = 'sigmoid') mod %>% compile( loss = 'binary_crossentropy', optimizer_sgd(lr = 0.01, momentum = 0.9), metrics = c('accuracy') ) fit(mod, x = as.matrix(.x[, 2:3]), y = to_categorical(.x$Species, 2),
epochs = 5,
batch_size = 5,
validation_split = 0
)
print(modelname)
assign(modelname, mod)

},
# Kera Neural Network
Model_Keras_3 = {
mod <- keras_model_sequential() %>%
layer_dense(units = 2, activation = 'relu', input_shape = 2) %>%
layer_dense(units = 2, activation = 'relu', input_shape = 2) %>%
layer_dense(units = 2, activation = 'linear', input_shape = 2) %>%
layer_dense(units = 2, activation = 'sigmoid')

mod %>% compile(
loss = 'binary_crossentropy',
optimizer_sgd(lr = 0.01, momentum = 0.9),
metrics = c('accuracy')
)
fit(mod,
x = as.matrix(.x[, 2:3]),
y = to_categorical(.x$Species, 2), epochs = 5, batch_size = 5, validation_split = 0 ) print(modelname) assign(modelname, mod) }, # LightGBM model Model_LightGBM = { lgb.train( data = lgb.Dataset(data = as.matrix(.x[, 2:3]), label = .x$Species),
objective = 'binary',
metric = 'auc',
min_data = 1
#params = params_lightGBM,
#learning_rate = 0.1
)
}

)

}
))

}) %>%
map(
., ~unlist(., recursive = FALSE)
)

## Testing time:

Now that the models have been trained, we can begin makign the predictions on the synthetically created data we created in the boundary_lists data.

models_predict <- map2(models_list, boundary_lists, ~{
mods <- purrr::pluck(.x, "models")
dat <- .y
map(mods, function(x)
tryCatch({
if(attr(x, "class")[1] == "glm"){
# predict the logistic model
tibble(
modelname = attr(x, "class")[1],
prediction = predict(x, newdata = dat)
) %>%
mutate(
prediction = logit2prob(prediction),
prediction = case_when(
prediction > 0.5 ~ 1,
TRUE ~ 0
)
)
}
else if(attr(x, "class")[1] == "svm.formula"){
# predict the SVM model
tibble(
modelname = attr(x, "class")[1],
prediction = as.numeric(as.character(predict(x, newdata = dat)))
)
}
else if(attr(x, "class")[1] == "randomForest.formula"){
# predict the RF model
tibble(
modelname = attr(x, "class")[1],
prediction = as.numeric(as.character(predict(x, newdata = dat)))
)
}
else if(attr(x, "class")[1] == "xgb.Booster"){
# predict the XGBoost model
tibble(
modelname = attr(x, "class")[1],
prediction = predict(x, newdata = as.matrix(dat), type = 'prob')
) %>%
mutate(
prediction = case_when(
prediction > 0.5 ~ 1,
TRUE ~ 0
)
)
}
else if(attr(x, "class")[1] == "keras.engine.sequential.Sequential"){
# Keras Single Layer Neural Network
tibble(
modelname = attr(x, "class")[1],
prediction = predict_classes(object = x, x = as.matrix(dat))
)
}
else if(attr(x, "class")[2] == "keras.engine.training.Model"){
# NOTE:::: This is a very crude hack to have multiple keras NN models
# Needs fixing such that the models are named better - (not like) - (..., "class")[2], ..., "class")[3]... and so on.
# Keras Single Layer Neural Network
tibble(
modelname = attr(x, "class")[2], # needs changing also.
prediction = predict_classes(object = x, x = as.matrix(dat))
)
}
else if(attr(x, "class")[1] == "lgb.Booster"){
# Light GBM model
tibble(
modelname = attr(x, "class")[1],
prediction = predict(object = x, data = as.matrix(dat), rawscore = FALSE)
) %>%
mutate(
prediction = case_when(
prediction > 0.5 ~ 1,
TRUE ~ 0
)
)
}
}, error = function(e){
print('skipping\n')
}
)
)
}
) %>%
map(.,
~setNames(.,
map(.,
~c(.x$modelname[1] ) ) ) ) %>% map(., ~map(., ~setNames(., c( paste0(.x$modelname[1], "_Model"),
paste0(.x$modelname[1], "_Prediction") ) ) ) ) ## Calibrating the data Now we have our trained models, along with predictions we can map these predictions into data which we can plot using ggplot and then arrange using patchwork!. plot_data <- map2( .x = boundary_lists, .y = map( models_predict, ~map(., ~tibble(.) ) ), ~bind_cols(.x, .y) ) names(plot_data) <- map_chr( plot_data, ~c( paste( colnames(.)[1], "and", colnames(.)[2], sep = "_") ) ) Now that we have our predictions we can create the ggplots. ggplot_lists <- plot_data %>% map( ., ~select( ., -contains("Model") ) %>% pivot_longer(cols = contains("Prediction"), names_to = "Model", values_to = "Prediction") ) %>% map( .x = ., ~ggplot() + geom_point(aes( x = !!rlang::sym(colnames(.x)[1]), y = !!rlang::sym(colnames(.x)[2]), color = factor(!!rlang::sym(colnames(.x)[4])) ), data = .x) + geom_contour(aes( x = !!rlang::sym(colnames(.x)[1]), y = !!rlang::sym(colnames(.x)[2]), z = !!rlang::sym(colnames(.x)[4]) ), data = .x) + geom_point(aes( x = !!rlang::sym(colnames(.x)[1]), y = !!rlang::sym(colnames(.x)[2]), color = factor(!!rlang::sym(colnames(df)[5])) # this is the status variable ), size = 8, data = df) + geom_point(aes( x = !!rlang::sym(colnames(.x)[1]), y = !!rlang::sym(colnames(.x)[2]) ), size = 8, shape = 1, data = df) + facet_wrap(~Model) + theme_bw(base_size = 25) + theme(legend.position = "none") ) Plot all the different combinations of the decision boundaries. Note: The above code will work better in your console, when I ran the code to compile the blog post the plots were too small. Therefore, I provide individual plots for a sample of the models & variable combinations. I first needed to select the first two columns which are the variables of interest (Petal.Width, Petal.Length, Sepal.Width and Sepal.Length). Then I wanted to take a random sample of the columns thereafter (which are the different Machine Learning Model predictions). plot_data_sampled <- plot_data %>% map( ., ~select( ., -contains("Model") ) %>% select(., c(1:2), sample(colnames(.), 2) ) %>% pivot_longer( cols = contains("Prediction"), names_to = "Model", values_to = "Prediction") )  Next I can make the plots by taking a random sample of the lists. plot_data_sampled %>% rlist::list.sample(1) %>% map( .x = ., ~ggplot() + geom_point(aes( x = !!rlang::sym(colnames(.x)[1]), y = !!rlang::sym(colnames(.x)[2]), color = factor(!!rlang::sym(colnames(.x)[4])) ), data = .x) + geom_contour(aes( x = !!rlang::sym(colnames(.x)[1]), y = !!rlang::sym(colnames(.x)[2]), z = !!rlang::sym(colnames(.x)[4]) ), data = .x) + geom_point(aes( x = !!rlang::sym(colnames(.x)[1]), y = !!rlang::sym(colnames(.x)[2]), color = factor(!!rlang::sym(colnames(df)[5])) # this is the status variable ), size = 3, data = df) + geom_point(aes( x = !!rlang::sym(colnames(.x)[1]), y = !!rlang::sym(colnames(.x)[2]) ), size = 3, shape = 1, data = df) + facet_wrap(~Model) + #coord_flip() + theme_tq(base_family = "serif") + theme( #aspect.ratio = 1, axis.line.y = element_blank(), axis.ticks.y = element_blank(), legend.position = "bottom", #legend.title = element_text(size = 20), #legend.text = element_text(size = 10), axis.title = element_text(size = 20), axis.text = element_text(size = "15"), strip.text.x = element_text(size = 15), plot.title = element_text(size = 30, hjust = 0.5), strip.background = element_rect(fill = 'darkred'), panel.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), #axis.text.x = element_text(angle = 90), axis.text.y = element_text(angle = 90, hjust = 0.5), #axis.title.x = element_blank() legend.title = element_blank(), legend.text = element_text(size = 20) ) ) ##$Sepal.Width_and_Petal.Length
## Warning: Row indexes must be between 0 and the number of rows (0). Use NA as row index to obtain a row full of NA values.
## This warning is displayed once per session.

Some other random models:

## $Sepal.Width_and_Sepal.Length ##$Sepal.Width_and_Sepal.Length

## $Petal.Length_and_Sepal.Length ##$Petal.Width_and_Petal.Length

## $Petal.Length_and_Petal.Width ## Warning in grDevices::contourLines(x = sort(unique(data$x)), y =
## sort(unique(data\$y)), : todos los valores de z son iguales
## Warning: Not possible to generate contour data

Natually the linear models made a linear decision boundary. It looks like the random forest model overfit a little the data, where as the XGBoost and LightGBM models were able to make better, more generalisable decision boundaries. The Keras Neural Networks performed poorly because they should be trained better.

• glm = Logistic Model
• keras.engine.sequential.Sequential Prediction...18 = Single layer Neural Network
• keras.engine.sequential.Sequential Prediction...18 = Deeper layer Neural Network
• keras.engine.sequential.Sequential Prediction...22 = Deeper’er layer Neural Network
• lgb.Booster Prediction = Light Gradient Boosted Model
• randomForest.formula Prediction = Random Forest Model
• svm.formula Prediction...10 = Support Vector Machine with a Sigmoid Kernel
• svm.formula Prediction...12 = Support Vector Machine with a Radial Kernel
• svm.formula Prediction...6 = Support Vector Machine with a Linear Kernel
• svm.formula Prediction...8 = Support Vector Machine with a Polynomial Kernel
• xgb.Booster Prediction = Extreme Gradient Boosting Model

In many of the combinations the Keras Neural Network model just predicted all observations to be of a specific class (again by my poor tuning of the models and the fact that the Neural Networks only had 100 observations to learn from and 40,000 observation to predict on). That is, it coloured the whole background blue or red and made many mis-classifications. In some of the plots the Neural Networks managed to mae perfect classifications, in other it made strange decision boundaries. - Neural Networks are fun.

As some brief analysis of the plots, it looks like the simple logistic model made near-perfect classifications. Which isn’t suprising given that each of the variable ralationships are linearly seperable. However, I have a preferece for XGBoost and LightGBM models since they can handle non-linear relationships through the incorporation of regularisation in its objective functions which allows them to make more robust decision boundaries. Random Forest models fail here which is why their decision boundary appears to do a good job but is also slightly erratic and sharpe in it’s decision boundaries.

Of course it goes without saying that these decision boundaries can become significantly more complex and non-linear with the inclusion of more variables and higher dimensions.

for(i in 1:length(plot_data)){
print(ggplot_lists[[i]])
}

## End note:

I wrote this model on an Amazon Ubuntu EC2 Instance however, when I went to compile the blog post in R on my Windows system I ran into some problems. These problems were mostly down to installing the lightgbm package and package versions. The code was working without error using the following package versions (i.e. using the most up-to-date package versions)

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
## [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
## [5] LC_TIME=Spanish_Spain.1252
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
##  [1] tidyquant_0.5.7            quantmod_0.4-15
##  [3] TTR_0.23-6                 PerformanceAnalytics_1.5.3
##  [5] xts_0.11-2                 zoo_1.8-6
##  [7] lubridate_1.7.4            keras_2.2.5.0
##  [9] lightgbm_2.3.2             R6_2.4.1
## [11] xgboost_0.90.0.1           tidyr_1.0.0
## [13] stringr_1.4.0              purrr_0.3.2
## [15] kableExtra_1.1.0.9000      knitr_1.25.4
## [17] ggplot2_3.2.1              patchwork_1.0.0
## [19] dplyr_0.8.99.9000
##
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.3           lattice_0.20-38      class_7.3-15
##  [4] utf8_1.1.4           assertthat_0.2.1     zeallot_0.1.0
##  [7] digest_0.6.24        e1071_1.7-2          evaluate_0.14
## [10] httr_1.4.1           blogdown_0.15        pillar_1.4.3.9000
## [13] tfruns_1.4           rlang_0.4.4          lazyeval_0.2.2
## [16] curl_4.0             rstudioapi_0.10      data.table_1.12.8
## [19] whisker_0.3-2        Matrix_1.2-17        reticulate_1.14-9001
## [22] rmarkdown_1.14       lobstr_1.1.1         labeling_0.3
## [25] webshot_0.5.1        readr_1.3.1          munsell_0.5.0
## [28] compiler_3.6.1       xfun_0.8             pkgconfig_2.0.3
## [31] base64enc_0.1-3      tensorflow_2.0.0     htmltools_0.3.6
## [34] tidyselect_1.0.0     tibble_2.99.99.9014  bookdown_0.13
## [37] quadprog_1.5-7       randomForest_4.6-14  fansi_0.4.1
## [40] viridisLite_0.3.0    crayon_1.3.4         withr_2.1.2
## [43] rappdirs_0.3.1       grid_3.6.1           Quandl_2.10.0
## [46] jsonlite_1.6.1       gtable_0.3.0         lifecycle_0.1.0
## [49] magrittr_1.5         scales_1.0.0         rlist_0.4.6.1
## [52] cli_2.0.1            stringi_1.4.3        xml2_1.2.2
## [55] ellipsis_0.3.0       generics_0.0.2       vctrs_0.2.99.9005
## [58] tools_3.6.1          glue_1.3.1           hms_0.5.1
## [61] yaml_2.2.0           colorspace_1.4-1     rvest_0.3.4

To leave a comment for the author, please follow the link and comment on their blog: Posts on Matthew Smith R Shenanigans.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts.(You will not see this message again.)

Click here to close (This popup will not appear again)