Predicting Player Positions of FIFA 18 Players

November 23, 2017
By

(This article was first published on schochastics, and kindly contributed to R-bloggers)

In this post, I will use the results of the exploratory analysis from the previous post and try
to predict the position of players in FIFA 18 using different machine learning algorithms.

As a quick reminder, these were the figures we obtained using PCA, t-SNE and a self organizing map.

#used packages
library(tidyverse)  # for data wrangling
library(hrbrthemes) # nice themes for ggplot
library(caret) # ML algorithms

Data

The data we use comes from Kaggle
and contains around 18,000 players of the game FIFA 18
with 75 features per player. Check the
previous post.
for the data preparation steps.

After the preprocessing, the data has the following structure.

glimpse(fifa_tbl)
## Observations: 17,981
## Variables: 36
## $ Acceleration          89, 92, 94, 88, 58, 79, 57, 93, 60, 78, 7...
## $ Aggression            63, 48, 56, 78, 29, 80, 38, 54, 60, 50, 8...
## $ Agility               89, 90, 96, 86, 52, 78, 60, 93, 71, 75, 7...
## $ Balance               63, 95, 82, 60, 35, 80, 43, 91, 69, 69, 6...
## $ `Ball control`        93, 95, 95, 91, 48, 89, 42, 92, 89, 85, 8...
## $ Composure             95, 96, 92, 83, 70, 87, 64, 87, 85, 86, 8...
## $ Crossing              85, 77, 75, 77, 15, 62, 17, 80, 85, 68, 6...
## $ Curve                 81, 89, 81, 86, 14, 77, 21, 82, 85, 74, 7...
## $ Dribbling             91, 97, 96, 86, 30, 85, 18, 93, 79, 84, 6...
## $ Finishing             94, 95, 89, 94, 13, 91, 13, 83, 76, 91, 6...
## $ `Free kick accuracy`  76, 90, 84, 84, 11, 84, 19, 79, 84, 62, 6...
## $ `GK diving`           7, 6, 9, 27, 91, 15, 90, 11, 10, 5, 11, 1...
## $ `GK handling`         11, 11, 9, 25, 90, 6, 85, 12, 11, 12, 8, ...
## $ `GK kicking`          15, 15, 15, 31, 95, 12, 87, 6, 13, 7, 9, ...
## $ `GK positioning`      14, 14, 15, 33, 91, 8, 86, 8, 7, 5, 7, 10...
## $ `GK reflexes`         11, 8, 11, 37, 89, 10, 90, 8, 10, 10, 11,...
## $ `Heading accuracy`    88, 71, 62, 77, 25, 85, 21, 57, 54, 86, 9...
## $ Interceptions         29, 22, 36, 41, 30, 39, 30, 41, 85, 20, 8...
## $ Jumping               95, 68, 61, 69, 78, 84, 67, 59, 32, 79, 9...
## $ `Long passing`        77, 87, 75, 64, 59, 65, 51, 81, 93, 59, 7...
## $ `Long shots`          92, 88, 77, 86, 16, 83, 12, 82, 90, 82, 5...
## $ Marking               22, 13, 21, 30, 10, 25, 13, 25, 63, 12, 8...
## $ Penalties             85, 74, 81, 85, 47, 81, 40, 86, 73, 70, 6...
## $ Positioning           95, 93, 90, 92, 12, 91, 12, 85, 79, 92, 5...
## $ Reactions             96, 95, 88, 93, 85, 91, 88, 85, 86, 88, 8...
## $ `Short passing`       83, 88, 81, 83, 55, 83, 50, 86, 90, 75, 7...
## $ `Shot power`          94, 85, 80, 87, 25, 88, 31, 79, 87, 88, 7...
## $ `Sliding tackle`      23, 26, 33, 38, 11, 19, 13, 22, 69, 18, 9...
## $ `Sprint speed`        91, 87, 90, 77, 61, 83, 58, 87, 52, 80, 7...
## $ Stamina               92, 73, 78, 89, 44, 79, 40, 79, 77, 72, 8...
## $ `Standing tackle`     31, 28, 24, 45, 10, 42, 21, 27, 82, 22, 8...
## $ Strength              80, 59, 53, 80, 83, 84, 64, 65, 74, 85, 8...
## $ Vision                85, 90, 80, 84, 70, 78, 68, 86, 88, 70, 6...
## $ Volleys               88, 85, 83, 88, 11, 87, 13, 79, 82, 88, 6...
## $ position              ST, RW, LW, ST, GK, ST, GK, LW, CDM, ST,...
## $ position2             O, O, O, O, GK, O, GK, O, M, O, D, M, GK...

Machine Learning in R and the Caret Package

If you have looked into machine learning in R or own a copy of this
book, you may have noticed that there are a gazillion packages for it. This can be quite
overwhelming as a start, also given that most packages use different syntax and data structures.
Fortunately, there is the comprehensive caret package that brings together many
of these packages in a standardized way. At the time of writing this post, the package contained 238(!!!) implementations of machine learning algorithms. Knowing this package is thus enough
to solve almost any supervised problem. It comes with a uniform interface to several
machine learning algorithms and standardizes tasks such as

  • Data splitting
  • Data pre-processing
  • Feature selection
  • Variable importance estimation

There is an extensive manual for it and
I can only recommend to take a look at it. I also found this
introduction quite helpful.

Creating Training and Test Sets

Once the data is prepared for an analysis, you should split it into a
training and test set.
As far as I understand, there is no golden standard
on the ratio. I have seen 70/30, 75/25 and 90/10 as training/test split. I will go
with the Machine Learning with R book and take an 80/20 split. You may also want to
consider this stackoverflow question.

Splitting the data can be done with the createDataPartition() function.
The function tries to balance out the class distribution so that they are similar in
both the training and test set. The parameter p controls the split ratio.

set.seed(6185)

train_sample <- createDataPartition(fifa_tbl$position2,p = 0.8,list = FALSE)
train_data   <- fifa_tbl[train_sample,] %>% select(-position) 
test_data    <- fifa_tbl[-train_sample,] %>% select(-position) 
position GK D M O
training set 0.11 0.30 0.40 0.19
test set 0.11 0.3 0.4 0.19

Notice how evenly the data is split.

Training a Model

So now we are good to go for some machine learning mastery. As I mentioned before,
the caret package implements more than 200 methods making it hard to choose the “right” one.
I settled for four rather common methods listed below, together with some helpful
links.

  • K-Nearest Neighbors (Wiki,
    1,
    2)
  • Random Forest (Wiki,
    1)
  • Support Vector Machine (Wiki,
    1,
    2)
  • Neural Network (Wiki,
    1)

I will not spend any time on the details of these algorithms, since others did a much
greater job in introducing them.

Before any of these algorithms is trained, we have to set some basic control parameters
for the training step. This is done with the function trainControl(). There are
many different arguments that can be set, where the following are the most important for
our task1.

  • method defines the resampling method during training. We here use a repeated k-fold cross validation.
  • number sets the “k” in k-fold. We will use the standard 10 here.
  • repeats defines how often the cross validation should be repeated (3 times).

The train() function is used to train all our models. Most of its arguments are
self-explanatory, the once that are not are listed below.

  • preProcess is used to scale and center the data.
  • tuneGrid defines a grid of hyperparameters to be used in the training. At the end of
    the cross-validation, the best set of hyperparameters is chosen for the final model.2

After the training, we can check the performance on the training set with the function
predict(). The resulting performance can then be viewed with the function confusionMatrix().

K-Nearest Neighbors

The only hyperparameter of this model is k, the number of considered neighbors.

train_control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
grid_knn <- expand.grid(.k=seq(20,120,5))
fifa_knn <- train(position2~., data=train_data, method = "knn",
                   trControl = train_control,preProcess = c("center","scale"),
                   tuneGrid = grid_knn)

The summary of the training is given below.

fifa_knn
## k-Nearest Neighbors 
## 
## 14387 samples
##    34 predictor
##     4 classes: 'GK', 'D', 'M', 'O' 
## 
## Pre-processing: centered (33), scaled (33) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 12948, 12947, 12948, 12949, 12948, 12949, ... 
## Resampling results across tuning parameters:
## 
##   k    Accuracy   Kappa    
##    20  0.8199772  0.7415707
##    25  0.8222250  0.7447378
##    30  0.8220862  0.7443668
##    35  0.8236380  0.7465378
##    40  0.8229891  0.7455043
##    45  0.8222942  0.7444655
##    50  0.8216221  0.7434531
##    55  0.8216686  0.7434513
##    60  0.8222477  0.7442770
##    65  0.8225490  0.7446787
##    70  0.8222019  0.7441699
##    75  0.8215065  0.7430829
##    80  0.8214832  0.7430529
##    85  0.8205563  0.7416563
##    90  0.8206723  0.7418073
##    95  0.8206489  0.7417507
##   100  0.8201158  0.7409817
##   105  0.8192590  0.7397423
##   110  0.8191429  0.7395715
##   115  0.8195599  0.7401216
##   120  0.8193513  0.7398058
## 
## Accuracy was used to select the optimal model using  the largest value.
## The final value used for the model was k = 35.

The highest accuracy is achieved for k=35, which can also be seen by plotting the
result.

plot(fifa_knn)

fifa_knn_predict <- predict(fifa_knn,newdata = test_data)
confusionMatrix(fifa_knn_predict,test_data$position2)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   GK    D    M    O
##         GK  405    0    0    0
##         D     0  920  152    7
##         M     0  167 1168  207
##         O     0    1  114  453
## 
## Overall Statistics
##                                           
##                Accuracy : 0.8197          
##                  95% CI : (0.8067, 0.8321)
##     No Information Rate : 0.399           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.7409          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: GK Class: D Class: M Class: O
## Sensitivity             1.0000   0.8456   0.8145   0.6792
## Specificity             1.0000   0.9366   0.8269   0.9607
## Pos Pred Value          1.0000   0.8526   0.7575   0.7975
## Neg Pred Value          1.0000   0.9332   0.8704   0.9293
## Prevalence              0.1127   0.3027   0.3990   0.1856
## Detection Rate          0.1127   0.2560   0.3250   0.1260
## Detection Prevalence    0.1127   0.3002   0.4290   0.1580
## Balanced Accuracy       1.0000   0.8911   0.8207   0.8199

Random Forest

For the random forest, we have to decide on how many features should be used for each
split in the decision try. A rule of thumb is to used the square root of the
number of features. After some trial-and-error, I settled with 4, 8, 12, 16 and 20.

train_control <- trainControl(method = "repeatedcv",number = 10, repeats = 3)
grid_rf <- expand.grid(.mtry=c(4,8,12,16,20))
fifa_rf <- train(position2~., data=train_data, method = "rf",
                 trControl = train_control,preProcess = c("center","scale"),
                 tuneGrid = grid_rf)
fifa_rf
## Random Forest 
## 
## 14387 samples
##    34 predictor
##     4 classes: 'GK', 'D', 'M', 'O' 
## 
## Pre-processing: centered (33), scaled (33) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 12949, 12948, 12949, 12949, 12948, 12947, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##    4    0.8367979  0.7664186
##    8    0.8372147  0.7671352
##   12    0.8385584  0.7691039
##   16    0.8386507  0.7692242
##   20    0.8382106  0.7686364
## 
## Accuracy was used to select the optimal model using  the largest value.
## The final value used for the model was mtry = 16.
plot(fifa_rf)

fifa_rf_predict <- predict(fifa_rf,newdata = test_data)
confusionMatrix(fifa_rf_predict,test_data$position2)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   GK    D    M    O
##         GK  405    0    0    0
##         D     0  957  162    7
##         M     0  129 1149  186
##         O     0    2  123  474
## 
## Overall Statistics
##                                           
##                Accuracy : 0.8306          
##                  95% CI : (0.8179, 0.8427)
##     No Information Rate : 0.399           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.7576          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: GK Class: D Class: M Class: O
## Sensitivity             1.0000   0.8796   0.8013   0.7106
## Specificity             1.0000   0.9326   0.8542   0.9573
## Pos Pred Value          1.0000   0.8499   0.7848   0.7913
## Neg Pred Value          1.0000   0.9469   0.8662   0.9356
## Prevalence              0.1127   0.3027   0.3990   0.1856
## Detection Rate          0.1127   0.2663   0.3197   0.1319
## Detection Prevalence    0.1127   0.3133   0.4073   0.1667
## Balanced Accuracy       1.0000   0.9061   0.8277   0.8340

Support Vector Machines

We use a linear SVM here. If you are interested in non-linear ones, you should
familiarize yourself with the Kernel trick.
The hyperparameter to be set is a cost value, which determines the margin around the
separating hyperplanes (the bigger the value, the smaller the margin).

train_control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
grid_svm <- expand.grid(.cost=c(0.75, 0.9, 1, 1.1, 1.25,2))
fifa_svm_linear <- train(position2 ~., data = train_data, method = "svmLinear2",
                    trControl=train_control,
                    preProcess = c("center", "scale"),
                    tuneGrid = grid_svm)
fifa_svm_linear  
## Support Vector Machines with Linear Kernel 
## 
## 14387 samples
##    34 predictor
##     4 classes: 'GK', 'D', 'M', 'O' 
## 
## Pre-processing: centered (33), scaled (33) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 12949, 12949, 12949, 12949, 12947, 12949, ... 
## Resampling results across tuning parameters:
## 
##   cost  Accuracy   Kappa    
##   0.75  0.8333450  0.7614291
##   0.90  0.8333682  0.7614680
##   1.00  0.8334145  0.7615323
##   1.10  0.8333682  0.7614632
##   1.25  0.8332987  0.7613696
##   2.00  0.8330671  0.7610370
## 
## Accuracy was used to select the optimal model using  the largest value.
## The final value used for the model was cost = 1.
plot(fifa_svm_linear)  

fifa_svm_linear_predict <- predict(fifa_svm_linear,newdata = test_data)
confusionMatrix(fifa_svm_linear_predict,test_data$position2) 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   GK    D    M    O
##         GK  405    0    0    0
##         D     0  945  153    8
##         M     0  140 1164  194
##         O     0    3  117  465
## 
## Overall Statistics
##                                           
##                Accuracy : 0.8289          
##                  95% CI : (0.8162, 0.8411)
##     No Information Rate : 0.399           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.7547          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: GK Class: D Class: M Class: O
## Sensitivity             1.0000   0.8686   0.8117   0.6972
## Specificity             1.0000   0.9358   0.8454   0.9590
## Pos Pred Value          1.0000   0.8544   0.7770   0.7949
## Neg Pred Value          1.0000   0.9425   0.8712   0.9329
## Prevalence              0.1127   0.3027   0.3990   0.1856
## Detection Rate          0.1127   0.2629   0.3239   0.1294
## Detection Prevalence    0.1127   0.3077   0.4168   0.1628
## Balanced Accuracy       1.0000   0.9022   0.8285   0.8281

Neural Network

To train a neural network, we need to decide on the number of hidden nodes. The more
you choose, the more complex the model (but not necessarily better!).

train_control <- trainControl(method = "repeatedcv",number = 10, repeats = 3)
grid_nnet <- expand.grid(.decay=c(0.2,0.5,0.7),.size=c(1,5,10,15))
fifa_nnet <- train(position2~., data=train_data, method = "nnet",
                    trControl = train_control, preProcess = c("center","scale"),
                    tuneGrid = grid_nnet, maxit = 500, abstol=1e-2)
fifa_nnet
## Neural Network 
## 
## 14387 samples
##    34 predictor
##     4 classes: 'GK', 'D', 'M', 'O' 
## 
## Pre-processing: centered (33), scaled (33) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 12948, 12949, 12948, 12949, 12948, 12948, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  Accuracy   Kappa    
##   0.00    1    0.7963667  0.7100156
##   0.00    5    0.8301466  0.7570811
##   0.00   10    0.8170800  0.7387677
##   0.00   15    0.8030628  0.7187341
##   0.00   20    0.7938647  0.7063276
##   0.01    1    0.7930771  0.7049725
##   0.01    5    0.8320470  0.7598107
##   0.01   10    0.8223396  0.7462697
##   0.01   15    0.8170799  0.7387687
##   0.01   20    0.8082991  0.7262713
##   0.10    1    0.7961579  0.7102102
##   0.10    5    0.8319543  0.7596776
##   0.10   10    0.8291743  0.7559585
##   0.10   15    0.8215055  0.7451117
##   0.10   20    0.8138834  0.7342370
##   0.20    1    0.7943741  0.7078687
##   0.20    5    0.8338773  0.7624642
##   0.20   10    0.8298462  0.7568273
##   0.20   15    0.8233824  0.7476862
##   0.20   20    0.8179835  0.7400740
##   0.50    1    0.7937018  0.7067714
##   0.50    5    0.8346653  0.7634174
##   0.50   10    0.8308658  0.7582532
##   0.50   15    0.8264866  0.7521366
##   0.50   20    0.8229418  0.7471506
## 
## Accuracy was used to select the optimal model using  the largest value.
## The final values used for the model were size = 5 and decay = 0.5.
plot(fifa_nnet)

fifa_nnet_predict <- predict(fifa_nnet,newdata = test_data)
confusionMatrix(fifa_nnet_predict,test_data$position2)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   GK    D    M    O
##         GK  405    0    0    0
##         D     0  967  162    9
##         M     0  119 1144  190
##         O     0    2  128  468
## 
## Overall Statistics
##                                           
##                Accuracy : 0.8303          
##                  95% CI : (0.8176, 0.8424)
##     No Information Rate : 0.399           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.7573          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: GK Class: D Class: M Class: O
## Sensitivity             1.0000   0.8888   0.7978   0.7016
## Specificity             1.0000   0.9318   0.8569   0.9556
## Pos Pred Value          1.0000   0.8497   0.7873   0.7826
## Neg Pred Value          1.0000   0.9507   0.8645   0.9336
## Prevalence              0.1127   0.3027   0.3990   0.1856
## Detection Rate          0.1127   0.2691   0.3183   0.1302
## Detection Prevalence    0.1127   0.3166   0.4043   0.1664
## Balanced Accuracy       1.0000   0.9103   0.8274   0.8286

Summary

#best training performances
train_performance <- t(rbind(knn = apply(fifa_knn$results[,2:3],2,max),
                             rf  = apply(fifa_rf$results[,2:3],2,max),
                             svm = apply(fifa_svm_linear$results[,2:3],2,max),
                             nnet= apply(fifa_nnet$results[,3:4],2,max))) 
predicted <-  tibble(knn = fifa_knn_predict,
                     rf  = fifa_rf_predict,
                     svm = fifa_svm_linear_predict,
                     nnet= fifa_nnet_predict)
 
#test performance
test_performance <- apply(predicted, 2, postResample, obs = test_data$position2) 

Best training performances

knn rf svm nnet
Accuracy 0.8236 0.8387 0.8334 0.8348
Kappa 0.7465 0.7692 0.7615 0.7637

Best test performances

knn rf svm nnet
Accuracy 0.8197 0.8306 0.8289 0.8303
Kappa 0.7409 0.7576 0.7547 0.7573

The random forest predicted the correct position of players with the highest accuracy.
But really only by a tiny margin. If this would be some kind of competition, this would
make a difference. But since we did this “just for fun”, even the knn results are
acceptable.

Overall, all models did a very good job in predicting positions. If you take a closer
look at the confusion matrices, you will notice that goalkeepers were always 100%
correctly classified. This comes with no real surprise, given that their skills are
rather distinct from other players. Also, the dimension reduction in the last post
was a dead giveaway that it will be easy to separate them from the rest.

Addendum

This post should not be taken as a serious introduction to the caret package. I have left out
many tuning possibilities or hyperparameter for each model. I tried to link as much
as possible to other sources that do a way better job in describing the functionality of the package.


  1. You may notice that I set the same trainingControl in each code chunk. This
    is certainly not necessary. I just realized to late that I did that and I did not want to
    re-train all the models (which took an awful long time).

  2. You can also use tuneLength and let the function decide on the hyperparameters to use.

To leave a comment for the author, please follow the link and comment on their blog: schochastics.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)