# Articles by schochastics

### Rdew Valley: Optimizing Farming with R

November 13, 2018 |

I recently picked up a copy of my favorite game Stardew Valley again. If you don’t know the game, I can highly recommend it! You inherit a pixel farm and you are in charge of everything. Crops, animals, fishing, mining and never forget to socialize. My plan was to ... [Read more...]

### Analyzing the Greatest Strikers in Football II: Visualizing Data

October 6, 2018 |

This is the second part of Analyzing the Greatest Strikers in Football. In the first part, we created the function get_goals(), which allows us to conveniently scrape detailed information on players' career goals from transfermarkt.co.uk. In this part, we are going to explore the data.
```r
library(tidyverse) # for data wrangling
library(lubridate) # for date formats
library(ggimage)   # adding images to ggplot
library(patchwork) # attaching ggplot objects
library(viridis)   # viridis color schemes
```

### Analyzing the Greatest Strikers in Football I: Getting Data

October 4, 2018 |

I do not always come up with new ideas for my blog, but rather get inspired by the great work of others. In this case, it was a reddit post by u/Cheapo_Sam, who charted world football's greatest goal scorers in a marvelous way. According to the post, the ... [Read more...]

### Six Degrees of Zlatan Ibrahimovic

September 27, 2018 |

This post is based on the Six Degrees of Kevin Bacon, which itself is an adaptation of the Erdős number in math. Readers familiar with the concepts can skip the following paragraph and go directly to the calculation of the Zlatan number. I have done this before on my ...
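Both the Bacon number and the Erdős number are shortest-path distances from one fixed node in a collaboration network, so a "Zlatan number" can be sketched the same way. The following is a minimal base R illustration using breadth-first search; the edge list and the player labels `A` to `F` are made-up placeholders, not data from the post, and `bfs_distance()` is a hypothetical helper. In practice one would probably rely on a graph library such as igraph instead.

```r
# Toy undirected "played together" network; the labels are placeholders,
# not real squad data.
edges <- rbind(
  c("Zlatan", "A"),
  c("Zlatan", "B"),
  c("A", "C"),
  c("C", "D"),
  c("B", "D"),
  c("E", "F")   # component that is not connected to Zlatan
)

# Breadth-first search: distance (in co-player steps) from `source`
# to every node; unreachable nodes keep the distance Inf.
bfs_distance <- function(edges, source) {
  nodes <- unique(as.vector(edges))
  dist <- setNames(rep(Inf, length(nodes)), nodes)
  dist[source] <- 0
  queue <- source
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    # neighbours of `current`, looking at both edge directions
    nb <- c(edges[edges[, 1] == current, 2],
            edges[edges[, 2] == current, 1])
    for (v in nb) {
      if (is.infinite(dist[v])) {
        dist[v] <- dist[current] + 1
        queue <- c(queue, v)
      }
    }
  }
  dist
}

zlatan_number <- bfs_distance(edges, "Zlatan")
print(zlatan_number)
```

Players in a different component than the source keep an infinite number, which mirrors how an undefined Erdős number is usually treated.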

### Stress based graph layouts

September 12, 2018 |

I academically grew up among graph drawers, that is, computer scientists and mathematicians interested in deriving two-dimensional depictions of graphs. One may disparagingly call it pixel science, yet a lot of hard theoretical work is put into producing pretty graph layouts. Although I am not at all an expert in ... [Read more...]

### Fast Fiedler Vector Computation

June 23, 2018 |

This is a short post on how to quickly calculate the Fiedler vector for large graphs with the igraph package.
```r
# used libraries
library(igraph)    # for network data structures and tools
library(microbenchmark)    # for benchmark results
```
#### Fiedler Vector with eigen

My go-to approach at the start was using the eigen() function to compute the whole spectrum of the Laplacian matrix.
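```r
g <- sample_gnp(n = 100, p = 0.1, directed = FALSE, loops = FALSE)
M <- laplacian_matrix(g, sparse = FALSE)
spec <- eigen(M)
# number of (numerically) zero eigenvalues = number of connected components
comps <- sum(round(spec$values, 8) == 0)
# eigen() returns eigenvalues in decreasing order, so the Fiedler vector
# (smallest non-zero eigenvalue) is counted from the end of the spectrum
fiedler <- spec$vectors[, ncol(spec$vectors) - comps]
```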
While this is easy ... [Read more...]
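To make the full-spectrum approach concrete, here is a small dependency-free sketch on a toy graph where the answer is easy to check by hand. The graph (two triangles joined by one edge) and all variable names are assumptions chosen for illustration; only base R's `eigen()` is used. Since `eigen()` returns eigenvalues in decreasing order, the sketch indexes the Fiedler vector from the end of the spectrum.

```r
# Adjacency matrix of a toy graph: two triangles (1-2-3 and 4-5-6)
# joined by the single edge 3-4.
A <- matrix(0, 6, 6)
edge_list <- rbind(c(1, 2), c(1, 3), c(2, 3),
                   c(4, 5), c(4, 6), c(5, 6),
                   c(3, 4))
A[edge_list] <- 1
A[edge_list[, 2:1]] <- 1   # mirror the entries to make A symmetric

# Graph Laplacian L = D - A
L <- diag(rowSums(A)) - A

# eigen() sorts eigenvalues in decreasing order, so the zero eigenvalues
# (one per connected component) sit at the end of the spectrum.
spec <- eigen(L, symmetric = TRUE)
n <- nrow(L)
comps <- sum(round(spec$values, 8) == 0)

# The Fiedler vector belongs to the smallest non-zero eigenvalue.
lambda2 <- spec$values[n - comps]
fiedler <- spec$vectors[, n - comps]

# The sign pattern of the Fiedler vector separates the two triangles.
print(lambda2)
print(sign(fiedler))
```

For this graph the smallest non-zero eigenvalue is (5 - sqrt(17)) / 2, and the signs of the Fiedler vector split nodes 1 to 3 from nodes 4 to 6, which is the usual spectral bisection reading.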

### Analyzing NBA Player Data III: Similarity Networks

March 9, 2018 |

This is the last part of the mini series Analyzing NBA Player Data. The first part was concerned with scraping and cleaning player statistics from any NBA season. The second part showed how to use principal component analysis and k-means clustering to “revolutionize” player positions. Which kind of failed. ... [Read more...]

### Analyzing NBA Player Data II: Clustering Players

March 3, 2018 |

This is the second post of my little series Analyzing NBA Player Data. The first part was concerned with scraping and cleaning player statistics from any NBA season. This post deals with gaining some insight into the player stats. In particular, clustering players according to their stats to produce ... [Read more...]
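Clustering players by their stats can be sketched with base R's `kmeans()`. The example below is only an illustration under stated assumptions: the data are synthetic (two well separated groups with made-up columns `stat_a` and `stat_b`), not NBA statistics, and the choice of two clusters is arbitrary.

```r
set.seed(42)  # k-means starts from random centers

# Synthetic stand-in for per-player statistics: two well separated
# groups of 50 "players" with two made-up stat columns each.
stats <- rbind(
  matrix(rnorm(100, mean = 0),  ncol = 2),
  matrix(rnorm(100, mean = 10), ncol = 2)
)
colnames(stats) <- c("stat_a", "stat_b")
true_group <- rep(1:2, each = 50)

# Standardize the columns so no single stat dominates the distances,
# then cluster; nstart repeats the random initialization.
km <- kmeans(scale(stats), centers = 2, nstart = 10)

# Cross-tabulate the found clusters against the known groups.
print(table(cluster = km$cluster, group = true_group))
```

On real player data the groups are rarely this clean, so the number of clusters would need to be chosen with more care than in this toy setup.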

### Analyzing NBA Player Data I: Getting Data

March 2, 2018 |

As a football (soccer) data enthusiast, I have always been jealous of the amount of available data for American sports. While much of the interesting football data is proprietary, you can get virtually anything of interest for the NBA, MLB, NFL or NHL. I have decided to move away ... [Read more...]

### Using UMAP in R with rPython

February 13, 2018 |

I wrote about dimensionality reduction methods before, and now there seems to be a new rising star in that field, namely the Uniform Manifold Approximation and Projection, UMAP for short. The paper can be found here, but be warned: It is really math-heavy. From the abstract: UMAP is constructed from a ... [Read more...]

### Sample Entropy with Rcpp

February 6, 2018 |

Entropy. I still shiver when I hear that word, since I never fully understood that concept. Today marks the first time I was kind of forced to look into it in more detail. And by “in detail”, I mean I found a StackOverflow question that had something to do with ... [Read more...]

### SOMs and ggplot

January 23, 2018 |

```r
# used packages
library(tidyverse)  # for data wrangling
library(stringr)    # for string manipulations
library(kohonen)    # implements self organizing maps
library(ggforce)    # for additional ggplot features
```
I introduced self-organizing maps (SOM) in a previous post and since then I have been using the kohonen package on a daily basis. However, I prefer ggplot-style plotting, so I reimplemented the SOM plots of the package with the ggplot2 package. But don’t get me wrong, the ... [Read more...]

### Traveling Beerdrinker Problem

January 18, 2018 |

Whenever I participate in a Science Slam, I try to work in an analysis of something typical for the respective city. My next gig will be in Munich, so there are two natural options: beer or football. In the end I chose both, but here I will focus on the ... [Read more...]

### A wild R package appears! Pokemon/Gameboy inspired plots in R

December 16, 2017 |

I have a fairly long commute every day and I always try to keep myself occupied with little projects. One of my first projects was to increase my knowledge of how to create R packages. The result of it is Rokemon, a Pokemon/Game Boy inspired package. In this post, I ... [Read more...]

### Predicting Player Positions of FIFA 18 Players

November 23, 2017 |

In this post, I will use the results of the exploratory analysis from the previous post and try to predict the position of players in FIFA 18 using different machine learning algorithms. As a quick reminder, these were the figures we obtained using PCA, t-SNE and a self organizing map.
```r
# used packages
library(tidyverse)  # for data wrangling
library(hrbrthemes) # nice themes for ggplot
library(caret) # ML algorithms
```
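The general workflow behind such a prediction task (hold out part of the data, fit a classifier on the rest, compare predictions with the true labels) can be sketched without any extra packages. The example below is a stand-in only: it uses the built-in `iris` data in place of the FIFA 18 players and a simple nearest-centroid rule in place of the caret models from the post; `predict_centroid()` is a hypothetical helper.

```r
set.seed(1)  # reproducible train/test split

# Stand-in data: iris species play the role of player positions,
# the four measurements play the role of player attributes.
X <- as.matrix(iris[, 1:4])
y <- iris$Species

# Hold out 30% of the rows for testing.
test_idx <- sample(nrow(X), size = round(0.3 * nrow(X)))
X_train <- X[-test_idx, ]
y_train <- y[-test_idx]
X_test  <- X[test_idx, ]
y_test  <- y[test_idx]

# Nearest-centroid classifier: one mean vector per class
# (rows = classes, columns = attributes) ...
centroids <- apply(X_train, 2, function(col) tapply(col, y_train, mean))

# ... and each test row is assigned to the closest centroid.
predict_centroid <- function(x) {
  diffs <- centroids - matrix(x, nrow(centroids), length(x), byrow = TRUE)
  names(which.min(rowSums(diffs^2)))
}
pred <- factor(apply(X_test, 1, predict_centroid), levels = levels(y))

# Confusion matrix and overall accuracy on the held-out rows.
accuracy <- mean(pred == y_test)
print(table(predicted = pred, actual = y_test))
print(accuracy)
```

The confusion matrix is usually more informative than the single accuracy number, since it shows which classes get mistaken for each other.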

### Dimensionality Reduction Methods Using FIFA 18 Player Data

November 18, 2017 |

In this post, I will introduce three different methods for dimensionality reduction of large datasets.
```r
# used packages
library(tidyverse)  # for data wrangling
library(stringr)    # for string manipulations
library(ggbiplot)   # pca biplot with ggplot
library(Rtsne)      # implements the t-SNE algorithm
library(kohonen)    # implements self organizing maps
library(hrbrthemes) # nice themes for ggplot
library(GGally)     # to produce scatterplot matrices
```
#### Data

The data we use comes from Kaggle and contains around 18,000 players of the game FIFA 18 with 75 features per player.
`glimpse(fifa_tbl)`
```
## Observations: 17,981
## Variables: 75
## $ X1                    <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12...
## $ Name                  <chr> "Cristiano Ronaldo", "L. Messi", "Neymar...
## $ Age                   <int> 32, 30, 25, 30, 31, 28, 26, 26, 27, 29, ...
## $ Photo                 <chr> "https://cdn.sofifa.org/48/18/players/20...
## $ Nationality           <chr> "Portugal", "Argentina", "Brazil", "Urug...
## $ Flag                  <chr> "https://cdn.sofifa.org/flags/38.png", "...
## $ Overall               <int> 94, 93, 92, 92, 92, 91, 90, 90, 90, 90, ...
## $ Potential             <int> 94, 93, 94, 92, 92, 91, 92, 91, 90, 90, ...
## $ Club                  <chr> "Real Madrid CF", "FC Barcelona", "Paris...
## $ `Club Logo`           <chr> "https://cdn.sofifa.org/24/18/teams/243....
## $ Value                 <chr> "€95.5M", "€105M", "€123M", "€97M", "€61...
## $ Wage                  <chr> "€565K", "€565K", "€280K", "€510K", "€23...
## $ Special               <int> 2228, 2154, 2100, 2291, 1493, 2143, 1458...
## $ Acceleration          <int> 89, 92, 94, 88, 58, 79, 57, 93, 60, 78, ...
## $ Aggression            <int> 63, 48, 56, 78, 29, 80, 38, 54, 60, 50, ...
## $ Agility               <int> 89, 90, 96, 86, 52, 78, 60, 93, 71, 75, ...
## $ Balance               <int> 63, 95, 82, 60, 35, 80, 43, 91, 69, 69, ...
## $ `Ball control`        <int> 93, 95, 95, 91, 48, 89, 42, 92, 89, 85, ...
## $ Composure             <int> 95, 96, 92, 83, 70, 87, 64, 87, 85, 86, ...
## $ Crossing              <int> 85, 77, 75, 77, 15, 62, 17, 80, 85, 68, ...
## $ Curve                 <int> 81, 89, 81, 86, 14, 77, 21, 82, 85, 74, ...
## $ Dribbling             <int> 91, 97, 96, 86, 30, 85, 18, 93, 79, 84, ...
## $ Finishing             <int> 94, 95, 89, 94, 13, 91, 13, 83, 76, 91, ...
## $ `Free kick accuracy`  <int> 76, 90, 84, 84, 11, 84, 19, 79, 84, 62, ...
## $ `GK diving`           <int> 7, 6, 9, 27, 91, 15, 90, 11, 10, 5, 11, ...
## $ `GK handling`         <int> 11, 11, 9, 25, 90, 6, 85, 12, 11, 12, 8,...
## $ `GK kicking`          <int> 15, 15, 15, 31, 95, 12, 87, 6, 13, 7, 9,...
## $ `GK positioning`      <int> 14, 14, 15, 33, 91, 8, 86, 8, 7, 5, 7, 1...
## $ `GK reflexes`         <int> 11, 8, 11, 37, 89, 10, 90, 8, 10, 10, 11...
## $ `Heading accuracy`    <int> 88, 71, 62, 77, 25, 85, 21, 57, 54, 86, ...
## $ Interceptions         <int> 29, 22, 36, 41, 30, 39, 30, 41, 85, 20, ...
## $ Jumping               <int> 95, 68, 61, 69, 78, 84, 67, 59, 32, 79, ...
## $ `Long passing`        <int> 77, 87, 75, 64, 59, 65, 51, 81, 93, 59, ...
## $ `Long shots`          <int> 92, 88, 77, 86, 16, 83, 12, 82, 90, 82, ...
## $ Marking               <int> 22, 13, 21, 30, 10, 25, 13, 25, 63, 12, ...
## $ Penalties             <int> 85, 74, 81, 85, 47, 81, 40, 86, 73, 70, ...
## $ Positioning           <int> 95, 93, 90, 92, 12, 91, 12, 85, 79, 92, ...
## $ Reactions             <int> 96, 95, 88, 93, 85, 91, 88, 85, 86, 88, ...
## $ `Short passing`       <int> 83, 88, 81, 83, 55, 83, 50, 86, 90, 75, ...
## $ `Shot power`          <int> 94, 85, 80, 87, 25, 88, 31, 79, 87, 88, ...
## $ `Sliding tackle`      <int> 23, 26, 33, 38, 11, 19, 13, 22, 69, 18, ...
## $ `Sprint speed`        <int> 91, 87, 90, 77, 61, 83, 58, 87, 52, 80, ...
## $ Stamina               <int> 92, 73, 78, 89, 44, 79, 40, 79, 77, 72, ...
## $ `Standing tackle`     <int> 31, 28, 24, 45, 10, 42, 21, 27, 82, 22, ...
## $ Strength              <int> 80, 59, 53, 80, 83, 84, 64, 65, 74, 85, ...
## $ Vision                <int> 85, 90, 80, 84, 70, 78, 68, 86, 88, 70, ...
## $ Volleys               <int> 88, 85, 83, 88, 11, 87, 13, 79, 82, 88, ...
## $ CAM                   <dbl> 89, 92, 88, 87, NA, 84, NA, 88, 83, 81, ...
## $ CB                    <dbl> 53, 45, 46, 58, NA, 57, NA, 47, 72, 46, ...
## $ CDM                   <dbl> 62, 59, 59, 65, NA, 62, NA, 61, 82, 52, ...
## $ CF                    <dbl> 91, 92, 88, 88, NA, 87, NA, 87, 81, 84, ...
## $ CM                    <dbl> 82, 84, 79, 80, NA, 78, NA, 81, 87, 71, ...
## $ ID                    <int> 20801, 158023, 190871, 176580, 167495, 1...
## $ LAM                   <dbl> 89, 92, 88, 87, NA, 84, NA, 88, 83, 81, ...
## $ LB                    <dbl> 61, 57, 59, 64, NA, 58, NA, 59, 76, 51, ...
## $ LCB                   <dbl> 53, 45, 46, 58, NA, 57, NA, 47, 72, 46, ...
## $ LCM                   <dbl> 82, 84, 79, 80, NA, 78, NA, 81, 87, 71, ...
## $ LDM                   <dbl> 62, 59, 59, 65, NA, 62, NA, 61, 82, 52, ...
## $ LF                    <dbl> 91, 92, 88, 88, NA, 87, NA, 87, 81, 84, ...
## $ LM                    <dbl> 89, 90, 87, 85, NA, 82, NA, 87, 81, 79, ...
## $ LS                    <dbl> 92, 88, 84, 88, NA, 88, NA, 82, 77, 87, ...
## $ LW                    <dbl> 91, 91, 89, 87, NA, 84, NA, 88, 80, 82, ...
## $ LWB                   <dbl> 66, 62, 64, 68, NA, 61, NA, 64, 78, 55, ...
## $ `Preferred Positions` <chr> "ST LW", "RW", "LW", "ST", "GK", "ST", "...
## $ RAM                   <dbl> 89, 92, 88, 87, NA, 84, NA, 88, 83, 81, ...
## $ RB                    <dbl> 61, 57, 59, 64, NA, 58, NA, 59, 76, 51, ...
## $ RCB                   <dbl> 53, 45, 46, 58, NA, 57, NA, 47, 72, 46, ...
## $ RCM                   <dbl> 82, 84, 79, 80, NA, 78, NA, 81, 87, 71, ...
## $ RDM                   <dbl> 62, 59, 59, 65, NA, 62, NA, 61, 82, 52, ...
## $ RF                    <dbl> 91, 92, 88, 88, NA, 87, NA, 87, 81, 84, ...
## $ RM                    <dbl> 89, 90, 87, 85, NA, 82, NA, 87, 81, 79, ...
## $ RS                    <dbl> 92, 88, 84, 88, NA, 88, NA, 82, 77, 87, ...
## $ RW                    <dbl> 91, 91, 89, 87, NA, 84, NA, 88, 80, 82, ...
## $ RWB                   <dbl> 66, 62, 64, 68, NA, 61, NA, 64, 78, 55, ...
## $ ST                    <dbl> 92, 88, 84, 88, NA, 88, NA, 82, 77, 87, ...
```
In this post, we are only interested in the attributes and the ... [Read more...]
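As a rough illustration of the first of the three methods, here is a minimal PCA sketch in base R. Since `fifa_tbl` is not available here, the built-in `USArrests` table serves as a stand-in for a table of numeric player attributes; with the real data one would first have to select the numeric attribute columns. This is a sketch under those assumptions, not the analysis from the post.

```r
# Built-in stand-in for a table of numeric player attributes.
attrs <- USArrests

# PCA on centered and scaled columns, so that attributes measured on
# different scales contribute equally.
pca <- prcomp(attrs, center = TRUE, scale. = TRUE)

# Share of the total variance captured by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Coordinates of the observations on the first two components,
# i.e. the two-dimensional embedding one would plot.
embedding <- pca$x[, 1:2]
print(head(embedding))
```

The variance shares indicate how much information a two-dimensional plot of `embedding` retains, which is a useful check before interpreting any visible groups.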