Blog Archives

Objects types and some useful R functions for beginners

December 23, 2018

This blog post is an excerpt of my ebook Modern R with the tidyverse that you can read for free here. This is taken from Chapter 2, which explains the different R objects you can manipulate as well as some functions to get you started.

Objects, types and useful R functions to get started

All objects in R have a given type. You...


Using the tidyverse for more than data manipulation: estimating pi with Monte Carlo methods

December 20, 2018

This blog post is an excerpt of my ebook Modern R with the tidyverse that you can read for free here. This is taken from Chapter 5, which presents the {tidyverse} packages and how to use them to compute descriptive statistics and manipulate data. In the text below, I show how you can use the {tidyverse} functions and principles for the estimation of...


Manipulate dates easily with {lubridate}

December 14, 2018

This blog post is an excerpt of my ebook Modern R with the tidyverse that you can read for free here. This is taken from Chapter 5, which presents the {tidyverse} packages and how to use them to compute descriptive statistics and manipulate data. In the text below, I scrape a table from Wikipedia, which shows when African countries gained independence from other...


What hyper-parameters are, and what to do with them; an illustration with ridge regression


This blog post is an excerpt of my ebook Modern R with the tidyverse that you can read for free here. This is taken from Chapter 7, which deals with statistical models. In the text below, I explain what hyper-parameters are, and as an example I run a ridge regression using the {glmnet} package. The book is still being written, so comments are...


A tutorial on tidy cross-validation with R

November 24, 2018

Introduction

This blog post will use several packages from the {tidymodels} collection of packages, namely {recipes}, {rsample} and {parsnip}, to train a random forest the tidy way. I will also use {mlrMBO} to tune the hyper-parameters of the random forest.

Set up

Let’s load the needed packages:

library("tidyverse")
library("tidymodels")
library("parsnip")
library("brotools")
library("mlbench")

Load the data, included in the {mlbench} package:

data("BostonHousing2")

I will train a random forest to predict the housing price, which is the...


The best way to visit Luxembourguish castles is doing data science + combinatorial optimization

November 20, 2018

Inspired by David Schoch’s blog post, Traveling Beerdrinker Problem. Check out his blog, he has some amazing posts!

Introduction

Luxembourg, as any proper European country, is full of castles. According to Wikipedia, “By some optimistic estimates, there are as many as 130 castles in Luxembourg but more realistically there are probably just over a hundred, although many of these could be considered large residences or manor...


Using a genetic algorithm for the hyperparameter optimization of a SARIMA model

November 15, 2018

Introduction

In this blog post, I’ll use the data that I cleaned in a previous blog post, which you can download here. If you want to follow along, download the monthly data. In my last blog post I showed how to perform a grid search the “tidy” way. As an example, I looked for the right hyperparameters of a SARIMA model. However, the goal of...


Searching for the optimal hyper-parameters of an ARIMA model in parallel: the tidy gridsearch approach

November 14, 2018

Introduction

In this blog post, I’ll use the data that I cleaned in a previous blog post, which you can download here. If you want to follow along, download the monthly data. In the previous blog post, I used the auto.arima() function to very quickly get a “good-enough” model to predict future monthly total passengers flying from LuxAirport. “Good-enough” models can be all you need in...


Easy time-series prediction with R: a tutorial with air traffic data from Lux Airport

November 13, 2018

In this blog post, I will show you how you can quickly and easily forecast a univariate time series. I am going to use data from the EU Open Data Portal on air passenger transport. You can find the data here. I downloaded the data in the TSV format for Luxembourg Airport, but you could repeat the analysis for any airport. Once you...


Analyzing NetHack data, part 2: What players kill the most


Link to webscraping the data
Link to Analysis, part 1

Introduction

This is the third blog post that deals with data from the game NetHack, and oh boy, did a lot of things happen since the last blog post! Here’s a short timeline of the events: I scraped data from alt.org/nethack and made a package with the data available on Github (that package was too...

