Blog Archives

What hyper-parameters are, and what to do with them; an illustration with ridge regression

This blog post is an excerpt from my ebook, Modern R with the tidyverse, which you can read for free here. It is taken from Chapter 7, which deals with statistical models. In the text below, I explain what hyper-parameters are and, as an example, run a ridge regression using the {glmnet} package. The book is still being written, so comments are...
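To make the idea of a hyper-parameter concrete, here is a minimal sketch in base R. It is not the {glmnet} code from the post: it uses the closed-form ridge solution, the built-in mtcars data as a stand-in, and a hypothetical helper called ridge_coef(). The penalty lambda is the hyper-parameter, because it is set by the analyst rather than estimated from the data.

```r
# Closed-form ridge regression on mtcars (a stand-in data set, not the one from the post).
X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp")]))  # standardised predictors
y <- mtcars$mpg - mean(mtcars$mpg)                      # centred response, so no intercept is needed

# ridge_coef() is a hypothetical helper: it solves (X'X + lambda * I) beta = X'y.
# lambda is the hyper-parameter; lambda = 0 gives ordinary least squares.
ridge_coef <- function(X, y, lambda) {
  p <- ncol(X)
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))
}

# One column of coefficients per candidate value of lambda.
lambdas <- c(0, 1, 10, 100)
coefs <- sapply(lambdas, function(l) ridge_coef(X, y, l))
colnames(coefs) <- paste0("lambda_", lambdas)
coefs
```

As lambda grows, the coefficients shrink towards zero. Note that {glmnet} scales its objective function differently, so its lambda values are not directly comparable to the ones in this sketch.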

A tutorial on tidy cross-validation with R

November 24, 2018

Introduction
This blog post uses several packages from the {tidymodels} collection, namely {recipes}, {rsample} and {parsnip}, to train a random forest the tidy way. I will also use {mlrMBO} to tune the hyper-parameters of the random forest.

Set up
Let’s load the needed packages:

library("tidyverse")
library("tidymodels")
library("parsnip")
library("brotools")
library("mlbench")

Load the data, included in the {mlbench} package:

data("BostonHousing2")

I will train a random forest to predict the housing price, which is the...
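For readers who have not met cross-validation before, the following is a rough base R sketch of the idea. It is not the {rsample} workflow from the post: it assumes the built-in mtcars data and a plain linear model as stand-ins for BostonHousing2 and the random forest, and the choice of 5 folds is arbitrary.

```r
set.seed(1234)
k <- 5

# Randomly assign each row of mtcars to one of k folds.
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

# For each fold: fit on the remaining folds, then compute the RMSE on the held-out fold.
fold_rmse <- sapply(seq_len(k), function(i) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)  # stand-in model, not a random forest
  sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
})

# The cross-validated error is the average over the k held-out folds.
cv_rmse <- mean(fold_rmse)
cv_rmse
```

When tuning hyper-parameters, this average held-out error is the quantity one would compare across candidate settings.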

The best way to visit Luxembourguish castles is doing data science + combinatorial optimization

November 20, 2018

Inspired by David Schoch’s blog post, Traveling Beerdrinker Problem. Check out his blog, he has some amazing posts!

Introduction
Luxembourg, like any proper European country, is full of castles. According to Wikipedia, “By some optimistic estimates, there are as many as 130 castles in Luxembourg but more realistically there are probably just over a hundred, although many of these could be considered large residences or manor...

Using a genetic algorithm for the hyperparameter optimization of a SARIMA model

November 15, 2018

Introduction
In this blog post, I’ll use the data that I cleaned in a previous blog post, which you can download here. If you want to follow along, download the monthly data. In my last blog post I showed how to perform a grid search the “tidy” way. As an example, I looked for the right hyperparameters of a SARIMA model. However, the goal of...

Searching for the optimal hyper-parameters of an ARIMA model in parallel: the tidy gridsearch approach

November 14, 2018

Introduction
In this blog post, I’ll use the data that I cleaned in a previous blog post, which you can download here. If you want to follow along, download the monthly data. In the previous blog post, I used the auto.arima() function to very quickly get a “good-enough” model to predict future monthly total passengers flying from LuxAirport. “Good-enough” models can be all you need in...
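To give a flavour of what a grid search over ARIMA orders looks like, here is a small sequential sketch in base R. It is not the parallel, tidy approach from the post: it assumes the built-in AirPassengers series as a stand-in for the Luxembourg data, holds the differencing order and the seasonal part fixed to keep the grid tiny, and uses the AIC as the selection criterion.

```r
y <- log(AirPassengers)  # built-in monthly series, used here as a stand-in

# Candidate non-seasonal orders; d and the seasonal part are held fixed (an assumption).
grid <- expand.grid(p = 0:2, q = 0:2)

# Fit one model per row of the grid; a fit that fails gets an AIC of NA.
grid$aic <- mapply(function(p, q) {
  tryCatch(
    AIC(arima(y, order = c(p, 1, q),
              seasonal = list(order = c(0, 1, 1), period = 12))),
    error = function(e) NA_real_
  )
}, grid$p, grid$q)

# Keep the combination with the lowest AIC.
best <- grid[which.min(grid$aic), ]
best
```

Each row of the grid is fitted independently of the others, which is why this kind of search lends itself to parallel execution.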

Easy time-series prediction with R: a tutorial with air traffic data from Lux Airport

November 13, 2018

In this blog post, I will show you how you can quickly and easily forecast a univariate time series. I am going to use data from the EU Open Data Portal on air passenger transport. You can find the data here. I downloaded the data in the TSV format for Luxembourg Airport, but you could repeat the analysis for any airport. Once you...

Analyzing NetHack data, part 2: What players kill the most

Link to webscraping the data
Link to Analysis, part 1

Introduction
This is the third blog post that deals with data from the game NetHack, and oh boy, did a lot of things happen since the last blog post! Here’s a short timeline of the events: I scraped data from alt.org/nethack and made a package with the data available on GitHub (that package was too...

Analyzing NetHack data, part 1: What kills the players

Abstract
In this post, I will analyse the data I scraped and put into an R package, which I called {nethack}. NetHack is a roguelike game; for more context, read my previous blog post. You can install the {nethack} package from GitHub and play around with the data yourself:

devtools::install_github("b-rodrigues/nethack")

And to use it:

library(nethack)
data("nethack")

The data contains information on games played from 2001 to...

From webscraping data to releasing it as an R package to share with the world: a full tutorial with data from NetHack

If someone told me a decade ago (back before I'd ever heard the term "roguelike") what I'd be doing today, I would have trouble believing this... Yet here we are. pic.twitter.com/N6Hh6A4tWl
Josh Ge (@GridSageGames), June 21, 2018

Abstract
In this post, I am going to show you how you can scrape tables from a website, and then create a package with the tidied...

Maps with pie charts on top of each administrative division: an example with Luxembourg’s elections data

Abstract
You can find the data used in this blog post here: https://github.com/b-rodrigues/elections_lux

This is a follow-up to a previous blog post where I extracted data on the 2018 Luxembourgish elections from Excel workbooks. Now that I have the data, I will create a map of Luxembourg by commune, with pie charts of the results on top of each commune! To do this,...
