Blog Archives

Binning Outliers in a Histogram

April 26, 2017
By
Binning Outliers in a Histogram

I guess we all use it, the good old histogram. One of the first things we are taught in Introduction to Statistics and routinely applied whenever coming across a new continuous variable. However, it easily gets messed up by outliers. Putting most of the data into a single bin or a few bins, and scattering the outliers barely visible...

Read more »

Preparing Datetime Data for Analysis with padr and dplyr

March 19, 2017
By
Preparing Datetime Data for Analysis with padr and dplyr

Two months ago padr was introduced, followed by an improved version that allowed for applying pad on group level. See the introduction blogs or the vignette("padr") for more package information. In this blog I give four more elaborate examples on how to go from raw data to insight with padr, dplyr and ggplot2. They might serve as recipes for...

Read more »

Tree-based univariate testing

February 26, 2017
By
Tree-based univariate testing

When building a predictive model it is a good idea to do a univariate analysis, before throwing the whole bunch in a complex algorithm. This way we get a feel for the potential contribution of each predictor. When a lot of predictors are available one can often make a first selection and only use predictors that show univariate predictive...

Read more »

padr::pad does now do group padding

February 18, 2017
By

A few weeks ago padr was introduced on CRAN, allowing you to quickly get datetime data ready for analysis. If you have missed this, see the introduction blog or vignette("padr") for a general introduction. In v0.2.0 the pad function is extended with a group argument, which makes your life a lot easier when you want to do padding within...

Read more »

A wrapper around nested ifelse

February 7, 2017
By

The ifelse function is the way to do vectorised if then else in R. One of the first cool things I learned to do in R a few years back, I got from Norman Matloff’s The Art of R Programming. When you have more than one if then statements, you just nest multiple ifelse functions before you reach the...

Read more »

Introducing padr

January 17, 2017
By
Introducing padr

I am happy to introduce the padr package, which is now available on CRAN. If you frequently work with data containing a timestamp, especially automatically created data, you might find this package helpful. It solves two problems that you can be confronted with when preparing datetime data for analysis. First, data is often recorded on too low a level...

Read more »

Building a column selecter

November 27, 2016
By
Building a column selecter

Maybe the following sounds familiar. You have a large data set with many, many columns of which the most are irrelevant to you. Typically, a dump from a database or the full set extracted from an API. Several times I found myself the better part of an afternoon going back and forth between a view of the data where...

Read more »

Designing our bathroom with R

August 10, 2016
By
Designing our bathroom with R

R has been an indispensable tool since I started working with it about five years ago. Of course in my day job as a data scientist I couldn’t live without it, but it also proved to be a great aid in private life. Recently we bought our first house and R came to the rescue several times in the...

Read more »

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)