Blog Archives

A ggplot-based Marimekko/Mosaic plot

November 1, 2017
By
A ggplot-based Marimekko/Mosaic plot

One of my first baby steps into the open source world, was when I answered this SO question over four years ago. Recently I revisited the post and saw that Z.Lin did a very nice and more modern implementation, using dplyr and facetting in ggplot2. I decided to merge here ideas with mine to create a general function that...

Read more »

Non-standard evaluation, how tidy eval builds on base R

September 10, 2017
By

As with many aspects of the tidyverse, its non-standard evaluation (NSE) implementation is not something entirely new, but built on top of base R. What makes this one so challenging to get your mind around, is that the Honorable Doctor Sir Lord General and friends brought concepts to the realm of the mortals that many of us had no,...

Read more »

Tidy evaluation, most common actions

August 25, 2017
By

Tidy evaluation is a bit challenging to get your head around. Even after reading programming with dplyr several times, I still struggle when creating functions from time to time. I made a small summary of the most common actions I perform, so I don’t have to dig in the vignettes and on stackoverflow over and over. Each is accompanied...

Read more »

Span Dates and Times without Overhead

July 26, 2017
By

I am working on v.0.4.0 of the padr package this summer. Two new features that will be added are wrappers around seq.Date and seq.POSIXt. Since it is going to take a while before the new release is on CRAN, I go ahead and do an early presentation of these functions. Date and datetime parsing in base R are powerful...

Read more »

Quickly Check your id Variables

July 20, 2017
By

Virtually every dataset has them; id variables that link a record to a subject and/or time point. Often one column, or a combination of columns, forms the unique id of a record. For instance, the combination of patient_id and visit_id, or ip_adress and visit_time. The first step in most of my analyses is almost always checking the uniqueness of...

Read more »

Check Data Quality with padr

June 26, 2017
By
Check Data Quality with padr

The padr package was designed to prepare datetime data for analysis. That is, to take raw, timestamped data, and quickly convert it into a tidy format that can be analyzed with all the tidyverse tools. Recently, a colleague and I discovered a second use for the package that I had not anticipated: checking data quality. Every analysis should contain...

Read more »

Here is the new padr

May 16, 2017
By
Here is the new padr

I am very happy to announce v0.3.0 of the padr package, which was introduced in January. As requested by many, you are now able to use intervals of which the unit is different from 1. In earlier version the eight interval values only allowed for a single unit (e.g. year, day, hour). Now you can use any time period...

Read more »

Binning Outliers in a Histogram

April 26, 2017
By
Binning Outliers in a Histogram

I guess we all use it, the good old histogram. One of the first things we are taught in Introduction to Statistics and routinely applied whenever coming across a new continuous variable. However, it easily gets messed up by outliers. Putting most of the data into a single bin or a few bins, and scattering the outliers barely visible...

Read more »

Preparing Datetime Data for Analysis with padr and dplyr

March 19, 2017
By
Preparing Datetime Data for Analysis with padr and dplyr

Two months ago padr was introduced, followed by an improved version that allowed for applying pad on group level. See the introduction blogs or the vignette("padr") for more package information. In this blog I give four more elaborate examples on how to go from raw data to insight with padr, dplyr and ggplot2. They might serve as recipes for...

Read more »

Tree-based univariate testing

February 26, 2017
By
Tree-based univariate testing

When building a predictive model it is a good idea to do a univariate analysis, before throwing the whole bunch in a complex algorithm. This way we get a feel for the potential contribution of each predictor. When a lot of predictors are available one can often make a first selection and only use predictors that show univariate predictive...

Read more »

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)