Articles by Paul van der Laken

David Robinson’s R Programming Screencasts

June 16, 2020 | 0 Comments

David Robinson (aka drob) is one of the best-known R programmers. For a couple of years now, David has been sharing his knowledge by streaming screencasts of himself programming. It’s basically part of R’s #tidytuesday movement. Alex Cookson decided to do us all a favor and annotate all ...
[Read more...]

Visualizing and interpreting Cohen’s d effect sizes

June 9, 2020 | 0 Comments

Cohen’s d (wiki) is a statistic used to indicate the standardised difference between two means. Researchers often use it to compare the averages between groups, for instance to determine that there are higher outcome values in an experimental group than in a control group. Researchers often use general guidelines ...
[Read more...]
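
As a rough illustration of the statistic described in this teaser, here is a minimal Python sketch of Cohen’s d using the pooled standard deviation. It is not taken from the post itself: the function name `cohens_d` and the sample values are made up for the example, and the pooled-SD variant is assumed.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardised mean difference between two groups (pooled-SD variant)."""
    n_a, n_b = len(group_a), len(group_b)
    # Sample variances (n - 1 in the denominator)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Pool the two variances, weighting each by its degrees of freedom
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical outcome values for an experimental and a control group
experimental = [2, 4, 6]
control = [1, 3, 5]
print(cohens_d(experimental, control))
```

A positive value means the first group has the higher mean; swapping the arguments flips the sign.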

How to Write a Git Commit Message, in 7 Steps

May 11, 2020 | 0 Comments

Version control is an essential tool for any software developer. Hence, any respectable data scientist has to make sure their analysis programs and machine learning pipelines are reproducible and maintainable through version control. Often, we use git for version control. If you don’t know what git is yet, ...
[Read more...]

Generative art: Let your computer design you a painting

May 2, 2020 | 0 Comments

I really like generative art, or so-called algorithmic art. Basically, it means you take a pattern or a complex system of rules, and apply it to create something new following those patterns/rules. When I finished my PhD, I got a beautiful poster of where the k-nearest neighbors algorithm was ...
[Read more...]

Free Springer Books during COVID19

April 24, 2020 | 0 Comments

Book publisher Springer just released over 400 book titles that can be downloaded free of charge following the coronavirus outbreak. Here’s the full overview: https://link.springer.com/search?facet-content-type=%22Book%22&package=mat-covid19_textbooks&facet-language=%22En%22&sortOrder=newestFirst&showAll=true Most of these books will normally set you back about $50 ...
[Read more...]

Simulating and visualizing the Monty Hall problem in Python & R

April 14, 2020 | 0 Comments

I recently visited a data science meetup where one of the speakers spoke about playing out the Monty Hall problem with his kids. The Monty Hall problem is a probability puzzle. It is based on the American television game show Let’s Make a Deal and named after its host, Monty Hall: You’re ...
[Read more...]
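
For readers who want to try the puzzle right away, here is a minimal Python sketch of a Monty Hall simulation. It is not the code from the post: the function names `play_round` and `win_rate`, the number of rounds, and the seed are assumptions made for illustration. It assumes the standard three-door setup in which the host always opens a door hiding no car.

```python
import random


def play_round(switch, rng):
    """Play one game; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, n_rounds=10_000, seed=42):
    """Share of games won when always switching (or always staying)."""
    rng = random.Random(seed)
    wins = sum(play_round(switch, rng) for _ in range(n_rounds))
    return wins / n_rounds


print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

With enough rounds, staying should win about one third of the time and switching about two thirds, matching the well-known analytical result.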

Curated Regular Expression Resources

April 7, 2020 | 0 Comments

Regular expressions (often abbreviated to regex) really are a power tool any programmer should know. Learning them was and is one of the things I liked most, as they provide you with immediate, godlike powers that can speed up your (data science) workflow tenfold. I’ve covered many regex-related topics ...
[Read more...]

Visualizing decision tree partition and decision boundaries

March 31, 2020 | 0 Comments

Grant McDermott developed this new R package I had thought of: parttree. parttree includes a set of simple functions for visualizing decision tree partitions in R with ggplot2. The package is not yet on CRAN, but can be installed from GitHub. Using the familiar ggplot2 syntax, we can simply ...
[Read more...]

How to standardize group colors in data visualizations in R

March 20, 2020 | 0 Comments

One best practice in visualization is to make your color scheme consistent across figures. For instance, if you’re making multiple plots of the same dataset (say, a group of 5 companies), you want each company to have the same, consistent coloring across all these plots. R has some great data ...
[Read more...]

paletteer: Hundreds of color palettes in R

March 17, 2020 | 0 Comments

Looking for just the right colors for your data visualization? I often cover tools to pick color palettes on my website (e.g. here, here, or here) and also host a comprehensive list of color packages in my R programming resources overview. However, paletteer is by far my favorite package ...
[Read more...]

Solutions to working with small sample sizes

March 10, 2020 | 0 Comments

Both in science and business, we often experience difficulties collecting enough data to test our hypotheses, either because target groups are small or hard to access, or because data collection entails prohibitive costs. Such obstacles may result in data sets that are too small for the complexity of the statistical ...
[Read more...]

Simulating data with Bayesian networks, by Daniel Oehm

February 11, 2020 | 0 Comments

Daniel Oehm wrote this interesting blog about how to simulate realistic data using a Bayesian network. Bayesian networks are a type of probabilistic graphical model that uses Bayesian inference for probability computations. Bayesian networks aim to model conditional dependence, and therefore causation, by representing conditional dependence by edges in a ...
[Read more...]

Learn Julia for Data Science

February 10, 2020 | 0 Comments

Most data scientists favor Python as a programming language these days. However, there’s also still a large group of data scientists coming from a statistics, econometrics, or social science background and therefore favoring R, the programming language they learned in university. Now there’s a new kid on the block: ...
[Read more...]

Why Gordon Shotwell uses R

January 6, 2020 | 0 Comments

This blog by Gordon Shotwell has passed my Twitter feed a couple of times now and I thought I’d share it here: blog.shotwell.ca/posts/why_i_use_r In it, Gordon presents his reasons for using R, describing R’s four unique selling points, and outlining a ...
[Read more...]

Anomaly Detection Resources

December 19, 2019 | 0 Comments

Carnegie Mellon PhD student Yue Zhao curates this great GitHub repository of anomaly detection resources: https://github.com/yzhao062/anomaly-detection-resources The repository consists of tools for multiple languages (R, Python, Matlab, Java) and resources in the form of: Books & Academic Papers; Online Courses and Videos; Outlier Datasets; Algorithms and Applications ...
[Read more...]

Need to save R’s lm() or glm() models? Trim the fat!

December 4, 2019 | 0 Comments

I was training a predictive model for work for use in a Shiny app. However, as the training set was quite large (700k+ obs.), the model object to save was also quite large (500 MB). This slows down your operation significantly! Basically, all you really need are the coefficients (...
[Read more...]
