Articles by Paul van der Laken

calmcode.io > video tutorials for open source tools

September 28, 2021 | Paul van der Laken

calmcode.io is an e-learning platform that I really really really recommend to programmers and data scientists: It is free. It involves open source tools. It uses bite-sized tutorial videos. It explains tools clearly. It explains everything calmly. There’s tons of content about computer programming, data science, and personal ...

How to confuse your shareholders by bad data visualization

August 31, 2021 | Paul van der Laken

Like many people during the COVID19 crisis, I turned to the stock market as a new hobby. Like the ignorant investor that I am, I thought it wise to hop on the cloud computing bandwagon. Hence, I bought, among others, a small position in Rackspace Technology. A long way down ...

ppsr live on CRAN!

March 2, 2021 | Paul van der Laken

Finding predictive patterns in your dataset with one line of code! Today, March 2nd 2021, my first R package was published on the Comprehensive R Archive Network (CRAN). ppsr is the R implementation of the Predictive Power Score (PPS). The PPS is an asymmetric, data-type-agnostic score that can detect linear or ...

ppsr: An R implementation of the Predictive Power Score

January 12, 2021 | Paul van der Laken

A few months ago, I wrote about the Predictive Power Score (PPS): a handy metric to quickly explore and quantify the relationships in a dataset. As a social scientist, I was taught to use a correlation matrix to describe the relationships in a dataset. Yet, in my opinion, the PPS ...

JavaScript for R — ebook

December 1, 2020 | Paul van der Laken

The R programming language has seen the integration of many languages; C, C++, and Python, to name a few, can be seamlessly embedded into R so one can conveniently call code written in other languages from the R console. What many don't know is that R works just as well with JavaScript; this ...

Bayesian Statistics using R, Python, and Stan

October 20, 2020 | Paul van der Laken

For a year now, this course on Bayesian statistics has been on my to-do list. So without further ado, I decided to share it with you already. Richard McElreath is an evolutionary ecologist who is famous in the stats community for his work on Bayesian statistics. At the Max Planck ...

10 Guidelines to Better Table Design

September 1, 2020 | Paul van der Laken

Jon Schwabish recently proposed ten guidelines for better table design. Next to the academic paper, Jon shared his recommendations in a Twitter thread. Let me summarize them for you: right-align your numbers; left-align your text; use decimals appropriately (one or two is often enough); display units (e.g., $, %) sparsely (e....
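To make the first few guidelines concrete, here is a minimal Python sketch (the blog's own examples are usually in R; Python is used here only for illustration). It prints a plain-text table with text left-aligned, numbers right-aligned at a fixed number of decimals, and the unit shown once in the header rather than in every cell. The company names, figures, and the `format_table` helper are hypothetical.

```python
# Hypothetical example rows: (company, revenue in millions of dollars)
rows = [("Alpha", 1234.5), ("Beta Corp", 87.26), ("Gamma", 5.0)]

def format_table(rows, decimals=1):
    """Render rows as plain text: text left-aligned, numbers right-aligned."""
    headers = ("Company", "Revenue ($M)")  # the unit appears once, in the header
    # Same number of decimals for every value, with thousands separators
    numbers = [f"{value:,.{decimals}f}" for _, value in rows]
    name_width = max(len(headers[0]), *(len(name) for name, _ in rows))
    num_width = max(len(headers[1]), *(len(n) for n in numbers))
    lines = [f"{headers[0]:<{name_width}}  {headers[1]:>{num_width}}"]
    for (name, _), number in zip(rows, numbers):
        lines.append(f"{name:<{name_width}}  {number:>{num_width}}")
    return "\n".join(lines)

print(format_table(rows))
```

Right-aligning numbers with a fixed number of decimals lines up the decimal points, so magnitudes can be compared at a glance.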

How most statistical tests are linear models

August 25, 2020 | Paul van der Laken

Jonas Kristoffer Lindeløv wrote a great visual explanation of how the most common statistical tests (t-test, ANOVA, ANCOVA, etc.) are all linear models in the back-end. Jonas’ original blog post uses R programming to visually show how the tests work, what the linear models look like, and how different approaches ...

Create a publication-ready correlation matrix, with significance levels, in R

July 28, 2020 | Paul van der Laken

In most (observational) research papers you read, you will probably run into a correlation matrix. Often it looks something like this: In Social Sciences, like Psychology, researchers like to denote the statistical significance levels of the correlation coefficients, often using asterisks (i.e., *). Then the table will look more like ...
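The asterisk convention can be sketched in a few lines of Python. This assumes the common thresholds (* p < .05, ** p < .01, *** p < .001) and that each correlation and its p-value were already computed elsewhere, for example with `scipy.stats.pearsonr`. The `stars` and `annotate` helpers are hypothetical names, not functions from any package.

```python
def stars(p):
    """Map a p-value to asterisks (assumed common thresholds: .05, .01, .001)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

def annotate(r, p, decimals=2):
    """Format a correlation coefficient with its significance stars."""
    return f"{r:.{decimals}f}{stars(p)}"

# Hypothetical (r, p) pairs for the lower triangle of a 3-variable matrix
results = {("B", "A"): (0.45, 0.004), ("C", "A"): (-0.12, 0.30), ("C", "B"): (0.81, 0.0001)}
for (row, col), (r, p) in results.items():
    print(f"{row} x {col}: {annotate(r, p)}")
```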

David Robinson’s R Programming Screencasts

June 16, 2020 | Paul van der Laken

David Robinson (aka drob) is one of the best-known R programmers. For a couple of years now, David has been sharing his knowledge by streaming screencasts of himself programming. It’s basically part of R’s #tidytuesday movement. Alex Cookson decided to do us all a favor and annotate all ...

Visualizing and interpreting Cohen’s d effect sizes

June 9, 2020 | Paul van der Laken

Cohen’s d (wiki) is a statistic used to indicate the standardised difference between two means. Researchers often use it to compare the averages between groups, for instance to determine that there are higher outcome values in an experimental group than in a control group. Researchers often use general guidelines ...
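For reference, the usual formula divides the difference between the two group means by their pooled standard deviation: d = (mean1 - mean2) / s_pooled, where s_pooled² = ((n1 - 1)·s1² + (n2 - 1)·s2²) / (n1 + n2 - 2). A minimal Python sketch using only the standard library; the function name and example data are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: each group's sample variance weighted by its degrees of freedom
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (fmean(group1) - fmean(group2)) / sqrt(pooled_var)

print(cohens_d([1, 2, 3], [3, 4, 5]))  # -2.0: the means differ by two pooled SDs
```

The sign only reflects which group is listed first; the magnitude is what effect-size guidelines refer to.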

How to Write a Git Commit Message, in 7 Steps

May 11, 2020 | Paul van der Laken

Version control is an essential tool for any software developer. Hence, any respectable data scientist has to make sure his/her analysis programs and machine learning pipelines are reproducible and maintainable through version control. Often, we use git for version control. If you don’t know what git is yet, ...

Predictive Power Score: Finding predictive patterns in your dataset

May 4, 2020 | Paul van der Laken

Last week, I shared this Medium blog on PPS, or Predictive Power Score, on my LinkedIn and got so many enthusiastic responses that I had to share it here too. Basically, the predictive power score is a normalized metric (values range from 0 to 1) that shows you to what extent ...
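The normalization to a 0-1 range can be illustrated with a small Python sketch. It assumes the regression case of the original PPS write-up: a model's mean absolute error (MAE) is compared with that of a naive baseline that always predicts the median, PPS = 1 - MAE_model / MAE_naive, and negative values are clipped to 0. Fitting the actual model (a cross-validated decision tree in the original implementation) is left out here; `y_pred` stands in for its predictions, and `pps_regression` is a hypothetical name.

```python
from statistics import fmean, median

def pps_regression(y_true, y_pred):
    """Normalize a model's MAE against a naive median baseline into a 0..1 score."""
    baseline = median(y_true)
    mae_naive = fmean(abs(y - baseline) for y in y_true)
    mae_model = fmean(abs(y - p) for y, p in zip(y_true, y_pred))
    if mae_naive == 0:
        return 0.0  # assumption of this sketch: a constant target has nothing to predict
    # A model worse than the baseline would score below 0; clip it to 0
    return max(0.0, 1 - mae_model / mae_naive)

y_true = [1, 2, 3, 4, 5]
print(pps_regression(y_true, [1, 2, 3, 4, 5]))  # perfect predictions score 1.0
print(pps_regression(y_true, [3, 3, 3, 3, 3]))  # no better than the median: 0.0
```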

Generative art: Let your computer design you a painting

May 2, 2020 | Paul van der Laken

I really like generative art, or so-called algorithmic art. Basically, it means you take a pattern or a complex system of rules, and apply it to create something new following those patterns/rules. When I finished my PhD, I got a beautiful poster of where the k-nearest neighbors algorithm was ...

Free Springer Books during COVID19

April 24, 2020 | Paul van der Laken

Book publisher Springer just released over 400 book titles that can be downloaded free of charge following the coronavirus outbreak. Here’s the full overview: https://link.springer.com/search?facet-content-type=%22Book%22&package=mat-covid19_textbooks&facet-language=%22En%22&sortOrder=newestFirst&showAll=true Most of these books will normally set you back about $50 ...

Simulating and visualizing the Monty Hall problem in Python & R

April 14, 2020 | Paul van der Laken

I recently visited a data science meetup where one of the speakers talked about playing out the Monty Hall problem with his kids. The Monty Hall problem is a probability puzzle, based on the American television game show Let’s Make a Deal and its host, Monty Hall: You’re ...
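A minimal Python simulation of the standard game: the car is behind one of three doors, the contestant picks one, and the host, who knows where the car is, always opens a different door that hides a goat. The function names, seed, and number of rounds are arbitrary choices for this sketch.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that was not picked and does not hide the car
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the only door that is neither picked nor opened
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, n=100_000, seed=42):
    """Fraction of n simulated rounds won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(n)) / n

print(f"stay:   {win_rate(False):.3f}")
print(f"switch: {win_rate(True):.3f}")
```

With enough rounds, staying wins about one third of the time and switching about two thirds, which is the puzzle's famous answer.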

Curated Regular Expression Resources

April 7, 2020 | Paul van der Laken

Regular expressions (abbreviated to regex) really are a power tool every programmer should know. Learning them was, and still is, one of my favorite things, as they give you immediate, godlike powers that can speed up your (data science) workflow tenfold. I’ve covered many regex-related topics ...

Visualizing decision tree partition and decision boundaries

March 31, 2020 | Paul van der Laken

Grant McDermott developed a new R package I had been thinking of: parttree. parttree includes a set of simple functions for visualizing decision tree partitions in R with ggplot2. The package is not yet on CRAN, but can be installed from GitHub using: Using the familiar ggplot2 syntax, we can simply ...

How to standardize group colors in data visualizations in R

March 20, 2020 | Paul van der Laken

One best practice in visualization is to make your color scheme consistent across figures. For instance, if you’re making multiple plots of the same dataset (say, a group of 5 companies), you want each company to have the same, consistent color across all these plots. R has some great data ...
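The underlying idea is to fix a group-to-color mapping once and reuse it in every plot. A minimal, plotting-library-agnostic Python sketch; the palette, the group names, and the `make_color_map` helper are hypothetical.

```python
# Hypothetical palette of hex colors; any palette with enough colors works
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e"]

def make_color_map(groups, palette=PALETTE):
    """Assign each group a fixed color, independent of the order groups appear in."""
    unique = sorted(set(groups))
    if len(unique) > len(palette):
        raise ValueError("more groups than colors in the palette")
    return dict(zip(unique, palette))

# Build the mapping once from ALL groups in the dataset
companies = ["Company C", "Company A", "Company E", "Company B", "Company D"]
colors = make_color_map(companies)

# A plot showing only a subset still looks up each group's own color
subset = ["Company B", "Company D"]
subset_colors = [colors[c] for c in subset]
print(subset_colors)
```

Because colors are assigned from the sorted set of all groups, a plot that shows only a subset keeps each group's color. In R with ggplot2, the equivalent is typically a named character vector passed to `scale_color_manual(values = ...)`.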

paletteer: Hundreds of color palettes in R

March 17, 2020 | Paul van der Laken

Looking for just the right colors for your data visualization? I often cover tools to pick color palettes on my website (e.g. here, here, or here) and also host a comprehensive list of color packages in my R programming resources overview. However, paletteer is by far my favorite package ...


Copyright © 2025 | MH Corporate basic by MH Themes