June 2018

Let R/Python send messages when the algorithms are done training

June 24, 2018 | 0 Comments

As Data Scientists, we often train complex algorithms in order to tackle certain business problems and generate value. These algorithms, however, can take a while to train. Sometimes they take a couple of hours, hours which I’m not going to spend just sitting and waiting. But regularly checking whether ... [Read more...]

Fast Fiedler Vector Computation

June 23, 2018 | 0 Comments

This is a short post on how to quickly calculate the Fiedler vector for large graphs with the igraph package.
#used libraries
library(igraph) # for network data structures and tools
library(microbenchmark) # for benchmark results
Fiedler Vector with eigen
My go-to approach at the start was using the eigen() function ... [Read more...]

A forecast ensemble benchmark

June 23, 2018 | 0 Comments

Forecasting benchmarks are very important when testing new forecasting methods, to see how well they perform against some simple alternatives. Every week I get sent papers proposing new forecasting methods that fail to do better than even the simplest benchmark. They are rejected without review. Typical benchmarks include the naï...
[Read more...]

Correspondence Analysis of Mexican Discourses

June 23, 2018 | 0 Comments

Correspondence Analysis Correspondence analysis is a multivariate statistical technique that summarizes a set of categorical data in a two-dimensional form. It’s like the equivalent of Principal Component Analysis but for categorical data. Correspondence analysis is usually applied to contingency tables. In this post, we will apply it to ... [Read more...]

Forecasting my weight with R

June 23, 2018 | 0 Comments

I’ve been measuring my weight almost daily for almost 2 years now; I actually started earlier, but not as consistently. The goal of this blog post is to get re-acquainted with time series; I haven’t had the opportunity to work with time series for a long time now and ... [Read more...]

A primer in using Java from R – part 1

June 23, 2018 | 0 Comments

Introduction This primer shall consist of two parts and its goal is to provide a walk-through of using resources developed in Java from R. It is structured as more of a “note-to-future-self” than a proper educational article; I hope, however, that some readers may still find it useful. It ...
[Read more...]

ICA on Images with Python

June 23, 2018 | 0 Comments

What is Independent Component Analysis (ICA)? If you’re already familiar with ICA, feel free to skip below to how we implement it in Python. ICA is a type of dimensionality reduction algorithm that transforms a set of variables to a new ...
[Read more...]

Ten Years vs The Spread: Calculating publication lag times in R

June 23, 2018 | 0 Comments

There have been several posts on this site about publication lag times. You can read them here. Lag times are the delays in the dissemination of scientific data introduced by the process of publishing the paper in a journal. Nowadays, your paper can be online in a few hours using ...
[Read more...]

But We Won Everywhere but the Scoreboard

June 22, 2018 | 0 Comments

Something that gets to many a footy fan is the feeling that your team has won the game in most areas except on the scoreboard. Thinking about this statement a little more deeply has the following implication: that there are some areas of the game that, if you win, you ... [Read more...]

future.apply – Parallelize Any Base R Apply Function

June 22, 2018 | 0 Comments

Got compute? future.apply 1.0.0 - Apply Function to Elements in Parallel using Futures - is on CRAN. With this milestone release, all* base R apply functions now have corresponding futurized implementations. This makes it easier than ever before to parallelize your existing apply(), lapply(), mapply(), … code - just prepend future_ ...
[Read more...]

Thanks for Reading!

June 22, 2018 | 0 Comments

As I've been blogging more about statistics, R, and research in general, I've been trying to increase my online presence, sharing my blog posts in groups of like-minded people. Those efforts seem to have paid off, based on my view counts over the past ...
[Read more...]

A guide to working with character data in R

June 22, 2018 | 0 Comments

R is primarily a language for working with numbers, but we often need to work with text as well. Whether it's formatting text for reports, or analyzing natural language data, R provides a number of facilities for working with character data. Handling Strings with R, a free (CC-BY-NC-SA) e-book by ... [Read more...]

Using DataCamp’s Autograder to Teach R

June 22, 2018 | 0 Comments

Immediate and personalized feedback has been central to the learning experience on DataCamp since we launched the first courses. If students submit code that contains a mistake, they are told where they made a mistake, and how they can fix this. You can play around with it in our free ... [Read more...]

Melt and cast the shape of your data.frame – Exercises

June 22, 2018 | 0 Comments

Datasets often arrive in a form that is different from what we need for our modelling or visualisation functions, which in turn don’t necessarily require the same format. Reshaping data.frames is a step that all analysts need but many struggle with. Practicing this meta-skill will in ...
[Read more...]

Creating Slopegraphs with R

June 22, 2018 | 0 Comments

Presenting data results in the most informative and compelling manner is part of the role of the data scientist. It's all well and good to master the arcana of some algorithm, to manipulate and master the numbers and bend them to your will to produce a “solution” that is both ...
[Read more...]

Exploring the Stack Overflow Dev Survey with Shiny – part 1

June 22, 2018 | 0 Comments

The challenge Recently I saw that StackOverflow released their survey data, which had been posted on Kaggle. The data came with the following context “Want to dive into the results yourself and see what you can learn about salaries or machine learning or diversity in tech?” and given that June ...
[Read more...]

Parallelizing Linear Regression or Using Multiple Sources

June 21, 2018 | 0 Comments

My previous post explained how it is mathematically possible to parallelize the computation that estimates the parameters of a linear regression. More specifically, we have a matrix $X$, which is an $n \times k$ matrix, and an $n$-dimensional vector $y$, and we want to compute $\widehat{\beta} = (X^\top X)^{-1} X^\top y$ by splitting the job. Instead of using the observations, we’ve ...
[Read more...]
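
The idea of splitting a regression across several jobs can be illustrated with a minimal sketch. It is not the code from the post above; it assumes the usual least-squares estimator, and that each block of rows is summarised by its own cross-products, which are then added up before solving one small linear system. All function names here (chunk_summaries, combine, solve, ols_by_chunks) are hypothetical, and the blocks are processed sequentially, although each call to chunk_summaries is independent and could run on a separate worker or data source.

```python
# Sketch only: least squares from per-block summaries (pure standard library).
# Assumption: the rows of X and y are split into blocks, and block j contributes
# X_j^T X_j and X_j^T y_j; their sums equal X^T X and X^T y for the full data.

def chunk_summaries(X_chunk, y_chunk):
    """Return (X^T X, X^T y) for one block of rows."""
    k = len(X_chunk[0])
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for row, target in zip(X_chunk, y_chunk):
        for a in range(k):
            xty[a] += row[a] * target
            for b in range(k):
                xtx[a][b] += row[a] * row[b]
    return xtx, xty


def combine(summaries):
    """Add up the per-block summaries element by element."""
    k = len(summaries[0][1])
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for part_xtx, part_xty in summaries:
        for a in range(k):
            xty[a] += part_xty[a]
            for b in range(k):
                xtx[a][b] += part_xtx[a][b]
    return xtx, xty


def solve(A, b):
    """Solve A beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (M[i][n] - s) / M[i][i]
    return beta


def ols_by_chunks(X, y, n_chunks):
    """Estimate regression coefficients from n_chunks blocks of rows."""
    size = -(-len(X) // n_chunks)  # ceiling division
    summaries = [
        chunk_summaries(X[i:i + size], y[i:i + size])
        for i in range(0, len(X), size)
    ]
    xtx, xty = combine(summaries)
    return solve(xtx, xty)


# Toy data (made up for illustration): y = 1 + 2x with no noise.
X = [[1.0, float(x)] for x in range(6)]
y = [1.0 + 2.0 * x for x in range(6)]

beta_chunked = ols_by_chunks(X, y, n_chunks=3)  # three blocks of two rows
beta_full = ols_by_chunks(X, y, n_chunks=1)     # all rows in one block
```

Only the small cross-product summaries travel between workers, not the rows themselves, which is why the same scheme would also fit data held by several sources. On the toy data, the chunked and single-block estimates agree up to floating-point rounding.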