## Function basis and regression

March 1, 2020

In the first part of the course on linear models, we’ve seen how to construct a linear model when the vector of covariates $\boldsymbol{x}$ is given, so that $\widehat{y}_{\boldsymbol{x}}$ is either simply $\boldsymbol{x}^\top\widehat{\boldsymbol{\beta}}$ (for standard linear models) or a function of $\boldsymbol{x}^\top\widehat{\boldsymbol{\beta}}$ (in GLMs). But more generally, we can consider transformations of the covariates, so that a linear model can still be used. In...
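
A minimal sketch of the idea, assuming a polynomial basis (the course may well use other bases, such as splines): the covariate $x$ is mapped to the columns $1, x, x^2, x^3$, and ordinary least squares is fitted on those transformed columns. The model stays linear in $\boldsymbol{\beta}$ even though it is nonlinear in $x$. The helper `polynomial_basis` and the simulated data are hypothetical.

```python
import numpy as np

def polynomial_basis(x, degree):
    """Map a 1-D covariate to the columns 1, x, x^2, ..., x^degree (hypothetical helper)."""
    return np.vander(x, N=degree + 1, increasing=True)

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 200)
# Simulated response: nonlinear in x, generated from a cubic plus small noise (illustrative only).
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.1, size=x.size)

X = polynomial_basis(x, degree=3)
# Ordinary least squares on the transformed covariates: still a linear model in beta.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
print(np.round(beta_hat, 2))
```

Swapping `polynomial_basis` for any other set of basis functions leaves the fitting step unchanged, which is the point of working with a function basis.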

## The probabilities implied by bookmaker odds: Introducing the ‘implied’ package

March 1, 2020

My package for converting bookmaker odds into probabilities is now available from CRAN. The package contains several different conversion algorithms, all of which are accessible via the implied_probabilities() function. I have written an introduction on how you can use the …
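
The simplest conversion inverts each decimal odd and rescales the results so they sum to one, which strips out the bookmaker's margin (the overround) proportionally. The sketch below shows only that one normalization, in Python rather than through the package's implied_probabilities(); the package's other algorithms distribute the margin differently. The function name and the odds are hypothetical.

```python
def basic_implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities by normalizing the inverse odds (hypothetical helper)."""
    inverse = [1.0 / o for o in decimal_odds]
    booksum = sum(inverse)  # exceeds 1 when the bookmaker builds in a margin
    margin = booksum - 1.0
    return [p / booksum for p in inverse], margin

# Hypothetical odds for home win, draw, away win.
probs, margin = basic_implied_probabilities([2.10, 3.40, 3.60])
print([round(p, 3) for p in probs], round(margin, 3))
```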

## R tips and tricks – Paste a plot from R to a word file

March 1, 2020

In this post you will learn how to properly paste an R plot/chart/image into a Word file. There are a few typical problems that occur when people try to do that. Below you can find a simple, clean and repeatable solution. When you google how to paste a plot from R to a Word file you...

## Using R: 10 years with R

March 1, 2020

Yesterday, 29 February 2020, was the 20th anniversary of the release of R 1.0.0. Jozef Hajnala’s blog has a cute anniversary post with some trivia. I realised that it is also (not to the day, but to the year) my R anniversary. Today is the 20th anniversary of the release of R 1.0.0. pic.twitter.com/gwItCBGYV4 — The

## SR2 Chapter 2 Hard

February 29, 2020

Posted on 1 March, 2020 by Brian. Tags: statistical rethinking, solutions, conditional probability, counting, bayes rule, pandas. Category: statistical-rethinking-2. Here’s my solution to the hard exercises in chapter...
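
As an illustration of the Bayes-rule reasoning these exercises involve, here is a sketch assuming the setup of exercise 2H1 in Statistical Rethinking (2nd ed.) as I recall it: two panda species, equally likely a priori, give birth to twins 10% and 20% of the time respectively. After observing one twin birth, we update the species probabilities and then average the twin rates over that posterior. This is not Brian's code, and the function name is hypothetical.

```python
def posterior_over_species(prior, twin_rate):
    """Bayes rule: P(species | twins) is proportional to P(twins | species) * P(species)."""
    unnormalized = {s: prior[s] * twin_rate[s] for s in prior}
    evidence = sum(unnormalized.values())  # P(twins)
    return {s: v / evidence for s, v in unnormalized.items()}

prior = {"A": 0.5, "B": 0.5}
twin_rate = {"A": 0.1, "B": 0.2}

post = posterior_over_species(prior, twin_rate)  # A: 1/3, B: 2/3
# Probability that the next birth is also twins, averaging over the posterior.
p_next_twins = sum(post[s] * twin_rate[s] for s in post)
print(round(p_next_twins, 4))
```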

## Predicting the misclassification cost incurred in air pressure system failure in heavy vehicles

February 29, 2020

Abstract: The Air Pressure System (APS) is a system used in heavy vehicles to assist with braking and gear changing. The APS failure dataset consists of daily operational sensor data from failed Scania trucks. The dataset is crucial to the man...
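
A misclassification cost of this kind is usually a weighted count of the two error types, with a missed failure (false negative) far costlier than an unnecessary check (false positive). A minimal sketch, assuming the weights of 10 and 500 that I understand the original APS challenge to use; treat both numbers, the label coding and the helper name as assumptions rather than the post's own figures.

```python
def misclassification_cost(y_true, y_pred, cost_fp=10, cost_fn=500):
    """Total cost = cost_fp * false positives + cost_fn * false negatives.

    Labels: 1 = APS failure, 0 = other. The default costs are assumptions.
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return cost_fp * fp + cost_fn * fn

# Hypothetical labels: two false positives and one false negative.
y_true = [0, 0, 0, 1, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0]
total_cost = misclassification_cost(y_true, y_pred)
print(total_cost)
```

With such asymmetric weights, a model tuned to minimize this cost will typically accept many more false alarms in exchange for fewer missed failures.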

## Source code chapter of ‘evidence-based software engineering’ reworked

February 29, 2020

The Source code chapter of my evidence-based software engineering book has been reworked (draft pdf). When writing the first version of this chapter, I was not certain whether source code was a topic warranting a chapter of its own in an evidence-based software engineering book. Now I am certain. Source code is the primary product delivery,

## Log transform or log link? And confounding variables. by @ellis2013nz

February 29, 2020

Last week I wrote about the relationship between weight and height in US adults, as seen in the US Centers for Disease Control and Prevention (CDC) Behavioral Risk Factor Surveillance System, a telephone survey of around 400,000 interviews per year. In particular, I tested the widely circulated claim that Body Mass Index (BMI) exaggerates the “fatness” of tall people...

## matricks 0.8.2 available on CRAN

February 28, 2020

Version 0.8.2 of the matricks package has been released on CRAN! In this post I will present the advantages of using matricks and how you can use it. Creating matrices: The main function the package started with is m(). It’s a smart shortcut fo...

## Drawdowns by the data

February 28, 2020

We’re taking a break from our series on portfolio construction for two reasons: life and the recent market sell-off. Life got in the way of focusing on the next couple of posts on rebalancing. And given the market sell-off we were too busy gamma hedging our convexity exposure, looking for cheap tail risk plays, and trying to figure out...
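
For readers new to the term: the drawdown at time t is the percentage fall of a price (or portfolio value) from its running peak up to t, and the maximum drawdown is the worst such fall. A minimal sketch of that standard definition, with hypothetical prices; it is not the authors' code.

```python
from itertools import accumulate

def drawdowns(prices):
    """Drawdown at each step: price / running maximum - 1 (0 at a new high, negative below it)."""
    running_peak = list(accumulate(prices, max))
    return [p / peak - 1.0 for p, peak in zip(prices, running_peak)]

prices = [100, 110, 105, 90, 95, 120, 108]  # hypothetical closing prices
dd = drawdowns(prices)
max_drawdown = min(dd)  # most negative value
print(round(max_drawdown, 4))
```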

## SR2 Chapter 2 Medium

February 28, 2020

Posted on 29 February, 2020 by Brian. Tags: statistical rethinking, solutions, conditional probability, counting, grid approximation. Category: statistical-rethinking-2. Here are my solutions to the medium exercises in chapter 2 of...
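
Grid approximation, one of the tags here, evaluates the unnormalized posterior on a grid of parameter values and then normalizes it. A sketch for the book's globe-tossing model, assuming a uniform prior and a binomial likelihood; the data (5 water, 2 land) correspond to the sequence L, W, W, L, W, W, W from exercise 2M1 as I recall it. The function name is hypothetical.

```python
def grid_posterior(water, land, n_points=101):
    """Grid approximation of the posterior for the proportion of water p.

    Uniform prior on [0, 1]; the likelihood is proportional to p^water * (1 - p)^land
    (the binomial coefficient cancels on normalization).
    """
    grid = [i / (n_points - 1) for i in range(n_points)]
    prior = [1.0] * n_points
    unnormalized = [pr * p**water * (1 - p)**land for pr, p in zip(prior, grid)]
    total = sum(unnormalized)
    return grid, [u / total for u in unnormalized]

grid, posterior = grid_posterior(water=5, land=2)
p_mode = grid[posterior.index(max(posterior))]
print(p_mode)
```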

## What to know before you adopt Hugo/blogdown

Fancy (re-)creating your website using Hugo, with or without blogdown? Feeling a bit anxious? This post is aimed at being the Hugo equivalent of “What to know before you adopt a pet”. We shall go through things that can/will break in the future, and what you can do to prevent future pain. I’m writing this post with R users in mind, which means...

## The significance of the sector on the salary in Sweden, a comparison between different occupational groups, part 3

February 28, 2020

To complete the analysis of the significance of the sector on the salary for different occupational groups in Sweden, I will in this post examine the correlation between salary and sector using statistics for education. The F-value from the ANOVA table is used as the single value to discriminate how much the region and salary correlate. For exploratory analysis, the...
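
The F-value of a one-way ANOVA is the ratio of the between-group to the within-group mean square; the larger it is, the more of the salary variation the grouping explains relative to the spread within groups. A minimal sketch of the textbook formula in plain Python, not the post's code, with hypothetical salary figures.

```python
def one_way_anova_f(groups):
    """F = (SS_between / (k - 1)) / (SS_within / (n - k)) for k groups and n observations."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical monthly salaries (SEK) for the same occupation in two sectors.
public = [31000, 32000, 33000]
private = [34000, 35000, 36000]
f_value = one_way_anova_f([public, private])
print(f_value)
```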

## How to Acquire Large Satellite Image Datasets for Machine Learning Projects

February 28, 2020

Introduction: Historically, only governments and large corporations have had access to quality satellite images. In recent years, satellite image datasets have become available to anyone with a computer and an internet connection. The quality, quantity, and precision of these datasets are continuously improving, and there are many free and commercial platforms at your disposal to...

## All you need to know on PCA …

February 28, 2020

All you need to do with PCA is in Factoshiny! PCA – Principal Component Analysis – is a well-known method for exploring and visualizing data. The function Factoshiny of the package Factoshiny allows you to perform PCA in a really easy way. You can include extra information such as categorical variables, manage missing data,

## Machine Learning with R: A Hands-on Introduction from Robert Muenchen at Machine Learning Week, Las Vegas

February 28, 2020

Join Robert Muenchen’s workshop about Machine Learning with R at Machine Learning Week on May 31 – June 4, 2020 in Las Vegas! Workshop description: The workshop will take place on May 31, 2020. R offers a wide variety of machine learning (ML) functions, each of which works in a slightly different way. This one-day, …

## XGBoostLSS – An extension of XGBoost to probabilistic forecasting

February 28, 2020

Introduction: To reason rigorously under uncertainty we need to invoke the language of probability (Zhang et al. 2020). Any model that falls short of providing quantification of the uncertainty attached to its outcome is likely to yield an incomplete and potentially misleading picture. While this is an irrevocable consensus in statistics, a common misconception, albeit a …

## Convolutional Neural Network under the Hood

February 27, 2020

Neural networks have really taken over for solving image recognition and high sample rate data problems in the last couple of years. In all honesty, I promise I won’t be teaching you what neural networks or CNNs are. There are hundreds of resources published every day explaining them. I’ll post a few links below....
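
Under the hood, a convolutional layer slides a small kernel over its input and takes a weighted sum at each position. Deep-learning libraries actually compute cross-correlation (without flipping the kernel) and call it convolution. A minimal pure-Python sketch of that operation with "valid" padding and stride 1; it is not the post's code, and the names and the example kernel are hypothetical.

```python
def conv2d_valid(image, kernel):
    """2-D cross-correlation with 'valid' padding and stride 1, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the kh x kw patch whose top-left corner is at (i, j).
            output[i][j] = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw)
            )
    return output

# A vertical-edge detector applied to a 4x4 image with a dark-to-bright edge.
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1], [-1, 1]]
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map responds only in the column where the edge sits, which is the kind of local pattern a learned kernel picks up.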

## Building A base dplyr With Primitives: Grouped Operations, Pipes and More!

February 27, 2020

Introduction: In my last post we looked at how we can recreate base equivalents of the dplyr functions select(), filter(), mutate() and arrange(), amongst others. I wrote these functions and presented them in a new package called poorman. In this post I will be discussing new functionality that I have since added to poorman, including grouped operations, renaming columns, summarising...

## Decision Boundary for a Series of Machine Learning Models

Machine Learning at the Boundary: There is nothing new in the fact that machine learning models can outperform traditional econometric models, but as part of my research I want to show why and how some models make given predictions, or in this instance classifications. I wanted to show the decision boundary that my binary classification model was making. That is,...

## Version 0.4.0 of nnetsauce, with fruits and breast cancer classification

February 27, 2020


## Student’s t-test in R and by hand: how to compare two groups under different scenarios

February 27, 2020

Contents: Introduction; Null and alternative hypothesis; Hypothesis testing; Different versions of the Student’s t-test; How to compute Student’s t-test by hand?; Scenario 1: Independent samples with 2 known variances; Scenario 2: Independent samples with 2 equal but unknown variances; Scenario 3: Independent samples with 2 unequal and unknown variances; Scenario 4: Paired samples where the variance of the differences is known; Scenario 5: Paired samples where the variance of...
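
Scenario 2 (independent samples, equal but unknown variances) is the classic pooled two-sample t-test: $t = (\bar{x}_1 - \bar{x}_2) / (s_p \sqrt{1/n_1 + 1/n_2})$, with $s_p^2 = ((n_1-1)s_1^2 + (n_2-1)s_2^2)/(n_1+n_2-2)$ and $n_1+n_2-2$ degrees of freedom. A minimal Python sketch of computing it by hand; the post itself works in R, and the data here are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(x, y):
    """Two-sample t statistic assuming equal, unknown variances (pooled variance)."""
    n1, n2 = len(x), len(y)
    # statistics.variance uses the n - 1 denominator (sample variance).
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t, df = pooled_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)
```

The resulting |t| is then compared with the critical value of the t distribution with `df` degrees of freedom; in R, t.test(x, y, var.equal = TRUE) should give the same statistic.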

## Data Science in Manufacturing: An Overview

February 27, 2020

Original article published on opendatascience.com. In the last couple of years, data science has seen an immense influx in various industrial applications across the board. Today, we can see data science applied in health care, customer service, governments, cyber security, mechanical, aerospace, and other industrial applications. Among these, manufacturing has gained more prominence to achieve...

## Developing a complex R Shiny app – the good, the bad and the ugly

February 27, 2020

Together with Clara Bicalho (UC Berkeley) and Sisi Huang (WZB), I recently developed a web application that acts as a …

## MLOPS for R with Azure Machine Learning

February 26, 2020

The video recording of my RStudio::conf talk, MLOPS for R with Azure Machine Learning, is now available for streaming thanks to the fine folks at RStudio. The talk begins with a general discussion of MLOps (Machine Learning Operations) and how it differs from DevOps as applied to traditional (non-ML-based) applications. This is a theme I plan to develop further...

## RStudio Package Manager 1.1.2 – Windows

February 26, 2020

RStudio Package Manager 1.1.2 introduces beta support for Windows package binaries. These binaries make it easier and faster to install R packages on Windows Desktop. With this release, all the benefits of Package Manager are available to ...

## if … else and ifelse

February 26, 2020

Let’s make this a quick and quite basic one. There is this incredibly useful function in R called ifelse(). It’s basically a vectorized version of an if … else control structure, which every programming language has in one form or another. ifelse() has, in my view, two major advantages over if … else: It’s super fast. It’s more convenient to use. The...

## chain of lynx and drove of hares

February 26, 2020

A paper (and an introduction to the paper) in Nature this week seems to have made progress on the existence of indefinite predator-prey cycles, as in the lynx/hare dataset available in R. The paper focuses on another pair, an invertebrate and its prey, an alga, for which the authors managed a 50-cycle sequence.