## Batch Deployment of WoE Transformations

April 20, 2019

Today I wrapped up the function batch_woe(), which lets users apply WoE transformations to many independent variables simultaneously. With it, I have completed the development of the major functions in the MOB package that can be used for model development in a production setting. The function batch_woe() is essentially a wrapper around cal_woe()...

## Quick Example of Latent Profile Analysis in R

April 19, 2019

Latent Profile Analysis (LPA) tries to identify clusters of individuals (i.e., latent profiles) based on responses to a series of continuous variables (i.e., indicators). LPA assumes that there are...

## Control Charts Another Package

April 19, 2019

I got an email from Alex Zanidean, who runs the xmrr package: “You might enjoy my package xmrr for similar charts – but mine recalculate the bounds automatically”, and if...

## ODSC East 2019 Talks to Expand and Apply R Skills

R programmers are not necessarily data scientists, but rather software engineers. We have an entirely new multitrack focus area that helps engineers learn AI skills – AI for Engineers....

## tint 0.1.2: Some cleanups

April 19, 2019

A new version 0.1.2 of the tint package is arriving at CRAN as I write this. It follows the recent 0.1.1 release which included two fabulous new vignettes...

## Animating the US Treasury yield curve rates by @ellis2013nz

April 19, 2019

My eye was caught by this tweet by Robin Wigglesworth of the Financial Times with an Alan Smith animation of the US Treasury yield curve from 2005 to 2009....

## Generating the Ultimate List of 41 Data Science Podcasts by Crowdsourcing Google Results

April 18, 2019

Confession time: years ago, I was skeptical of podcasts. I was a music-only listener on commutes. Can you imagine? But around 2016, I gave in and finally took the...

## Using ecmwfr to measure global warming

April 18, 2019

For my research I needed to download gridded weather data from ERA-Interim, which is a big dataset generated by the ECMWF. Getting long term data through their website is...

April 18, 2019

Metadata are an essential part of a robust data science workflow; they record the meaning of each variable: its units, quality, allowed range, how we collect it,...

## Base Rate Fallacy – or why No One is justified to believe that Jesus rose

April 18, 2019

In this post we are talking about one of the most unintuitive results of statistics: the so-called false positive paradox, which is an example of the so-called...

## Applying gradient descent – primer / refresher

April 18, 2019

Every so often a problem arises where it’s appropriate to use gradient descent, and it’s fun (and / or easier)...

## Common Uncommon Notations that Confuse New R Coders

April 17, 2019

Here are a few of the more commonly used notations found in R code and documentation that confuse coders of any skill level who are new to R. Be...

## A Comparative Review of the JASP Statistical Software

April 17, 2019

JASP is a free and open source statistics package that targets beginners looking to point-and-click their way through analyses. This article is one of a series of reviews which...

## Edit datatables in R shiny app

April 17, 2019

Tables are very much the standard way of representing data in dashboards, along with visualizations. Wouldn't it be more useful if you could edit the values in the tables...

## mlr-2.14.0

The last mlr release was in August 2018 - so it was definitely time for a new release after around 9 months of development! The NEWS file...

## ANCOVA example – April 18, 2019

April 17, 2019

I recently had the need to run an ANCOVA, not a task I perform all that often and my first time using R to do so (I’ve done it...

## RStudio Package Manager 1.0.8 – System Requirements

April 17, 2019

Installing R packages on Linux systems has always been a risky affair. In RStudio Package Manager 1.0.8, we’re giving administrators and R users the information they need to make installing packages...

## When Standards Go Wild – Software Review for a Manuscript

Stefanie Butland, rOpenSci Community Manager

Some things are just irresistible to a community manager – PhD student Hugo Gruson’s recent tweets definitely fall into that category. I was surprised and...

## Explore the landscape of R packages for automated data exploration

April 17, 2019

Do you spend a lot of time on data exploration? If yes, then you will like today’s post about AutoEDA written by Mateusz Staniak. If you ever dreamt of...

## Bayes vs. the Invaders! Part Three: The Parallax View

April 17, 2019

In the previous post of this series unveiling the relationship between UFO sightings and population, we crossed the threshold of normality underpinning linear models to construct...

## A Detailed Guide to Plotting Line Graphs in R using ggplot geom_line

April 16, 2019

When it comes to data visualization, it can be fun to think of all the flashy and exciting ways to display a dataset. But if you're trying to convey...

## Tidy correlation tests in R

April 16, 2019

When we estimate correlation coefficients between multiple variables, obtaining a simple and tidy result becomes more complicated. A simple solution is...

## Setting up RStudio Server on a Cloud for Collaboration and Reproducibility

April 16, 2019

Roland Stevenson is a data scientist and consultant who may be reached on LinkedIn. When setting up R and RStudio Server on a cloud Linux instance, some thought should be...

## Vectorizing functions in R is easy

April 16, 2019

Imagine you have a function that only takes one argument, but you would really like to work on a vector of values. A short example of how the function Vectorize()...

## Two interesting facts about high-dimensional random projections

April 16, 2019

John Cook recently wrote an interesting blog post on random vectors and random projections. In the post, he states two surprising facts of high-dimensional geometry and gives some intuition...

## Controlling Data Layout With cdata

April 16, 2019

Here is an example of how easy it is to use cdata to re-layout your data. Tim Morris recently tweeted the following problem (corrected). Please will you take pity on...

## Writing a letter to DataCamp

April 15, 2019

Since 2017 I have been an instructor for DataCamp, the VC-backed online data science education platform. What this means is that I am not an employee, but I have...

## Customize Your Interactive EDA: Explore the Fuel Economy of the U.S. Car Market

Interactive EDA is nice, but customized interactive EDA is even nicer. To celebrate the new CRAN version of my ‘ExPanDaR’ package, I prepared a customized variant of ‘ExPanD’ to...