## ODSC East 2019 Talks to Expand and Apply R Skills

R programmers are not necessarily data scientists, but rather software engineers. We have an entirely new multitrack focus area that helps engineers learn AI skills: AI for Engineers. This focus area is designed specifically to help programmers get familiar with AI-driven software that utilizes deep learning and machine learning models to enable conversational AI, …

## tint 0.1.2: Some cleanups

April 19, 2019

A new version 0.1.2 of the tint package is arriving at CRAN as I write this. It follows the recent 0.1.1 release which included two fabulous new vignettes...

## Generating the Ultimate List of 41 Data Science Podcasts by Crowdsourcing Google Results

April 18, 2019

Confession time: years ago, I was skeptical of podcasts. I was a music-only listener on commutes. Can you imagine? But around 2016, I gave in and finally took the...

April 18, 2019

Metadata are an essential part of a robust data science workflow; they record the meaning of each variable: its units, quality, allowed range, how we collect it,...

## Base Rate Fallacy – or why No One is justified to believe that Jesus rose

April 18, 2019

In this post we are talking about one of the most unintuitive results of statistics: the so-called false positive paradox, which is an example of the so-called...

## Applying gradient descent – primer / refresher

April 18, 2019

Every so often a problem arises where it’s appropriate to use gradient descent, and it’s fun (and / or easier)...

## Common Uncommon Notations that Confuse New R Coders

April 17, 2019

Here are a few of the more commonly used notations found in R code and documentation that confuse coders of any skill level who are new to R. Be...

## A Comparative Review of the JASP Statistical Software

April 17, 2019

JASP is a free and open source statistics package that targets beginners looking to point-and-click their way through analyses. This article is one of a series of reviews which...

## ANCOVA example – April 18, 2019

April 17, 2019

I recently had the need to run an ANCOVA, not a task I perform all that often and my first time using R to do so (I’ve done it...

## RStudio Package Manager 1.0.8 – System Requirements

April 17, 2019

Installing R packages on Linux systems has always been a risky affair. In RStudio Package Manager 1.0.8, we’re giving administrators and R users the information they need to make installing packages...

## When Standards Go Wild – Software Review for a Manuscript

Stefanie Butland, rOpenSci Community Manager Some things are just irresistible to a community manager – PhD student Hugo Gruson’s recent tweets definitely fall into that category. I was surprised and...

## Explore the landscape of R packages for automated data exploration

April 17, 2019

Do you spend a lot of time on data exploration? If yes, then you will like today’s post about AutoEDA written by Mateusz Staniak. If you ever dreamt of...

## Bayes vs. the Invaders! Part Three: The Parallax View

April 17, 2019

In the previous post of this series unveiling the relationship between UFO sightings and population, we crossed the threshold of normality underpinning linear models to construct...

## A Detailed Guide to Plotting Line Graphs in R using ggplot geom_line

April 16, 2019

When it comes to data visualization, it can be fun to think of all the flashy and exciting ways to display a dataset. But if you're trying to convey...

## Tidy correlation tests in R

April 16, 2019

When we try to estimate the correlation coefficient between multiple variables, obtaining a simple and tidy result becomes more complicated. A simple solution is...

## Setting up RStudio Server on a Cloud for Collaboration and Reproducibility

April 16, 2019

Roland Stevenson is a data scientist and consultant who may be reached on LinkedIn. When setting up R and RStudio Server on a cloud Linux instance, some thought should be...

## Vectorizing functions in R is easy

April 16, 2019

Imagine you have a function that only takes one argument, but you would really like to work on a vector of values. A short example of how the function Vectorize()...

## Two interesting facts about high-dimensional random projections

April 16, 2019

John Cook recently wrote an interesting blog post on random vectors and random projections. In the post, he states two surprising facts of high-dimensional geometry and gives some intuition...

## Controlling Data Layout With cdata

April 16, 2019

Here is an example of how easy it is to use cdata to re-layout your data. Tim Morris recently tweeted the following problem (corrected). Please will you take pity on...

## Writing a letter to DataCamp

April 15, 2019

Since 2017 I have been an instructor for DataCamp, the VC-backed online data science education platform. What this means is that I am not an employee, but I have...

## Customize Your Interactive EDA: Explore the Fuel Economy of the U.S. Car Market

Interactive EDA is nice but customized interactive EDA is even nicer. To celebrate the new CRAN version of my ‘ExPanDaR’ package I prepare a customized variant of ‘ExPanD’ to...

## Even with randomization, mediation analysis can still be confounded

April 15, 2019

Randomization is super useful because it usually eliminates the risk that confounding will lead to a biased estimate of a treatment effect. However, this only goes so far. If...

## The sinh-arcsinh normal distribution

April 15, 2019

This month’s issue of Significance magazine has a very nice summary article of the sinh-arcsinh normal distribution. (Unfortunately, the article seems to be behind a paywall.) This distribution was...

## BayesComp 20 [full program]

April 15, 2019

The full program is now available on the conference webpage of BayesComp 20, next 7-10 Jan 2020. There are eleven invited sessions, including one j-ISBA session, and a further...

## Bioconductor S4 classes for high-throughput omics data

April 15, 2019

Motivation: Multi-omics data integration and analysis. What a beast! It is one of the major challenges in the era of...

## R Programmers Earn More than Python Programmers

April 14, 2019

At least globally, that is. According to the 2019 Stack Overflow Developer Survey, R users globally reported earning an average of \$64k per year, \$1k more than the \$63k...

## How do we combine errors? The linear case

In our research work, we usually fit models to experimental data. Our aim is to estimate some biologically relevant parameters, together with their standard errors. Very often, these parameters...

## New package: GetBCBData

April 14, 2019

The Central Bank of Brazil (BCB) offers access to its SGS system (sistema gerenciador de series temporais) with an official API available here. With time, I find myself using more...