## New Course: Modeling with Data in the Tidyverse

September 12, 2018

Here is the course link. Course Description: In this course, you will learn to model with data. Models attempt to capture the relationship between an outcome variable of interest and a series of explanatory/predictor variables. Such models can be used for both explanatory purposes, e.g., "Does knowing professors' ages help explain their teaching evaluation scores?", and predictive purposes, e.g., "How well...

## Using R to Create Free Online Dashboards

September 12, 2018

It is now possible to create public dashboards, based on R code, for free! To illustrate how it works, I’ve used the free version of Displayr...

## What you missed at satRday @Amsterdam

September 12, 2018

I spent an exciting day at the satRday conference in Amsterdam, and I would like to share my thoughts with you. TL;DR: The event was cool, and my presentation went fine 😉 General: Wow, I don’t remember an event where I had so much fun. I took something valuable away from most of the talks. People were really...

## Interpretation of the AUC

September 12, 2018

The AUC, or concordance statistic c, is the most commonly used measure of diagnostic accuracy for quantitative tests. It is a discrimination measure that tells us how well we can classify patients into two groups: those with and those without the outcome of interest. Since the measure is based on ranks, it is not sensitive...
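The rank-based reading of the AUC can be illustrated with a small base R sketch. This is not code from the post: the function name `auc_rank` and the toy data are made up for illustration, and the calculation uses the standard Mann-Whitney rank-sum identity, which gives the proportion of (case, non-case) pairs in which the case has the higher score.

```r
# Hypothetical helper: AUC (concordance statistic) from ranks,
# using the Mann-Whitney rank-sum identity.
auc_rank <- function(score, outcome) {
  stopifnot(length(score) == length(outcome), all(outcome %in% c(0, 1)))
  n1 <- sum(outcome == 1)  # number with the outcome
  n0 <- sum(outcome == 0)  # number without the outcome
  r <- rank(score)         # ties receive average ranks
  (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy data (illustrative only): 3 of the 4 case/non-case pairs are concordant.
score   <- c(0.10, 0.40, 0.35, 0.80)
outcome <- c(0, 0, 1, 1)

auc_rank(score, outcome)       # 0.75
auc_rank(log(score), outcome)  # unchanged under a monotone transformation
```

Because only the ordering of the scores enters the calculation, any strictly increasing transformation of the test values leaves the result unchanged.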

## Is Rolex Truly A Luxury Brand? (How Online Market Is Diluting A Brand)

September 11, 2018

Background Information: Brands like Rolex employ a number of methods to maintain their position as a luxury watchmaker. They are selective about the brand ambassadors they hire, the location and decoration of flagship stores, the events they sponsor, and, most importantly, the price tag of the watches. With such effort, Rolex has become one of...

## Getting started with deep learning in R

September 11, 2018

There are good reasons to get into deep learning: Deep learning has been outperforming the respective “classical” techniques in areas like image recognition and natural language processing for a while now, and it has the potential to bring interesting insights even to the analysis of tabular data. For many R users interested in deep learning, the hurdle is not...

## GDP Data via API

September 11, 2018

Today, we will look at the GDP data that is released every quarter or so by the Bureau of Economic Analysis (BEA), and get familiar with the BEA API (see the documentation here). For a primer on GDP in general, BEA publishes this guide. To access the BEA API, we will need two packages, httr and jsonlite: `library(tidyverse)`, `library(tidyquant)`, `library(httr)`, `library(jsonlite)`. We also need to...

## udpipe version 0.7 for Natural Language Processing (#NLP) alongside #tidytext, #quanteda, #tm

September 11, 2018

This blog post announces the release of version 0.7 of the udpipe R package on CRAN. udpipe is an R package which does tokenization, part-of-speech tagging, lemmatization, morphological feature tagging and dependency parsing. Its main feature is that it is a lightweight R package which works on more than 50 languages and gives you rich NLP output out of...

## Video: R and Python in Azure HDInsight

September 11, 2018

Azure HDInsight was recently updated with version 9.3 of ML Services in HDInsight, which provides integration with R and Python. In particular, it makes it possible to run R and Python within HDInsight's managed Spark instance. The integration provides: R and Python support, with interaction via Visual Studio, VS Code, or RStudio; specialized distributed analytics libraries for R and...

## ImageNet needs more Wild Boar Photos

September 10, 2018

Is your deep convolutional network misclassifying images? You can find out why with a heatmap of class activation overlaid on its misclassified pictures. A heatmap overlay shows parts of an image most activated in a neural network’s last convolution...

## Modeling Frequency Outcomes with Ordinal Models

September 10, 2018

When modeling frequency outcomes, we often need to go beyond standard Poisson regression, because of its strict distributional assumption, and consider more flexible alternatives. In general, there are two broad categories of modeling approaches that address practical concerns about frequency outcomes. The first category of models is mainly intended to address the...

September 10, 2018

A first update to the AsioHeaders package arrived on CRAN today. Asio provides a cross-platform C++ library for network and low-level I/O programming. It is also included in Boost – but requires linking when used as part of Boost. This standalone v...

## Going from a human readable Excel file to a machine-readable csv with {tidyxl}

September 10, 2018

I won’t write a very long introduction; we all know that Excel is ubiquitous in business, and that it has a lot of very nice features, especially for business practitioners who do not know any programming. However, when people use Excel for purposes it was not designed for, it can be a hassle. Often, people use Excel as a reporting tool, which...

## Binary, beta, beta-binomial

September 10, 2018

I’ve been working on updates for the simstudy package. In the past few weeks, a couple of folks independently reached out to me about generating correlated binary data. One user was not impressed by the copula algorithm that is already implemented. I’ve added an option to use an algorithm developed by Emrich and Piedmonte in 1991, and will be...

## What have these birds been studied for? Querying science outputs with R

In the second post of the series where we obtained data from eBird we determined what birds were observed in the county of Constance, and we complemented this knowledge with some taxonomic and trait information in the fourth post of the series. Now, we could be curious about the occurrence of these birds in scientific work. In this post, we will query the scientific literature and...

## riddles on a line [#2]

September 10, 2018

A second Riddle(r), with a puzzle related to the integer set Ð={1,2,3,…,N}, which can be summarised as: given a random walk on Ð, starting at the middle N/2, with both end states being absorbing states, and a uniform random move left or right of the current value to the (integer) middle of the corresponding (left...

## Pivoting on Text in R (vs. Excel)

September 10, 2018

This post follows a previous tutorial on pivoting on text in Excel. In this post I will reproduce the exercise in R. This way you begin to see the similarities and differences between the two programs and begin to diversify your data skill base.

## A Quick Appreciation of the R transform Function

September 10, 2018

R users who also use the dplyr package will be able to quickly understand the following code, which adds an estimated area column to a data.frame: `suppressPackageStartupMessages(library("dplyr"))`, then `iris %>% mutate(., Petal.Area = (pi/4)*Petal.Width*Petal.Length) %>% head(.)`. In the printed result, the first row (Sepal.Length 5.1, Sepal.Width 3.5, Petal.Length 1.4, Petal.Width 0.2, Species setosa) gets a Petal.Area of 0.2199115, …
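For comparison, the same column can be added with base R's `transform()`, the function named in the post title. The sketch below is not taken from the post itself; the variable name `iris_area` is chosen here for illustration.

```r
# Base R equivalent of the dplyr mutate() call: transform() evaluates the
# expression with the data frame's columns in scope and appends the result.
iris_area <- transform(
  iris,
  Petal.Area = (pi / 4) * Petal.Width * Petal.Length
)

head(iris_area)
```

No package needs to be attached for this version, since `transform()` ships with base R.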

## Naïve Numerical Sums in R

September 10, 2018

Introduction: The Kolmogorov distribution (which I call ) is as follows: There is no known simpler form and we have to work with this sum as it is. This is an infinite sum. How can we compute the value of this infinite sum numerically? Naïvely we can do the following: In other words, sum up…
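The formulas in this excerpt did not survive, so the following is only a sketch of what a naïve summation might look like. It assumes the commonly quoted alternating-series form of the Kolmogorov CDF, K(x) = 1 − 2 Σ_{k≥1} (−1)^(k−1) exp(−2k²x²), which may differ from the form used in the post. The function name, tolerance and term cap are hypothetical choices.

```r
# Naive summation sketch (assumed series form, see the note above):
# add terms one by one until the latest term is negligibly small.
kolmogorov_cdf_naive <- function(x, tol = 1e-12, max_terms = 1000L) {
  stopifnot(length(x) == 1, x > 0)
  total <- 0
  for (k in seq_len(max_terms)) {
    term <- (-1)^(k - 1) * exp(-2 * k^2 * x^2)
    total <- total + term
    if (abs(term) < tol) break  # stop once terms no longer matter
  }
  1 - 2 * total
}

kolmogorov_cdf_naive(1)
```

For moderate x the terms shrink very quickly, so only a handful are needed; for x close to zero many more terms are required and the alternating signs cancel heavily, which is where a naïve loop like this becomes unreliable.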

## Step Up Your Dashboard With Shinydashboard – Part 1: Exercises

September 10, 2018

The shinydashboard package provides a well-designed dashboard theme for Shiny apps and allows for an easy assembly of a dashboard from a couple of basic building blocks. The package is widely used in commercial environments as well, due to its neat features for building convenient and robust layouts. This exercise set will help you practice...

## How to Compute D-Error for a Choice Experiment Using Displayr

September 10, 2018

In other articles I provide the mathematical definitions of D-error and worked examples of how to calculate D-error; but in the real world, most people...

## Exploring San Francisco Bay Area’s Bike Share System

September 9, 2018

Congested streets and slow-crawling traffic are a fact of life in many metropolitan areas, such as New York City, Los Angeles, and Chicago. Bike sharing is an innovative solution for such problems, and it works by dispersing a large fleet of publicly available bikes throughout crowded cities for personal transport. Implemented in 2013, Ford GoBike is...

## Use `purrr` to feed four cats

September 9, 2018

In this example we will show you how to go from a ‘for loop’ to purrr. Use this as a cheatsheet when you want to replace your for loops. Imagine having 4 cats: four real cats who need food, care and love to live a happy life. They are starting to meow, so it’s...
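The move from a for loop to purrr might look roughly like the sketch below. The cat names and the `feed_cat()` helper are invented for illustration and are not from the original post; only `purrr::map_chr()` is a real package function.

```r
library(purrr)

# Hypothetical data and helper, for illustration only.
cats <- c("Tom", "Felix", "Garfield", "Salem")
feed_cat <- function(name) paste0(name, " has been fed")

# For-loop version: pre-allocate the result, then fill it element by element.
fed_loop <- character(length(cats))
for (i in seq_along(cats)) {
  fed_loop[i] <- feed_cat(cats[i])
}

# purrr version: map_chr() applies feed_cat() to each cat and
# returns a character vector of the same length.
fed_map <- map_chr(cats, feed_cat)

identical(fed_loop, fed_map)
```

The purrr version drops the index bookkeeping and states the expected return type in the function name, which is the main readability gain over the loop.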

## survHE update

September 9, 2018

Because I have been preparing an extended presentation on (Bayesian) survival analysis in health economic evaluation, I took the opportunity to make some tweaks to survHE: nothing major, but I was aware of a couple of imprecisions in the code and of things I wanted to make a bit better, so while I was knitting my slides, I made the...

## sabre: or how to compare two maps?

Creating or determining regions is a useful way to describe the world. Regionalization not only allows for a quicker understanding of spatial patterns but can also influence how regions are managed. Regions are created in various disciplines. We can delineate regions based on a single property (e.g. landform regions or climate regions) or several factors (e.g. ecoregions). There are also political...

A while ago we onboarded an exciting package, codemetar by Carl Boettiger. codemetar is an R-specific information collector and parser for the CodeMeta project. In particular, codemetar can digest metadata about an R package in order to fill the terms recognized by CodeMeta. This means extracting information from DESCRIPTION but also from e.g. continuous integration badges in the README! In this note, we’ll take advantage of the codemetar::extract_badges function to...

## Driving Drill Dynamically with Docker and Updating Storage Configurations On-the-fly with sergeant

September 9, 2018

The sergeant package has a minor update that adds REST API coverage for two “new” storage endpoints that make it possible to add, update and remove storage configurations on-the-fly without using the GUI or manually updating a config file. This is an especially handy feature when paired with Drill’s new, official Docker container since that...

## Fitting exponential decays in R, the easy way

September 9, 2018

Exponential decays can describe many physical phenomena: capacitor discharge, temperature of a billet during cooling, kinetics of first order chemical reactions, radioactive decay, and so on. They are very useful functions, but can be tricky to fit in R: you’ll quickly run into a “singular gradient” error. Thankfully, self-starting functions provide an easy and automatic fix. Read on to...
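As a rough illustration of the self-starting approach, the sketch below fits a simulated decay with `nls()` and the built-in `SSasymp()` self-start model from the stats package. The simulated data, the variable names and the parameter names (`yf`, `y0`, `log_alpha`) are assumptions made for this example; they are not taken from the full post.

```r
# Simulate a noisy exponential decay: y = yf + (y0 - yf) * exp(-alpha * time)
# (illustrative values: yf = 2, y0 = 10, alpha = 0.5).
set.seed(42)
decay <- data.frame(time = seq(0, 10, length.out = 50))
decay$y <- 2 + (10 - 2) * exp(-0.5 * decay$time) +
  rnorm(nrow(decay), sd = 0.1)

# SSasymp() computes its own starting values, so no `start` list is needed.
# Its parameters are the asymptote, the value at time 0, and the log of the
# rate constant, in that order.
fit <- nls(y ~ SSasymp(time, yf, y0, log_alpha), data = decay)

est <- coef(fit)
alpha_hat <- exp(est[["log_alpha"]])  # back-transform to the rate constant
est
alpha_hat
```

Because the rate is estimated on the log scale, it has to be exponentiated before it is compared with the decay constant used in the simulation.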