## Datashader is a big deal

March 22, 2017
I recently got back from Strata West 2017 (where I ran a very well-received workshop on R and Spark). One thing that really stood out for me at the exhibition hall was Bokeh plus datashader from Continuum Analytics. I had the privilege of having Peter Wang himself demonstrate datashader for me and answer a …

## Running your R code on Azure with mrsdeploy

March 22, 2017
by John-Mark Agosta, data scientist manager at Microsoft. Let’s say you’ve built a model in R that is larger than you can conveniently run locally, and you want to take advantage of Azure’s resources simply to run it on a larger machine. This post explains how to provision and run an Azure virtual machine (VM) for this, using the...

## February 2017 New Package Picks

March 22, 2017
by Joseph Rickert. One hundred and forty-five new packages were added to CRAN in February. Here are 47 interesting packages organized into five categories: Biostatistics, Data, Data Science, Statistics and Utilities. Biostatistics: BaTFLED3D v0.1.7 implements a machine learning algorithm to make predictions and determine interactions in data that varies along three independent modes. It was...

## The Hitchhiker’s Guide to Ggplot2 in R

March 22, 2017
Published: 2016-11-30; updated: 2017-03-23. "Any bleeder knows that books are never finished, only abandoned." (Why Information Grows) About the book: you can find the book here. This is a book that may look complete, but changes in R packages are al...

## Suggests != Depends

March 22, 2017
A number of packages on CRAN use Suggests: casually. They list other packages as "not required" in Suggests: -- as opposed to absolutely required via Imports: or the older Depends: -- yet do not test for their presence in either examples or, more commonly, unit tests. So, for example, the unit tests are bound to fail on a machine where a suggested package is not installed because, well, Suggests != Depends. This has...
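A minimal sketch of the conditional-use pattern, assuming a hypothetical package whose DESCRIPTION lists data.table under Suggests: (the helper name `count_distinct()` is made up for illustration). `requireNamespace()` returns `FALSE` instead of erroring when a package is missing, so code can fall back gracefully; in unit tests, `testthat::skip_if_not_installed()` skips rather than fails.

```r
# Hypothetical helper in a package that lists data.table under Suggests:
count_distinct <- function(x) {
  if (requireNamespace("data.table", quietly = TRUE)) {
    # Suggested package is installed: use data.table::uniqueN()
    data.table::uniqueN(x)
  } else {
    # Suggested package is absent: fall back to base R
    length(unique(x))
  }
}

# In tests/testthat/test-count.R, guard the test instead of assuming
# the suggested package is present:
# testthat::test_that("count_distinct counts unique values", {
#   testthat::skip_if_not_installed("data.table")
#   testthat::expect_equal(count_distinct(c(1, 1, 2)), 2)
# })

count_distinct(c(1, 1, 2, 3))
```

Both branches return the same count, so behaviour does not depend on whether the suggested package happens to be installed.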

## San Francisco EARL: First round of speakers announced

March 22, 2017
We’re excited to announce the first round of gReat speakers for San Francisco EARL. Alongside our keynote speakers, Hilary Parker and Ricardo Bion, R users from a range of industries will share their R stories. Take a look at our …

## Simulating Unown encounter rates in Pokémon Go

March 21, 2017
Pokémon Go is an augmented reality game where people with smartphones walk around and catch Pokémon. As in the classic games, players are Pokémon “trainers” who have to travel around and collect creatures. Some types are rarer than others, som...

## anytime 0.2.2

March 21, 2017
A bugfix release of the anytime package arrived at CRAN earlier today. This is the tenth release since the inaugural version late last summer, and the second (bugfix / feature) release this year. anytime is a very focused package aiming to do just one th...

## February 2017 New Package Picks

March 21, 2017
One hundred and forty-five new packages were added to CRAN in February. Here are 47 interesting packages organized into five categories: Biostatistics, Data, Data Science, Statistics and Utilities. Biostatistics: BaTFLED3D v0.1.7 implements a machine learning algorithm to make predictions and determine interactions in data that varies along three independent modes. It was developed to predict the growth of...

## Parallel benchmarking with OpenML and mlr

March 21, 2017
With this post I want to show you how to benchmark several learners (or learners with different parameter settings) using several data sets in a structured and parallelized fashion. For this we want to use batchtools. The data that we will use here is stored on the open machine learning platform openml.org and we can download it together...

## Use mlrMBO to optimize via command line

March 21, 2017
Many people who want to apply Bayesian optimization want to use it to optimize an algorithm that is not implemented in R but runs on the command line as a shell script or an executable. We recently published mlrMBO on CRAN. Like any R package, it normally operates inside R, but with this post I want to demonstrate how...

## Data Analytics for Societal Good

March 21, 2017
At my workplace, employees celebrate a month of Data Analysis for Societal Good every year. During this time, we try to help NPOs (non-profit organisations) gain insights from their data, for free. Whilst we are engaged in this practice at workp...

## Financial time series forecasting – an easy approach

March 21, 2017
Financial time series analysis and forecasting have a history of remarkable contributions. It is therefore quite hard for beginners to get oriented and benefit from reading such scientific literature, as it requires a solid understanding of basic statistics, a detailed study of the ground basis of time series analysis tools and the knowledge...

## The Next Era of Research Communication

March 21, 2017
From the days of printed research papers (before the digital age) to now, when research papers are posted online first, not much has really changed in the way we communicate. We still use static images, formulas and a bunch of text to show what we have...

## Give a talk about an application of R at EARL

March 21, 2017
The EARL (Enterprise Applications of R) conference is one of my favourite events to go to. As the name suggests, the conference focuses on where the rubber of the R language meets the road: its use in solving real-world problems. Prior conferences have included presentations on how Maersk uses R to optimize...

## The one thing you need to master data science

March 21, 2017
The most important factor for mastering data science is ... The post The one thing you need to master data science appeared first on SHARP SIGHT LABS.

March 21, 2017
camsRad is a lightweight R client for the CAMS Radiation Service, which provides satellite-based time series of solar irradiation for the actual weather conditions as well as for clear-sky conditions. Satellite-based solar irradiation data have been around roughly as long as our modern-era satellites. But the price tag has been very high, in the range of several...

## Simultaneous intervals for derivatives of smooths revisited

March 21, 2017
Eighteen months ago I screwed up! I’d written a post in which I described the use of simulation from the posterior distribution of a fitted GAM to derive simultaneous confidence intervals for the derivatives of a penalized spline. It was a nice post that attracted some interest. It was also wrong. In December I corrected the first...

## Is it possible to use RevoScaleR package in Power BI?

March 20, 2017
I was invited to deliver a session for the Belgium User Group on SQL Server and R integration. After the session (which we did online using web-based Citrix), I got an interesting question: “Is it possible to use RevoScaleR performance computational functions within Power BI?” My first answer was a sceptical yes. But …

## Sentiment Analysis of Warren Buffett’s Letters to Shareholders

March 20, 2017
Last week, I was reading through Warren Buffett's most recent letter to Berkshire Hathaway shareholders. Every year, he writes a letter that he makes publicly available on the Berkshire Hathaway website. In the letters he talks about the performance of...

## Alteryx integrates with Microsoft R

March 20, 2017
You can now use Alteryx Designer, the data science workflow tool from Alteryx, as a drag-and-drop interface for many of the big-data statistical modeling tools included with Microsoft R. Alteryx v11.0 includes expanded support for Microsoft SQL Server 2016, Microsoft R Server, Azure SQL Data Warehouse, and Microsoft Analytics Platform System (APS), with new workflow tools to access functionality...

## What’s in the words? Comparing artists and lyrics with R.

March 20, 2017
It's been a while since I had the opportunity to post something on music. Let's get back to that. I got my hands on some song lyrics by a range of artists. (I have an R script to download all lyrics for a given artist from a lyrics website. Since these lyrics are protected by copyright law, I...

## Linear Regression and ANOVA shaken and stirred (Part 2)

March 20, 2017
In the first part of this entry I showed some mtcars examples in which am is useful for explaining ANOVA, since its observations are defined as: $$am_i = \begin{cases}1 &\text{ if car } i \text{ is manual} \cr 0 &\text{ if car } i \text{ is automatic}\end{cases}$$ Now I'll show another example to continue the last example from...
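To see why a 0/1 variable such as $am_i$ links regression and ANOVA, here is a short worked illustration. Using mpg as the response is an assumption made for this sketch, not necessarily the post's own example. With ordinary least squares on a single binary regressor,

$$\text{mpg}_i = \beta_0 + \beta_1\, am_i + \varepsilon_i \quad\Longrightarrow\quad \hat\beta_0 = \bar{y}_{\text{auto}}, \qquad \hat\beta_0 + \hat\beta_1 = \bar{y}_{\text{manual}}, \qquad \hat\beta_1 = \bar{y}_{\text{manual}} - \bar{y}_{\text{auto}}$$

so the slope is simply the difference in group means. With two groups, the one-way ANOVA F statistic equals the square of the t statistic for $\hat\beta_1$, that is, $F = t^2$.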

## survminer 0.3.0

March 20, 2017
I’m very pleased to announce that survminer 0.3.0 is now available on CRAN. survminer makes it easy to create elegant and informative survival curves. It also includes functions for summarizing and graphically inspecting the Cox proportional hazards model assumptions. This is a big release...

## R Correlation Tutorial

March 20, 2017
In this tutorial, you explore a number of data visualization methods and their underlying statistics, particularly with regard to identifying trends and relationships between variables in a data frame. That’s right, you’ll focus on concepts suc...

## Data validation with the assertr package

March 20, 2017
Version 2.0 of my data set validation package assertr hit CRAN just this weekend. It has some pretty great improvements over version 1. For those new to the package, what follows is a short and new introduction. For those who…