## Tutorial: Launch a Spark and R cluster with HDInsight

September 22, 2017

If you'd like to get started using R with Spark, you'll need to set up a Spark cluster and install R and all the other necessary software on the nodes. A really easy way to achieve that is to launch an HDInsight cluster on Azure, which is just a managed Spark cluster with some useful extra components. You'll just...

## Multi-Dimensional Reduction and Visualisation with t-SNE

September 22, 2017

t-SNE is a very powerful technique for visualising (and looking for patterns in) multi-dimensional data. Great things have been said about this technique. In this blog...

September 22, 2017

There are substantial differences between ad-hoc analyses (be they machine learning research, data science contests, or other demonstrations) and production-worthy systems. Roughly: ad-hoc analyses have to be correct...

## Mining USPTO full text patent data – Analysis of machine learning and AI related patents granted in 2017 so far – Part 1

September 21, 2017

The United States Patent and Trademark Office (USPTO) provides immense amounts of data (the data I used are in the form of XML files). After coming across these datasets,...

## Will Stanton hit 61 home runs this season?

September 21, 2017

## Pirating Pirate Data for Pirate Day

September 21, 2017

This past Tuesday was Talk Like A Pirate Day, the unofficial holiday of R (aRRR!) users worldwide. In recognition of the day, Bob Rudis used R to create this...

## Exploratory Data Analysis of Tropical Storms in R

September 21, 2017

The disastrous impact of recent hurricanes, Harvey and Irma, generated a large influx of data within the online community. I was...

## Gold-Mining – Week 3 (2017)

September 21, 2017

Week 3 Gold Mining and Fantasy Football Projection Roundup now available. Go get that free agent gold! The post Gold-Mining – Week 3 (2017) appeared first on Fantasy Football Analytics.

## Don’t teach students the hard way first

September 21, 2017

Imagine you were going to a party in an unfamiliar area, and asked the host for directions to their house. It takes you thirty minutes to get there, on...

## ggformula: another option for teaching graphics in R to beginners

September 21, 2017

A previous entry (http://sas-and-r.blogspot.com/2017/07/options-for-teaching-r-to-beginners.html) describes an approach to teaching graphics in R that also “get students doing powerful things quickly”, as David Robinson suggested. In this guest blog entry, Randall Pruim...

## Comparing Trump and Clinton’s Facebook pages during the US presidential election, 2016

September 21, 2017

R has a lot of packages for users to analyse posts on social media. As an experiment in this field, I decided to start with the biggest one: Facebook....

## Visualizing the Spanish Contribution to The Metropolitan Museum of Art

September 21, 2017

Well I walk upon the river like it’s easier than land (Love is All, The Tallest Man on Earth)

The Metropolitan Museum of Art provides here a dataset with...

## Pandigital Products: Euler Problem 32

September 20, 2017

Euler Problem 32 returns to pandigital numbers, which are numbers that contain one of each digit. Like so many of the Euler Problems, these numbers serve no practical purpose...

## Report from Mexico City

September 20, 2017

Editor's Note: It has been heartbreaking watching the images from México City. Teresa Ortiz, co-organizer of R-Ladies CDMX, reports on efforts of data scientists to help. Our thoughts are...

## Monte Carlo Simulations & the "SimDesign" Package in R

September 20, 2017

Past posts on this blog have included several relating to Monte Carlo simulation - e.g., see here, here, and here. Recently I came across a great article by Matthew Sigal...

## Answer probability questions with simulation (part-2)

September 20, 2017

This is the second exercise set on answering probability questions with simulation. Finishing the first exercise set is not a prerequisite. The difficulty level is about the same –...

## EARL London 2017 – That’s a wrap!

September 20, 2017

...

## Preview: ALTREP promises to bring major performance improvements to R

September 20, 2017

Changes are coming to the internals of the R engine which promise to improve performance and reduce memory use, with dramatic impacts in some circumstances. The changes were first...

## pinp 0.0.2: Onwards

September 20, 2017

A first update 0.0.2 of the pinp package arrived on CRAN just a few days after the initial release. We added a new vignette for the package (see below),...

## MLJAR R API

September 20, 2017

Hi! We have added an R API for mljar, so you can run sklearn, xgboost, lightGBM, Keras, and RGF from one line of R :) Please check it out at https://github.com/mljar/mljar-api-R

## Major update of D3partitionR: Interactive viz’ of nested data with R and D3.js

September 20, 2017

D3partitionR is an R package for interactively visualizing nested and hierarchical data using D3.js and HTML widgets. These last few weeks I’ve been working on a major D3partitionR update...

## Regression Analysis — What You Should’ve Been Taught But Weren’t, and Were Taught But Shouldn’t Have Been

September 20, 2017

The above title was the title of my talk this evening at our Bay Area R Users Group. I had been asked to talk about my new book, and...

## 12 Visualizations to Show a Single Number

September 20, 2017

Infographics, dashboards, and reports often need to highlight or visualize a single number. But how do you highlight a single number so that it has an impact and looks...

## Improve the Quality of Data Visualizations Using Redundancy

September 20, 2017

Using multiple visual elements to represent one variable in a chart can increase accuracy and improve readability. This is called adding redundancy or redundant encoding and, if done right, it will...

## From Biology to Industry. A Blogger’s Journey to Data Science.

September 19, 2017

Today, I gave a webinar for the Applied Epidemiology Didactic of the University of Wisconsin - Madison titled “From Biology to Industry. A Blogger’s Journey to Data Science.” I...

## A simstudy update provides an excuse to talk a little bit about latent class regression and the EM algorithm

September 19, 2017

I was just going to make a quick announcement to let folks know that I’ve updated the simstudy package to version 0.1.4 (now available on CRAN) to include functions...

## Enterprise-ready dashboards with Shiny and databases

September 19, 2017

Inside the enterprise, a dashboard is expected to have up-to-the-minute information, to have a fast response time despite the large amount of data that supports it, and to be...

## Hurricane Irma’s rains, visualized with R

September 19, 2017

The USGS has followed up their visualization of Hurricane Harvey rainfalls with an updated version of the animation, this time showing the rain and flooding from Hurricane Irma in...

## Time Series Analysis in R Part 1: The Time Series Object

September 19, 2017

Many of the methods used in time series analysis and forecasting have been around for quite some time but have taken a back seat to machine learning techniques in...