Blog Archives

Combine choropleth data with raster maps using R

December 28, 2016

Switzerland is a country with lots of mountains, and several large lakes. While the political subdivisions (called municipalities) cover the high mountains and lakes, nothing much of economic interest happens in these places. (Raclette and sailing are wonderful, but don't count for our purposes.) For this reason, the Swiss Federal Statistical Office publishes the boundaries of the "productive" parts...

Read more »

The Basics of Bayesian Statistics

December 26, 2016

Bayesian inference is a way of combining information from data with things we think we already know. For example, if we wanted to estimate the mean height of people, we could use our prior knowledge that people are generally between 5 and 6 feet tall to inform the results from the data we collect. If our...

Read more »

Merry ChRistmas!

December 23, 2016

Christmas day is soon upon us, so here's a greeting made with R. Each frame is a Voronoi tessellation: about 1,000 points are chosen across the plane, each of which generates a polygon comprising the region closer to it than to any other selected point. This process is repeated for three designs (a heart, the word "Merry", and the word "Xmas"),...

Read more »

Take a Test Drive of the Linux Data Science Virtual Machine

December 22, 2016

If you've been thinking about trying out the Data Science Virtual Machine on Linux, but don't yet have an Azure account, you can now take a free test drive, with no credit card required! Just visit the Linux DSVM Marketplace page and click the blue button. The Linux Data Science Virtual Machine includes all of the tools a modern...

Read more »

Interactive decision trees with Microsoft R

December 20, 2016

Even though ensembles of trees (random forests and the like) generally have better predictive power and robustness, fitting a single decision tree to data can often be very useful for: understanding the important variables in a data set; exploring unusual subsegments of the data (and the explanatory variables that define them); presenting a simple, decision-based model to management to...

Read more »

Mixed Integer Programming in R with the ompr package

December 19, 2016

Numerical optimization is an important tool in the data scientist's toolbox. Many classical statistical problems boil down to finding the highest (or lowest) point on a multi-dimensional surface: the base R function optim provides many techniques for solving such maximum likelihood problems. Counterintuitively, numerical optimizations are easiest (though rarely actually easy) when all of the variables are continuous and...

Read more »

Predicting flu deaths with R

December 16, 2016

As Google learned, predicting the spread of influenza, even with mountains of data, is notoriously difficult. Nonetheless, bioinformatician and R user Shirin Glander has created a two-part tutorial about predicting flu deaths with R (part 2 here). The analysis is based on just 136 cases of influenza A H7N9 in China in 2013 (data provided in the outbreaks package)...

Read more »

How the State of Indiana uses R and Azure to forecast employment

December 15, 2016

"Big Data" generates a lot of news these days, but sometimes small data still means big computation. Indiana's Department of Workforce Development has the responsibility to forecast future employment rates in the State of Indiana. And not just the number of jobs available: the department also needs to forecast the types of jobs that will be available, so the...

Read more »

One Page R: A Survival Guide to Data Science with R

December 14, 2016

If you're looking to get started with data science in R, a great place to start is OnePageR by Graham Williams. (Graham is the creator of Rattle, author of Data Mining with Rattle and R, and Director of Data Science at Microsoft.) This free (CC-licensed) resource is a series of hands-on mini-chapters and associated R code, organized into four...

Read more »

Visualizing taxi trips between NYC neighborhoods with Spark and Microsoft R Server

December 14, 2016

by Ali Zaidi, Data Scientist at Microsoft. In a previous post, we showcased the use of the sparklyr package for manipulating large datasets using a familiar dplyr syntax on top of Spark HDInsight Clusters. In this post, we will take a look at the RxSpark API for R, part of the RevoScaleR package and the Microsoft R Server distribution of...

Read more »
