Mastering R plot – Part 2: Axis

February 6, 2016

This is the second part of the Mastering R plot series. The standard plot function in R allows extensive tuning of every element being plotted. There are, however, many ways to do this, and the standard help files are hard to grasp at first. In this article we will see how to control every aspect of…
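
The excerpt stops before any code. As a rough illustration of the kind of axis control it refers to, here is a minimal base-R sketch (not the post's own code; the data and labels are made up) that suppresses the default x axis with `xaxt = "n"` and redraws it with `axis()`, placing labelled ticks at chosen positions and shorter, unlabelled ticks in between.

```r
# Render to a null PDF device so the sketch also runs without a screen
pdf(NULL)

x <- 1:10
y <- x^2

# xaxt = "n" suppresses the default x axis; las = 1 keeps y tick labels horizontal
plot(x, y, xaxt = "n", las = 1,
     xlab = "Observation", ylab = "Squared value")

# Draw our own x axis: ticks only at even positions, with custom text labels.
# axis() invisibly returns the tick locations it used.
ticks <- seq(2, 10, by = 2)
drawn <- axis(side = 1, at = ticks, labels = paste0("obs ", ticks))

# Minor ticks at the odd positions: shorter (tcl) and without labels
axis(side = 1, at = seq(1, 9, by = 2), labels = FALSE, tcl = -0.25)

invisible(dev.off())
```

The same pattern works for the y axis (`yaxt = "n"` plus `axis(side = 2, ...)`), which is usually easier to reason about than tweaking the default axes through `par()` alone.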

What has Kaggle learned from 2 million machine learning models?

February 5, 2016

What has Kaggle learned from 2 million machine learning models? Anthony Goldbloom, founder and CEO...

Pitfall of XML package: to know the cause

February 5, 2016

This is the sequel to the previous report “issues specific to cp932 locale, Japanese Shift-JIS, on Windows”. In this report, I will dig deeper into the issues to find out what …

Speaking at DataPhilly February 2016

February 5, 2016

The next DataPhilly meetup will feature a medley of machine-learning talks, including an Intro to ML from yours truly. Check out the speakers list and be sure to RSVP. Hope to see you there! Thursday, February 18, 2016, 6:00 PM to 9:00 PM. Speakers: Corey Chivers, Randy Olson, Austin Rochford. Corey Chivers (Penn Medicine) Abstract:

Introducing Microsoft R Open: Replay and slides

February 5, 2016

We had a fantastic turnout to last week's webinar, Introduction to Microsoft R Open. If you missed it, you can watch the replay below. In the talk, I give some background on the R language and its applications, describe the performance and reproducibility benefits of Microsoft R Open, and give a demonstration of the basics of the R language...

Shiny Developer Conference 2016 Recap

February 5, 2016

This is a guest post from VP Nagraj, a data scientist embedded within UVA’s Health Sciences Library, who runs our Data Analysis Support Hub (DASH) service. Last weekend I was fortunate enough to participate in the first-ever Shiny Developer Conference, hosted by RStudio at Stanford University...

Cricket analytics with cricketr in paperback and Kindle versions

February 5, 2016

My book “Cricket analytics with cricketr” is now available in paperback and Kindle versions. The paperback is available from Amazon (US, UK and Europe) for $48.99. The Kindle version can be downloaded from the Kindle store for $2.50 (Rs 169/-). Do pick up your copy. It should be a good read for a Sunday afternoon.

New Version of “Wrangling F1 Data With R” Just Released…

February 5, 2016

So I finally got round to pushing a revised (and typo-corrected!) version of Wrangling F1 Data With R: A Data Junkie’s Guide, which also includes a handful of new sections and chapters, including descriptions of how to detect undercuts, the new style race history chart that shows the on-track position of each driver for…

Data from the World Health Organization API

February 5, 2016

Eric Persson yesterday released a new WHO R package, which allows easy access to the World Health Organization’s data API. He’s also written a nice vignette introducing its use. I had a play and found it gave easy access to some interesting data. Some time down the track I might do a comparison of this...

Alternate R Markdown Templates

February 4, 2016
By

The knitr/R Markdown system is a great way to organize reports and analyses. However, the built-in templates (which come with RStudio and the rmarkdown package) rely on Bootstrap and also use jQuery. There’s nothing wrong with that, but the generated standalone HTML documents (which are a great way to distribute reports) don’t really need all that cruft…

Death Comes to Us All

February 4, 2016

I have been working with a data set on causes of death in my adopted home state of Utah for a little while now, and I had been struggling with the best way to visualize it. This week, David Robinson released the gganimate package to create animated ggplot2 plots and I thought “AH HA! This is what I...

OpenCPU Server Release 1.5.4

February 4, 2016

Version 1.5.4 of the OpenCPU server has been released to Launchpad (Ubuntu) and OBS (Fedora). This update does not introduce any changes to the OpenCPU API itself; it improves the deb/rpm installation packages and upgrades the bundled opencpu system R package library. Installing and Updating Existing Ubuntu and Fedora servers...

Free video course: applied Bayesian A/B testing in R

February 4, 2016

As a “thank you” to our blog, mailing list, and Twitter followers (@WinVectorLLC) we at Win-Vector LLC have decided to re-release our formerly fee-based A/B testing video course as a free (advertisement-supported) video course here on YouTube. The course emphasizes how to design A/B tests using prior “guesstimates” of effect sizes (often you have …

Weekly R-Tips: Visualizing Predictions

February 4, 2016

Let's say that we estimated a linear regression model on time series data with lagged predictors. The goal is to estimate sales as a function of inventory, search volume, and media spend from two months ago. After using the lm function to perform linear regression, we predict sales using values from two months ago. If…
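
A minimal sketch of that setup, assuming simulated monthly data (the variable names `inventory`, `search`, `media` and the helper `lag_k()` are hypothetical, not taken from the post): each predictor is shifted by two months, `lm()` is fit on the lagged columns, and `predict()` forecasts the next two months from the last two observed months of drivers.

```r
set.seed(2016)
n <- 36                                   # three years of hypothetical monthly data

# Simulated drivers (illustrative names, not from the post)
inventory <- runif(n, 50, 100)
search    <- runif(n, 10, 40)
media     <- runif(n, 5, 20)

# Hypothetical helper: shift a series forward by k months, padding the start with NA
lag_k <- function(v, k = 2) c(rep(NA, k), head(v, -k))

# Sales in month t respond to the drivers observed in month t - 2
sales <- 5 + 0.5 * lag_k(inventory) + 1.2 * lag_k(search) +
  0.8 * lag_k(media) + rnorm(n)

df <- data.frame(sales,
                 inventory_l2 = lag_k(inventory),
                 search_l2    = lag_k(search),
                 media_l2     = lag_k(media))

# lm() drops the first two rows, whose lagged values are NA
fit <- lm(sales ~ inventory_l2 + search_l2 + media_l2, data = df)

# Drivers from months n-1 and n give forecasts for months n+1 and n+2
newdata <- data.frame(inventory_l2 = tail(inventory, 2),
                      search_l2    = tail(search, 2),
                      media_l2     = tail(media, 2))
forecast <- predict(fit, newdata = newdata)
forecast
```

The two-month lag is what makes this a genuine forecast: every input to `predict()` is already observed when the prediction is made.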

Predicting wine quality using Random Forests

February 4, 2016

Hello everyone! In this article I will show you how to run the random forest algorithm in R. We will use the wine quality data set (white) from the UCI Machine Learning Repository. What is the Random Forest Algorithm? In a previous post, I outlined how to build decision trees in R. While decision trees

Using Microsoft R Open with RStudio

February 4, 2016

by Joseph Rickert. A frequent question that we get here at Microsoft about MRO (Microsoft R Open) is: can it be used with RStudio? The short answer is absolutely yes! In fact, more than just being compatible, MRO is the perfect complement for the RStudio environment. MRO is a downstream distribution of open source R that supports multiple operating systems...

The R-Podcast Episode 17: A Simply Radiant Chat with Vincent Nijs

February 3, 2016

The R-Podcast continues its series on Shiny and the first-ever Shiny Developer Conference by catching up with Vincent Nijs, associate professor of marketing at UC San Diego and one of the earliest adopters of Shiny. Some of the topics we cover include...

optimal simulation on a convex set

February 3, 2016

This morning, we had a jam session at the maths department of Paris-Dauphine where a few researchers & colleagues of mine presented their field of research to the whole department. Very interesting despite or thanks to the variety of topics, with forays into the three-body problem(s), mean fields for Nash equilibrium (or…

Simple Distributions for Mixtures?

February 3, 2016

The idea of GLMs is that given some covariates $\mathbf{X}$, $Y\mid\mathbf{X}=\mathbf{x}$ has a distribution in the exponential family (Gaussian, Poisson, Gamma, etc.). But that does not mean that $Y$ has a similar distribution… so there is no reason to test for a Gamma model for $Y$ before running a Gamma regression, for instance. But are there cases where it might work? That the non-conditional distribution is...
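
To make the conditional/marginal distinction concrete, here is a small simulation sketch (our own choice of covariate and parameters, not the post's code). Given a binary covariate, the response is drawn from a Gamma distribution with a log-linear mean, so the Gamma regression is exactly right conditionally; yet the marginal distribution of the response is a 50/50 mixture of two Gamma laws, so checking the response alone for "Gamma-ness" says little about whether the regression is appropriate.

```r
set.seed(42)
n <- 5000

x <- rbinom(n, 1, 0.5)              # binary covariate, about half 0 and half 1
mu <- exp(1 + 2 * x)                # conditional mean on the log scale
shape <- 5

# Gamma with mean mu: mean = shape / rate, so rate = shape / mu
y <- rgamma(n, shape = shape, rate = shape / mu)

# The GLM recovers the conditional model (coefficients near 1 and 2)
fit <- glm(y ~ x, family = Gamma(link = "log"))
coef(fit)

# The marginal law of y mixes two Gamma laws with very different means,
# close to exp(1) and exp(3), which gives a strongly right-skewed, bimodal shape
tapply(y, x, mean)
```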

Mapping the world’s longest plane flights

February 3, 2016

If you're one of those people that dreads long plane flights, this map by Matt Strimas-Mackey will help you find routes to avoid. It shows Wikipedia's list of the top 30 scheduled commercial flights by distance (with code-share duplicates removed), represented as a map showing the routes colour-coded by the time spent in the air. Don't be distracted by...

When k-means Clustering Fails

February 2, 2016

This entry is part 19 of 19 in the series Using R. Letting the computer automatically find groupings in data is incredibly powerful and is at the heart of “data mining” and “machine learning”. One of the most widely used methods …
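
The excerpt does not say which failure cases the post covers. One well-known case, offered here as an assumption rather than as the post's own example, is non-convex clusters: k-means splits the data into convex cells around its centroids, so it cannot separate two concentric rings. A base-R sketch using `stats::kmeans()` on simulated rings:

```r
set.seed(123)
n <- 200

# Two concentric rings: radius 1 (inner) and radius 4 (outer), with a little noise
theta  <- runif(2 * n, 0, 2 * pi)
radius <- rep(c(1, 4), each = n) + rnorm(2 * n, sd = 0.1)
pts    <- cbind(x = radius * cos(theta), y = radius * sin(theta))
truth  <- rep(1:2, each = n)        # 1 = inner ring, 2 = outer ring

# Several random starts; kmeans() keeps the solution with the lowest within-cluster SS
km <- kmeans(pts, centers = 2, nstart = 20)

# Cluster labels are arbitrary, so score the better of the two label matchings
agreement <- max(mean(km$cluster == truth), mean(km$cluster != truth))
agreement
```

Agreement with the true ring labels stays close to chance because each fitted cluster is roughly a half-plane containing parts of both rings; density-based or spectral methods are the usual alternatives for shapes like this.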

Commonmark: Super Fast Markdown Rendering in R

February 2, 2016

A few months ago I first announced the commonmark R package. Since then there have been a few more releases… time for an update! What is CommonMark? Markdown is used in many places these days; however, the original spec leaves some ambiguity, which makes it difficult to optimize and leads to inconsistencies...

Unemployment in Europe

February 2, 2016

For a couple of years I have made plots of unemployment and its change over the years. At first this was a big and complex piece of code. As things have progressed, the code has become pretty concise. There are just plenty of packages to do the heav...

memoise 1.0.0

February 2, 2016

We are pleased to announce that version 1.0.0 of the memoise package is now available on CRAN. Memoization stores the value of a function call and returns the cached result when the function is called again with the same arguments. The following function computes Fibonacci numbers and illustrates the usefulness of memoization. Because the function definition is…
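
The excerpt is cut off before its example. As a dependency-free sketch of the same idea (not the memoise package's implementation, which you would normally reach through `memoise::memoise()`), the version below caches results in an environment keyed by `n` and compares it with the naive recursion, whose number of calls grows exponentially with `n`.

```r
# Naive recursion, instrumented to count how many calls it makes
calls <- 0
slow_fib <- function(n) {
  calls <<- calls + 1
  if (n < 2) n else slow_fib(n - 1) + slow_fib(n - 2)
}

# Memoised version: results cached in an environment keyed by n
memo_fib <- local({
  cache <- new.env()
  function(n) {
    key <- as.character(n)
    hit <- cache[[key]]                 # NULL when n has not been computed yet
    if (!is.null(hit)) return(hit)
    val <- if (n < 2) n else memo_fib(n - 1) + memo_fib(n - 2)
    cache[[key]] <- val
    val
  }
})

slow_fib(25)
calls            # hundreds of thousands of calls for n = 25
memo_fib(25)     # same value; each n is computed only once
```

Because `memo_fib()` recurses through its own cached name, the sub-calls are cached too; wrapping an existing naive function under a different name would only cache the outermost call.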

tidyr 0.4.0

February 2, 2016

I’m pleased to announce tidyr 0.4.0. tidyr makes it easy to “tidy” your data, storing it in a consistent form so that it’s easy to manipulate, visualise and model. Tidy data has a simple convention: put variables in the columns and observations in the rows. You can learn more about it in the tidy data

httr 1.1.0 (and 1.0.0)

February 2, 2016

httr 1.1.0 is now available on CRAN. The httr package makes it easy to talk to web APIs from R. Learn more in the quick start vignette. Install the latest version with: install.packages("httr"). When writing this blog post I discovered that I forgot to announce httr 1.0.0. This was a major release marking the transition…

7 Ways to Perplex a Data Scientist

February 2, 2016

On the heels of a report showing the inefficacy of government-run cyber security, it’s imperative to understand the limitations of …

Devtools 1.10.0

February 2, 2016

Devtools 1.10.0 is now available on CRAN. Devtools makes package building so easy that a package can become your default way to organise code, data, documentation, and tests. You can learn more about creating your own package in R packages. Install devtools with: install.packages("devtools") This version is mostly a collection of bug fixes and minor

2015 in review and a preview of 2016

February 2, 2016
By

DataCamp’s mission is to build the best online platform for data science education with a focus on R and Python. In this post, we share our journey during 2015 and our plans for 2016 (hint: we’re hiring)*. 2015 IN REVIEW. 2015: The data**. Since we’re all obsessed with data, let’s start with some numbers: In 2015… the...
