Articles by Joseph Rickert

Learning from Learning Curves

March 29, 2016 | Joseph Rickert

by Bob Horton, Senior Data Scientist, Microsoft This is a follow-up to my earlier post on learning curves. A learning curve is a plot of predictive error for training and validation sets over a range of training set sizes. Here we’re using simulated data to explore some fundamental relationships ... [Read more...]
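
As a rough illustration of the definition in this excerpt (not the code from the original post), the R sketch below fits a linear model to training sets of increasing size and records the training and validation error for each. The simulated data, the list of sizes, and the `rmse()` helper are all assumptions made up for this example.

```r
# Hypothetical simulated data: a linear signal plus Gaussian noise.
set.seed(1)
n <- 2000
x <- runif(n)
y <- 2 * x + rnorm(n, sd = 0.5)
dat <- data.frame(x = x, y = y)

# Hold out the last 500 rows as a fixed validation set.
valid <- dat[1501:2000, ]

# Illustrative helper: root mean squared error.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Training set sizes to try (arbitrary choices for this sketch).
sizes <- c(10, 30, 100, 300, 1000, 1500)

# For each size, fit on the first m rows and score both sets.
curve <- do.call(rbind, lapply(sizes, function(m) {
  train <- dat[seq_len(m), ]
  fit <- lm(y ~ x, data = train)
  data.frame(size = m,
             train_rmse = rmse(train$y, predict(fit, train)),
             valid_rmse = rmse(valid$y, predict(fit, valid)))
}))
print(curve)

# The learning curve itself: error against training set size (log scale).
plot(curve$size, curve$valid_rmse, type = "b", log = "x",
     ylim = range(c(curve$train_rmse, curve$valid_rmse)),
     xlab = "training set size", ylab = "RMSE")
lines(curve$size, curve$train_rmse, type = "b", lty = 2)
legend("topright", legend = c("validation", "training"), lty = c(1, 2))
```

With data like this, the two error curves would typically be expected to converge toward the noise level as the training set grows.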

Get ready for the New York R Conference

March 24, 2016 | Joseph Rickert

by Joseph Rickert Last year, I wrote that the New York R Conference “set the bar pretty darn high for a first time conference”. Not only was there an outstanding lineup of speakers, but the energy and enthusiasm that conference attendees brought with them, or maybe just generated on the spot, ... [Read more...]

Scoring R Models with Excel

March 17, 2016 | Joseph Rickert

by Joseph Rickert In a post late last year, my colleague and fellow blogger, Andrie de Vries, described enhancements to the AzureML R package that make it easy to publish R functions that consume data frames as Azure Web Services. A very nice consequence is that it is now feasible ... [Read more...]

Computing Classification Evaluation Metrics in R

March 11, 2016 | Joseph Rickert

by Said Bleik, Shaheen Gauher, Data Scientists at Microsoft Evaluation metrics are the key to understanding how your classification model performs when applied to a test dataset. In what follows, we present a tutorial on how to compute common metrics that are often used in evaluation, in addition to metrics ... [Read more...]

Bay Area R User Group at Strata and PAW

March 10, 2016 | Joseph Rickert

by Joseph Rickert I always think of Strata Hadoop World and Predictive Analytics World as initiating the Spring conference season here in the San Francisco Bay Area. The rainy season is usually over by the end of March and it is a perfect time to visit. If you are traveling ... [Read more...]

Confidence Intervals for Random Forests

March 3, 2016 | Joseph Rickert

by Joseph Rickert Random Forests, the "go to" classifier for many data scientists, is a fairly complex algorithm with many moving parts that introduces randomness at different levels. Understanding exactly how the algorithm operates requires some work, and assessing how well a Random Forests model fits the data is a ... [Read more...]

Analysing the movements of a cat

March 1, 2016 | Joseph Rickert

by Verena Haunschmid Since I have a cat tracker, I wanted to do some analysis of the behavior of my cats. I have shown how to do some of these things here. Data Collection The data was collected using the Tractive GPS Pet Tracker over a period of about one ... [Read more...]

Generating and Visualizing Multivariate Data with R

February 25, 2016 | Joseph Rickert

By Joseph Rickert The ability to generate synthetic data with a specified correlation structure is essential to modeling work. As you might expect, R’s toolbox of packages and functions for generating and visualizing data from multivariate distributions is impressive. The basic function for generating multivariate normal data is mvrnorm() ... [Read more...]
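
As a small sketch of the function this excerpt names, the R snippet below draws correlated bivariate normal data with `mvrnorm()` from the MASS package. The mean vector, the covariance matrix, and the sample size are made-up values for illustration and are not taken from the original post.

```r
library(MASS)  # provides mvrnorm()

set.seed(42)

# Assumed target structure: two standard normal variables with correlation 0.8.
mu <- c(0, 0)
Sigma <- matrix(c(1.0, 0.8,
                  0.8, 1.0), nrow = 2, byrow = TRUE)

# Draw 5000 observations; each row of x is one bivariate sample.
x <- mvrnorm(n = 5000, mu = mu, Sigma = Sigma)

# The sample correlation should land close to the specified 0.8.
sample_cor <- cor(x)[1, 2]
print(round(sample_cor, 2))
```

Checking the sample correlation against the value placed in `Sigma` is a quick sanity test that the generated data carry the intended correlation structure.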

Bay Area useR Group Lightning Talks

February 18, 2016 | Joseph Rickert

by Joseph Rickert Earlier this month the Bay Area useR Group (BARUG) held its annual lightning talk meeting. This is by far our most popular meeting format: eight 15-minute talks (12 minutes speaking and 3 minutes Q & A while the next speaker is setting up) packed into a two-hour time slot. ... [Read more...]

More R User Group Sites

February 16, 2016 | Joseph Rickert

by Joseph Rickert Last month I wrote about how several R user groups were making use of GitHub and listed some sites that I thought had interesting material. A few readers were kind enough to point out sites that I had missed; so I would just like to give a ... [Read more...]

R User Groups in Poland

February 9, 2016 | Joseph Rickert

by Przemyslaw Biecek The first meeting of R users in Poland took place in Wroclaw in 2008. It was a one-day conference with 27 participants and 6 talks. Today, we have three large groups of R users in major Polish cities (according to meetup.com there are 640 users in SER - Warsaw, 235 in ... [Read more...]

Using Microsoft R Open with RStudio

February 4, 2016 | Joseph Rickert

by Joseph Rickert A frequent question that we get here at Microsoft about MRO (Microsoft R Open) is: can it be used with RStudio? The short answer is absolutely yes! In fact, more than just being compatible, MRO is the perfect complement for the RStudio environment. MRO is a downstream distribution ... [Read more...]

R User Groups on GitHub

January 28, 2016 | Joseph Rickert

by Joseph Rickert Quite a few times over the past few years I have highlighted presentations posted by R user groups on their websites and recommended these sites as a source for interesting material, but I have never thought to see what the user groups were doing on GitHub. As ... [Read more...]

Pipelining R and Python in Notebooks

January 26, 2016 | Joseph Rickert

by Micheleen Harris Microsoft Data Scientist As a Data Scientist, I refuse to choose between R and Python, the top contenders currently fighting for the title of top Data Science programming language. I am not going to argue about which is better or pit Python and R against each other. ... [Read more...]

Getting Started with Markov Chains: Part 2

January 22, 2016 | Joseph Rickert

by Joseph Rickert In a previous post, I showed how some elementary properties of discrete time Markov Chains could be calculated, mostly with functions from the markovchain package. In this post, I would like to show a little bit more of the functionality available in that package by fitting a Markov ... [Read more...]

A gentle introduction to parallel computing in R

January 19, 2016 | Joseph Rickert

by John Mount Ph.D. Data Scientist at Win-Vector LLC Let's talk about the use and benefits of parallel computation in R. IBM's Blue Gene/P massively parallel supercomputer (Wikipedia). "Parallel computing is a type of computation in which many calculations are carried out simultaneously." Wikipedia quoting: Gottlieb, Allan; Almasi, ... [Read more...]

New Data Sources for R

January 14, 2016 | Joseph Rickert

by Joseph Rickert Over the past few months, a number of new CRAN packages have appeared that make it easier for R users to gain access to curated data. Most of these provide interfaces to a RESTful API written by the data publishers while a few just wrap the data ... [Read more...]