Blog Archives

Working with the RStudio CRAN logs

June 25, 2015

by Joseph Rickert
The installr package has some really nice functions for working with the daily package download logs for the RStudio CRAN mirror, which RStudio graciously makes available at http://cran-logs.rstudio.com/. The following code uses the download_RStudio_CRAN_data() function to download a month's worth of .gz compressed daily log files into the test3 directory and then uses the function read_RStudio_CRAN_data() to...
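
The excerpt stops before the post's own code, so here is a minimal sketch of the workflow it describes rather than the author's script. The helper cran_log_urls() is hypothetical, the May 2015 date range is an arbitrary choice, and the URL layout (one <date>.csv.gz file per day under a year folder) is an assumption about how the mirror organizes its logs. The installr calls only run in an interactive session with the package installed.

```r
# Hypothetical helper (not part of installr): build the URLs of the daily
# log files for a date range.
# Assumption: files live at <base>/<year>/<YYYY-MM-DD>.csv.gz.
cran_log_urls <- function(start, end, base = "http://cran-logs.rstudio.com") {
  days <- seq(as.Date(start), as.Date(end), by = "day")
  sprintf("%s/%s/%s.csv.gz", base, format(days, "%Y"), format(days, "%Y-%m-%d"))
}

# One month of daily logs (the date range is an arbitrary example).
urls <- cran_log_urls("2015-05-01", "2015-05-31")
print(length(urls))
print(head(urls, 2))

# The download and read steps named in the post, skipped unless installr is
# available and the session is interactive (they need network access).
if (interactive() && requireNamespace("installr", quietly = TRUE)) {
  dir.create("test3", showWarnings = FALSE)
  installr::download_RStudio_CRAN_data(
    START = as.Date("2015-05-01"),
    END = as.Date("2015-05-31"),
    log_folder = "test3"
  )
  logs <- installr::read_RStudio_CRAN_data("test3")
}
```

Building the URLs separately makes it easy to check how many files a date range covers before downloading anything.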


DeployR Data I/O

June 22, 2015

by Sean Wells, Senior Software Engineer, Microsoft and David Russell
DeployR exists to solve a number of fundamental R analytics integration problems faced by application developers. For example, have you ever wondered how you might execute an R script from within a Web-based dashboard, an enterprise middleware solution, or a mobile application? DeployR makes it very simple. In fact,...


Fishing for packages in CRAN

June 18, 2015

by Joseph Rickert
It is incredibly challenging to keep up to date with R packages. As of today (6/16/15), there are 6,789 listed on CRAN. Of course, the CRAN Task Views are probably the best resource for finding what's out there. A tremendous amount of work goes into maintaining and curating these pages and we should all be grateful...


Pairwise-complete correlation considered dangerous

June 16, 2015

by B. W. Lewis
This note warns about potentially misleading results when using the use="pairwise.complete.obs" and related options in R’s cor and cov functions. Pitfalls are illustrated using a very simple pathological example, followed by a brief list of alternative ways to deal with missing data and some references about them.
Known unknowns
R includes excellent facilities for handling...
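
To make the warning concrete, here is a small sketch of one way pairwise-complete correlation can mislead. It is not the pathological example from the note itself, which the excerpt does not show. The three vectors are made up so that each pair of variables is observed on a different set of rows. The resulting "correlation matrix" says x and y are perfectly correlated, y and z are perfectly correlated, and yet x and z are perfectly anti-correlated, which no real joint distribution allows.

```r
# Made-up data: each pair of variables overlaps on a different block of rows.
x <- c(1, 2, 3, NA, NA, NA, 1, 2, 3)
y <- c(1, 2, 3, 1, 2, 3, NA, NA, NA)
z <- c(NA, NA, NA, 1, 2, 3, 3, 2, 1)
m <- cbind(x, y, z)

# Each entry is computed from the rows where both variables are present.
r <- cor(m, use = "pairwise.complete.obs")
print(round(r, 3))

# A valid correlation matrix has no negative eigenvalues; this one does.
ev <- eigen(r, symmetric = TRUE, only.values = TRUE)$values
print(round(ev, 3))
print(min(ev) < 0)
```

Because the matrix is not positive semidefinite, anything downstream that assumes a proper correlation matrix (for instance a Cholesky factorization) can fail or give meaningless results.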


R User Groups are Everywhere

June 11, 2015

by Joseph Rickert
In a little over three weeks, useR! 2015 will convene in Aalborg, Denmark, and I am looking forward to being there and learning and talking about R user groups. The following map shows the big picture for R User Groups around the world. However, it is very difficult to keep it up to date. Just after...


Some Impressions from R Finance 2015

June 4, 2015

by Joseph Rickert
The R/Finance 2015 Conference wrapped up last Saturday at UIC. It has been seven years already, but R/Finance still has the magic: mostly very high-quality presentations and the opportunity to interact and talk shop with some of the most accomplished R developers, financial modelers and even a few industry legends such as Emanuel Derman...


Using Azure as an R datasource: Part 2 – Pulling data from MySQL/MariaDB

June 2, 2015

by Gregory Vandenbrouck, Software Engineer, Microsoft
This post is the second in a series that covers pulling data from various Windows Azure hosted storage solutions (such as MySQL or Microsoft SQL Server) to an R client on Windows or Linux. Last time we covered pulling data from SQL Azure to an R client on Windows. This time we’ll be...


RevoScaleR’s Naive Bayes Classifier rxNaiveBayes()

May 28, 2015

by Joseph Rickert
Because of its simplicity and good performance over a wide spectrum of classification problems, the Naïve Bayes classifier ought to be on everyone's short list of machine learning algorithms. Now, with version 7.4, we have a high-performance Naïve Bayes classifier in Revolution R Enterprise too. Like all Parallel External Memory Algorithms (PEMAs) in the RevoScaleR...


Situational Baseball: Analyzing Runs Potential Statistics

May 26, 2015

by Mark Malter
A few weeks ago, I wrote about my Baseball Stats R shiny application, where I demonstrated how to calculate runs expectancies based on the 24 possible bases/outs states for any plate appearance. In this article, I’ll explain how I expanded on that to calculate the probability of winning the game, based on the current score/inning/bases/outs state....
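
For readers wondering where the number 24 comes from, here is a small sketch that enumerates the bases/outs states. It is not taken from the author's Shiny application; the column names and the label format are hypothetical choices made for illustration. Each of the three bases is either empty or occupied (8 combinations), and there can be 0, 1, or 2 outs.

```r
# Enumerate every combination of base occupancy and number of outs.
# Column names are illustrative assumptions, not the author's.
states <- expand.grid(
  first = c(FALSE, TRUE),
  second = c(FALSE, TRUE),
  third = c(FALSE, TRUE),
  outs = 0:2
)

# Compact label such as "101/2": runners on first and third, two outs.
states$label <- paste0(
  as.integer(states$first),
  as.integer(states$second),
  as.integer(states$third),
  "/", states$outs
)

print(nrow(states))
print(head(states$label))
```

A runs expectancy table then attaches one number to each of these rows: the average runs scored from that state to the end of the inning.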


First Day Highlights from the Extremely Large Databases Conference

May 21, 2015

by Joseph Rickert
The 8th XLDB (Extremely Large Databases) Conference opened at Stanford on Tuesday with an outstanding program. This conference has been providing leadership in the "Big Data" world since its first workshop, which was held in 2007. For example, the summary report for that year notes: "Both communities (industry and science) are moving towards parallel ... architectures...
