Blog Archives

R for more powerful clustering

April 21, 2015

by Vidisha Vachharajani, Freelance Statistical Consultant
R showcases several useful clustering tools, but one that seems particularly powerful is the marriage of hierarchical clustering with a visual display of its results in a heatmap. The term “heatmap” is often confusing, leaving many to wonder: which is it? A “colorful visual representation of data in a matrix” or “a...
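
As a rough illustration of the pairing described above, the sketch below clusters the rows of a small matrix and returns the row order a clustered heatmap would display. It is plain Python, not the R code from the post. Euclidean distance, complete linkage and the toy data are assumptions chosen for brevity (they happen to match the defaults of R's dist() and hclust(), which base R's heatmap() uses to reorder rows).

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two equal-length rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(rows):
    """Naive agglomerative clustering with complete linkage.

    Returns the merges as (cluster_a, cluster_b, distance) tuples and a leaf
    order in which every cluster's members sit next to each other, which is
    the row order a clustered heatmap would use. Needs at least one row.
    """
    clusters = {i: [i] for i in range(len(rows))}  # cluster id -> member row indices
    merges = []
    next_id = len(rows)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # complete linkage: distance between the farthest pair of members
            d = max(euclidean(rows[i], rows[j]) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        # merging concatenates member lists, so the final list is a dendrogram leaf order
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    (order,) = clusters.values()
    return merges, order


rows = [[1.0, 2.0], [8.0, 9.0], [1.0, 2.5], [8.0, 10.0]]  # toy data
merges, order = complete_linkage(rows)
for i in order:  # crude text "heatmap": reordered rows, similar rows adjacent
    print(i, rows[i])
```

Rows 0 and 2 end up adjacent, as do rows 1 and 3; a real heatmap would then color each cell of the reordered matrix by its value. The all-pairs search is quadratic per merge, which is fine for a sketch but not for large matrices.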

R User Group Meetings this week in the Bay Area and around the world

April 16, 2015

by Joseph Rickert
Tracking R user group meetings is a good way to stay informed about what's happening in the R world. On Tuesday the Bay Area useR Group (BARUG) met at AdRoll in San Francisco. It was a mini-conference with six talks: Bryan Galvin, our host at AdRoll (many thanks for the pizza and beer), kicked off the...

RPowerLabs: Electric power system virtual laboratories online

April 14, 2015

by Ben Ubah, Founder, RPowerLabs
With no disrespect to R's peers, R is pioneering the creation of online virtual electric power system laboratories via RPowerLABS. RPowerLABS is a project whose vision is to deploy online a vast array of in-demand power system simulations for teaching and research using R. It started as an attempt to apply R to electric...

Where are the R users?

April 9, 2015

by Joseph Rickert
A recent post by David Smith included a map showing the locations of R user groups around the world. While it is exhilarating to see how R user groups span the globe, the map gives no idea of the size of the community at each location. The following plot, constructed from information on the...

Exploring San Francisco with choroplethrZip

April 7, 2015

by Ari Lamstein
Introduction
Today I will walk through an analysis of San Francisco zip code demographics using my new R package choroplethrZip. This package creates choropleth maps of US zip codes and connects to the US Census Bureau. A choropleth is a map that shows the boundaries of regions (such as zip codes) and colors those regions according to...

Coarse Grain Parallelism with foreach and rxExec

April 2, 2015

by Joseph Rickert
I have written several posts about the Parallel External Memory Algorithms (PEMAs) in Revolution Analytics’ RevoScaleR package, most recently about rxBTrees(), but I haven’t said much about rxExec(). rxExec() is not itself a PEMA, but it can be used to write parallel algorithms. Pre-built PEMAs such as rxBTrees(), rxLinMod(), etc., are inherently parallel algorithms designed...
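
The coarse-grain idea in the title (farm out whole, independent tasks to workers and combine their results once at the end) can be sketched outside R with Python's concurrent.futures. This is not rxExec() or foreach, only the same pattern; the Monte Carlo estimate of pi and the names count_hits and parallel_pi are hypothetical choices for illustration. A thread pool keeps the sketch portable; for CPU-bound work in CPython, ProcessPoolExecutor offers the same map() interface but needs the usual `if __name__ == "__main__":` guard.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, n_samples):
    """One coarse-grained task: a complete, independent Monte Carlo run."""
    rng = random.Random(seed)  # each task gets its own reproducible random stream
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            hits += 1
    return hits


def parallel_pi(n_tasks=4, n_samples=50_000, max_workers=4):
    """Farm out n_tasks whole runs, then combine their results once at the end."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in task order, so the answer is the same
        # however the workers happen to be scheduled
        hits = list(pool.map(count_hits, range(n_tasks), [n_samples] * n_tasks))
    return 4.0 * sum(hits) / (n_tasks * n_samples), hits


estimate, per_task_hits = parallel_pi()
print(f"pi is roughly {estimate:.4f} ({len(per_task_hits)} tasks)")
```

The tasks never communicate while running; the only coordination is the final sum, which is what makes this granularity cheap to parallelize.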

Targeted Learning R Packages for Causal Inference and Machine Learning

March 31, 2015

by Sherri Rose, Assistant Professor of Health Care Policy, Harvard Medical School
Targeted learning methods build machine-learning-based estimators of parameters defined as features of the probability distribution of the data, while also providing influence-curve-based or bootstrap-based confidence intervals. The theory offers a general template for creating targeted maximum likelihood estimators for a data structure, nonparametric or semiparametric statistical model,...

Review of "Hands-On Programming with R"

March 26, 2015

by Joseph Rickert
Well over a hundred books on R have been published within the last ten years. Most of these texts, with titles like “Introductory Statistics with R” or “Time Series with R”, offer the reader a way to jump right in and perform some concrete statistical analysis using R’s myriad built-in functions and extensive visualization features....

R Package ‘smbinning’: Optimal Binning for Scoring Modeling

March 24, 2015

by Herman Jopia
What is Binning?
Binning is the term used in scoring modeling for what is also known in machine learning as discretization: the process of transforming a continuous characteristic into a finite number of intervals (the bins), which allows for a better understanding of its distribution and its relationship with a binary variable. The bins generated by...
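
To make the definition above concrete, the sketch below turns a continuous characteristic into a few intervals and reports, per bin, the count and the rate of a binary outcome. It uses simple equal-frequency (quantile) cutpoints as a stand-in; this is not the optimal binning that smbinning performs, and the function names and toy data are hypothetical.

```python
from bisect import bisect_right


def quantile_cuts(values, n_bins):
    """Equal-frequency cutpoints (a simple stand-in, not smbinning's optimal binning)."""
    s = sorted(values)
    return [s[(len(s) * k) // n_bins] for k in range(1, n_bins)]


def bin_summary(values, target, cuts):
    """Count observations and the rate of target == 1 in each interval.

    Bin 0 is v < cuts[0], bin b is cuts[b-1] <= v < cuts[b], and the last bin
    is v >= cuts[-1]. Empty bins report a rate of 0.0.
    """
    n_bins = len(cuts) + 1
    counts = [0] * n_bins
    events = [0] * n_bins
    for v, y in zip(values, target):
        b = bisect_right(cuts, v)  # index of the interval that v falls in
        counts[b] += 1
        events[b] += y
    return [(c, e / c if c else 0.0) for c, e in zip(counts, events)]


values = list(range(1, 13))                    # continuous characteristic (toy data)
target = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]  # binary outcome (toy data)
cuts = quantile_cuts(values, 3)
for b, (n, rate) in enumerate(bin_summary(values, target, cuts)):
    print(f"bin {b}: n={n}, event rate={rate:.2f}")
```

With heavily tied values the quantile cutpoints can repeat and leave empty bins, which is one reason supervised binning methods choose cutpoints from the relationship with the target instead.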

A first look at rxBTrees

March 19, 2015

by Joseph Rickert
The gradient boosting machine, as developed by Friedman, Hastie, Tibshirani and others, has become an extremely successful algorithm for both classification and regression problems and is now an essential feature of any machine learning toolbox. R’s gbm() function (gbm package) is a particularly well-crafted implementation of the gradient boosting machine that served as...
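
The core loop of a gradient boosting machine can be sketched in a few lines for the simplest case: squared-error loss with one-split trees (stumps) on a single predictor. This is not gbm() or rxBTrees(), which add deeper trees, subsampling and other loss functions; the names fit_stump and gradient_boost and the toy data are hypothetical.

```python
def fit_stump(x, r):
    """Least-squares one-split tree: a threshold t, then the mean residual on each side."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate split halfway between neighboring x values
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best  # requires at least two distinct x values
    return lambda v: lm if v <= t else rm


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-error boosting: each new stump is fitted to the current residuals."""
    base = sum(y) / len(y)  # stage 0: a constant prediction
    stumps = []

    def predict(v):
        return base + learning_rate * sum(s(v) for s in stumps)

    for _ in range(n_rounds):
        # for squared-error loss the negative gradient is proportional to the residual
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))
    return predict


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
model = gradient_boost(x, y)
print(round(model(2), 3), round(model(7), 3))
```

The learning rate shrinks each stump's contribution, so the fit approaches the targets gradually over many rounds rather than in one step.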