Blog Archives

R Package ‘smbinning’: Optimal Binning for Scoring Modeling

March 24, 2015

by Herman Jopia

What is Binning? Binning is the term used in scoring modeling for what machine learning calls discretization: the process of transforming a continuous characteristic into a finite number of intervals (the bins), which allows for a better understanding of its distribution and its relationship with a binary variable. The bins generated by...

A first look at rxBTrees

March 19, 2015

by Joseph Rickert

The gradient boosting machine, as developed by Friedman, Hastie, Tibshirani and others, has become an extremely successful algorithm for both classification and regression problems and is now an essential feature of any machine learning toolbox. R’s gbm() function (gbm package) is a particularly well-crafted implementation of the gradient boosting machine that served as...

Some thoughts on Vim

March 17, 2015

by Gary R. Moser, Director of Institutional Research and Planning, The California Maritime Academy

I recently contacted Joseph Rickert about inviting Vim guru Drew Neil (web: vimcasts.org, book: "Practical Vim: Edit Text at the Speed of Thought") to speak at the Bay Area R User Group. Because Drew lives in Great Britain, that might not be easily...

A Monte Carlo Simulation for Pi Day

March 12, 2015

by Joseph Rickert

What will you be doing at 26 minutes and 53 seconds past 9 this coming Saturday morning? I will probably be running simulations. I have become obsessed with an astounding result from number theory and have been trying to devise Monte Carlo simulations to get at it. The result, well known to number theorists, says: choose...

R User Group Activity

March 5, 2015

by Joseph Rickert

R user group activity is still on the rise. The following plot of the number of R user group meetings listed on Revolution Analytics' Community Calendar over the most recent 114 weeks shows a slight upward trend along with a couple of annual cycles. Predictably, meetings trail off in the summer months and again late...

Plotly Graphs with Domino’s New R Notebook

March 3, 2015

by Matt Sundquist, co-founder of Plotly

Domino's new R Notebook and Plotly's R API let you code, make interactive R and ggplot2 graphs, and collaborate entirely online. Here is the Notebook in action: Published R Notebook. To execute this Notebook, or to build your own, head to Domino's Plotly Project. The GIF below shows how to get started: choose...

Collaborative Computing with distcomp

February 26, 2015

by Joseph Rickert

Distcomp, a new R package available on GitHub from a group of Stanford researchers, has the potential to significantly advance the practice of collaborative computing with large data sets distributed over separate sites that may be unwilling to explicitly share data. The fundamental idea is to be able to rapidly set up a web service based...

Some R Conferences in 2015

February 19, 2015

by Joseph Rickert

For the past few years, the Strata + Hadoop World Conference in San Jose has kicked off my personal conference season. With its focus on Data Science, Strata always seems to present some interesting R-related talks, and I am looking forward to the various events over the next couple of days. But Strata and other...

The HP Workshop on Distributed Computing in R

February 12, 2015

by Joseph Rickert

In the last week of January, HP Labs in Palo Alto hosted a workshop on distributed computing in R that was organized by Indrajit Roy (Principal Researcher, HP) and Michael Lawrence (Genentech and R-core member). The goal was to bring together a small group of R developers with significant experience in parallel and distributed computing to...

rcrunchbase – An API Interface to CrunchBase

February 10, 2015

by James Peruvankal, Sr. Program Manager, Revolution Analytics

Information about technology business ecosystems is valuable to established companies and startups alike. Fortunately, CrunchBase, the world’s most comprehensive dataset of startup activity, captures quite a bit of such information. Founded in 2007 by Mike Arrington, CrunchBase began as a simple crowd-sourced database to track startups covered on...
