*This article was originally published on the **Revolutions** blog and contributed to R-bloggers.*

*by Derek McCrae Norton, Senior Sales Engineer*

In this second installment of *Extending RevoScaleR for Mining Big Data*, we look at how to use the building blocks provided by RevoScaleR to transform continuous variables into discrete ones.

### Motivation: Discretizing continuous variables on big data

Discretization converts continuous variables into discrete ones, which is sometimes useful in data mining models such as Naïve Bayes. There are two basic methods, equal width and equal frequency, as well as many more advanced methods such as Chi2, ChiMerge, and tree-based methods.

Both basic methods are easy to implement in RevoScaleR.

Equal Width – Divide the variable's range into k buckets of equal width. XDF files already store each variable's precomputed range, so most of the work is done before we start.
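The equal-width rule only needs the minimum and maximum, which is why the stored range makes it cheap. The following is a language-neutral sketch in Python, not RevoScaleR code: it assumes `low` and `high` have already been read (in RevoScaleR they would come from the XDF metadata, e.g. via `rxGetVarInfo`), and the function name `equal_width_breaks` is hypothetical.

```python
from typing import List


def equal_width_breaks(low: float, high: float, k: int) -> List[float]:
    """Return k + 1 evenly spaced break points covering [low, high].

    Hypothetical helper: low and high are assumed to be the variable's
    precomputed minimum and maximum (no pass over the data is needed).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    width = (high - low) / k
    breaks = [low + i * width for i in range(k)]
    # Append the exact maximum instead of low + k * width to avoid
    # floating-point drift leaving the largest value outside the last bucket.
    breaks.append(high)
    return breaks


print(equal_width_breaks(0, 10, 5))
```

Because no data is scanned, the cost is the same regardless of file size; the trade-off is that skewed variables can leave most observations in one or two buckets.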

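Equal Frequency – rxQuantile efficiently computes the quantiles of a variable; the k-quantiles serve as the bucket boundaries, so each bucket holds roughly the same number of observations.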
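To make the equal-frequency idea concrete, here is an in-memory Python sketch in which `statistics.quantiles` stands in for rxQuantile. It is an assumption-laden illustration: rxQuantile works on XDF data without loading it all into memory, so its results need not match this exact computation, and `equal_frequency_breaks` is a hypothetical name.

```python
import statistics
from typing import List, Sequence


def equal_frequency_breaks(values: Sequence[float], k: int) -> List[float]:
    """Return break points splitting values into (up to) k buckets with
    roughly equal counts. Requires at least two values.

    Hypothetical helper; statistics.quantiles stands in for rxQuantile.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # k - 1 interior cut points; "inclusive" treats min and max as the
    # 0th and 100th percentiles.
    inner = statistics.quantiles(values, n=k, method="inclusive")
    breaks = [min(values), *inner, max(values)]
    # Heavily tied data can produce repeated break points; R's cut() rejects
    # non-unique breaks, so drop duplicates (possibly leaving fewer buckets).
    return sorted(set(breaks))


print(equal_frequency_breaks(list(range(1, 11)), 4))
```

The de-duplication step matters in practice: a variable where most values are identical cannot be split into k equally populated buckets, and this sketch simply returns fewer boundaries in that case.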

Finally, bring it all together by calling cut inside an rxDataStep transform to create the new discretized variables.
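The key point of this step is that the breaks are computed once, up front, and then the same breaks are applied to every block of rows. The Python sketch below mimics that pattern under stated assumptions: `cut_index` imitates R's `cut(x, breaks, include.lowest = TRUE)` with right-closed intervals, returning 0-based bucket indices and `None` in place of `NA`, and `discretize_chunks` stands in for a transform that rxDataStep applies chunk by chunk (in RevoScaleR, the breaks could be passed into the transform, e.g. via rxDataStep's `transformObjects` argument). Both function names are hypothetical.

```python
from bisect import bisect_left
from typing import Iterable, Iterator, List, Optional


def cut_index(x: float, breaks: List[float]) -> Optional[int]:
    """Imitate R's cut(x, breaks, include.lowest = TRUE, right = TRUE):
    return the 0-based bucket index, or None (R's NA) outside the breaks."""
    if x < breaks[0] or x > breaks[-1]:
        return None
    if x == breaks[0]:
        return 0  # include.lowest: the minimum falls in the first bucket
    # For breaks[i] < x <= breaks[i + 1], bisect_left returns i + 1.
    return bisect_left(breaks, x) - 1


def discretize_chunks(
    chunks: Iterable[List[float]], breaks: List[float]
) -> Iterator[List[Optional[int]]]:
    """Apply the same precomputed breaks to each chunk in turn, the way a
    transform sees one block of rows at a time."""
    for chunk in chunks:
        yield [cut_index(x, breaks) for x in chunk]


breaks = [0, 2.5, 5, 7.5, 10]
chunks = [[0, 1, 2.5, 3], [5, 9.9, 10, 11]]
print(list(discretize_chunks(chunks, breaks)))
```

Keeping the breaks fixed across chunks is what guarantees that a given value lands in the same bucket no matter which block it appears in.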

*You can try this yourself with the rxDiscretize function, available on GitHub.*

Look for upcoming posts on other ways to extend RevoScaleR for Mining Big Data.