# Quantile Regression with Random Forests

This article was first published on **Mad (Data) Scientist** and kindly contributed to R-bloggers.

In my December 22 blog post, I first introduced the classic parametric quantile regression (QR) concept. I then showed how one could use the **qeML** package to perform quantile regression nonparametrically, using the package’s **qeKNN** function for a k-Nearest Neighbors approach. A reader then asked whether this could be applied to random forests (RFs). The answer is yes, and that is the topic of the current post.

My goals in this post, as in the previous one, are to introduce the capabilities of **qeML** and to point out some general ML issues. The key example of the latter here is the fact that leaves in an RF tree are very similar to neighborhoods in k-NN: each is a set of training points near the point at which we wish to predict. This implies that in principle one should be able to do QR in an RF context, just as we did last time with k-NN.

However, as the saying goes, “Easier said than done.” What was key in the k-NN case last time was that the **qeKNN** function argument **smoothingFtn** gives the user access to the neighborhoods: it allows the user to specify a function that performs a user-requested operation in each neighborhood. For instance, **smoothingFtn** offers a local-linear option, and in the last post I showed how one could achieve QR via a user-written function.
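To make the neighborhood idea concrete, here is a minimal base-R sketch. It is not the **qeKNN** code, and it does not use the actual **smoothingFtn** interface; the function name **knnSmooth**, its arguments and the simulated height/weight data are all hypothetical, chosen only for illustration.

```r
# Hypothetical helper (not part of qeML): apply a user-supplied function
# to the Y values of the k training points nearest to x0.
knnSmooth <- function(x, y, x0, k, ftn = mean) {
   x <- as.matrix(x)
   # Euclidean distance from every training point to x0
   dists <- sqrt(rowSums(sweep(x, 2, x0)^2))
   nbhd <- order(dists)[seq_len(k)]   # indices of the k nearest neighbors
   ftn(y[nbhd])                       # the "smoothing" operation
}

# Simulated data, for illustration only
set.seed(1)
n <- 1000
ht <- rnorm(n, mean = 73, sd = 2.3)
wt <- 4.8 * ht - 150 + rnorm(n, sd = 17)

# Ordinary k-NN regression: mean weight in the neighborhood of height 72
mn <- knnSmooth(ht, wt, 72, k = 50)

# QR: replace the mean by a quantile of the same neighborhood
q20 <- knnSmooth(ht, wt, 72, k = 50, ftn = function(yNbhd) quantile(yNbhd, 0.2))
q80 <- knnSmooth(ht, wt, 72, k = 50, ftn = function(yNbhd) quantile(yNbhd, 0.8))
```

The only difference between mean regression and QR here is the function applied within the neighborhood, which is exactly what one would like to be able to do within RF leaves.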

The situation for RFs is not so simple. The problem is that typical RF software does not provide “hooks” directly analogous to **smoothingFtn**. Some implementations do provide some useful hooks that could play a role, such as **randomForest::getTree**, but putting them together for the desired result may not be easy, especially given ambiguities in the documentation.
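As a rough illustration of what "putting the hooks together" might look like, the sketch below uses the **nodes=TRUE** option of the **predict** method in the **randomForest** package, which reports the terminal node each point lands in for each tree. This is a crude, do-it-yourself approximation under my own assumptions (simulated data, simple pooling of leaf members across trees), not the method used by **grf** or **qeML**.

```r
library(randomForest)   # assumed to be installed

# Simulated data, for illustration only
set.seed(1)
n <- 500
x <- data.frame(x1 = runif(n), x2 = runif(n))
y <- 10 * x$x1 + rnorm(n)

rf <- randomForest(x, y, ntree = 100, nodesize = 20)

# Terminal-node IDs of the training points: one row per point, one column per tree
trainNodes <- attr(predict(rf, x, nodes = TRUE), "nodes")

# Terminal-node IDs of a new point, flattened to a vector of length ntree
newx <- data.frame(x1 = 0.5, x2 = 0.5)
newNodes <- as.vector(attr(predict(rf, newx, nodes = TRUE), "nodes"))

# For each tree, collect the training Y values sharing a leaf with newx,
# then pool them over all trees
pooled <- unlist(lapply(seq_len(rf$ntree), function(k)
   y[trainNodes[, k] == newNodes[k]]))

# Crude estimates of the 20th and 80th percentiles of Y at newx
qhat <- quantile(pooled, c(0.2, 0.8))
```

Even this small example requires some digging into the structure of the fitted object, which is the sense in which the RF case is less convenient than the k-NN one.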

Fortunately, the **grf** package includes a QR app. The **qeML** function **qeRFgrf** originally wrapped the “ordinary” and local linear options in **grf**, and I’ve now added QR in v.1.2.

The name ‘grf’ stands for “Generalized Random Forests,” with the main generalization being similar to **smoothingFtn**, i.e. allowing functions other than the mean to be applied to the data in the leaves. A second generalization is to tailor the node-splitting process to the type of smoothing done in the leaves.

In particular, **grf** includes the function **quantile_forest**, providing just what our reader inquired about. One specifies the quantiles of interest in an argument **quantiles**, and later calls the paired **predict** function to obtain the estimated quantiles of “Y” at requested values of the “X” variables.
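For readers who wish to call **grf** directly, a minimal sketch follows. The data are simulated and the variable names are my own. The form of the return value of **predict** has differed across **grf** versions (as far as I know, recent versions return a list with a **predictions** component, while older ones returned the matrix itself), so the code allows for either.

```r
library(grf)   # assumed to be installed

# Simulated data, for illustration only
set.seed(1)
n <- 1000
X <- matrix(runif(n), ncol = 1)
Y <- 10 * X[, 1] + rnorm(n)

# Grow the forest, stating the quantiles of interest
qf <- quantile_forest(X, Y, quantiles = c(0.2, 0.5, 0.8))

# Estimated quantiles of Y at three new X values
Xnew <- matrix(c(0.25, 0.5, 0.75), ncol = 1)
pred <- predict(qf, Xnew, quantiles = c(0.2, 0.5, 0.8))

# Assumption: pred is either a list with a 'predictions' matrix or the matrix itself
qhat <- if (is.list(pred)) pred$predictions else pred
```

Each row of **qhat** corresponds to a row of **Xnew**, and each column to one of the requested quantiles.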

The **qeML** package has an interface to **grf**, the function **qeRFgrf**. To access the QR option (**qeML** v.1.2), set the **qeRFgrf** argument **quantls** to a non-NULL value. Here is an example using the North American major league baseball players data (included in **qeML** with the permission of the UCLA Stat Dept.). We find the 20th, 40th, 60th and 80th percentiles of weight, for each height.

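```r
library(qeML)
data(mlb1)
# estimate the 20th, 40th, 60th and 80th percentiles of Weight given Height
z <- qeRFgrf(mlb1[,2:3],'Weight',quantls=c(0.2,0.4,0.6,0.8),holdout=NULL)
# estimated quantiles at each observed height, one column per quantile
w <- predict(z,mlb1[,2,drop=FALSE])
# one data frame per quantile curve, labeled by quantile level
df1 <- data.frame(x=mlb1[,2,drop=FALSE],y=w[,1],z='0.20')
df2 <- data.frame(x=mlb1[,2,drop=FALSE],y=w[,2],z='0.40')
df3 <- data.frame(x=mlb1[,2,drop=FALSE],y=w[,3],z='0.60')
df4 <- data.frame(x=mlb1[,2,drop=FALSE],y=w[,4],z='0.80')
dfall <- rbind(df1,df2,df3,df4)
# plot the four curves together
qeML:::qePlotCurves(dfall,xlab='ht',ylab='wt')
```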

The convenience function **qePlotCurves** is essentially the code I used in the previous post, now added to v.1.2.

I highly recommend the **grf** package. My attention was immediately drawn to it when it first came out, as I was pleased to see that I could now do analysis in RFs using non-mean smoothing, as I had been doing with **qeKNN**. It was written by some top researchers, who also developed the supporting theory.
