# qeML Example: Nonparametric Quantile Regression

[This article was first published on **Mad (Data) Scientist**, and kindly contributed to R-bloggers.]


In this post, I will first introduce the concept of quantile regression (QR), a powerful technique that is rarely taught in stat courses. I’ll give an example from the **quantreg** package, and then will show how **qeML** can be used to do model-free QR estimation. Along the way, I will also illustrate the use of *closures* in R.

Notation: We are predicting a scalar Y (including the case of dummy/one-hot variables) from a feature vector X.

In its simplest form, QR estimates the conditional median of Y given X, as opposed to the usual conditional mean, using a linear model. As we all know, the median is less affected by outliers than is the mean, so QR is giving us outlier robustness. As a bonus, we dispense with the homoskedasticity assumption, i.e. constant Var(Y|X).
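The robustness claim is easy to check with a tiny experiment. The following is only a sketch on made-up numbers (the weights are hypothetical, not taken from **mlb**): one gross outlier is appended to a small sample, and we compare how far the mean and the median move.

```r
# Hypothetical toy weights, plus the same sample with one gross outlier.
clean <- c(180, 185, 190, 195, 200)
dirty <- c(clean, 900)

meanShift   <- mean(dirty) - mean(clean)      # the mean jumps by over 100
medianShift <- median(dirty) - median(clean)  # the median moves by 2.5

c(meanShift = meanShift, medianShift = medianShift)
```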

But it’s more than that. We can model any conditional quantile, e.g. estimate the 80th percentile weight for each human height. Quantile analysis has a variety of applications.
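To make "the 80th percentile weight for each height" concrete, here is a minimal base-R sketch on hypothetical toy data (the data frame **toy** and its values are invented for illustration). With only two distinct heights we can simply compute the quantile within each group; QR methods exist precisely because real data rarely offer many observations at each exact X value.

```r
# Hypothetical toy data: heights in inches, weights in pounds.
toy <- data.frame(
  Height = rep(c(70, 74), each = 5),
  Weight = c(170, 175, 180, 185, 190,
             200, 205, 210, 215, 220)
)

# 80th percentile of Weight within each Height group
q80 <- tapply(toy$Weight, toy$Height, quantile, probs = 0.80)
q80
```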

One can conduct QR in R with the **quantreg** package, written by Prof. Roger Koenker, one of the major names in the QR field. Here is an example, using the **qeML** dataset **mlb**:

```
> data(mlb)
> library(quantreg)
> z <- rq(Weight ~ Height,data=mlb,tau=0.80)
> summary(z)

Call: rq(formula = Weight ~ Height, tau = 0.8, data = mlb)

tau: [1] 0.8

Coefficients:
            Value      Std. Error t value    Pr(>|t|)
(Intercept) -201.66667   17.58797  -11.46617    0.00000
Height         5.66667    0.23856   23.75376    0.00000
```

As you can see, the call form here is like that of the R linear model function **lm**, and we could have had multiple predictors, e.g. age in addition to height.

But what if we don’t believe a linear model is appropriate? Of course, as usual we may consider adding polynomial terms, and there is also a package **quantreg.nonpar**. But we can obtain model-free estimates easily using **qeKNN** in **qeML**.

Standard k-Nearest Neighbors estimation is simple. Say we wish to predict the weight of someone 70 inches tall and 28 years old. We find the k data points in our training data closest to the vector (70,28), and compute the mean weight among those k people. That mean is our predicted weight for the new person, who has known height and age but unknown weight.
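The kNN recipe can be written out by hand in a few lines of base R. This is only an illustrative sketch on hypothetical toy data, not the code **qeKNN** runs internally; in particular it uses raw Euclidean distance and skips any feature scaling.

```r
# Hypothetical toy training data (not the mlb data).
x <- cbind(Height = c(68, 70, 71, 72, 75, 69),
           Age    = c(25, 27, 30, 28, 35, 22))
y <- c(170, 185, 190, 195, 220, 175)   # weights

newx <- c(70, 28)   # the person whose weight we want to predict
k <- 3

# Euclidean distance from each training row to newx
dists <- sqrt(rowSums(sweep(x, 2, newx)^2))

nearIdxs <- order(dists)[1:k]    # row indices of the k closest people
pred <- mean(y[nearIdxs])        # kNN prediction: mean weight of the neighbors
pred
```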

But **qeKNN** offers the user more flexibility, via an argument **smoothingFtn**. Instead of computing mean Y among the neighbors, we can specify the median, or even specify that a small linear model be fit to the neighboring data. The latter may be useful if the new person to be predicted is either very short or very tall, as things tend to be biased near the edges of a dataset. If the new person is 77 inches tall, most or all people in our neighboring data will be shorter than this, thus lighter, so our prediction based on the mean will be biased downward.
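The edge effect can be seen in a small base-R sketch. The data here are hypothetical and deliberately noise-free (weight rises exactly 5 pounds per inch), so any gap between prediction and truth is pure bias. The local linear fit is done with a plain **lm** call on the neighbors, which is my own stand-in for illustration and not necessarily how **qeKNN** implements its linear smoothing option.

```r
# Hypothetical noise-free toy data: weight rises 5 lb per inch of height.
height <- 66:76
weight <- 5 * height - 170

newHeight <- 77     # taller than anyone in the training data
k <- 5
nearIdxs <- order(abs(height - newHeight))[1:k]   # the 5 tallest people

trueWt   <- 5 * newHeight - 170                   # what the rule implies
meanPred <- mean(weight[nearIdxs])                # local mean of the neighbors

# Local linear model fit to the neighbors only, then extrapolated
nbrs    <- data.frame(h = height[nearIdxs], w = weight[nearIdxs])
linPred <- unname(predict(lm(w ~ h, data = nbrs), data.frame(h = newHeight)))

c(truth = trueWt, localMean = meanPred, localLinear = linPred)
```

The local mean lands well below the truth because every neighbor is shorter than the new person, while the local linear fit follows the trend out to the edge.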

But we can also specify our own smoothingFtn, perfect for QR. We simply define a function that gives us the desired Y quantile among the neighbors.

The call form is

```r
smoothingFtn(nearIdxs,x,y,predpt)
```

Here **x** and **y** are our X and Y training data, **predpt** is the new X value at which we wish to predict (redundant in most cases), and **nearIdxs** are the indices in **x** and **y** of the nearest neighbors to **predpt**. Note that at the time kNN calls **smoothingFtn**, the indices have already been computed.

Our code is then

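```r
library(qeML)   # provides qeKNN and the mlb data
data(mlb)

# smoothing function: 80th percentile of Y among the nearest neighbors
sftn <- function(nearIdxs,x,y,predpt)
{
   nearYs <- y[nearIdxs]
   quantile(nearYs,0.80)
}

u <- mlb[c('Height','Age','Weight')]
set.seed(9999)  # qeML ftns do random holdout
z <- qeKNN(u,'Weight',smoothingFtn=sftn)
predict(z,c(70,28))  # prints 200
```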
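To see what such a smoothing function does in isolation, one can call it by hand. In this sketch the inputs are hypothetical stand-ins that I made up for what kNN would pass in; the exact shapes **qeKNN** supplies in practice may differ.

```r
# Same form as the smoothing function in the text, redefined here so that
# this snippet is self-contained.
sftn <- function(nearIdxs, x, y, predpt) {
  nearYs <- y[nearIdxs]
  quantile(nearYs, 0.80)
}

# Hypothetical stand-ins for the arguments kNN would supply
y <- c(150, 160, 170, 180, 190, 200, 210)    # training weights
x <- matrix(0, nrow = length(y), ncol = 2)   # training X; unused by sftn
nearIdxs <- c(2, 3, 4, 5, 6)                 # pretend these are the neighbors

res <- sftn(nearIdxs, x, y, predpt = c(70, 28))
res   # 80th percentile of 160, 170, 180, 190, 200
```

Note that **x** and **predpt** are accepted but ignored, which is why the text calls **predpt** redundant in most cases.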

It would be nice, though, to run this for a general quantile level **q**, rather than the special case 0.80. But we can’t do that directly, because the **smoothingFtn** argument to **qeKNN** must be a function object, and there is no provision for passing an extra argument to **smoothingFtn**. We can, however, accomplish what we want via R *closures*.

```r
# returns a smoothing function with the quantile level q built in
makeSmFtn <- function(q) function(nearIdxs,x,y,predpt) quantile(y[nearIdxs],q)
```

To understand this, one must first know more about the R reserved word **function**. Consider this simple example:

```r
f <- function(x) x^2
```

Here we are saying, “R, please create a function for me. Its formal argument will be named **x**, and it will compute and return the square of that quantity. After you create that function (an object, just like other R entities), assign it to **f**.” In other words, **function** creates functions. As I like to tell my students,

The function of the function named function is to create functions!
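A short sketch of what "a function is an object" means in practice: the object can be inspected, copied under another name, and handed to other functions like any other value. The name **g** below is just an illustrative choice.

```r
f <- function(x) x^2

is.function(f)          # TRUE: f is an ordinary R object of class "function"
g <- f                  # copy the object under a second name
g(3)                    # 9
squares <- sapply(1:3, f)   # pass the function object to another function
squares
```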

Now, going back to **makeSmFtn** above, it creates a function object (the inner function, which calls **quantile**), and returns that object, just as with **f** above, but the key point is that here the value of **q** will be “baked in” to that object.

So our call to **qeKNN** for general **q** would be

```r
z <- qeKNN(u,'Weight',smoothingFtn=makeSmFtn(q))
```
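The "baked in" behavior can be verified without **qeML** at all. The sketch below builds two smoothing functions from the same factory and calls them by hand on hypothetical toy weights; **NULL** is passed for the unused **x** and **predpt** arguments purely for illustration.

```r
# Factory in the same form as makeSmFtn in the text
makeSmFtn <- function(q) function(nearIdxs, x, y, predpt) quantile(y[nearIdxs], q)

sm50 <- makeSmFtn(0.50)   # closure with q = 0.50 baked in
sm80 <- makeSmFtn(0.80)   # closure with q = 0.80 baked in

y <- c(150, 160, 170, 180, 190, 200, 210)   # hypothetical weights
idxs <- 2:6                                 # pretend neighbor indices

r50 <- sm50(idxs, NULL, y, NULL)   # median of the neighbors' weights
r80 <- sm80(idxs, NULL, y, NULL)   # their 80th percentile
c(r50, r80)

environment(sm80)$q   # the value of q stored in the closure's environment
```

Each closure carries its own environment holding its own **q**, so the two functions give different answers on identical inputs. If such closures are created inside a loop, adding **force(q)** as the first statement of **makeSmFtn** guards against R's lazy evaluation picking up a later value of the loop variable.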
