# Quantile LOESS – Combining a moving quantile window with LOESS (R function)

April 1, 2010

(This article was first published on R-statistics blog » R, and kindly contributed to R-bloggers)

In this post I will provide R code that implements the combination of a repeated running quantile with the LOESS smoother to create a type of “quantile LOESS” (i.e., “Local Quantile Regression”).

This method is useful when you need to fit a smoothed line for a quantile that is robust and resistant (this still needs to be verified). An example of such a case is provided at the end of this post.

If you wish to use the function in your own code, simply run the following line in your R console:

### Background

I came across this idea in an article titled “High throughput data analysis in behavioral genetics” by Anat Sakov, Ilan Golani, Dina Lipkind and my advisor Yoav Benjamini. From the abstract:

> In recent years, a growing need has arisen in different fields, for the development of computational systems for automated analysis of large amounts of data (high-throughput). Dealing with non-standard noise structure and outliers, that could have been detected and corrected in manual analysis, must now be built into the system with the aid of robust methods. [...] we use a non-standard mix of robust and resistant methods: LOWESS and repeated running median.

The motivation for this technique came from “Path data” (of mice) which is

> prone to suffer from noise and outliers. During progression a tracking system might lose track of the animal, inserting (occasionally very large) outliers into the data. During lingering, and even more so during arrests, outliers are rare, but the recording noise is large relative to the actual size of the movement. The statistical implications are that the two types of behavior require different degrees of smoothing and resistance. An additional complication is that the two interchange many times throughout a session. As a result, the statistical solution adopted needs not only to smooth the data, but also to recognize, adaptively, when there are arrests. To the best of our knowledge, no single existing smoothing technique has yet been able to fulfill this dual task. We elaborate on the sources of noise, and propose a mix of LOWESS (Cleveland, 1977) and the repeated running median (RRM; Tukey, 1977) to cope with these challenges
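
To get a feel for what such a mix does, here is a rough base-R sketch. It is not the exact procedure from the paper: the path is simulated, the variable names are made up for illustration, and I am assuming that `smooth(kind = "3R")` (running medians of 3, repeated until nothing changes) is an acceptable stand-in for the repeated running median.

```r
set.seed(7)

# Simulated "path": a slowly wandering position (an assumption, not real data)
time_idx <- 1:200
clean <- cumsum(rnorm(200, sd = 0.3))

# Insert three isolated, very large outliers, as if the tracker lost the animal
path <- clean
path[c(40, 120, 160)] <- path[c(40, 120, 160)] + 80

# Step 1: repeated running median of 3 (resistant to isolated spikes)
rrm <- as.numeric(smooth(path, kind = "3R"))

# Step 2: LOWESS on the median-smoothed series (f is the smoother span)
fit <- lowess(time_idx, rrm, f = 0.1)

plot(time_idx, path, col = "grey", main = "Repeated running median + LOWESS")
lines(fit, col = "red", lwd = 2)
```

Note that a running median of width 3 only removes isolated spikes; two or more consecutive outliers would need a wider window (for example with `runmed`).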

If all we wanted was to compute a moving average (running average) of the data in R, we could simply use the rollmean function from the zoo package.
But since we also wanted to allow quantile smoothing, we turned to the more general rollapply function from the same package.
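
As a small illustration of the difference between the two functions (the toy series and the window width of 5 are made up for this example), something along these lines should work:

```r
library(zoo)

# Made-up series with two spikes (the values 50 and 60)
y <- c(2, 4, 3, 50, 5, 6, 4, 7, 60, 8, 9, 7)
z <- zoo(y)

# Running mean over windows of 5 observations
run_mean <- rollmean(z, k = 5)

# Running quantiles over the same windows, via the more general rollapply()
run_med <- rollapply(z, width = 5,
                     FUN = function(w) quantile(w, probs = 0.5, names = FALSE))
run_q90 <- rollapply(z, width = 5,
                     FUN = function(w) quantile(w, probs = 0.9, names = FALSE))

# One row per complete window
print(data.frame(mean   = coredata(run_mean),
                 median = coredata(run_med),
                 q90    = coredata(run_q90)))
```

Without padding, both functions return one value per complete window, so a series of 12 points with a window of 5 gives 8 values. The running mean is pulled up by every spike inside its window, while the running median is not.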

### R function for performing Quantile LOESS

The R function implements the LOESS-smoothed repeated running quantile, with a simple option for using the average instead of a quantile.
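
What follows is only a minimal sketch of the idea, not the original function: the name `quantile_loess_sketch`, its arguments and their defaults are assumptions made up for illustration. It takes a running quantile of y over windows of consecutive points (ordered by x), uses the mean x of each window as its position, and then smooths the running quantiles with loess.

```r
library(zoo)

# Hypothetical helper (not the original function from this post)
quantile_loess_sketch <- function(x, y, width = 21, prob = 0.95, span = 0.75) {
  # Order the observations by x so that windows cover neighbouring points
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]

  # Position of each window: the mean of the x values it covers
  x_win <- rollapply(zoo(x), width = width, FUN = mean)

  # Running quantile of y over the same windows
  y_win <- rollapply(zoo(y), width = width,
                     FUN = function(w) quantile(w, probs = prob, names = FALSE))

  win <- data.frame(x = as.numeric(coredata(x_win)),
                    y = as.numeric(coredata(y_win)))

  # Smooth the running quantiles with loess (default family, i.e. not robust)
  fit <- loess(y ~ x, data = win, span = span)
  win$y_loess <- as.numeric(predict(fit, newdata = data.frame(x = win$x)))
  win
}

# Usage on simulated data whose noise grows with x
set.seed(1)
x <- seq(0, 10, length.out = 500)
y <- sin(x) + rnorm(500, sd = 0.2 + 0.1 * x)
res <- quantile_loess_sketch(x, y, width = 21, prob = 0.95)

plot(x, y, col = "grey")
lines(res$x, res$y_loess, col = "red", lwd = 2)
```

Passing `FUN = mean` for the y windows instead of the quantile would give the moving-average variant.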

### Update: I changed the article’s name from LOWESS to LOESS

After a considerate e-mail from Dirk Eddelbuettel, I changed LOWESS to LOESS throughout the article. Here is an explanation of why I originally used LOWESS and why I corrected it.

Dirk wrote to me:

> You have a post entitled ‘quantile lowess’ but you then (correctly) use loess. Do you understand that there are two functions lowess() and loess()? The former is sort-of a predecessor but nobody but really old books still talks about it. Google for (maybe) ‘Brian Ripley lowess loess’ as he drove that point home a few times on r-help.

I replied:

> Thanks Dirk, [...]
> Regarding loess != lowess, I noticed that this is indeed the case when I first wrote the post, but I was in a predicament: on the one hand, LOESS is the more modern approach (and what I used in the script). But on the other hand, LOWESS is what the original article’s authors were using. I ended up deciding I would call it the way I did, but after reading what you wrote, I realized I made a mistake.
> I went through the article and corrected the lowess to loess, while also adding a paragraph to explain my reasoning.

### Update: regarding the method being robust

After Nicholas’s comment I went checking and came across an R-help thread by Martin Maechler explaining how to update my code from above so that the fit will be robust. Martin wrote (my notes are added in []):
> One gotcha [when comparing lowess to loess is], particularly if you were used to the fact that lowess() by default is resistant to outliers {well, in many cases at least}:
>
> - lowess() per default has “iter = 3” which means it uses 3 “robustifying” (also called “huberizing” for Huber (1960)) iterations.
> - loess() on the other hand has an argument ‘family’ with possible values “gaussian” and “symmetric” (can be abbreviated) where the *first* one is the default (unfortunately, in my opinion).
>
> I.e., loess() by default is not resistant/robust whereas lowess() is. [...] I would however recommend using loess(….., family = “sym”) routinely.
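
A quick way to see the difference Martin describes is to fit both families to data with a single gross outlier. This is only a sketch on simulated data; the straight-line "truth", the noise level and the variable names are made up for illustration.

```r
set.seed(42)

# Simulated data: a straight line plus small noise
x <- seq(0, 10, length.out = 100)
y <- 2 + 0.5 * x + rnorm(100, sd = 0.2)

# One gross outlier in the middle of the range
y[50] <- y[50] + 30

# Default loess: family = "gaussian", plain least squares, not resistant
fit_gauss <- loess(y ~ x, family = "gaussian")

# Robust loess: family = "symmetric" re-weights points with large residuals
fit_sym <- loess(y ~ x, family = "symmetric")

# Compare both fits with the true line at the outlier's x position
truth <- 2 + 0.5 * x[50]
at_outlier <- data.frame(x = x[50])
err_gauss <- abs(as.numeric(predict(fit_gauss, newdata = at_outlier)) - truth)
err_sym   <- abs(as.numeric(predict(fit_sym,   newdata = at_outlier)) - truth)

print(c(gaussian = err_gauss, symmetric = err_sym))
```

The symmetric fit should stay much closer to the underlying line near the outlier, which is why the same `family = "symmetric"` argument is worth passing to the loess call used for the quantile smoothing.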
