Robust Estimators of Location and Scale

January 20, 2013
By

(This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers)

First, the median_Rcpp function is defined to compute the median of the given input vector. It is assumed that the input vector is unsorted, so a copy of the input vector is made using clone and then std::nth_element is used to access the nth sorted element. Since we only care about accessing one sorted element of the vector for an odd length vector and two sorted elements for an even length vector, it is faster to use std::nth_element than either std::sort or std::partial_sort.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double median_rcpp(NumericVector x) {
   NumericVector y = clone(x);
   int n, half;
   double y1, y2;
   n = y.size();
   half = n / 2;
   if(n % 2 == 1) {
      // median for odd length vector
      std::nth_element(y.begin(), y.begin()+half, y.end());
      return y[half];
   } else {
      // median for even length vector
      std::nth_element(y.begin(), y.begin()+half, y.end());
      y1 = y[half];
      std::nth_element(y.begin(), y.begin()+half-1, y.begin()+half);
      y2 = y[half-1];
      return (y1 + y2) / 2.0;
   }
}
library(rbenchmark)
set.seed(123)
z <- rnorm(1000000)

median_rcpp(1:10)
[1] 5.5
median_rcpp(1:9)
[1] 5
# benchmark median_rcpp and median
benchmark(median_rcpp(z), median(z), order="relative")[,1:4]
            test replications elapsed relative
1 median_rcpp(z)          100   1.747    1.000
2      median(z)          100   5.991    3.429

Next, the mad_rcpp function is defined to compute the median absolute deviation. This is a simple one-liner thanks to the sugar function abs, the vectorized operators, and the median_rcpp function defined above. Note that a default value is specified for the scale_factor argument.

// [[Rcpp::export]]
double mad_rcpp(NumericVector x, double scale_factor = 1.4826) {
   // scale_factor = 1.4826; default for normal distribution consistent with R
   return median_rcpp(abs(x - median_rcpp(x))) * scale_factor;
}
# benchmark mad_rcpp and mad
benchmark(mad_rcpp(z), mad(z), order="relative")[,1:4]
         test replications elapsed relative
1 mad_rcpp(z)          100   3.678    1.000
2      mad(z)          100  13.443    3.655

To leave a comment for the author, please follow the link and comment on his blog: Rcpp Gallery.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.