A robust Hotelling test…

July 12, 2010

(This article was first published on Stats raving mad » R, and kindly contributed to R-bloggers)

Recently I needed to test a mean vector. I wrote a few lines of R code and it worked perfectly. The Hotelling test is one of the least interesting tests to me; I never really figured out why…

At the time I had a chance to look into it further. One of the most common things to look for in a test is a robust version of it (at least that's what I look for!). A little digging, down on the third page of Google results, leads to the following:

One-sample and two-sample robust Hotelling tests with fast and robust bootstrap

The classical Hotelling test for testing if the mean equals a certain value or if two means are equal is modified into a robust one through substitution of the empirical estimates by the MM-estimates of location and scatter. The MM-estimator, using Tukey’s biweight function, is tuned by default to have a breakdown point of 50% and 95% location efficiency. This could be changed through the control argument if desired.

Robust Hotelling T2 test

Performs one and two sample Hotelling T2 tests as well as robust one-sample Hotelling T2 test.
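As a point of reference for what the robust variants modify, here is a minimal base R sketch of the classical one-sample statistic, T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0), with its usual F scaling on (p, n - p) degrees of freedom. The helper name `hotelling_one_sample` and the simulated data are assumptions made for illustration; they are not part of rrcov or FRB, and the sketch makes no claim to match how either package reports its statistic.

```r
# Classical one-sample Hotelling T^2 test in base R (illustrative sketch).
# `hotelling_one_sample` is a hypothetical helper, not an rrcov/FRB function.
hotelling_one_sample <- function(x, mu0 = rep(0, ncol(x))) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  d <- colMeans(x) - mu0                  # deviation of the sample mean from mu0
  S <- cov(x)                             # classical sample covariance matrix
  t2 <- n * drop(t(d) %*% solve(S, d))    # T^2 = n * d' S^{-1} d
  f <- (n - p) / (p * (n - 1)) * t2       # F-scaled version of T^2
  p_value <- pf(f, df1 = p, df2 = n - p, lower.tail = FALSE)
  list(T2 = t2, F = f, df1 = p, df2 = n - p, p.value = p_value)
}

# Simulated data, so the example does not depend on any package
set.seed(1)
x <- cbind(a = rnorm(25, mean = 1), b = rnorm(25, mean = 2))
res <- hotelling_one_sample(x)
print(res)
```

With a single column the statistic reduces to the squared one-sample t statistic, which gives a quick sanity check against `t.test()`.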

The first (FRBhotellingMM in the FRB package) uses MM- and S-estimators, while the latter (T2.test in the rrcov package) uses a Minimum Covariance Determinant (MCD) estimator. You can find more on these in the references at the end of the post. What might be crucial to you is that the MM/S estimators are more time-consuming than MCD. A little demonstration follows.

library(rrcov)
data(delivery)
delivery.x <- delivery[,1:2]
T2.test(delivery.x)
# 
#     One-sample Hotelling test
# 
# data:  delivery.x 
# T^2 = 21.0494, df1 = 2, df2 = 23, p-value = 6.365e-06
# alternative hypothesis: true mean vector is not equal to (0, 0)' 
#  
# sample estimates:
#               n.prod distance
# mean x-vector   8.76   409.28
t0<-Sys.time()
T2.test(delivery.x, method="mcd")
# 
#     One-sample Hotelling test (Reweighted MCD Location)
# 
# data:  delivery.x 
# T^2 = 37.701, df1 = 2.000, df2 = 9.146, p-value = 3.829e-05
# alternative hypothesis: true mean vector is not equal to (0, 0)' 
#  
# sample estimates:
#                n.prod distance
# MCD x-vector 6.190476 309.7143
Sys.time()-t0
# Time difference of 0.04200006 secs
library(FRB)
t0<-Sys.time()
FRBhotellingMM(delivery.x)
# One sample Hotelling test based on multivariate MM-estimates
# (bdp = 0.5, eff = 0.95) 
# data:  delivery.x 
# T^2_R =  84.59 
# p-value =  0.0022 
# Alternative hypothesis : true mean vector is not equal to ( 0 0 ) 
Sys.time()-t0
# Time difference of 4.859 secs
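
To give a feel for where bootstrap time goes, here is a rough base R sketch of a plain nonparametric bootstrap for the one-sample T^2 statistic. This is deliberately not the fast and robust bootstrap that FRB implements: it uses the classical mean and covariance and recomputes them on every resample. The helper names `t2_stat` and `boot_t2_test`, the number of resamples, and the simulated data are all assumptions for illustration.

```r
# Plain nonparametric bootstrap for the one-sample T^2 statistic (sketch).
# NOTE: this is NOT FRB's fast and robust bootstrap; it uses classical
# estimates. `t2_stat` and `boot_t2_test` are hypothetical helper names.
t2_stat <- function(x, mu0) {
  d <- colMeans(x) - mu0
  nrow(x) * drop(t(d) %*% solve(cov(x), d))
}

boot_t2_test <- function(x, mu0 = rep(0, ncol(x)), R = 999) {
  x <- as.matrix(x)
  n <- nrow(x)
  t2_obs <- t2_stat(x, mu0)
  # Shift the data so that the null hypothesis holds when resampling
  x0 <- sweep(x, 2, colMeans(x) - mu0)
  t2_boot <- replicate(R, {
    idx <- sample.int(n, n, replace = TRUE)
    t2_stat(x0[idx, , drop = FALSE], mu0)
  })
  # Bootstrap p-value with the usual +1 correction
  p_value <- (1 + sum(t2_boot >= t2_obs)) / (R + 1)
  list(T2 = t2_obs, p.value = p_value)
}

set.seed(42)
x <- cbind(rnorm(30, mean = 0.8), rnorm(30, mean = -0.5))
out <- boot_t2_test(x, R = 499)
print(out)
```

Each resample here costs one mean and one covariance computation; a robust estimator in their place would make every resample far more expensive, which is the cost a fast bootstrap scheme is meant to avoid.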

Time-consuming as it may be, I would stick with the bootstrap method. What would you do?

Read more

Roelant, E., Van Aelst, S., and Willems, G. (2008), “Fast Bootstrap for Robust Hotelling Tests,” COMPSTAT 2008: Proceedings in Computational Statistics (P. Brito, Ed.), Heidelberg: Physica-Verlag, to appear.

Willems, G., Pison, G., Rousseeuw, P., and Van Aelst, S. (2002), “A Robust Hotelling Test,” Metrika, 55, 125–138.


