A robust Hotelling test…

[This article was first published on Stats raving mad » R, and kindly contributed to R-bloggers]

Recently I needed to test a mean vector. I wrote a few lines of R code and the job was done. The Hotelling test is one of the least interesting tests to me; I never really figured out why…

This time I had some spare time to dig a little further. One of the first things I look for with any test is a robust version of it (at least that's what I search for!). A little searching, as far as the third page of Google results, led to the following:

One-sample and two-sample robust Hotelling tests with fast and robust bootstrap

The classical Hotelling test for testing if the mean equals a certain value or if two means are equal is modified into a robust one through substitution of the empirical estimates by the MM-estimates of location and scatter. The MM-estimator, using Tukey’s biweight function, is tuned by default to have a breakdown point of 50% and 95% location efficiency. This could be changed through the control argument if desired.
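To make "substitution of the empirical estimates" concrete, here is a minimal base-R sketch of the classical one-sample statistic. The sample mean `colMeans(x)` and sample covariance `cov(x)` are the empirical estimates that the robust versions swap for MM- or MCD-estimates. The helper name `hotelling_one_sample()` is my own and hypothetical; it is not a function from rrcov or FRB.

```r
# Classical one-sample Hotelling T^2 test written out by hand.
# hotelling_one_sample() is a hypothetical helper for illustration only.
hotelling_one_sample <- function(x, mu0 = rep(0, ncol(x))) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  xbar <- colMeans(x)  # empirical location estimate
  S <- cov(x)          # empirical scatter estimate
  d <- xbar - mu0
  # T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0)
  T2 <- n * drop(t(d) %*% solve(S, d))
  # Under multivariate normality and H0: (n - p) / (p (n - 1)) * T^2 ~ F(p, n - p)
  Fstat <- (n - p) / (p * (n - 1)) * T2
  p.value <- pf(Fstat, df1 = p, df2 = n - p, lower.tail = FALSE)
  list(T2 = T2, F = Fstat, df1 = p, df2 = n - p, p.value = p.value)
}
```

Run on `delivery.x` from the demonstration below (n = 25, p = 2), it should report df1 = 2 and df2 = 23, the same degrees of freedom that `T2.test` prints. With a single variable the statistic reduces to the square of the usual one-sample t statistic.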

Robust Hotelling T2 test

Performs one and two sample Hotelling T2 tests as well as robust one-sample Hotelling T2 test.
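Since `T2.test` also covers the classical two-sample case, here is a similar base-R sketch of that statistic, using the pooled covariance of the two groups. Again, `hotelling_two_sample()` is a hypothetical helper of mine, not part of rrcov or FRB.

```r
# Classical two-sample Hotelling T^2 test (equal covariances assumed).
# hotelling_two_sample() is a hypothetical helper for illustration only.
hotelling_two_sample <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  n1 <- nrow(x)
  n2 <- nrow(y)
  p <- ncol(x)
  d <- colMeans(x) - colMeans(y)
  # Pooled covariance: weighted average of the two sample covariances
  Sp <- ((n1 - 1) * cov(x) + (n2 - 1) * cov(y)) / (n1 + n2 - 2)
  T2 <- (n1 * n2 / (n1 + n2)) * drop(t(d) %*% solve(Sp, d))
  # Under normality and H0: (n1 + n2 - p - 1) / (p (n1 + n2 - 2)) * T^2 ~ F(p, n1 + n2 - p - 1)
  df2 <- n1 + n2 - p - 1
  Fstat <- df2 / (p * (n1 + n2 - 2)) * T2
  p.value <- pf(Fstat, df1 = p, df2 = df2, lower.tail = FALSE)
  list(T2 = T2, F = Fstat, df1 = p, df2 = df2, p.value = p.value)
}
```

With one variable this reduces to the squared pooled-variance t statistic, which makes for an easy sanity check against `t.test(..., var.equal = TRUE)`.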

The first (package FRB) uses MM- and S-estimators, while the latter (package rrcov) uses a Minimum Covariance Determinant (MCD) estimator. You can find more on these in the references at the end of the post. What may matter most to you is that the MM/S-based test, which also runs a bootstrap, is far more time-consuming than the MCD-based one. A little demonstration follows:

```r
library(rrcov)   # T2.test(); also attaches robustbase, which provides 'delivery'
data(delivery)
delivery.x <- delivery[, 1:2]   # n.prod and distance

# Classical one-sample Hotelling test (H0: mean vector = (0, 0))
T2.test(delivery.x)
# 
#     One-sample Hotelling test
# 
# data:  delivery.x 
# T^2 = 21.0494, df1 = 2, df2 = 23, p-value = 6.365e-06
# alternative hypothesis: true mean vector is not equal to (0, 0)' 
#  
# sample estimates:
#               n.prod distance
# mean x-vector   8.76   409.28

# Robust version based on the reweighted MCD, timed
t0 <- Sys.time()
T2.test(delivery.x, method = "mcd")
# 
#     One-sample Hotelling test (Reweighted MCD Location)
# 
# data:  delivery.x 
# T^2 = 37.701, df1 = 2.000, df2 = 9.146, p-value = 3.829e-05
# alternative hypothesis: true mean vector is not equal to (0, 0)' 
#  
# sample estimates:
#                n.prod distance
# MCD x-vector 6.190476 309.7143
Sys.time() - t0
# Time difference of 0.04200006 secs

# Robust version based on MM-estimates with fast robust bootstrap, timed
library(FRB)
t0 <- Sys.time()
FRBhotellingMM(delivery.x)
# One sample Hotelling test based on multivariate MM-estimates
# (bdp = 0.5, eff = 0.95) 
# data:  delivery.x 
# T^2_R =  84.59 
# p-value =  0.0022 
# Alternative hypothesis : true mean vector is not equal to ( 0 0 ) 
Sys.time() - t0
# Time difference of 4.859 secs
```
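Part of the timing gap above plausibly comes from the bootstrap itself: every resample needs a robust estimate. As a rough illustration only (my own sketch, not the method of either package), the snippet below plugs reweighted MCD estimates from `robustbase::covMcd()` into the T² formula and calibrates the result with an ordinary nonparametric bootstrap under the null. That naive bootstrap is a swapped-in technique: FRB's fast robust bootstrap is designed to avoid refitting the estimator on every resample, and rrcov's `method = "mcd"` output, with its non-integer df2, suggests an approximate F reference distribution instead. `robust_T2()` and `robust_T2_boot()` are hypothetical helper names.

```r
# Sketch: MCD plug-in Hotelling-type statistic calibrated by a NAIVE bootstrap.
# Not FRB's fast robust bootstrap, not rrcov's F approximation.
# robust_T2() and robust_T2_boot() are hypothetical helpers.
library(robustbase)  # covMcd()

robust_T2 <- function(x, mu0) {
  est <- covMcd(x)                  # reweighted MCD location and scatter
  d <- est$center - mu0
  nrow(x) * drop(t(d) %*% solve(est$cov, d))
}

robust_T2_boot <- function(x, mu0 = rep(0, ncol(x)), R = 200) {
  x <- as.matrix(x)
  n <- nrow(x)
  t_obs <- robust_T2(x, mu0)
  # Impose H0: shift the data so that its robust center sits at mu0
  x0 <- sweep(x, 2, covMcd(x)$center - mu0)
  # Each replicate refits the MCD from scratch; this refitting is what
  # makes a naive bootstrap slow
  t_boot <- replicate(R, robust_T2(x0[sample.int(n, replace = TRUE), , drop = FALSE], mu0))
  # Bootstrap p-value: share of null replicates at least as extreme as observed
  list(statistic = t_obs, p.value = (1 + sum(t_boot >= t_obs)) / (R + 1))
}
```

For example, `set.seed(1); robust_T2_boot(delivery.x, R = 500)` runs on the demonstration data. The p-value depends on the seed and on `R` (its smallest possible value is 1/(R + 1)), so I would not read much into small differences from `FRBhotellingMM`'s output.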

Time-consuming as it may be, I would stick with the bootstrap method. What would you do?

Read more

Roelant, E., Van Aelst, S., and Willems, G. (2008), “Fast Bootstrap for Robust Hotelling Tests,” in COMPSTAT 2008: Proceedings in Computational Statistics, ed. P. Brito, Heidelberg: Physica-Verlag, to appear.

Willems, G., Pison, G., Rousseeuw, P., and Van Aelst, S. (2002), “A Robust Hotelling Test,” Metrika, 55, 125–138.


