# A robust Hotelling test…

July 12, 2010

Recently I needed to test a mean vector. I wrote a few lines of R code and it worked perfectly. The Hotelling test is one of the least interesting tests to me; I never really figured out why…
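For the record, a few lines like these are enough for the classical one-sample test. This is a minimal sketch in base R, not the original code: the function name `hotelling_one_sample` and the simulated data are made up for illustration, and the p-value relies on the usual normality assumption.

```r
# Classical one-sample Hotelling T^2 test in base R (illustrative sketch).
hotelling_one_sample <- function(x, mu0 = rep(0, ncol(x))) {
  x <- as.matrix(x)
  n <- nrow(x)                            # sample size
  p <- ncol(x)                            # dimension of the mean vector
  d <- colMeans(x) - mu0                  # sample mean minus hypothesised mean
  S <- cov(x)                             # sample covariance matrix
  T2 <- n * drop(t(d) %*% solve(S, d))    # Hotelling T^2 statistic
  Fstat <- (n - p) / (p * (n - 1)) * T2   # rescaled so it follows F(p, n - p) under H0
  pval <- pf(Fstat, p, n - p, lower.tail = FALSE)
  list(T2 = T2, F = Fstat, df1 = p, df2 = n - p, p.value = pval)
}

# Simulated bivariate data (an assumption, just to have something to test)
set.seed(1)
x <- cbind(rnorm(25, mean = 1), rnorm(25, mean = 2))
res <- hotelling_one_sample(x)
print(res)
```

With a single column the statistic reduces to the square of the ordinary one-sample t statistic, which is a handy sanity check.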

At that time I had some time to look into it further. One of the first things I look for with any test is a robust version of it (at least that’s what I search for!). A little digging, down to the third page of Google results, led to the following:

### One-sample and two-sample robust Hotelling tests with fast and robust bootstrap

The classical Hotelling test for testing if the mean equals a certain value or if two means are equal is modified into a robust one through substitution of the empirical estimates by the MM-estimates of location and scatter. The MM-estimator, using Tukey’s biweight function, is tuned by default to have a breakdown point of 50% and 95% location efficiency. This could be changed through the control argument if desired.
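Written out, the substitution looks roughly like this for the one-sample case. The notation is my own shorthand (n observations of dimension p, hypothesised mean \mu_0), not copied from the package documentation.

```latex
% Classical statistic: sample mean \bar{x} and sample covariance S
T^2 = n\,(\bar{x} - \mu_0)^\top S^{-1} (\bar{x} - \mu_0),
\qquad
\frac{n-p}{p\,(n-1)}\,T^2 \sim F_{p,\,n-p} \quad \text{under } H_0 \text{ for normal data}

% Robust statistic: MM-estimates of location \hat{\mu} and scatter \hat{\Sigma} plugged in
T_R^2 = n\,(\hat{\mu} - \mu_0)^\top \hat{\Sigma}^{-1} (\hat{\mu} - \mu_0)
```

The exact F result applies only to the classical statistic, which is why the null distribution of the robust one is approximated with the fast and robust bootstrap.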

### Robust Hotelling T2 test

Performs one and two sample Hotelling T2 tests as well as robust one-sample Hotelling T2 test.

The first (`FRBhotellingMM` in the FRB package) uses MM- and S-estimators, while the latter (`T2.test` in the rrcov package) uses a Minimum Covariance Determinant (MCD) estimator. You can get more information on those from the references at the end of the post. What might be crucial to you is that the MM/S-based test is more time-consuming than the MCD-based one. A little demonstration follows.

```
library(rrcov)
data(delivery)
delivery.x <- delivery[,1:2]
T2.test(delivery.x)
#
#     One-sample Hotelling test
#
# data:  delivery.x
# T^2 = 21.0494, df1 = 2, df2 = 23, p-value = 6.365e-06
# alternative hypothesis: true mean vector is not equal to (0, 0)'
#
# sample estimates:
#               n.prod distance
# mean x-vector   8.76   409.28

t0 <- Sys.time()
T2.test(delivery.x, method="mcd")
#
#     One-sample Hotelling test (Reweighted MCD Location)
#
# data:  delivery.x
# T^2 = 37.701, df1 = 2.000, df2 = 9.146, p-value = 3.829e-05
# alternative hypothesis: true mean vector is not equal to (0, 0)'
#
# sample estimates:
#                n.prod distance
# MCD x-vector 6.190476 309.7143
Sys.time() - t0
# Time difference of 0.04200006 secs

library(FRB)
t0 <- Sys.time()
FRBhotellingMM(delivery.x)
# One sample Hotelling test based on multivariate MM-estimates
# (bdp = 0.5, eff = 0.95)
# data:  delivery.x
# T^2_R =  84.59
# p-value =  0.0022
# Alternative hypothesis : true mean vector is not equal to ( 0 0 )
Sys.time() - t0
# Time difference of 4.859 secs
```
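To see why the robust location estimates above can differ so much from the plain mean, here is a small sketch on simulated data with a few planted outliers. It uses `cov.mcd` from MASS rather than the rrcov implementation used above, the helper `t2_stat` is a made-up name, and it only computes the plug-in statistics: no p-value is attached to the robust one, since its null distribution needs the adjustments described in the references.

```r
library(MASS)  # recommended package shipped with standard R installations

set.seed(42)
clean    <- cbind(rnorm(40), rnorm(40))                      # bulk of the data, true mean (0, 0)
outliers <- cbind(rnorm(10, mean = 8), rnorm(10, mean = 8))  # 20% contamination
x <- rbind(clean, outliers)
n <- nrow(x)

# Quadratic form n * (m - mu0)' C^{-1} (m - mu0) for a location m and scatter C
t2_stat <- function(center, scatter, n, mu0 = rep(0, length(center))) {
  d <- center - mu0
  n * drop(t(d) %*% solve(scatter, d))
}

# Classical plug-in: sample mean and sample covariance
classical <- t2_stat(colMeans(x), cov(x), n)

# Robust plug-in: MCD location and scatter (MASS version, not rrcov's)
mcd    <- cov.mcd(x)
robust <- t2_stat(mcd$center, mcd$cov, n)

print(colMeans(x))   # pulled towards the outliers
print(mcd$center)    # stays close to the bulk of the data
print(c(classical = classical, robust = robust))
```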

Time-consuming as it may be, I would stick with the bootstrap method. What would you do?

Roelant, E., Van Aelst, S., and Willems, G. (2008), “Fast Bootstrap for Robust Hotelling Tests,” COMPSTAT 2008: Proceedings in Computational Statistics (P. Brito, Ed.), Heidelberg: Physica-Verlag, to appear.

Willems, G., Pison, G., Rousseeuw, P., and Van Aelst, S. (2002), “A Robust Hotelling Test,” Metrika, 55, 125–138.
