Introduction to Circular Statistics – Rao’s Spacing Test

November 29, 2015

(This article was first published on DataScience+, and kindly contributed to R-bloggers)

Today will be a brief introduction to circular statistics (sometimes referred to as directional statistics). Circular statistics is a subdivision of statistics that deals with observations recorded as directions, or angles, around a unit circle. As an example, imagine measuring birth times at a hospital over a 24-hour cycle, or the directional dispersion of a group of migratory animals. This type of data appears in a variety of fields, such as ecology, climatology, and biochemistry. Measuring observations around a unit circle necessitates a different approach to hypothesis testing. Distributions need to be “wrapped” around the circle to be of use, and conventional estimators such as the sample mean or sample variance can be misleading, because 0 degrees and 360 degrees are the same direction.

In this post, we will conduct Rao’s Spacing Test to assess the uniformity of a circular dataset. This is a basic procedure and should be thought of as an introduction to handling circular data.

Getting started

We are going to conduct a hypothesis test on turtles, a small dataset consisting of the arrival angles of 10 green sea turtles to their nesting island. Our goal is to determine whether the arrival angles show signs of directionality or are more indicative of a random scatter.

First, install the circular package and attach the turtles dataset.

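install.packages("circular")  # only needed once
library(circular)             # load the package; stops with an error if it is missing
attach(turtles)               # makes the arrival column available by name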

Plotting the data

The circular package contains its own plotting function, plot.circular. Let’s observe the arrival angles of the turtles.

plot.circular(arrival)

Here is the plot:
[Figure: circular plot of the turtles’ arrival angles]

Judging by eye, the observations appear to be spread fairly uniformly around the circle. To run a hypothesis test of whether the data are truly uniform, we need a test statistic that works with angular data.

What is a good statistic for us to use? Taking the sample mean doesn’t tell us much about the direction of the data (180 degrees is not a useful mean of 2 degrees and 358 degrees). In the following plot, observe how the sample mean is of no use in representing the shape or spread of our data.
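To make the 2-degree and 358-degree example concrete, here is a minimal base R sketch. The two angles are hypothetical values, not taken from the turtles data, and the "mean direction" shown for contrast (averaging unit vectors and taking the angle of the result) is a standard circular summary that this post does not otherwise use; gap_from_zero is a helper name invented for this illustration.

```r
# Two hypothetical headings, in degrees, on either side of the 0-degree direction.
angles_deg <- c(2, 358)

# Arithmetic mean: lands on the opposite side of the circle.
arith_mean <- mean(angles_deg)

# Mean direction: average the unit vectors, then take the angle of the result.
angles_rad   <- angles_deg * pi / 180
mean_dir_rad <- atan2(mean(sin(angles_rad)), mean(cos(angles_rad)))

# Hypothetical helper: angular gap (in degrees) between a direction and 0 degrees.
gap_from_zero <- function(theta_rad) acos(cos(theta_rad)) * 180 / pi

arith_mean                             # 180
gap_from_zero(arith_mean * pi / 180)   # about 180 degrees away from the data
gap_from_zero(mean_dir_rad)            # essentially 0
```

The arithmetic mean points directly away from both observations, while the vector-based summary stays with them.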

mean(arrival)
[1] 0.9120794
plot.circular(mean(arrival))

Here is the plot:
[Figure: circular plot of the sample mean of the arrival angles]

Instead, we will use a method that assesses directionality by measuring the spacing between neighboring observations. This test is called Rao’s Spacing Test.

Rao’s Spacing Test

Rao’s Spacing Test was developed to assess the uniformity of circular data. It uses the space between observations to determine if the data shows significant directionality. If the data is uniform, observations should tend to be evenly spaced apart.

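Here is the test statistic \(U\) for Rao’s Spacing Test: $$U = \frac{1}{2}\sum\limits_{i=1}^{n} |T_{i} - \lambda|$$ where \(\lambda = 360/n\), \(T_{i} = f_{i+1} - f_{i}\) for \(i = 1, \dots, n-1\), and \(T_{n} = (360 - f_{n}) + f_{1}\). Here \(f_{1} \le \dots \le f_{n}\) are the observed angles in degrees, sorted in increasing order, and \(n\) is the number of observations.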

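In other words, the test statistic sums the absolute deviations of the arcs between consecutive observations from λ, the arc length we would see if all n observations were perfectly evenly spaced. Evenly spread data gives a small U, while clustered data leaves large gaps on the circle and gives a large U.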
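The statistic is simple enough to compute by hand. The following is a minimal base R sketch, assuming angles measured in degrees; rao_u is a name made up for this illustration and is not part of the circular package, and the inputs are toy values rather than the turtles data.

```r
# Hypothetical helper: Rao's spacing statistic for angles given in degrees.
rao_u <- function(angles_deg) {
  f <- sort(angles_deg %% 360)     # order the angles around the circle
  n <- length(f)
  lambda <- 360 / n                # spacing under perfectly even placement
  # Arcs between neighbours, plus the wrap-around arc from f_n back to f_1.
  spacings <- c(diff(f), (360 - f[n]) + f[1])
  0.5 * sum(abs(spacings - lambda))
}

rao_u(c(0, 90, 180, 270))   # evenly spaced: 0
rao_u(c(0, 10, 20, 30))     # tightly clustered: 240
```

In the clustered example the arcs are 10, 10, 10 and 330 degrees against an even spacing of 90, so the absolute deviations are 80, 80, 80 and 240, and half their sum is 240.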

We will use the rao.spacing.test() function to run this hypothesis test. Our null hypothesis is that the data come from a uniform distribution on the circle, while the alternative is that the data show signs of directionality. Let’s run the test at the 10% significance level.

rao.spacing.test(arrival, alpha = 0.10)

       Rao's Spacing Test of Uniformity 
 
Test Statistic = 127.2689 
Level 0.1 critical value = 161.23 
Do not reject null hypothesis of uniformity

With a test statistic of about 127 falling below the 10% critical value of 161.23, we fail to reject the null hypothesis: the data do not lean significantly in any direction. This does not prove uniformity, but the green sea turtles’ arrivals are consistent with a uniform distribution.
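For intuition about where such a critical value comes from, it can be approximated by simulation. This is a rough Monte Carlo sketch in base R, not the method the circular package uses. The helper rao_u, the seed, and the number of simulations are all choices made for this illustration.

```r
# Hypothetical helper: Rao's spacing statistic for angles given in degrees.
rao_u <- function(angles_deg) {
  f <- sort(angles_deg %% 360)
  n <- length(f)
  spacings <- c(diff(f), (360 - f[n]) + f[1])   # include the wrap-around arc
  0.5 * sum(abs(spacings - 360 / n))
}

set.seed(1)      # arbitrary seed, for reproducibility only
n     <- 10      # same sample size as the turtles data
n_sim <- 20000   # arbitrary number of simulated samples

# Distribution of U when n angles are drawn uniformly around the circle.
sim_u <- replicate(n_sim, rao_u(runif(n, 0, 360)))

crit_10  <- quantile(sim_u, 0.90, names = FALSE)   # approximate 10% critical value
p_approx <- mean(sim_u >= 127.2689)                # approximate p-value for the observed U

crit_10
p_approx
```

Under these assumptions the simulated 90th percentile should land near the reported critical value, and the observed statistic should sit well inside the bulk of the simulated distribution.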

Conclusion

Rao’s Spacing Test found no significant directional trend in the data. We cannot reject the null hypothesis of uniformity, so we treat the direction of arrival as consistent with a uniform distribution. While this post was a relatively basic tutorial, many people in the data science community haven’t worked with circular data before. It is an interesting subtopic to dive into, as well as a field of statistics that is still evolving.

Final remarks

I would like to extend credit to S. Rao Jammalamadaka PhD, of the University of California, Santa Barbara, and his textbook “Topics in Circular Statistics” for sparking my interest in the field of circular statistics.
