**DataScience+**, and kindly contributed to R-bloggers)

Today will be a brief introduction in to circular statistics (sometimes referred to as directional statistics). Circular statistics is an interesting subdivision of statistics involving observations taken as vectors around a unit circle. As an example, imagine measuring birth times at a hospital over a 24-hour cycle, or the directional dispersion of a group of migratory animals. This type of data is involved in a variety fields, such as ecology, climatology, and biochemistry. The nature of measuring observations around a unit circle necessitates a different approach to hypothesis testing. Distributions need to be “wrapped” around the circle to be of use, and conventional estimators such as the sample mean or sample variance hold no water.

In this post, we will conduct *Rao’s Spacing Test* to assess the uniformity of a circular dataset. This is a basic procedure and should be thought of as an introduction to handling circular data.

## Getting started

We are going to conduct a hypothesis test on *turtles*, a small dataset consisting of the arrival angles of 10 green sea turtles to their nesting island. Our goal is to determine where the arrival angles show signs of directionality or are more indicative of a random scatter.

First, install the `circular`

package and attach the *turtles* dataset.

install.packages("circular") require(circular) attach(turtles)

## Plotting the data

The `circular`

package contains its own plotting function, `plot.circular`

. Let’s observe the arrival angles of the turtles.

plot.circular(arrival)

Given the eye test, the observations appear to be uniform around the circle. If we want to run a hypothesis test to determine if the data is truly uniform, we will need to develop a test statistic that works with angular data.

What is a good parameter for us to utilize? Taking the sample mean doesn’t tell us much about the direction of the data (180 degrees is not a useful mean of 2 degrees and 358 degrees). In the following plot, observe how the sample mean is of no use in representing the shape or spread of our data.

mean(arrival)[1] 0.9120794plot.circular(mean(arrival))

Instead, we will use a method that determines directionality by measuring the average space between observations. This test is called **Rao’s Spacing Test**.

## Rao’s Spacing Test

Rao’s Spacing Test was developed to assess the uniformity of circular data. It uses the space between observations to determine if the data shows significant directionality. If the data is uniform, observations should tend to be evenly spaced apart.

Here is the test statistic (U) for Rao’s Spacing Test: $$U = 1/2sumlimits_{i=1}^n |T_{i} – λ| $$ where (λ = 360/N T_{i} = f_{i+1}-f_{i}) and (T_{n} = (360-f_{n})+f_{1})

Basically, the test statistic aggregates the deviations between consecutive points, each one weighted by the total number of observations in the dataset.

We will use the `rao.spacing.test()`

function to run this hypotheses test. Our null hypothesis says the data is of a uniform distribution, while the alternate states the data shows signs of directionality. Let’s run the test.

rao.spacing.test(arrival,alpha=.10) Rao's Spacing Test of Uniformity Test Statistic = 127.2689 Level 0.1 critical value = 161.23 Do not reject null hypothesis of uniformity

With a test statistic of 127 falling below the critical value of 161, the data fails to significantly lean in any direction. We can assume the green sea turtles’ arrivals were of a uniform distribution.

## Conclusion

Rao’s spacing test determined the data to show no signs of directional trends. We cannot reject the null hypothesis of uniformity and will assume uniformity in regards to the direction of arrival. While this post was a relatively basic tutorial, many people in the data science community haven’t worked with circular data before. It is an interesting subtopic to dive in to as well as a young field of statistics that is still evolving.

## Final remarks

I would like to extend credit to S. Rao Jammalamadaka PhD, of the University of California, Santa Barbara, and his textbook “Topics in Circular Statistics” for sparking my interest in the field of circular statistics.

**leave a comment**for the author, please follow the link and comment on their blog:

**DataScience+**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...