(This article was first published on **sigmafield - R**, and kindly contributed to R-bloggers)

*Edit: This post originally appeared on my WordPress blog on September 22, 2009. I present it here in its original form.*

The **R Function of the Day** series describes in *plain language* how certain R functions work, using simple examples that you can apply to gain insight into your own data.

Today, I will discuss the **rle** function.

### What situation is rle useful in?

The **rle** function takes its name from the abbreviation of “run length encoding”. What does the term “run length” mean? Imagine you flip a coin 10 times and record the outcome as “H” if the coin lands showing heads, and “T” if it lands showing tails. You want to know the longest streak of heads, and also the longest streak of tails. A *run* is a stretch of consecutive flips with the same outcome, and the *run length* is the number of flips in that stretch. If the outcome of our experiment was “H T T H H H H H T H”, the longest run length of heads would be 5, since there are 5 consecutive heads starting at position 4, and the longest run length for tails would be 2, since there are 2 consecutive tails starting at position 2.

If you just have 10 flips, it is pretty easy to simply eyeball the answer. But if you had 100 flips, or 100,000, it would not be easy at all. However, it is very easy with the **rle** function in R! That function will *encode* the entire result into its run lengths. Using the example above, we start with 1 H, then 2 Ts, 5 Hs, 1 T, and finally 1 H. That is exactly what the **rle** function computes, as you will see below in the example.
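To make the idea of encoding concrete, here is a minimal sketch that computes the run lengths of the ten-flip sequence by hand, without calling **rle**. The variable names (`flips`, `change`, `run_end`, and so on) are my own choices for illustration, and this is only one way to do it, not necessarily how you would want to write it in practice.

```r
## the ten-flip example from the text
flips <- c("H", "T", "T", "H", "H", "H", "H", "H", "T", "H")
n <- length(flips)

## TRUE wherever a flip differs from the flip before it,
## i.e. wherever one run ends and the next begins
change <- flips[-1] != flips[-n]

## position of the last flip in each run (the final flip always ends a run)
run_end <- c(which(change), n)

## run lengths are the gaps between successive run ends
run_lengths <- diff(c(0L, run_end))

## the outcome that each run consists of
run_values <- flips[run_end]

run_lengths
run_values
```

The two vectors produced here, the lengths and the values, are the same two pieces of information that **rle** returns.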

### How do I use rle?

First, we will simulate the results of the coin flipping experiment. This is trivial in R using the **sample** function. We simulate flipping a coin 1000 times.

    > ## generate data for coin flipping example
    > coin <- sample(c("H", "T"), 1000, replace = TRUE)
    > table(coin)
    coin
      H   T
    501 499
    > head(coin, n = 20)
     [1] "T" "H" "T" "T" "T" "H" "T" "H" "T" "T" "H" "T" "H" "T"
    [15] "T" "T" "H" "H" "H" "H"

We can see the results of the first 20 tosses by using the **head** (as in “beginning”, nothing to do with coin tosses) function on our **coin** vector.
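Because **sample** draws at random, your own flips will differ from mine each time you run the code. If you want a simulation you can repeat exactly, one option is to call **set.seed** first. The sketch below assumes an arbitrary seed value of 2009 that I picked for illustration; it will not reproduce the counts shown in this post.

```r
## fix the random number generator so the simulation is repeatable
## (the seed value 2009 is an arbitrary choice, not the one used in this post)
set.seed(2009)

## simulate 1000 coin flips
coin <- sample(c("H", "T"), 1000, replace = TRUE)

## how many heads and tails, and what do the first 20 flips look like?
table(coin)
head(coin, n = 20)
```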

So, our question is, what is the longest run of heads, and longest run of tails? First, what does the output of the **rle** function look like?

    > ## use the rle function on our SMALL EXAMPLE above
    > ## note results MATCH what I described above...
    > rle(c("H", "T", "T", "H", "H", "H", "H", "H", "T", "H"))
    Run Length Encoding
      lengths: int [1:5] 1 2 5 1 1
      values : chr [1:5] "H" "T" "H" "T" "H"
    > ## use the rle function on our SIMULATED data
    > coin.rle <- rle(coin)
    > ## what is the structure of the returned result?
    > str(coin.rle)
    List of 2
     $ lengths: int [1:500] 1 1 3 1 1 1 2 1 1 1 ...
     $ values : chr [1:500] "T" "H" "T" "H" ...
     - attr(*, "class")= chr "rle"
    > ## sort the data, this shows the longest run of
    > ## ANY type (heads OR tails)
    > sort(coin.rle$lengths, decreasing = TRUE)
      [1] 9 8 7 7 7 7 7 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5 5 5 5 5
     [28] 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
     [55] 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
     [82] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
    [109] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2
    [136] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    [163] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    [190] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    [217] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    [244] 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    [271] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    [298] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    [325] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    [352] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    [379] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    [406] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    [433] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    [460] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    [487] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    > ## use the tapply function to break up
    > ## into 2 groups, and then find the maximum
    > ## within each group
    >
    > tapply(coin.rle$lengths, coin.rle$values, max)
    H T
    9 8

So in this case the longest run of heads is 9 and the longest run of tails is 8. The **tapply** function was discussed in a previous **R Function of the Day** article.
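Knowing how long the longest run is naturally raises the question of *where* it occurs. The sketch below is one way to recover the starting position of each run from the output of **rle**, using the small ten-flip example. The names `run_start`, `runs`, `longest_heads` and `longest_tails` are mine, chosen for illustration. The last line uses the base R function **inverse.rle** to check that the encoding can be turned back into the original vector.

```r
## the ten-flip example from earlier in the post
flips <- c("H", "T", "T", "H", "H", "H", "H", "H", "T", "H")
r <- rle(flips)

## the first run starts at position 1; each later run starts
## right after all of the previous runs have finished
run_start <- cumsum(c(1L, head(r$lengths, -1)))

## collect value, length and starting position of every run
runs <- data.frame(value = r$values, length = r$lengths, start = run_start)

## longest run of heads and of tails, with their starting positions
## (which.max returns the FIRST maximum if there are ties)
heads <- runs[runs$value == "H", ]
longest_heads <- heads[which.max(heads$length), ]
tails <- runs[runs$value == "T", ]
longest_tails <- tails[which.max(tails$length), ]

longest_heads
longest_tails

## the encoding loses no information: inverse.rle rebuilds the flips
identical(inverse.rle(r), flips)
```

For the ten-flip example this gives a run of 5 heads starting at position 4 and a run of 2 tails starting at position 2, which matches the description at the top of the post. The same code should work unchanged on the simulated `coin` vector.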

### Summary of rle

The **rle** function performs run length encoding. Although it is not used terribly often when programming in R, there are certain situations, such as time series and longitudinal data analysis, where knowing how it works can save a lot of time and give you insight into your data.
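As a small illustration of the time series use case, **rle** also works on logical vectors, so you can ask for the longest streak of consecutive observations that satisfy some condition. The data and the threshold of 20 below are made up purely for this sketch.

```r
## hypothetical daily measurements, invented for illustration
temp <- c(18, 21, 23, 25, 24, 19, 22, 26, 27, 28, 29, 17)

## TRUE on days above the (arbitrary) threshold of 20
above <- temp > 20

## encode the TRUE/FALSE sequence into runs
r <- rle(above)

## keep only the runs of TRUE and take the longest one;
## note that max() warns and returns -Inf if there are no TRUE runs at all
longest_streak <- max(r$lengths[r$values])
longest_streak
```

Here the days above the threshold form one streak of 4 and one streak of 5, so the longest streak is 5.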
