Generating lag/lead variables

[This article was first published on R HEAD, and kindly contributed to R-bloggers.]

A few days ago, a friend asked me whether R has a function to generate lag/lead variables in a data.frame, or something that works like _n in Stata. He wanted to use it to clean up his dataset in R.

From the Stata help manual: _n contains the number of the current observation.
Here’s an example to illustrate what _n does:

set obs 10
generate x = _n
generate x_lag1 = x[_n-1]
generate x_lead1 = x[_n+1]

The data generated would be:
x = {1,2,3,4,5,6,7,8,9,10}
x_lag1 = {NA,1,2,3,4,5,6,7,8,9}
x_lead1 = {2,3,4,5,6,7,8,9,10,NA}

The key feature is that each new vector has the same length as the original vector, so it can be used together with the original vector or with other generated vectors.

One application is to create a moving-average (MA) series (just an example; it is better to use a function from a time-series package for that):
generate x_ma_1 = (x[_n-1] + x[_n]) / 2
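
For comparison, the same series can be sketched in base R with plain indexing. This only illustrates the same-length idea from the Stata example; the data.frame name d and the column names are assumptions made for this sketch.

```r
# A minimal base R sketch of the Stata example above (d is a hypothetical data.frame)
d <- data.frame(x = seq_len(10))     # like: set obs 10; generate x = _n
n <- nrow(d)

d$x_lag1  <- c(NA, d$x[-n])          # x[_n-1]: NA first, then x[1], ..., x[n-1]
d$x_lead1 <- c(d$x[-1], NA)          # x[_n+1]: x[2], ..., x[n], then NA last
d$x_ma_1  <- (d$x_lag1 + d$x) / 2    # two-period moving average, NA in row 1
```

Every new column has nrow(d) elements, which is what lets them be combined row by row.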

I googled for a while, and there are basically two types of methods to generate lag/lead variables in R (reference):

1> Functions that generate a shorter vector (e.g. embed(), or running() in gtools).
2> Functions in ts, zoo, xts, dynlm, dlm.
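
A short sketch of how both kinds behave, using only functions from the stats package that ships with R. The variable names are chosen for this illustration.

```r
x <- 1:10

# Type 1: embed() returns a matrix with fewer rows than length(x)
e <- embed(x, 2)
dim(e)                  # 9 rows, 2 columns: column 1 is x[2:10], column 2 is x[1:9]

# Type 2: stats::lag() on a ts object shifts the time index, not the values
x_ts <- ts(x)
x_ts_lag <- stats::lag(x_ts, 1)
as.numeric(x_ts_lag)    # the values are still 1, 2, ..., 10
```

In the first case the result no longer lines up with the rows of a data.frame; in the second the values only move relative to the time index of a ts object.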

However, neither approach solved his problem, so I wrote a “shift” function to do the task:

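shift <- function(x, shift_by) {
	stopifnot(is.numeric(shift_by))
	stopifnot(is.numeric(x))

	# shift_by is vectorized: return a matrix with one column per shift
	if (length(shift_by) > 1)
		return(sapply(shift_by, shift, x = x))

	# cap the shift at length(x) so the result always has the same length as x
	abs_shift_by <- min(abs(shift_by), length(x))
	# positive shift_by = lead (pad the end with NA)
	# negative shift_by = lag (pad the start with NA)
	if (shift_by > 0)
		out <- c(tail(x, -abs_shift_by), rep(NA, abs_shift_by))
	else if (shift_by < 0)
		out <- c(rep(NA, abs_shift_by), head(x, -abs_shift_by))
	else
		out <- x
	out
}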


# Example
d<-data.frame(x=1:10)
#generate lead variable
d$df_lead2<-shift(d$x,2)
#generate lag variable
d$df_lag2<-shift(d$x,-2)

> d
    x df_lead2 df_lag2
1   1        3      NA
2   2        4      NA
3   3        5       1
4   4        6       2
5   5        7       3
6   6        8       4
7   7        9       5
8   8       10       6
9   9       NA       7
10 10       NA       8

# shift_by is vectorized
> shift(d$x,-2:2)
      [,1] [,2] [,3] [,4] [,5]
 [1,]   NA   NA    1    2    3
 [2,]   NA    1    2    3    4
 [3,]    1    2    3    4    5
 [4,]    2    3    4    5    6
 [5,]    3    4    5    6    7
 [6,]    4    5    6    7    8
 [7,]    5    6    7    8    9
 [8,]    6    7    8    9   10
 [9,]    7    8    9   10   NA
[10,]    8    9   10   NA   NA


# Test
library(testthat)
expect_identical(shift(1:10,2), c(3:10,NA,NA))
expect_identical(shift(1:10,-2), c(NA,NA,1:8))
expect_identical(shift(1:10,0), 1:10)
expect_identical(shift(1:10,1:2), cbind(c(2:10,NA),c(3:10,NA,NA)))

Notice that the result depends on how the data.frame is sorted, so sort it by the ordering variable (for example, time) before generating lags or leads.
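
A small sketch of why row order matters. The data and the column names (year, value) are hypothetical, and the lag is written with plain indexing so the block runs on its own.

```r
# Hypothetical data: observations recorded out of time order
d <- data.frame(year = c(2003, 2001, 2002), value = c(30, 10, 20))

# Lagging the unsorted rows pairs each row with whatever row happens to precede it
lag_unsorted <- c(NA, head(d$value, -1))

# Sort by the ordering variable first, then lag
d <- d[order(d$year), ]
d$value_lag1 <- c(NA, head(d$value, -1))
```

After sorting, each value_lag1 entry holds the value from the previous year; on the unsorted rows, the 2001 observation would have been given the 2003 value as its lag.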
