Stata's egen command computes a summary statistic by group and stores it in a new variable. For example, suppose the data has three variables, id, time and y, and we want to compute the mean of y for each id and store it as a new variable mean_y.
In Stata, the command would be
egen mean_y = mean(y), by(id)
In R, this task can be done with ave().
Generate the dataset:
id <- rep(1:3, each=3)
t <- rep(1:3, 3)
y <- sample(1:5, 9, replace=TRUE)
my_data <- data.frame(id=id, time=t, y=y)
Original data:
> my_data
  id time y
1  1    1 4
2  1    2 1
3  1    3 4
4  2    1 2
5  2    2 3
6  2    3 3
7  3    1 4
8  3    2 4
9  3    3 3
> within(my_data, {mean_y = ave(y,id)} )
id time y mean_y
1 1 1 4 3.000000
2 1 2 1 3.000000
3 1 3 4 3.000000
4 2 1 2 2.666667
5 2 2 3 2.666667
6 2 3 3 2.666667
7 3 1 4 3.666667
8 3 2 4 3.666667
9 3 3 3 3.666667
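Because sample() draws random values, the numbers will differ from run to run. The following is a minimal sketch that types in the y values printed above by hand (an assumption made only so the result is reproducible) and compares ave() with tapply(), which returns one value per group rather than one per row. The name group_means is introduced here for illustration.

```r
# y values copied by hand from the printed data so the result is reproducible
my_data <- data.frame(id   = rep(1:3, each = 3),
                      time = rep(1:3, 3),
                      y    = c(4, 1, 4, 2, 3, 3, 4, 4, 3))

# ave() returns a vector as long as y, with each group's mean repeated
# on every row of that group
my_data$mean_y <- ave(my_data$y, my_data$id)

# tapply() returns one mean per id, for comparison
group_means <- tapply(my_data$y, my_data$id, mean)

print(my_data)
print(group_means)
```

The row-wise result from ave() is what makes it a close match for egen: the new variable lines up with the original rows without a separate merge step.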
The default summary statistic is the mean. However, we can supply a different function through the FUN argument. For example, to compute the standard deviation of y by id:
within(my_data, {sd_y = ave(y,id,FUN=sd)} )
id time y sd_y
1 1 1 4 1.7320508
2 1 2 1 1.7320508
3 1 3 4 1.7320508
4 2 1 2 0.5773503
5 2 2 3 0.5773503
6 2 3 3 0.5773503
7 3 1 4 0.5773503
8 3 2 4 0.5773503
9 3 3 3 0.5773503
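The same pattern extends to other functions. As a hedged sketch, again using the hand-typed y values from the printed data, the block below adds a group count, a group maximum, and each row's deviation from its group mean. The variable names n_y, max_y and dev_y are made up for this illustration. Note that FUN has to be passed by name, since unnamed arguments after the first are treated as grouping variables.

```r
# y values copied by hand from the printed data so the result is reproducible
my_data <- data.frame(id   = rep(1:3, each = 3),
                      time = rep(1:3, 3),
                      y    = c(4, 1, 4, 2, 3, 3, 4, 4, 3))

my_data <- within(my_data, {
  n_y   <- ave(y, id, FUN = length)                   # observations per id
  max_y <- ave(y, id, FUN = max)                      # group maximum
  dev_y <- ave(y, id, FUN = function(v) v - mean(v))  # deviation from group mean
})

print(my_data)
```

A function that returns a single value is recycled across the group, while one that returns a vector of the group's length (as in dev_y) fills the group row by row.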
Remark: within() evaluates an expression in an environment created from the data frame and returns a modified copy of it. The original data frame is left unchanged unless the result is assigned back to it. In our case, the returned copy contains the new variable, mean_y or sd_y.
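A short sketch of this behaviour, assuming the hand-typed y values from the printed data: calling within() on its own leaves my_data as it was, and the new variable only persists once the result is assigned. The names result, in_original_before and in_result are introduced here for illustration.

```r
# y values copied by hand from the printed data so the result is reproducible
my_data <- data.frame(id   = rep(1:3, each = 3),
                      time = rep(1:3, 3),
                      y    = c(4, 1, 4, 2, 3, 3, 4, 4, 3))

# within() hands back a copy; my_data itself is not touched
result <- within(my_data, {mean_y = ave(y, id)})
in_original_before <- "mean_y" %in% names(my_data)  # FALSE: original unchanged
in_result          <- "mean_y" %in% names(result)   # TRUE: the copy has it

# Assign the result back to keep the new variable
my_data <- within(my_data, {mean_y = ave(y, id)})
print(names(my_data))
```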