**R on Jacob Long**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

It has been just a little more than a day since I announced my new R package,

`panelr`

, to the wider world.

My new #rstats package, panelr, is now on CRAN. Learn more about this package that has become essential for my workflow with panel data here: https://t.co/1tlwlcA5qc

— Jacob Long (@jacobandrewlong) May 20, 2019

Quite frankly, I’ve been surprised by the level of attention! Some responses

have been calls for *economists* to check the package out (they are certainly

welcome to do so) as well as things along the lines of there finally being

a package for panel data in R. So let me make something clear: There’s at least

one comparable package for R, called `plm`

, which is very good and should be

particularly appealing for economists. This leads to the understandable question

as to how `panelr`

differs from `plm`

.

Super cool! Does your package offer some advantages/differences from plm that I should think about?

— Nicholas Potter (@potterzot) May 21, 2019

The `plm`

package has been around since 2006 and is quite good.

I didn’t make `panelr`

out of any kind of deep dissatisfaction with `plm`

nor

the idea that it needed to be superseded.

Yves Croissant and Giovanni Millo

discuss in `plm`

’s main vignette the fact that there is a great deal of overlap between econometric

treatment of panel data and other statistical approaches to panel data, like

multilevel modeling. They note, among, other things,

while a very comprehensive software framework for (among many other features) maximum likelihood estimation of linear regression models for longitudinal data, packages nlme (Pinheiro et al. 2007) and lme4 (Bates 2007), is available in the R environment and can be used, e.g., for estimation of random effects panel models, its use is not intuitive for a practicing econometrician, and maximum likelihood estimation is only one of the possible approaches to panel data econometrics.

and

Furthermore, we felt there was a need for automation of some basic data management tasks such as lagging, summing and, more in general, applying (in the R sense) functions to the data, which, although conceptually simple, become cumbersome and error-prone on two-dimensional data, especially in the case of unbalanced panels.

I totally agree with these things. I, however, am not an econometrician, so I

came to this area with the opposite problem as Croissant and Millo. I like and

respect `plm`

a great deal, but

a package doing panel data “from the econometrician’s viewpoint”

is the opposite of what I was looking for. The way I am accustomed to

thinking and talking about panel data analysis is different from the standard

econometric approach, for better or worse. My training in this area comes mostly

from sociologists, who are of course not ignorant of econometrics but have

different preferences and norms. There is a lot of overlap, but ultimately I

was motivated to fit a type of model that I would not expect to see in `plm`

even though it is closely related. Those efforts led to `panelr`

.

So here’s the **TL;DR:**

- I wanted to simplify the fitting of panel models that use multilevel

models for estimation, especially the kind that produces within-entity effects

equivalent to econometric fixed effects models. - I subsequently wanted to streamline GEE estimation of these models.
- I wanted to create a function that estimates asymmetric effects models (Allison, 2019).
- The
`panel_data`

object inherits from grouped`tibbles`

and should fit well

into workflows that rely on the “tidyverse.” - I have included tools that make reshaping data from wide to long and vice

versa more user-friendly. - The documentation and general approach talk about panel data in ways that

are more familiar to me and people similarly trained.

**Longer version:**

Bell and Jones (2015) describe a model specification for panel data that uses

the estimation technique econometricians would use for what they call “random

effects” models but generates estimates equivalent to what econometricians call

“fixed effects” models. This is achieved using multilevel (also known as mixed)

models and including individual-level means of time-varying predictors in the

model^{1}. What you get are within-entity estimates (exactly equivalent to

fixed effects) along with between-entity effect estimates, which are not robust

to confounding from stable predictors but nonetheless may be of substantive

interest. This further enables the analyst to include other time-stable

covariates, incorporate random slopes or additional grouping factors, and even

move to generalized models (e.g., logit).

The equivalence of these models was first noted by Mundlak (1978).

Multilevel modeling researchers have been estimating models like this for some

time (e.g., Kreft, de Leeuw, & Aiken, 1995; Hofmann & Gavin, 1998; Raudenbush & Bryk, 2002).

Only recently has there been wider recognition of the near-equivalence of

fixed effects models and this multilevel model that I and some others refer to

as the “within-between” model (Allison, 2009; Bell & Jones, 2015).

Wanting to fit these models is what got me started on the road to `panelr`

.

The transformations needed to properly fit the models were quite tedious and,

to quote Croissant and Millo, I “felt there was a need for automation of some

basic data management tasks such as lagging, summing and, more in general,

applying (in the R sense) functions to the data, which, although conceptually

simple, become cumbersome and error-prone.” With the popularity of `dplyr`

for

data manipulation, and the fact that it can make the necessary transformations

much easier, I thought others would find these tools to be useful and

accessible.

Later on, more models I was interested in became fairly straightforward to add

to the package. GEE estimation of the within-between model may be desirable in

some cases, for instance (McNeish, 2019). I wanted to fit asymmetric effects

models that allow positive and negative changes in variables to differ (Allison, 2019).

I was also able to implement a better method for calculating interactions in

within-between models (Giesselmann & Schmidt-Catran, 2018). Get the details on

these things in the introductory vignette.

In general, `plm`

has a lot more stuff. For instance, for fixed effects models

there are many different methods for calculating standard errors included

rather painlessly with `plm`

. In the mutlilevel modeling framework for

`panelr`

’s `wbm()`

function, the multilevel model inherently deals with the

within-entity correlation of errors, but you are limited if you would like

different kinds of adjustments (GEE estimation offers some more flexibility).

`plm`

also includes many tests of various model assumptions, like the Hausman

test (which can be replicated on a per-coefficient basis in the within-between

model). `plm`

has many tools for including instrumental variables, but `panelr`

has none and I don’t foresee any being added in the near future.

Overall, I am not motivated to duplicate features of `plm`

*just for the sake of
feature parity*. There are some models which are nearly or exactly equivalent

across the two packages, but this is just a happy coincidence. I will expand

`panelr`

as is prudent, which may sometimes involve duplication of `plm`

functionality but only for reasons relating to substantive differences in

implementation. As an example,

`panelr`

includes the function `are_varying()`

that allows the user to assess whether variables vary over time.

It is substantively equivalent to

`plm`

’s `pvar()`

, but I was motivated togive users a means for asking for specific variables using a “tidy” selection

interface. Although I would not generally promise that my packages are highly

performant, I later realized that

`are_varying()`

is much faster than `pvar()`

,which can be quite noticeable for larger datasets.

This is just to say that there may be cases in which `panelr`

does something

very similar, and it may sometimes be better or different in a way that you

would prefer. But I generally see `panelr`

as a complementary package, filling in

some gaps and giving an alternative way to do panel data analysis in R.

### References

Allison, P. D. (2009). Fixed effects regression models. Thousand Oaks, CA: SAGE Publications. https://doi.org/10.4135/9781412993869.d33

Allison, P. D. (2019). Asymmetric fixed-effects models for panel data. *Socius*,

*5*, 1–12. https://doi.org/10.1177/2378023119826441

Bell, A., & Jones, K. (2015). Explaining fixed effects: Random effects modeling

of time-series cross-sectional and panel data. *Political Science Research and
Methods*,

*3*, 133–153. https://doi.org/10.1017/psrm.2014.7

Giesselmann, M., & Schmidt-Catran, A. (2018). Interactions in fixed effects

regression models (Discussion Papers of DIW Berlin No. 1748). DIW Berlin,

German Institute for Economic Research. Retrieved from

https://ideas.repec.org/p/diw/diwwpp/dp1748.html

Hofmann, D. A., & Gavin, M. B. (1998). Centering decisions in hierarchical

linear models: Implications for research in organizations. *Journal of
Management*,

*24*, 623–641. https://doi.org/10.1177/014920639802400504

Kreft, I. G. G., de Leeuw, J., & Aiken, L. S. (1995). The effect of different

forms of centering in hierarchical linear models. *Multivariate Behavioral
Research*,

*30*, 1–21. https://doi.org/10.1207/s15327906mbr3001_1

McNeish, D. (2019). Effect partitioning in cross-sectionally clustered data

without multilevel models. *Multivariate Behavioral Research*, 1–20. https://doi.org/10.1080/00273171.2019.1602504

Mundlak, Y. (1978). On the pooling of time series and cross section data.

*Econometrica*, *46*, 69–85. https://doi.org/10.2307/1913646

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models:

Applications and data analysis methods (2nd ed). Thousand Oaks, CA: Sage.

- Conventionally, one also subtracts the individual means from the occasion-level predictor values, as one often does to estimate fixed effects models via OLS. This is not strictly necessary, though.

^{^}

**leave a comment**for the author, please follow the link and comment on their blog:

**R on Jacob Long**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.