Comparing `panelr` and `plm`

[This article was first published on R on Jacob Long, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

It has been just a little more than a day since I announced my new R package,
panelr, to the wider world.

Quite frankly, I’ve been surprised by the level of attention! Some responses
have been calls for economists to check the package out (they are certainly
welcome to do so) as well as things along the lines of there finally being
a package for panel data in R. So let me make something clear: There’s at least
one comparable package for R, called plm, which is very good and should be
particularly appealing for economists. This leads to the understandable question
as to how panelr differs from plm.

The plm package has been around since 2006 and is quite good.
I didn’t make panelr out of any kind of deep dissatisfaction with plm nor
the idea that it needed to be superseded.

Yves Croissant and Giovanni Millo
discuss in plm’s main vignette the fact that there is a great deal of overlap between econometric
treatment of panel data and other statistical approaches to panel data, like
multilevel modeling. They note, among, other things,

while a very comprehensive software framework for (among many other features) maximum likelihood estimation of linear regression models for longitudinal data, packages nlme (Pinheiro et al. 2007) and lme4 (Bates 2007), is available in the R environment and can be used, e.g., for estimation of random effects panel models, its use is not intuitive for a practicing econometrician, and maximum likelihood estimation is only one of the possible approaches to panel data econometrics.


Furthermore, we felt there was a need for automation of some basic data management tasks such as lagging, summing and, more in general, applying (in the R sense) functions to the data, which, although conceptually simple, become cumbersome and error-prone on two-dimensional data, especially in the case of unbalanced panels.

I totally agree with these things. I, however, am not an econometrician, so I
came to this area with the opposite problem as Croissant and Millo. I like and
respect plm a great deal, but

a package doing panel data “from the econometrician’s viewpoint”

is the opposite of what I was looking for. The way I am accustomed to
thinking and talking about panel data analysis is different from the standard
econometric approach, for better or worse. My training in this area comes mostly
from sociologists, who are of course not ignorant of econometrics but have
different preferences and norms. There is a lot of overlap, but ultimately I
was motivated to fit a type of model that I would not expect to see in plm
even though it is closely related. Those efforts led to panelr.

So here’s the TL;DR:

  • I wanted to simplify the fitting of panel models that use multilevel
    models for estimation, especially the kind that produces within-entity effects
    equivalent to econometric fixed effects models.
  • I subsequently wanted to streamline GEE estimation of these models.
  • I wanted to create a function that estimates asymmetric effects models (Allison, 2019).
  • The panel_data object inherits from grouped tibbles and should fit well
    into workflows that rely on the “tidyverse.”
  • I have included tools that make reshaping data from wide to long and vice
    versa more user-friendly.
  • The documentation and general approach talk about panel data in ways that
    are more familiar to me and people similarly trained.

Longer version:

Bell and Jones (2015) describe a model specification for panel data that uses
the estimation technique econometricians would use for what they call “random
effects” models but generates estimates equivalent to what econometricians call
“fixed effects” models. This is achieved using multilevel (also known as mixed)
models and including individual-level means of time-varying predictors in the
model1. What you get are within-entity estimates (exactly equivalent to
fixed effects) along with between-entity effect estimates, which are not robust
to confounding from stable predictors but nonetheless may be of substantive
interest. This further enables the analyst to include other time-stable
covariates, incorporate random slopes or additional grouping factors, and even
move to generalized models (e.g., logit).

The equivalence of these models was first noted by Mundlak (1978).
Multilevel modeling researchers have been estimating models like this for some
time (e.g., Kreft, de Leeuw, & Aiken, 1995; Hofmann & Gavin, 1998; Raudenbush & Bryk, 2002).
Only recently has there been wider recognition of the near-equivalence of
fixed effects models and this multilevel model that I and some others refer to
as the “within-between” model (Allison, 2009; Bell & Jones, 2015).

Wanting to fit these models is what got me started on the road to panelr.
The transformations needed to properly fit the models were quite tedious and,
to quote Croissant and Millo, I “felt there was a need for automation of some
basic data management tasks such as lagging, summing and, more in general,
applying (in the R sense) functions to the data, which, although conceptually
simple, become cumbersome and error-prone.” With the popularity of dplyr for
data manipulation, and the fact that it can make the necessary transformations
much easier, I thought others would find these tools to be useful and

Later on, more models I was interested in became fairly straightforward to add
to the package. GEE estimation of the within-between model may be desirable in
some cases, for instance (McNeish, 2019). I wanted to fit asymmetric effects
models that allow positive and negative changes in variables to differ (Allison, 2019).
I was also able to implement a better method for calculating interactions in
within-between models (Giesselmann & Schmidt-Catran, 2018). Get the details on
these things in the introductory vignette.

In general, plm has a lot more stuff. For instance, for fixed effects models
there are many different methods for calculating standard errors included
rather painlessly with plm. In the mutlilevel modeling framework for
panelr’s wbm() function, the multilevel model inherently deals with the
within-entity correlation of errors, but you are limited if you would like
different kinds of adjustments (GEE estimation offers some more flexibility).
plm also includes many tests of various model assumptions, like the Hausman
test (which can be replicated on a per-coefficient basis in the within-between
model). plm has many tools for including instrumental variables, but panelr
has none and I don’t foresee any being added in the near future.

Overall, I am not motivated to duplicate features of plm just for the sake of
feature parity
. There are some models which are nearly or exactly equivalent
across the two packages, but this is just a happy coincidence. I will expand
panelr as is prudent, which may sometimes involve duplication of plm
functionality but only for reasons relating to substantive differences in
implementation. As an example, panelr includes the function are_varying()
that allows the user to assess whether variables vary over time.
It is substantively equivalent to plm’s pvar(), but I was motivated to
give users a means for asking for specific variables using a “tidy” selection
interface. Although I would not generally promise that my packages are highly
performant, I later realized that are_varying() is much faster than pvar(),
which can be quite noticeable for larger datasets.

This is just to say that there may be cases in which panelr does something
very similar, and it may sometimes be better or different in a way that you
would prefer. But I generally see panelr as a complementary package, filling in
some gaps and giving an alternative way to do panel data analysis in R.


Allison, P. D. (2009). Fixed effects regression models. Thousand Oaks, CA: SAGE Publications.

Allison, P. D. (2019). Asymmetric fixed-effects models for panel data. Socius,
5, 1–12.

Bell, A., & Jones, K. (2015). Explaining fixed effects: Random effects modeling
of time-series cross-sectional and panel data. Political Science Research and
, 3, 133–153.

Giesselmann, M., & Schmidt-Catran, A. (2018). Interactions in fixed effects
regression models (Discussion Papers of DIW Berlin No. 1748). DIW Berlin,
German Institute for Economic Research. Retrieved from

Hofmann, D. A., & Gavin, M. B. (1998). Centering decisions in hierarchical
linear models: Implications for research in organizations. Journal of
, 24, 623–641.

Kreft, I. G. G., de Leeuw, J., & Aiken, L. S. (1995). The effect of different
forms of centering in hierarchical linear models. Multivariate Behavioral
, 30, 1–21.

McNeish, D. (2019). Effect partitioning in cross-sectionally clustered data
without multilevel models. Multivariate Behavioral Research, 1–20.

Mundlak, Y. (1978). On the pooling of time series and cross section data.
Econometrica, 46, 69–85.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models:
Applications and data analysis methods (2nd ed). Thousand Oaks, CA: Sage.

  1. Conventionally, one also subtracts the individual means from the occasion-level predictor values, as one often does to estimate fixed effects models via OLS. This is not strictly necessary, though.

To leave a comment for the author, please follow the link and comment on their blog: R on Jacob Long. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)