A few days ago I posted about a smart way to import several similar data files from a directory. Today I want to return to this topic and stretch it a bit further by adding some complexity. I want a snapshot of the datasets before I even start working with them. That is, I want to know beforehand which variables appear across multiple files. This post might be of particular interest to those using survey-wave data, since surveys tend to repeat some questions (variables) but change others across time, or across places, as the research interest changes.
In the R package “SciencesPo” there is a function named “detail” which describes the whole dataset in a nice way: variables as rows and descriptive statistics as columns. I like this style because it doesn’t really matter how many variables one has. The output of “detail” may become long, but never too wide to fit on the screen. My intention, then, is to obtain something similar, but with the variable names as rows and the file names as columns. With the resulting table, it is possible to quickly identify which variables appear in multiple files.
Finally, I’ll show how to get similar results using both R and Stata. The code is divided into two parts. In the first part I provide data for replication (the seniors data that ships with Stata). In the second, I run the actual example; so if you already have some data, only the second part of the code may matter to you.
Doing it in R:
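What follows is only a minimal sketch of the idea in base R, under a few assumptions: the files are CSVs sitting in one directory, and the variable names are in the header row. The toy files (`wave1.csv`, `wave2.csv`, `wave3.csv`) and the object names (`data_dir`, `presence`, `overview`) are made up for illustration; they are not the replication data mentioned above.

```r
# Part 1: toy data for replication (hypothetical survey waves,
# invented for this sketch).
data_dir <- file.path(tempdir(), "waves")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, age = c(34, 51, 29), income = c(10, 20, 30)),
          file.path(data_dir, "wave1.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:3, age = c(35, 52, 30), vote = c("a", "b", "a")),
          file.path(data_dir, "wave2.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:3, income = c(12, 22, 31), vote = c("b", "b", "a")),
          file.path(data_dir, "wave3.csv"), row.names = FALSE)

# Part 2: build the variables-by-files table.
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read the header plus a single row of each file; only the names are needed.
var_names <- lapply(files, function(f) names(read.csv(f, nrows = 1)))
names(var_names) <- basename(files)

# Every variable name seen in at least one file.
all_vars <- sort(unique(unlist(var_names)))

# Logical matrix: TRUE where the variable (row) appears in the file (column).
presence <- sapply(var_names, function(v) all_vars %in% v)
rownames(presence) <- all_vars

# Add a count of how many files contain each variable.
overview <- data.frame(presence, n_files = rowSums(presence),
                       check.names = FALSE)
print(overview)
```

In this toy case `id` shows up in all three files, while the other variables appear in only two each. If you already have your own files, only the second part is relevant: point `data_dir` at your directory and adjust the `pattern` and the reading function to your file format.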
And here is the Stata version: