Finding out repeated variables in multiple datasets

Posted on January 28, 2014 by Daniel in R bloggers | 0 Comments

[This article was first published on Daniel Marcelino » R, and kindly contributed to R-bloggers.]

A few days ago I posted about a smart way to import several similar data files from a directory. Today I want to return to this topic and stretch it a bit further by adding some complexity. I want a snapshot of the datasets even before I start working with them; that is, I want to know beforehand which variables appear across multiple files. This post may be of particular interest to those using survey wave data, since surveys tend to repeat some questions (variables) but change others across time or place as the research interest changes.

The R package “SciencesPo” has a function named “detail” that describes a whole dataset in a nice way: variables as rows and descriptive statistics as columns. I like this style because it doesn’t really matter how many variables one has. The output of “detail” may become long, but never too wide to fit on the screen. My intention, then, is to obtain something similar, but with the variable names as rows and the file names as columns. With the resulting table, it is possible to quickly identify which variables appear in multiple files.
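To make the target table concrete, here is a minimal base R sketch of a variables-by-files presence table. It is only an illustration under stated assumptions, not the code from the original post: the helper name `variable_presence`, the three tiny CSV files, and their column names are all made up for the example, and the files are written to a temporary directory so the snippet is self-contained.

```r
# Hypothetical helper (not from SciencesPo): builds a logical matrix with
# variable names as rows and dataset (file) names as columns.
# `datasets` is assumed to be a named list of data frames, one per file.
variable_presence <- function(datasets) {
  all_vars <- sort(unique(unlist(lapply(datasets, names))))
  # One logical vector per dataset: TRUE where the variable exists in it
  presence <- vapply(datasets, function(d) all_vars %in% names(d),
                     logical(length(all_vars)))
  # Force a matrix shape even when there is a single variable
  matrix(presence, nrow = length(all_vars),
         dimnames = list(all_vars, names(datasets)))
}

# Part 1: write three small, made-up survey waves to a temporary directory
dir_path <- file.path(tempdir(), "waves_demo")
dir.create(dir_path, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, age = c(30, 41), vote = c("a", "b")),
          file.path(dir_path, "wave1.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:2, age = c(31, 42), income = c(10, 20)),
          file.path(dir_path, "wave2.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:2, vote = c("b", "b"), region = c("n", "s")),
          file.path(dir_path, "wave3.csv"), row.names = FALSE)

# Part 2: read every CSV in the directory and tabulate variable presence
files <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)
datasets <- lapply(files, read.csv)
names(datasets) <- basename(files)

presence <- variable_presence(datasets)
print(presence)

# Variables that appear in more than one file
repeated <- rownames(presence)[rowSums(presence) > 1]
print(repeated)
```

In this toy example, `repeated` contains `age`, `id`, and `vote`, the variables shared by at least two of the three files. A logical matrix keeps the table narrow regardless of how many variables there are, which matches the "long but not wide" layout described above; for real data you would skip Part 1 and point `dir_path` at your own directory.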

I’ll show how to get similar results using both R and Stata. The code is divided into two parts. In the first part, I provide data for replication (the seniors data that ships with Stata). In the second, I run the example proper; therefore, if you already have some data, only the second part of the code may matter to you.

Doing it in R:

[Figure: R Output]

Here is the Stata version:

[Figure: Stata Output]
