I just pushed a new version of the PSID panel data builder that I introduced a little while ago. I got some user feedback and made some improvements. The package is hosted on GitHub.
- I added a reproducible example using artificial data, which you can run by calling 'example(build.panel)'. This means you can try out the package before bothering to download anything, and it provides a simple test of the main function.
- I’ve included a suggestion to use the R survey package to analyse this dataset and made it explicit in the examples how to obtain the desired weights for each wave. Note that in most cases your results will be invalid if you ignore the survey design (i.e. the weights).
- I got some useful comments from Anthony Damico (thanks!) and integrated the SAScii package (check out his tutorials at http://www.asdfree.com/). This allows one to download the data directly from the PSID server into R, thereby removing any dependency on Stata or SAS to preprocess the raw data. (As is common with large datasets, the raw data come in ASCII format and need to be parsed into rows and columns.) The downside is that downloading directly takes a rather long time: downloading FAM1985ER, FAM1986ER and the index IND2009ER took three and a half hours.
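To illustrate the point about weights, here is a minimal sketch of how a weighted estimate can differ from an unweighted one when using the survey package. The data are artificial and the column names `income` and `weight` are made up for this example; they are not PSID variable codes. The design below also assumes no clustering or stratification, which is a simplification rather than a description of the actual PSID design.

```r
library(survey)

# Artificial data: one row per household in a single wave.
# `income` and `weight` are hypothetical column names.
wave <- data.frame(
  income = c(20, 35, 50, 80, 120),
  weight = c(3, 1, 2, 1, 0.5)
)

# Declare the survey design: no clustering (ids = ~1),
# sampling weights taken from the `weight` column.
des <- svydesign(ids = ~1, weights = ~weight, data = wave)

# Naive mean that ignores the weights.
unweighted <- mean(wave$income)

# Design-based mean that uses the weights.
weighted <- coef(svymean(~income, des))[["income"]]

print(c(unweighted = unweighted, weighted = weighted))
```

In this toy example the low-income households carry the largest weights, so the weighted mean comes out well below the unweighted one. With real data you would plug in the weight variable for the wave you are analysing.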
Hopefully I can get another round of feedback before submitting to CRAN, particularly from a Windows user: working on a Unix system, I could not test that all the paths are written correctly on Windows.