There are many of schools of thought out there on methods for estimating standard errors from weighted survey data, and many books don’t tell you why a method may or may not be valid. So, we decided to simulate a situation that we often see in our work and see what worked best. A few highlights:
Be careful with “weights” arguments in R! Read the documentation carefully. There are many types of weights in statistics, and your standard errors, p-values, and confidence intervals can be wildly wrong if you supply the wrong type of weight.
Use bootstrapping to calculate standard errors. We found that even estimation methods that were made for survey weights underestimated standard errors.
Be skeptical of confidence intervals in weighted survey contexts. We found that standard errors were underestimated when simulating real-world imperfections with the data, such as measurement error and target error. 95% confidence intervals were only truly 95% in best-case scenarios with low error.