Internet surveys
I received the following email today:
I am preparing a thesis … I need to conduct the widest possible poll, and it occurred to me that perhaps you could guide me toward an internet-based way in which this can be done easily. I have a ten-question questionnaire prepared, that I wish to have a random sample of the population respond to. I have no budget for this, so I hope you can suggest a way in which a good number of responses can be harvested using blogs or sites you may be aware of.
Here is my response.
There are two issues here. The first is to find a convenient web-based data-collection tool. One popular approach is to use a survey form on Google Docs, where the results are automatically saved to a Google spreadsheet. There are many online explanations of how to set up your survey form, including this from Google help or this from digital inspirations. A more sophisticated tool for more complex surveys is SurveyMonkey, which allows skipping questions based on previous responses, response validation, and other useful features. For researchers collecting data, I generally recommend SurveyMonkey, but for a quick poll of a small group, Google Docs is adequate. Using either tool, the responses can be downloaded and imported into R or some other statistical analysis package.

Web-based data collection avoids the problems associated with manually entering and coding data, although it excludes anyone who does not use the internet. You won’t be able to use web-based data collection for a survey of the elderly, or of remote Amazonian tribes, or of many other populations where not everyone uses the internet. But if it is reasonable to assume that all members of the population use the internet, then web-based collection is much better than paper-based forms.
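To illustrate the download-and-import step, here is a minimal sketch in Python (standing in for "some other statistical analysis package") that tallies the answers to one question from a CSV export. The column names, the sample rows, and the file name `responses.csv` are all invented for illustration; a real export from Google Docs or SurveyMonkey will have its own layout.

```python
import csv
import io
from collections import Counter

# Hypothetical export: the column names and answers are invented for illustration.
SAMPLE_EXPORT = """Timestamp,Q1,Q2
2011-01-01 10:00:00,Yes,3
2011-01-01 10:05:00,No,5
2011-01-01 10:09:00,Yes,4
"""


def tally(csv_file, column):
    """Count how often each answer appears in one column of a CSV export."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# With a real download you would open the file instead, for example:
#   with open("responses.csv", newline="") as f:
#       counts = tally(f, "Q1")
counts = tally(io.StringIO(SAMPLE_EXPORT), "Q1")
print(counts)
```

Because the responses arrive already in machine-readable form, there is no manual data-entry step between the respondent and the analysis.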
The second issue is more difficult: how to get a random sample of the population. Here, there are no magic tech solutions. Advertising on blogs or other sites will simply give you a biased sample favouring those who read the blogs and have the time and interest to respond. You would then have to make the courageous assumption that the responders are representative of the population of interest. It is better to identify the population of interest first, and then find some way of randomly sampling it so that each member of the population has an equal probability of being selected. How this can be done depends on the particular population being studied. I suggest you discuss a sampling strategy with the statisticians at your university. There are also some good online references including “Best practices” from the AAPOR, and “What is a survey?” by Fritz Scheuren. A useful textbook is Sampling: Design and Analysis by Sharon Lohr (Duxbury Press, 2009, 2nd ed.).
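As a rough illustration of equal-probability selection, the sketch below draws a simple random sample without replacement from a sampling frame. It assumes you can obtain a complete list of the population, which is often the hard part; the frame of student IDs, the sample size, and the function name `simple_random_sample` are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members of the frame, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame: a complete list of the population of interest.
frame = [f"student-{i:04d}" for i in range(1, 2001)]

# Invite 200 of the 2000 members; each has the same chance of being selected.
invited = simple_random_sample(frame, 200, seed=1)
print(len(invited), invited[:3])
```

Only the people selected this way would be sent the survey link. Non-response can still bias the results, so this is a starting point and not a substitute for a properly designed sampling strategy.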