Andrew Gelman discusses election forecasting and polling. (Transcript)
Here is the podcast link.
Introducing Andrew Gelman
Hugo: Hi there, Andy, and welcome to DataFramed.
Hugo: Such a pleasure to have you on the show and I’m really excited to have you here today to talk about polling and election forecasting, but before that I’d like to find out a bit about you just to set the scene. My first question is, what are you known for in the data community?
Andrew: What is the data community?
Hugo: The data community, I think, is the rough broad collection of people working with data analytic techniques, working with data science, and working with large and messy datasets these days.
Andrew: I’m probably best known as one of the authors of the book Bayesian Data Analysis, which came out in 1995 but then we’ve had two more editions since then. So that was a book, I like to think of it as the first applied Bayesian Statistics book. So, a lot of people who have gotten into Bayesian Statistics have gotten there through our book, or used the book as a reference.
Hugo: Great. And maybe you can tell us a bit more about Bayesian Statistics in general, just by way of introduction. And I suppose there are two types of statistics in general that we talk about which are Bayesian Statistics and Frequentist Statistics, right?
Andrew: So, in Bayesian Statistics, all of your unknowns, all of your unknown parameters and predictions are associated with a probability distribution. So the way you solve a problem using Bayesian Inference is, you put all of your knowns and all of your unknowns into a joint probability distribution and then use the laws of probability to make statements about the unknowns, given the knowns.
Hugo: And so, you’ve actually done a lot of work on implementing a lot of Bayesian techniques in a language called Stan, right? In fact, a language in which, as you mentioned, probability distributions are the core objects of the Bayesian Statistics. I suppose distributions are first class citizens in Stan and other what are known as probabilistic programming languages, right?
Andrew: Right. Exactly. So, I can give you a simple example. Suppose you’re doing an educational innovation, and you want to look at students’ test scores after the intervention. So you start with basic statistical ideas, you fit a linear regression model, say predicting the test score given their pre-test score, and given an indicator for whether they got the treatment or the control. So that’s regression, that’s not Bayesian yet. It’s just statistical modeling.
Andrew: It can become more or less difficult, it can become non-linear, you can control for more predictors, not just your pre-test but all sorts of student characteristics. There’s a million things you can do. What makes it Bayesian, is that this regression model has parameters, like the effect of a treatment, how much the post-test is predictable from the pre-test. There are parameters for how big your variance is, shapes of distributions, whatever.
Andrew: All of those parameters are assigned a probability distribution. We call it a prior distribution. So, you put that all in Stan, along with your data, and then it gives you a posterior distribution which represents your uncertainty about the parameters after seeing the data.
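The step from prior to posterior can be sketched in a few lines. This is not Stan and not the full regression Andrew describes: it assumes the regression has already been boiled down to a single estimated treatment effect with a standard error, and that the prior on that effect is normal, so the update has a closed form. The function name and all the numbers are hypothetical.

```python
def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Combine a normal prior with a normal data estimate (conjugate update)."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * estimate)
    return post_mean, post_var ** 0.5


# Hypothetical prior: the treatment effect is near 0, give or take 5 points.
# Same estimate (8 points) from a precise study and from a noisy one.
strong_data = normal_posterior(0.0, 5.0, estimate=8.0, std_error=1.0)
weak_data = normal_posterior(0.0, 5.0, estimate=8.0, std_error=20.0)

print("precise study: mean %.2f, sd %.2f" % strong_data)
print("noisy study:   mean %.2f, sd %.2f" % weak_data)
```

With the precise study the posterior sits close to the data estimate; with the noisy study it stays close to the prior, which is the sense in which weak data leave the inference driven by the prior.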
Hugo: And so, Bayesian data analysis and Bayesian inference, I think, historically, we’ve seen them to be incredibly powerful but maybe haven’t been adopted as widely as Bayesians would have liked. I think a lot of learners, a lot of people learning data science and statistical inference, may find Bayesian data analysis even a bit scary. Firstly, is this right? Secondly, why is that the case and how can we correct that?
Andrew: In Bayesian statistics you kind of make a deal with the devil. You assume a probability model, so you make a big assumption, and having done that, you can make predictions about just about anything. So, I think, maybe it’s a little scary in some way because it’s so powerful and so easy to use that it’s like those 3D printers, people are afraid of them because they can print anything. So, in Bayesian statistics, even if you have weak data, you can get inferences and the inferences then become driven by your prior distribution. There’s a saying we have in Bayesian statistics, with great power comes great responsibility. What that means is that, in Bayesian inference, it’s very important that you check the fit of your model and check the reasonableness of your model.
Andrew: So, in that sense, there’s kind of two approaches to statistics. One approach is to make very minimal assumptions, and the other is to make maximal assumptions. The Bayesian approach is really you make maximal assumptions. What I like to say is you create a paper trail going from your assumptions to your conclusions, then if your conclusions don’t make sense, you look at what was wrong with your assumptions. What was wrong might be your model for your data. Maybe your sampling was biased and you didn’t recognize that. But, whatever it is, somewhere you need to go back and forth, you need to communicate between your assumptions and conclusions.
Andrew: A lot of people would rather work without assumptions and sometimes you can, we can talk about examples, but basically, if you have a clean problem and good data then you don’t need to work with a lot of assumptions, except for the assumption that you have good data. As the data quality becomes worse, as your questions become more difficult to answer, you need to put more assumptions in and then Bayesian inference becomes more useful.
Hugo: Absolutely and one of the great things that you mentioned in there was the ability to check your model after the fact and we have enough computational power to do that these days, right? So, for example, once we have our model, we can simulate what the data would actually look like and compare that to the data we actually saw.
Andrew: Exactly. We call that posterior predictive checking. People have been doing this for a long time, they were just not under that name. There was a book from the 1950s by the statistician, Frederick Mosteller, where they were analyzing data from an experiment, it was called a stochastic learning experiment, they were actually giving electric shocks to dogs in cages and seeing how long it took for the dogs to figure out that the shock was coming. So, they had this probabilistic model and then after fitting the model, they simulated fake data and compared the fake data to the real data.
Andrew: In the 1970s, the statistician, Brian Ripley, who was working on spatial statistics, and since has become very famous for his involvement with R, Brian Ripley was fitting spatial models and again did the same thing. He had a model that seemed kind of reasonable, he simulated replicated data from the model and it didn’t look like the real data and that then inspired him to expand his model. So, it was examples like that that motivated us to formalize this idea of model checking. I think people have always checked their model but there’s been a sense in which it has been outside the system. It’s not that people are embarrassed to check their model but it’s almost like people think, “I’m a good person. I’m a good citizen. So, I check my model.” And it hadn’t been formally encompassed into statistics and in the Bayesian framework, you can do that. You can put model checking right in the middle of the process and not feel that it’s some external thing you’re doing.
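The idea of simulating replicated data from a fitted model and comparing it with the real data can be sketched as follows. Everything here is an illustrative assumption rather than a reconstruction of the Mosteller or Ripley analyses: the data are a made-up sequence of failures (0) and successes (1), the model says every trial succeeds with the same unknown probability, the prior on that probability is uniform, and the comparison uses the number of switches between 0 and 1.

```python
import random


def count_switches(seq):
    """Number of places where consecutive outcomes differ."""
    return sum(a != b for a, b in zip(seq, seq[1:]))


def replicated_switches(observed, n_rep=2000, seed=1):
    """Switch counts in fake datasets simulated from the fitted model."""
    rng = random.Random(seed)
    n, successes = len(observed), sum(observed)
    results = []
    for _ in range(n_rep):
        # Posterior for the success probability under a uniform Beta(1, 1) prior.
        p = rng.betavariate(1 + successes, 1 + n - successes)
        fake = [1 if rng.random() < p else 0 for _ in range(n)]
        results.append(count_switches(fake))
    return results


# Hypothetical data: failures early, successes late, as if something was learned.
observed = [0] * 8 + [1, 0, 1, 1, 0] + [1] * 12
obs_switches = count_switches(observed)
rep_switches = replicated_switches(observed)

# Share of fake datasets that switch as rarely as the real data did.
share_as_few = sum(r <= obs_switches for r in rep_switches) / len(rep_switches)
print(f"observed switches: {obs_switches}")
print(f"share of replications with as few switches: {share_as_few:.3f}")
```

If only a small share of the replications look like the real data on this measure, the constant-probability model is missing something (here, the learning over time), and that is the cue to expand the model.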
Hugo: I’m glad you mentioned that cause that was my next point that it is actually baked in to the Bayesian workflow, the idea of model checking.
Hugo: So, this was Bayesian data analysis. Are there any other things that you’re known for in the data community?
Andrew: I’d like to say that I’m known for statistical graphics because in the early 2000s, I did a lot of work trying to integrate statistical graphics with statistical analytics. So, traditionally, there’s this idea that exploratory data analysis is looking at your data and finding interesting patterns. Confirmatory data analysis is like crunching the numbers and getting your p values. Exploratory data analysis, again, was kind of on the outside of statistics. Its proponents would often say, “Forget all this silly modeling stuff, let’s just jump to looking at the data.”
Andrew: But, what’s interesting, is if you think carefully, exploratory data analysis is finding the unexpected. So, to say I’m finding the unexpected means that’s relative to the expected. In fact, exploratory analysis is most powerful when it’s tied to models. So, I think exploratory data analysis and statistical graphics, and learning new things from data from visualizations is actually fitting in very well with Bayesian inference and formal statistical modeling. Because you fit the model, the better your model is the more you learn from its falsification.
Andrew: So, way back, Copernicus had the model that the planets were going in circular orbits around the sun and it’s easy to falsify that. But then, Kepler moved to the elliptical orbits, so falsifying that became more interesting and so forth. So, every time we have a model, that motivates more sophisticated graphics which allows us to learn more.
How did you get into Data Science and Statistics?
Hugo: So, how did you get in to data science and statistics, originally?
Andrew: I always was good at math, ever since I was a baby, and then, I’ve written about this actually, but anyway, when I was in high school, I did the math Olympiad training program, I found that there were people better at that than I was. We had a very naïve view back then, so we didn’t know about applied math, we just knew about this thing called math, and we thought ability was uni-dimensional. But anyway, I went to college and studied physics and math and I didn’t want to be a pure theoretician. I just felt I wasn’t good enough to make useful contributions in that way. I first took a probability class because it was in the evening, it fit my schedule.
Andrew: So, I took probability and stochastic processes and then took statistics and I really liked that. In statistics, there’s kind of a continuous connection between everything that I care about. So, there’s a connection between things that I can do, like mathematics, and also things like politics, public health, economics, sociology, all those things. There’s kind of a continuous thread from these qualitative thoughts about what’s going on in our country, what’s going on in the world, how do people learn, all sort of things like that, through qualitative thinking, statistical modeling, mathematical analysis, programming, all those things. So, it was really perfect for me.
Andrew: I sometimes think that statistics should be called mathematical engineering. They have electrical engineering and mechanical engineering and statistics is mathematical engineering.
Hugo: I like that and something that you hinted in there or spoke to directly is that it is this marriage of your aptitude and mathematical skills but also your serious deep interest in the political and social sciences.
Andrew: Yeah. In college, I minored in political science and so I found that very interesting. Political science is a funny field because you don’t make progress in the same way you do in a technical field. You can say technically, we can do all sorts of things that Gauss couldn’t do, whatever, I’m sure he could figure it out when he saw it, but we just know stuff they didn’t know. In politics, what do we know that Hobbes didn’t know? Well, it’s hard to say. A lot of specific things like the size of the incumbency advantage and so forth, but it’s a little bit different. It’s more like something like architecture. We have buildings now but you’re just building things that serve current purposes then maybe the principle of the technology changes. But the general principles aren’t changing.
What are the biggest challenges facing data science and statistics as disciplines?
Hugo: So, before we get in to polling and election forecasting, I just want to speak more generally to data science and statistics. I’m just wondering, it’s 2018, moving forward from now, what do you think the biggest challenges facing data science and statistics as disciplines are?
Andrew: Well, speaking generically, I think there are three challenges of statistical inference. The first is generalization from samples to population and that’s a problem that’s associated with survey sampling but actually arises in nearly every application of statistical inference. People sometimes say, “Wait, I have data on the 50 states. That’s the population. We’re not gonna have a 51st state any time soon.” Even then, I would respond, “Okay, you have data from the 50 states last year and the last 10 years, what you’re interested in is the 50 states next year.” So, there’s always some generalization that’s involved. So, ideas of statistical sampling always come up.
Andrew: The second fundamental challenge of statistics is generalizing from the control group to the treatment group. Much of the time we’re interested in the effect of some treatment or intervention and obviously things like drugs, or educational interventions, or business decisions, but all sorts of social science things. Whenever you ask why is something happening, you’re implicitly asking what would happen if I change something. With rare exceptions, we don’t have a matched control and treatment group. Typically the people who you can do something to are different from the people who didn’t get the treatment, and so some adjustment needs to be made.
Andrew: The third is generalizing from observed measurements to the underlying constructs of interest. So, that’s most obvious in something like educational testing. You want to know ability but what you get is test score. So, we spend a lot of time designing instruments, designing survey questions, lab measurements. With those people at Theranos, those fraudulent blood testing people, that was all about measurement. So, when you talk about challenges, I think those are the old challenges and they remain the new challenges. Big data tend to be messy data. So, it’s not a random sample, it’s a convenience sample, it’s an opt-in sample. You don’t have control and treatment groups, people make their own decisions on what to do. Often, you don’t have careful measurements of what you care about, you often just have data that are available from another source which you’re trying to adapt.
Andrew: For that reason, if you want to get good predictions and sensible answers, and learn, you need to adjust for differences between sample and population. You need to adjust for differences between control and treatment group and you need to model the connection between what you care about and what your measurement is. All that can take a lot of modeling so, typically, we say that you either get good data, or good model, or you have to have a mixture of both. You have to do a little bit of data, a little bit of work, you have to do work on data collection, you have to also work on the model. So, if you have big data and you need big model, then that’s going to require a lot of computation and that’s going to be expensive. So, you need algorithms for fitting models, approximately fitting models. We have some sort of good things in our corner. For example, as you get a lot of data, often your inferences will become more stable, they won’t necessarily converge to the right answer but things might look more normally distributed, that’s from the Central Limit Theorem. So, that suggests that certain statistical methods, certain approximations might work well when you have a lot of data. Which is good, cause when you have a lot of data, that’s when you need the approximations more. So, there’s a lot of things like that, moving between applications and research agendas but the research is to fit these big models and to understand them and that’s continually going to be a challenge.
Hugo: So, those are all really important points that we’ll actually see focused even more through the lens of polling and election forecasting. Before we get there, this idea of statistical inference and statistical modeling, I’m wondering what it takes to be able to be part of that conversation. I suppose, my question is, as humans, we don’t necessarily have good statistical intuition and I’m wondering how you, as an educator and statistician, would like to see statistical and data literacy change in general for a general population?
Andrew: There’s different ways of looking at it. Some of this is procedural. So, if there is an expectation that, when you have an analysis, you put your data on GitHub and you put your analysis on GitHub and it’s all replicable, I think that alone will help. That won’t make people’s analysis better but it will make it easier for people to see what went wrong. It’s surprisingly difficult to get people to say or write exactly what they did. I find this with students but even I’ve been in consulting settings where maybe there’s an expert on the other side and they do an analysis and they write up their analysis and you can’t understand what they did. They’ll photocopy three pages from a textbook and say, “We did this.” And they don’t say where their data come from or anything. I’ve come to realize that a lot of people don’t even know what they did. People don’t have a workflow, they just have a bunch of numbers and they start screwing around with the numbers and putting calculations in different places on their spreadsheet, and then at the end they pull a number out and write it down and type it in to their report. So, there’s that famous example, the Reinhart and Rogoff Excel error from that econ paper from a few years ago, but there are lots of published journal articles where not only do the results not replicate, but people have gone back to the articles and found that the numbers in the paper aren’t even consistent with themselves. For example, they’ll say there is a certain number of cases, and then they’ll have a percentage, but the percentage doesn’t correspond with any ratio with that denominator, or they have the estimates and the standard errors and the Z-scores but they don’t correspond to the same thing.
Andrew: I’m just starting to realize people don’t have a workflow at all. Requiring a workflow would help. When it comes to understanding, there’s something you might have heard when you were a kid, which is if you have difficulty with a math problem, put a dollar sign in front of it and then it’s somehow much harder to be off by orders of magnitude. Psychologists, such as Gerd Gigerenzer and others, have put a lot of work in to understanding our cognitive illusions and how we can fix those problems. One idea is to move away from probability and move towards frequencies.
Andrew: So, there are these classic probability problems like there’s a disease, and one percent of the people have the disease, and you do a test and the test for the disease has 98 percent accuracy, somebody tests positive, what’s the chance that they have the disease? And it’s very hard to do that in your head. But, what you can do is say imagine you have an auditorium with a thousand people in it, well I just told you one percent of the people have the disease, so picture 10 people in the front row of the auditorium. They’re the people with the disease. The other 990 don’t. Now we’re going to do a test with 98 percent accuracy. That’s tough because you have to do 98 percent of the 10 people, so then you need higher numbers.
Andrew: Let me rephrase that and say it has 90 percent accuracy, just to keep the algebra simple. The test has 90 percent accuracy. So, then you look at the 10 people in the front row, well 9 of them test positive and one of them is gonna test negative, and you look at the 990 people otherwise, and out of them, 99 are gonna test positive by accident, that’s 10 percent, and then the others are gonna be negative. Then you take all the people who test positive, if you have them raise their hand, and you see that we have 9 sick people who tested positive and 99 healthy people who tested positive. So, most of the people who tested positive were healthy. So, the amazing thing is, I could do that all by speaking in my head and I couldn’t solve the first problem in my head. You could say, well, but I had to screw around with the numbers cause 98 percent didn’t work, but that’s kind of the point. If you have a one percent disease, and the test has 98 percent accuracy, you really can’t solve the problem by thinking of a thousand people. You need a larger population. So, we could think of a city with a million people and now, one percent, so 10 thousand people have the disease, and I’m purposely talking this through to show that you can do it. 10 thousand people have the disease, and 990 thousand don’t. You could write this down but you could try it in your head. Then, of those 10 thousand with the disease, 98 percent, so that’s gonna be 200.
Andrew: So, I could change the numbers around a little, I could do it in different ways, but the point is having that denominator makes it easier to visualize, it makes all the numbers make more sense. So, Gigerenzer’s argument is that the denominator really is always there and the denominator actually matters. There’s a difference between something that happens 10 percent of the time to 10 people compared to something that happens 10 percent of the time to 10 thousand people. It’s a different phenomenon. Probability theory is great, so the answer is there are ways of understanding probability better by thinking in terms of frequencies.
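The auditorium and the city can both be counted out in code. This is a minimal sketch of the frequency approach and it rests on one assumption the conversation leaves implicit: "accuracy" is taken to mean the test gives the right answer for that share of sick people and for that same share of healthy people. The function name is made up for illustration.

```python
def natural_frequencies(population, prevalence, accuracy):
    """Count test outcomes in a population of whole people."""
    sick = round(population * prevalence)
    healthy = population - sick
    true_positives = round(sick * accuracy)                # sick, test positive
    false_positives = healthy - round(healthy * accuracy)  # healthy, test positive
    share_sick = true_positives / (true_positives + false_positives)
    return sick, healthy, true_positives, false_positives, share_sick


# The auditorium: 1,000 people, 1 percent have the disease, 90 percent accuracy.
auditorium = natural_frequencies(1_000, 0.01, 0.90)

# The city: 1,000,000 people, 1 percent have the disease, 98 percent accuracy.
city = natural_frequencies(1_000_000, 0.01, 0.98)

for label, result in [("auditorium", auditorium), ("city", city)]:
    sick, healthy, tp, fp, share = result
    print(f"{label}: {tp} sick and {fp} healthy people test positive, "
          f"so {share:.1%} of the positives are sick")
```

In the auditorium this reproduces the 9 sick and 99 healthy people who test positive. In the city, under the same assumption, 9,800 sick people test positive (200 are missed) against 19,800 healthy people, so even a 98 percent accurate test leaves most positives healthy.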
Hugo: And this is something we’ve actually seen in election forecasting, so this’ll prove a nice segue. I know 538 and Nate Silver’s house model, they won’t say we predict that the democrats have a 75 percent chance of getting the house, they’ll say a three in four chance because they feel that, heuristically, that helps people formalize it a bit better. They’ll know one out of four times the republicans will get it, three out of four the democrats will. And then you can even think in those terms what does one in four mean. That’s the frequency of getting two heads in a row which you wouldn’t be surprised if that happened, right?
Andrew: Oh, sure, this happened before, I could tell you a story about Nate, but first, before the 2016 election, someone said, “Well what about this forecast?” There was some model that gave Clinton a 90 percent chance of winning. Well, 90 percent, how do you think about that? And I said, “There’s a presidential election every four years. 10 percent means something would happen roughly once every 10 elections, so every 40 years.”
Andrew: I remember about 40 years ago, in the 1980 election, that it was supposed to be very close and then Reagan won by I think about 7 percentage points. So, it was a big surprise. So, yeah, I think it could happen. Sure. Actually Clinton did very close to what she was polled, she was supposed to get 52 percent of the two-party vote, and she got 51 percent. So, the polls are better now, in some ways, the forecasts are better now than they were in 1980. But, that’s how I calibrate one in 10. As a political scientist, I often say I don’t like 95 percent intervals. Because the 95 percent intervals are supposed to be correct 19 times out of 20 for 20 presidential elections that takes 80 years. I think it’s ridiculous to try to make a statement that would be valid over an 80 year period because politics changes over 80 years.
Andrew: Now, my story about Nate was in 2012, he was going around, he said, “Obama has a 65.8 percent chance of reelection”, then next week he’d say it was 63.2 percent, then it was 67.1 percent, and it would jump around. It was meaningless. You can say he has a 60 percent chance, but to say a 65.1 percent, you can do a little bit of mathematics. What you can do is say let’s predict his share of the vote. Let’s suppose he was predicted to get something like 52 or 53 percent of the vote and there’s some uncertainty. You have a little bell-shaped curve and if it’s less than 50 percent, let’s temporarily forget about the electoral college for a minute, that’s not really the concern here. The point is if his electoral votes are predicted to be less than 50 percent then he would lose, otherwise he would win.
Andrew: Let’s suppose you say the probability is 65.8 percent. That’s gonna correspond to a certain bell-shaped curve with his expected number of votes and uncertainty. It turns out, if you wanted to shift that from 65 percent to 66 percent, that would correspond to shifting his forecast share of the vote from something like, I don’t remember the exact numbers, something like 52 percent to 52.01 percent. Something trivial like that. So, it’s a meaningless number. It would be like saying Steph Curry is 6 feet 3.81724 inches tall.
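The link between a win probability and a forecast vote share can be checked with a short calculation. This sketch assumes the forecast is a normal bell curve over the vote share and that the candidate wins whenever the share exceeds 50 percent, ignoring the electoral college as Andrew does. The 2-point standard deviation is a hypothetical value, not a number from the conversation, and the size of the shift scales directly with it.

```python
from statistics import NormalDist


def win_probability(mean_share, sd):
    """P(vote share > 50) when the forecast is Normal(mean_share, sd)."""
    return 1.0 - NormalDist(mean_share, sd).cdf(50.0)


def mean_for_probability(prob, sd):
    """Forecast vote share that produces a given win probability."""
    return 50.0 + sd * NormalDist().inv_cdf(prob)


SD = 2.0  # hypothetical forecast uncertainty, in percentage points

mean_65 = mean_for_probability(0.65, SD)
mean_66 = mean_for_probability(0.66, SD)
shift = mean_66 - mean_65

print(f"65 percent chance <-> forecast share {mean_65:.3f}")
print(f"66 percent chance <-> forecast share {mean_66:.3f}")
print(f"shift in forecast share: {shift:.3f} percentage points")
```

Under these assumptions, moving the headline probability by a full point corresponds to moving the forecast vote share by a few hundredths of a percentage point, far less than any poll can resolve, which is the sense in which the extra decimal places carry no information.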
Andrew: So, I got on Nate’s case and I said, “I understand, Nate, you’re trying to, you need eyeballs. You need news every week. There’s not much news. Obama’s expected to win, but he might not. Every week, Obama’s in the lead, but he might lose. That’s what we know. It’s hard and one way of creating news is to focus on these noise fluctuations.” So, if he’s shifted to saying things like three in four chance, I think that’s a good thing. He might have lost a few clicks that way, but one thing I’ve always admired for many years about Nate is his integrity. I don’t think he would want people to get fooled by noise. So, it’s a very good thing that he’s done that.
What is polling?
Hugo: So, moving to polling. Polling is generally thought of with respect to election forecasting. I’m wondering what polling is, more generally, and what type of things it can tell us.
Andrew: Survey sampling is whenever you want to learn about the whole from a part. A blood test is like a survey sample. They take a sample of your blood and it’s supposed to be representative of your blood. If I interview people on the street and ask them how they’re gonna vote, that’s supposed to be representative of the general population. Well, it might not be. They do random digit dialing, that’s kind of representative of the population except not everybody answers the phone. Most people don’t answer the phone, actually. So, it’s not at all representative of the population.
Andrew: I was talking in my class and saying how I think it’s mildly unethical to do an opinion poll and not pay people. You do a survey, you’re making money from your survey, and a lot of pollsters do. Online panels pay people but a lot of your telephone polls they just call people up and you’re kind of abusing people’s good will by doing that. Then someone said, “But, what about the kind of people who will only participate in a survey cause you pay. Are they non-representative?” And I said, “What do you think about the kind of people who will participate in a survey for free? They’re kind of weird people, huh? Most people don’t. Most people hang up on pollsters.” So, survey respondents are not representative.
Andrew: We do a lot of work to adjust the sample to the population. We need to because response rates are so low. But, anyway, it’s not just election polling, it could be public opinion, blood testing, it could be businesses, they audit their own records and if they want to do an audit they’ll take a random sample of records and then audit the random sample and use that to draw conclusions about the whole business et cetera.
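One simple way to adjust a sample to the population can be sketched as follows. The technique shown is poststratification, which is not named in the conversation and is swapped in here purely as an illustration: respondents are split into groups, each group's answer is averaged, and the group averages are recombined using the groups' known shares of the population instead of their shares of the sample. The groups and all numbers are hypothetical.

```python
def raw_and_poststratified(group_means, group_counts, population_shares):
    """Compare the unadjusted sample mean with a poststratified estimate."""
    total = sum(group_counts.values())
    # Unadjusted: each group counts in proportion to how many responded.
    raw = sum(group_means[g] * group_counts[g] for g in group_means) / total
    # Adjusted: each group counts in proportion to its population share.
    adjusted = sum(group_means[g] * population_shares[g] for g in group_means)
    return raw, adjusted


# Hypothetical poll: older people answer the phone far more often.
group_means = {"under_45": 0.60, "45_and_over": 0.40}   # share supporting a candidate
group_counts = {"under_45": 100, "45_and_over": 400}    # respondents per group
population_shares = {"under_45": 0.50, "45_and_over": 0.50}

raw, adjusted = raw_and_poststratified(group_means, group_counts, population_shares)
print(f"raw sample estimate:     {raw:.2f}")
print(f"poststratified estimate: {adjusted:.2f}")
```

The adjustment only corrects for differences captured by the grouping. If the people who respond differ from those who hang up within each group, the estimate is still off.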
Hugo: So, before we move into polling in a bit more detail, I’m just wondering, can you tell us why polling is even important?
Andrew: Well, George Gallup, who was the founder of his poll, wrote a lot about this. He argued that polling is good for democracy. There’s two ways of putting it. Bill James, the great baseball analyst, once said something along the lines of, “The alternative to good statistics is not no statistics, it’s bad statistics.” He was arguing that some baseball player was overrated and then he quoted some sports writer saying, “This Bill James number cruncher doesn’t know anything. This batter was amazing. He had 300, all these time, and he got all these…” And Bill James pointed out, let’s look at what the sports writer said. What was his evidence that this guy was such a great athlete? It was a bunch of statistics. He was just using statistics naively, but the guy wasn’t being Mr. Qualitative, he started talking about how the baseball player was hitting 300.
Andrew: Now, similarly, suppose you are a legislator and you want to know about public opinion. I think, first, public opinion is relevant. We don’t always like when politicians follow public opinion too much but I think we like them to be aware of public opinion. So, if they don’t have polls, what are they gonna do? They might very well do informal polls. Canvassing. And that over-represents certain kinds of people. That’s gonna be unrepresentative of the people they find hard to reach. Gallup’s argument was pretty much that democracy is ultimately based on public opinion and knowing public opinion between the elections is important. A lot of issues come up and it should allow politicians to do a better job, which seems reasonable to me.
Andrew: Beyond all that, of course, surveys are used all the time in marketing. So, business people don’t have to apologize for wanting to know what the customer wants. So, it makes sense to do that. Marketing surveys are very interesting in part because you get into this question of connecting the observed measurements to what you really care about because how realistic is a marketing survey? So, if I call you up on the phone and say, “Will you pay 30 thousand dollars for this sort of electric car?” You could say yes or no, that doesn’t mean that that’s gonna actually make it out of the showroom, cause the survey’s not realistic.
Andrew: Political surveys are a little easier. Who do you plan to vote for? That’s almost the same as being in the damn polling booth and voting. So, the realism of political surveys is much closer than the realism of certain marketing surveys.
Hugo: And I don’t know how long this has been the case but we’ve definitely seen polling affect… there’s a feedback loop in to the political, and voting, and election process. I think it was, the primaries, I think, the debates, your position on the stage and whether you’re in the debate or not, is actually dependent on your performance in polls, right?
Andrew: Yeah, and Donald Trump, when he would give speeches in the primaries, he would talk about how high his poll ratings were.
Hugo: Until they weren’t, and then he said they’re unscientific.
Andrew: Well, yeah, but I’m not talking about his approval, but the percentage of people who said they were gonna vote for him. So, he was polling very high even when outside observers didn’t seem to give him much of a chance. So, yeah, there is feedback. I’ll say one thing, there is a useful feedback, at least for pollsters. Sometimes the question arises, why do people tell the truth to a pollster? And you’ll sometimes get, pundits will say, “Hey, let’s all lie to the pollsters. Let’s screw them up. I don’t like the pollsters. Tell them the opposite of what you think.” And yet, people don’t do that. And there’s a couple reasons for this.
Andrew: The first is that, as I said, polls are opt in. No one forces you to do a poll. So, if you really hate pollsters, it’s likely you won’t bother to do it in the first place. But the second thing is that I think people think of a poll as like a way of voting. So, if I survey you and ask, do you approve of Donald Trump’s job performance? You think this might get out in the news somewhere, you’re motivated: if you approve, you’re motivated to say yes, and if you don’t, you’re motivated to say no. There’s a direct motivation to be sincere in your response. Again, that’s not true of all surveys. If I ask you, do you take illegal drugs? You might have various motivations not to answer that honestly.
Hugo: I couldn’t answer that on air either.
Andrew: Well, there are asymmetries. You could if your answer was no, and you could answer it on air if you want, I’m not asking you. I’m just saying that it’s complicated. So, one thing that’s not always very well understood about political polling is that the incentives align to actually encourage sincerity in the survey response. That’s very important.
Does public opinion exist?
Hugo: Now, the other thing you mentioned that I just want to touch on briefly is this idea of polls, and measuring public opinion, and this is more playing devil’s advocate, not necessarily trolling. I’m just wondering, public opinion generally is views that are prevalent among the general public. Does public opinion even exist?
Andrew: It’s like Heisenberg’s uncertainty principle. So, to measure opinion is to change it. You know how you want to measure the position of a particle, you have to look at it, and looking at it means bouncing a light particle off it and that adds energy and it changes its position and momentum? So, similarly, if you want to know what someone thinks, you have to ask them and then that changes it. Now, not always. You can observe their behavior. There’s other ways.
Andrew: I have a colleague, Matt Salganik, he’s a sociologist at Princeton, he wrote a book around social science data collection recently and he talked about… you can survey people, you can ask them, or you can observe them. Those are different. You can observe someone so unobtrusively it doesn’t change their behavior, sometimes. Amazon can look at how you purchase. Arguably, once you know that Amazon’s looking, then you might not purchase certain things or not search on things because you don’t want them to know about it. Until that happens, you can observe them.
Andrew: Similarly, with the security camera outside of your apartment. If you don’t know it’s there, then it’s observing you pretty well. So, in that sense, if you think of us being measured, we’re kind of in a cat and mouse game with those social scientists who are trying to measure us. They’re trying to measure us in ways that don’t disturb us and we might want to be aware of how we’re being measured.
Hugo: So, now I want to get directly into polling and this is something that’s known. I’m going to quote you because you said it so well in an article in Slate magazine that I will link to in the show notes. You wrote, “The statistical theory of traditional polling is amazing. In the theory, a random sample of a thousand people is enough to estimate public opinion within a margin of error of plus or minus three percentage points.” Could you just give us a run down of what this exactly means?
Andrew: This is the mathematics of drawing balls from an urn. So, if you have a large urn that’s full of balls, and 55 percent of the balls are green, and 45 percent are yellow, and you draw a ball at random a thousand times, then most likely, you’ll get between 52 percent and 58 percent green balls. And so, it’s 55 percent in the urn, and you draw a thousand, each time you draw a ball and throw it back in the urn and shuffle it and draw another one, then the mathematics of probability tell you that the most likely thing you’ll see is 55 percent green balls, but it could be anywhere between 52 percent and 58 percent. There’s a 95 percent chance, roughly, that it’s in that range. So, we call that the margin of error. If you can actually sample people, like drawing them from an urn, you can learn about public opinion pretty accurately.
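The urn arithmetic above can be checked with the usual normal approximation for a proportion. This is a minimal sketch, assuming independent draws with replacement; the function name `margin_of_error` and the multiplier 1.96 (the conventional value for a roughly 95 percent interval) are not from the interview.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n independent draws."""
    return z * math.sqrt(p * (1 - p) / n)

p, n = 0.55, 1000          # 55 percent green balls, a thousand draws
moe = margin_of_error(p, n)
low, high = p - moe, p + moe

print(f"margin of error: {moe:.3f}")          # about 0.031, i.e. three percentage points
print(f"95% range: {low:.3f} to {high:.3f}")  # about 0.519 to 0.581
```

The result, roughly 52 to 58 percent, matches the range quoted in the answer. It only holds under the idealized random-sampling assumption.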
Hugo: But, of course, this is theoretical, right? And one of the parts of the theory is that it’s a random representative sample. I’m wondering what the practical problems and challenges associated with this theory are.
Andrew: In practice, you can’t draw people at random from the urn because there’s no list of people. You can call phone numbers at random, not everyone has a phone, some people have two phones, some people never answer their phone, et cetera. Also, if you draw a ball, you get to look at it in the urn model, but, when you’re sampling people, you draw a ball and what if they don’t want to respond to your survey? Then you don’t get to see it.
Andrew: So, our surveys are systematically non-representative of the population. So, what we do is we adjust for known differences between sample and population. So, our population is 52 percent female, but our survey is 60 percent female, we adjust for that. Our survey gets too many old people, it gets too many white people, it gets too many people from some states and not others. Different surveys have different biases. Exit polls, I’ve been told, tend to oversample Democrats, maybe it has to do with who’s willing to talk to the exit poll interviewer. The kind of people who are willing to answer the phone may be different.
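The simplest version of the adjustment described above is to reweight each group from its share of the sample to its share of the population. This is a minimal sketch with one adjustment variable. The 52 and 60 percent figures come from the example above; the support rates within each group are invented for illustration.

```python
# Shares from the example above: population is 52% female, the survey is 60% female.
population_share = {"female": 0.52, "male": 0.48}
sample_share = {"female": 0.60, "male": 0.40}

# Hypothetical support for some candidate within each group of respondents (invented numbers).
support_in_sample = {"female": 0.58, "male": 0.46}

# Raw estimate: every respondent counts equally, so women are overcounted.
raw_estimate = sum(sample_share[g] * support_in_sample[g] for g in sample_share)

# Adjusted estimate: each group counts in proportion to its population share.
adjusted_estimate = sum(population_share[g] * support_in_sample[g] for g in population_share)

# Equivalent per-respondent weights: population share divided by sample share.
weights = {g: population_share[g] / sample_share[g] for g in sample_share}

print(f"raw: {raw_estimate:.4f}, adjusted: {adjusted_estimate:.4f}")
print(weights)
```

With these made-up numbers the raw estimate is 53.2 percent and the adjusted one is 52.24 percent. Real surveys adjust for many variables at once, which is where simple weighting starts to break down.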
Andrew: Then, the other thing is you need to worry about getting honest responses or adjusting for inaccuracy in survey responses, which, like I said, is less of an issue for political polling but comes up in other surveys.
Hugo: I’m really interested in this idea of calling people on the telephone because classically, historically, a lot of people had landlines and you could do that. This isn’t the case anymore and my understanding is that there’s legislation that means you can’t automate phone calls to cell phones, is that right?
Andrew: I don’t know exactly what the laws are about what you can and can’t do. It was all kind of just a window. When Gallup started doing polls, they’d knock on doors because a lot of people didn’t have phones back then. So, there was a certain period where a lot of people had phones. In other countries, not everybody had phones either. But, again, even if you could call everybody, so what? The respondents are not representative of the population.
Hugo: So, it’s really the adjustment process that is really key as well.
Andrew: Yeah, it’s both. You’ve got to try to get a representative sample even though you’re not gonna get it because you want your biases to be correctable. So, if my bias is that I have too many women, I can correct for that. Or too many old people. If my bias is that I have too many conservatives, can I correct for that? Well, maybe, because you can ask people their party affiliation and then you can match it with data on people’s party registration. It’s more work, right? If I’m asking about health care and my bias is that people with health problems are more likely to respond to the survey. Can I adjust for that? Well, that might be tougher.
Andrew: So, it makes sense to try to get that perfect sample even though you’re not gonna get there, to aim for it.
Hugo: And are these correction and adjustment methods relatively sophisticated statistically?
Andrew: Well, they’re getting more sophisticated as our data get worse and worse. So, the short story is that there’s three reasons they need to get more sophisticated. One is to adjust for inaccurate responses but, as I said, I’m not really gonna focus on that. Second is differences between the sample and the population. You want to adjust for a lot of factors, not just sex and age and ethnicity, party identification, lots of things. So, when you want to adjust for more things, then simple adjustment methods, simple weighting methods, don’t do the job. We use a method called multilevel regression and post-stratification, there are other approaches, but you need more sophistication to adjust for more variables.
Andrew: Then, the third thing is that we ask more from our surveys. So, we might want to know not just public opinion, not just do people want to vote for the Democrats or Republicans, but how does that vote break down among all of the 435 congressional districts? So, even if I have big data, I won’t necessarily have a large sample of each congressional district. So, you want to do statistical analysis to get those more focused inferences. So, that’s one reason why my colleagues and I have put a lot of effort into modeling the survey response to be able to estimate subgroups of the population, like rich voters and poor voters within different states.
Hugo: Fantastic. And something that I know you’ve worked in is thinking outside the box, how to get people involved in surveys, that was an unintentional poor pun, but, because you’ve actually used gaming technology and Xboxes in order to get survey responses, right?
Andrew: Yes, my colleagues at Microsoft Research in New York did that. Microsoft Research has some social scientists and my colleagues, David Rothschild and Sharad Goel, who worked there at the time, designed a survey. They convinced the Microsoft people to put something on the Xbox in the last months of the 2012 presidential election where people could vote and say who they wanted to vote for. So, every once in a while, you’d get a reminder saying would you like to participate in our poll? And then you’d give some demographics and say who you wanted to vote for. We had a huge sample size, hundreds of thousands of responses, very unrepresentative.
Andrew: An unusual survey because it overrepresented young men and most surveys overrepresent old women. But after adjustment, first we were able to estimate public opinion very well, in fact, we were able to estimate public opinion more stably than public poll aggregators. That’s the good news. The bad news is that we collected the data in 2012, we didn’t actually do that analysis until afterwards. So, in theory, it could have been done in real time, but actually, it was a research project and we published it later.
Andrew: So, we didn’t beat the polls while it was happening. Not only that, we actually learned something about political science and public opinion. As I said, our estimates were more stable and better than the polling aggregate estimates from the newspapers and online, and it turned out that about two thirds of the variation in the poll, the fluctuations like Romney’s doing well, or Obama’s doing well, these fluctuations, about two thirds of those fluctuations were actually attributable to differential nonresponse. So, when Romney has some good news, Republicans were more likely to answer the poll. Which makes sense, right? Do you want to participate in a poll? Well, if my candidate is a laughing stock, maybe not. If my candidate is doing great stuff, yes.
Andrew: So, there was this positive feedback mechanism that… negative feedback stabilizes, positive feedback amplifies fluctuations. So, a positive feedback mechanism, which is: if a candidate is doing well, more of their supporters will respond to the poll, meaning they look like they’re doing even better. And so, you get these big fluctuations from week to week but then when you actually adjust for partisanship you find that the results were much more stable. We found that in 2016 also. You might say, well, maybe people’s partisanship was fluctuating too, but we have evidence that that’s not really happening. There’s various loose ends in the project that we tied up when we wrote our paper. All of that came from this collaboration with these people at Microsoft.
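A toy calculation makes the differential nonresponse effect concrete. This is a sketch under strong simplifying assumptions, none of which come from the Xbox study: only two parties of equal size, fixed candidate support within each party, and made-up response rates. The function name `poll` is hypothetical.

```python
# Invented electorate: half Democrats, half Republicans, with fixed support for candidate A.
PARTY_SHARE = {"D": 0.5, "R": 0.5}
SUPPORT_FOR_A = {"D": 0.9, "R": 0.1}

def poll(response_rate):
    """Return (raw, party-adjusted) estimates of support for A given response rates by party."""
    responders = {p: PARTY_SHARE[p] * response_rate[p] for p in PARTY_SHARE}
    total = sum(responders.values())
    sample_share = {p: responders[p] / total for p in responders}
    raw = sum(sample_share[p] * SUPPORT_FOR_A[p] for p in sample_share)
    # Adjusting for partisanship: weight each party back to its population share.
    adjusted = sum(PARTY_SHARE[p] * SUPPORT_FOR_A[p] for p in PARTY_SHARE)
    return raw, adjusted

# Week 1: both parties respond at the same rate.
raw_1, adj_1 = poll({"D": 0.10, "R": 0.10})
# Week 2: good news for the Republican candidate, so Republicans respond more often.
raw_2, adj_2 = poll({"D": 0.10, "R": 0.15})

print(f"week 1 raw {raw_1:.2f}, adjusted {adj_1:.2f}")
print(f"week 2 raw {raw_2:.2f}, adjusted {adj_2:.2f}")
```

Nobody changes their mind between the two weeks, yet the raw number drops from 50 to 42 percent while the party-adjusted number stays at 50. That is the pattern described above: an apparent swing that is really a change in who answers.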
Hugo: I’m glad you mentioned 2016, because, as you stated earlier, the popular vote, the pollsters did pretty well on, within one percent, right? It was 52 instead of 51 was what the pollsters said. But, of course, in the electoral college vote, things were relatively different and I think something you’ve written about is that’s potentially due to the fact that in several key states, people who voted for Trump didn’t necessarily respond in the polls. Is that right or do I misremember?
Andrew: There’s so much nonresponse, the problem was more with state polls than national polls. That is, there were some people, after the election, some pollsters, Gary Langer and some of his colleagues wrote a paper where they analyzed their national polls by state and they actually found that the state level analyses of the national polls were not that far off. But, some of the state polls in Michigan and other states, didn’t do a good enough job of adjusting for nonresponse, so it seems. There was a lot going on but part of it is that the nonresponse adjustments weren’t really as complete. It’s an issue. Survey response rates keep going down and so the raw survey data, or even the weakly adjusted survey data, are not always enough.
Hugo: In the same Slate article that I mentioned earlier, you also wrote, “Instead of zeroing in on elections, we should think of polling and public opinion as more of a continuous process for understanding policy.” I find this very attractive and I’m just wondering if you can elucidate this and tell me what you mean by this?
Andrew: Well, this came about, I think it’s particularly clear in the Obama administration that there were various issues like the stimulus plan, the health care plan, where public opinion seemed to be very important. There’s a lot of both sides rallying public opinion in order to sway certain swing votes in Congress. It’s less so right now. Right now, it’s like okay Republicans control the House, the Senate, the Presidency, and the Supreme Court, and so, it’s sort of up to them what to do. Public opinion doesn’t seem to be directly influencing things. They seem to be willing to do various unpopular things to make use of the majority that they have.
Andrew: But, most of the time, politics at the legislative level is a bit more transactional. There are swing voters, certainly if one party controls the House and one party controls the Senate then you get more power to various swing voters. At that point, public opinion can make a difference. So, it’s not just about who you’re gonna vote for, it’s how people are gonna vote once they’re in office. So, pollsters are gonna be interested in public opinion throughout the process because it’s not really just about who you plan to vote for but also what are your views on various issues, whether it’s foreign policy, or health care, or immigration, or trade, or whatever.
How does party allegiance play a role?
Hugo: And how does party allegiance play a role in that, though?
Andrew: Oh, party allegiance is very important and there’s a lot of evidence that voters will switch positions based on what their party says. If you look at things like support for wars, there’s been big jumps based on the party in power. If you’re a Democrat then you support the same policy that you wouldn’t support if a Republican was doing it, or vice versa. Or how things are labeled. It’s like as the economists say, it’s endogenous. The pollsters are measuring opinion, but at the same time, politicians are trying to use that opinion.
Andrew: My colleague, Bob Shapiro in the political science department, he and a colleague wrote a book called Politicians Don’t Pander, which was based on his study of various political fights, not elections, but legislative fights. He argued that politicians think of public opinion as a tool that… there’s a naïve view of politicians wanting to do what the public wants but it’s actually politicians are often quite confident and they feel that they can sway the voters and they think of public opinion as something that they can manipulate. So, both sides are doing it. To the extent that individual congress members and senators are involved, you also need to know about local public opinion, not just national.
What is the future of polling?
Hugo: So, Andy, what does the future of polling look like to you?
Andrew: I don’t have a great sense of what the future will be, if you look traditionally, you’d say lower response rates is the future, paying people to participate, online panels. I think, maybe, in general, we should think of the people who respond to surveys as being more participants, the same way as medical statistics. Instead of thinking that we’re measuring people and estimating the effect of the drug and people are just counters that are being moved around, we should actually think of patients as participating in studies and really being involved. Not just because you want to get more compliance, but also because people have a lot of private knowledge that they can share as well as people should be more motivated to help out if they have more connection.
Andrew: So, to me, the future would be something a bit more collaborative. In the other direction, there’s just gonna be a lot of passive measurement, things like Amazon measuring your clicks. That’s like polling also. So, that’s from the opposite direction. So, either, if it’s intrusive I think people should be more involved or it’s just not intrusive at all.
Favorite Data Science Technique
Hugo: So, Andrew, my final question for you is what’s one of your favorite data science and statistical techniques or methodologies?
Andrew: My favorite thing is something I’ve never done and I read about. This was maybe 10 years ago, supposedly, someone built a machine that you could stick in someone’s office and then, if they’re typing, 10 minutes later it could be a keylogger. Supposedly, how it works, there’s about a hundred keys on your keyboard, so it listens to the sounds and uses some classification algorithm to classify the sounds of the keys into a hundred clusters, and then, having done that, it then uses simple code breaking techniques to estimate which is the space bar, which is the carriage return, which is the letter E, and so forth. Of course, it doesn’t have to be perfect, you can use statistical tools and then it can figure out what you’re typing.
Andrew: So, I’ve always wanted to build that. Now, that kind of thing I don’t know how to build, it also involves having a microphone and doing sound analysis. I just think it would be really cool. These things are pretty Bayesian, you’re using a lot of prior information on especially the second step, the code breaking step. Of course, Alan Turing used Bayesian methods to crack the Enigma code in World War II. That’s my favorite example even though I’ve never seen it. I just think it’s the coolest. It’s not something that I could really do, though.
Andrew: If you want to talk about stuff that I could do, then my favorite technique would be multilevel regression and post-stratification because that’s what we use to estimate state level public opinion. It’s how we did red state, blue state, and estimated how opinions vary by income in different parts of the country. It allows us to do our best to adjust for differences between sample and population. We could do it in Stan. So, I’ll push that.
Hugo: Great. And so, multilevel regression and post-stratification, we’ll include some links in the show notes. It’s also known as MRP or Mr. P, correct?
Andrew: Exactly. And recently I’ve started calling it regularized prediction and post-stratification because, strictly speaking, it’s modular. So, the first part is you fit a model to do the adjustment, the second part is having done that, you make inferences for the population, which is called post-stratification. So, multilevel regression is one way of doing the model, but generally, you could use the term regularized prediction which includes all sorts of other methods that are out there.
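The two modular steps can be sketched in a few lines. This is not a multilevel regression and not how it would be done in Stan. As a stand-in for the first step it uses a crude shrinkage rule that pulls each cell's raw rate toward the overall rate, more strongly when the cell has few respondents. All cell counts, population counts, and the `PRIOR_STRENGTH` constant are invented for illustration.

```python
# Each cell: (respondents, respondents saying yes, population count). All numbers invented.
cells = {
    ("young", "men"): (400, 180, 2_000),
    ("young", "women"): (50, 30, 2_100),
    ("old", "men"): (40, 16, 2_400),
    ("old", "women"): (10, 7, 2_500),
}

# Hypothetical tuning constant: acts like this many pseudo-respondents at the overall rate.
PRIOR_STRENGTH = 20

total_n = sum(n for n, _, _ in cells.values())
total_yes = sum(yes for _, yes, _ in cells.values())
overall_rate = total_yes / total_n

def shrunk_rate(n, yes):
    """Step 1 (regularized prediction): partial pooling of a cell rate toward the overall rate."""
    return (yes + PRIOR_STRENGTH * overall_rate) / (n + PRIOR_STRENGTH)

cell_estimates = {cell: shrunk_rate(n, yes) for cell, (n, yes, _) in cells.items()}

# Step 2 (post-stratification): average the cell estimates using population counts as weights.
population_total = sum(pop for _, _, pop in cells.values())
poststratified = sum(cell_estimates[cell] * pop for cell, (_, _, pop) in cells.items()) / population_total

print(f"raw sample rate: {overall_rate:.3f}")
print(f"post-stratified estimate: {poststratified:.3f}")
```

Because the two steps are separate, the shrinkage rule could be swapped for a fitted multilevel model or any other regularized predictor without touching the post-stratification step.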
Hugo: It’s been such a pleasure having you on the show.