Statisticians have long known that the use of p-values has major problems. Some of us have long called for reform, weaning the profession away from these troubling beasts. At one point, I was pleased to see Frank Harrell suggest that R should stop computing them.
That is not going to happen, but last year the ASA shocked many people by producing a manifesto stating that p-values are often wrongly used, and in any case used too much. Though the ASA statement did not go so far as to recommend not using this methodology at all, to me it came pretty close. Most significant to me (pardon the pun) was the fact that, though the ASA report said p-values are appropriate in some situations, it did not give any examples. I wrote blog posts on this topic here, here and here. I noted too that the ASA report had even made news in Bloomberg Businessweek.
But not much seems to have changed in the professions since then, as was shown rather dramatically last Saturday. The occasion was iidata 2017, a student-run conference on data science at UC Davis. This was the second year the conference has been held, and it was very professionally managed and fun to attend. There were data analysis competitions, talks by people from industry, and several workshops on R, including one on parallel computing in R, by UCD Stat PhD student Clark Fitzgerald.
Now, here is how this connects to the topic of p-values. I was a judge on one of the data analysis competitions, and my fellow judges and I were pretty shocked by the first team to present. The team members were clearly bright students, and they gave a very polished, professional talk. Indeed, we awarded them First Prize. However…
The team presented the p-values for various tests, not mentioning any problems regarding the large sample size, 20,000. During the judges’ question period, we asked them to comment on the effect of sample size, but they still missed the point, saying that n = 20,000 is enough to ensure that each cell in their chi-squared test would have enough observations! After quite a lot of prodding, one of them finally said there may be an issue of practical significance vs. statistical significance.
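To make the sample-size point concrete, here is a minimal sketch in Python. The counts are made up for illustration and are not the students' data, and the helper name `chi2_2x2` is my own. It runs a Pearson chi-squared test on a 2x2 table in which two groups differ by the same two percentage points (50% vs. 52%), first with n = 200 and then with n = 20,000, and it also reports the phi coefficient as one possible measure of effect size.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the
    2x2 table [[a, b], [c, d]]. Returns (statistic, p-value, phi)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    # Phi coefficient: an effect-size measure that does not grow with n.
    phi = math.sqrt(stat / n)
    return stat, p_value, phi

# Hypothetical counts: 50% vs. 52% "yes" in two equal-sized groups.
small = chi2_2x2(50, 50, 52, 48)          # n = 200
large = chi2_2x2(5000, 5000, 5200, 4800)  # n = 20,000

for label, (stat, p, phi) in [("n = 200", small), ("n = 20,000", large)]:
    print(f"{label}: chi2 = {stat:.3f}, p = {p:.4f}, phi = {phi:.3f}")
```

With these invented counts, the p-value is far above 0.05 at n = 200 and below 0.01 at n = 20,000, while phi stays at about 0.02 in both cases. The difference between the groups is equally small either way; only the sample size changed.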
Once again, we cannot blame these bright, energetic students. They were simply doing what they had been taught to do — or, I should say, NOT doing what they had NOT been taught to do, which is to view p-values with care, especially with large n. The blame should instead be placed on the statistics faculty who taught them. The dangers of p-values should have been constantly drilled into them in their coursework, to the point at which a dataset with n = 20,000 should have been a red flag to them.
On this point, I’d like to call readers’ attention to the ASA Symposium on Statistical Inference, to be held in Bethesda, MD on October 11-13, 2017. Under the leadership of ASA Executive Director Ron Wasserstein, we are in the process of putting together what promises to be a highly stimulating and relevant program, with prominent speakers, and most important, lots of interactive participation among attendees. Hopefully much of the discussion will address ways to get serious coverage of the issue into statistics curricula, rather than the present situation, in which stat instructors briefly make a one-time comment to students about practical significance vs. statistical significance.