# Simulating p curves and detecting dodgy stats

[This article was first published on **Psychological Statistics**, and kindly contributed to R-bloggers.]

Psych Your Mind has an interesting blog post on using *p* curves to detect dodgy stats in a volume of published work (e.g., for a researcher or journal). The idea apparently comes from Uri Simonsohn (one of the authors of a recent paper on dodgy stats). The author (Michael W. Kraus) bravely plotted and published his own *p* curve, which looks reasonably ‘healthy’. However, he makes an interesting point: we don’t know how useful these curves are in practice, which depends among other things on the variability inherent in the profile of *p* values.

I quickly threw together a simulation to address this in R. It is pretty limited (as I don’t have much time right now), but potentially interesting. It simulates independent *t* test *p* values where the samples are drawn from independent, normal distributions with equal variances but different means (and *n* = 25 per group). The population standardized effect size is fixed at *d* = 0.5 (as psychology research generally reports median effect sizes around this value). Fixing the parameters is unrealistic, but is perhaps OK for a quick simulation.
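The original code isn't reproduced here, but a minimal sketch of this kind of simulation might look like the following. It assumes a two-sided Student's *t* test with pooled variance (`var.equal = TRUE`) and a population standard deviation of 1, so that the mean difference equals *d*. The function name `simulate_p_values` and its arguments are made up for illustration.

```r
# Hypothetical helper (not the original code): simulate p values from
# independent two-sample t tests with a fixed population effect size.
simulate_p_values <- function(n_p, n_per_group = 25, d = 0.5) {
  replicate(n_p, {
    # Two independent normal samples with equal variances (sd = 1),
    # so the difference in population means equals d.
    x <- rnorm(n_per_group, mean = 0, sd = 1)
    y <- rnorm(n_per_group, mean = d, sd = 1)
    # Assumption: pooled-variance, two-sided t test.
    t.test(x, y, var.equal = TRUE)$p.value
  })
}

set.seed(2012)                      # arbitrary seed, for reproducibility only
p_values <- simulate_p_values(50)   # one simulated 'researcher' with 50 p values
mean(p_values < 0.05)               # proportion of significant results
```

Each call to `simulate_p_values()` stands in for the published record of one researcher; changing `n_p` to 100 or 500 gives the larger profiles.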

I ran this several times and plotted *p* curves (really just histograms with bins collecting *p* values at relevant intervals). First I plotted for an early career researcher with just a few publications reporting 50 *p* values. I then repeated for more experienced researchers with *n* = 100 or *n* = 500 published *p* values.
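For anyone wanting to try this, here is a rough sketch of the binning and plotting step. The bin edges are an assumption (bins of width .01 from 0 to .06, with larger *p* values dropped), as are the helper names `sim_p`, `p_curve_counts` and `plot_p_curves`; the simulation settings are those described above.

```r
# Assumption: same set-up as described in the post (n = 25 per group, d = 0.5,
# equal variances); pooled-variance two-sided t test.
sim_p <- function(k, n = 25, d = 0.5) {
  replicate(k, t.test(rnorm(n), rnorm(n, mean = d), var.equal = TRUE)$p.value)
}

# Hypothetical binning: count p values in .01-wide bins between 0 and 'upper'.
p_curve_counts <- function(p, upper = 0.06, width = 0.01) {
  breaks <- seq(0, upper, by = width)
  table(cut(p[p < upper], breaks = breaks, right = FALSE))
}

# Draw a grid of simulated p curves, one panel per simulated researcher.
plot_p_curves <- function(k, n_plots = 15) {
  old <- par(mfrow = c(3, 5), mar = c(5, 4, 2, 1))
  on.exit(par(old))
  for (i in seq_len(n_plots)) {
    barplot(p_curve_counts(sim_p(k)), las = 2,
            main = paste("Sim", i), ylab = "Count")
  }
}

set.seed(50)                          # arbitrary seed
counts <- p_curve_counts(sim_p(50))   # bin counts for one simulated researcher
print(counts)

# Only draw the grid in an interactive session.
if (interactive()) plot_p_curves(50)
```

Calling `plot_p_curves(100)` or `plot_p_curves(500)` gives the equivalent grids for the more experienced researchers.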

Here are the 15 random plots for 50 *p* values:

At least one of the plots has a suspicious spike between *p* = .04 and .05 (exactly where dodgy practices would tend to push the *p* values).

What about 100 *p* values?

Here the plots are still variable (but closer to the theoretical ideal plotted on Kraus’ blog).

You can see this pattern even more clearly with 500 *p* values.
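One way to put a rough number on this variability, rather than eyeballing plots, is to count how often the .04 to .05 bin comes out as the tallest bin between .01 and .06 purely by chance. The sketch below does this under the same simulation assumptions as before. The ‘tallest bin’ criterion and the names `sim_p` and `has_spike` are illustrative choices of mine, not an established test.

```r
# Assumption: same set-up as described in the post (n = 25 per group, d = 0.5,
# equal variances); pooled-variance two-sided t test.
sim_p <- function(k, n = 25, d = 0.5) {
  replicate(k, t.test(rnorm(n), rnorm(n, mean = d), var.equal = TRUE)$p.value)
}

# Hypothetical criterion: TRUE if the [.04, .05) bin is strictly taller than
# every other .01-wide bin between .01 and .06.
has_spike <- function(p) {
  breaks <- seq(0.01, 0.06, by = 0.01)
  keep <- p[p >= 0.01 & p < 0.06]
  counts <- as.vector(table(cut(keep, breaks = breaks, right = FALSE)))
  counts[4] > max(counts[-4])   # bin 4 is [.04, .05)
}

set.seed(123)    # arbitrary seed
n_sims <- 100    # simulated researchers per profile size; increase for precision
sizes <- c(50, 100, 500)

# Proportion of simulated researchers showing a chance 'spike' at .04 to .05.
spike_rate <- vapply(sizes, function(k) {
  mean(replicate(n_sims, has_spike(sim_p(k))))
}, numeric(1))
names(spike_rate) <- sizes
print(spike_rate)
```

With only 100 simulated researchers per profile size the estimates are themselves noisy, so treat them as indicative at best.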

Some quick conclusions … The method is too unreliable for use with early career researchers. You need a few hundred *p* values to be reasonably confident of a nice flat pattern between *p* = .01 and *p* = .06. Varying the effect size and other parameters might well inject further noise (as would adding in null effects, which have a uniform distribution of *p* values and are thus probably rather noisy).

I’m also skeptical that this is useful for detecting fraud (as presumably deliberate fraud will tend to go for ‘impressive’ *p* values such as *p* < .0001). Also (going forward) fraudsters will be able to generate results to circumvent tools such as *p* curves (if they are known to be in use).