Moving beyond hopeless graphics

July 2, 2012
(This article was first published on Statistical Modeling, Causal Inference, and Social Science » R, and kindly contributed to R-bloggers)

I was at a talk a while ago where the speaker presented tables with 4, 5, 6, even 8 significant digits even though, as is usual, only the first or second digit of each number conveyed any useful information. A graph would be better, but even if you’re too lazy to make a plot, a bit of rounding would seem to be required.

I mentioned this to a colleague, who responded:

I don’t know how to stop this practice. Logic doesn’t work. Maybe ridicule? Best hope is the departure from the field of those who do it. (Theories don’t die, but the people who follow those theories retire.)

Another possibility, I think, is helpful software defaults. If we can get to the people who write the software, maybe we could have some impact.

Once the software is written, however, it’s probably too late. I’m not far from the center of the R universe, but I don’t know if I’ll ever succeed in my goals of increasing the default number of histogram bars or reducing the default number of decimal places in regression results. In the latter case, I just went and wrote my own display() function. Now I have to get people to switch from summary() to display(), which is a big task but is perhaps easier than convincing whoever is in charge of R to change the defaults.
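To give a sense of the difference, here is a minimal sketch in base R. The helper compact_coefs() is a hypothetical stand-in written for this illustration, not the display() function mentioned above; it simply keeps the estimate and standard error columns and rounds them. The model and the built-in mtcars data are arbitrary choices for the example.

```r
# Hypothetical helper (not the author's display()): keep only the
# estimates and standard errors, rounded to a couple of decimal places.
compact_coefs <- function(fit, digits = 2) {
  tab <- coef(summary(fit))[, c("Estimate", "Std. Error"), drop = FALSE]
  colnames(tab) <- c("coef.est", "coef.se")
  round(tab, digits)
}

# Arbitrary example model on a built-in data set.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Default coefficient table: four columns, many digits.
print(coef(summary(fit)))

# Compact version: two columns, two decimal places.
print(compact_coefs(fit))
```

The point is only that the shorter table is what a reader actually needs; the extra digits in the default output carry no information.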

P.S. Neal Beck points us to this (turn to his article beginning on page 4):

Numbers in the text of articles and in tables should be reported with no more precision than they are measured and are substantively meaningful. In general, the number of places to the right of the decimal point for a measure should be one more than the number of zeros to the right of the decimal point on the standard error of this measure.

Variables in tables should be rescaled so the entire table (or portion of the table) has a uniform number of digits reported. A table should not have regression coefficients reported at, say, 77000 in one line and .000046 in another. By appropriate rescaling (e.g., from thousands to millions of dollars, or population in millions per square mile to population in thousands per square mile), it should be possible to provide regression coefficients that are easily comprehensible numbers. The table should clearly note the rescaled units. Rescaled units should be intuitively meaningful, so that, for example, dollar figures would be reported in thousands or millions of dollars. The rescaling of variables should aid, not impede, the clarity of a table.

In most cases, the uncertainty of numerical estimates is better conveyed by confidence intervals or standard errors (or complete likelihood functions or posterior distributions), rather than by hypothesis tests and p-values. However, for those authors who wish to report “statistical significance,” statistics with probability levels of less than .001, .01, and .05 may be flagged with 3, 2, and 1 asterisks, respectively, with notes that they are significant at the given levels. Exact probability values may always be given. Political Analysis follows the conventional usage that the unmodified term “significant” implies statistical significance at the 5% level. Authors should not depart from this convention without good reason and without clearly indicating to readers the departure from convention.

All articles should strive for maximal clarity. Choices about figures, tables, and mathematics should be made so as to increase clarity. In the end all decisions about clarity must be made by the author (with some help from referees and editors).
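The decimal-places rule quoted above (one more place than the number of zeros to the right of the decimal point on the standard error) is easy to automate. What follows is a rough sketch of one reading of that rule, not anything taken from the guidelines themselves: the function names places_from_se() and format_by_se() are made up for this example, and treating a standard error of 1 or more as having zero leading zeros is my assumption.

```r
# Number of decimal places implied by the quoted rule:
# (zeros right after the decimal point of the SE) + 1.
# Assumption: an SE of 1 or more counts as having no such zeros.
places_from_se <- function(se) {
  stopifnot(all(se > 0))
  leading_zeros <- pmax(0, ceiling(-log10(se)) - 1)
  leading_zeros + 1
}

# Format each estimate with the number of places implied by its SE.
format_by_se <- function(est, se) {
  mapply(function(x, d) formatC(x, format = "f", digits = d),
         est, places_from_se(se))
}

# Made-up estimates and standard errors, for illustration only.
est <- c(0.123456, 1.98765, 77.1234)
se  <- c(0.0046,   0.32,    12.3)

places_from_se(se)     # 3 1 1
format_by_se(est, se)  # "0.123" "2.0" "77.1"
```

An estimate with standard error 0.0046 keeps three decimal places, while one with standard error 0.32 keeps only one. The rule still allows different rows of a table to end up with different numbers of places, which is where the rescaling advice in the second quoted paragraph comes in.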

