# R summary() got better!

June 4, 2017

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

Here is a really nice feature found in the current 3.4.0 version of R: summary() has become a lot more reasonable.

```r
summary(15555)

#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   15555   15555   15555   15555   15555   15555
```

In older versions of R (say R 3.3.1) the above code gave the following undesirable result:

```r
summary(15555)

#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   15560   15560   15560   15560   15560   15560
```

This was always very confusing and hard to explain to beginners. To justify it you had to explain that “R, by default, calculates the summary rounded to 4 significant digits, and is simultaneously configured to give absolutely no indication as to how many significant digits are in fact being displayed.” To add insult to injury, `summary()` picked a different number of significant figures than the default numeric presentation: one could type `median(15555)` and get the expected presentation `15555`.
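The old behavior can be reproduced with `signif()`. This is a minimal sketch that assumes the pre-3.4.0 `summary.default()` defaulted to `digits = max(3, getOption("digits") - 3)` and rounded each statistic with `signif()`; the variable names `old_digits` and `old_stored` are hypothetical.

```r
# ASSUMPTION: older summary.default() used digits = max(3, getOption("digits") - 3)
# and applied signif() to the statistics it stored.
x <- 15555

old_digits <- max(3L, getOption("digits") - 3L)  # 4 under the default digits = 7
old_stored <- signif(x, old_digits)               # the value older summary() kept

print(old_stored)   # 15560: matches the R 3.3.1 output above
print(median(x))    # 15555: median() is printed using getOption("digits")
```

With the default `options(digits = 7)` this gives 4 significant figures for `summary()` but 7 for ordinary printing, which is exactly the mismatch described above.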

Frankly, people do not expect significant digits to be 4 when viewing what appears to be an integer presented directly by software. They either expect display significance to be much lower, such as “Earth has about `7,500,000,000` people” (2 significant figures), or higher, such as “Daniel Burnham’s Flatiron Building in New York has zip code `10010`” (5 significant figures, and not the same as `10012`). In my opinion it is a bit of a crime for an analysis (not presentation) system to aggressively round numbers before switching to scientific notation (which can, in principle, signal the number of significant figures through the use of trailing zeros).

I take “`1.556e+4`” as an acceptable textual approximation of `15555` and “`15560`” as unacceptable.
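The difference between an honest approximation and a misleading one can be seen with base R's formatting functions. A minimal sketch; the variable names are hypothetical and the digit settings are chosen only for illustration, not taken from `summary()` itself.

```r
# Three textual presentations of the same number.
x <- 15555

fixed  <- format(x, digits = 4)                 # "15555": fixed notation keeps every integer digit
sci    <- formatC(x, format = "e", digits = 3)  # "1.556e+04": the rounding is visible in the notation
hidden <- formatC(signif(x, 4), format = "d", big.mark = ",")  # "15,560": rounded, yet looks exact

print(c(fixed = fixed, sci = sci, hidden = hidden))
```

Note that ordinary `format()`/`print()` never drop integer digits in fixed notation even when `digits` is small; only an explicit `signif()` produces the misleading `15560`.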

To make matters much worse, at the time R was storing rounded numbers in the summary! It wasn’t storing the presentation string “`15560`” but the floating point (numeric) value `15560.0`. This badly confused representation with presentation, and made pulling the median off a summary needlessly different from calling `median()`.

Now, thanks to Martin Maechler and the R core team, `summary()` stores much more reasonable numbers and separates representation from presentation:

```r
summary(1555555555)
#      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
# 1.556e+09 1.556e+09 1.556e+09 1.556e+09 1.556e+09 1.556e+09

format(summary(1555555555), digits=12)
#         Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
# "1555555555" "1555555555" "1555555555" "1555555555" "1555555555" "1555555555"
```
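A quick check of the representation fix, assuming R 3.4.0 or later: the median extracted from a summary should now equal `median()` exactly. The input vector and variable names are arbitrary illustrations, and the last line only emulates the old rounding under the same assumed `signif(., 4)` behavior.

```r
# In R >= 3.4.0 summary() keeps unrounded statistics, so the stored median
# agrees with calling median() directly.
x <- c(15555, 15556)

s <- summary(x)
from_summary <- as.numeric(s[["Median"]])  # 15555.5, not rounded
direct       <- median(x)                  # 15555.5

print(from_summary == direct)              # TRUE on R >= 3.4.0

# ASSUMPTION: emulates what older versions stored (signif to 4 digits).
print(signif(direct, 4))                   # 15560
```

Rounding now happens only when the summary is printed or formatted, which is why `format(..., digits = 12)` above can recover the full value.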

One of the motivations for the fix (which obviously will change some results) was [loc. cit.]:

> The benefit for maintainers and old timers like me will be that we will not need to answer this (non-official) FAQ nor excuse a peculiar behavior in the future …..

The idea is: it is simpler to fix things than to forever explain/defend peculiar behavior. At some point software must adapt to its domain and users, and not always expect the users to retain an arbitrary number of distinctions and caveats.
