(This article was first published on **StaTEAstics.**, and kindly contributed to R-bloggers)

The *sum* function in R is a special case among the summary statistics functions such as *mean* and *median*. The first distinction is that it is a **Primitive** function whereas the others are not (although you can call *mean* using *.Internal*). This leads to several inconsistencies and unexpected behaviours.
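The distinction can be checked from the console. The snippet below is only a small sketch: it uses `is.primitive()` to test each function and `args()` to recover the formal arguments of the primitive; the variable names are made up for illustration.

```r
# sum is implemented as a primitive; mean and median are ordinary closures.
prim <- c(sum    = is.primitive(sum),
          mean   = is.primitive(mean),
          median = is.primitive(median))
print(prim)

# args() returns a closure that carries the formal arguments,
# which works even when the function itself is a primitive.
sum_args  <- names(formals(args(sum)))
mean_args <- names(formals(mean))
print(sum_args)   # sum has no argument called x
print(mean_args)  # mean takes x as its first argument
```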

__(1) Inconsistency in argument__

The arguments are inconsistent. Both *mean* and *median* take the argument x, while *sum* operates on whatever arguments are not matched. This becomes a problem when you want to write a function that switches between all the summary functions, such as:

do.call(myFUN, list(x = x))

Here **myFUN** can be any statistical summary function. The problem first arose when I wanted to write a function that covers several different summary statistics, so that I can switch between them when required. The main difficulty comes when I have to pass additional arguments, such as the weights *w* in the *weighted.mean* function. I wrote the following call and naively hoped it would work:

do.call(myFUN, list(x = x, w = w))

It turns out that this line of code works fine for all the summary statistics except *sum*, where the weights are summed as well. My current solution is simply to use the *switch* function, which is not my favourite function.
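To make the behaviour concrete, here is a minimal sketch with made-up data. The wrapper `sum_x` is a hypothetical helper, not something from base R or from the original code; it is one possible alternative to *switch* that gives *sum* an explicit x argument and ignores everything else.

```r
x <- c(1, 2, 3)
w <- c(0.2, 0.3, 0.5)

# weighted.mean matches x and w by name, as intended.
wm <- do.call(weighted.mean, list(x = x, w = w))

# sum has the signature sum(..., na.rm = FALSE), so both x and w
# fall into `...` and the result is sum(x) + sum(w).
s <- do.call(sum, list(x = x, w = w))

# Hypothetical wrapper (an assumption for illustration): take x explicitly
# and swallow any extra arguments such as w.
sum_x <- function(x, ...) sum(x)
s_fixed <- do.call(sum_x, list(x = x, w = w))

print(c(weighted_mean = wm, sum = s, sum_x = s_fixed))
```

Note that the wrapper silently drops na.rm as well, so it would need an explicit na.rm argument if missing values matter.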

__(2) Inconsistency in output__

Another inconsistency arises in how NAs are treated. For the *mean*, *median* and *weighted.mean* summaries, if all the observations are NA then either NA or NaN is returned:

mean(rep(NA, 10), na.rm = TRUE)

median(rep(NA, 10), na.rm = TRUE)

The *sum* function, in contrast, returns zero. It puzzles me how you get zero when NA stands for not available; this is like creating something out of nothing. This is a problem for me because when I sum multiple time series with missing values, I want the function to remove the NAs and compute the sum where there is partial data, but return NA rather than zero when there is no data at all.
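The four summaries can be compared side by side on the same all-NA vector. This is just a sketch; the unit weights passed to *weighted.mean* are an arbitrary choice for the example.

```r
all_na <- rep(NA, 10)

m  <- mean(all_na, na.rm = TRUE)
md <- median(all_na, na.rm = TRUE)
wm <- weighted.mean(all_na, w = rep(1, 10), na.rm = TRUE)
s  <- sum(all_na, na.rm = TRUE)

# Only sum reports an actual number when nothing was observed.
print(list(mean = m, median = md, weighted.mean = wm, sum = s))
```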

Nevertheless, a simple solution exists, thanks to the active R community. A post on R-help addresses this problem and solves it in an elegant manner:

sum(x, na.rm = any(!is.na(x)))
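The one-liner works because na.rm is only switched on when at least one value is present; when everything is missing, na.rm is FALSE and *sum* returns NA. As a sketch of the time series use case, it can be wrapped in a small helper and applied row by row. The name `sum_or_na` and the example data are assumptions made for illustration.

```r
# Hypothetical helper: remove NAs only if there is something left to sum.
sum_or_na <- function(x) sum(x, na.rm = any(!is.na(x)))

# Two made-up series observed at four time points (rows).
series <- cbind(a = c(1, NA, 3, NA),
                b = c(2, NA, NA, 4))

# Sum across series at each time point: partial data is summed,
# a row with no data at all stays NA.
total <- apply(series, 1, sum_or_na)
print(total)
```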

“The computations and the software for data analysis should be trustworthy” – John Chambers, Software for Data Analysis

I am not sure about the reasoning underlying the behaviour of *sum*, but it should be consistent so that people can trust it and get what they expect from it.
