Aggregating Measures of Uncertainty

[This article was first published on rstats on Bryan Shalloway's Blog, and kindly contributed to R-bloggers].

There are many situations where you want to aggregate values. However, if those values are on different scales or are tied to measures of uncertainty, aggregating them is typically more complicated than taking a simple mean or sum.

As an example, say your firm sells a variety of products. Each product has a different price, and contracts are negotiated deal by deal, so the final price has some variability that may differ across products. Sales teams are responsible for securing the final price for any sale, but each team has a different product mix. You want a measure for evaluating sales teams that identifies which are commanding higher or lower prices relative to their peers. Reliably characterizing a team’s performance as ‘good’ or ‘bad’ requires an understanding of both the value and the variability of its specific product mix. You can’t evaluate a team’s sales numbers in isolation: if its product mix is composed of items with high variability in price, what at first glance looks like good or bad performance may have more to do with luck, that is, with a greater level of variability in outcomes. What you want is to condense all of that information into a measure of how the sales team did that accounts for both the expected sales and the amount of variability in its particular portfolio.

Another common example is forecasting. Perhaps you are producing forecasts for products at the county, state, and national levels. In addition to point estimates, you are also producing ranges for these forecasts. Aggregating lower and upper bounds from the lowest-level forecasts up to higher levels requires more than just taking a sum [1].

In this post I’ll briefly introduce a few general types of approaches an analyst may take when aggregating measures that rely on or reflect uncertainty, or that are on different scales. (In future posts I may cover each approach in more detail.) These “types” overlap; they reflect distinctions I found convenient for articulation rather than concrete separations.

Analytic approach

If each of the parts you want to aggregate follows a well-defined parametric distribution (or something close to one), you may be able to aggregate the measures of uncertainty analytically. Say you have average sale prices across four separate products. If the sale price of each product follows a normal distribution, there are well-established methods for working out the variance of the average sale price across products, and you can use that variance to determine an appropriate range for forecasts of aggregated sales. If you use a statistical forecasting approach within the fable package (e.g. ARIMA), the forecast output includes a distribution object. In hierarchical forecasting tasks, these distribution objects can be used when producing prediction intervals on aggregated forecasts at higher levels of the hierarchy.
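As a minimal sketch of the normal case, assume four products whose sale prices are independent and normally distributed. The means and standard deviations below are made-up illustrative numbers, and `average_of_independent_normals` is a hypothetical helper rather than part of any package. Python's standard library is used here only to keep the sketch self-contained. Under independence, the variance of the average is the sum of the variances divided by the squared number of products:

```python
from math import sqrt
from statistics import NormalDist


def average_of_independent_normals(means, sds):
    """Distribution of the average of independent normal variables.

    Assumes independence: Var(average) = sum(sd_i^2) / n^2.
    """
    n = len(means)
    mean = sum(means) / n
    sd = sqrt(sum(s ** 2 for s in sds)) / n
    return NormalDist(mean, sd)


# Hypothetical mean sale price and standard deviation for four products.
means = [100.0, 250.0, 40.0, 75.0]
sds = [10.0, 30.0, 5.0, 8.0]

avg_price = average_of_independent_normals(means, sds)

# 95% range for the average sale price across the four products.
lower, upper = avg_price.inv_cdf(0.025), avg_price.inv_cdf(0.975)

# Naive alternative for comparison: average the standard deviations directly.
naive_sd = sum(sds) / len(sds)

print(f"mean={avg_price.mean:.2f} sd={avg_price.stdev:.2f} naive_sd={naive_sd:.2f}")
print(f"95% range: {lower:.2f} to {upper:.2f}")
```

With these numbers the standard deviation of the average is 8.25, while naively averaging the four standard deviations gives 13.25, which would overstate the uncertainty. If prices are correlated across products, the covariance terms would need to be included and this shortcut no longer holds.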

Transformation to a common scale

This approach is similar to what I called the “analytic approach”, but it involves transforming a measure onto a common scale, at which point an aggregation can be done appropriately. Related to the mistake of naively aggregating uncertainty is the mistake of applying linear aggregations to values that do not follow a linear scale. A common mistake I’ve seen is analysts taking the average of correlation coefficients. A few years ago I wrote a toy package, piececor, for investigating piecewise correlations in a tidyverse-friendly way. In that package, the default method for getting aggregated measures across correlations was to use a Fisher transformation. In other cases, getting the data onto some other common scale may be appropriate [2].
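The Fisher transformation idea can be sketched in a few lines of Python. This is an illustration only, not the piececor implementation: `fisher_mean_correlation` is a hypothetical helper, the two correlations are made up, and equal weighting is an assumption (when sample sizes differ, weighting each z value by its sample size minus three is a common choice).

```python
from math import atanh, tanh


def fisher_mean_correlation(correlations):
    """Average correlations on the Fisher z scale, then transform back.

    Assumes equal weights and correlations strictly between -1 and 1.
    """
    z_values = [atanh(r) for r in correlations]  # r -> z = atanh(r)
    mean_z = sum(z_values) / len(z_values)
    return tanh(mean_z)  # z -> r


correlations = [0.9, 0.1]  # hypothetical correlations from two segments

naive_mean = sum(correlations) / len(correlations)
fisher_mean = fisher_mean_correlation(correlations)

print(f"naive mean: {naive_mean:.3f}, Fisher mean: {fisher_mean:.3f}")
```

Here the naive mean is 0.5, while averaging on the z scale and transforming back gives roughly 0.66, because the correlation scale is compressed near its limits. Note that `atanh` is undefined at exactly -1 or 1, so perfect correlations would need separate handling.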

Simulate it

Often your data does not follow a parametric distribution or, even if it does, the math required to derive the joint distribution is incredibly complicated. In these cases you might take repeated samples of the underlying data and generate measures of uncertainty through simulation. The approach you take is defined by your problem, and you need to ensure that the procedure of your simulation mirrors the type of uncertainty measure you are trying to estimate.
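One way such a simulation might look, as a sketch under stated assumptions: the observed prices below are invented, `simulate_totals` is a hypothetical helper, and prices are assumed to be independent across products. Each simulated draw resamples one observed price per product and sums them, so the spread of the simulated totals reflects the uncertainty in the aggregate.

```python
import random
from statistics import quantiles

# Hypothetical observed sale prices for three products (skewed, not normal).
observed_prices = {
    "product_a": [90, 95, 100, 100, 105, 180],
    "product_b": [20, 22, 25, 25, 26, 60],
    "product_c": [400, 410, 415, 420, 700, 900],
}


def simulate_totals(prices_by_product, n_sims=5000, seed=42):
    """Simulate the total of one sale per product by resampling observed prices.

    Assumes prices are independent across products.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        sum(rng.choice(prices) for prices in prices_by_product.values())
        for _ in range(n_sims)
    ]


totals = simulate_totals(observed_prices)

# quantiles(..., n=20) returns 19 cut points; the first and last are the
# 5th and 95th percentiles, giving a 90% range for the simulated total.
cut_points = quantiles(totals, n=20)
lower, upper = cut_points[0], cut_points[-1]

print(f"90% range for the total: {lower:.0f} to {upper:.0f}")
```

The resampling step is where the simulation has to mirror the real process. If, for example, prices move together across products, drawing each product independently as done here would understate the variability of the total.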

Re-do the measure at each level

You may not care whether the uncertainty measures of the components are consistent with the uncertainty measures of the aggregated whole. In these cases, you can simply estimate the values separately at each level. In 2020, the M5 forecasting competition required participants to provide forecasts for each level of Walmart’s sales (across geographic levels and also across products). In addition to hosting a competition on the accuracy of forecasts, Kaggle also featured a competition on the quality of prediction intervals. Participants were evaluated based on the quality of prediction intervals across each level. If you look into the notebooks of some of the top-performing participants, many did not worry about ensuring that the prediction intervals provided at each level of Walmart’s hierarchy were consistent with one another. They simply used various methods to identify the typical quantiles at each level and used this investigation to create ranges independently. This type of approach may be combined with simulation-based approaches, where you simulate forecasts at each level and then use the distribution of the simulated errors to provide a measure of uncertainty at each aggregation level.
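A rough sketch of estimating ranges independently at each level follows. It is not taken from any M5 notebook: the past errors and point forecasts are invented, `interval_90` is a hypothetical helper, and using empirical error quantiles is just one of the "various methods" one could choose.

```python
from statistics import quantiles

# Hypothetical past forecast errors (actual minus forecast) at each level.
past_errors = {
    "store": [-12, -8, -5, -3, -1, 0, 2, 4, 7, 15],
    "state": [-60, -35, -20, -10, -5, 5, 10, 25, 40, 80],
    "national": [-150, -90, -40, -20, 0, 10, 30, 60, 110, 200],
}

# Hypothetical point forecasts for the next period at each level.
point_forecasts = {"store": 120, "state": 1500, "national": 9000}


def interval_90(point_forecast, errors):
    """90% range from the empirical 5th and 95th percentiles of past errors."""
    cut_points = quantiles(errors, n=20, method="inclusive")
    return point_forecast + cut_points[0], point_forecast + cut_points[-1]


# Each level gets its own range, estimated only from that level's errors.
intervals = {
    level: interval_90(point_forecasts[level], past_errors[level])
    for level in point_forecasts
}

for level, (lower, upper) in intervals.items():
    print(f"{level}: {lower:.1f} to {upper:.1f}")
```

Nothing in this procedure forces the lower-level ranges to add up to the higher-level ones. That is the trade-off of the approach: each range is calibrated to its own level, at the cost of consistency across the hierarchy.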


Aggregating measures that encompass or reflect uncertainty requires consideration of the underlying distributions and the context of the data. In future posts I may provide additional detail on common examples and approaches for each “type” outlined here.

  1. Similarly, going from higher-level bounds to lower levels takes more than dividing the lower bound in proportion to each lower level's share.

  2. Again, these approaches are similar to the “analytic” approaches in that they typically come with various distributional assumptions.
