Bayesian machine learning techniques give us a posterior density for each individual prediction instead of just a point estimate such as the mean. This additional information lets us understand and explore the uncertainty involved. However, not all uncertainty is the same. A model can be confident about the mean of a prediction even though the outcome itself has high variance, or it can be unsure about a prediction because the training set contained no inputs like the one being predicted. This post explores these two types of uncertainty and asks:
- whether these methods can recover the variance of the data itself, and
- how that compares with the uncertainty for inputs the model has not seen before (low-support regions of the data).
I generated a dataset in which the variance of \(y\) is a function of \(x\) while the mean stays at zero, as follows.
\[ x = -10, -9.9, \dots, 10 \] \[ y \sim \mathrm{Normal}(0,\ \sin(x) + 2) \]
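For readers who want to reproduce the data, here is a minimal sketch of the generating process in plain Python. It is not the code used for this post. It assumes that \(\sin(x) + 2\) is the standard deviation of the normal distribution (not the variance), and the function name `generate_dataset` and the `seed` argument are made up for illustration.

```python
import math
import random


def generate_dataset(seed=0):
    """Return (xs, true_sds, ys) for the heteroskedastic toy dataset."""
    rng = random.Random(seed)
    # x = -10.0, -9.9, ..., 10.0 (201 points); rounding avoids float drift.
    xs = [round(-10 + 0.1 * i, 1) for i in range(201)]
    # Assumption: sin(x) + 2 is the standard deviation, so it lies in [1, 3].
    true_sds = [math.sin(x) + 2 for x in xs]
    # Zero mean everywhere; only the spread changes with x.
    ys = [rng.gauss(0.0, sd) for sd in true_sds]
    return xs, true_sds, ys


xs, true_sds, ys = generate_dataset()
print(len(xs), xs[0], xs[-1])
```

Because the spread never drops below 1, every region of \(x\) is noisy; the question is whether a model can tell the noisier regions from the quieter ones.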
I will use bartMachine to model \(y=f(x)\) and compute the standard deviations of its predictions. bartMachine is an R package that fits Bayesian Additive Regression Trees (BART) and offers the option to extract the posterior distribution of predictions.
To answer the first question, I will look at the standard deviation of predictions within the interval [-10, 10] and compare it with the actual standard deviation. For the second, I will look at the standard deviation of predictions below -10 and above 10 to see how the model handles uncertainty for ‘unseen’ observations.
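The two comparisons can be sketched as follows, in plain Python rather than R. This assumes the posterior draws have already been extracted into a list of lists, one inner list of draws per input point. The names `posterior_sds` and `mean_sd_by_support` are hypothetical helpers, not part of bartMachine.

```python
import statistics


def posterior_sds(posterior_samples):
    """Standard deviation of the posterior draws for each input point.

    posterior_samples[i] is the list of posterior draws for point i.
    """
    return [statistics.stdev(draws) for draws in posterior_samples]


def mean_sd_by_support(xs, sds, lo=-10.0, hi=10.0):
    """Average predicted SD inside and outside the training interval."""
    inside = [s for x, s in zip(xs, sds) if lo <= x <= hi]
    outside = [s for x, s in zip(xs, sds) if x < lo or x > hi]
    mean_in = statistics.mean(inside) if inside else float("nan")
    mean_out = statistics.mean(outside) if outside else float("nan")
    return mean_in, mean_out


# Toy draws for three points (illustrative numbers, not results from the post).
draws = [[0.0, 2.0, 4.0], [1.0, 1.5, 2.0], [-3.0, 0.0, 3.0]]
sds = posterior_sds(draws)
print(mean_sd_by_support([-11.0, 0.0, 11.0], sds))
```

Averaging within each region is only one way to summarise the comparison; plotting the per-point standard deviations against \(x\) shows the same thing in more detail.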

    ## bartMachine initializing with 50 trees...
    ## bartMachine vars checked...
    ## bartMachine java init...
    ## bartMachine factors created...
    ## bartMachine before preprocess...
    ## bartMachine after preprocess... 2 total features...
    ## bartMachine sigsq estimated...
    ## bartMachine training data finalized...
    ## Now building bartMachine for regression ...
    ## evaluating in sample data...done

Above we can see the standard deviation predicted by bartMachine compared with the actual standard deviation from the data-generating process. While the predicted standard deviation is much lower, it does appear to follow the same general pattern. A scatterplot of predicted versus actual standard deviation, below, shows a generally positive relationship.
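One way to put numbers on "follows the pattern but is much lower" is a correlation between predicted and actual standard deviations, plus the average ratio between them. The sketch below is plain Python, not the code behind the plots. The helper names `pearson` and `mean_ratio` are made up, and the example values are illustrative only.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def mean_ratio(predicted, actual):
    """Average of predicted / actual; values below 1 mean under-estimation."""
    return sum(p / a for p, a in zip(predicted, actual)) / len(actual)


# Illustrative values only: predictions track the truth but sit lower.
actual_sd = [1.0, 2.0, 3.0, 2.0]
predicted_sd = [0.4, 0.7, 1.1, 0.8]
print(pearson(predicted_sd, actual_sd), mean_ratio(predicted_sd, actual_sd))
```

A correlation near 1 together with a ratio well below 1 would match what the plots suggest: the shape is right, the scale is not.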
Also, the observations outside the interval [-10, 10] get a high variance compared with most other observations. Curiously, the predicted standard deviations above 10 are larger than those below -10. I’m assuming this is just noise.
Bayesian ML techniques such as BART appear to be able to capture how the variance changes across a dataset. While the predicted standard deviations are not calibrated, they can serve as a guide to the heteroskedastic nature of a dataset.
In addition, inputs outside the range of the original training set get higher predicted variances. However, I’m not sure how to differentiate between high variance in the data-generating process itself and low support for new inputs.