Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’m reviewing Bayes’ theorem and related topics for the upcoming GDAT class. In its simplest form, Bayes’ theorem is statement about conditional probabilities. The probability of A, given that B has occurred, is expressed as: \begin{equation} \Pr(A|B) = \dfrac{\Pr(B|A)\times\Pr(A)}{\Pr(B)} \label{eqn:bayes} \end{equation} In Bayesian language, $\Pr(A|B)$ is called the posterior probability, $\Pr(A)$ the prior probability, and $\Pr(B|A)$ the likelihood. Bayes’ original conception had to do with capturing the notion that our certainty about the state of the world is altered by the arrival of new information (data). Exactly how how eqn. \eqref{eqn:bayes} should be interpreted has been the subject of vigorous historical debate (which I won’t go into here). Part of the problem was that the early notions of probability arose out of trying to formalize scenarios where the outcome was not known in advance; like games of chance and betting. The basic concept there is odds, not probability. This is still true today. Bookmakers calculate odds based on what they believe about certain information they are privy to. Consequently, Bayes used the word belief in his essay, rather than the word probability, and many critics found the Bayesian approach too subjective for determining scientific truth. Today, however, Bayesian inference is a hot topic and a central part of machine learning.

One of the things that had always intrigued me was something I read about the famous French mathematician Pierre Simon Laplace using Bayes’ theorem to estimate the mass of the planet Saturn in the early nineteenth century and that his method turned out to be remarkably accurate. As a matter of historical fact, Laplace (the Frenchman), working on mathematical theories of probability during the period 1812 to 1816, was entirely unaware of the prior work of Bayes (the Englishman) published posthumously in 1763. Rather, Laplace discovered a more generalized version of Bayes’ theorem on his own. \begin{equation} \Pr(M~|~D,I) = \dfrac{\Pr(D~|~M,I)\times\Pr(M ~|~I)}{\Pr(D~|~I)} \label{eqn:lapbayes} \end{equation} where $M$ is the mass of the planet (in particular, the masses of Jupiter, Saturn and Uranus), $D$ is data from various measurements of planetary orbits, and $I$ is existing background information, e.g., Newton’s laws of celestial mechanics. You can see the similarity between eqns. \eqref{eqn:bayes} and \eqref{eqn:lapbayes}.

Although many authors refer to Laplace’s work, as well as remarking that his estimate of the mass of Saturn only differs from the modern value by about 0.5%, they never show how Laplace actually applied eqn. \eqref{eqn:lapbayes} to achieve his result. However, I just came across a translation of Laplace’s calculation and I am going to reproduce it here using R. I’ll try to stick with Laplace’s notation as much as possible and explain what it means in modern statistical terminology.

Laplace credits astronomer and former student Monsieur Alexis Bouvard with having already compiled measurements and calculations that facilitated his Bayesian approach. Out of some 129 equations for Saturn’s motion involving perturbations due to the tidal interaction between Jupiter and Saturn, as well as their moons (i.e., orbital wobbles) the following constants are distilled. \begin{equation} z^{\prime} = 0.00620 \label{eqn:eqnz} \end{equation} Laplace actually writes the decimal fractions as $0,00620$ using the European comma convention. I’ll use the decimal point notation for clarity. \begin{equation} \log P = 4.8856829 \label{eqn:logP} \end{equation} Here, it turns that log means logarithm to the base 10, not the natural log; as it does in R. This difference in naming screwed up my original calculations. In R notation, we can simply write the numerical value of P as \begin{equation} P = \exp(4.885683 * \log(10)) \label{eqn:eqnP} \end{equation} where log is now the natural logarithm.

In Laplace’s presentation, the mass of Saturn is then expressed as \begin{equation} \dfrac{1}{3534.08} (1 + z^{\prime}) \end{equation} Substituting the value in eqn. \eqref{eqn:eqnz} for $z^{\prime}$ produces: \begin{equation} \dfrac{1}{3534.08} (1 + 0.00620) = \dfrac{1}{3512.3} \end{equation} for the corrected mass of Saturn. This looks a bit strange because the currently accepted value is $568.36 \times 10^{24}$ kilogram, but Laplace is not using MKS units. Instead, he expresses the mass as a fraction of the mass of the Sun. In other words, the mass of Saturn is $1/3512.3$ solar masses. Then, he says:

“The probability that the error of $z^{\prime}$ is comprehended within the limits $\pm~U$ is, by $\S~1$, \begin{equation*} \dfrac{\sqrt{P}}{\sqrt{\pi}} \int ~du ~c^{-P ~u^2} \, , \end{equation*} the integral being taken from $u = -U$ to $u = U$.”
The use of the letter $c$ instead of $e$ in the integral is most likely a typo in translation, since Euler had already introduced the base $e = 2.718282$ in the early part of the eighteenth century and Laplace seems to have used it in other publications. I found that if I identify the quantity $P$ in eqn. \eqref{eqn:eqnP} (which he calls a weight) with the variance \begin{equation} P = \dfrac{1}{2 \sigma^2} \end{equation} then the above integral can be rewritten as \begin{equation} \Pr(M_{Sat}~|~D,I) = \dfrac{1}{\sigma \sqrt{2 \pi}} \int_{-U}^{U}~e^{ -\frac{x^2}{2\sigma^2} } ~dx \label{eqn:satmass} \end{equation} which is immediately recognizable as a normal distribution with zero mean and variance $\sigma^2$. Elsewhere in his presentation, Laplace makes the association with the posterior probability of eqn. \eqref{eqn:lapbayes}. He goes on to claim
“the probability that this mass is comprehended within the limits \begin{equation*} \dfrac{1}{3512.3} ~\pm ~\dfrac{1}{100}~\dfrac{1}{3534.08} \end{equation*} is equal to $\dfrac{11327}{11328}$.”
“Applying to them my formulae of probability I find that it is a bet of 11,000 against one that the error of this result is not 1/100 of its value or that which amounts to almost the same that after a century of new observations added to the preceding ones and examined in the same manner the new result will not differ by 1/100 from that of M. Bouvard.”
Note the use of bookmaking parlance.

It becomes easier to understand what Laplace is saying if we plot the function represented by eqn. \eqref{eqn:satmass}.

The normal probability density function is centered at zero and the relevant probability is given by the blue area under the curve between the limits $\pm U$, indicated by the vertical dashed lines. It resembles a plot of the error distribution in the mass calculation. The horizontal blue line is the full width at half max height (FWHM). The horizontal red line is twice the standard deviation ($\sigma$) corresponding to the reduced FWHM. (See Visualizing Variance) Here is the R code that computes the above plot.

# Created by NJG on Saturday, September 14, 2013
# SATURN constants
z    = 0.00620
P    = exp(4.885683 * log(10))
sd   = sqrt(1/(2*P))
pSat = 11327 / 11328

# Msat is expressed as solar mass fraction
Fnorm <- function(x) { sqrt(P/pi) * exp(-P * x^2) }
# Numerically integrate the normal dsn
U     <- 1/100
prob  <- integrate(Fnorm, -U, U)

# Compare the probabilities
print(prob)
print(pSat)

# Plot the posterior dsn
pts <- 1000
x <- seq(-U,U,length=pts)
plot(x, Fnorm(x), type="l",
xlab=expression(paste("Error in ", M[Sat])),
ylab=expression(paste("Posterior PDF for ", M[Sat]))
)
polygon(c(x,rev(x)), c(rep(0,pts),rev(Fnorm(x))), col=colors())
abline(v=0)
abline(h=0, v=c(-sd,sd),col="gray")
abline(v=c(-U,U),col="blue",lty="dashed")
abline(v=c(-4*sd,4*sd),col="red")
HM   <- sqrt(P/pi)/2
HW <- uniroot(function(x) Fnorm(x) - HM, c(0,1))$root sdHW <- 2 * HW / (2+1/3) sdHH <- 1.2 * HM lines(x=c(-HW, HW), y=c(HM,HM), col="blue", lwd=2) abline(h=HM,lty="dotted") lines(x=c(-sdHW, sdHW), y=c(sdHH, sdHH), col="red", lwd=2) arrows(x0=0,x1=sd,y0=50,y1=50,length = 0.10) text(x=sd/2, y=55, expression(sigma)) arrows(x0=0,x1=4*sd,y0=30,y1=30,length = 0.10) text(x=4*sd/2, y=35, expression(4*sigma))  One of the problems I ran into was determining the value for$U$. Laplace never states explicitly what value he uses. It turns out to be$U = 0.01$, i.e., the$1/100$or 1% that appears in the interval that contains the mass of Saturn. Calculating the definite integral with those limits produces \begin{equation*} \Pr(M_{Sat}~|~D,I) = 0.9999117 \end{equation*} which corresponds to the fraction$11327 ~/~ 11328$claimed by Laplace. Once again, using the language of gamblers, Laplace remarks: "Newton had found, by the observations of Pound on the greatest elongation of the fourth satellite of Saturn, the mass of this planet equal to$1/3012$, that which surpasses by a sixth the preceding result. There are odds of millions of billions against one that the one of Newton is in error." What does all this mean? •$U = \pm~0.01$corresponds to an interval slightly less than$\pm ~4 \sigma$. • It looks like a confidence interval, but it isn't in the modern sense because the number of measurements is essentially$n = 1$. • Similarly, we can't speak of$M_{Sat}$as a mean value. • The CI that corresponds to 99% (i.e.,$1 - 0.01$) is closer to$3 \sigma\$
• Supposedly, that's the point of the Bayesian approach. You can get away with a good guess for the prior.

Or did Laplace just get lucky?