**Econometrics Beat: Dave Giles' Blog** (kindly contributed to R-bloggers)

One question that always comes up when students are first being introduced to statistical tables is:

“Do I just interpolate linearly between the nearest entries on either side of the desired value?”

The correct answer to the question is: “No – not in general”. To which we should add that the correct way to deal with this situation depends on:

- Which distribution we’re dealing with.
- Whether we are trying to retrieve a “missing” *quantile* for the distribution, or trying to determine a tail area (*probability*) associated with a given quantile.

With regard to the first of these calculations, suppose that we have access to this table of percentiles for the Chi-square distribution. Using *this table alone*, the best that we could do by way of reporting the 70th percentile when the degrees of freedom are 67 would be to say that it lies somewhere between 46.459 and 85.527. Let’s face it – that’s not particularly helpful!

A similar situation arises if we want to compute the p-value for the F-distribution mentioned above, and we have access only to the tables that we usually find in text books.

Of course, if we have access to our favourite econometric/statistical package, it’s a simple matter to compute the desired value. To get the exact answers with EViews, we’d use the command:

**scalar q70 = @qchisq(0.70, 67)**

and the answer would be 72.554.

Similarly, the p-value referred to above could be computed exactly by using the command:

**scalar pval = 1 – @cfdist(0.65,17,33)**

and the answer would be 0.826.

In R, we’d use the commands:

**qchisq(0.70, df=67)** and

**pf(0.65, 17, 33, lower.tail=FALSE)**

to produce exactly the same results.

But what about the answer to your student’s question about interpolation? Suppose you’re stuck with the statistical table, and you don’t have access to your econometrics or statistical package. What do you do?

We could look at various common distributions here. However, by way of illustration, let’s take just the **Student-t distribution**. We’ll consider appropriate ways to compute (interpolate) a *quantile* (*percentile*), first for a non-tabulated tail-area probability, and secondly for non-tabulated degrees of freedom.

1. *Interpolating Between Tail Areas*

Suppose that we have v = 20 degrees of freedom, and we want to find the quantile (percentile) that will give us an area of 0.075 under the *right* tail of the Student-t density function.

Note that 0.075 is mid-way between 0.05 and 0.10, so any standard table (*e.g.*, here) will tell us that the quantile we want is between 1.325 and 1.725. The (*arithmetic*) average of these two percentiles is (1.325 + 1.725) / 2 = 1.525.

What we’re actually doing is linearly (straight-line) interpolating between the two original values:

1.525 = 1.325 + (1.725 – 1.325)[(0.075 – 0.1) / (0.05 – 0.1)]
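This straight-line calculation is easy to reproduce; here is a minimal Python sketch (the helper function name is ours, not from the original post):

```python
def linear_interp(x, x0, x1, y0, y1):
    """Straight-line interpolation of y at x, given points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Tail areas 0.10 and 0.05 bracket the desired area 0.075;
# the tabulated t(20) quantiles are 1.325 and 1.725.
q_linear = linear_interp(0.075, 0.10, 0.05, 1.325, 1.725)
print(round(q_linear, 3))  # 1.525
```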

However, using either the EViews command, **scalar qval = @qtdist(0.925, 20)**, or the R command, **qt(0.925, df=20)**, we get the answer 1.497036. This *isn’t* the result that we got by linearly interpolating between the percentiles on either side!

(Recall that we wanted a *right-tail* area of 0.075, which implies a *left-tail* area of 0.925.)

Why isn’t the linear interpolation working? Well, intuitively, it’s a consequence of the curvature of the density function (or the distribution function).

So, how can we deal with this?

Hoaglin *et al.* (1991) show that we can get a very good approximation to the correct value for the quantile (1.497036) by interpolating using the (base-10) logarithms of the tail areas to construct the interpolating weights.

Noting that log10(0.075) = -1.1249387366083; log10(0.1) = -1.0; and log10(0.05) = -1.301029995663981, the appropriate calculation for the desired quantile becomes:

q = 1.325 + (1.725 – 1.325)[(-1.1249387366083 – (-1)) / (-1.301029995663981 – (-1))] = 1.49104

That’s more like it, even though it isn’t perfect!
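The log-area interpolation described by Hoaglin et al. can be sketched in the same way (again, the function name is ours):

```python
import math

def log_tail_interp(p, p0, p1, q0, q1):
    """Interpolate the quantile at tail area p by interpolating linearly
    in log10 of the tail areas (after Hoaglin et al., 1991)."""
    w = (math.log10(p) - math.log10(p0)) / (math.log10(p1) - math.log10(p0))
    return q0 + (q1 - q0) * w

# t(20): tail areas 0.10 and 0.05 have tabulated quantiles 1.325 and 1.725.
q_log = log_tail_interp(0.075, 0.10, 0.05, 1.325, 1.725)
print(round(q_log, 3))  # 1.491 – much closer to the exact 1.497036 than the linear 1.525
```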

2. *Interpolating Between Degrees of Freedom*

Now let’s consider a different situation – one where the table that we’re using for the t-distribution includes the (right-hand) tail area that we want – say, 5%. However, our particular degrees of freedom (v) are nowhere to be found in the table.

Actually, this isn’t likely to be as much of a problem as the situation we’ve just considered above. Most t-tables cover every degree of freedom, for small-to-moderate values of v; and when v is large the standard-Normal approximation will usually suffice.

However, let’s suppose that we want an accurate answer, and by way of an example, consider a 5% (right) tail area, and 53 degrees of freedom.

In this case we can use an *harmonic interpolation*, rather than a linear one. This is like switching from an arithmetic mean to an harmonic mean, so we use the values of (1 / v) to construct the interpolation weights.

(Isn’t it nice to come across a situation where that harmonic mean that you learned about actually gets used! Another situation arises with certain index numbers.)

Let’s go back to our example. From any Student-t table you can find that the quantiles that determine 5% in the *right tail* of the density are 1.671 and 1.684 for v = 60 and v = 40, respectively. Here’s what happens if you just use a naive linear interpolation to get the quantile for v = 53:

q = 1.671 + (1.684 – 1.671)[(53 – 60) / (40 – 60)] = 1.6755

Using the EViews command **scalar q = @qtdist(0.95, 53)**, or the R code **qt(0.95, df = 53)**, you can verify that the exact answer is 1.674116.

Applying the harmonic interpolation, we get:

q = 1.671 + (1.684 – 1.671)[((1/53) – (1/60)) / ((1/40) – (1/60))] = 1.67443.
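The two degrees-of-freedom schemes can be compared side by side in a short Python sketch (the helper names are ours, not from the original post):

```python
def linear_df_interp(v, v0, v1, q0, q1):
    """Interpolate linearly in v itself (the naive approach)."""
    return q0 + (q1 - q0) * (v - v0) / (v1 - v0)

def harmonic_df_interp(v, v0, v1, q0, q1):
    """Interpolate linearly in 1/v ("harmonic" interpolation)."""
    return q0 + (q1 - q0) * (1/v - 1/v0) / (1/v1 - 1/v0)

# 5% right-tail t quantiles: 1.671 at v = 60, 1.684 at v = 40.
# The exact t(53) quantile is 1.674116.
q_naive = linear_df_interp(53, 60, 40, 1.671, 1.684)
q_harm = harmonic_df_interp(53, 60, 40, 1.671, 1.684)
print(round(q_naive, 5), round(q_harm, 5))  # 1.67555 1.67443
```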

This is certainly *a little* more accurate than the result obtained above by linear interpolation. However, the gain in accuracy will vary, depending on the significance level and the degrees of freedom that we’re considering.

When we’re working with the Chi-Square, F, or other distributions, similar methods are available. However, the details differ a little from those used for the Student-t distribution. The references given below will give you some guidance.

In a sense, the take-away message here is very simple – don’t use “straight-line” interpolation when the function is curved!

**References**

Hoaglin, D. C., F. Mosteller, and J. W. Tukey, 1991. *Fundamentals of Exploratory Analysis of Variance*. Wiley, New York.

Salton, G., 1959. The use of the central limit theorem for interpolating in tables of probability distribution functions. *Mathematical Tables and Other Aids to Computation*, 13, 213-216.

Zinger, A., 1964. On interpolation in tables of the F-distribution. *Journal of the Royal Statistical Society, Series C (Applied Statistics)*, 13, 51-53.
