Worrying About my Cholesterol Level

[This article was first published on Econometrics Beat: Dave Giles' Blog, and kindly contributed to R-bloggers.]

The headline, “Don’t Get Wrong Idea About Cholesterol”, caught my attention in the 3 May, 2015 Times-Colonist newspaper here in Victoria, B.C. In fact, the article came from a syndicated column, published about a week earlier. No matter – it’s always a good time for me to worry about my cholesterol!

The piece was written by a certain Dr. Gifford-Jones (AKA Dr. Ken Walker).

Here’s part of what he had to say:

“Years ago, Dr. John Judkin, formerly emeritus professor of physiology at the University of London, was ridiculed after he reported that a high dietary intake of animal fat and the eating of foods containing cholesterol were not the cause of coronary heart disease.
But Judkin pointed to a greater correlation between the intake of sucrose (ordinary sugar) and coronary attack. For instance a study in 15 countries showed that as the population consumed more sugar, there was a dramatic increase in heart attacks. 
More impressive is a prison study by Milton Winitz, a U.S. biochemist, in 1964. Eighteen prisoners, kept behind bars for six months, were given food that was regulated. In this controlled environment, it was proven that when the prisoner diet was high in sugar, blood cholesterol increased and when dietary sugar was decreased there was a huge drop in blood cholesterol.”
I’ve got nothing against the good doctor, but you’ll notice that I’ve highlighted a few key words in the material quoted above. I’m sure I don’t need to explain why!

What he’s referring to is research reported by Winitz and his colleagues in the 1964 paper, “The effect of dietary carbohydrate on serum cholesterol levels” (Archives of Biochemistry and Biophysics, 108, 576-579). Interestingly, the findings outlined in that paper were a by-product of the main research that was being undertaken with NASA sponsorship – research into the development of diets for astronauts!

In his famous book, How to Live Longer and Feel Better, the Nobel laureate Linus Pauling refers to this study by Winitz et al.:
“These investigators studied 18 subjects, who were kept in a locked institution, without access to other food, during the whole period of study (about 6 months). 

After a preliminary period with ordinary food, they were placed on a chemically well-defined small molecule diet (seventeen amino acids, a little fat, vitamins, essential minerals, and glucose as the only carbohydrate).

The only significant physiological change that was found was in the concentration of cholesterol in the blood serum, which decreased rapidly for each of the 18 subjects.

The average concentration in the initial period, on ordinary food, was 227 milligrams per deciliter. After two weeks on the glucose diet it had dropped to 173, and after another two weeks it was 160.

The diet was then changed by replacing one quarter of the glucose with sucrose, with all the other dietary constituents the same. Within one week the average cholesterol concentration had risen from 160 to 178, and after two more weeks to 208.

The sucrose was then replaced by glucose. Within one week the average cholesterol concentration had dropped to 175, and it continued dropping, leveling off at 150, 77 less than the initial value.” (p.42)

Does any of this constitute proof? Of course not!

But let’s take a look at the actual data and undertake our own statistical analysis of this very meagre set of information. From Winitz et al., p.577, we have:

(The data are available in a text file on the data page for this blog. As you can see, the sample size is extremely small – only 18 people were “treated”.)

The authors summarize their key findings as follows (p. 578):
“On the basis of a statistical analysis of the mean values of the serum cholesterol levels at the end of the 4th, 7th, 8th, and 19th weeks (see Table I), the following conclusions were drawn (95% confidence level): (a) each of the two progressive decreases in serum cholesterol level with the diet containing glucose as the sole sugar is statistically significant, and (b) the progressive increase in serum cholesterol level upon partial substitution of the glucose with sucrose is also statistically significant.”  
Interestingly, exactly what they mean by “a statistical analysis” is not explained!

(Don’t try getting away with that lack of specificity, kids!)

So, the claim is that there’s a significant difference between the “before treatment” and “after treatment” results. I must admit to being somewhat skeptical about this, given the tiny number of candidates being treated. However, let’s keep an open mind.

Crucially, we’re not told by the researchers what statistical tests were actually performed!

However, given that we have a group of people with “before treatment” and “after treatment” scenarios, paired t-tests for the equality of the means provide a natural way to proceed. This requires that we have simple random sampling and that the population is normal. More specifically, the differences between the “before” and “after” data have to be normally distributed.
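To make the mechanics of the paired t-test concrete, here is a minimal Python sketch using only the standard library. The function name `paired_t` and the readings are made up for illustration; they are not the Winitz et al. data and this is not the EViews workflow used in this post. The statistic is simply the mean of the within-subject differences divided by its standard error.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired t-test.

    The test works on the within-subject differences (after - before).
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical cholesterol readings (mg/dl) for six subjects.
# These are illustrative values, NOT the data from Winitz et al.
before = [230, 215, 240, 222, 228, 235]
after = [180, 170, 190, 165, 176, 181]

t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

A negative statistic here means the "after" readings are lower on average than the "before" readings.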

Just for funzies, let’s use a variety of resources in our statistical analysis.

My EViews workfile can be found on the code page for this blog. Take a look at the “README” text object in that file for some details of what I did.

Some basic Q-Q plots support the normality assumption for the data differences. Here’s just one typical example – it’s for the difference (“D67”) between the “Week 7” and “Week 6” data in Table 1, above:
Moreover, the Anderson-Darling test for normality (with a sample size adjustment) produces p-values ranging from 0.219 to 0.925. This is very strong support for the normality assumption, especially given the known desirable power properties of this test.
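For readers without EViews, the following Python sketch shows one common way to compute an Anderson-Darling normality test with a small-sample adjustment. It assumes the adjustment and p-value approximation usually attributed to D'Agostino and Stephens, which may not be exactly what EViews implements. The function name and the 18 differences are hypothetical, not the values from Table 1.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(x):
    """Return (adjusted A^2 statistic, approximate p-value).

    Mean and standard deviation are estimated from the sample.
    """
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    u = [fitted.cdf(v) for v in sorted(x)]
    s = sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1 - u[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - s / n
    # Small-sample adjustment (assumed form; see lead-in).
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    # Piecewise approximation to the p-value.
    if a >= 0.6:
        p = math.exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)
    elif a >= 0.34:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    elif a >= 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a ** 2)
    return a, p

# Hypothetical paired differences for 18 subjects (illustration only).
diffs = [12, -3, 8, 15, 4, 9, -1, 6, 11, 7, 2, 10, 5, 13, 0, 8, 3, 6]
a_stat, p_value = anderson_darling_normal(diffs)
print(f"adjusted A^2 = {a_stat:.3f}, p = {p_value:.3f}")
```

A large p-value means the test finds no evidence against normality; with only 18 observations that is weaker than a positive confirmation.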

The assumption of random sampling is somewhat more problematic. Were the 18 inmates selected randomly from the population at large? I don’t think so! There’s nothing we can do about this, except to be cautious if we try to extrapolate the results of the study to a more general population. Which is, of course, what Dr. Gifford-Jones and others are doing.

Now, what about the results for the paired t-tests themselves?

Here they are, with p-values in parentheses. The naming of the t-statistics and their p-values follows the column headings in Table 1, above, with “1½” and “2½” abbreviated to “1” and “2” respectively. For example, “t02” and “p02” refer to the test between the “0 Weeks” and “2 Weeks” data. Similarly, “t819” and “p819” refer to the test between the “8 Weeks” and “19 Weeks” data, etc.

Table 2

Phase I                                               Phase II                   Phase III
t01      t02      t04      t12      t14      t24      t56      t57      t67      t819
-7.92    -7.33    -9.82    -1.10    -2.88    -2.21    4.92     7.54     3.71     -4.12
(0.00)   (0.00)   (0.00)   (0.15)   (0.01)   (0.02)   (0.00)   (0.00)   (0.00)   (0.00)

Take a look back at footnote “b” in Table 1 above. You’ll see that negative t-statistics are expected everywhere in Table 2 except during “Phase II” of the trials, if we believe the hypothesis that (prolonged) high sucrose intakes are associated with high cholesterol levels.

In all but one instance, the paired t-tests give results that are significant at the 5% level.
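If you want to check p-values like those in Table 2 without specialised software, the sketch below recomputes them from the reported t-statistics in plain Python. It rests on two assumptions that the table itself does not state: that each test has 18 - 1 = 17 degrees of freedom, and that the p-values are one-sided in the expected direction (negative in Phases I and III, positive in Phase II). The Student-t CDF is obtained by numerical integration, so the results are approximate.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def one_sided_p(t, df, expect_negative):
    """One-sided p-value in the direction the hypothesis predicts."""
    return t_cdf(t, df) if expect_negative else 1 - t_cdf(t, df)

DF = 17  # assumption: 18 subjects, so 17 degrees of freedom

# t-statistics as reported in Table 2.
table2 = {
    "t01": -7.92, "t02": -7.33, "t04": -9.82, "t12": -1.10, "t14": -2.88,
    "t24": -2.21, "t56": 4.92, "t57": 7.54, "t67": 3.71, "t819": -4.12,
}
phase_two = {"t56", "t57", "t67"}  # positive statistics expected here

p_values = {
    name: one_sided_p(t, DF, expect_negative=name not in phase_two)
    for name, t in table2.items()
}
for name, p in p_values.items():
    print(f"{name}: t = {table2[name]:6.2f}, one-sided p = {p:.3f}")
```

Because the t-statistics in the table are rounded to two decimals, the recomputed p-values can differ slightly in the last digit from those shown.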

Now, it’s all very well to have obtained these results, but we might ask – “How powerful is the paired t-test when we’re working with such small samples?”

To answer this question I decided to adapt some code that uses the “pwr” package in R, kindly provided by Jake Westfall on the Cross Validated site. The code requires the value of the so-called “effect size”, which I computed for our data to be equal to 1, using this online resource. The “tweaked” R code that I used is available on my code page.

For a particular sample correlation between the paired data, the code computes the (minimum) number of pairs needed for a paired t-test to have a desired power when the significance level is 5%.

Table 3

The pair-wise sample correlations in the data set we’re examining (the relevant columns in Table 1) range between 0.696 and 0.964. So, in Table 3, it turns out that even for the sample sizes that we have, the powers of the paired t-tests are actually quite respectable. For example, the sample correlation for the data for Weeks 1 and 2 is 0.898, so a sample size of at least 5 is needed for the test of equality of the corresponding means to have a power of 99%. This is for a significance level of 5%. This minimum sample size increases to 6 if the significance level is 1% – you can re-run the R code to verify this.
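As a rough cross-check on this kind of power calculation, here is a small Monte Carlo sketch in Python. It is not the pwr-based R code referred to above. It assumes a two-sided test at the 5% level, bivariate normal data with unit variances, and an effect size defined as the mean difference in units of that common standard deviation. The function name is hypothetical, and the critical values are the usual tabulated ones for Student's t.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_5PCT = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
               6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}

def simulated_power(n_pairs, effect_size, corr, n_sims=2000, seed=1):
    """Estimate the power of a paired t-test by simulation.

    Each pair (x, y) is bivariate normal with unit variances,
    correlation `corr`, and mean difference `effect_size`.
    """
    rng = random.Random(seed)
    crit = T_CRIT_5PCT[n_pairs - 1]
    rejections = 0
    for _ in range(n_sims):
        diffs = []
        for _ in range(n_pairs):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            x = z1
            y = effect_size + corr * z1 + math.sqrt(1 - corr ** 2) * z2
            diffs.append(y - x)
        se = statistics.stdev(diffs) / math.sqrt(n_pairs)
        if abs(statistics.mean(diffs) / se) > crit:
            rejections += 1
    return rejections / n_sims

# Effect size 1 and correlation 0.898, as quoted in the text above.
for n in range(3, 9):
    power = simulated_power(n, effect_size=1.0, corr=0.898)
    print(f"pairs = {n}: estimated power = {power:.3f}")
```

Being a simulation, the estimates carry sampling noise and need not agree to the last digit with an exact noncentral-t calculation, but they show how quickly power climbs with the number of pairs when the paired data are highly correlated.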

At the end of the day, the small number of people included in the experiment was probably not a big problem. However, don’t forget that (questionable) assumption of independent sampling.

In any case, I’m going to cut back on my sugar intake and get more exercise!

© 2015, David E. Giles
