Discovering Argon with the 2-Sample t-Test

[This article was first published on The Chemical Statistician » R programming, and kindly contributed to R-bloggers.]

I learned about Lord Rayleigh’s discovery of argon in my 2nd-year analytical chemistry class while reading “Quantitative Chemical Analysis” by Daniel Harris.  (William Ramsay was also responsible for this discovery.)  This is one of my favourite stories in chemistry; it illustrates how diligence in measurement can lead to an elegant and surprising discovery.  I find no evidence that Rayleigh and Ramsay used statistics to confirm their findings; their paper was published 13 years before Gosset introduced the t-test.  So I will use a 2-sample t-test in R to confirm their result.

[Photos of Lord Rayleigh and William Ramsay.  Source: Wikimedia Commons]

Rayleigh and Ramsay

John William Strutt (1842-1919), better known as Lord Rayleigh, was a very accomplished physicist and chemist known for Rayleigh scattering (which explains why the sky is blue), Rayleigh waves (surface acoustic waves in elastic solids) and the Rayleigh-Jeans law (which attempted to describe blackbody radiation but failed at short wavelengths, resulting in the ultraviolet catastrophe and motivating the development of quantum mechanics).  Out of all of his accomplishments, he was recognized for the discovery of argon with the Nobel Prize in Physics in 1904.

William Ramsay (1852–1916) was a chemist at University College London.  His research focused on nitrogen oxides until he learned about Rayleigh’s work and came to share Rayleigh’s curiosity about a discrepancy between two measured masses of nitrogen.  He and Rayleigh collaborated closely on the discovery of argon, and he won the Nobel Prize in Chemistry in 1904 for discovering the noble gases.

I could only find documentation about Rayleigh’s experimentation and data on the discovery of argon, but it is clear that Ramsay was responsible for this discovery, too, and the two scientists communicated many times about their work on this same problem.  Even though I will refer to Rayleigh’s work and data in this blog post, Ramsay should also be recognized.

The Nitrogenous Origin

The discovery of argon can be traced back to Rayleigh’s original intent of measuring the densities of various gases in 1882.  His first publication on this work appeared in 1888, and it discussed the relative densities of oxygen and hydrogen.  He later focused on the density of gaseous nitrogen, which he measured in 2 ways.

1) Scientists knew that air contained roughly four-fifths nitrogen and one-fifth oxygen, so Rayleigh sought to obtain nitrogen by passing air over hot copper, removing oxygen via the reaction

O2(g)  +  2Cu(s)   →   2CuO(s)

2) Rayleigh also produced nitrogen by bubbling air through liquid ammonia and then through a hot tube.

3O2(g)  +  4NH3(l)   →   6H2O(l)  +  2N2(g)

The water was then removed with a drying agent.

The Unexpected Result

Rayleigh discovered that the nitrogen from Method #1 (“atmospheric” nitrogen) was 2.3 mg heavier than the nitrogen from Method #2 (“chemical” nitrogen).  While that may seem very small and quite possibly due to random error, he was confident that there was something in “atmospheric” nitrogen that caused this difference.  Rayleigh sent a letter to Nature describing these results, and William Ramsay responded to Rayleigh with similar puzzlement.

Thus, Rayleigh decided to make nitrogen from the two sources with slightly different methods.

1) In addition to passing air over copper, Rayleigh also isolated nitrogen by

– passing air over hot iron to remove the oxygen

– passing air over freshly precipitated ferrous hydrate (iron(II) hydroxide) to remove oxygen

2) In addition to bubbling air through ammonia, Rayleigh also used chemical decomposition of the following nitrogenous compounds to isolate nitrogen:

– nitric oxide

– nitrous oxide

– urea

– ammonium nitrite

Exploring the Data

Here are the mass data for the two types of nitrogen, “atmospheric” and “chemical”, as entered into and displayed in R.  Note that these are not paired data; 7 measurements were collected for nitrogen from air, and 8 measurements were collected for nitrogen from chemical decomposition.  I created a data frame called nitrogen.masses; note the use of the colnames() function to set the column names.

##### Analyzing Lord Rayleigh's Data on Nitrogen and Discovering Argon with the 2-Sample t-Test
##### Written by Eric Cai - The Chemical Statistician

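# Masses (g) of nitrogen isolated from air; padded with NA so both columns have 8 entries
atmospheric.nitrogen = c(2.31017, 2.30986, 2.31010, 2.31001, 2.31024, 2.31010, 2.31028, NA)
# Masses (g) of nitrogen obtained by chemical decomposition
chemical.nitrogen = c(2.30143, 2.29890, 2.29816, 2.30182, 2.29869, 2.29940, 2.29849, 2.29869)
# Combine the two vectors into a data frame and give the columns readable names
nitrogen.masses = data.frame(atmospheric.nitrogen, chemical.nitrogen)
colnames(nitrogen.masses) = c('Nitrogen from Air', 'Nitrogen from Chemical Decomposition')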

> nitrogen.masses
           Nitrogen from Air            Nitrogen from Chemical Decomposition
1          2.31017                      2.30143
2          2.30986                      2.29890
3          2.31010                      2.29816
4          2.31001                      2.30182
5          2.31024                      2.29869
6          2.31010                      2.29940
7          2.31028                      2.29849
8          NA                           2.29869
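The NA padding above is only needed because the columns of a data frame must have equal length.  As a minimal sketch of an alternative, the same measurements can be stored in "long" format, with one row per measurement and a factor that labels the source.  The names air.masses, chem.masses, nitrogen.long, mass and source are my own illustrative choices, not part of the original analysis.

```r
# The same 7 + 8 measurements (g), stored without NA padding.
air.masses <- c(2.31017, 2.30986, 2.31010, 2.31001, 2.31024, 2.31010, 2.31028)
chem.masses <- c(2.30143, 2.29890, 2.29816, 2.30182, 2.29869, 2.29940, 2.29849, 2.29869)

# Long format: one row per measurement, with a factor labelling the source.
nitrogen.long <- data.frame(
  mass = c(air.masses, chem.masses),
  source = factor(rep(c("air", "chemical"),
                      times = c(length(air.masses), length(chem.masses))))
)

# Sample size, mean and standard deviation for each source.
group.n <- tapply(nitrogen.long$mass, nitrogen.long$source, length)
group.mean <- tapply(nitrogen.long$mass, nitrogen.long$source, mean)
group.sd <- tapply(nitrogen.long$mass, nitrogen.long$source, sd)
print(data.frame(n = group.n, mean = group.mean, sd = group.sd))
```

In this layout the unequal sample sizes are explicit, and functions that accept a formula, such as boxplot(mass ~ source, data = nitrogen.long), can be used directly.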

It is easy to see that the masses in the first column are all slightly larger than the masses of the second column – but by how much?  A box plot is a useful way to visualize this contrast; it shows the following summary statistics of a set of data:

– lower whisker: Max{Min, Q1 – 1.5IQR}, where Min is the minimum, Q1 is the 1st quartile, and IQR is the interquartile range

– 1st quartile (25th percentile)

– median (50th percentile)

– 3rd quartile (75th percentile)

– upper whisker: Min{Max, Q3 + 1.5IQR}, where Max is the maximum, Q3 is the 3rd quartile, and IQR is the interquartile range
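The five statistics listed above can be inspected numerically with boxplot.stats(), which is the function that boxplot() relies on.  This is a small sketch; the names air.stats and chem.stats are my own.  Two details of R's implementation are worth knowing: the box is drawn at Tukey's hinges, which can differ slightly from the quartiles returned by quantile(), and each whisker ends at the most extreme observation that lies within 1.5 IQR of the box.

```r
# Rayleigh's masses (g), entered as in the data frame above.
atmospheric.nitrogen <- c(2.31017, 2.30986, 2.31010, 2.31001, 2.31024, 2.31010, 2.31028, NA)
chemical.nitrogen <- c(2.30143, 2.29890, 2.29816, 2.30182, 2.29869, 2.29940, 2.29849, 2.29869)

# boxplot.stats() drops NA values before computing the statistics.
air.stats <- boxplot.stats(atmospheric.nitrogen)
chem.stats <- boxplot.stats(chemical.nitrogen)

# $stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker.
print(air.stats$stats)
print(chem.stats$stats)

# $out holds any observations beyond the whiskers (drawn as separate points).
print(air.stats$out)
print(chem.stats$out)
```

For these data neither group has points beyond the whiskers, so the whiskers coincide with the minimum and maximum of each sample.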

Here is my code, sandwiched by the commands to export the plot as a PNG image to my desired directory.

png('Insert your directory here/nitrogen masses.png')
boxplot(nitrogen.masses, main = "Lord Rayleigh's Measured Masses of Nitrogen", ylab = 'Mass (g)')
dev.off()

Here is the box plot.

[Figure: box plot of the nitrogen masses]

The 2-Sample t-Test to Compare Population Means

The box plot makes the difference between the masses of the 2 types of nitrogen even more striking.  However, let’s use a hypothesis test to compare the 2 population means to quantify just how certain we are that the difference is beyond just random error.  Since we are comparing two population means, a 2-sample t-test is appropriate.

William Gosset (a.k.a. “Student”) developed the t-test in 1908, while Rayleigh and Ramsay published their discovery of argon in 1895, so Rayleigh and Ramsay could not have used the t-test for their data.  In reading through Rayleigh’s and Ramsay’s paper, I actually don’t find any statistical analysis to ensure that the difference was significant.  Now that we do know about the t-test, let’s use it to confirm their finding.

We are testing the following hypotheses, which are mutually exclusive:

Null Hypothesis: There is no difference between the population mean masses of the 2 types of nitrogen.

Alternative Hypothesis: There is a difference between the population mean masses of the 2 types of nitrogen.

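Here is my R code for conducting the t-test, using the t.test() function.  By default, t.test() performs Welch’s version of the test, which does not assume that the two populations have equal variances.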

> t.test(atmospheric.nitrogen, chemical.nitrogen)
Welch Two Sample t-test

data:  atmospheric.nitrogen and chemical.nitrogen 
t = 21.5183, df = 7.168, p-value = 8.919e-08
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
0.009495066     0.011827076 
sample estimates:
mean of x     mean of y 
2.310109      2.299448

The key result in the above output is the p-value, which is the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis were true.  A lower p-value indicates stronger evidence against the null hypothesis.  In this case, the p-value is 8.919e-08, which is extremely low.  (The conventional cut-off is 0.05, which corresponds to a 95% confidence level; our p-value is far below it.)  Thus, the null hypothesis is rejected, and Rayleigh’s data show that the mean masses of the two types of nitrogen are different!
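To see where the numbers in the t.test() output come from, here is a sketch that recomputes the Welch test statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value by hand.  The variable names (x, y, se2.x, t.stat and so on) are my own; the formulas are the standard ones for Welch's unequal-variance t-test.

```r
# Rayleigh's masses (g): x from air, y from chemical decomposition.
x <- c(2.31017, 2.30986, 2.31010, 2.31001, 2.31024, 2.31010, 2.31028)
y <- c(2.30143, 2.29890, 2.29816, 2.30182, 2.29869, 2.29940, 2.29849, 2.29869)
n.x <- length(x)
n.y <- length(y)

# Squared standard error of each sample mean.
se2.x <- var(x) / n.x
se2.y <- var(y) / n.y

# Welch t statistic: difference in means over the combined standard error.
t.stat <- (mean(x) - mean(y)) / sqrt(se2.x + se2.y)

# Welch-Satterthwaite approximation to the degrees of freedom.
df.welch <- (se2.x + se2.y)^2 / (se2.x^2 / (n.x - 1) + se2.y^2 / (n.y - 1))

# Two-sided p-value from the t distribution.
p.value <- 2 * pt(-abs(t.stat), df = df.welch)

print(c(t = t.stat, df = df.welch, p.value = p.value))
```

These values should agree with the t, df and p-value reported by t.test() above.  The fractional degrees of freedom (about 7.17) arise because the "chemical" masses are far more variable than the "atmospheric" ones, which is exactly the situation Welch's test is designed for.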

Of course, this doesn’t prove that the higher mass in “atmospheric” nitrogen is caused by the presence of an inert, gaseous element.  (Yes, I realize that the title of my blog post is slightly misleading.  :) )  A series of attempts to characterize the inert residue from the “atmospheric” nitrogen via chemical reactions yielded no results, and the emission spectrum of this residue was unlike any other known to chemists at the time.  Rayleigh and Ramsay knew that claiming this inert gas as a new element was going to be controversial, since there was no place on the periodic table for it yet.  Eventually, Ramsay discovered several other gases that were also chemically inert, and the noble gases were finally recognized as a new family on the periodic table.

References

– Daniel C. Harris.  “Quantitative Chemical Analysis”, 7th Edition, pp. 60-62.

– Russell D. Larsen.  “Lessons learned from Lord Rayleigh on the importance of data analysis.”  Journal of Chemical Education.  67, no. 11 (1990): 925.

– Carmen Giunta.  “Using history to teach scientific method: The case of argon.”  Journal of Chemical Education.  75, no. 10 (1998): 1322.

– Rayleigh, Lord, and William Ramsay. “Argon, a New Constituent of the Atmosphere.” Proceedings of the Royal Society of London 57.340-346 (1894): 265-287.

