[This article was first published on The Chemical Statistician » R programming, and kindly contributed to R-bloggers.]

#### Introduction

Many processes in chemistry, especially in synthesis, require attaining a certain target value for a property of interest.  For example, when synthesizing drug capsules that contain a medicine, a chemist has to ensure that the concentration of the medicine meets a target value.  If the concentration is too high or too low, then the patient ingesting the drug capsules could suffer catastrophic health problems.  Thus, monitoring this attainment is a very important part of analytical chemistry.

Of course, natural variation in any chemical process will result in some variation in the output, so the target value will rarely be attained exactly.  There is usually an acceptable range of values, but any deviation of the output beyond this acceptable range must be discovered and treated with alarm, as the underlying process for generating that output may be inherently faulty.  The process should be stopped, examined, and repaired before any more output can be generated.  From a statistical perspective, there needs to be some mechanism to monitor for outliers as the process unfolds.

A control chart is a useful tool for monitoring chemical processes to detect outliers.  In this tutorial, I will

• explain the underlying concepts of a simple but common type of control chart
• demonstrate how to produce control charts with an example data set in R

Read the rest of this blog post to learn how to build such a control chart in R!

#### What is a Control Chart?

A control chart is a scatter plot that allows a chemist to monitor a process as it happens over time.  It plots the quantity of interest on the vertical axis against time (or the order in which the data were generated) on the horizontal axis.  There are many variations of control charts, but they generally show how far the data deviate from a target value.  Here is one type of control chart from Harris (2003) that you can easily plot.  There are 5 special horizontal lines in this type of control chart.  Suppose that

• the target value of the property of interest is $\mu$,
• the standard deviation*** for generating this quantity in the chemical process is $\sigma$,
• and the number of data collected is $n$.

(***Although the true values of the population mean and population standard deviation can never be known exactly, many laboratories and companies have long histories of running the same chemical process.  Thus, their estimates of the population mean and population standard deviation are precise and have negligible bias, and those estimates can be treated as “true” for practical purposes.)

Then, you can plot the data versus time, and add 5 special lines to this scatter plot.  The values of these lines are in brackets below.

1. the target value line ($\mu$)
2. the upper warning line ($\mu + 2 \sigma \div \sqrt{n}$)
3. the upper action line ($\mu + 3 \sigma \div \sqrt{n}$)
4. the lower warning line ($\mu - 2 \sigma \div \sqrt{n}$)
5. the lower action line ($\mu - 3 \sigma \div \sqrt{n}$)
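
As a quick numerical illustration of these 5 formulas, here is a minimal sketch in R.  The helper name `control_limits()` and the input values (a target of 100, a standard deviation of 2, and 4 data) are my own hypothetical choices for easy arithmetic; they do not come from Harris (2003) or from the example later in this post.

```r
# hypothetical helper: compute the 5 horizontal lines of the control chart
# mu = target value, sigma = "true" standard deviation, n = number of data
control_limits = function(mu, sigma, n) {
  # the common scaling term sigma / sqrt(n)
  se = sigma / sqrt(n)
  c(UA = mu + 3 * se,   # upper action line
    UW = mu + 2 * se,   # upper warning line
    TV = mu,            # target value line
    LW = mu - 2 * se,   # lower warning line
    LA = mu - 3 * se)   # lower action line
}

# illustrative values only (not from the vitamin C example)
limits = control_limits(mu = 100, sigma = 2, n = 4)
print(limits)
```

With these inputs, $\sigma \div \sqrt{n} = 1$, so the warning lines sit 2 units from the target and the action lines sit 3 units from it.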

A chemist can use these lines to determine if the data are deviating too far away from the target value.  Significantly large deviations indicate that something is wrong with the data-generating process.  Daniel Harris (2003) suggests stopping the process and examining for malfunctions if any of the following events occur.

a) 1 datum falls outside of the action lines

b) 2 out of 3 consecutive data fall between the warning lines and the action lines

c) 7 consecutive data are all above or all below the target value line

d) 6 consecutive data steadily increase or steadily decrease, regardless of their location

e) 14 consecutive data alternate up and down regardless of their location

f) some obvious non-random pattern
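
Criteria a) to e) can be checked programmatically; criterion f) still needs a human eye.  The sketch below is one possible way to do this in R, not a method from Harris (2003).  All function names are hypothetical, and I have made some assumptions: `se` stands for $\sigma \div \sqrt{n}$, "steadily" in d) is read as strictly increasing or decreasing, and in b) the 2 data may fall on either side of the target.

```r
# hypothetical helpers; x = data in order of production, mu = target value,
# se = sigma / sqrt(n)

# length of the longest run of TRUE values in a logical vector
longest_run = function(flags) {
  r = rle(flags)
  if (!any(r$values)) return(0)
  max(r$lengths[r$values])
}

# a) 1 datum falls outside of the action lines
outside_action = function(x, mu, se) any(abs(x - mu) > 3 * se)

# b) 2 out of 3 consecutive data fall between the warning and action lines
two_of_three_warning = function(x, mu, se) {
  if (length(x) < 3) return(FALSE)
  d = abs(x - mu)
  in_zone = d > 2 * se & d <= 3 * se
  # each row of this matrix is a window of 3 consecutive data
  windows = embed(as.numeric(in_zone), 3)
  any(rowSums(windows) >= 2)
}

# c) 7 consecutive data are all above or all below the target value line
seven_one_side = function(x, mu) {
  longest_run(x > mu) >= 7 || longest_run(x < mu) >= 7
}

# d) 6 consecutive data steadily increase or decrease
#    (6 data give 5 consecutive differences of the same sign)
six_trending = function(x) {
  s = sign(diff(x))
  longest_run(s > 0) >= 5 || longest_run(s < 0) >= 5
}

# e) 14 consecutive data alternate up and down
#    (14 data give 13 differences, hence 12 consecutive sign flips)
fourteen_alternating = function(x) {
  s = sign(diff(x))
  if (length(s) < 2) return(FALSE)
  flips = s[-1] * s[-length(s)] < 0
  longest_run(flips) >= 12
}

# illustrative data only: a steady upward drift
x = c(9.8, 9.9, 10.0, 10.1, 10.2, 10.3)
six_trending(x)
```

Each function returns a single TRUE or FALSE, so the checks can be combined with `||` to decide whether the process needs attention.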

#### An Example of a Control Chart in R

Suppose that you are producing vitamin C capsules, and the target weight percentage of vitamin C in each capsule is 95%.  Based on past experience in making these capsules, you know that the standard deviation of the weight percentage is 0.005 percentage points.  I have simulated 25 data for this production process, and you can find this data set at the end of this blog post.  I have called the vector of this data set “vitamin_c”.

Notice my use of

• the abline() function to draw the 5 horizontal lines
• the axis() function to draw my custom labels for the 5 horizontal lines along the vertical axis
• the “yaxt = ‘n’” option in the plot() function to suppress the printing of the default vertical axis

Here is the R script for plotting a control chart for this production process according to the specifications as outlined in Harris (2003).  Note that I have labeled the 5 horizontal lines using abbreviations:

• upper action = UA
• upper warning = UW
• target value = TV
• lower warning = LW
• lower action = LA

```r
##### Plotting a Control Chart in Analytical Chemistry
##### By Eric Cai, The Chemical Statistician

# first, import the data vector from the bottom of this blog post
# assign it to the variable "vitamin_c"

# obtain the number of data in this vector
n = length(vitamin_c)

# create a vector of the order of the production
# this will be the horizontal axis in the control chart
ordering = 1:n

# the target weight percentage is 95%
mu = 95

# from past experience, you know that the standard deviation is 0.005
# treat this as the "true" standard deviation
sigma = 0.005

# set the 5 horizontal lines of the control chart
upper_action_line = mu + 3*sigma/sqrt(n)
upper_warning_line = mu + 2*sigma/sqrt(n)
target_value = mu
lower_warning_line = mu - 2*sigma/sqrt(n)
lower_action_line = mu - 3*sigma/sqrt(n)

# put all of the values of the 5 horizontal lines into 1 vector
control_lines = c(upper_action_line, upper_warning_line, target_value, lower_warning_line, lower_action_line)

# create a vector of labels for the 5 horizontal lines
control_labels = c('UA', 'UW', 'TV', 'LW', 'LA')

# export the control chart in PNG format to a folder of your choice
png('Write Your Working Directory Path Here/control chart for vitamin c production.png')

# note the use of the "yaxt = 'n'" option to suppress the default y-axis
# the vertical limits extend 1 standard deviation (sigma) beyond the action lines
plot(ordering, vitamin_c, main = 'Control Chart for Vitamin C Production',
     xlab = 'Order of Production', ylab = 'Weight Percentage', yaxt = 'n',
     ylim = c(lower_action_line - sigma, upper_action_line + sigma))

# draw the 5 horizontal lines
abline(h = control_lines)

# write the labels for the 5 horizontal lines along the left vertical axis
axis(2, at = control_lines, labels = control_labels)

# close the PNG device so that the file is written
dev.off()
```

Here is the resulting control chart.

Notice that Data #12-18 (7 consecutive points) all fall below the target value line.  Based on criterion c) above, that warrants shutting down the production process to examine for a possible malfunction.
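
You can also confirm this observation without reading it off the chart.  The following is a minimal sketch that uses rle() to find runs of consecutive data below the target value; the variable names (`below`, `runs`, `hits`, `result`) are my own, and the data vector is copied from the Data section at the end of this post.

```r
# the 25 simulated weight percentages from the Data section
vitamin_c = c(94.9991591445192, 95.0013843593435, 94.9987445081374,
              95.0000701427664, 95.0017114408727, 94.9993970920185,
              94.9995278336148, 94.9993646286875, 94.9997142263651,
              95.0001381082248, 95.0012276303438, 94.9991982205454,
              94.9989196074000, 94.9998424656439, 94.9989282399601,
              94.9998610138594, 94.9994026869053, 94.9978160332399,
              95.0002408172559, 94.9997406445933, 95.0009005119453,
              95.0009418693939, 95.0014679619034, 95.0007067610896,
              95.0008190089303)

# the target weight percentage
mu = 95

# flag each datum that falls below the target value line
below = vitamin_c < mu

# run-length encoding: lengths of consecutive TRUE/FALSE stretches
runs = rle(below)

# first and last positions of each run in the order of production
ends = cumsum(runs$lengths)
starts = ends - runs$lengths + 1

# keep only the runs of at least 7 data below the target
hits = which(runs$values & runs$lengths >= 7)
result = data.frame(first = starts[hits], last = ends[hits],
                    length = runs$lengths[hits])
print(result)
```

This reports a single run, from Datum #12 to Datum #18.  The same check for runs above the target only requires replacing `<` with `>`.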

#### Reference

Harris, Daniel C. “Quantitative Chemical Analysis” (2003).

#### Data

| Weight Percentage of Vitamin C |
|---|
| 94.9991591445192 |
| 95.0013843593435 |
| 94.9987445081374 |
| 95.0000701427664 |
| 95.0017114408727 |
| 94.9993970920185 |
| 94.9995278336148 |
| 94.9993646286875 |
| 94.9997142263651 |
| 95.0001381082248 |
| 95.0012276303438 |
| 94.9991982205454 |
| 94.9989196074000 |
| 94.9998424656439 |
| 94.9989282399601 |
| 94.9998610138594 |
| 94.9994026869053 |
| 94.9978160332399 |
| 95.0002408172559 |
| 94.9997406445933 |
| 95.0009005119453 |
| 95.0009418693939 |
| 95.0014679619034 |
| 95.0007067610896 |
| 95.0008190089303 |
