Blog Archives

Performing Logistic Regression in R and SAS

Introduction

My statistics education focused heavily on normal linear least-squares regression, and a professor in an introductory statistics class even told me that 95% of statistical consulting can be done with knowledge learned up to and including a course in linear regression. Unfortunately, that advice has turned out to vastly underestimate the

Read more »

Online index of plots and corresponding R scripts

Dear Readers of The Chemical Statistician,

While working in my job at the British Columbia Cancer Agency, I learned about a wonderful new data visualization resource from a colleague who works at the British Columbia Centre for Disease Control. I want to share it with you, as I think that it will help you immensely in your efforts

Read more »

The Chi-Squared Test of Independence – An Example in Both R and SAS

Introduction

The chi-squared test of independence is one of the most basic and common hypothesis tests in the statistical analysis of categorical data. Given two categorical random variables, the chi-squared test of independence assesses whether there is a statistical dependence between them. Formally, it is a hypothesis test with the following null and

Read more »
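
To make the idea concrete, here is a minimal sketch of the Pearson chi-squared statistic for an r × c table of observed counts. It is written in Python rather than the R and SAS used in the post, and the function name `chi_squared_independence` and the example counts are hypothetical. Under the null hypothesis of independence, the expected count in each cell is (row total × column total) / grand total; the statistic sums (observed − expected)² / expected over all cells and has (r − 1)(c − 1) degrees of freedom.

```python
def chi_squared_independence(table):
    """Return the Pearson chi-squared statistic and its degrees of freedom
    for an r x c contingency table given as a list of rows of counts.

    Hypothetical helper for illustration; no continuity correction is applied.
    """
    n_rows = len(table)
    n_cols = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Expected count in cell (i, j) if the two variables are independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (table[i][j] - expected) ** 2 / expected

    degrees_of_freedom = (n_rows - 1) * (n_cols - 1)
    return statistic, degrees_of_freedom


# Hypothetical 2 x 2 table of counts
stat, df = chi_squared_independence([[10, 20], [20, 10]])
print(round(stat, 4), df)  # 6.6667 1
```

With 1 degree of freedom, the 5% critical value of the chi-squared distribution is about 3.84, so this hypothetical table would lead to rejecting independence at that level. Note that R's `chisq.test()` applies Yates's continuity correction to 2 × 2 tables by default (`correct = TRUE`), so its statistic for the same table would be smaller than this uncorrected value.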

Side-by-Side Box Plots with Patterns From Data Sets Stacked by reshape2 and melt() in R

Introduction

A while ago, one of my co-workers asked me to group box plots by plotting them side-by-side within each group, and he wanted to use patterns rather than colours to distinguish between the box plots within a group; the publication that will display his plots prints in black-and-white only. I gladly investigated how to

Read more »

Useful Functions in R for Manipulating Text Data

Introduction

In my current job, I study HIV at the genetic and biochemical levels. Thus, I often work with data involving the sequences of nucleotides or amino acids of various patient samples of HIV, and this type of work involves a lot of text manipulation. (Strictly speaking, I analyze sequences of nucleotides from DNA that are reverse-transcribed from

Read more »

Rectangular Integration (a.k.a. The Midpoint Rule)

Introduction

Continuing the recently started series on numerical integration, this post will introduce rectangular integration. I will describe the concept behind rectangular integration, show an R function that implements it, and use it to check that the distribution actually integrates to 1 over its support set. This post follows from my

Read more »
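
The midpoint rule approximates an integral over [a, b] by splitting the interval into n subintervals of width h = (b − a)/n and summing h · f(midpoint) over them. The post implements this in R; what follows is only a minimal Python sketch of the same idea. Because the excerpt does not name the distribution that the post checks, the example uses the standard normal density over [−10, 10] as an illustrative stand-in, and the function names are hypothetical.

```python
import math


def midpoint_integrate(f, a, b, n):
    """Approximate the integral of f over [a, b] using n rectangles whose
    heights are taken at the midpoint of each subinterval."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    width = (b - a) / n
    # Midpoint of the k-th subinterval is a + (k + 1/2) * width
    return width * sum(f(a + (k + 0.5) * width) for k in range(n))


def standard_normal_pdf(x):
    # Illustrative density only; not necessarily the one used in the post
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# The tails beyond +/-10 carry negligible probability, so this should be close to 1
print(midpoint_integrate(standard_normal_pdf, -10.0, 10.0, 1000))
```

The midpoint rule is exact for linear functions, and for smooth integrands its error shrinks roughly in proportion to h².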

Trapezoidal Integration – Conceptual Foundations and a Statistical Application in R

Introduction

Today, I will begin a series of posts on numerical integration, which has a wide range of applications in many fields, including statistics. I will introduce trapezoidal integration by discussing its conceptual foundations, write my own R function to implement trapezoidal integration, and use it to check that the Beta(2, 5) probability density

Read more »
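
The trapezoidal rule replaces f on each subinterval with the straight line joining its endpoints, so with h = (b − a)/n the approximation is h times the sum of f at the interior grid points, plus half of f(a) and half of f(b). The post writes its own R function for this; the Python sketch below is a stand-in rather than that function, and its names are hypothetical. The Beta(2, 5) density is 30x(1 − x)⁴ on [0, 1], because B(2, 5) = Γ(2)Γ(5)/Γ(7) = 24/720 = 1/30.

```python
def trapezoid_integrate(f, a, b, n):
    """Approximate the integral of f over [a, b] with n trapezoids of equal width."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    h = (b - a) / n
    # Interior grid points are counted fully; the two endpoints get weight 1/2
    interior = sum(f(a + k * h) for k in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))


def beta_2_5_pdf(x):
    # Beta(2, 5) density: x^(2-1) (1 - x)^(5-1) / B(2, 5), with B(2, 5) = 1/30
    return 30.0 * x * (1.0 - x) ** 4


# Should be close to 1, since a probability density integrates to 1 over its support
print(trapezoid_integrate(beta_2_5_pdf, 0.0, 1.0, 1000))
```

Because this density has a nonzero slope at x = 0, the trapezoidal error here is of order h², so increasing n by a factor of 10 should shrink the gap from 1 by roughly a factor of 100.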

Detecting an Unfair Die with Bayes’ Theorem

Introduction

While reading a bioinformatics textbook, I saw an interesting problem that requires Bayes’ Theorem and some simple R programming. I will discuss the math behind solving this problem in detail, and I will illustrate some useful R plotting functions for generating a plot that visualizes the solution effectively.

The Problem

The following question is

Read more »

Exploratory Data Analysis: Quantile-Quantile Plots for New York’s Ozone Pollution Data

Introduction

Continuing my recent series on exploratory data analysis, today’s post focuses on quantile-quantile (Q-Q) plots, which are useful for assessing how closely a data set fits a particular distribution. I will discuss how Q-Q plots are constructed and use Q-Q plots to assess the distribution of the “Ozone” data from the built-in

Read more »
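
A normal Q-Q plot pairs the sorted observations with quantiles of the standard normal distribution evaluated at plotting positions p_i. Here is a minimal Python sketch, assuming p_i = (i − 0.5)/n; to my knowledge this matches R's `ppoints()` for n > 10, which uses a slightly different formula for smaller samples. The function name and sample values are hypothetical, and the ozone data themselves are not reproduced here.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Return (theoretical quantile, sample quantile) pairs for a normal Q-Q plot.

    Plotting positions p_i = (i - 0.5) / n are assumed for i = 1, ..., n.
    """
    sample = sorted(data)
    n = len(sample)
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


# Hypothetical sample values for illustration only
for theo, obs in normal_qq_points([4.1, 2.7, 5.3, 3.8, 6.0]):
    print(f"{theo:7.3f}  {obs:5.1f}")
```

Plotting these pairs, with sample quantiles against theoretical quantiles, gives the Q-Q plot. If the data are roughly normal, the points fall near a straight line; systematic curvature suggests skewness or heavier or lighter tails than the normal distribution.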

Exploratory Data Analysis: Useful R Functions for Exploring a Data Frame

Introduction

Data in R are often stored in data frames, because data frames can hold columns of different types. (In R, data frames are more general than matrices, because all elements of a matrix must share a single type.) Today’s post highlights some common R functions that I like to use to explore a data frame before

Read more »