Articles by Eric Cai - The Chemical Statistician

Use unique() instead of levels() to find the possible values of a character variable in R

March 10, 2018 | Eric Cai - The Chemical Statistician

When I first encountered R, I learned to use the levels() function to find the possible values of a categorical variable. However, I recently noticed something very strange about this function. Consider the built-in data set “iris” and its character variable “Species”. Here are the possible values of “Species”, as ... [Read more...]
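The contrast named in the title can be sketched in a few lines of base R. This is a minimal illustration, not the post's own code: it assumes the default `iris` data set, where `Species` is stored as a factor, and the object names (`species_chr`, `setosa_only`, and so on) are hypothetical.

```r
# Species is stored as a factor in the built-in iris data set
species_fct <- iris$Species
species_chr <- as.character(species_fct)

# levels() only reads the "levels" attribute, so a plain character
# vector yields NULL
levels_of_chr <- levels(species_chr)

# unique() reports the values actually present, for either type
unique_of_chr <- unique(species_chr)

# After subsetting, a factor keeps its unused levels, so levels()
# and unique() can disagree
setosa_only <- iris$Species[iris$Species == "setosa"]
levels_after_subset <- levels(setosa_only)
unique_after_subset <- as.character(unique(setosa_only))
```

Calling `droplevels()` on the subset is another way to bring the two results back into agreement.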

Use the LENGTH statement to pre-set the lengths of character variables in SAS – with a comparison to R

August 16, 2017 | Eric Cai - The Chemical Statistician

I often create character variables (i.e. variables with strings of text as their values) in SAS, and they sometimes don’t render as expected. Here is an example involving the built-in data set SASHELP.CLASS. Here is the code: data c1; set sashelp.class; * define a new character variable ... [Read more...]

Producing a Control Chart in R – An Application in Analytical Chemistry

August 2, 2015 | Eric Cai - The Chemical Statistician

Introduction: Many processes in chemistry, especially in synthesis, require attaining a certain target value for a property of interest. For example, when synthesizing drug capsules that contain a medicine, a chemist has to ensure that the concentration of the medicine meets a target value. If the concentration is too high ... [Read more...]

How to Extract a String Between 2 Characters in R and SAS

June 18, 2015 | Eric Cai - The Chemical Statistician

Introduction: I recently needed to work with date values, stored in a variable called mydate, that look like this: "Jan 23/2", "Aug 5/20", "Dec 17/2". I wanted to extract the day, and the obvious strategy is to extract the text between the space and the slash. I needed to think about how to program this carefully in both ... [Read more...]
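One way to carry out the "text between the space and the slash" strategy in base R is sketched below. It is an assumption about how this could be done, not the post's own solution; the sample values are copied as they appear in the excerpt, and the names `space_pos`, `slash_pos`, `day_chr` and `day_num` are hypothetical.

```r
# Sample values exactly as they appear in the excerpt
mydate <- c("Jan 23/2", "Aug 5/20", "Dec 17/2")

# Position of the first space and the first slash in each string
space_pos <- as.integer(regexpr(" ", mydate, fixed = TRUE))
slash_pos <- as.integer(regexpr("/", mydate, fixed = TRUE))

# Keep the characters strictly between the two delimiters
day_chr <- substr(mydate, space_pos + 1, slash_pos - 1)

# Convert to a number if the day is needed for arithmetic
day_num <- as.integer(day_chr)
```

Because `regexpr()` returns -1 when a delimiter is missing, real data would need a check for that case before calling `substr()`.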

Resources for Learning Data Manipulation in R, SAS and Microsoft Excel

February 23, 2015 | Eric Cai - The Chemical Statistician

I had the great pleasure of speaking to the Department of Statistics and Actuarial Science at Simon Fraser University last Friday to share my career advice with its students and professors. I emphasized the importance of learning skills in data manipulation during my presentation, and I want to supplement ... [Read more...]

The advantages of using count() to get N-way frequency tables as data frames in R

February 12, 2015 | Eric Cai - The Chemical Statistician

Introduction: I recently showed how to use the count() function in the “plyr” package to produce 1-way frequency tables in R. Several commenters provided alternative ways of doing so, and they are all appreciated. Today, I want to extend that tutorial by demonstrating how count() can be used ... [Read more...]

How to Get the Frequency Table of a Categorical Variable as a Data Frame in R

February 3, 2015 | Eric Cai - The Chemical Statistician

Introduction: One feature that I like about R is the ability to access and manipulate the outputs of many functions. For example, you can extract the kernel density estimates from density() and scale them to ensure that the resulting density integrates to 1 over its support set. I recently needed to ... [Read more...]
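The task in the title can be sketched with base R alone. The excerpt is cut off before it shows the post's approach, so this is only an illustration under assumptions: it uses the built-in `mtcars` data set, and the column names `cyl` and `freq` are hypothetical choices.

```r
# Tabulate a categorical variable from the built-in mtcars data set
cyl_table <- table(mtcars$cyl)

# Convert the table into a data frame with one row per category
cyl_freq <- as.data.frame(cyl_table, stringsAsFactors = FALSE)

# Replace the default column names (Var1, Freq) with descriptive ones
names(cyl_freq) <- c("cyl", "freq")
```

Once the counts are in a data frame, they can be filtered, sorted or merged like any other data set.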

Exploratory Data Analysis – All Blog Posts on The Chemical Statistician

December 11, 2014 | Eric Cai - The Chemical Statistician

This series of posts introduced various methods of exploratory data analysis, providing theoretical backgrounds and practical examples. Fully commented and readily usable R scripts are available for all topics for you to copy and paste for your own analysis! Most of these posts involve data visualization and plotting, and I ... [Read more...]

Performing Logistic Regression in R and SAS

November 24, 2014 | Eric Cai - The Chemical Statistician

Introduction: My statistics education focused a lot on normal linear least-squares regression, and I was even told by a professor in an introductory statistics class that 95% of statistical consulting can be done with knowledge learned up to and including a course in linear regression. Unfortunately, that advice has turned out ... [Read more...]

Online index of plots and corresponding R scripts

October 29, 2014 | Eric Cai - The Chemical Statistician

Dear Readers of The Chemical Statistician, While working in my job at the British Columbia Cancer Agency, I learned about a wonderful new data visualization resource from a colleague who works at the British Columbia Centre for Disease Control. I want to share this with you, as I think that ... [Read more...]

The Chi-Squared Test of Independence – An Example in Both R and SAS

August 25, 2014 | Eric Cai - The Chemical Statistician

Introduction: The chi-squared test of independence is one of the most basic and common hypothesis tests in the statistical analysis of categorical data. Given 2 categorical random variables, the chi-squared test of independence determines whether or not there exists a statistical dependence between them. Formally, it is a hypothesis test ... [Read more...]

Side-by-Side Box Plots with Patterns From Data Sets Stacked by reshape2 and melt() in R

April 10, 2014 | Eric Cai - The Chemical Statistician

Introduction: A while ago, one of my co-workers asked me to group box plots by plotting them side-by-side within each group, and he wanted to use patterns rather than colours to distinguish between the box plots within a group; the publication that will display his plots prints in black-and-white only. ... [Read more...]

Useful Functions in R for Manipulating Text Data

February 27, 2014 | Eric Cai - The Chemical Statistician

Introduction: In my current job, I study HIV at the genetic and biochemical levels. Thus, I often work with data involving the sequences of nucleotides or amino acids of various patient samples of HIV, and this type of work involves a lot of manipulating text. (Strictly speaking, I analyze sequences ... [Read more...]

Rectangular Integration (a.k.a. The Midpoint Rule)

January 20, 2014 | Eric Cai - The Chemical Statistician

Introduction: Continuing the recently started series on numerical integration, this post will introduce rectangular integration. I will describe the concept behind rectangular integration, show an R function that implements it, and use it to check that the distribution actually integrates to 1 over its support set. This ... [Read more...]
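A minimal sketch of the midpoint rule in R follows. It is not the post's function: the name `midpoint_rule`, its arguments, and the choice of the standard normal density over [-6, 6] as a test case are all assumptions made for illustration, since the excerpt does not name the distribution it checks.

```r
# Midpoint (rectangular) rule: split [a, b] into n equal strips and
# evaluate f at the centre of each strip
midpoint_rule <- function(f, a, b, n = 1000) {
  width <- (b - a) / n
  midpoints <- a + width * (seq_len(n) - 0.5)
  sum(f(midpoints)) * width
}

# Check that a density integrates to (approximately) 1; the standard
# normal on [-6, 6] is an illustrative choice
area_mid <- midpoint_rule(dnorm, -6, 6)
```

Increasing `n` narrows the strips and, for a smooth integrand, shrinks the approximation error.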

Trapezoidal Integration – Conceptual Foundations and a Statistical Application in R

December 14, 2013 | Eric Cai - The Chemical Statistician

Introduction: Today, I will begin a series of posts on numerical integration, which has a wide range of applications in many fields, including statistics. I will introduce trapezoidal integration by discussing its conceptual foundations, write my own R function to implement trapezoidal integration, and use it to check that ... [Read more...]
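The trapezoidal rule itself fits in a short R function, sketched here under assumptions. The function name `trapezoid`, the grid, and the use of the standard normal density as a test case are hypothetical; the excerpt is cut off before it says what the post's function is used to check.

```r
# Trapezoidal rule for points (x, y) with x sorted in increasing order:
# each pair of neighbours contributes width * average height
trapezoid <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  n <- length(x)
  sum(diff(x) * (y[-n] + y[-1]) / 2)
}

# Illustrative check: the standard normal density on a fine grid
x_grid <- seq(-6, 6, length.out = 1001)
area_trap <- trapezoid(x_grid, dnorm(x_grid))
```

Because the function takes paired points rather than a function, it also works on tabulated data with unevenly spaced `x` values.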

Detecting an Unfair Die with Bayes’ Theorem

October 30, 2013 | Eric Cai - The Chemical Statistician

Introduction: I saw an interesting problem that requires Bayes’ Theorem and some simple R programming while reading a bioinformatics textbook. I will discuss the math behind solving this problem in detail, and I will illustrate some very useful plotting functions to generate a plot from R that visualizes the solution ... [Read more...]

Exploratory Data Analysis: Quantile-Quantile Plots for New York’s Ozone Pollution Data

September 22, 2013 | Eric Cai - The Chemical Statistician

Introduction: Continuing my recent series on exploratory data analysis, today’s post focuses on quantile-quantile (Q-Q) plots, which are very useful plots for assessing how closely a data set fits a particular distribution. I will discuss how Q-Q plots are constructed and use Q-Q plots to assess the distribution of ... [Read more...]

Exploratory Data Analysis: Useful R Functions for Exploring a Data Frame

August 19, 2013 | Eric Cai - The Chemical Statistician

Introduction: Data in R are often stored in data frames, because they can store multiple types of data. (In R, data frames are more general than matrices, because matrices can only store one type of data.) Today’s post highlights some common functions in R that I like to use ... [Read more...]

Exploratory Data Analysis: The 5-Number Summary – Two Different Methods in R

August 12, 2013 | Eric Cai - The Chemical Statistician

Introduction: Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on 5-number summaries, which were previously mentioned in the post on descriptive statistics in this series. I will define and calculate the 5-number summary in 2 different ways that are commonly used in R. (It turns out ... [Read more...]
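The excerpt does not say which two methods the post compares. As an assumption, the sketch below uses two base R functions that both return a 5-number summary, `fivenum()` and `quantile()`, on a small hypothetical vector `x` to show that they need not agree.

```r
# A small hypothetical sample with an even number of observations
x <- c(1, 2, 3, 4, 5, 6, 7, 8)

# fivenum() uses Tukey's hinges for the lower and upper quartiles
summary_hinges <- fivenum(x)

# quantile() interpolates between order statistics (type 7 by default)
summary_quantiles <- unname(quantile(x))
```

Here the minimum, median and maximum match, while the two quartile values differ: the hinges are 2.5 and 6.5, whereas the interpolated quartiles are 2.75 and 6.25.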

Exploratory Data Analysis: Combining Histograms and Density Plots to Examine the Distribution of the Ozone Pollution Data from New York in R

July 29, 2013 | Eric Cai - The Chemical Statistician

Introduction: This is a follow-up post to my recent introduction of histograms. Previously, I presented the conceptual foundations of histograms and used a histogram to approximate the distribution of the “Ozone” data from the built-in data set “airquality” in R. Today, I will examine this distribution in more detail by ... [Read more...]

R-bloggers

R news and tutorials contributed by hundreds of R bloggers