Blog Archives

Exploratory Data Analysis: The 5-Number Summary – Two Different Methods in R

Exploratory Data Analysis: The 5-Number Summary – Two Different Methods in R

Introduction Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on 5-number summaries, which were previously mentioned in the post on descriptive statistics in this series.  I will define and calculate the 5-number summary in 2 different ways that are commonly used in R.  (It turns out that different methods arise from

Read more »

Exploratory Data Analysis: Combining Histograms and Density Plots to Examine the Distribution of the Ozone Pollution Data from New York in R

Exploratory Data Analysis: Combining Histograms and Density Plots to Examine the Distribution of the Ozone Pollution Data from New York in R

Introduction This is a follow-up post to my recent introduction of histograms.  Previously, I presented the conceptual foundations of histograms and used a histogram to approximate the distribution of the “Ozone” data from the built-in data set “airquality” in R.  Today, I will examine this distribution in more detail by overlaying the histogram with parametric

Read more »

Exploratory Data Analysis: Conceptual Foundations of Histograms – Illustrated with New York’s Ozone Pollution Data

Exploratory Data Analysis: Conceptual Foundations of Histograms – Illustrated with New York’s Ozone Pollution Data

Introduction Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on histograms, which are very useful plots for visualizing the distribution of a data set.  I will discuss how histograms are constructed and use histograms to assess the distribution of the “Ozone” data from the built-in “airquality” data set in R.  In

Read more »

Exploratory Data Analysis – Kernel Density Estimation and Rug Plots on Ozone Data in New York and Ozonopolis

Exploratory Data Analysis – Kernel Density Estimation and Rug Plots on Ozone Data in New York and Ozonopolis

For the sake of brevity, this post has been created from the second half of a previous long post on kernel density estimation.  This second half focuses on constructing kernel density plots and rug plots in R.  The first half focused on the conceptual foundations of kernel density estimation. Introduction This post follows the recent

Read more »

Exploratory Data Analysis: 2 Ways of Plotting Empirical Cumulative Distribution Functions in R

Exploratory Data Analysis: 2 Ways of Plotting Empirical Cumulative Distribution Functions in R

Introduction Continuing my recent series on exploratory data analysis (EDA), and following up on the last post on the conceptual foundations of empirical cumulative distribution functions (CDFs), this post shows how to plot them in R.  (Previous posts in this series on EDA include descriptive statistics, box plots, kernel density estimation, and violin plots.) I

Read more »

Exploratory Data Analysis: Conceptual Foundations of Empirical Cumulative Distribution Functions

Exploratory Data Analysis: Conceptual Foundations of Empirical Cumulative Distribution Functions

Introduction Continuing my recent series on exploratory data analysis (EDA), this post focuses on the conceptual foundations of empirical cumulative distribution functions (CDFs); in a separate post, I will show how to plot them in R.  (Previous posts in this series include descriptive statistics, box plots, kernel density estimation, and violin plots.) To give you

Read more »

Exploratory Data Analysis: Combining Box Plots and Kernel Density Plots into Violin Plots for Ozone Pollution Data

Exploratory Data Analysis: Combining Box Plots and Kernel Density Plots into Violin Plots for Ozone Pollution Data

Introduction Recently, I began a series on exploratory data analysis (EDA), and I have written about descriptive statistics, box plots, and kernel density plots so far.  As previously mentioned in my post on box plots, there is a way to combine box plots and kernel density plots.  This combination results in violin plots, and I

Read more »

Exploratory Data Analysis: Kernel Density Estimation in R on Ozone Pollution Data in New York and Ozonopolis

Exploratory Data Analysis: Kernel Density Estimation in R on Ozone Pollution Data in New York and Ozonopolis

Introduction Recently, I began a series on exploratory data analysis; so far, I have written about computing descriptive statistics and creating box plots in R for a univariate data set with missing values.  Today, I will continue this series by analyzing the same data set with kernel density estimation, a useful non-parametric technique for visualizing

Read more »

Exploratory Data Analysis: Variations of Box Plots in R for Ozone Concentrations in New York City and Ozonopolis

Exploratory Data Analysis: Variations of Box Plots in R for Ozone Concentrations in New York City and Ozonopolis

Introduction Last week, I wrote the first post in a series on exploratory data analysis (EDA).  I began by calculating summary statistics on a univariate data set of ozone concentration in New York City in the built-in data set “airquality” in R.  In particular, I talked about how to calculate those statistics when the data

Read more »

When Does the Kinetic Theory of Gases Fail? Examining its Postulates with Assistance from Simple Linear Regression in R

When Does the Kinetic Theory of Gases Fail?  Examining its Postulates with Assistance from Simple Linear Regression in R

Introduction The Ideal Gas Law, , is a very simple yet useful relationship that describes the behaviours of many gases pretty well in many situations.  It is “Ideal” because it makes some assumptions about gas particles that make the math and the physics easy to work with; in fact, the simplicity that arises from these

Read more »