Articles by Ron Pearson (aka TheNoodleDoodler)

The Long Tail of the Pareto Distribution

September 17, 2011 | Ron Pearson (aka TheNoodleDoodler)

In my last two posts, I have discussed cases where the mean is of little or no use as a data characterization. One of the specific examples I discussed last time was the case of the Pareto type I distribution, for which the density is given by: p(x) = a k^a/... [Read more...]

Some Additional Thoughts on Useless Averages

August 27, 2011 | Ron Pearson (aka TheNoodleDoodler)

In my last post, I described three situations where the average of a sequence of numbers is not representative enough to be useful: in the presence of severe outliers, in the face of multimodal data distributions, and in the face of infinite-variance distributions. The post generated three interesting comments that ... [Read more...]

When are averages useless?

August 20, 2011 | Ron Pearson (aka TheNoodleDoodler)

Of all possible single-number characterizations of a data sequence, the average is probably the best known. It is also easy to compute and in favorable cases, it provides a useful characterization of “the typical value” of a sequence of numbers. It is not the only such “typical value,” however, nor ... [Read more...]

Fitting mixture distributions with the R package mixtools

August 6, 2011 | Ron Pearson (aka TheNoodleDoodler)

My last two posts have been about mixture models, with examples to illustrate what they are and how they can be useful. Further discussion and more examples can be found in Chapter 10 of Exploring Data in Engineering, the Sciences, and Medicine. One important topic I haven’t covered is how ... [Read more...]

Mixture distributions and models: a clarification

July 16, 2011 | Ron Pearson (aka TheNoodleDoodler)

In response to my last post, Chris had the following comment: “I am actually trying to better understand the distinction between mixture models and mixture distributions in my own work. You seem to say mixture models apply to a small set of models – namely regression models.” This comment suggests that ... [Read more...]

A Brief Introduction to Mixture Distributions

June 18, 2011 | Ron Pearson (aka TheNoodleDoodler)

Last time, I discussed some of the advantages and disadvantages of robust estimators like the median and the MADM scale estimator, noting that certain types of datasets – like the rainfall dataset discussed last time – can cause these estimators to fail spectacularly. An extremely useful idea in working with datasets like ... [Read more...]

The pros and cons of robust data characterizations

June 6, 2011 | Ron Pearson (aka TheNoodleDoodler)

Over the years, I have looked at a lot of data contaminated with outliers, the subject of Chapter 7 of Exploring Data in Engineering, the Sciences, and Medicine. That chapter adopts the definition of an outlier presented by Barnett and Lewis in their book Outliers in Statistical Data 2nd Edition, that ... [Read more...]

The distribution of interestingness

May 21, 2011 | Ron Pearson (aka TheNoodleDoodler)

On April 22, David Landy posed a question about the distribution of interestingness values in response to my April 3rd post on “Interestingness Measures.” He noted that the survey paper by Hilderman and Hamilton that I cited there makes the following comment: “Our belief is that a useful measure of interestingness ... [Read more...]

Computing Odds Ratios in R

May 7, 2011 | Ron Pearson (aka TheNoodleDoodler)

In my last post, I discussed the use of odds ratios to characterize the association between edibility and binary mushroom characteristics for the mushrooms characterized in the UCI mushroom dataset. I did not, however, describe those co...

[Read more...]

Measuring association using odds ratios

April 23, 2011 | Ron Pearson (aka TheNoodleDoodler)

In my last two posts, I have used the UCI mushroom dataset to illustrate two things. The first was the use of interestingness measures to characterize categorical variables, and the second was the use of binary confidence intervals...

[Read more...]

Screening for predictive characteristics … and a mea culpa

April 12, 2011 | Ron Pearson (aka TheNoodleDoodler)

In my last post, I considered the UCI mushroom dataset and characterized the variables included there using four different interestingness measures. When I began drafting this post, my intention was to consider the question of how the different m...

[Read more...]

Interestingness Measures

April 3, 2011 | Ron Pearson (aka TheNoodleDoodler)

Probably because I first encountered them somewhat late in my professional life, I am fascinated by categorical data types. Without question, my favorite book on the subject is Alan Agresti’s Categorical Data Analysis (Wiley Series in Probabili...

[Read more...]

The Many Uses of Q-Q Plots

March 23, 2011 | Ron Pearson (aka TheNoodleDoodler)

My last four posts have dealt with boxplots and some useful variations on that theme. Just after I finished the series, Tal Galili, who maintains the R-bloggers website, pointed me to a variant I hadn’t seen before. It's called a bee...

[Read more...]

Boxplots & Beyond IV: Beanplots

March 5, 2011 | Ron Pearson (aka TheNoodleDoodler)

This post is the last in a series of four on boxplots and some of their extensions. Previous posts in this series have discussed basic boxplots, modified boxplots based on a robust asymmetry measure, and violin plots, an alternative that essentia...

[Read more...]

Boxplots and Beyond III: Violin Plots

February 15, 2011 | Ron Pearson (aka TheNoodleDoodler)

This post is the third in a series of four on boxplots and closely related data visualization techniques for comparing subsets of a dataset, or comparing different datasets that we hope or expect to be similarly distributed. The previous two post...

[Read more...]

Boxplots and Beyond – Part II: Asymmetry

February 6, 2011 | Ron Pearson (aka TheNoodleDoodler)

In my last post, I discussed boxplots in their simplest forms, illustrating some of the useful options available with the boxplot command in the open-source statistical software package R. As I noted in that post, the basic boxplot is both useful...

[Read more...]

Boxplots and Beyond – Part I

January 29, 2011 | Ron Pearson (aka TheNoodleDoodler)

Boxplots are a simple and reasonably popular way of summarizing the range of variation of a real-valued variable across different subsets of data. Typical examples might include diastolic blood pressure across a group of patients, broken dow...

[Read more...]

The Art of Exploratory Data Analysis

January 22, 2011 | Ron Pearson (aka TheNoodleDoodler)

This blog is about the art of exploratory data analysis, which is also the subject of my new book, Exploring Data in Engineering, the Sciences, and Medicine (http://www.oup.com/us/ExploringData). This art is appropriate in situations where y...

[Read more...]

