Blog Archives

When are averages useless?

Of all possible single-number characterizations of a data sequence, the average is probably the best known.  It is also easy to compute, and in favorable cases it provides a useful characterization of “the typical value” of a sequence of numbers.  It is not the only such “typical value,” however, nor is it always the most useful one: two other...

Read more »

Fitting mixture distributions with the R package mixtools

My last two posts have been about mixture models, with examples to illustrate what they are and how they can be useful.  Further discussion and more examples can be found in Chapter 10 of Exploring Data in Engineering, the Sciences, and Medicine.  One important topic I haven’t covered is how to fit mixture models to datasets like the Old Faithful geyser...

Read more »

Mixture distributions and models: a clarification

In response to my last post, Chris had the following comment:

“I am actually trying to better understand the distinction between mixture models and mixture distributions in my own work.  You seem to say mixture models apply to a small set of models – namely regression models.”

This comment suggests that my caution about the difference between mixed-effect models and mixture distributions...

Read more »

A Brief Introduction to Mixture Distributions

Last time, I discussed some of the advantages and disadvantages of robust estimators like the median and the MADM scale estimator, noting that certain types of datasets – like the rainfall dataset discussed last time – can cause these estimators to fail spectacularly.  An extremely useful idea in working with datasets like this one is that of mixture distributions,...

Read more »
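As a rough illustration of the two robust estimators named in the excerpt above, the following Python sketch computes the median and the MADM scale estimate for two small made-up samples. The data values, the function name `madm`, and the idea that a failure arises when more than half of the observations are identical are assumptions made here for illustration, not details taken from the post. The factor 1.4826 is the usual constant that makes the MADM comparable to a standard deviation for Gaussian data.

```python
from statistics import median


def madm(values, scale=1.4826):
    """Scaled median absolute deviation from the median (MADM)."""
    center = median(values)
    deviations = [abs(v - center) for v in values]
    return scale * median(deviations)


# Hypothetical sample with one gross outlier: both estimates stay near the bulk.
contaminated = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]

# Hypothetical sample in which more than half the values are identical
# (an assumed stand-in for data dominated by zeros).
mostly_zero = [0, 0, 0, 0, 0, 0, 0, 1.2, 3.4, 10.5]

for name, sample in [("contaminated", contaminated), ("mostly_zero", mostly_zero)]:
    print(name, "median =", median(sample), "MADM =", madm(sample))
```

In the second sample both the median and the MADM come out as zero, so the scale estimate carries no information about the spread of the nonzero values.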

The pros and cons of robust data characterizations

Over the years, I have looked at a lot of data contaminated with outliers, the subject of Chapter 7 of Exploring Data in Engineering, the Sciences, and Medicine.  That chapter adopts the definition of an outlier presented by Barnett and Lewis in their book Outliers in Statistical Data, 2nd Edition

Read more »

The distribution of interestingness

On April 22, David Landy posed a question about the distribution of interestingness values in response to my April 3rd post on “Interestingness Measures.”  He noted that the survey paper by Hilderman and Hamilton that I cited there makes the following comment:

“Our belief is that a useful measure of interestingness should generate index values that are reasonably distributed throughout...

Read more »

Computing Odds Ratios in R

In my last post, I discussed the use of odds ratios to characterize the association between edibility and binary mushroom characteristics for the mushrooms described in the UCI mushroom dataset.  I did not, however, describe those co...

Read more »
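For readers who want a concrete picture of the quantity discussed in the excerpt above, here is a minimal sketch of an odds ratio computed from a 2x2 table of counts. The post itself works in R; this sketch is in Python, and the function name `odds_ratio`, its argument names, and the counts are all hypothetical, chosen only for illustration.

```python
def odds_ratio(n11, n12, n21, n22):
    """Odds ratio for a 2x2 table of counts.

    Rows: characteristic present / absent.
    Columns: edible / poisonous.
    """
    if n12 == 0 or n21 == 0:
        raise ValueError("odds ratio is undefined when an off-diagonal count is zero")
    return (n11 * n22) / (n12 * n21)


# Hypothetical counts, not taken from the UCI mushroom dataset.
example = odds_ratio(n11=30, n12=10, n21=20, n22=40)
print("odds ratio =", example)
```

With these made-up counts, the odds of being edible are 30/10 = 3 when the characteristic is present and 20/40 = 0.5 when it is absent, giving an odds ratio of 6. A value of 1 would indicate no association.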

Measuring association using odds ratios

In my last two posts, I have used the UCI mushroom dataset to illustrate two things.  The first was the use of interestingness measures to characterize categorical variables, and the second was the use of binary confidence intervals...

Read more »

Screening for predictive characteristics … and a mea culpa

In my last post, I considered the UCI mushroom dataset and characterized the variables included there using four different interestingness measures.  When I began drafting this post, my intention was to consider the question of how the different m...

Read more »