Articles by Ken Kleinman

Example 8.7: Hosmer and Lemeshow goodness-of-fit

September 28, 2010 | Ken Kleinman

The Hosmer and Lemeshow goodness of fit (GOF) test is a way to assess whether there is evidence for lack of fit in a logistic regression model. Simply put, the test compares the expected and observed number of events in bins defined by the predicted p...
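
As a rough illustration of the comparison described above, here is a minimal Python sketch of the statistic. It rests on assumptions: bins are formed by sorting on the predicted probability and splitting into roughly equal-sized groups (the usual convention, though the excerpt is cut off before saying so), and the function name `hosmer_lemeshow` and its arguments are hypothetical, not taken from the post.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (statistic, degrees_of_freedom) for binned observed vs. expected events.

    Assumes every predicted probability is strictly between 0 and 1
    and that outcomes are coded 0/1.
    """
    # Stable sort on the predicted probability only
    pairs = sorted(zip(probs, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        # Roughly equal-sized bins of the sorted observations
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed events in the bin
        expected = sum(p for p, _ in chunk)   # expected events in the bin
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    # The statistic is conventionally referred to a chi-square with groups - 2 df
    return statistic, groups - 2
```

A large statistic relative to the reference chi-square distribution suggests lack of fit; computing the p-value is left to a statistics library.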

Example 8.6: Changing the reference category for categorical variables

September 21, 2010 | Ken Kleinman

How can we change the reference category for a categorical variable? This question comes up often in a consulting practice. When including categorical covariates in regression models, there is a question of how to incorporate the categories. One simpl...

Example 8.5: bubble plots part 3

September 14, 2010 | Ken Kleinman

An anonymous commenter expressed a desire to see how one might use SAS to draw a bubble plot with bubbles in three colors, corresponding to a fourth variable in the data set. (x, y, z for bubble size, and the category variable.) In a previous entries...

Example 8.4: Including subsetting conditions in output

September 7, 2010 | Ken Kleinman

A number of analyses perform operations on subsets. It is helpful for the output to make clear which observations have been included or excluded. SAS: The where statement (section A.6.3) is a powerful and useful tool for subsetting on the fly. (...

Summer hiatus

August 2, 2010 | Ken Kleinman

We're taking a break from posting for most of August. We'll be back in a month with new examples, including R- and SAS-applicable tricks and tools. Please drop us any ideas in the comments or by e-mail. We love feedback of any kind.

Example 8.2: Digits of Pi, redux

July 12, 2010 | Ken Kleinman

In example 8.1, we considered some simple tests for the randomness of the digits of Pi. Here we develop a different test and implement it. If each digit appears in each place with equal and independent probability, then the places between recurrences...

Example 8.1: Digits of Pi

July 6, 2010 | Ken Kleinman

Do the digits of Pi appear in a random order? If so, the trillions of digits of Pi calculated can serve as a useful random number generator. This post was inspired by this entry on Matt Asher's blog. Generating pseudo-random numbers is a key piece o...

Example 7.42: Testing the proportionality assumption

June 21, 2010 | Ken Kleinman

In addition to the non-parametric tools discussed in recent entries, it's common to use proportional hazards regression (section 4.3.1), also called Cox regression, in evaluating survival data. It's important in such models to test the proportionality a...

Example 7.36: Propensity score stratification

May 10, 2010 | Ken Kleinman

In examples 7.34 and 7.35 we described methods using propensity scores to account for possible confounding factors in an observational study. In addition to adjusting for the propensity score in a multiple regression and matching on the propensity score...

Example 7.35: Propensity score matching

May 3, 2010 | Ken Kleinman

As discussed in example 7.34, it's sometimes preferable to match on propensity scores, rather than adjust for them as a covariate. SAS: We use a suite of macros written by Jon Kosanke and Erik Bergstralh at the Mayo Clinic. The dist macro calculates the ...

Example 7.34: Propensity scores and causal inference from observational studies

April 26, 2010 | Ken Kleinman

Propensity scores can be used to help make causal interpretation of observational data more plausible, by adjusting for other factors that may be responsible for differences between groups. Heuristically, we estimate the probability of exposure, rather t...

Example 7.33: Specifying fonts in graphics

April 19, 2010 | Ken Kleinman

For interactive data analysis, the default fonts used by SAS and R are acceptable, if not beautiful. However, for publication, it may be important to manipulate the fonts. For example, it would be desirable for the fonts in legends, axis labels, or o...

Example 7.31: Contour plot of BMI by weight and height

April 5, 2010 | Ken Kleinman

A contour plot is a simple way to plot a surface in two dimensions. Lines with a constant Z value are plotted on the X-Y plane. Typical uses include weather maps displaying "isobars" (lines of constant pressure), and maps displaying lines of constant e...

Example 7.30: Simulate censored survival data

March 30, 2010 | Ken Kleinman

To simulate survival data with censoring, we need to model the hazard functions for both time to event and time to censoring. We simulate both event times from a Weibull distribution with a scale parameter of 1 (this is equivalent to an exponential ra...
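
The idea can be sketched in Python under simplifying assumptions: both the event time and the censoring time are drawn as exponential random variables, and the rates `event_rate` and `censor_rate` are illustrative values, not taken from the post. The function name `simulate_censored` is likewise hypothetical.

```python
import random

def simulate_censored(n, event_rate=1.0, censor_rate=0.5, seed=0):
    """Return a list of (observed_time, status) pairs; status 1 = event, 0 = censored."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t_event = rng.expovariate(event_rate)    # latent time to event
        t_censor = rng.expovariate(censor_rate)  # latent time to censoring
        observed = min(t_event, t_censor)        # we only see whichever comes first
        status = 1 if t_event <= t_censor else 0
        data.append((observed, status))
    return data
```

With independent exponential times, the expected share of uncensored observations is `event_rate / (event_rate + censor_rate)`, which gives a quick sanity check on the simulated data.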

Example 7.29: Bubble plots colored by a fourth variable

March 27, 2010 | Ken Kleinman

In Example 7.28, we generated a bubble plot showing the relationship among CESD, age, and number of drinks, for women. An anonymous commenter asked whether it would be possible to color the circles according to gender. In the comments, we showed simp...

Example 7.28: Bubble plots

March 22, 2010 | Ken Kleinman

A bubble plot is a means of displaying 3 variables in a scatterplot. The z dimension is presented in the size of the plot symbol, typically a circle. The area or radius of the circle plotted is proportional to the value of the third variable. This c...

Example 7.27: probability question reconsidered

March 15, 2010 | Ken Kleinman

In Example 7.26, we considered a problem, from the xkcd blog: Suppose I choose two (different) real numbers, by any process I choose. Then I select one at random (p= .5) to show Nick. Nick must guess whether the other is smaller or larger. Being righ...

Example 7.26: probability question

March 8, 2010 | Ken Kleinman

Here's a surprising problem, from the xkcd blog. Suppose I choose two (different) real numbers, by any process I choose. Then I select one at random (p= .5) to show Nick. Nick must guess whether the other is smaller or larger. Being right 50% of the ...

Example 7.25: compare draws with distribution

March 5, 2010 | Ken Kleinman

In example 7.24, we demonstrated a Metropolis-Hastings algorithm for generating observations from awkward distributions. In such settings it is desirable to assess the quality of draws by comparing them with the target distribution. Recall that the dis...

Example 7.23: the Monty Hall problem

January 20, 2010 | Ken Kleinman

The Monty Hall problem illustrates a simple setting where intuition often leads to a solution different from formal reasoning. The situation is based on the game show Let's Make a Deal. First, Monty puts a prize behind one of three doors. Then the player chooses a door. Next, (without moving ...
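
A short simulation can make the gap between intuition and formal reasoning concrete. The Python sketch below assumes the standard rules, which the excerpt is cut off before stating in full: Monty always opens a door that hides no prize and is not the player's pick. The function names `play` and `win_rate` are hypothetical.

```python
import random

def play(switch, rng):
    """Play one game; return True if the player ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty opens a door that is neither the player's pick nor the prize
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one remaining unopened door
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, n_games=100_000, seed=0):
    """Estimate the probability of winning under a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(n_games)) / n_games
```

Under these rules, `win_rate(True)` should come out near 2/3 and `win_rate(False)` near 1/3.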
