Post 4: Sampling the person ability parameters

[This article was first published on Markov Chain Monte Carlo in Item Response Models, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The previous post outlined the general strategy of writing a MH within Gibbs sampler by breaking the code into two levels: a high level shell and a series of lower-level samplers which do the actual work.

This post discusses the first of the lower-level samplers, the sampler for the person ability parameters. The chapter gives a complete treatment of how to write the sampler, please read it for the details. Here, we summarize the development of the sampler, check that the sampler code is correct, and give two examples of how bugs might be detected. The final bug will motivate the next post where we will refactor this specific person ability sampler into a general MH sampler.

The person ability sampler

The person ability sampler will use the Metropolis-Hastings (MH) algorithm to sample from the complete conditional. See the chapter for details on how (and why!) to write the code presented in this section.

In a previous post we derived the complete conditional as:

begin{equation}
f(theta_p|text{rest}) propto
prod_{i=1}^I
pi_{pi}^{u_{pi}} (1 – pi_{pi})^{1-u_{pi}}
f_text{Normal}(theta_p| 0,sigma_theta^2) label{eq:thetacc}
end{equation}
where pi_{pi} is the probability of person p answering item i correctly. pi_{pi} depends on both the person and item parameters, and as is defined by
begin{equation*}
ln{frac{pi_{pi}}{1+pi_{pi}}} = a_i ( theta_p – b_i) .
end{equation*}

The likelihood term in Equation ref{eq:thetacc} left( pi_{pi}^{u_{pi}} (1 - pi_{pi})^{1-u_{pi}} right) must be calculated twice for each iteration of the MH algorithm: first for the proposed values of boldsymbol theta and second for the “old” values of boldsymbol theta. To minimized code duplication, we define the log.prob function to calculate it on the log scale:

## log.prob
##
## Output : A matrix such that the
##          column sums are item log-likelihood values, and
##          row sums are person log-likelihood values.
log.prob 

The chapter gives details on how to implement the MH sampler. A more highly commented version of the code is given here:

sample.th 

Please see the chapter for additional details.

Running the sampler

We can run the sampler as follows:

## Load the necessary code from this and previous posts
source("http://mcmcinirt.stat.cmu.edu/setup/post-4.R")

## Set the seed to keep results reproducible 
set.seed(314159)

## Run the sampler with overdispersed theta.abl
## values and the true, simulated values for the other
## parameters. 
run.B 

Note that the acceptance rate reported by the sampler is about 48%, which is near the optimal rate. I achieved this by changing the value for MH.th until it was “just right”. The tuning runs are not shown (see Post 9 for how to tune the sampler).

Testing the sampler

Using the coda library and the get.2pl.params function from Post 3 (both of which were loaded by the source("http://mcmcinirt.stat.cmu.edu/setup/post-4.R") line in the code above), we can quickly visualize the trace plots for theta_1 through theta_9:
run.B.mcmc.theta.1-9
using the default plot function:

## N.B. ylim specifies the scale on the y-axis. 
plot( run.B.mcmc[, get.2pl.params(1:9,NULL,NULL,NULL)],
      density=FALSE,
      ylim=c(-5,5), smooth=TRUE )

From the trace plots, it appears that each chain quickly converges to the mode from its starting value. We can test that the chains are getting the right answers by calculating the average values of the chains (Expected A Posteriori estimates; EAP estimates),

theta.eap.B 

and graphically comparing them with the simulated true values:
run.B.mcmc.theta.check

To generate the graph, we define a special visualization function

## We'll be doing these checks a lot, so let's make a function
check.sampler.graph 

and call it for this specific run of the sampler:

## Run the function for this example: 
check.sampler.graph(theta.abl, theta.eap.B, 'Person Ability',
                    'EAP estimates', 'blue')
## Parameter recovery looks a little biased
## Empirical CDF looks very close 

Both the parameter recovery scatter plot and the empirical CDF plot suggest that the sampler is properly recovering the simulated values. This is good news; it suggests that we implemented the sampler correctly.

What if there is a bug in the code?

Bug 1: Good sampler properties, but wrong answers

In my experience, a common bug in writing a sampler is not calculating the log-likelihood correctly. Here we introduce a bug in the calculation of the log-likelihood to see what happens with the sampler.

## Introduce a bug into the calculation of the log-likelihood
log.prob.buggy 

Now we swap in the buggy log.prob function, make a buggy run of the sampler, and the swap out the buggy log.prob function for the correct one:

## Set the seed to keep results reproducible 
set.seed(314159)

## Swap in the buggy log likelihood
log.prob.correct 

Now, the first nine person ability parameters still converge quickly to their mode:

run.B.buggy.mcmc.theta.1-9
plot( run.B.buggy.mcmc[, get.2pl.params(1:9,NULL,NULL,NULL)],
      density=FALSE,
      ylim=c(-5,5), smooth=TRUE )

However, those modes do not correspond to the true values:

run.B.buggy.mcmc.theta.check
## Calculate buggy EAP estimates
theta.eap.B.buggy 

Lesson learned from bug #1

Sometimes the sampler will converge well to a global mode, but that mode will be the wrong answer because there is an error in the way that the likelihood (or prior) is being calculated. Only a check with simulated data — where the true values are known — will be able to spot this bug.

Bug 2: Terrible sampler properties, but correct answers

In my experience, a common bug in writing a sampler is making a small error in how the Metropolis-Hastings algorithm is implemented. Here we introduce a bug in the calculation of the acceptance probability:

sample.th.buggy 

Since the bug involves the acceptance rate, we must adjust the variance of the Metropolis-Hastings proposal to get reasonable acceptance rates:

## Set the seed to keep results reproducible 
set.seed(314159)
 
## Swap in 
sample.th.correct 

Again, the first nine person ability parameters still converge quickly to their modes:

run.B.mcmc.theta.buggy.th.1-9
plot( run.B.buggy.th.mcmc[, get.2pl.params(1:9,NULL,NULL,NULL)],
      density=FALSE, ylim=c(-5,5), smooth=TRUE )

And the chains even recover the true values:

run.B.buggy.th.mcmc.theta.check
## Calculate buggy EAP estimates
theta.eap.B.buggy.th 

So, if the chains converge quickly to their modes and recover their true values, in what sense is this a bug?

This bug is about efficiency. Note that the buggy chain has much higher auto-correlation than the correct chain, even though both acceptance rates are around 0.44.

run.B.mcmc.theta.buggy.th.acf
par(mfrow=c(1,2))
   acf( run.B.mcmc[, 'theta.abl 1'],
        main="ACF: correct sampler", col='blue')
   acf( run.B.buggy.th.mcmc[, 'theta.abl 1'],
        main="ACF: buggy sampler", col='purple')
par(mfrow=c(1,1))

Since the autocorrelation is so high in the buggy sampler, there is less information about the posterior distribution in the chain. One metric that is both simple to calculate and to interpret is the effective sample size. This is (roughly) the number of draws from an independent sampler that would have the same amount of information as the (in this case, 1,000) draws from the correlated sampler. In R, we can calculate an effective sample size for each theta_p and examine boxplots as follows:

run.B.mcmc.theta.buggy.th.viz.ess
## Effective sample size of the correct sampler
ess.run.B 

Note that the 1,000 draws from the correct sampler is worth about 200 draws from a hypothetical independent sampler, while the 1,000 draws from the buggy sampler is worth only about 50. Therefore, this bug does affect our ability to draw inferences — mainly because our standard error of estimation will be quite a bit worse.

Lesson learned from bug #2

Sometimes a sampler will appear to recover the true values, but the sampler will be very inefficient. This could be due to a mistake in the algorithm. Unfortunately, detecting these bugs is quite difficult.

In the next post, we give a partial solution to avoiding such bugs.

To leave a comment for the author, please follow the link and comment on their blog: Markov Chain Monte Carlo in Item Response Models.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)