More Explorations with catR

December 1, 2013
By

(This article was first published on Econometrics by Simulation, and kindly contributed to R-bloggers)

# For the purposes of simulating computerized adaptive tests
# the R package catR is unparallelled.
 
# catR is an excellent tool for students who are curious about
# how a computerized adaptive test might work. It is also useful
# for testing companies that are interested in seeing how
# their choices of number of items, or model, stopping rule,
# or quite a few of the other options which are available
# when designing a specific computerized adaptive test.
 
# In this post I will explore some of the features of the
# function randomCAT, an extremely powerful function
# that simulates an entire response pattern for an individual.
 
# In a previous post I explore some of the other function 
# in catR in order to step by step demonstrate how to use 
# the package to simulate a test.
 
library("catR")
 
# First let's generate an item bank.
# Items specifies how many items to generate
 
# Model specifies which model to use in generating the items
# a,b,c Priors are specifying distributions to draw
# the parameters from for each item.
 
# The final set of arguments is for specifying
# what range of theta values the bank will initially
# draw item parameters for.  Theta values are the typical
# latent traits for which item response theory is concerned
# with estimating.
Bank <- createItemBank(items = 500, model = "3PL",
                       aPrior=c("norm",1,0.2),
                       bPrior=c("norm",0,1),
                       cPrior=c("unif",0,0.25),
                       thMin = -4, thMax = 4,
                       step = 0.05)
 
# We may want to examine the object we have created called "Bank"
attributes(Bank)
 
# Within the Bank object of class "itBank" there is three named
# attributes.
 
# itemPar lists the item parameters for those items which have been
# generated.  We could see a histogram of difficulty parameters (b) by
# targeting within the Bank object:
 
hist(Bank$itemPar[,2], breaks=30,
     main="Distribution of Item Difficulties",
     xlab="b parameter")


# We can also see how much information a particular item would add
# accross a range of ability levels.  This information is already
# available within the Bank object under the names infoTab and
# theta.
 
# Plot the first item's information
plot(rep(Bank$theta,1),Bank$infoTab[,1],
     type="l", main="Item 1's information",
     xlab="Ability (theta)", ylab="Information")
 
# Plot the first 3 items
# By specifying type = "n" this plot is left empty
nitems = 3
plot(rep(Bank$theta,nitems),Bank$infoTab[,1:nitems], type="n",
     main=paste0("First ",nitems," items' information"),
     xlab="Ability (theta)", ylab="Information")
# Now we plot the
for (i in 1:nitems) lines(Bank$theta,Bank$infoTab[,i],
                          col=grey(.8*i/nitems))


# We can see how different items can have information that
# spans different ability estimates as well as some items
# which just have more information than other items.
 
 
# Plotting all 500 items (same code as previously but now
# we specify the number of items as 500)
nitems = 500
plot(rep(Bank$theta,nitems),Bank$infoTab[,1:nitems], type="n",
    main=paste0("First ",nitems," items' information"),
    xlab="Ability (theta)", ylab="Information")
for (i in 1:nitems) lines(Bank$theta,Bank$infoTab[,i],
                          col=grey(.8*i/nitems))
# This plot may look nonsensical at first.  Be it actually
# provides some useful information.  From it you can see the
# maximum amount of information available for any one
# item at different levels of ability.  In the places where
# there is only one very tall item standing out we may be
# concerned about item exposure since subjects which seem to
# be in the area of that item are disproportionately more likely
# to get the same high info item than other other subjects
# in which the next highest item is very close in information
# to the max item.
 
# To see the max information for each ability we can add a line.
lines(Bank$theta,apply(Bank$infoTab, 1, max), col="blue", lwd=2)
 
# We might also be interested in seeing how much information
# on average a random item chosen from the bank would provide
# or in other words what is the expected information from a
# random item drawn from the bank at different ability levels.
lines(Bank$theta,apply(Bank$infoTab, 1, mean), col="red", lwd=2)
 
# Or perhaps we might want to see what the maximum average information
# for a 20 item test might be. So we calculate the average information
# for the top 20 items at different ability levels.
maxmean <- function(x, length=20) mean(sort(x, decreasing=T)[1:length])
maxmean(1:100) # Returns 90.5, seems to be working properly
 
lines(Bank$theta,apply(Bank$infoTab, 1, maxmean), col="orange", lwd=3)
 


# Now this last line is very interesting because it reflects
# per item the maximum amount of information this bank can provide
# given a fixed length of 20. Multiply this curve by 20 and it will give
# us the maximum information this bank can provide given a 20 item test
# and a subject's ability.
 
# This can really be thought of as a theoretical maximum for which
# any particular CAT test might attempt to meet but on average will
# always fall short.
 
# We can add a lengend
legend(-4.2, .55, c("max item info", "mean(info)",
                    "mean(top items)"),
       lty = 1, col = c("blue","red","orange"),  adj = c(0, 0.6))
 
 
 
library("reshape")
library("ggplot2")
 
# Let's seperate info tab
infoTab <- Bank$infoTab
 
# Let's add three columns to info tab for max, mean, and mean(top 20)
infoTab <- cbind(infoTab,
                 apply(Bank$infoTab, 1, max),
                 apply(Bank$infoTab, 1, mean),
                 apply(Bank$infoTab, 1, maxmean))
 
 
# Melt will turn the item information array into a long object
items.long <- melt(infoTab)
 
# Let's assign values to the first column which are thetas
items.long[,1] <- Bank$theta
 
# Now we are ready to name the different columns created by melt
names(items.long) <- c("theta", "item", "info")
 
itemtype <- factor("Item", c("Item","Max", "Mean", "Mean(Max)"))
items.long <- cbind(items.long, type=itemtype)
items.long[items.long$item==501,4] <- "Max"
items.long[items.long$item==502,4] <- "Mean"
items.long[items.long$item==503,4] <- "Mean(Max)"
 
# Now we are ready to start plotting
# Assign the data to a ggplot object
a <- ggplot(items.long, aes(x=theta, y=info, group=item))
 
# Plot a particular instance of the object
a + geom_line(colour = gray(.2)) +
    geom_line(aes(colour = type), size=2 ,
            subset = .(type %in% c("Max", "Mean", "Mean(Max)")))
 


# Now let's look at how the randomCAT function works.
# There are a number of arguments that the randomCAt function
# can take.  They can be defined as lists which are fed
# into the function.
 
# I will specify only that the stoping rule is 20 items.
# By specifying true Theta that is telling random CAT what the
# true ability level we are estimating.
res <- randomCAT(trueTheta = 3, itemBank = Bank,
                 test=list(method = "ML"),
                 stop = list(rule = "length", thr = 20))
# I specify test (theta estimator) as using ML because the
# default which is Bayesian model is strongly centrally
# biased in this case.
 
# Let's examine what elements are contained with the object "res"
attributes(res)
 
# We can see our example response pattern.
thetaEst <- c(0, res$thetaProv)
 
plot(1:21, thetaEst, type="n",
     xlab="Item Number",
     ylab="Ability Estimate",
     main="Sample Random Response Pattern")
# Add true ability line
  abline(h=3, col="red", lwd=2, lty=2)
# Add a line connecting responses
  lines(1:21, thetaEst, type="l", col=grey(.8))
# Add the response pattern to
  text(1:21, thetaEst, c(res$pattern, "X"))
# Add the legend
  legend(15,1,"True Ability", col="red", lty=2, lwd=2)
 


# Plot the sample item information from the set of items selected.
plot(rep(Bank$theta,20),Bank$infoTab[,res$testItems], type="n",
     main="High information items are often selected",
     xlab="Ability (theta)", ylab="Information")
for (i in 1:500) lines(Bank$theta,Bank$infoTab[,i], col=grey(.75))
# Now we plot the
for (i in res$testItems) lines(Bank$theta,Bank$infoTab[,i],
                               lwd=2, col=grey(.2))
 


# Now let's see how randomCat performs with a random draw
# of 150 people with different ability estimates.
 
npers <- 150   # Specify number of people to simulate
 
theta <- rnorm(npers)
# Draw a theta ability level vector
 
thetaest <- numeric(npers)
# Creates an empty vector of zeros to hold future estimates
# of theta
 
# Create an empty item object
items.used <- NULL
 
# Create an empty object to hold b values for items used
b.values <- NULL
 
 
for (i in 1:npers) {
  # Input the particular theta[i] ability for a particular run.
  res <- randomCAT(trueTheta = theta[i],
                   itemBank = Bank,
                   test=list(method = "ML"),
                   stop = list(rule = "length", thr = 20))
  # Save theta final estimates
  thetaest[i] <- res$thFinal
  # Save a list of items selected in each row of items.used
  items.used <- rbind(items.used, res$testItems)
  # Save a list of b values of items selected in each row
  b.values <- rbind(b.values, res$itemPar[,2])
 
}
 
# Let's see how our estimated theta's compare with our true
plot(theta, thetaest,
     main="Ability plotted against ability estimates",
     ylab="theta estimate") 
# To get a sense of how much exposure our items get 
itemTab <- table(items.used)
 
length(itemTab)
# We can see we only have 92 items used for all 150 subjects
# taking the cat exam.
 
mean(itemTab)
# On average each item used is exposed 32 times which means
mean(itemTab)/150
# over a 20% exposure rate on average in addition to some items
# have much higher exposure rates.
Created by Pretty R at inside-R.org

To leave a comment for the author, please follow the link and comment on his blog: Econometrics by Simulation.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.