# More Explorations with catR

December 1, 2013
By

(This article was first published on Econometrics by Simulation, and kindly contributed to R-bloggers)

`# For the purposes of simulating computerized adaptive tests# the R package catR is unparallelled. # catR is an excellent tool for students who are curious about# how a computerized adaptive test might work. It is also useful# for testing companies that are interested in seeing how# their choices of number of items, or model, stopping rule,# or quite a few of the other options which are available# when designing a specific computerized adaptive test. # In this post I will explore some of the features of the # function randomCAT, an extremely powerful function # that simulates an entire response pattern for an individual. # In a previous post I explore  some of the other function `
`# in catR in order to step by step demonstrate how to use `
`# the package to simulate a test. library("catR") # First let's generate an item bank. # Items specifies how many items to generate # Model specifies which model to use in generating the items# a,b,c Priors are specifying distributions to draw# the parameters from for each item. # The final set of arguments is for specifying# what range of theta values the bank will initially# draw item parameters for.  Theta values are the typical# latent traits for which item response theory is concerned# with estimating.Bank <- createItemBank(items = 500, model = "3PL",                        aPrior=c("norm",1,0.2),                        bPrior=c("norm",0,1),                        cPrior=c("unif",0,0.25),                       thMin = -4, thMax = 4,                       step = 0.05) # We may want to examine the object we have created called "Bank"attributes(Bank) # Within the Bank object of class "itBank" there is three named# attributes. # itemPar lists the item parameters for those items which have been# generated.  We could see a histogram of difficulty parameters (b) by# targeting within the Bank object: hist(Bank\$itemPar[,2], breaks=30,      main="Distribution of Item Difficulties",     xlab="b parameter")# We can also see how much information a particular item would add# accross a range of ability levels.  This information is already# available within the Bank object under the names infoTab and# theta. # Plot the first item's informationplot(rep(Bank\$theta,1),Bank\$infoTab[,1],      type="l", main="Item 1's information",      xlab="Ability (theta)", ylab="Information") # Plot the first 3 items# By specifying type = "n" this plot is left emptynitems = 3plot(rep(Bank\$theta,nitems),Bank\$infoTab[,1:nitems], type="n",      main=paste0("First ",nitems," items' information"),      xlab="Ability (theta)", ylab="Information")# Now we plot the for (i in 1:nitems) lines(Bank\$theta,Bank\$infoTab[,i],                           col=grey(.8*i/nitems))# We can see how different items can have information that # spans different ability estimates as well as some items# which just have more information than other items.  # Plotting all 500 items (same code as previously but now# we specify the number of items as 500)nitems = 500plot(rep(Bank\$theta,nitems),Bank\$infoTab[,1:nitems], type="n",     main=paste0("First ",nitems," items' information"),     xlab="Ability (theta)", ylab="Information")for (i in 1:nitems) lines(Bank\$theta,Bank\$infoTab[,i],                           col=grey(.8*i/nitems)) # This plot may look nonsensical at first.  Be it actually# provides some useful information.  From it you can see the# maximum amount of information available for any one# item at different levels of ability.  In the places where# there is only one very tall item standing out we may be # concerned about item exposure since subjects which seem to# be in the area of that item are disproportionately more likely# to get the same high info item than other other subjects# in which the next highest item is very close in information# to the max item. # To see the max information for each ability we can add a line.lines(Bank\$theta,apply(Bank\$infoTab, 1, max), col="blue", lwd=2) # We might also be interested in seeing how much information# on average a random item chosen from the bank would provide# or in other words what is the expected information from a# random item drawn from the bank at different ability levels.lines(Bank\$theta,apply(Bank\$infoTab, 1, mean), col="red", lwd=2) # Or perhaps we might want to see what the maximum average information# for a 20 item test might be. So we calculate the average information# for the top 20 items at different ability levels.maxmean <- function(x, length=20) mean(sort(x, decreasing=T)[1:length])maxmean(1:100) # Returns 90.5, seems to be working properly lines(Bank\$theta,apply(Bank\$infoTab, 1, maxmean), col="orange", lwd=3) # Now this last line is very interesting because it reflects# per item the maximum amount of information this bank can provide# given a fixed length of 20. Multiply this curve by 20 and it will give# us the maximum information this bank can provide given a 20 item test# and a subject's ability. # This can really be thought of as a theoretical maximum for which# any particular CAT test might attempt to meet but on average will# always fall short. # We can add a lengendlegend(-4.2, .55, c("max item info", "mean(info)",                     "mean(top items)"),       lty = 1, col = c("blue","red","orange"),  adj = c(0, 0.6))   library("reshape")library("ggplot2") # Let's seperate info tabinfoTab <- Bank\$infoTab # Let's add three columns to info tab for max, mean, and mean(top 20)infoTab <- cbind(infoTab,                  apply(Bank\$infoTab, 1, max),                 apply(Bank\$infoTab, 1, mean),                 apply(Bank\$infoTab, 1, maxmean))  # Melt will turn the item information array into a long objectitems.long <- melt(infoTab) # Let's assign values to the first column which are thetasitems.long[,1] <- Bank\$theta # Now we are ready to name the different columns created by meltnames(items.long) <- c("theta", "item", "info") itemtype <- factor("Item", c("Item","Max", "Mean", "Mean(Max)"))items.long <- cbind(items.long, type=itemtype)items.long[items.long\$item==501,4] <- "Max"items.long[items.long\$item==502,4] <- "Mean"items.long[items.long\$item==503,4] <- "Mean(Max)" # Now we are ready to start plotting# Assign the data to a ggplot objecta <- ggplot(items.long, aes(x=theta, y=info, group=item)) # Plot a particular instance of the objecta + geom_line(colour = gray(.2)) +    geom_line(aes(colour = type), size=2 ,             subset = .(type %in% c("Max", "Mean", "Mean(Max)"))) # Now let's look at how the randomCAT function works.# There are a number of arguments that the randomCAt function# can take.  They can be defined as lists which are fed# into the function. # I will specify only that the stoping rule is 20 items.# By specifying true Theta that is telling random CAT what the# true ability level we are estimating.res <- randomCAT(trueTheta = 3, itemBank = Bank,                  test=list(method = "ML"),                 stop = list(rule = "length", thr = 20))# I specify test (theta estimator) as using ML because the# default which is Bayesian model is strongly centrally# biased in this case. # Let's examine what elements are contained with the object "res"attributes(res) # We can see our example response pattern.thetaEst <- c(0, res\$thetaProv) plot(1:21, thetaEst, type="n",     xlab="Item Number",     ylab="Ability Estimate",      main="Sample Random Response Pattern")# Add true ability line  abline(h=3, col="red", lwd=2, lty=2)# Add a line connecting responses  lines(1:21, thetaEst, type="l", col=grey(.8))# Add the response pattern to   text(1:21, thetaEst, c(res\$pattern, "X"))# Add the legend  legend(15,1,"True Ability", col="red", lty=2, lwd=2) # Plot the sample item information from the set of items selected.plot(rep(Bank\$theta,20),Bank\$infoTab[,res\$testItems], type="n",      main="High information items are often selected",      xlab="Ability (theta)", ylab="Information")for (i in 1:500) lines(Bank\$theta,Bank\$infoTab[,i], col=grey(.75)) # Now we plot the for (i in res\$testItems) lines(Bank\$theta,Bank\$infoTab[,i],                                lwd=2, col=grey(.2)) # Now let's see how randomCat performs with a random draw# of 150 people with different ability estimates. npers <- 150   # Specify number of people to simulate theta <- rnorm(npers)# Draw a theta ability level vector thetaest <- numeric(npers)# Creates an empty vector of zeros to hold future estimates# of theta # Create an empty item objectitems.used <- NULL # Create an empty object to hold b values for items usedb.values <- NULL  for (i in 1:npers) {  # Input the particular theta[i] ability for a particular run.  res <- randomCAT(trueTheta = theta[i],                    itemBank = Bank,                    test=list(method = "ML"),                   stop = list(rule = "length", thr = 20))  # Save theta final estimates  thetaest[i] <- res\$thFinal  # Save a list of items selected in each row of items.used  items.used <- rbind(items.used, res\$testItems)  # Save a list of b values of items selected in each row  b.values <- rbind(b.values, res\$itemPar[,2]) } # Let's see how our estimated theta's compare with our trueplot(theta, thetaest,      main="Ability plotted against ability estimates",     ylab="theta estimate") `
`# To get a sense of how much exposure our items get itemTab <- table(items.used) length(itemTab)# We can see we only have 92 items used for all 150 subjects# taking the cat exam. mean(itemTab)# On average each item used is exposed 32 times which meansmean(itemTab)/150# over a 20% exposure rate on average in addition to some items# have much higher exposure rates.`

Created by Pretty R at inside-R.org

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...