# More Explorations with catR

[This article was first published on **Econometrics by Simulation**, and kindly contributed to R-bloggers.]

# For simulating computerized adaptive tests (CATs),

# the R package catR is unparalleled.

# catR is an excellent tool for students who are curious about

# how a computerized adaptive test might work. It is also useful

# for testing companies that want to see how their choices of

# test length, model, stopping rule, and the many other options

# available when designing a CAT affect how the test performs.

# In this post I will explore some of the features of

# randomCAT, an extremely powerful function that simulates

# an entire response pattern for an individual.

# In a previous post I explored some of the other functions

# in catR to demonstrate, step by step, how to use

# the package to simulate a test.

library("catR")

# First let's generate an item bank.

# items specifies how many items to generate.

# model specifies which model to use in generating the items.

# aPrior, bPrior and cPrior specify the distributions from which

# the a, b and c parameters of each item are drawn.

# The final set of arguments (thMin, thMax, step) defines the

# grid of theta values over which the bank tabulates item

# information. Theta is the latent trait (ability) that

# item response theory is concerned with estimating.

Bank <- createItemBank(items = 500, model = "3PL",

                       aPrior=c("norm",1,0.2),

                       bPrior=c("norm",0,1),

                       cPrior=c("unif",0,0.25),

                       thMin = -4, thMax = 4,

                       step = 0.05)
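# To make the priors above concrete, here is a minimal base R sketch that
# draws 3PL parameters from the same three distributions without calling
# catR. It assumes the "norm" prior is given as (mean, sd) and the "unif"
# prior as (min, max). The names n_items and item_par and the seed are my
# own choices for illustration, not part of catR.

```r
set.seed(1)                                  # arbitrary seed, only for reproducibility
n_items <- 500

item_par <- cbind(
  a = rnorm(n_items, mean = 1, sd = 0.2),    # discrimination, assumed N(1, 0.2)
  b = rnorm(n_items, mean = 0, sd = 1),      # difficulty, assumed N(0, 1)
  c = runif(n_items, min = 0, max = 0.25)    # guessing, assumed U(0, 0.25)
)

head(item_par)                # first few simulated items
summary(item_par[, "b"])      # difficulties should center near 0
```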

# We may want to examine the object we have created called "Bank"

attributes(Bank)

# Within the Bank object of class "itBank" there are three named

# elements: itemPar, theta and infoTab.

# itemPar lists the parameters of the items that have been

# generated. We can draw a histogram of the difficulty parameters (b)

# by indexing into the Bank object:

hist(Bank$itemPar[,2], breaks=30,

     main="Distribution of Item Difficulties",

     xlab="b parameter")

# We can also see how much information a particular item would add

# across a range of ability levels. This information is already

# available within the Bank object under the names infoTab and

# theta.

# Plot the first item's information

plot(Bank$theta, Bank$infoTab[,1],

     type="l", main="Item 1's information",

     xlab="Ability (theta)", ylab="Information")
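# For readers wondering where an information curve like this comes from,
# here is a small base R sketch of the standard 3PL item information
# function. It assumes a scaling constant of D = 1. The function names
# p3pl and info3pl and the example parameters are hypothetical, and I am
# not claiming this is the exact code catR uses to fill infoTab.

```r
# Probability of a correct response under the 3PL model (D = 1 assumed)
p3pl <- function(theta, a, b, c) {
  c + (1 - c) / (1 + exp(-a * (theta - b)))
}

# Fisher information of a 3PL item at ability theta
info3pl <- function(theta, a, b, c) {
  p <- p3pl(theta, a, b, c)
  a^2 * ((1 - p) / p) * ((p - c) / (1 - c))^2
}

theta_grid <- seq(-4, 4, by = 0.05)          # same grid as the bank above
plot(theta_grid, info3pl(theta_grid, a = 1.2, b = 0.5, c = 0.2),
     type = "l", xlab = "Ability (theta)", ylab = "Information",
     main = "Information of one hypothetical 3PL item")
```

# With c = 0 the formula reduces to the 2PL case, where information
# peaks at theta = b with a value of a^2 / 4.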

# Plot the first 3 items.

# Specifying type = "n" leaves this plot empty

nitems <- 3

plot(rep(Bank$theta,nitems),Bank$infoTab[,1:nitems], type="n",

     main=paste0("First ",nitems," items' information"),

     xlab="Ability (theta)", ylab="Information")

# Now we plot the information curve of each item

for (i in 1:nitems) lines(Bank$theta,Bank$infoTab[,i],

                          col=grey(.8*i/nitems))

# We can see that different items are informative over

# different ranges of ability, and that some items

# simply carry more information than others.

# Plotting all 500 items (same code as previously but now

# we specify the number of items as 500)

nitems <- 500

plot(rep(Bank$theta,nitems),Bank$infoTab[,1:nitems], type="n",

     main=paste0("First ",nitems," items' information"),

     xlab="Ability (theta)", ylab="Information")

for (i in 1:nitems) lines(Bank$theta,Bank$infoTab[,i],

                          col=grey(.8*i/nitems))

# This plot may look nonsensical at first, but it actually

# provides some useful information. From it you can see the

# maximum amount of information available from any one

# item at different levels of ability. Where a single

# very tall item stands out, we may be concerned about item

# exposure: subjects whose ability is near that item's peak are

# disproportionately likely to all receive the same

# high-information item, compared with subjects at ability levels

# where the next-best item is very close in information

# to the best item.

# To see the max information for each ability we can add a line.

lines(Bank$theta,apply(Bank$infoTab, 1, max), col="blue", lwd=2)

# We might also be interested in seeing how much information

# on average a random item chosen from the bank would provide

# or in other words what is the expected information from a

# random item drawn from the bank at different ability levels.

lines(Bank$theta,apply(Bank$infoTab, 1, mean), col="red", lwd=2)

# Or perhaps we might want to see what the maximum average information

# for a 20 item test might be. So we calculate the average information

# for the top 20 items at different ability levels.

maxmean <- function(x, length=20) mean(sort(x, decreasing=TRUE)[1:length])

maxmean(1:100) # Returns 90.5, seems to be working properly

lines(Bank$theta,apply(Bank$infoTab, 1, maxmean), col="orange", lwd=3)

# This last line is very interesting because it reflects,

# per item, the maximum amount of information this bank can provide

# given a fixed length of 20. Multiply this curve by 20 and it gives

# the maximum information this bank can provide for a 20 item test

# at a given ability.

# It can be thought of as a theoretical ceiling that

# any particular CAT might approach but, on average, will

# fall short of, since items are selected at provisional ability

# estimates rather than at the subject's true ability.

# We can add a legend

legend(-4.2, .55, c("max item info", "mean(info)",

                    "mean(top items)"),

       lty = 1, col = c("blue","red","orange"), adj = c(0, 0.6))
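# One way to read the orange curve: information translates into precision.
# Under the usual asymptotic approximation, the standard error of an
# ability estimate is 1 / sqrt(test information). The sketch below applies
# that to an average per-item information value. The helper name
# se_from_info and the example values 0.5 and 0.2 are made up for
# illustration and are not taken from the bank above.

```r
# Approximate (asymptotic) standard error from average per-item information
se_from_info <- function(mean_item_info, n_items = 20) {
  test_info <- mean_item_info * n_items   # total information of the test
  1 / sqrt(test_info)                     # asymptotic SE of the ability estimate
}

se_from_info(0.5)    # 20 items averaging 0.5 information each
se_from_info(0.2)    # less informative items give a larger standard error
```

# With the bank above, something like
# se_from_info(apply(Bank$infoTab, 1, maxmean)) would give a rough lower
# bound on the standard error of a 20 item test at each theta.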

library("reshape")

library("ggplot2")

# Let's make a separate copy of infoTab

infoTab <- Bank$infoTab

# Let's add three columns to infoTab for max, mean, and mean(top 20)

infoTab <- cbind(infoTab,

                 apply(Bank$infoTab, 1, max),

                 apply(Bank$infoTab, 1, mean),

                 apply(Bank$infoTab, 1, maxmean))

# Melt will turn the item information array into a long object

items.long <- melt(infoTab)

# Replace the first column (row indices) with the matching theta values

items.long[,1] <- Bank$theta

# Now we are ready to name the different columns created by melt

names(items.long) <- c("theta", "item", "info")

itemtype <- factor("Item", c("Item","Max", "Mean", "Mean(Max)"))

items.long <- cbind(items.long, type=itemtype)

items.long[items.long$item==501,4] <- "Max"

items.long[items.long$item==502,4] <- "Mean"

items.long[items.long$item==503,4] <- "Mean(Max)"

# Now we are ready to start plotting

# Assign the data to a ggplot object

a <- ggplot(items.long, aes(x=theta, y=info, group=item))

# Plot every item in dark grey, then overlay the three summary curves

a + geom_line(colour = gray(.2)) +

  geom_line(aes(colour = type), size=2,

            data = subset(items.long,

                          type %in% c("Max", "Mean", "Mean(Max)")))
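# The reshaping step used for the ggplot above may be easier to follow on
# a tiny matrix. This base R sketch builds the same kind of long table
# (one row per matrix cell, listed column by column) without the reshape
# package. The toy matrix m and the column names row, col and value are my
# own, and melt labels its output columns differently.

```r
m <- matrix(1:6, nrow = 3,
            dimnames = list(NULL, c("item1", "item2")))   # 3 thetas x 2 items

long <- data.frame(
  row   = as.vector(row(m)),    # row index of every cell
  col   = as.vector(col(m)),    # column index of every cell
  value = as.vector(m)          # cell values, read column by column
)
long
```

# Because the cells are listed column by column, assigning a theta vector
# of length nrow(m) to the first column recycles it once per item.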

# Now let's look at how the randomCAT function works.

# There are a number of arguments that the randomCAT function

# can take. Several of them are lists that are passed

# into the function.

# I will specify only that the stopping rule is a length of 20 items.

# Specifying trueTheta tells randomCAT the true ability level

# of the simulated examinee, which the test then tries to estimate.

res <- randomCAT(trueTheta = 3, itemBank = Bank,

                 test=list(method = "ML"),

                 stop = list(rule = "length", thr = 20))

# I specify the test method (the theta estimator) as ML because the

# default, a Bayesian estimator, shrinks estimates strongly toward

# the center of its prior, which biases them for an extreme ability like 3.
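# A rough base R sketch of that shrinkage, independent of catR. It assumes
# a standard normal prior, 20 randomly drawn 3PL items (not adaptively
# selected), and a simple grid search over theta. The names p3pl, loglik
# and est and the seed are hypothetical choices of mine, so treat this as
# an illustration of the idea rather than catR's estimator.

```r
set.seed(42)                                   # arbitrary seed
p3pl <- function(theta, a, b, c) c + (1 - c) / (1 + exp(-a * (theta - b)))

n  <- 20
a  <- rnorm(n, 1, 0.2)                         # discriminations
b  <- rnorm(n, 0, 1)                           # difficulties
cc <- runif(n, 0, 0.25)                        # guessing parameters
x  <- rbinom(n, 1, p3pl(3, a, b, cc))          # responses of a person with theta = 3

grid <- seq(-4, 4, by = 0.01)
loglik <- sapply(grid, function(th) {
  p <- p3pl(th, a, b, cc)
  sum(x * log(p) + (1 - x) * log(1 - p))       # log-likelihood of the pattern
})

est <- c(ML = grid[which.max(loglik)],                             # likelihood mode
         BM = grid[which.max(loglik + dnorm(grid, log = TRUE))])   # posterior mode
est
```

# Whenever the ML estimate is positive, the posterior mode cannot lie
# above it, because the N(0, 1) prior penalises every larger theta more.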

# Let's examine what elements are contained within the object "res"

attributes(res)

# We can plot our example response pattern.

thetaEst <- c(0, res$thetaProv)

plot(1:21, thetaEst, type="n",

     xlab="Item Number",

     ylab="Ability Estimate",

     main="Sample Random Response Pattern")

# Add true ability line

abline(h=3, col="red", lwd=2, lty=2)

# Add a line connecting the successive ability estimates

lines(1:21, thetaEst, type="l", col=grey(.8))

# Add the response pattern to the plot ("X" marks the final

# estimate, after which no further item is given)

text(1:21, thetaEst, c(res$pattern, "X"))

# Add the legend

legend(15,1,"True Ability", col="red", lty=2, lwd=2)

# Plot the item information of the set of items selected.

plot(rep(Bank$theta,20),Bank$infoTab[,res$testItems], type="n",

     main="High information items are often selected",

     xlab="Ability (theta)", ylab="Information")

# First draw every item in the bank in light grey

for (i in 1:500) lines(Bank$theta,Bank$infoTab[,i], col=grey(.75))

# Now we plot the selected items on top in a darker, thicker line

for (i in res$testItems) lines(Bank$theta,Bank$infoTab[,i],

                               lwd=2, col=grey(.2))
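# Why do high information items get picked so often? A common selection
# rule is to administer the unused item with the most information at the
# current ability estimate. Here is a minimal base R sketch of that rule,
# assuming 3PL items with D = 1. The helper next_item and the three item
# toy bank are hypothetical, and I am assuming, not asserting, that this
# matches the criterion randomCAT used above.

```r
info3pl <- function(theta, a, b, c) {
  p <- c + (1 - c) / (1 + exp(-a * (theta - b)))
  a^2 * ((1 - p) / p) * ((p - c) / (1 - c))^2
}

# Hypothetical helper: index of the most informative item not yet administered
next_item <- function(theta_hat, pars, used = integer(0)) {
  info <- info3pl(theta_hat, pars[, "a"], pars[, "b"], pars[, "c"])
  info[used] <- -Inf                 # administered items cannot be picked again
  which.max(info)
}

pars <- cbind(a = c(1, 1, 1), b = c(-2, 0, 1), c = c(0, 0, 0))
next_item(0, pars)               # item 2: its difficulty matches theta_hat
next_item(0, pars, used = 2)     # item 3: the next closest in difficulty
```

# Under a rule like this, examinees with similar provisional estimates
# are routed to the same few items, which is where exposure problems start.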

# Now let's see how randomCAT performs with a random draw

# of 150 people with different ability levels.

npers <- 150 # Specify number of people to simulate

# Draw a vector of true ability levels (theta)

theta <- rnorm(npers)

# Create a vector of zeros to hold the future estimates

# of theta

thetaest <- numeric(npers)

# Create an empty item object

items.used <- NULL

# Create an empty object to hold b values for items used

b.values <- NULL

for (i in 1:npers) {

  # Input the particular theta[i] ability for a particular run.

  res <- randomCAT(trueTheta = theta[i],

                   itemBank = Bank,

                   test=list(method = "ML"),

                   stop = list(rule = "length", thr = 20))

  # Save theta final estimates

  thetaest[i] <- res$thFinal

  # Save a list of items selected in each row of items.used

  items.used <- rbind(items.used, res$testItems)

  # Save a list of b values of items selected in each row

  b.values <- rbind(b.values, res$itemPar[,2])

}

# Let's see how our estimated thetas compare with our true thetas

plot(theta, thetaest,

     main="Ability plotted against ability estimates",

     ylab="theta estimate")
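# A scatter plot is a good first look, but a few summary numbers can help
# too. This sketch computes bias, root mean squared error and correlation
# between true and estimated abilities. The helper name recovery and the
# toy data (true values plus noise with sd = 0.3) are stand-ins I made up;
# they are not results from the simulation above.

```r
# Hypothetical helper summarising how well estimates recover the truth
recovery <- function(true, est) {
  err <- est - true
  c(bias = mean(err),            # average signed error
    rmse = sqrt(mean(err^2)),    # root mean squared error
    cor  = cor(true, est))       # linear agreement
}

set.seed(7)                                    # arbitrary seed
toy_true <- rnorm(150)
toy_est  <- toy_true + rnorm(150, sd = 0.3)    # stand-in for CAT estimates
round(recovery(toy_true, toy_est), 3)
```

# After running the loop above, recovery(theta, thetaest) would give the
# same summaries for the simulated test takers.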

# To get a sense of how much exposure our items get

itemTab <- table(items.used)

length(itemTab)

# In this run only 92 of the 500 items were used across all 150 subjects

# taking the CAT.

mean(itemTab)

# On average each item used was exposed about 32 times, which means

mean(itemTab)/150

# an average exposure rate of over 20%, with some items

# having much higher exposure rates.
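# To look at exposure item by item rather than on average, one option is
# to divide each item's count by the number of examinees. The sketch below
# does this on a made-up 4 person, 6 item example. The names toy_used,
# counts and exposure are mine and the numbers are not from the run above.

```r
# Toy example: 4 examinees, 3 items each, drawn from a 6-item bank
toy_used <- rbind(c(1, 2, 3),
                  c(1, 2, 4),
                  c(1, 3, 5),
                  c(1, 2, 6))

counts   <- table(factor(toy_used, levels = 1:6))   # levels keep any unused items at zero
exposure <- counts / nrow(toy_used)                 # share of examinees who saw each item
exposure
max(exposure)                                       # item 1 is seen by everyone
```

# Applied to the simulation, table(items.used) / npers gives the exposure
# rate of each item that was administered at least once.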
