Item Equating with the Same Group – SAT, ACT Example

[This article was first published on Econometrics by Simulation, and kindly contributed to R-bloggers].

# Item equating is the practice of making the results from two 
# different assessments equivalent.  This can be done in one of three ways:
# 1. having the same group take both assessments
# 2. having equivalent groups take the different assessments
# 3. having non-equivalent groups which use common items take the different
#    assessments.
# In this post I will cover topic 1.

# For this code I will use the catR package to generate my assessments
# and responses.
library(catR)  # provides createItemBank() and Pi() used below
library(ltm)   # provides ltm(), used later to estimate the 2PL models
# Let's attempt the first procedure.
# First, let's generate the item parameters for both assessments:
# an item bank of 100 items for each.
nitems <- 100
bank1 <- createItemBank(model="2PL", items=nitems)$itemPar
bank2 <- createItemBank(model="2PL", items=nitems)$itemPar
# Now let's generate a sample of 1000 people to take our assessments
npeep <- 1000
theta <- rnorm(npeep)
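# Simulate the item responses on both assessments
resp1 <- resp2 <- matrix(0, nrow=npeep, ncol=nitems)
for (i in 1:npeep) {
  # An item is answered correctly (1) when a uniform draw falls below
  # the probability of a correct response, Pi
  resp1[i, runif(nitems) < Pi(theta[i],bank1)$Pi] <- 1 # Test 1
  resp2[i, runif(nitems) < Pi(theta[i],bank2)$Pi] <- 1 # Test 2
}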
# To calculate total score on the tests we can sum the results of each row
score1 <- apply(resp1, 1, sum)
score2 <- apply(resp2, 1, sum)
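# A minimal base-R sketch of the same simulation step, for readers who want to
# see the 2PL model spelled out. It assumes the standard 2PL form
# P(correct) = 1/(1 + exp(-a*(theta - b))). The names sim2pl, a_demo, b_demo
# and resp_demo, and the parameter draws, are illustrative assumptions; they
# are not part of catR and do not reproduce the item banks above.
sim2pl <- function(theta, a, b) {
  # n x k matrix of success probabilities: row i = person, column j = item
  p <- plogis(sweep(outer(theta, b, "-"), 2, a, "*"))
  # a response is correct (1) when a uniform draw falls below its probability
  u <- matrix(runif(length(p)), nrow = nrow(p))
  (u < p) * 1
}
set.seed(1)
a_demo <- runif(20, 0.8, 1.5)   # hypothetical discriminations
b_demo <- rnorm(20)             # hypothetical difficulties
resp_demo <- sim2pl(rnorm(500), a_demo, b_demo)
colMeans(resp_demo)             # proportion correct per item; tends to fall as b rises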
# Since we know both forms of the test are parallel, we can check the correlation
# between scores on the two forms as a measure of reliability.
cor(score1, score2)
# This gives me a correlation estimate of .95, which is very good.
# However, we are not interested in reliability right now.  We would like to
# equate the two tests using the information gathered thus far.
# First let's estimate our item parameters.
# I will use the ltm() function in the ltm package for this.
est2pl1 <- ltm(resp1~z1)
est2pl2 <- ltm(resp2~z1)
# Now we have two tests made up of different items.
# We want to make sure the items are on the same scale
# so that it does not matter which test individuals take:
# their expected score will be the same.
# Because both tests were simulated from the same design and are very long,
# they are pretty much already equated by construction. That is
# much easier to achieve by simulation and perhaps impossible with
# real assessments.
# Nevertheless, let's act as if our tests were not already parallel and
# equate them.
# Let's first do some linear equating, which is done by setting the standardized
# scores of the two exams equal to each other. (page 33)
# See
# X1 and X2 refer to the total scores of each individual on exams 1 and 2:
# (X1 - mean(X1))/sd(X1) = (X2 - mean(X2))/sd(X2)
# X1 = sd(X1)/sd(X2)*X2 + (mean(X1) - sd(X1)/sd(X2)*mean(X2)) = A*X2 + B
# where A = sd(X1)/sd(X2) and B = mean(X1) - A*mean(X2)
(A <- sd(score1)/sd(score2))
# A being close to 1 indicates that the tests are scaled similarly
(B <- mean(score1)-A*mean(score2))
# B being close to 0 indicates the tests are of similar difficulty
# Now let's calculate what score2 would be if scaled on assessment 1.
score2scaled <- A*score2 + B
# By construction, score2scaled now has the same mean and standard deviation as score1.
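# The same linear equating wrapped in a small helper, as a sketch.
# linear_equate() is a hypothetical name, and the toy vectors are made up so
# that the arithmetic can be checked by hand.
linear_equate <- function(x_from, x_to) {
  A <- sd(x_to) / sd(x_from)            # slope: ratio of standard deviations
  B <- mean(x_to) - A * mean(x_from)    # intercept: aligns the means
  list(A = A, B = B, equated = A * x_from + B)
}
toy1 <- c(40, 50, 60, 70)   # made-up scores on form 1 (the target scale)
toy2 <- c(12, 14, 16, 18)   # made-up scores on form 2
eq <- linear_equate(toy2, toy1)
eq$A         # 5:   sd(toy1)/sd(toy2) = sqrt(500/20)
eq$B         # -20: 55 - 5*15
eq$equated   # 40 50 60 70, the form 2 scores expressed on the form 1 scale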
# Let's try the same thing with something slightly more interesting.
# Let's imagine that score1 is an ACT-style score, clipped below to the range 4 to 32,
# and score2 is an SAT-style score from 200 to 800.
# The standard deviation is 6 and average score around 20 for the ACT
ACT <- (score1-mean(score1))/sd(score1)*6 + 20
ACT[ACT>32] <- 32
ACT[ACT<4]  <- 4
ACT <- round(ACT) # ACT rounds to whole numbers
# The standard deviation is 100 and average score around 500 for the SAT
SAT <- (score2-mean(score2))/sd(score2)*100 + 500
SAT[SAT>800] <- 800
SAT[SAT<200]  <- 200
SAT <- round(SAT/10)*10 # SAT rounds to nearest 10
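# The standardize, rescale, clip and round steps above follow one pattern,
# sketched here as a helper. to_scale() and its argument names are
# hypothetical (not from any package), and raw_demo is made-up data.
to_scale <- function(x, center, spread, lo, hi, step = 1) {
  z <- (x - mean(x)) / sd(x)          # standardize the raw scores
  y <- z * spread + center            # move to the reporting scale
  y <- pmin(pmax(y, lo), hi)          # clip to the allowed range
  round(y / step) * step              # round to the reporting increment
}
raw_demo <- c(35, 48, 50, 52, 55, 61, 90)
sat_like <- to_scale(raw_demo, center = 500, spread = 100, lo = 200, hi = 800, step = 10)
act_like <- to_scale(raw_demo, center = 20, spread = 6, lo = 4, hi = 32)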
# Now let's see if we can transform our SAT scores to be on our ACT scale
(A <- sd(ACT)/sd(SAT))
(B <- mean(ACT)-A*mean(SAT))
SATscaled <- A*SAT + B
# The results from taking parallel tests should fall along a straight line
plot(ACT,SATscaled, main="SAT results placed on ACT scale")
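# One way to read such a plot, sketched on simulated stand-in data: after
# linear equating the converted scores share the target's mean and sd, so the
# points should scatter around the identity line y = x. The names ability,
# act_demo, sat_demo, A_demo, B_demo and sat_on_act, and the noise levels,
# are assumptions made for this illustration only.
set.seed(2)
ability  <- rnorm(300)
act_demo <- round(20 + 6 * ability + rnorm(300, sd = 2))
sat_demo <- round((500 + 100 * ability + rnorm(300, sd = 30)) / 10) * 10
A_demo <- sd(act_demo) / sd(sat_demo)
B_demo <- mean(act_demo) - A_demo * mean(sat_demo)
sat_on_act <- A_demo * sat_demo + B_demo
plot(act_demo, sat_on_act,
     xlab = "ACT-like score", ylab = "SAT-like score on ACT scale")
abline(0, 1, lty = 2)   # identity line: perfect agreement would sit here
# Differences in mean and sd after equating; both are zero up to rounding error
c(mean(sat_on_act) - mean(act_demo), sd(sat_on_act) - sd(act_demo))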
# This is the easiest method of equating two tests.
# However, it is not usually the most practical since it is costly to get the same
# group of individuals to take two different tests.  In addition, there
# may be issues with fatigue, which could be alleviated somewhat by giving
# the first assessment first to half of the group and the second
# assessment first to the other half.
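# A sketch of that counterbalancing idea: randomly split the group in half and
# give each half a different test first. n_demo and first_test are
# illustrative names only.
set.seed(3)
n_demo <- 1000
first_test <- sample(rep(c("test1", "test2"), length.out = n_demo))
table(first_test)   # 500 people start with each test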