Item Equating with same Group – SAT, ACT example

August 5, 2013

(This article was first published on Econometrics by Simulation, and kindly contributed to R-bloggers)

# Item equating is the practice of making the results from two 
# different assessments equivalent. This can be done by either
# 1. having the same group take both assessments
# 2. having equivalent groups take the different assessments
# 3. having non-equivalent groups which use common items take the different
# assessments.
# In this post I will cover topic 1.

# For this code I will use the catR package to generate my assessments
# and responses.
library(catR)
# Let's attempt the first procedure.
# First let's generate the item parameters for both assessments.
# Each assessment gets its own item bank of 100 items.
nitems <- 100
bank1 <- createItemBank(model="2PL", items=nitems)$itemPar
bank2 <- createItemBank(model="2PL", items=nitems)$itemPar
# Now let's generate a 1000 person population sample to take our assessment
npeep <- 1000
theta <- rnorm(npeep)
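# Simulate item responses on both assessments: a person answers an item
# correctly when the 2PL response probability exceeds a uniform random draw
resp1 <- resp2 <- matrix(0, nrow=npeep, ncol=nitems)
for (i in 1:npeep) {
  resp1[i, Pi(theta[i],bank1)$Pi > runif(nitems)] <- 1 # Test 1
  resp2[i, Pi(theta[i],bank2)$Pi > runif(nitems)] <- 1 # Test 2
}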
# To calculate total score on the tests we can sum the results of each row
score1 <- apply(resp1, 1, sum)
score2 <- apply(resp2, 1, sum)
# Since we know both forms of the test are parallel we can check the correlation
# between scores on the different forms of the test as a measure of reliability.
cor(score1, score2)
# This gives me a correlation estimate of .95 which is very good.
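# A minimal base-R sketch of the same kind of simulation, for readers without catR.
# It assumes the standard 2PL response function P = 1/(1 + exp(-a*(theta - b))).
# The helper name sim_2pl and the parameter ranges below are illustrative
# assumptions, not the values catR draws.
sim_2pl <- function(theta, a, b) {
  # persons x items matrix of success probabilities; column j is scaled by a[j]
  p <- plogis(outer(theta, b, "-") * rep(a, each = length(theta)))
  # one Bernoulli draw per person-item pair
  matrix(rbinom(length(p), 1, p), nrow = length(theta))
}
set.seed(1)
th <- rnorm(500)
formA <- sim_2pl(th, a = runif(50, 0.8, 1.5), b = rnorm(50))
formB <- sim_2pl(th, a = runif(50, 0.8, 1.5), b = rnorm(50))
# Parallel-forms reliability: correlation of total scores across the two forms
cor(rowSums(formA), rowSums(formB))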
# However, we are not interested in reliability right now. We would like to
# equate the two tests using the information gathered thus far.
# First let's estimate our parameters.
# I will use the ltm function in the ltm package for this.
library(ltm)
est2pl1 <- ltm(resp1~z1)
est2pl2 <- ltm(resp2~z1)
# Now we have two tests with different items.
# We want to make sure the items are on the same scale
# so that it does not matter which test individuals take:
# their expected score will be the same.
# Because both tests were simulated from the same kind of item bank and are
# very long, they are pretty much already equated by design. That is
# much easier to arrange by simulation and perhaps impossible to do with
# real assessments.
# Nevertheless, let's act as if our tests were not already parallel and
# equate them.
# Let's first do some linear equating, which is done by setting the standardized
# scores of the two exams equal to each other. (page 33)
# See
# X1 and X2 refer to total scores for each individual on exams 1 and 2
# (X1 - mean(X1))/sd(X1) = (X2 - mean(X2))/sd(X2)
# X1 = sd(X1)/sd(X2)*X2 + (mean(X1) - sd(X1)/sd(X2)*mean(X2)) = A*X2 + B
# where A = sd(X1)/sd(X2) and B = mean(X1) - A*mean(X2)
(A <- sd(score1)/sd(score2))
# A being close to 1 indicates that the tests are scaled similarly
(B <- mean(score1)-A*mean(score2))
# B being close to 0 indicates the tests are of similar difficulty
# Now let's calculate what score2 would be if scaled on assessment 1.
score2scaled <- A*score2 + B
# We can see that score2scaled looks slightly closer to score1.
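# The two steps above can be wrapped in a small helper. This is a sketch only:
# the name linear_equate and the toy scores are illustrative assumptions.
linear_equate <- function(from, to) {
  slope <- sd(to) / sd(from)                 # ratio of standard deviations
  intercept <- mean(to) - slope * mean(from) # aligns the means
  list(A = slope, B = intercept, scaled = slope * from + intercept)
}
toy1 <- c(52, 61, 47, 70, 58)  # hypothetical scores on form 1
toy2 <- c(40, 49, 38, 60, 45)  # hypothetical scores on form 2, same people
eq <- linear_equate(from = toy2, to = toy1)
# By construction the rescaled scores share form 1's mean and sd
c(mean(eq$scaled), mean(toy1))
c(sd(eq$scaled), sd(toy1))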
# Let's try the same thing with something slightly more interesting.
# Let's imagine that score1 is an ACT-style score (truncated below to the range 4 to 32)
# and score2 is an SAT-style score from 200 to 800.
# The standard deviation is 6 and the average score is around 20 for the ACT
ACT <- (score1-mean(score1))/sd(score1)*6 + 20
ACT[ACT>32] <- 32
ACT[ACT<4] <- 4
ACT <- round(ACT) # ACT rounds to whole numbers
# The standard deviation is 100 and the average score is around 500 for the SAT
SAT <- (score2-mean(score2))/sd(score2)*100 + 500
SAT[SAT>800] <- 800
SAT[SAT<200] <- 200
SAT <- round(SAT/10)*10 # SAT rounds to nearest 10
# Now let's see if we can transform our SAT scores to be on our ACT scale
(A <- sd(ACT)/sd(SAT))
(B <- mean(ACT)-A*mean(SAT))
SATscaled <- A*SAT + B
# The results from taking parallel tests should fall along a straight line
plot(ACT,SATscaled, main="SAT results placed on ACT scale")
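# Linear equating is symmetric: solving X1 = A*X2 + B for X2 gives
# X2 = (1/A)*X1 - B/A, which is the same line obtained by equating in the
# other direction. A quick check on made-up scores (toy values, not real data;
# the variable names are illustrative):
actToy <- c(18, 22, 25, 15, 30, 20)
satToy <- c(450, 540, 610, 400, 720, 500)
A12 <- sd(actToy)/sd(satToy)          # slope, SAT -> ACT
B12 <- mean(actToy) - A12*mean(satToy)
A21 <- sd(satToy)/sd(actToy)          # slope, ACT -> SAT
B21 <- mean(satToy) - A21*mean(actToy)
all.equal(c(A21, B21), c(1/A12, -B12/A12))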

# This is the easiest method of equating two tests.
# However, it is not usually the most practical since it is costly to get the same
# group of individuals to take two different tests. In addition, there
# may be issues with fatigue. These could be alleviated somewhat by giving
# the first assessment first to one half of the group and the second
# assessment first to the other half.
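# The counterbalanced ordering described above could be set up as follows.
# A sketch only: the variable names are illustrative and nTakers is an
# arbitrary choice.
set.seed(2)
nTakers <- 1000
# Form each person takes first, shuffled so the halves are assigned at random
firstForm <- sample(rep(c(1, 2), length.out = nTakers))
table(firstForm)  # half start with assessment 1, half with assessment 2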
