Item Response Theory
Generalizability Theory
Classical Test Theory
X = T + E
\(\sigma^2_X = \sigma^2_T + \sigma^2_E\)
\(\sigma_{\text{SEM}} = \sigma_X \sqrt{1 - r_{xx}}\)
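A minimal sketch of the SEM formula in R. The function name `sem` and the input values (an observed-score SD of 15 and a reliability of 0.91) are made up for illustration:

```r
# Standard error of measurement from the observed-score SD and reliability.
# sd_x and r_xx are hypothetical illustration values, not real data.
sem <- function(sd_x, r_xx) {
  sd_x * sqrt(1 - r_xx)
}

sem(sd_x = 15, r_xx = 0.91)
# [1] 4.5
```

Higher reliability shrinks the SEM; at \(r_{xx} = 1\) it is zero.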
In a nutshell, IRT can address all of these criticisms
BUT it makes stronger assumptions and requires a larger sample size
A measurement perspective
A series of non-linear models
Links manifest variables with latent variables
Latent characteristics of individuals and items are predictors of observed responses
Not a "how" or "why" theory
Anxiety could be loosely defined as feelings that range from general uneasiness to incapacitating attacks of terror
Is anxiety latent and is it continuous, categorical, or both?
Categorical - Individuals can be placed into a high anxiety latent class and a low anxiety latent class
Continuous - Individuals fall along an anxiety continuum
Both - Given a latent class (e.g. the high anxiety latent class), within this class there is a continuum of even greater anxiety.
The response of a person to an item can be modeled with a specific item response function
The logistic model
\(p(x = 1 | z) = \frac{e^z}{1 + e^z}\)
The logistic regression model
\(p(x = 1 | g) = \frac{e^{\beta_0 + \beta_1g}}{1 + e^{\beta_0 + \beta_1g}}\)
The Rasch model
\(p(x_j = 1 | \theta, b_j) = \frac{e^{\theta - b_j}}{1 + e^{\theta - b_j}}\)
So, the Rasch model is just the logistic regression model in disguise: \(g = \theta\), \(\beta_1 = 1\), and \(\beta_0 = -b_j\)
# Probability of a correct response given person ability and item difficulty
rasch <- function(person, item) {
  exp(person - item) / (1 + exp(person - item))
}
rasch(person = 1, item = 1.5)
# [1] 0.3775407
rasch(person = 1, item = 1)
# [1] 0.5
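Because the Rasch model is the logistic function of \(\theta - b_j\), base R's `plogis()` should return the same probabilities. A small check, where the item difficulties in `b` are hypothetical values chosen only for illustration:

```r
# Rasch item response function, written out by hand
rasch <- function(person, item) {
  exp(person - item) / (1 + exp(person - item))
}

# Hypothetical item difficulties (illustration only)
b <- c(-1, 0, 1.5)

# One person (theta = 1) answering three items
p_manual <- rasch(person = 1, item = b)

# Same probabilities from the logistic CDF in the stats package
p_plogis <- plogis(1 - b)

all.equal(p_manual, p_plogis)
# [1] TRUE
```

The function is vectorized, so one call returns the probability for every item at once.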
Similar to the SEM, the standard error of estimate (SEE) allows us to quantify uncertainty about a person's score within IRT
Information is the inverse of the squared SEE and tells us how precise our estimates are
We can use this to select items and develop tests!
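A rough sketch of how information works under the Rasch model, where item information is \(P(1 - P)\) and test information is the sum over items. The helper name `item_info`, the three item difficulties, and the ability value are assumptions made for the example:

```r
# Rasch probability of a correct response
rasch <- function(person, item) plogis(person - item)

# Item information under the Rasch model: P * (1 - P)
item_info <- function(theta, b) {
  p <- rasch(theta, b)
  p * (1 - p)
}

b <- c(-1, 0, 1)   # hypothetical item difficulties
theta <- 0         # hypothetical person ability

info_each <- item_info(theta, b)   # information contributed by each item
test_info <- sum(info_each)        # test information at this theta
see <- 1 / sqrt(test_info)         # standard error of estimate at this theta
```

An item is most informative when its difficulty matches the person's ability (here the item with `b = 0`), which is why information can guide item selection.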
Reliability is essentially a measure of consistency in your scores (think of a dart board)
Generalizability theory, the child of CTT, allows a researcher to quantify and distinguish the different sources of error in observed scores
The CTT model is: \(X = T + E\)
The G-Theory model is: \(X = \mu_p + E_1 + E_2 + \dots + E_H\)
\(\mu_p\) is the universe score and the \(E_h\) are the sources of error
\(X_{pir} = \mu + v_p + v_i + v_r + v_{pi} + v_{pr} + v_{ir} + v_{pir}\)
If we assume that these effects are uncorrelated, then
\(\sigma^2(X_{pir}) = \sigma^2_p + \sigma^2_i + \sigma^2_r + \sigma^2_{pi} + \sigma^2_{pr} + \sigma^2_{ir} + \sigma^2_{pir}\)
These are our random effects variance components
This forms the basis of our D study, which can investigate different scenarios and allows us to calculate reliability estimates
In a D study, we partition our variance into 3 components: universe score, relative error, and absolute error variance.
Relative error variance and the generalizability coefficient are analogous to \(\sigma^2_E\) and reliability in CTT, and are based on comparing examinees
\(E\rho^2 = \frac{\sigma^2_\tau}{\sigma^2_\tau + \sigma^2_\delta}\)
Absolute error variance is for making absolute decisions about examinees
Dependability coefficient, \(\phi = \frac{\sigma^2_\tau}{\sigma^2_\tau + \sigma^2_\Delta}\)
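A minimal D-study sketch, assuming a fully crossed, random \(p \times i \times r\) design in which scores are averaged over `n_i` items and `n_r` raters. The variance components in `vc` are invented numbers, and `d_study` is a hypothetical helper name:

```r
# Hypothetical G-study variance components for a p x i x r design
vc <- c(p = 0.50, i = 0.10, r = 0.05,
        pi = 0.20, pr = 0.10, ir = 0.02, pir = 0.30)

d_study <- function(vc, n_i, n_r) {
  # Universe score variance
  tau <- vc[["p"]]
  # Relative error: only the components that involve persons
  rel <- vc[["pi"]] / n_i + vc[["pr"]] / n_r + vc[["pir"]] / (n_i * n_r)
  # Absolute error: relative error plus the item and rater main effects
  # and their interaction
  abs_err <- rel + vc[["i"]] / n_i + vc[["r"]] / n_r + vc[["ir"]] / (n_i * n_r)
  c(g_coef = tau / (tau + rel), phi = tau / (tau + abs_err))
}

res <- d_study(vc, n_i = 10, n_r = 2)
res
```

Changing `n_i` and `n_r` shows how adding items or raters would change both coefficients. Since absolute error includes every term in relative error, \(\phi\) can never exceed \(E\rho^2\).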