Reliability is essentially a measure of consistency in your scores (think of a dart board)
Error doesn't necessarily mean a mistake; it can simply refer to the measurement procedure or conditions
Generalizability theory, the child of CTT and ANOVA, allows a researcher to quantify and disentangle the different sources of error in observed scores
What are we trying to generalize over?
The CTT model is: \(X = T + E\)
The G-Theory model is: \(X = \mu_p + E_1 + E_2 + \dots + E_H\)
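\(\mu_p\) is the universe score and the \(E_h\) are the sources of error
Recall that each rater rates each person's response to each item, so persons (p), items (i), and raters (r) are fully crossed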
\(X_{pir} = \mu + v_p + v_i + v_r + v_{pi} + v_{pr} + v_{ir} + v_{pir}\)
If we assume that these effects are uncorrelated, then
\(\sigma^2(X_{pir}) = \sigma^2_p + \sigma^2_i + \sigma^2_r + \sigma^2_{pi} + \sigma^2_{pr} + \sigma^2_{ir} + \sigma^2_{pir}\)
These are our variance components
In a G study, we estimate each of these variance components
They can be estimated using the aov() or lme4::lmer() functions in R
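The notes point to R's aov() and lme4::lmer() for estimation. As a language-neutral illustration, here is a minimal Python sketch of the ANOVA (expected mean squares) approach for a fully crossed p × i × r design with one score per cell. The function name `g_study`, the nested-list layout `x[p][i][r]`, and the simulated scores are assumptions made for this example; they are not the writing-test data, and lmer() would instead estimate the components by (restricted) maximum likelihood.

```python
import random
from itertools import product
from statistics import fmean


def g_study(x):
    """Estimate variance components for a fully crossed p x i x r design.

    x[p][i][r] holds the single score for person p, item i, rater r.
    Uses the ANOVA (expected mean squares) estimators.
    """
    n_p, n_i, n_r = len(x), len(x[0]), len(x[0][0])
    P, I, R = range(n_p), range(n_i), range(n_r)

    # Grand mean, main-effect means, and two-way cell means.
    m = fmean(x[p][i][r] for p, i, r in product(P, I, R))
    m_p = [fmean(x[p][i][r] for i, r in product(I, R)) for p in P]
    m_i = [fmean(x[p][i][r] for p, r in product(P, R)) for i in I]
    m_r = [fmean(x[p][i][r] for p, i in product(P, I)) for r in R]
    m_pi = [[fmean(x[p][i][r] for r in R) for i in I] for p in P]
    m_pr = [[fmean(x[p][i][r] for i in I) for r in R] for p in P]
    m_ir = [[fmean(x[p][i][r] for p in P) for r in R] for i in I]

    # Sums of squares for each main effect and interaction.
    ss = {
        "p": n_i * n_r * sum((m_p[p] - m) ** 2 for p in P),
        "i": n_p * n_r * sum((m_i[i] - m) ** 2 for i in I),
        "r": n_p * n_i * sum((m_r[r] - m) ** 2 for r in R),
        "pi": n_r * sum((m_pi[p][i] - m_p[p] - m_i[i] + m) ** 2
                        for p, i in product(P, I)),
        "pr": n_i * sum((m_pr[p][r] - m_p[p] - m_r[r] + m) ** 2
                        for p, r in product(P, R)),
        "ir": n_p * sum((m_ir[i][r] - m_i[i] - m_r[r] + m) ** 2
                        for i, r in product(I, R)),
        "pir": sum((x[p][i][r] - m_pi[p][i] - m_pr[p][r] - m_ir[i][r]
                    + m_p[p] + m_i[i] + m_r[r] - m) ** 2
                   for p, i, r in product(P, I, R)),
    }
    df = {"p": n_p - 1, "i": n_i - 1, "r": n_r - 1,
          "pi": (n_p - 1) * (n_i - 1), "pr": (n_p - 1) * (n_r - 1),
          "ir": (n_i - 1) * (n_r - 1),
          "pir": (n_p - 1) * (n_i - 1) * (n_r - 1)}
    ms = {k: ss[k] / df[k] for k in ss}

    # Solve the expected-mean-square equations for the components.
    # Estimates can come out negative; they are returned as-is here.
    return {
        "p": (ms["p"] - ms["pi"] - ms["pr"] + ms["pir"]) / (n_i * n_r),
        "i": (ms["i"] - ms["pi"] - ms["ir"] + ms["pir"]) / (n_p * n_r),
        "r": (ms["r"] - ms["pr"] - ms["ir"] + ms["pir"]) / (n_p * n_i),
        "pi": (ms["pi"] - ms["pir"]) / n_r,
        "pr": (ms["pr"] - ms["pir"]) / n_i,
        "ir": (ms["ir"] - ms["pir"]) / n_p,
        "pir": ms["pir"],
    }


# Hypothetical simulated scores (NOT the writing-test data).
rng = random.Random(1)
n_p, n_i, n_r = 30, 5, 3
person = [rng.gauss(0, 1.0) for _ in range(n_p)]
item = [rng.gauss(0, 0.5) for _ in range(n_i)]
rater = [rng.gauss(0, 0.3) for _ in range(n_r)]
scores = [[[person[p] + item[i] + rater[r] + rng.gauss(0, 1.0)
            for r in range(n_r)] for i in range(n_i)] for p in range(n_p)]

for source, estimate in g_study(scores).items():
    print(f"{source:>3}: {estimate: .3f}")
```

With one observation per cell, the three-way interaction cannot be separated from residual error, so the `pir` estimate absorbs both.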
This forms the basis of our D study, which is used to investigate different measurement scenarios and lets us calculate different reliability estimates depending on the intended use of the scores
We need to derive universe score, relative error, and absolute error variances
\(\sigma^2(X_{pir}) = \sigma^2_p + \sigma^2_i + \sigma^2_r + \sigma^2_{pi} + \sigma^2_{pr} + \sigma^2_{ir} + \sigma^2_{pir}\)
universe-score variance $$\sigma_{\tau}^2 = \sigma_p^2$$
relative error variance $$\sigma_{\delta}^2 = \frac{\sigma_{pi}^2}{n_i'} + \frac{\sigma_{pr}^2}{n_r'} + \frac{\sigma_{pir}^2}{n_i' n_r'} $$
absolute error variance $$\sigma_{\Delta}^2 = \frac{\sigma_{i}^2}{n_i'} + \frac{\sigma_{r}^2}{n_r'} + \frac{\sigma_{ir}^2}{n_i' n_r'} + \frac{\sigma_{pi}^2}{n_i'} + \frac{\sigma_{pr}^2}{n_r'} + \frac{\sigma_{pir}^2}{n_i' n_r'} $$
IMPORTANT: What we consider fixed or random determines what goes where!
We've now partitioned our variance into three parts: universe-score variance, relative error variance, and absolute error variance.
Relative error variance and the generalizability coefficient are analogous to \(\sigma^2_E\) and reliability in CTT, and are used for relative decisions, that is, comparing examinees
\(E\rho^2 = \frac{\sigma^2_\tau}{\sigma^2_\tau + \sigma^2_\delta}\)
Absolute error variance is for making absolute decisions about examinees
Dependability coefficient, \(\phi = \frac{\sigma^2_\tau}{\sigma^2_\tau + \sigma^2_\Delta}\)
Again, consider our G study in which Icelanders answer items on a writing test and their responses are scored by multiple raters.
| Source | Variance component | Estimate | Total variability (%) |
|---|---|---|---|
| Person (p) | $$\sigma_p^2$$ | 1.376 | 32 |
| Item (i) | $$\sigma_i^2$$ | 0.215 | 5 |
| Rater (r) | $$\sigma_r^2$$ | 0.043 | 1 |
| p × i | $$\sigma_{pi}^2$$ | 0.860 | 20 |
| p × r | $$\sigma_{pr}^2$$ | 0.258 | 6 |
| i × r | $$\sigma_{ir}^2$$ | 0.001 | 0 |
| p × i × r | $$\sigma_{pir}^2$$ | 1.548 | 36 |
For a D study with \(n_i' = 20\) items and \(n_r' = 3\) raters:
$$\sigma_{\delta}^2 = \frac{\sigma_{pi}^2}{n_i'} + \frac{\sigma_{pr}^2}{n_r'} + \frac{\sigma_{pir}^2}{n_i' n_r'} = \frac{0.860}{20} + \frac{0.258}{3} + \frac{1.548}{20 \times 3} = 0.1548$$
$$\sigma_{\Delta}^2 = \frac{\sigma_{i}^2}{n_i'} + \frac{\sigma_{r}^2}{n_r'} + \frac{\sigma_{ir}^2}{n_i' n_r'} + \frac{\sigma_{pi}^2}{n_i'} + \frac{\sigma_{pr}^2}{n_r'} + \frac{\sigma_{pir}^2}{n_i' n_r'} = \frac{0.215}{20} + \frac{0.043}{3} + \frac{0.001}{20 \times 3} + \frac{0.860}{20} + \frac{0.258}{3} + \frac{1.548}{20 \times 3} = 0.1799$$
\(E\rho^2 = \frac{\sigma^2_\tau}{\sigma^2_\tau + \sigma^2_\delta} = \frac{1.376}{1.376 + 0.1548} = 0.899\)
\(\phi = \frac{\sigma^2_\tau}{\sigma^2_\tau + \sigma^2_\Delta} = \frac{1.376}{1.376 + 0.1799} = 0.884\)
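The arithmetic above can be checked with a few lines of Python. This is a minimal sketch for a fully crossed design with all facets random; the function name `d_study` and the dictionary keys are assumptions chosen for this example, while the variance components are copied from the G-study table.

```python
# Variance-component estimates copied from the G-study table.
components = {"p": 1.376, "i": 0.215, "r": 0.043,
              "pi": 0.860, "pr": 0.258, "ir": 0.001, "pir": 1.548}


def d_study(vc, n_i, n_r):
    """D study for a random, fully crossed p x i x r design.

    n_i and n_r are the D-study numbers of items and raters.
    """
    tau = vc["p"]  # universe-score variance
    # Relative error: every interaction that involves persons.
    delta = vc["pi"] / n_i + vc["pr"] / n_r + vc["pir"] / (n_i * n_r)
    # Absolute error: relative error plus item, rater, and i x r effects.
    big_delta = delta + vc["i"] / n_i + vc["r"] / n_r + vc["ir"] / (n_i * n_r)
    return {"tau": tau, "delta": delta, "Delta": big_delta,
            "E_rho2": tau / (tau + delta), "phi": tau / (tau + big_delta)}


result = d_study(components, n_i=20, n_r=3)
for name, value in result.items():
    print(f"{name:>7}: {value:.4f}")
```

Rounded to three decimals, the two coefficients agree with the hand calculation.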
What if we used just 10 items and 2 raters?
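With \(n_i' = 10\) and \(n_r' = 2\), \(\sigma^2_\delta = 0.2924\) and \(\sigma^2_\Delta \approx 0.335\), so \(E\rho^2 \approx 0.825\) and \(\phi \approx 0.804\)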
So reliabilities decrease!
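The same what-if question can be asked for any design. The sketch below, again assuming all facets are random and fully crossed, tabulates both coefficients over a small grid of item and rater counts; the grid values and the helper name `coefficients` are illustrative choices, not part of the original study.

```python
# Variance-component estimates copied from the G-study table.
vc = {"p": 1.376, "i": 0.215, "r": 0.043,
      "pi": 0.860, "pr": 0.258, "ir": 0.001, "pir": 1.548}


def coefficients(n_i, n_r):
    """Return (generalizability coefficient, dependability coefficient)."""
    delta = vc["pi"] / n_i + vc["pr"] / n_r + vc["pir"] / (n_i * n_r)
    big_delta = delta + vc["i"] / n_i + vc["r"] / n_r + vc["ir"] / (n_i * n_r)
    return vc["p"] / (vc["p"] + delta), vc["p"] / (vc["p"] + big_delta)


# Illustrative grid of candidate D-study designs.
for n_r in (1, 2, 3):
    for n_i in (5, 10, 20, 40):
        g, phi = coefficients(n_i, n_r)
        print(f"items={n_i:>2}  raters={n_r}  E_rho2={g:.3f}  phi={phi:.3f}")
```

Because every error term is divided by the number of items, the number of raters, or both, adding conditions to either facet can only shrink the error variances, which is why both coefficients rise with larger designs and fall with smaller ones.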