Chapter 8 Hypothesis Testing

Hypothesis testing provides a standardized decision-making process for scientific research. It can be used in many situations: to compare one group’s result to a ‘standard’, to compare two groups to each other, or to compare among many groups. Hypothesis testing is used to compare means, to compare proportions, to compare odds ratios, to compare hazard ratios, and so on.

In this lecture, we will introduce the logic behind hypothesis testing. In the next lecture, we will formally present a specific hypothesis test, the one sample t-test, which is used to compare one sample mean to a ‘standard’. To review, statistical inference is the process of concluding something about the value of an unknown population parameter (such as the true mean or true proportion) based on information from a known sample (such as the sample mean or sample proportion).

There are two approaches to doing statistical inference. One approach is through estimating the unknown population parameter via confidence intervals using the known result from our sample (which we learned in an earlier lecture). The other approach is through hypothesis testing. So how is hypothesis testing different from using confidence intervals?

In hypothesis testing, we first assume some claim about the population parameter and then use the result from our sample to gain insight into that statement. When we use confidence intervals, on the other hand, we make no assumption about what the population parameter is. The purpose of constructing confidence intervals is to estimate the population value from the result of our sample, whereas, the purpose of hypothesis testing is to determine whether there is enough evidence (from the result from our sample) to refute the assumed claim about the population value.

You might be thinking, it seems odd to start with an assumed claim about the population value. Why would we start with that? To understand this, we have to understand the logic of hypothesis testing. The logic of hypothesis testing involves three big pieces: Hypotheses, which are statements or claims about the population parameter; Evidence, which involves summarizing sample data to provide evidence about the claims; and Evaluation, which involves making decisions about how usual or unusual the evidence is compared to the claim about the population parameter.

A commonly used analogy for the logic of hypothesis testing is the US criminal justice system. Let’s use this analogy to better understand the pieces in the framework. A key principle of the US criminal justice system is ‘A defendant is presumed innocent until proven guilty’. In this statement, there are two competing hypotheses about the ‘truth’:

– The defendant is innocent. - The defendant is guilty.

The first hypothesis (in this case, innocence) is assumed to be true. This assumed claim about the ‘truth’ is also known as the null hypothesis, denoted by \(H_0\), H-sub-zero or H-naught. It is the claim that is to be tested. In general, the null hypothesis describes the default situation (the status quo or the ‘nothing interesting is happening’ situation). It is the thing that we are going to initially assume is true about the population(s). When we are comparing one population to a standard, the null hypothesis is that the value of the parameter of interest in the population is equal to some hypothesized value. When we are comparing two populations (for example, a treatment group and a control group), the null hypothesis is that there isn’t any difference: the parameter has the same value in both populations. (‘Null’ means amounting to nothing, none, absent, insignificant.)

In contrast, the second hypothesis (in this case, guilt) is an alternative claim, a claim for which we seek significant evidence. The alternative hypothesis describes what we hope to demonstrate about the population(s) using the data. It is closely aligned with the research question. When we are comparing one population to a standard, the alternative hypothesis is that the value of the parameter of interest in the population is different than the hypothesized value. When we are comparing two populations (for example, a treatment group and a control group), the alternative hypothesis is that there IS a difference: the parameter does NOT have the same value in both populations.

These two statements should be made prior to examining any evidence. It would be considered cheating if we used our evidence to determine the hypotheses. We need to start with some assumed ‘truth’ in order to compare the evidence to the claim. Using knowledge of research design and detective skills, evidence is collected and evaluated against the claims (hypotheses) that have been made. In the US justice system, this might involve interviewing witnesses, gathering physical evidence, analyzing forensic evidence, etc.

In statistics, our ‘evidence’ is our sample of data, which we may have obtained through a survey, or through a clinical trial, or through some other kind of study. The data are summarized to provide a clearer picture of what evidence we have on hand.

References: Image: https://www.kisspng.com/png-magnifying-glass-fingerprint-clip-art-kisspng-1155964/The last piece of the framework is to evaluate how usual or unusual the evidence is compared to the claim about the ‘truth’. From this evaluation, one of two situations could occur: We find that the evidence is unusual (is not likely to occur) compared to what is assumed to be true. We find that the evidence is usual (is likely to occur) compared to what is assumed to be true.

In the US justice system, if the evidence is not consistent with (or is unusual compared to) the assumed claim of innocence, then we conclude there is evidence against the claim that the defendant is innocent (i.e., against the null hypothesis) and evidence for the claim that the defendant is guilty (i.e., in support of the alternative hypothesis). On the other hand, if the evidence is consistent with (or not unusual compared to) the assumed claim of innocence, then what we conclude is that there is no evidence for the claim that the defendant is guilty.

CAUTION: We need to be careful with our wording with hypothesis testing. If the evidence is consistent with (or not unusual compared to) the assumed claim of innocence, then we do NOT conclude that there is evidence for the claim that the defendant is innocent (i.e., we do NOT ‘accept the null hypothesis’). Lack of evidence against the null hypothesis is not evidence for the null hypothesis. For example, a defendant may indeed have exceeded the speed limit, but if there were no witnesses and no speed measurements, then there won’t be any evidence of it. Lack of evidence doesn’t necessarily mean the defendant is innocent: it just means there isn’t any evidence of guilt. This is why the US justice system verdict is either ‘guilty’ (if there is sufficient evidence against innocence) or ‘not guilty’ (if there isn’t enough evidence against innocence); the verdict is never ‘innocent’.

Evidence is used in a similar manner in statistics. We evaluate evidence by calculating the probability of observing the result we obtained given the assumption that the null hypothesis is true. That is, IF we assume the null hypothesis is true, how likely or unlikely would it be to see the result we obtained from our data just by random chance? We compare this value against some threshold to make a conclusion about the population. If the result we obtained from our data is NOT likely to occur if the null hypothesis statement is true, then something else must be going on. In this case, we say we have evidence against the null hypothesis and for the alternative hypothesis. Another common phrase for this is we reject the null hypothesis. On the other hand, if the result we obtained from our data IS likely to occur if the null hypothesis statement is true, then we say we lack evidence against the null hypothesis or for the alternative hypothesis, or we do not reject the null hypothesis.

These three pieces are the framework for all hypothesis testing and will be used throughout this course.

References: Image: https://www.pixcove.com/scales-balances-fairness-weighing-considering-justice-tilted-slanted-symbol-sign/