Chapter 5 Confidence Intervals

A sample proportion is a “point estimate” of a population proportion. In contrast, a confidence interval for a proportion is an “interval estimate” of a population proportion that reflects the precision of the estimate. A confidence interval gives a plausible range for a population proportion based on observed data. Confidence intervals can also be used to make inferences about a population proportion. This lecture will introduce the concept of confidence intervals, focusing on confidence intervals for a proportion.

We learned earlier that theoretical distributions and their parameters are useful when we are trying to understand the probability of observing various outcomes in random samples given a known characteristic of the population. For example, if we know the true prevalence of diabetes in the population, we can use the Binomial distribution to say things about random samples from that population. This is an example of the upper arrow in the figure above.
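As a minimal sketch of this “known” population to “unknown” sample direction, the snippet below uses scipy’s Binomial distribution; the prevalence of 9.3% and sample size of 500 are borrowed from the example developed later in this lecture.

```python
# Minimal sketch: with a KNOWN population prevalence, the Binomial
# distribution tells us how likely different sample counts are.
# (p = 0.093 and n = 500 match the example used later in this lecture.)
from scipy.stats import binom

p_true = 0.093   # known population proportion
n = 500          # sample size

print(binom.pmf(53, n, p_true))   # P(exactly 53 diabetics), roughly 0.04
print(binom.sf(52, n, p_true))    # P(53 or more diabetics), roughly 0.18
```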

However, when analyzing data, we have a single sample in hand that we want to use to say something about the population it came from. For example, if we calculate the proportion of people with diabetes in a sample, what does this tell us about the proportion with diabetes in the entire population? This would be an example of the lower arrow in the figure above: using information about the sample to find out about the population.

How do we do this? The key, as we will see, is to use sampling distributions! Suppose we are interested in understanding the prevalence of diagnosed diabetes in younger adults aged 20 to 44 years in the United States. How would we obtain an estimate of this number? And how confident can we be that we have accurately estimated the percentage of people in this age group with diabetes?

Let’s assume that we have on hand a representative sample of 500 Americans between 20 and 44. Now suppose we ask each one “Have you been diagnosed with diabetes?”, and 53 of them say “Yes.” If we divide 53 by 500, we obtain the sample proportion, the proportion of younger adults in our sample who have diabetes: in this case, 10.6%.

The sample proportion, which we refer to as p-hat, is an estimate of the true proportion, p, of diabetic younger adults in the entire US population. It may be somewhat lower than the true population proportion, or it may be somewhat higher. But how much higher or lower is plausible given our sample? Getting an estimate of 10.6% from a sample of 500 seems pretty unlikely if the true population proportion is, say, 50%. But is this estimate consistent with a true population proportion of, say, 14%? We can use confidence intervals to tell us.

A confidence interval is an interval estimate that tells us about the precision of the estimate. Loosely, we can think of it as a range of true population parameter values that are plausible given the observed data.

Without worrying for now about how we calculated it, a 95% confidence interval for the true population proportion from our diabetes study is 7.9% to 13.3%. This interval is telling us that true population proportions between 7.9% and 13.3% are relatively plausible given our observed sample proportion of 10.6%.

There are a few ways of writing out the confidence interval, including “lower CI value” to “upper CI value” (e.g., 7.9% to 13.3%), or placing parentheses around the two CI values and separating them with a comma (e.g., (7.9%, 13.3%)). Confidence intervals for a proportion can be denoted either as proportions or as percents.

So what does the 95% mean? Informally, it quantifies how confident we are that the true population proportion falls in the given range. We’ll give a more formal definition shortly.

So how did we calculate this confidence interval from the data? Determining a confidence interval requires us to invert or “flip” our thinking, to go from “known” sample to “unknown” population (instead of “known” population to “unknown” sample as we did when we explored sampling distributions).

Let’s think about the sampling distribution of sample proportions again. The dot plot above shows the values of the sample proportion, p-hat, calculated for each of 1,000 samples, each with size n=500, from a population that has a known population proportion, p, of 0.093 or 9.3%. Previously, we saw that the sampling distribution for p-hat (the sample proportion) is approximately Normal (a.k.a. bell-shaped) with a mean equal to the true population proportion of 0.093 and a standard error (i.e., standard deviation of the sampling distribution) equal to 0.013, which mathematically works out to be the square root of \(\frac{p(1-p)}{n}\).
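This sampling distribution is easy to recreate by simulation. Below is a minimal sketch (not necessarily the code used to make the plot):

```python
# Simulate the sampling distribution of p-hat.
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility
p, n, reps = 0.093, 500, 1000

# Each draw is the count of "successes" in one sample of size n;
# dividing by n gives that sample's proportion, p-hat.
p_hats = rng.binomial(n, p, size=reps) / n

print(p_hats.mean())             # close to the true p of 0.093
print(p_hats.std())              # close to the theoretical SE
print(np.sqrt(p * (1 - p) / n))  # SE = sqrt(p(1-p)/n), about 0.013
```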

It is important to recall that the standard error (denoted as SE) is just the “fancy” name denoting a special case of the standard deviation (denoted as SD). The standard error is the standard deviation when applied to a sampling distribution for a statistic. But it’s still a standard deviation (period). Using the term “standard error” just alerts us that it describes a sampling distribution and not the population or the sample observations (our data), which would both be described with the term “standard deviation”.

Since the sample size, n, is big here, the observed sample proportions are pretty close to being a Normal distribution. We can see that this is the case by observing the Normal curve that is laid over the dots in the plot and noting how similar they are. This Normal curve has the same mean and standard deviation (a.k.a. standard error) as the sampling distribution; that is, it has a mean of 0.093 and a standard deviation of 0.013. So, using this Normal approximation, what can we say about the likely values of p-hat? Using the 68-95-99.7 rule of Normal distributions, roughly 68% of the sample proportions will fall within one standard deviation (a.k.a. one standard error) of the mean of 0.093, and about 95% of them will fall within two standard deviations (a.k.a. two standard errors) of the mean of 0.093. Building on the “within 1 or 2 SE from the mean” idea, let’s see what the cutoff values for these sample proportions would be.

The dot plot from the previous slide has been removed, and all that remains is the Normal curve with the same mean of 0.093, the true population proportion, and the same standard error of 0.013. This is depicted in the top curve on the slide.

If we wanted the middle 68% of the sample proportions, we would take p, the mean of the sampling distribution for sample proportions, plus or minus 1 * SE. So the middle 68% of the sample proportions are between 0.080 and 0.106. Similarly, the middle approximately 95% of the sample proportions are between 0.067 and 0.119, which is p plus or minus 2 * SE.

In fact, we can use the properties of Normal distributions to tell us how many standard errors above and below 0.093 we have to go to contain any arbitrary percentage of the data. For example, it turns out that a range of plus or minus 1.64 standard errors from the mean contains approximately 90% of the data, so in our case, about 90% of the sample proportions should fall between 0.072 and 0.114.
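The Normal quantile function performs this lookup for any percentage. A minimal sketch, using the mean and standard error from our example:

```python
# For any "middle X%" of a Normal distribution, the quantile function
# gives the multiplier z, and hence the cutoffs mean +/- z * SE.
from scipy.stats import norm

mean, se = 0.093, 0.013
for coverage in (0.68, 0.90, 0.95):
    z = norm.ppf((1 + coverage) / 2)   # e.g. about 1.64 for 90%
    lo, hi = mean - z * se, mean + z * se
    print(f"{coverage:.0%}: z = {z:.2f}, cutoffs = ({lo:.3f}, {hi:.3f})")
```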

Recall that we can standardize data by calculating “z-scores” that tell us how many standard deviations the observation is from the mean. If the data are statistics, such as sample proportions, then the z-score tells us how many standard errors away from the mean the sample proportion is. The same sampling distribution for the sample proportion is shown in the lower curve on the slide, in standardized form with the z-score on the horizontal axis.

Note that the product of the value of the z-score and the standard error is known as the margin of error. Using what we know about where values of p-hat are likely to fall given the true value of p, we can “flip” our thinking and say something about where p might lie based on the observed value of p-hat. That is, we have just seen that 95% of the sample proportions will lie within ~2SE of the true proportion. Flipping this around tells us that for any given sample proportion, there is a 95% chance that the true proportion will lie within ~2SE of it (and a 5% chance that the true proportion will be further than ~2SE from it).

This leads us to the formula for the confidence interval of a proportion: p-hat +/- its margin of error. The margin of error consists of two parts: the z-value, which is determined by the degree of confidence we want, and the standard error of p-hat.
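In symbols, the interval is

\[
\hat{p} \;\pm\; z_{1-\alpha/2} \times SE(\hat{p}),
\]

where \(z_{1-\alpha/2} \times SE(\hat{p})\) is the margin of error.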

Notice that we’ve dressed up the z-value term a little by adding the subscript 1 − alpha/2. This subscript indicates the appropriate z-value corresponding to a specified level of confidence. For instance, for a 95% confidence interval, alpha equals 1 − 0.95, or 0.05. Therefore, 1 − alpha/2 equals 0.975, which tells us that our z-value should be the number that exceeds 97.5% of the observations in a Standard Normal distribution. It turns out that this number is about 1.96.

Next we move on to the second component, the standard error of the sample proportion, p-hat. It turns out that the formula for this standard error involves the true population parameter, p. However, this is a problem, since p is unknown! What do we do? We simply replace p with its estimate, p-hat. This results in an estimated standard error rather than the true standard error, which uses p. For our example, the sample proportion of young adults with diagnosed diabetes was 53/500 = 0.106, so p-hat = 0.106 and the estimated SE(p-hat) = 0.014. (Notice that the estimated SE of p-hat is slightly different from the true SE of p-hat on slide 5.) For a 95% confidence interval, alpha = 0.05 and the z-value is 1.96. This gives a margin of error of 1.96 × 0.014 = 0.027. Our 95% confidence interval for the true population proportion of young adults who have diabetes is therefore 0.106 +/- 0.027, which gives (0.079, 0.133).
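Putting the calculation together in code, here is a minimal sketch; the helper name wald_ci is ours, not from the lecture:

```python
# Sketch: Wald confidence interval for a proportion.
from math import sqrt
from scipy.stats import norm

def wald_ci(k, n, conf=0.95):
    """Wald CI for a proportion, with k 'successes' out of n."""
    p_hat = k / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # estimated SE, using p-hat for p
    z = norm.ppf(1 - (1 - conf) / 2)    # 1.96 for 95% confidence
    moe = z * se                        # margin of error
    return p_hat - moe, p_hat + moe

print(wald_ci(53, 500))  # about (0.079, 0.133), matching the text
```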

Now, how do we interpret this range? We can say things like: “I am 95% confident that the interval between 0.079 and 0.133 contains the true proportion of US young adults with diabetes.” Or: “A plausible range of values for the true proportion of US young adults with diabetes is 0.079 to 0.133.”

But these interpretations are somewhat ambiguous. What does “95% confident” mean? How do we define “plausible”?

Next, we will define precisely what we mean by a 95% (or any %) confidence interval.

This plot shows our sample proportion, 0.106 (black dot), and the true population proportion, 0.093 (solid red vertical line). We see that our sample proportion, while not the same as the population proportion, is close.

In this case, the 95% confidence interval (or CI) constructed from our one sample contains the true population proportion. How often will this be the case? Let’s say we were able to obtain 50 different samples of size n=500 from the same population. From each sample, we calculate a confidence interval for the true population proportion. The plot above shows 95% confidence intervals for 50 hypothetical random samples taken from our US young adult population where the true proportion of diabetes is 0.093 or 9.3%. We see that the sample proportions, p-hat, vary a bit from sample to sample, as indicated by the dots in the center of each interval. For 47 of the samples, the calculated 95% confidence interval contains the true population proportion of 0.093. But for three of the samples, the calculated confidence interval misses the true population proportion (as indicated by the red horizontal lines in the plot above). And, lo and behold, 47 out of 50 is 94%, which is very close to the 95% confidence level we used to calculate the confidence intervals.

This isn’t a coincidence. If we repeated the study and obtained 1,000 samples of size n=500, instead of 50 samples, we would get 1,000 slightly different point estimates and 1,000 slightly different confidence intervals, due to sampling variability. Those calculated confidence intervals would contain the true population proportion of 0.093 in about 950 (or 95%) of the samples, but would miss the true value in about 50 (or 5%) of the samples.
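This claim is easy to check by simulation. Below is a minimal sketch, assuming the same population proportion of 0.093 and the Wald interval described above:

```python
# Estimate the coverage of the 95% Wald interval over 1,000 samples.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
p_true, n, reps = 0.093, 500, 1000
z = norm.ppf(0.975)  # z-value for 95% confidence

k = rng.binomial(n, p_true, size=reps)  # one success count per sample
p_hat = k / n
se = np.sqrt(p_hat * (1 - p_hat) / n)   # estimated SE for each sample
lo, hi = p_hat - z * se, p_hat + z * se

print(np.mean((lo <= p_true) & (p_true <= hi)))  # close to 0.95
```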

This plot tells us how to precisely define a 95% confidence interval: A 95% confidence interval is an interval such that, when estimated on repeated samples, approximately 95% of those estimated intervals will contain the true population parameter and approximately 5% will miss it. 90% confidence intervals will “hit” 90% of the time and “miss” 10% of the time; 99% confidence intervals will hit 99% of the time and miss 1% of the time, and so on.

What if we wanted to be more than 95% confident? If we wished to be 99% confident, then our confidence interval would need to be wider, so that 99% of the time it would “catch” the true population value and only 1% of the time it would “miss” it. A confidence interval is kind of like a butterfly net: a wider net has a better chance of catching the butterfly. In our imaginary study, a 99% confidence interval for the true proportion of US young adults with diabetes is 0.071 to 0.141.

What if we wished to be less than 95% confident? If we wished to be 90% confident, then our confidence interval would need to be narrower, so that it only “catches” the true population value 90% of the time. In general, a lower level of confidence produces a narrower interval than a higher level of confidence.

Recall that the z-value in the confidence interval calculation is determined by the desired confidence level. The z-values for three common confidence levels are:

  • 1.645 for a 90% confidence level,
  • 1.96 for a 95% confidence level, and
  • 2.575 for a 99% confidence level.

Increasing the confidence level is a double-edged sword. On one hand, we can feel more confident that our interval will contain the truth. But on the other hand, our interval will be wider, so it gives us less information about where the true value lies. Conversely, decreasing the confidence level makes it more likely that we will “miss” the truth, but in exchange our intervals will be narrower.
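To make this trade-off concrete, the hypothetical wald_ci helper sketched earlier can be applied at all three confidence levels:

```python
# Higher confidence level -> wider interval (reuses the wald_ci
# sketch defined above; values are for the 53-out-of-500 example).
for conf in (0.90, 0.95, 0.99):
    lo, hi = wald_ci(53, 500, conf=conf)
    print(f"{conf:.0%}: ({lo:.3f}, {hi:.3f}), width = {hi - lo:.3f}")
```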

So, where does all this leave us when we are staring at one confidence interval calculated from the one sample we have on hand? Since we don’t know the true population parameter, we don’t know whether this particular interval has “hit” or “missed” the true value. But, since we know that confidence intervals will “hit” the true value a relatively high percentage of the time (a percentage of the time that we determine by fixing the confidence level), we take a sort of “leap of faith” and view the interval as a plausible range for the true population parameter.

We have mentioned several times the idea of “flipping” our thinking when it comes to confidence intervals. To recap, the “flipping” part comes when instead of thinking about the variation among the proportions from many samples when we “know” the population proportion, we are thinking about estimating an “unknown” population proportion from a single sample. Let’s connect the concept of the sampling distribution for a proportion (when it is approximately Normal) to the concept of confidence intervals.

The same image of multiple confidence intervals estimated from multiple samples is presented as from a previous slide. But this time, the sampling distribution for a proportion with mean 0.093 and standard error of 0.013 is overlaid on top of the confidence intervals. The dark blue vertical lines at 0.068 and 0.118 (that is, at ~2SE below and above the mean) denote the middle 95% of the sample proportions in the sampling distribution. That is, 95% of the sample proportions fall within ~2 standard errors of the mean, the population proportion, but 5% of the sample proportions are outside of this region. So while most of the sample proportions are “near” the population proportion, just by chance we might get a sample proportion that is “far away” from the population proportion.

Notice that the 47 sample proportions (blue dots) that are within the middle 95% of the sampling distribution have a confidence interval that contains the population proportion. In contrast, the 3 sample proportions (red dots) that are not within this region have a confidence interval that doesn’t contain the population proportion.

To put it all together now, based on the sampling distribution of sample proportions, we know that for 95% of the sample proportions (the ones in the middle of the sampling distribution) the true population proportion is going to be within ~2SE of that sample proportion, and for the other 5% of the sample proportions (the ones in the tails), the true population proportion is going to be further than ~2SE from that sample proportion. So if we put an interval of +/- ~2SE around “any” observed sample proportion, then 95% of the time, this interval will contain the true population proportion.

A confidence interval method is a “recipe” for calculating a confidence interval based on observed data. And just as in cooking, there are different recipes for making the same dish, and each recipe has its own strengths and weaknesses.

The Wald method is the one we have just described. It generally works fairly well, but becomes less accurate when the sample size is small.

The modified Wald method involves a small tweak to the standard confidence interval formula that improves its behavior in small samples. For more information on this method, see the following website: https://www.graphpad.com/support/faq/the-modified-wald-method-for-computing-the-confidence-interval-of-a-proportion/.

Because they are calculated by adding and subtracting the margin of error from the sample proportion, both the Wald and modified Wald methods generate symmetric confidence intervals. This can be a problem if the sample size is very small or the estimated proportion is close to 0 or 1, because then the confidence interval may include values below 0 or above 1, which are impossible values for a population proportion. An alternative confidence interval recipe that avoids this problem is the Clopper-Pearson Exact Binomial method. The word “exact” refers to the fact that this method does not rely on the Normal distribution approximation; instead, it uses the fact that the sampling distribution of the sample proportion is a re-scaled version of the Binomial distribution. Confidence intervals estimated using the Clopper-Pearson method may not be symmetric, but will never contain values outside the range of 0 to 1. Like most exact methods, Clopper-Pearson confidence intervals usually require a computer to calculate; they cannot be easily hand-calculated like Wald confidence intervals. See the following website if you are interested in the formula: https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/PASS/Confidence_Intervals_for_One_Proportion.pdf.
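For those interested, below is a minimal sketch of the standard Beta-quantile form of the Clopper-Pearson interval; the helper name clopper_pearson is ours. (statsmodels users can get the same interval from proportion_confint with method="beta".)

```python
# Sketch: Clopper-Pearson "exact" CI via Beta distribution quantiles.
from scipy.stats import beta

def clopper_pearson(k, n, conf=0.95):
    """Exact Binomial CI for a proportion, with k 'successes' out of n."""
    alpha = 1 - conf
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lower, upper

# Possibly asymmetric around p-hat, but never outside [0, 1].
print(clopper_pearson(53, 500))
```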

No matter which recipe is used, as the size of the sample increases, the confidence interval gets narrower, which means that the estimate is more precise. Recall that the process of using a sample proportion to estimate a population proportion is an example of statistical inference. Statistical inference for a proportion relies on several assumptions.

We assume that the sample is a random (or representative) sample from the population of interest. In this way, we are able to generalize the results from this sample to the population it came from. This assumption would be violated, for example, if a sample was obtained in a way that under-represented people of color. The prevalence of diabetes is higher in people of color than in whites, so the results from a non-representative sample would not be generalizable to the U.S. young adult population as a whole.

We assume that the observations are independent. This assumption would be violated, for example, if siblings were included in the sample. Siblings may share genetic and environmental risk factors for diabetes and so might be more similar to each other than two people chosen at random would be.

If we are using the Wald method, we assume that the sample is “large enough” for the sampling distribution of sample proportions to be approximately Normal. For proportions, “large enough” requires at least 5 events (or “successes”) and at least 5 non-events (or “failures”). In our example, this assumption would be met if our sample contained at least 5 people with diabetes and at least 5 people without diabetes. If the sample size is too small, consider using other methods, such as the exact binomial method, to compute confidence intervals.
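As a minimal sketch of this rule-of-thumb check (the helper name large_enough is ours):

```python
# Check the "at least 5 successes and 5 failures" rule of thumb
# before relying on the Wald interval.
def large_enough(k, n, min_count=5):
    return k >= min_count and (n - k) >= min_count

print(large_enough(53, 500))  # True: 53 successes, 447 failures
print(large_enough(2, 500))   # False: only 2 successes
```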

If these assumptions are violated, the confidence intervals we calculate may give us faulty information about the true population proportion. For instance, intervals might be too narrow, suggesting a more precise estimate than we actually have, or they might be centered at the wrong place, and hence more likely to “miss” the true population proportion.