What is a Confidence Interval?

On Monday we talked about how p-values tell us the probability of obtaining a result at least as extreme as the one we observed if the null hypothesis (the status quo) is true. Today we turn our attention to the confidence interval and the additional information it provides.

There are two things you need to remember to make sense of a confidence interval.  First, your sample provides an estimate of the true value.  Second, if you took a different sample, you would get a different estimate.  So what does this mean?

The sample is only an estimate

Let’s say we were interested in finding the average circumference of pregnant bellies. The only way to know this for sure is to measure every pregnant belly, but you couldn’t really do that. So you might go to a midwife’s office and measure the waist of each pregnant woman who had an appointment that day: a sample of all pregnant women. From this group of data you would calculate an average. But if you went to do the same thing the next day, you would get a different group of women (a different sample) and end up with a different average waist circumference. In fact, if you continued this experiment every day for a month, and had researchers stationed in every midwifery office in your city, you would get a different result for every day and for every midwifery office you visited. Each of these samples only estimates the true average belly size.

If you plotted the averages you obtained from each of your samples, you would get a graph that looks similar to a normal (bell) curve. In this case, with each “sample” representing one day at one midwife’s office, our normal curve would be quite wide. Why? Because each of our samples was small, maybe only 10 to 15 women. A small sample makes our calculation more sensitive to any measurements that are at the extremes of normal, so if you have one woman who is carrying exceptionally small or large, she will move your average away from the true average.

If we added all the measurements for each day from every midwife’s office into one “sample,” our sample sizes would now be larger and less susceptible to the random variations of normal. A larger sample gives us a more precise measurement because there are more observations to buffer the extreme values. When we plot each day’s data, we will still end up with a normal curve, but this normal curve will be narrower.

If we added all the measurements for the entire month in your city, our sample size would be even larger and less susceptible to random variations of normal. When we plot the average for the month in your city with the averages for the month from other cities, we will again obtain a normal curve, but this curve will be even narrower. In this case, our samples are much larger and so the averages we obtain are closer to the true average.
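The narrowing described above can be checked with a small simulation. This is only a sketch: the true average of 100 cm, the spread of 10 cm, and the three sample sizes are invented for illustration and do not come from any study, and `spread_of_sample_means` is a hypothetical helper name.

```python
import random
import statistics


def spread_of_sample_means(sample_size, n_samples=500, true_mean=100.0,
                           true_sd=10.0, seed=0):
    """Draw many samples of one size; return the standard deviation of their averages."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_samples):
        # One simulated sample of belly measurements (invented population values).
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        averages.append(statistics.fmean(sample))
    return statistics.stdev(averages)


# Roughly: one office for one day, every office for one day, a whole month.
spreads = {n: spread_of_sample_means(n) for n in (12, 120, 1200)}
for n, spread in spreads.items():
    print(f"sample size {n:>4}: averages vary by about {spread:.2f} cm")
```

The larger the sample, the smaller the spread of the averages, which is the narrower curve described above.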

What Confidence Intervals Tell Us

Essentially, the confidence interval tells you how wide a curve this sample has provided. This means it tells you how precise the measurement is likely to be. Confidence intervals are most often reported as 95% confidence intervals. When you read that the 95% confidence interval is from x to xx, a handy shorthand is “I can be 95% confident the true value is between x and xx.” Strictly speaking, the 95% describes the method: if you repeated the study many times and built an interval from each sample, about 95% of those intervals would contain the true value. What you don’t know is where between x and xx that true value lies.
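One way to see what the 95% refers to is to simulate many studies and count how often the interval captures the true value. The sketch below makes several assumptions: the population values (mean 100, standard deviation 10) are invented, the helper names are hypothetical, and the interval uses the normal approximation (mean plus or minus about 1.96 standard errors). For small samples a t-based interval would be somewhat wider.

```python
import math
import random
import statistics

Z_95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96


def mean_ci_95(sample):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - Z_95 * standard_error, mean + Z_95 * standard_error


def coverage(n_studies=2000, sample_size=50, true_mean=100.0, true_sd=10.0, seed=1):
    """Fraction of simulated studies whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        low, high = mean_ci_95(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / n_studies


share = coverage()
print(f"{share:.1%} of the simulated intervals contain the true mean")
```

Any single interval either contains the true value or it does not. The 95% is the long-run share of intervals that do.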

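When you are comparing two groups, you can use the confidence intervals to judge whether the differences observed are likely to be real differences. To do this you look at the confidence intervals to see if they overlap. If the confidence intervals do not overlap, the difference is unlikely to be due to chance alone. If they overlap, the possibility exists that the values are the same. This is a conservative check: intervals can overlap a little and a formal test can still find a difference.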

When you are looking at odds ratios or risk ratios, the confidence interval can help you determine if there is an increase in odds or risk. For these measurements, a ratio of 1 means the two groups are the same. If the confidence interval includes 1, you cannot be confident the odds or the risk is actually different.
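As an illustration of checking whether an interval includes 1, here is a sketch of a common textbook approach: compute the odds ratio from a two-by-two table and build an approximate 95% interval on the log scale. The counts are invented and the function name is hypothetical. This is not the method of any particular study.

```python
import math
import statistics

Z_95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96


def odds_ratio_ci_95(exposed_cases, exposed_noncases,
                     unexposed_cases, unexposed_noncases):
    """Odds ratio with an approximate 95% CI computed on the log scale."""
    odds_ratio = (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)
    # Standard error of log(odds ratio) from the four cell counts.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / exposed_noncases
                          + 1 / unexposed_cases + 1 / unexposed_noncases)
    low = math.exp(math.log(odds_ratio) - Z_95 * se_log_or)
    high = math.exp(math.log(odds_ratio) + Z_95 * se_log_or)
    return odds_ratio, low, high


# Invented counts: 30 of 100 in one group and 20 of 100 in the other had the outcome.
odds_ratio, low, high = odds_ratio_ci_95(30, 70, 20, 80)
includes_one = low <= 1 <= high
print(f"OR = {odds_ratio:.2f}, 95% CI {low:.2f} to {high:.2f}, includes 1: {includes_one}")
```

With these made-up counts the estimated odds ratio is above 1, yet the interval still reaches below 1, so the data would not support a confident claim of increased odds.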

A Literature Example

Check out this study: Cesarean Section and Rate of Subsequent Stillbirth, Miscarriage, and Ectopic Pregnancy: A Danish Register-Based Cohort Study.

Even if you only have access to the abstract, you are able to see the confidence intervals for the main measures. The sample in this study was large (n = 832,996), which makes the confidence intervals rather narrow.

The Birth Worker Survey

One of the questions we asked on the Birth Worker Survey was the length in hours of your most recent labor.  We converted this to minutes because so many of the respondents couldn’t help but provide a more accurate measurement than hours. For the overall sample, the mean (or average) length of labor was 415.71 minutes with a standard deviation of 477.8 minutes.  But does this differ based on characteristics of the respondents?

On Monday we split the group into those who worked as doulas for income, and those for whom it was a hobby. We can use that same breakdown here. When we do, we find that respondents who work as doulas for income had a mean length of labor of 408.5 minutes, while those who do not work as a doula for income had a mean length of labor of 318.6 minutes.

Using a test called a t-test (we’ll talk about this later) we are able to measure the difference between the length of labor for the two groups. Remember, in this test we are measuring the difference in the means, and the mean difference is -89.9 minutes with a 95% CI of -550.5 to 370.7 minutes.

Notice that the confidence interval includes a mean difference of zero. This means we cannot be confident there is a real difference in labor times for these two groups. This is consistent with our p-value, which was 0.687.
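The arithmetic behind this example can be checked with the figures reported above. The sketch uses only those reported numbers. The direction of the subtraction (not-for-income minus for-income) is an assumption inferred from the sign of the reported difference.

```python
# Figures reported in the text above, all in minutes.
mean_income = 408.5      # respondents who work as doulas for income
mean_no_income = 318.6   # respondents who do not
ci_low, ci_high = -550.5, 370.7

# Assumed direction: not-for-income minus for-income, matching the reported sign.
mean_difference = mean_no_income - mean_income
midpoint = (ci_low + ci_high) / 2      # a symmetric interval is centered on the estimate
half_width = (ci_high - ci_low) / 2    # how far the interval reaches on either side
includes_zero = ci_low <= 0 <= ci_high

print(f"mean difference: {mean_difference:.1f} minutes")
print(f"interval midpoint: {midpoint:.1f}, half-width: {half_width:.1f}")
print(f"interval includes zero: {includes_zero}")
```

The interval reaches roughly 460 minutes on either side of the estimate, about five times the size of the difference itself, which is why zero sits comfortably inside it.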