Like the other statistical techniques we’ve talked about, regression techniques allow us to examine the relationship between variables. But regression goes a step beyond the Chi-Square and T-Test because it can handle more than two variables at once. Regression estimates how several variables relate to an outcome at the same time, and shows what portion of the variation is associated with each individual variable. This allows us to control for potentially confounding variables, which is helpful when weighing the evidence for or against causation.
In studies that use regression techniques you will see the terms dependent and independent variables. These terms describe the roles the variables play in the regression equation. An independent variable stands on its own, and is considered to change on its own. The dependent variable is “dependent” because regression analyzes the portion of its change that responds to changes in the independent variables.
Regression can also be used to predict which conditions or events might lead to certain health states. For example, you might use regression to determine how blood pressure and pre-eclampsia lab values are associated with increased risk for seizures.
The actual math for regression is done by computers. In essence, the software plots the data points and finds the line that best fits ALL the data. The result of the regression is a set of coefficients, one for each independent variable, plus one constant that places the line on the y-axis. When reading a study that uses regression, you should expect the authors to report which variables were used in the equation and what the resulting coefficients were. Sometimes authors will report only the coefficients or odds ratios (depending on the type of regression) that are significant, since non-significant coefficients are generally dropped from the final regression equation. The specific reporting is determined by the purpose of the regression: either to predict a health condition or to determine the strength of association between two variables.
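To make "finding the best line" concrete, here is a minimal sketch of ordinary least squares with a single independent variable. The data points are made up purely for illustration and do not come from any study, and the function name fit_line is hypothetical. Real regression software does the same kind of calculation with several independent variables at once.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                      # coefficient for the independent variable
    intercept = mean_y - slope * mean_x    # constant that places the line on the y-axis
    return intercept, slope

# Made-up illustrative data (not from the survey or any paper)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

The slope is the coefficient an author would report for the independent variable, and the intercept is the constant.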
From the Literature
In the paper “Successful induction of labor: prediction by pre-induction cervical length, angle of progression and cervical elastography,” the authors use regression to produce an equation that can predict successful induction. For this regression, the dependent variable is the mode of delivery, either vaginal or cesarean. The authors tested cervical length, elastographic score, angle of progression and nulliparity. They found that nulliparity and cervical length predicted the outcome just as well without the other two measures.
From the Birth Worker Survey
One of the questions asked on the Birth Worker Survey was about the location of the most recent birth. The respondents were split, with 54% giving birth in the hospital and the rest out of the hospital. We could hypothesize that being a doula increases the likelihood a respondent would give birth out of the hospital.
Of the 21 participants who are doulas, 9 (43%) gave birth in a hospital. Of the 7 participants who are not doulas, 6 (86%) gave birth in a hospital. This is a big difference, but with such a small sample it is difficult to reach statistical significance. Our p-value is 0.084, which is not significant at the conventional 0.05 level. The odds ratio for hospital birth for doulas is 0.22, with a 95% CI from 0.02 to 2.19. Because this interval includes 1, it is further evidence that this sample does not demonstrate a statistically significant difference. If we had a larger sample, we might have a significant finding.
But what if we wanted to control for factors that may affect place of birth? For example, one of the questions asked participants to select their income category from a list of choices. Does income make a difference in the use of out-of-hospital birth among doulas?
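The text does not say which test produced the p-value. As an assumption, the sketch below applies a two-sided Fisher's exact test, a common choice for small 2x2 tables, to the counts given above (9 of 21 doulas and 6 of 7 non-doulas gave birth in a hospital). The function name is hypothetical, and this is only one reasonable way to run the comparison.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def ways(x):
        # Number of tables with x in the top-left cell, keeping all margins fixed
        return comb(row1, x) * comb(row2, col1 - x)

    observed = ways(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every table that is no more likely than the one we observed
    extreme = sum(ways(x) for x in range(lowest, highest + 1) if ways(x) <= observed)
    return extreme / comb(total, col1)

# Rows: doula, not a doula. Columns: hospital, out of hospital.
p_value = fisher_exact_two_sided(9, 12, 6, 1)
print(round(p_value, 3))  # 0.084
```

Working with whole-number counts of tables, rather than decimal probabilities, avoids rounding problems when deciding which tables count as "at least as extreme."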
In the regression equation, we can control for income by categorizing participants as having household incomes above or below $50,000. This is an arbitrary number chosen only because it makes a nearly even split for the groups. In the regression, we find the odds ratio for a doula to give birth in a hospital is 0.1, with a 95% CI of 0.009 to 1.092. Our p-value is 0.059. This finding is still not statistically significant (remember, our sample is too small for that), but notice that controlling for income did change the odds ratio we found for doulas. Income was also not statistically significant, with a 95% CI for the odds ratio ranging from 0.5 to 13.5.
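An odds ratio, its confidence interval, and its p-value are tied together, so a reader can roughly check one against the others. The sketch below assumes the reported interval is a Wald interval, meaning it is symmetric on the log scale. The text does not confirm how the software computed it, so treat the result as an approximation. The function name is hypothetical.

```python
from math import erf, log, sqrt

def wald_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate the standard error, z statistic and two-sided p-value
    implied by an odds ratio and its 95% CI (assumes a Wald interval)."""
    # On the log scale, the interval is log(OR) +/- z_crit * SE
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)
    z = log(odds_ratio) / se
    # Two-sided p-value from the standard normal distribution
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p = 2 * (1 - normal_cdf)
    return se, z, p

# Values reported for the doula variable after controlling for income
se, z, p = wald_p_from_or_ci(0.1, 0.009, 1.092)
print(f"SE = {se:.2f}, z = {z:.2f}, p is roughly {p:.2f}")
```

An interval whose upper limit sits just above 1 goes with a p-value just above 0.05, which is why a CI and a p-value should always tell the same story.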
Overall, though, this was not a very good model: it correctly predicted the place of birth for only 64% of the participants. With a better-designed survey, we might have been able to predict place of birth more accurately. With a larger sample, we might have been able to find significant differences. We can talk about sample size more later this week.