The story begins with an article demonstrating an association between maternal age and prematurity. The authors did something that seems straightforward: they combined the standardized premature gestational age groups into two groups instead of analyzing the data by standardized groups. This is common because births before 28 weeks are often too few to yield a sufficient sample size for analysis.
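To make the regrouping concrete, here is a small sketch in Python. The three-way split follows the standard WHO preterm categories (extremely preterm, under 28 weeks; very preterm, 28 to under 32; moderate to late preterm, 32 to under 37); the two-way collapse at 32 weeks is my own illustrative assumption, since the article's exact cut points aren't given here.

```python
def who_category(weeks):
    """Assign a birth to its standard WHO gestational age category."""
    if weeks < 28:
        return "extremely preterm"
    elif weeks < 32:
        return "very preterm"
    elif weeks < 37:
        return "moderate to late preterm"
    return "term"

def collapsed_category(weeks):
    """Hypothetical two-way collapse: merge the sparse <28-week group
    into a single <32-week group, as studies often do for sample size."""
    if weeks < 32:
        return "<32 weeks"
    elif weeks < 37:
        return "32 to <37 weeks"
    return "term"

# A handful of made-up gestational ages (completed weeks at delivery).
ages = [26, 30, 34, 40]
print([who_category(w) for w in ages])
print([collapsed_category(w) for w in ages])
```

Once two studies collapse these categories at different cut points, an estimate labeled "preterm" in one study no longer describes the same population as in another, which is exactly the synthesis problem discussed below.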
In the response, another researcher argued that the interpretation of the study's results was problematic because combining premature categories into two larger groups produces less meaningful estimates: which group does the estimate really represent? And when this study is synthesized with others, which category of prematurity would its results be considered to represent?
Why does this matter? Because when studies use methods that are not comparable to other studies, it becomes more difficult to separate the signal from the noise. Unfortunately, this situation isn't unique. I'm working through a similar problem in studies that examine perineal outcomes. Although the clinical measures are (in general) standardized, these measures are grouped differently for analysis, producing results that cannot reliably be synthesized. This frustrates me for two reasons. First, when reading a study I don't know whether the authors chose a non-standard grouping because it was the only data they had, or because it was the only way to get a statistically significant result. Second, if the data cannot be synthesized to provide clear guidance, it is easy to misread the situation as no evidence at all.
What's the takeaway message? The next time you read a meta-analysis on childbirth that concludes there is "insufficient evidence," pay a little more attention to the groupings used in the primary studies. You may find there were plenty of studies, and that they even broadly agree. They just might have all measured things a little differently.