Understanding Data – the measure

We need to talk about what might be considered the “back end” of statistics. That is, how did the data come to be?

Types of Observations

Data is simply a collection of variables, grouped by observation. In health care, the observation is usually a “case” or a “person”. Researchers make each observation in a variety of ways. They may use a survey, asking individuals to report specific things. They may use biomarkers, taking blood or saliva samples to look for interesting components. They may use administrative data, looking at all hospital discharges or insurance charges for an organization. They may use surveillance data, examining collections of government-mandated reporting data such as birth certificates.
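If it helps to see "variables grouped by observation" written down, here is a minimal sketch in Python. Every variable name and value in it is invented for illustration; none of it comes from a real data set.

```python
# Each dictionary is one observation (one "case" or "person").
# Each key is a variable. All names and values are made up.
observations = [
    {"case_id": 1, "source": "survey", "age": 29, "systolic_bp": 118},
    {"case_id": 2, "source": "administrative", "age": 41, "systolic_bp": 135},
    {"case_id": 3, "source": "surveillance", "age": 35, "systolic_bp": 122},
]

# The variables are the names shared across the observations.
variables = sorted({name for obs in observations for name in obs})

# Pulling one variable out of every observation gives a single column.
systolic_values = [obs["systolic_bp"] for obs in observations]

print(variables)
print(systolic_values)
```

Reading across one dictionary tells you everything recorded about one person. Reading one key down the whole list tells you how a single variable behaves across everyone.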

Each type of data has its benefits and its problems.  For example, administrative data can give you many observations, which is helpful for getting information about things that don’t happen very often. But administrative data is collected for reasons other than research, and so the information itself limits what can be learned.  Surveys allow us to find out why people make the decisions they do, but people will sometimes answer a survey in the way they think the researcher wants the survey answered.

The Measurement

When you have a set of data, one of the biggest problems occurs if the variables are not consistently measured. This happens in large data sets, in multi-site trials, and any time more than one person is collecting the data. Researchers try to limit this problem, but if you are dealing with administrative or surveillance data, you simply have to accept the risk.

This means a study where blood pressures are measured with specifically calibrated sphygmomanometers gives us better data than hospital records that use whatever blood pressure cuff the aide or nurse happens to grab that day.

One interesting case of unequal measurement happened with vital statistics data. One of my professors was called in to find out why newspapers were reporting that Memphis had such a high perinatal mortality rate. As he dug into the data, he quickly learned that doctors in Memphis were writing birth certificates for every pregnancy loss, regardless of gestational age. Measures like the perinatal mortality rate have specific guidelines. In this case, a loss is only counted if the fetus weighed 500 grams or more, or had reached 22 weeks of gestation or more. Epidemiologists at the CDC know this and discard observations that do not meet the criteria. But in Memphis, the statistic reported in the media (who didn’t know the correct definition) included every perinatal death certificate, regardless of gestational age or weight. Do you see how the local reporting was inflated, and how unfair this was to the people of Memphis?
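To make the counting rule concrete, here is a small Python sketch. It applies the 500 gram or 22 week threshold described above to a handful of records. The records, the `LossRecord` and `meets_threshold` names, and the denominator of 1,000 births are all invented for illustration. They are not the Memphis data, and the sketch holds the denominator fixed to keep the arithmetic simple.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LossRecord:
    """One hypothetical certificate; None means the value was not recorded."""
    weight_grams: Optional[float]
    gestation_weeks: Optional[float]


def meets_threshold(record, min_grams=500, min_weeks=22):
    """True if the record reaches the weight OR the gestational age cutoff."""
    heavy_enough = record.weight_grams is not None and record.weight_grams >= min_grams
    late_enough = record.gestation_weeks is not None and record.gestation_weeks >= min_weeks
    return heavy_enough or late_enough


def rate_per_1000(deaths, total_births):
    """Deaths per 1,000 births."""
    return 1000 * deaths / total_births


# Made-up records: three fall below both cutoffs, three meet at least one.
records = [
    LossRecord(320, 18),
    LossRecord(450, 21),
    LossRecord(510, 21),
    LossRecord(480, 23),
    LossRecord(2900, 38),
    LossRecord(None, 12),
]
TOTAL_BIRTHS = 1000  # assumed denominator, illustration only

unfiltered_count = len(records)
filtered_count = sum(meets_threshold(r) for r in records)

print("Every certificate counted:", rate_per_1000(unfiltered_count, TOTAL_BIRTHS))
print("Only qualifying records:  ", rate_per_1000(filtered_count, TOTAL_BIRTHS))
```

With these invented numbers, counting every certificate doubles the rate compared with counting only the records that meet the definition. The size of the gap in any real data set depends on how many early losses were recorded.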

Sometimes there isn’t a standard measure, or there are multiple measures being used in the research. This can make it hard to compare studies because the measures may not produce equivalent results. Check out this example, where changing the way the researchers treated the “missed” prenatal visits made big changes to the number of women who had adequate prenatal care.

Birth Worker Survey

The Birth Worker Survey has several big problems as a source of data. To begin with, the sample was whoever felt interested enough after reading a post to complete the survey. Not only are we limited to the people who read the posts from Birthing Naturally, but we are further limited to those who had the time and interest to fill out the survey. As a result, all we can really say is that this survey informs us about the most motivated readers of the Birthing Naturally website, not about doulas, childbirth educators, or midwives in general.

An Interesting Read

If you find the problem of data interesting, you might enjoy this article by Kate Clancy discussing the problems with inconsistent datasets on pregnancy from rape, and with trying to use the results from that data.

Coming Up

Next time, we will talk about how to use our data to talk about differences in populations.

Jennifer Vanderlaan (Author)