Sampling Methods

On Wednesday we talked about why sample size matters.  Today I want to focus on how the sampling method affects the data. Basically, there are two ways to obtain a sample.  One way is to randomly select members of the population, and the other is to use whoever is available.

To understand the difference, I want you to imagine you are a member of a midwifery organization.  That organization wants to learn something about its members, perhaps how many families the average midwife works with each month.  How is the sampling method likely to affect the data the organization collects?

Random Sampling

If the organization wanted the most accurate estimate of the number of families, it would ask every member.  But asking every member is expensive and difficult, so the next best option is to use a random sample.  To do this, the organization would first use a random selection tool to pick a group of midwives from its membership list.  It would then contact each midwife in that group to have them complete the survey. Because this group was selected at random, every member had the same chance of being chosen, so the results are considered representative of the membership as a whole and should give a reasonably accurate estimate.
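The "random selection tool" can be as simple as a few lines of code. Here is a minimal sketch in Python, assuming the organization has a numbered membership list. The list of 500 members, the sample size of 50, and the function name draw_random_sample are all made up for illustration.

```python
import random

def draw_random_sample(member_ids, sample_size, seed=None):
    """Return sample_size distinct member IDs, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(list(member_ids), sample_size)

# Hypothetical membership list: 500 midwives identified by number.
member_ids = range(1, 501)

# Pick 50 midwives to contact for the survey.
to_survey = draw_random_sample(member_ids, sample_size=50, seed=1)
print(len(to_survey), "midwives selected for the survey")
```

The selection itself is the easy part. The time and money go into reaching the selected midwives and getting them to respond.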

But random sampling isn’t easy.  It takes a lot of time and money to contact specific people and encourage them to complete your survey or give you data.  Sometimes it is impossible. There is no list of every pregnant woman in the world, or even in your country.  But you might be able to get a list of every woman who received prenatal care in the last year through your clinic or an insurance program — which leads us to the second sampling method, convenience sampling.

Convenience Sampling

To use convenience sampling, your midwifery organization might send the survey to every member and analyze the surveys that are returned. This is easy, but because you only have data for the midwives who choose to complete and return the survey, the sample is no longer representative.  Why?  There may be inherent differences between those who return the surveys and those who do not. These differences may mean you are missing important data you need to make your estimates.  For example, if busy midwives are less likely to complete and return the survey, your data will underestimate the average number of families served. You may also find midwives in certain regions of your country are less likely to return the survey. For example, if rural midwives have less access to postal services, your data will over-represent urban midwives. If rural midwives also tend to serve fewer families, your data is likely to overestimate the number of families served because the lower-volume rural midwives are missing.
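A small simulation can make the busy-midwife example concrete. This is only a sketch with invented numbers: the population of 2,000 midwives, the range of 1 to 20 families a month, and the response pattern in returns_survey are assumptions, not data from any real survey.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 2,000 midwives, each serving 1 to 20 families a month.
population = [rng.randint(1, 20) for _ in range(2000)]

def returns_survey(families_per_month):
    # Assumed response pattern: the busier the midwife, the less likely a reply.
    response_probability = 0.9 - 0.04 * families_per_month
    return rng.random() < response_probability

# Random sample: 200 midwives chosen at random, all of whom respond.
random_sample = rng.sample(population, 200)

# Convenience sample: survey everyone, keep only the surveys that come back.
returned = [f for f in population if returns_survey(f)]

true_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
returned_mean = statistics.mean(returned)

print(f"True average families per month:  {true_mean:.1f}")
print(f"Random sample estimate:           {random_mean:.1f}")
print(f"Returned-survey estimate:         {returned_mean:.1f}")
```

Under these assumptions the returned-survey estimate comes out lower than the true average, even though it is based on more responses than the random sample. More data does not fix a biased sample.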

Much research is done with convenience sampling because it is often impossible to randomly select from the population of interest.  As explained earlier, there are no lists of every pregnant woman. So instead, samples are drawn from the women who use a particular clinic, give birth at a certain hospital, or are included in an insurance registry. Researchers understand this, and so have a few methods to overcome the bias built into convenience sampling.

Overcoming Convenience Sampling Bias

First, researchers clearly define the sample used for each study.  This helps readers understand what biases may have been built into the data due to the sample.  If possible, researchers will compare the sample to other estimates to help the reader understand further what differences this particular sample may have from the total population.  For example, authors often list the age and ethnic backgrounds of the women in the study. They explain where the participants were recruited, and even how many of the eligible women agreed to participate.

Second, researchers repeat previous studies with different samples. Remember, every sample provides an estimate of the true value, and no researcher considers their study to represent the whole picture of a problem.  Multiple similar studies help researchers understand where that true value lies, and how differences in the sample change the estimate. For example, in the discussion or conclusion section of a study, researchers will often discuss how the estimates from the study compare to the findings in other studies.

Finally, researchers discuss the known limitations of the data available to help readers understand how the sample (or other aspects of the study) may be affecting the results.  Limitations are usually discussed in the conclusion or discussion section of a study, where the authors explain how they dealt with each limitation, or why the limitation existed. If an author misses a limitation, a reader may well bring it up in a response letter published in the journal.

The Birth Worker Survey

The Birth Worker Survey was a convenience sample, and as such is not representative of the total population of birth workers. For starters, the sample was heavily skewed toward doulas, with only three midwives responding.

Secondly, to complete the survey a birth worker had to find out about it.  This means the birth worker had to either be in contact with Birthing Naturally herself, or she needed to have a friend who cared enough to tell her about it. Birth workers outside this social network would not have known about the survey.

On top of that, not everyone who found out about the survey would complete it. You had to have enough time, or care enough about learning statistics, to want to complete it.

Finally, don’t forget that this was an online survey. This means only those birth workers with easy enough internet access could complete the survey.

So what does this mean for the data? Basically, it wasn’t very “rich.” Despite asking some interesting questions, there is little data available for analysis because the sample was so homogeneous that many of the questions offer little to no diversity in answers.

Some examples of the homogeneity:

  1. Only one of the respondents identified as any race other than White, and only one identified as Hispanic.
  2. Only three respondents identified as residing outside the United States.
  3. Only four respondents had never given birth themselves, and of those who gave birth only three had any health complication and only three gave birth via cesarean surgery.
  4. Only two respondents disagreed at all that birth is a normal, healthy process.
  5. No respondents disagreed that women must be prepared to successfully manage the pain of labor, only four agreed a hospital is no place to give birth, and only two agreed evidence-based practice does not apply to midwifery.
  6. The responses for comfort measures were so homogeneous that, across all five questions, only four responses in total differed from the majority.

When we talked about regression techniques, we compared the birth location for doulas to non-doulas and found no statistically significant differences.  However, remember this sample included only birth workers.  We were not comparing doulas to the “average” woman.  We were comparing doulas to childbirth educators and midwives.  One limitation of our study is that childbirth educators and midwives may be more similar to doulas in their birth preferences than they are to other women. I hope you see that you can only recognize this limitation if you understand the sample that was used to obtain the result.

Next week we will explore proving causation and the popular randomized controlled trial.



Jennifer Vanderlaan (Author)