Dealing with selection bias
Whenever we recruit participants, we are looking to get a representative sample of our actual (or desired) users.
We may even go to a fair bit of effort to get (or exclude) specific types of users, by querying customer databases, posting invitations to specific user forums, and so on.
In the end, though, all recruiting is imperfect; we’ll miss some users we were hoping to get, and we’ll get some that we were hoping to miss.
It is important, nonetheless, to try to identify any selection bias in our recruiting, so we can take that into account when we analyze our results, or when we do our next study.
What is selection bias?
From Wikipedia’s article:
Selection bias is the selection of individuals, groups or data for analysis in such a way that proper randomization is not achieved, thereby ensuring that the sample obtained is not representative of the population intended to be analyzed.
In other words, certain recruitment methods may yield a skewed selection of participants, rather than the representative sample that we normally want.
For example, suppose that we only use a web ad on our site to recruit for our study. Only people who visit our site in the next few days (the duration of our recruitment) will see the ad. This means that:
We are ignoring customers who don’t use our website. (For many businesses, such as banks, this may be a big chunk of customers.)
We are more likely to get people who visit our site frequently (say, several times a week), but less likely to get people who use our site once a month (e.g. to check their bill).
What causes selection bias?
Here are some common causes of selection bias:
Web ads only get web users.
If we only use web ads to recruit users, then (by definition) we’re only getting those users who visit our website. While this is OK for many studies, it does ignore those customers who use other channels instead of the website. If we need offline users too, we’ll need to find another way to recruit them.
Customer email lists only get existing customers.
If we only use a customer list to email invitations, we are missing prospective customers (a potentially valuable audience) and ex-customers (who are often good sources of honest feedback).
Daytime in-person studies often get lots of retired people and students.
These two groups are easier to schedule during typical business hours, but may skew our user sample.
Commercial customer panels may be skewed to certain demographics.
We’ve used commercial panels that were heavily skewed to consumers rather than businesses, or that had a larger-than-normal representation of lower socio-economic groups.
This is usually easy to address by choosing our selection criteria carefully, because panels typically offer many ways to select the participants we need.
How can we reduce bias?
We may decide that a given selection bias is acceptable; in our example above, we may only want customers who visit our website, so this implicit selection actually serves as a useful screening mechanism.
However, it’s important that we consider what kind of selection bias each recruiting method adds to our study. Then, we can either:
Try to reduce that bias, and/or
Acknowledge the bias and take it into account when analyzing our results and presenting the findings.
The most common way to reduce selection bias is to use several different types of recruitment. For example, instead of just running a web ad (which only yields web visitors), we could also use customer lists to reach those customers who don’t use the website.
Another (generally less effective) way to reduce bias is to modify the single recruitment method we are using. If we only use a web ad, for example, we could post that ad on several different websites. We will still only get website visitors, but some of them will be people who have never visited our own site.
While we will never eliminate all bias from our studies, these steps should help minimize it so we can be reasonably confident in our results.
Copyright © 2024 Dave O'Brien
This guide is covered by a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.