Dealing with outliers

When we see lots of participants choosing a certain answer, or following a certain path through the tree, there’s not much to argue with – clearly there’s some reason why so many of them made the same choice.

But often we see just a few participants (or even a single person) choose a certain answer or path. What should we make of that? Is it meaningful, or just an outlier that we can safely ignore?

For example, here's a task from a tree test run by Meridian Energy (a power company):

How much electricity did your household use last week?

And here's where participants went from the Meridian Home page:

The few participants who chose About Us or For Business or For agribusiness may have done so honestly (because they thought it was the best answer) or because they were tired/bored/hurried and just clicked randomly. In an unmoderated online study, it’s hard to know which.

Earlier in this chapter, in Cleaning the data, we made sure we deleted the results of participants who “guessed” too often. But we may still have some cases where participants chose randomly on a task here or there.

As in any quantitative research, the key to spotting and ignoring true outliers is a large number of participants. Suppose, for example, that 3 participants choose a certain incorrect answer in a study:

If it’s 3 participants out of 100, that’s 3% of responses, and we can safely ignore it.
If it’s 3 participants out of 20, that’s 15%, which is much harder to ignore.
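The arithmetic behind these two cases is easy to check. The sketch below is only an illustration; the function name share_of_responses is our own and not part of any tree-testing tool.

```python
def share_of_responses(clicks, participants):
    """Return the percentage of participants who chose a given answer."""
    if participants <= 0:
        raise ValueError("participants must be a positive number")
    return 100 * clicks / participants


# The same 3 clicks carry very different weight depending on study size.
for participants in (100, 20):
    pct = share_of_responses(3, participants)
    print(f"3 of {participants} participants: {pct:.0f}%")
```

The same count of 3 is a negligible share in the larger study and a substantial one in the smaller study, which is why the number of participants matters more than the raw click count.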

This, of course, is why we recommend getting at least 50 participants per user group:

When we have small numbers, any answer that gets a few clicks must be considered.
When we have large numbers, it’s much easier to spot outliers and ignore them. 50 participants is a good minimum for this, and 100 or more make it very easy to identify answers we can ignore.

As a rough rule of thumb, we ignore answers chosen by about 5% of participants or fewer. For 50 participants, that means ignoring answers that get 1 or 2 clicks (2% or 4%), or even 3 clicks (6%, just over the line). Answers with this few clicks are very common in tree-test results.
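One way to apply this rule of thumb to a task's results is sketched below. It assumes we have a simple list with one chosen answer per participant. The function name split_outliers, the 0.05 default, and the sample click counts are all invented for illustration (only the labels About Us and For Business come from the example above), and the cutoff is a judgment call rather than a fixed standard.

```python
from collections import Counter


def split_outliers(choices, threshold=0.05):
    """Split answers into (kept, ignored) by their share of participants.

    choices   -- one chosen answer per participant (hypothetical input format)
    threshold -- answers at or below this share are treated as ignorable
    """
    counts = Counter(choices)
    total = len(choices)
    kept, ignored = {}, {}
    for answer, clicks in counts.items():
        if clicks / total <= threshold:
            ignored[answer] = clicks
        else:
            kept[answer] = clicks
    return kept, ignored


# Invented results for 50 participants, not real study data.
choices = (
    ["My Account"] * 40
    + ["For Home"] * 7
    + ["About Us"] * 2
    + ["For Business"] * 1
)

kept, ignored = split_outliers(choices)
print("Worth examining:", kept)
print("Probably safe to ignore:", ignored)
```

With a strict 5% cutoff, an answer with 3 clicks out of 50 (6%) would be kept, so raising the threshold slightly is reasonable if we want to treat such borderline answers as ignorable too.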
