
...

The great thing about online testing tools is that not only are the results compiled and partially analyzed for us, but they are usually available instantly, even while the test is still “live”.

...

For example, we ran three tree tests for an electricity company. We expected a low score for the existing site tree, but we also got low scores for the new trees we had designed:

                    Existing tree   New tree 1   New tree 2
Uncorrected score   36%             40%          35%

While these scores were lower than we expected across the board, we told ourselves not to panic. From experience, we knew that the raw data needs cleaning up before it accurately shows what happened.

There are a few common reasons for this:

  • Participants may have misinterpreted some of our tasks, leading to a lot of wrong answers.
    There’s not much we can do about this once the results start coming in, other than to take those tasks’ results with a grain of salt, and to fix those tasks in later tests.

  • Some participants may have given garbage answers, for a variety of reasons.
    We can remove some or all of these after the fact.

  • It’s likely that participants discovered some correct answers that we missed.
    We can fix these after the fact too.

Safeguarding the original data

The first thing to do, before we touch the data at all, is to back it up. We always want to have a copy of the original data in case anything goes wrong with the tool or our data clean-up.
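If our tool can export the results to a file (CSV, spreadsheet, or similar), a backup can be as simple as a dated, read-only copy. Here is a minimal sketch, assuming a single export file; the function name `back_up_export` and the `-raw-YYYY-MM-DD` naming scheme are our own illustration, not something any particular tool produces:

```python
import shutil
from datetime import date
from pathlib import Path


def back_up_export(export_path: str, backup_dir: str) -> Path:
    """Copy a raw results export into backup_dir under a dated name and make the copy read-only.

    Hypothetical helper: assumes the tool's results have already been exported to export_path.
    """
    src = Path(export_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{src.stem}-raw-{date.today().isoformat()}{src.suffix}"
    if dest.exists():
        # Never overwrite an existing backup of the original data.
        raise FileExistsError(f"{dest} already exists; refusing to overwrite a backup")
    shutil.copy2(src, dest)  # copy2 also preserves the file's modification time
    dest.chmod(0o444)        # read-only, so later clean-up work can't change it by accident
    return dest
```

We keep this copy somewhere other than the tool itself, so it survives the tool going offline.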

...

This is also useful if the tool we’re using goes offline, either temporarily or (eventually) permanently. In the unlikely case that we need to go back and look at the raw numbers a year or two from now, we’ll want to have our own offline copy of the data.

Removing garbage sessions

Most online studies collect some garbage data.

...

This is especially true when the study is open to the public and there’s an incentive involved. We get more respondents, but some of them are just there for the prize, and will zoom through the study as quickly as possible to get to the pay-off.

  • Sometimes this means lower-quality data.
    They did the task in a rush, so their decisions were not as considered as they might be in real life. On the other hand, many people “rush” through their normal web browsing, so it’s hard to quantify this effect.

  • Sometimes this means garbage data.
    They clicked randomly, or chose the same option each time, just to get through the test quickly.

We can normally weed out the latter, and reduce some of the former.

Going too fast

The first thing we look at is sessions that were completed too quickly. Most tools track the total time each participant takes, which gives us a good way to weed out garbage data.
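A minimal sketch of this first pass, assuming we have exported each session's total time in seconds, keyed by a session ID. The cut-off used here (half the median completion time) is an assumption for illustration, not a fixed rule, and `flag_fast_sessions` is a hypothetical name:

```python
from statistics import median


def flag_fast_sessions(durations: dict[str, float], fraction: float = 0.5) -> list[str]:
    """Return IDs of sessions whose total time is below `fraction` of the median time, fastest first.

    Assumption: durations maps session ID -> total seconds taken for the whole study.
    """
    cutoff = fraction * median(durations.values())
    fast = [sid for sid, secs in durations.items() if secs < cutoff]
    return sorted(fast, key=durations.get)
```

We use the median rather than the mean because a few participants who leave the test open for an hour would otherwise drag the average, and therefore the cut-off, upward. The flagged sessions are candidates for review, not automatic deletions.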

...

For the remaining sessions (those approaching “half time”, or whatever threshold we’re comfortable with), we review the data. This means looking at the click paths of each task for that participant, to see whether there are any clear indications that they were intentionally speeding through our tests. The most common indications are:

  • Choosing the same item at each level (often the first or last item)

  • Going down the same path for every task

  • Choosing nonsense paths for every task
    Be careful with this one, because what we consider a “nonsense path” might have made sense to the participant. Only suspect those who do this for a large number of tasks.
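The first two indicators can be checked mechanically; the third (nonsense paths) needs human judgment. A rough sketch, assuming each task's click path is recorded as a list of (position chosen, number of items at that level) pairs. That representation and the function name are hypothetical, and the scores are prompts for review, not verdicts:

```python
from collections import Counter

# One click in a path: (position chosen at that level, number of items at that level), 0-based.
Step = tuple[int, int]


def speeding_indicators(paths: list[list[Step]]) -> dict[str, float]:
    """Score one participant's click paths (one path per task) on the mechanical speeding indicators.

    first_item / last_item: share of all clicks that landed on the first / last item at their level.
    repeated_path: share of tasks that followed this participant's most common path.
    """
    steps = [step for path in paths for step in path]
    if not steps:
        return {"first_item": 0.0, "last_item": 0.0, "repeated_path": 0.0}
    first = sum(1 for pos, _ in steps if pos == 0) / len(steps)
    last = sum(1 for pos, count in steps if pos == count - 1) / len(steps)
    _, top = Counter(tuple(path) for path in paths).most_common(1)[0]
    return {"first_item": first, "last_item": last, "repeated_path": top / len(paths)}
```

Values close to 1.0 suggest a session worth a closer look; what counts as “close” is our call.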

If we find a session with a lot of this kind of garbage, we delete it, and if we are doing a prize draw for this study, we remove that participant from the draw. This is not behavior we want to encourage.

Note

Some tools let us “exclude” sessions from the analysis. This is what we think of as a “soft” deletion; the session is removed from the analysis, but it’s not actually deleted, so we can get it back later if we change our minds. If our tool offers exclusions, we recommend using them to clean up the data instead of actually deleting the data outright.
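To illustrate the difference between excluding and deleting, here is a minimal sketch of a hypothetical results container that never removes a session, only marks it as excluded. The class and method names are our own, not any tool's API:

```python
from dataclasses import dataclass, field


@dataclass
class StudyResults:
    """Hypothetical container: every session is kept; exclusion only hides it from analysis."""

    sessions: dict[str, dict]
    excluded: set[str] = field(default_factory=set)

    def exclude(self, session_id: str) -> None:
        self.excluded.add(session_id)      # "soft" delete: the data stays in self.sessions

    def restore(self, session_id: str) -> None:
        self.excluded.discard(session_id)  # we changed our minds, so bring the session back

    def for_analysis(self) -> dict[str, dict]:
        """Only the sessions that are not currently excluded."""
        return {sid: s for sid, s in self.sessions.items() if sid not in self.excluded}
```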

Skipping too many tasks

Another way that some participants speed through a test is by skipping tasks.

...

For the remaining few, we review their click paths (like we did above for those who went too fast), look for the same indicators, and delete/exclude those that look guilty.
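A sketch of the skip check, assuming each session's answers are exported as one entry per task, with `None` standing for a skipped task. The 50% cut-off and the function name are assumptions for illustration:

```python
from typing import Optional


def flag_skippers(answers: dict[str, list[Optional[str]]], max_skip_share: float = 0.5) -> list[str]:
    """Flag sessions that skipped more than max_skip_share of their tasks.

    answers[session_id] holds one entry per task: the chosen destination, or None if skipped.
    """
    flagged = []
    for sid, per_task in answers.items():
        if not per_task:
            continue
        skipped = sum(1 for a in per_task if a is None)
        if skipped / len(per_task) > max_skip_share:
            flagged.append(sid)
    return sorted(flagged)
```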

Being wrong way too often

The final check we do is for participants who got every task wrong (or something close to that). This is a clue that they may not have made an honest effort.

...

Again, we review their click paths as described above, look for the same indicators, and mete out our justice accordingly.
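A sketch of this check, assuming we have a success/failure list for each session's attempted tasks (skipped tasks left out). The 10% cut-off and the function name are our own assumptions; flagged sessions still go through the click-path review rather than being removed automatically:

```python
def flag_mostly_wrong(outcomes: dict[str, list[bool]], max_success: float = 0.1) -> list[str]:
    """Flag sessions whose success rate (share of True outcomes) is at or below max_success."""
    return sorted(
        sid
        for sid, results in outcomes.items()
        if results and sum(results) / len(results) <= max_success
    )
```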

Updating correct answers

One more way that we “clean up” the data is changing the correct answers for our tasks.

...

When we find pages that seem like very reasonable places to go for a given task, we need to make sure those pages actually help users who are performing that task in real life. We can either:

  • Include content on that page that satisfies the given task, or

  • Include a prominent cross-link to a page that does satisfy the task.
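To spot correct answers we may have missed, it helps to list, for each task, the “wrong” destinations that a sizable share of participants chose. A minimal sketch, assuming we have every participant's final destination for one task; the 10% default threshold and the function name are illustrative assumptions, and each candidate still needs a human decision:

```python
from collections import Counter


def candidate_answers(
    destinations: list[str], correct: set[str], min_share: float = 0.1
) -> list[tuple[str, float]]:
    """For one task, list 'wrong' destinations chosen by at least min_share of participants, most popular first."""
    total = len(destinations)
    if total == 0:
        return []
    wrong = Counter(d for d in destinations if d not in correct)
    return [(d, n / total) for d, n in wrong.most_common() if n / total >= min_share]
```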

Recalculating the results

If we do alter the results (either by excluding/deleting sessions, or by adjusting correct answers), we need to make sure the scores are recalculated accordingly.
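A minimal sketch of recalculating an overall score after clean-up, assuming the score is the percentage of attempted tasks (across all included sessions) that ended on a correct answer. Tools define their scores in different ways, so this is an illustration rather than any particular tool's formula; all names are hypothetical:

```python
from typing import Iterable


def overall_score(
    answers: dict[str, dict[str, str]],
    correct: dict[str, set[str]],
    excluded: Iterable[str] = (),
) -> float:
    """Percentage of attempted tasks that ended on a correct answer, ignoring excluded sessions.

    answers[session_id][task_id] is the destination the participant chose (skipped tasks are absent).
    correct[task_id] is the set of destinations we accept as correct for that task.
    """
    skip = set(excluded)
    hits = attempts = 0
    for session_id, by_task in answers.items():
        if session_id in skip:
            continue  # excluded ("soft-deleted") sessions don't count
        for task, destination in by_task.items():
            attempts += 1
            if destination in correct.get(task, set()):
                hits += 1
    return 100 * hits / attempts if attempts else 0.0
```

Adding a missed correct answer is then just a matter of adding it to that task's set in `correct` and running the calculation again.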

...

For the power-company study we described above, when we recalculated the scores after adding some missed correct answers, our results changed substantially (although they were still lower than we would have liked):

                    Existing tree   New tree 1   New tree 2
Uncorrected score   36%             40%          35%
Corrected score     46%             43%          47%

...

Next: Sharing the data