...

The great thing about online testing tools is that not only are the results compiled and partially analyzed for us, they are usually available instantly, even while the test is still “live”.

...

For example, we ran three tree tests for an electricity company. We expected a low score for the existing site tree, but we also got low scores for the new trees we had designed:


                     Existing tree   New tree 1   New tree 2
Uncorrected score    36%             40%          35%


While these scores were lower than we expected across the board, we told ourselves not to panic. From experience, we knew that the original data needs cleaning up before it accurately shows what happened.

...

  • Participants may have misinterpreted some of our tasks, leading to a lot of wrong answers.
    There’s not much we can do about this once the results start coming in, other than to take those tasks’ results with a grain of salt, and to fix those tasks in later tests.

  • Some participants may have given garbage answers, for a variety of reasons.
    We can remove some or all of these after the fact.

  • It’s likely that participants discovered some correct answers that we missed.
    We can fix these after the fact too.


...

For online tools, the most common way to preserve our data is to download it as a spreadsheet file (XLS, CSV, etc.).

Download the entire data set and save it in a secure place. Explicitly name it as the original data, then never touch that file again. It’s not for later analysis – it’s simply a backup of the raw data in case Murphy’s Law strikes and we need something to go back to.

This is also useful if the tool we’re using goes offline, either temporarily or (eventually) permanently. In the unlikely case that we need to go back and look at the raw numbers a year or two from now, we’ll want to have our own offline copy of the data.
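As a minimal sketch of the "name it as the original, then never touch it" habit, the snippet below copies a downloaded export into an archive folder, marks the copy in its filename, and makes it read-only. The function name `archive_original`, the `_ORIGINAL` suffix, and the folder layout are our own assumptions for illustration; they are not features of any particular testing tool.

```python
import shutil
from pathlib import Path


def archive_original(export_path, archive_dir):
    """Copy a downloaded export into archive_dir as a read-only 'original' file.

    Hypothetical helper: the naming scheme and folder are assumptions.
    """
    src = Path(export_path)
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # e.g. results.csv -> results_ORIGINAL.csv
    dest = dest_dir / f"{src.stem}_ORIGINAL{src.suffix}"

    # Never overwrite an earlier backup of the raw data.
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; refusing to overwrite it")

    shutil.copy2(src, dest)  # copy contents and timestamps
    dest.chmod(0o444)        # read-only, to discourage accidental edits
    return dest
```

Calling `archive_original("results.csv", "backups")` would leave `backups/results_ORIGINAL.csv` in place; any later analysis should then work from a separate copy.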

...

This is especially true when the study is open to the public and there’s an incentive involved. We get more respondents, but some of them are just there for the prize, and will zoom through the study as quickly as possible to get to the pay-off.

...

For the remaining sessions (those approaching “half time”, or whatever threshold we’re comfortable with), we review the data. This means looking at the click paths of each task for that participant, to see if there are any clear indications they were intentionally speeding through our tests. The most common indications are:

  • Choosing the same item at each level (often the first or last item)

  • Going down the same path for every task

  • Choosing nonsense paths for every task
    Be careful with this one, because what we consider a “nonsense path” might have made sense to them. Only suspect those who do this for a large number of tasks.

If we find a session with a lot of this kind of garbage, we delete it, and if we are doing a prize draw for this study, we remove that participant from the draw. This is not a behavior to encourage.
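If the tool's export includes click paths, a rough first pass over the first two indications can be scripted before we review sessions by hand. The sketch below is only an illustration under our own assumptions: each click is recorded as a hypothetical `(index, option_count)` pair (the 0-based position chosen and how many items were on offer at that level), and the function names are invented. Judging “nonsense paths” is deliberately left to manual review.

```python
from collections import Counter


def always_first_or_last(path):
    """True if every click chose the first item, or every click chose the last."""
    if not path:
        return False
    all_first = all(index == 0 for index, _ in path)
    all_last = all(index == count - 1 for index, count in path)
    return all_first or all_last


def suspicious_share(session):
    """Fraction of a session's tasks that look positional or repeat one path.

    session: one click path per task; each path is a list of
    (index, option_count) tuples (an assumed export format).
    """
    if not session:
        return 0.0
    paths = [tuple(path) for path in session]
    top_path, repeats = Counter(paths).most_common(1)[0]
    flagged = sum(
        1 for path in paths
        if always_first_or_last(path) or (repeats > 1 and path == top_path)
    )
    return flagged / len(paths)


# Invented example sessions, not data from a real study.
speeder = [[(0, 5), (0, 4)], [(0, 5), (0, 4)], [(0, 5), (0, 3), (0, 2)]]
careful = [[(2, 5), (1, 4)], [(0, 5), (3, 4)], [(4, 5), (0, 3)]]

print(suspicious_share(speeder))  # 1.0
print(suspicious_share(careful))  # 0.0
```

A high share is only a prompt to look at the session more closely; a participant can legitimately pick the first item several times if that is where the answers are.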

 

 

Note

Some tools let us “exclude” sessions from the analysis. This is what we think of as a “soft” deletion; the session is removed from the analysis, but it’s not actually deleted, so we can get it back later if we change our minds. If our tool offers exclusions, we recommend using them to clean up the data instead of actually deleting the data outright.

 

...

When we find additional correct answers (and this is surprisingly common), we need to go back and mark those new answers as correct, then make sure that the tool recalculates the results accordingly.

...

Recalculating the results

If we do alter the results (either by excluding/deleting sessions, or by adjusting correct answers), we need to make sure the scores are recalculated accordingly.

Depending on the tool we use, this may be done automatically, or we may need to trigger it manually.

And we should remember to download another local copy of the revised results, for safekeeping.
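To sanity-check a tool's recalculation (or to do it ourselves from a downloaded copy), the idea can be sketched in a few lines. This is an illustration only: the `(session_id, task_id, destination)` response format, the function name `success_rate`, and all the sample data are assumptions we made up, and real tools may define their scores differently.

```python
def success_rate(responses, correct, excluded=frozenset()):
    """Share of kept responses whose destination counts as correct.

    responses: list of (session_id, task_id, destination) tuples (assumed format)
    correct:   dict mapping task_id -> set of correct destinations
    excluded:  session ids "soft deleted" from the analysis
    """
    kept = [r for r in responses if r[0] not in excluded]
    if not kept:
        return 0.0
    hits = sum(1 for _, task, dest in kept if dest in correct.get(task, set()))
    return hits / len(kept)


# Invented responses for a single task.
responses = [
    ("s1", "t1", "Billing/Pay bill"),
    ("s2", "t1", "Account/Payments"),
    ("s3", "t1", "Billing/Pay bill"),
    ("s4", "t1", "Help/Contact"),
]
correct = {"t1": {"Billing/Pay bill"}}

print(success_rate(responses, correct))                   # 0.5

# Mark an answer we missed as correct, then recalculate.
correct["t1"].add("Account/Payments")
print(success_rate(responses, correct))                   # 0.75

# Exclude a garbage session instead of deleting it, then recalculate.
print(success_rate(responses, correct, excluded={"s4"}))  # 1.0
```

Keeping exclusions as a separate set, rather than deleting rows, mirrors the “soft” deletion described in the note above: the underlying responses stay intact.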

For the power-company study we described above, when we recalculated the scores after adding some missed correct answers, our results changed substantially (although they were still lower than we would have liked):

 

                     Existing tree   New tree 1   New tree 2
Uncorrected score    36%             40%          35%
Corrected score      46%             43%          47%
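To make the size of the correction explicit, the change for each tree can be worked out in percentage points. The snippet uses only the scores from the table above; the variable names are our own.

```python
# Scores (in percent) from the table above.
uncorrected = {"Existing tree": 36, "New tree 1": 40, "New tree 2": 35}
corrected = {"Existing tree": 46, "New tree 1": 43, "New tree 2": 47}

# Change in percentage points after adding the missed correct answers.
change = {tree: corrected[tree] - uncorrected[tree] for tree in uncorrected}

print(change)  # {'Existing tree': 10, 'New tree 1': 3, 'New tree 2': 12}
```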

 

...

Next: Sharing the data