Success rate

Not surprisingly, the most important thing to look at is the success rate – how many participants chose the correct answer, across all tasks?

Most tools will give you this as a rating out of 10 or 100. For example, a score of 45 means that 45% of the time, participants chose a correct answer.
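As a rough illustration of the arithmetic, the sketch below computes an overall success rate from a flat list of task attempts. The record layout (`TaskAttempt`, `correct`) is hypothetical and does not reflect any particular tool's export format.

```python
from dataclasses import dataclass


@dataclass
class TaskAttempt:
    """One participant's attempt at one task (hypothetical record layout)."""
    participant: str
    task: str
    correct: bool  # True if the participant chose a correct answer


def success_rate(attempts: list[TaskAttempt]) -> float:
    """Return the percentage (0-100) of attempts that ended on a correct answer."""
    if not attempts:
        raise ValueError("no attempts to score")
    correct = sum(1 for a in attempts if a.correct)
    return 100 * correct / len(attempts)


attempts = [
    TaskAttempt("p1", "Find opening hours", True),
    TaskAttempt("p1", "Find refund policy", False),
    TaskAttempt("p2", "Find opening hours", True),
    TaskAttempt("p2", "Find refund policy", True),
]
print(success_rate(attempts))  # 75.0
```

Here three of the four attempts were correct, so the tree scores 75.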

Once you see a tree’s overall success rate, the natural question is “Well, is that good, bad, or just average?”

As any consultant will tell you, it depends. Mainly, it depends on two things:

Size of the tree – All other things being equal, it’s harder to find things in a larger tree (or haystack, as the saying goes).
Complexity for the intended user – If the topics and subtopics in the tree are challenging for participants to understand, they’re going to have a tougher time finding the right answers.

But we do need to start from somewhere. In our experience, over hundreds of tree tests, the following rough markers have emerged for trees of average size and complexity:

0-50 – The tree needs to be completely rethought or discarded.
Trying to tweak it will only bring it up to “mediocre”.
50-65 – The tree needs substantial revisions.
If your analysis reveals specific problems (and it should), and you think you can fix them, you should be able to revise this tree to perform well.
65+ – The tree is effective, but may need minor revisions.
Your participants are finding the correct answer at least two-thirds of the time, so the tree is doing its job well, and only needs tweaking.
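If you are reviewing many trees, the markers above can be applied mechanically. The following is a minimal sketch; the function name and labels are our own, and it assumes that a score of exactly 50 belongs to the middle band, since the ranges as stated overlap at that boundary.

```python
def rate_tree(score: float) -> str:
    """Map an overall score (0-100) to the rough markers for an average tree.

    Assumption: a score of exactly 50 falls in the middle band, because the
    stated ranges (0-50 and 50-65) overlap at that boundary.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 50:
        return "rethink or discard"
    if score < 65:
        return "substantial revisions"
    return "effective, minor revisions"


for score in (45, 58, 72):
    print(score, rate_tree(score))
```

Remember that these thresholds are only rough markers for trees of average size and complexity; a very large or specialized tree may deserve more lenient cut-offs.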

A high score doesn’t mean “no revisions needed”. We’ve never run a tree test where everything worked so well that we couldn’t improve it a bit more. There are always a few lower-scoring tasks that suggest further improvements.

Directness (backtracking)

To get a general idea of the effectiveness of your tree, it also helps to look at how directly your participants found the right answer. Did they go straight there, or did they have to try a few different paths first?


How this is scored depends on the tool you’re using:

Some tools treat directness as a simple yes/no measure – did the participant backtrack at all during a task? This method doesn’t care if they backtracked 1 time or 5 times during the task – it’s either yes or no.

In our experience, 70% is an average score for this method. Less than that indicates that users are having trouble finding the right path.
Some tools try to quantify how much wandering a participant did. A single back step lowers their score a bit, but repeated meanderings through the tree lower it much more.
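The simple yes/no method can be sketched as follows. The input (a backtrack count per attempt) and the function name are assumptions for illustration; the graded method is not shown because its formula varies by tool.

```python
def directness_rate(backtracks_per_attempt: list[int]) -> float:
    """Percentage (0-100) of attempts completed with no backtracking at all."""
    if not backtracks_per_attempt:
        raise ValueError("no attempts to score")
    direct = sum(1 for n in backtracks_per_attempt if n == 0)
    return 100 * direct / len(backtracks_per_attempt)


# Backtrack counts for ten attempts. One back step and five back steps
# are treated the same: both simply count as "not direct".
counts = [0, 0, 1, 0, 5, 0, 0, 2, 0, 0]
print(directness_rate(counts))  # 70.0
```

Seven of the ten attempts involved no backtracking, which gives a score of 70.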


While the overall directness score gives you a rough idea of how clear and distinguishable your headings are, you’ll need to drill down to specific tasks to determine where the most backtracking happens. For more on this, see “Where people backtracked” later in this chapter.

Speed (time taken)

Most tree-testing tools show you the average (or median) time taken by your participants to complete the tree test.


Comparing times between trees

If you’re testing several trees against each other, and the trees are approximately the same size (in breadth and depth), you can compare these overall times to see if some trees are “slower” than others. A slower tree suggests that participants had to:

think a bit longer between clicks, and/or
click more times to get to their answer

This is a very rough measure, however, and to make sense of it, you’ll need to drill down to see which tasks (or specific areas of the tree) are slowing down your participants. For more on this, see “Where people slowed down” below.

Keeping your study brief

A more practical use for the average time taken is making sure that your tree test is not taking too much of your participants’ time.

In general, we recommend an overall duration of 5 minutes for a tree test. This is typically how long it takes the average participant to do 8-10 tasks (our recommended number) for a medium-size tree (200-500 items).

If you have a larger tree, your test time may exceed this, but we still recommend that you keep it under 10 minutes to avoid participant fatigue and boredom.

If your average duration is longer than this because you are asking each participant to do a lot of tasks (say, 12 or more), you are likewise inviting participant fatigue and boredom. More importantly, your results may be skewed by the “learning effect” – see “How many tasks?” in Chapter 7.
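A quick check against these guidelines might look like the sketch below. It assumes you can export each participant's total time in seconds, and uses the median so that a few very slow participants do not distort the picture; the function name and return labels are our own.

```python
from statistics import median


def check_duration(durations_seconds: list[float],
                   target_minutes: float = 5,
                   limit_minutes: float = 10) -> str:
    """Compare the median study duration with the 5- and 10-minute guidelines."""
    typical = median(durations_seconds) / 60  # convert seconds to minutes
    if typical > limit_minutes:
        return "too long"
    if typical > target_minutes:
        return "over target"
    return "within target"


# Total time per participant, in seconds (made-up sample values).
durations = [240, 300, 280, 410, 320]
print(check_duration(durations))
```

A result of "over target" is not necessarily a problem for a larger tree, but "too long" suggests cutting tasks or splitting the study.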

A “total” score

Some tools present a single overall score that combines several measures, such as success rate, directness, and speed. This overall score typically uses some kind of weighting, with success rate usually being the biggest factor.

This is useful when testing trees, because it makes us consider more than just the success rate itself. If people can find items in our tree, but they have to do a lot of backtracking, or they have to ponder each click, there’s something wrong and the score should reflect that.

Note that the various online tools differ in how they calculate their overall score, making it harder to compare scores between tools:

Treejack calculates its overall score as a weighted average of success rate and directness (at a 3:1 ratio), but does not include speed in its calculations.
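To make the weighting concrete, here is a minimal sketch of a 3:1 weighted average. It is our reading of the description above (weights of 3 and 1, divided by their sum), not a formula published by the tool, and the function and parameter names are hypothetical.

```python
def overall_score(success: float, directness: float,
                  success_weight: float = 3,
                  directness_weight: float = 1) -> float:
    """Weighted average of success rate and directness (both on a 0-100 scale).

    Assumption: a "3:1 ratio" means weights of 3 and 1, normalized by their sum.
    """
    total_weight = success_weight + directness_weight
    return (success_weight * success + directness_weight * directness) / total_weight


print(overall_score(60, 80))  # 65.0
```

With a success rate of 60 and a directness of 80, the overall score is (3 × 60 + 1 × 80) / 4 = 65, which shows how strongly the success rate dominates the result.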
