Once we have a score for a task, we naturally want to know why it scored that way:

For high-scoring tasks, we want to know which parts of the tree were particularly effective in steering participants to the correct answers.
For low-scoring tasks, we want to find out where participants went off the rails – which parts of the tree confused them or lured them down the wrong path.

A typical success

Let’s start with an example of a high-scoring task, because they are usually easier to analyze.

This one comes from InternetNZ, a non-profit Internet advocacy group in New Zealand. When they redesigned their website, they ran several tree tests to check the effectiveness of the new structure. Here’s a task that performed fairly well:

As we can see from the task summary, almost three quarters of our users found a correct answer, with very little backtracking. That’s good to know, but it’s not enough – we need to see where they went, both for the correct and the incorrect cases.

Most tools give us a way to examine the click paths for a given task. We used Treejack for this study, which offers a “pie tree” diagram showing where participants went:

For this example, let’s concentrate on the paths, not so much the pies themselves:

The green paths are the right ones.
The gray paths are the wrong ones.
The thickness of the line shows how many people went that way.

The first thing we notice is how sparse the graph is – it only shows paths that participants actually took, and they didn’t take many different ones for this task.

When we run a tree test, this is what we want to see – a small number of paths taken, lots of traffic on the correct paths, and everyone clear on what everything means. The participants knew where they were going, and agreed on the same answers.
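For readers working from exported click paths rather than a tool's diagram, the idea behind the path graph can be sketched in a few lines of Python. This is only a minimal sketch, not how Treejack builds its diagram: the `paths` data and the `edge_traffic` function are hypothetical, and the numbers are invented for illustration.

```python
from collections import Counter

# Hypothetical click paths: the headings one participant clicked, in order,
# starting from the home page. Invented data, not the InternetNZ results.
paths = [
    ["Home", "About Us", "Principles"],
    ["Home", "About Us", "Principles"],
    ["Home", "Policies", "Principles"],
    ["Home", "About Us", "Our Mission"],
]


def edge_traffic(paths):
    """Count how many participants traversed each parent -> child link."""
    counts = Counter()
    for path in paths:
        # A set means a participant who revisits a link is counted once.
        links = set(zip(path, path[1:]))
        counts.update(links)
    return counts


traffic = edge_traffic(paths)

# Links with higher counts correspond to thicker lines in the diagram.
for (parent, child), n in traffic.most_common():
    print(f"{parent} -> {child}: {n}")
```

A short list of links with most of the traffic concentrated on a few of them is the numeric equivalent of a sparse graph with thick correct paths.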

Note that there are two correct paths here. In the original tree, principles were only in the Policies section, but we saw most people go to About Us. So we put the Principles topic in both places. It would live in one place, and be explicitly cross-linked from the other. The same thing happened this time – most participants went to the About Us section, but this time there was a correct answer waiting for them.

Even the wrong answer is not so wrong here. We could feel pretty confident that users who went to an Our Mission page would get a partial answer for (and probably a link to) the organization’s principles.

Notice also that there was very little “leakage” at the top level (the first click from the home page). Only 1 participant out of 63 made a wrong turn at the start. We’ll discuss the importance of first clicks in What they clicked first later in this chapter.
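If we have the raw click paths, top-level leakage is easy to compute ourselves. The sketch below is an illustration under stated assumptions: `first_click_leakage` is a hypothetical helper, and while the totals mirror the 1-in-63 figure above, the split between About Us and Policies and the name of the wrong heading are invented.

```python
def first_click_leakage(paths, correct_first_clicks):
    """Return (wrong_first_clicks, participants_who_clicked)."""
    # path[0] is the home page, so path[1] is the first click.
    first_clicks = [path[1] for path in paths if len(path) > 1]
    wrong = sum(1 for click in first_clicks if click not in correct_first_clicks)
    return wrong, len(first_clicks)


# Hypothetical data: 63 participants, one of whom starts in the wrong section.
# The 40/22 split and the "News" heading are made up for this example.
paths = (
    [["Home", "About Us", "Principles"]] * 40
    + [["Home", "Policies", "Principles"]] * 22
    + [["Home", "News"]]
)

CORRECT_FIRST_CLICKS = {"About Us", "Policies"}

wrong, total = first_click_leakage(paths, CORRECT_FIRST_CLICKS)
print(f"{wrong} of {total} participants made a wrong first click "
      f"({wrong / total:.1%})")
```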

A typical failure

But tree tests are not all sunshine and lollipops. Some of our tasks will probably look like this one, again from InternetNZ:

The correct answer is under Policies, in a section called Jurisdiction, but 87% of our participants failed to find it. Only 3% gave up, so where did all the others go?

The answer is that they went everywhere. Graphically, this is what “everywhere” looks like:

Quite the mess. But what can we learn from it?

If we look at the thickest paths first, we can see that most people did go to the Policies section (correct), and then mostly to the two correct areas (Projects by name, Projects by topic), but then they scattered. In those sections, they couldn’t agree which subtopic was the right one.

For example, let’s look at the Projects by name section more closely:

A large chunk of participants came here, and the correct answer (Jurisdiction) was waiting for them, but most didn’t choose it (partly because it’s more abstract than the other topics).

Not only did they not find the right answer, they couldn’t agree on where to go instead. The headings were not clear themselves, and they were hard to distinguish from each other.

The graph makes another problem obvious too – the “leakage” at the top level of the tree:

Right at the start, as their first click, a large fraction of the participants ran off in all directions. This is not what we intend when we design a site structure. It suggests that either:

Our top-level headings are not clear and distinguishable (which we should be able to confirm by looking for similar results in other tasks), or
The task itself was not clear (probable if the top-level headings performed well in the other tasks)
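The reasoning above, comparing top-level leakage across tasks to decide whether the headings or the task is at fault, can be sketched as a rough check. Everything here is an assumption for illustration: the `diagnose` helper, the 25% threshold, and the per-task numbers are hypothetical, and the output is a prompt for further investigation rather than a verdict.

```python
def leaky_tasks(first_click_results, threshold=0.25):
    """Return tasks whose share of wrong first clicks exceeds the threshold.

    first_click_results maps a task name to (wrong_first_clicks, participants).
    The threshold is an arbitrary cut-off chosen for this example.
    """
    return sorted(
        task
        for task, (wrong, total) in first_click_results.items()
        if total and wrong / total > threshold
    )


def diagnose(first_click_results, threshold=0.25):
    """Suggest where to look first, based on how widespread the leakage is."""
    leaky = leaky_tasks(first_click_results, threshold)
    if len(leaky) > 1:
        return "top-level headings suspect"
    if len(leaky) == 1:
        return "task wording suspect"
    return "no notable leakage"


# Hypothetical results: only one task leaks badly at the top level.
results = {"Task 1": (2, 60), "Task 2": (25, 60), "Task 3": (3, 60)}

print(leaky_tasks(results))  # tasks above the threshold
print(diagnose(results))
```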

We can also see scattering when we view the results as a spreadsheet. The vertical cluster of cells shows that, for a single task, participants chose a wide variety of subtopics under the correct topic.
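One simple way to put a number on scattering is to count how many different answers participants chose under a topic, and how much of the traffic the most popular one attracted. This is a sketch under assumptions: `answer_spread` is a hypothetical helper, and apart from Jurisdiction the subtopic names and all the counts are invented.

```python
from collections import Counter


def answer_spread(answers):
    """Summarize how widely participants' chosen answers are spread."""
    counts = Counter(answers)
    total = sum(counts.values())
    top_answer, top_count = counts.most_common(1)[0]
    return {
        "distinct": len(counts),          # number of different answers chosen
        "top": top_answer,                # the most popular answer
        "top_share": top_count / total,   # its share of all answers
    }


# Hypothetical answers chosen under one topic (invented counts).
answers = (
    ["Jurisdiction"] * 3
    + ["Subtopic A"] * 5
    + ["Subtopic B"] * 4
    + ["Subtopic C"] * 4
    + ["Subtopic D"] * 4
)

spread = answer_spread(answers)
print(spread)
```

Many distinct answers with a low top share indicates that participants could not agree on where to go, which is the same pattern the vertical cluster of cells shows in the spreadsheet.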

Studying the click paths of a single task gives us insights into how the tree performed for that particular task. But it’s dangerous to draw conclusions based on only one scenario. What we need to do is look for supporting evidence from the other tasks in our study – see Finding patterns among tasks later in this chapter.

Discovering more correct answers

Participants select wrong answers all the time. And most of the time, they really are wrong. Occasionally, though, they’re right, because we missed a correct answer in our tree.

No matter how thoughtfully we construct our trees and tasks up front, and mark which answers are correct, once we launch the test and start seeing where participants go, we always seem to find more correct answers that they have “discovered” on their own.

Often, it’s clear that we missed an answer, so we should just fix it and recalculate our results – see Cleaning the data earlier in this chapter.

However, we do need to be careful about changing our correct answers based on incoming results, because adding correct answers will raise our scores. If we are analyzing a tree that we hope does well, it’s often a bit too easy to convince ourselves that our participants’ borderline answers should be marked as correct. For more on what we call "pandering to the task", see Revising trees in Chapter 14.

The best way to govern this is to agree on some consistent criteria for correctness ahead of time – see Identifying correct answers in Chapter 7.
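To see how much a single re-marked answer can move a score, here is a minimal sketch of the recalculation. The `success_rate` function, the page names, and the counts are all hypothetical; a real tool will do this recalculation for us when we change the correct answers.

```python
def success_rate(answers, correct):
    """Share of participants whose final answer is marked correct."""
    return sum(1 for answer in answers if answer in correct) / len(answers)


# Hypothetical final answers from ten participants (invented data).
answers = (
    ["Section A > Page 1"] * 5
    + ["Section B > Page 2"] * 3
    + ["Section C > Page 3"] * 2
)

# Score with the answers we marked correct before launching the test.
before = success_rate(answers, {"Section A > Page 1"})

# Score after deciding that a "discovered" answer is also correct.
after = success_rate(answers, {"Section A > Page 1", "Section B > Page 2"})

print(f"before: {before:.0%}, after: {after:.0%}")
```

The jump in this made-up example comes entirely from the re-marking, not from any change to the tree, which is why the criteria for correctness are worth settling before the results come in.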


Tree Testing for Websites
