Why run a tree test? As we saw in Chapter 2, there are two common motivations:

To baseline an existing tree, discovering where the problems are and establishing a base score.
To try out some new trees that you’ve come up with, looking for problems and comparing them with each other (and with the baseline tree, if any).

Baselining an existing tree

If you are testing an existing IA (e.g. the structure of a current website that you’re about to revise), you’re obviously interested in finding out which parts of the current structure work well and which don’t.

Most of the time, you will already have an idea of where some of the problems are. It might be from other usability testing you’ve done, from web analytics, from user feedback, from your own gut feelings, or (most commonly) from some mixture of all of these.

When testing an existing IA, then, you’re likely to be looking for:

How well the suspected problem areas perform, and
Which other (unsuspected) areas perform particularly well or poorly.

Testing revised trees

If you’re revising a site structure, you will generally be looking for:

How well the revised parts of the new structure perform, and
Which other areas perform particularly well or poorly, especially areas that may be indirectly affected by the revisions you made.

Trying out new trees

If you’re creating structures for a new website, you may not have much existing research to inform your IA work. In this case, the main value of tree testing is being able to evaluate one or more structures early in the design process, before the website exists even in beta form.

Comparing alternatives

Whether it’s architecture or brand design or vacuum cleaners, the best designers agree on one thing – generate lots of ideas early, then cheerfully discard the ones that don’t work out.

The same is true with site structures. Early in the design phase, you should think up several different ways of structuring your site. Yes, you will probably have a favorite, but your favorite may not be the best solution for your users. If you create some true alternatives and test them against each other, you’re more likely to produce a better structure.

So, whether you’re testing revised structures or new ones, a main goal of your testing should be to compare multiple candidate structures and determine which performs best.

In our experience, what often happens is that tree A performs best overall, but parts of tree B do better than their counterparts in A. The natural next step is to create a hybrid (tree C), which usually ends up testing better than either A or B. If you had only created tree A, how would you ever have got to C, the better structure?
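To make the A-versus-B comparison concrete, here is a minimal sketch of how we might spot the parts of tree B worth borrowing for a hybrid. Everything in it is an assumption for illustration: the task names, the success rates, the 10-point margin, and the helper names `overall` and `better_in_b` are all invented, and real tree-testing tools report richer measures than a single success rate per task.

```python
# Hypothetical per-task success rates (fraction of participants who
# ended up at a correct answer). These numbers are invented purely
# for illustration; they are not results from any real study.
tree_a = {
    "find opening hours": 0.82,
    "renew membership": 0.74,
    "report a fault": 0.45,
}
tree_b = {
    "find opening hours": 0.78,
    "renew membership": 0.50,
    "report a fault": 0.67,
}


def overall(scores):
    """Mean success rate across all tasks in one tree."""
    return sum(scores.values()) / len(scores)


def better_in_b(a, b, margin=0.10):
    """Tasks where tree B beats tree A by at least `margin`.

    The margin is an arbitrary threshold (an assumption), used here
    so that small differences are not treated as meaningful.
    """
    return sorted(
        task for task in a
        if task in b and b[task] - a[task] >= margin
    )


# Tree A wins overall, but B handles one task noticeably better.
hybrid_candidates = better_in_b(tree_a, tree_b)
print(f"A overall: {overall(tree_a):.2f}, B overall: {overall(tree_b):.2f}")
print("Borrow from B for:", hybrid_candidates)
```

The tasks this flags are the places to look at how tree B is organized, and to consider carrying that part of its structure into tree C.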

Testing groupings

For most designers, the main reason to run a tree test is to determine if their main grouping scheme works well.

For example, if we decide to go with a task-based scheme (e.g. installing a product, using it, getting support, uninstalling it, etc.), we want to know if our task-based headings help our users find the page they’re looking for.

If we’re testing several grouping schemes against each other (the recommended approach), we want to find out if one scheme is clearly better than the others. If several schemes work equally well, then we can choose based on other criteria (such as how much effort it will take to rework our content to fit a scheme).

We can also flip our schemes and test which variation works better. For example:

We could create our first tree to use audiences as our top-level headings (e.g. teachers, students, parents, etc.) and tasks as our second level (choosing a school, enrolling, choosing courses, etc.).
We could create a second tree that flips this scheme, so that our top level is tasks and our second level is audiences.
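If we keep our tree as structured data, flipping a scheme can be done mechanically rather than by hand. The sketch below assumes a tree stored as a two-level nested dictionary; the function name `flip_levels` and the page titles at the bottom level are hypothetical, chosen only to illustrate the idea.

```python
def flip_levels(tree):
    """Swap the top two levels of a nested dict.

    {top: {second: pages}} becomes {second: {top: pages}}.
    """
    flipped = {}
    for top, children in tree.items():
        for second, pages in children.items():
            flipped.setdefault(second, {})[top] = pages
    return flipped


# First tree: audiences at the top level, tasks at the second level.
# The page titles are invented placeholders.
audience_first = {
    "Teachers": {
        "Choosing courses": ["Course planning guide"],
    },
    "Students": {
        "Enrolling": ["How to enrol"],
        "Choosing courses": ["Course catalogue"],
    },
    "Parents": {
        "Choosing a school": ["School finder"],
        "Enrolling": ["Enrolment forms"],
    },
}

# Second tree: tasks at the top level, audiences at the second level.
task_first = flip_levels(audience_first)

for task, audiences in task_first.items():
    print(task, "->", list(audiences))
```

Flipping twice returns the original tree, which is a handy check that no content went missing along the way.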

For more on grouping schemes, see Chapter 5 - Creating trees.

Testing labeling

Another big reason to run a tree test is to test the terms we use. Will our users understand what “contingency planning” means? Will they be able to distinguish between “products” and “solutions”?

Labeling is a critical element of information architecture, and it’s often hard to get right. We must consider:

The terms we use internally (the organization’s jargon) vs. the terms that users understand and prefer. For example, we may say “Business Development”, but our customers call it “Sales”. Which term should we use on our website?
The conflicting terms that our various audiences use. For example, doctors may be looking for “deep-vein thrombosis”, but you and I would probably look for “blood clot”.
The many synonyms we can choose from. Languages are rich, and there are often several words that we could use. Which works best for our website visitors?

If we test alternative terms against each other, we’ll usually find that one works substantially better than the others. That’s a clear win.

However, we may also find that two terms work equally well, in which case we can decide based on other factors. For example, a consumer-review site in New Zealand considered renaming their “Electronics” section to “Technology”, and this became the subject of prolonged internal arguments about whether users would understand the new term properly. When they ran tree tests, they made sure to include tasks that targeted these alternative terms in their trees. The result was a 49/51 split; both terms worked well, so they could use either depending on other preferences.

Sharing and documenting issues and goals

When we tree test, there are several problems we typically want to fix, but yours may not completely overlap with mine. To run a good study, we need to be clear about what we're trying to find out, which means discussing it and writing it down.

To do this, we typically run a short workshop (1 to 1.5 hours) to:

make a list of the problems we're trying to solve, and rank them
make a list of our specific goals for this study (some of which will spring directly from our issues list).

We record these lists in a shared, public place (a project whiteboard, an online spreadsheet, etc.) so we can keep them handy when designing our tree tests:

[Image: the lists of issues and goals]
