A free comprehensive guide for evaluating site structures



While a tree test is running, it’s good practice to log in every day or two and see how things are coming along.

We all hope that our studies attract lots of participants, of each type we ask for, and that the results show how much better our new tree is than the old one.

While that does sometimes happen, more often we find that things are not quite going to plan. Here are the problems we encounter most often when we check on a study’s progress.


Not enough responses

Low participant numbers are the most common worry for any online study.

If we’re halfway through the test period and we only have a quarter of the participants we hoped for, we may need to work harder to find people.

  • Email: If we only sent an initial batch of email invitations, we send another batch.

  • Web ads: We re-check that our ad is very visible and presents a concise, attractive proposition. We also consider putting it on more web pages and (if possible) on more websites to get more views.

  • Social media: We often repeat our invitation on our social networks halfway through the testing period.

  • Incentive: If we suspect that our incentive is not big enough (and we’ve done everything else we can to boost our responses), we consider increasing it. If management reduced our planned incentive because they thought it was excessive, we may want to revisit that decision with them (with the data in hand).
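Before chasing more participants, it helps to put a number on "behind". The sketch below compares our count against a straight-line target. It assumes responses should arrive at an even rate across the test period, which they rarely do (they tend to spike after each email batch or social post), so treat the result as a rough signal. The function and its parameters are our own illustration, not part of any testing tool.

```python
from datetime import date

def pacing_ratio(responses_so_far: int, target_total: int,
                 start: date, end: date, today: date) -> float:
    """Compare responses so far with the number we'd expect by today
    if they arrived at an even rate across the whole test period.

    Returns 1.0 when we're exactly on pace, 0.5 when we have half
    the expected number, and so on.
    """
    total_days = (end - start).days
    if total_days <= 0:
        raise ValueError("end must be after start")
    # Clamp elapsed time to the test period.
    elapsed_days = min(max((today - start).days, 0), total_days)
    if elapsed_days == 0:
        return 1.0  # too early to judge
    expected = target_total * elapsed_days / total_days
    return responses_so_far / expected

# Halfway through a 14-day test, with 25 of a hoped-for 100 responses:
print(pacing_ratio(25, 100, date(2024, 3, 1), date(2024, 3, 15), date(2024, 3, 8)))  # 0.5
```

A ratio well under 1.0 at the halfway point, like the 0.5 here, is the "only a quarter of the participants we hoped for" situation described above.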

Missing user groups

When we target several user groups in a single tree test, sometimes we get lots of people from group A and B, but hardly any from group C. If we included a survey question that identified the participant’s user group, we can check that now to see if any groups are lagging and need more recruiting effort.
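If we can export participants’ answers to that survey question (the export format varies by tool), a short script shows which groups are behind a pro-rated target. This is a minimal sketch, assuming one answer string per participant and per-group targets we’ve set ourselves; the function name, group labels and numbers are hypothetical.

```python
from collections import Counter

def lagging_groups(answers: list[str], targets: dict[str, int],
                   fraction_of_period: float) -> dict[str, tuple[int, float]]:
    """Return groups whose count is below their pro-rated target.

    answers: one user-group answer per participant (from our survey question).
    targets: the number of participants we hope to get from each group.
    fraction_of_period: how far through the test period we are (0.0 to 1.0).
    """
    counts = Counter(answers)
    lagging = {}
    for group, target in targets.items():
        expected_by_now = target * fraction_of_period
        got = counts.get(group, 0)
        if got < expected_by_now:
            lagging[group] = (got, expected_by_now)
    return lagging

# Hypothetical answers halfway through the test period.
answers = ["A"] * 30 + ["B"] * 26 + ["C"] * 3
print(lagging_groups(answers, {"A": 50, "B": 50, "C": 50}, 0.5))
# {'C': (3, 25.0)}
```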

Obviously, the best way to boost a certain group’s numbers is to invite more of that group.

  • If we have group-specific email batches that we haven’t sent yet, that’s the easiest thing to do.

  • If we can place an ad on websites that this group frequents, that should also help boost our numbers.

  • If this group is likely to have some kind of organization that they belong to (a trade association, meet-up group, special-interest forum, etc.), we may want to approach the organization’s administrator and ask for help.

Unbalanced numbers between tests

Earlier we talked about splitting users randomly among tests. Usually this is an even split (e.g. two tests would each have a 50% chance of being selected), but sometimes we find that, halfway through the test period, test A has two thirds of the responses for some reason.

Whether we used code or a set of arbitrarily split links, we can change this partway through the test to even up the numbers.

  • If we used a set of links, we can change the split from something like "first name A-M, first name N-Z" to "first name A-E, first name F-Z" so that the first test now gets roughly 20% of the clicks, while the second test (the one that’s lagging) gets roughly 80%. (First-name initials aren’t evenly distributed, so a letter-based split is only ever approximate.)

  • If we used code, we can change it from a 50/50 split to 80/20 in favor of the under-supplied test.
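Where the split is done in code, rebalancing is usually a change to a single pair of weights. This is a minimal Python sketch, assuming the invitation page picks a test at random for each visitor and sends them to that test’s URL; the URLs and the `pick_test` function are placeholders (a real invitation page would more likely do this in JavaScript). It relies on the standard library’s `random.choices`, which accepts relative weights.

```python
import random
from typing import Optional

# Placeholder URLs for the two tree tests (hypothetical).
TESTS = ["https://example.com/tree-test-a", "https://example.com/tree-test-b"]

def pick_test(weights: list[float], rng: Optional[random.Random] = None) -> str:
    """Choose one test URL at random, in proportion to the given weights."""
    if rng is None:
        rng = random.Random()
    return rng.choices(TESTS, weights=weights, k=1)[0]

# At the start of the study we'd use an even split: pick_test([50, 50]).
# Test A has pulled ahead, so we now favor the lagging test B 80/20.
rebalanced = [20, 80]

rng = random.Random(42)  # fixed seed so the simulation below is repeatable
picks = [pick_test(rebalanced, rng) for _ in range(10_000)]
share_b = picks.count(TESTS[1]) / len(picks)
print(f"Share sent to test B: {share_b:.0%}")  # close to 80%
```

Because each visitor is assigned independently, the new weights only affect new arrivals; the two tests even up gradually, and we can switch back to 50/50 once they do.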


Low success rates at first

Besides the number of participants, the other big thing we’re sure to check is the scoring – how well our tree is performing overall, and how individual tasks are doing.

Very often, we’ll be surprised (and appalled) by how low the interim scores are. Some of the low scores will be justified – especially in a first-round test of a new tree, parts of that tree will simply not work well for participants. Testing is how we identify the parts that need rethinking.

However, we also find that interim scores are often lower than expected because:

  • Some tasks may be confusing or misleading.
    This is especially likely if we didn’t properly pilot our test. Some tasks are hard to phrase clearly without giving away the answer, but remember that a confusing task is a problem in the study, not necessarily a problem in the tree itself. We shouldn’t change the wording during the test, but we should revise the task for our next round of testing.

  • Some correct answers aren’t marked as “correct”.
    After doing hundreds of tree tests, we still run into this wrinkle all the time. When we set up each task, we try to mark all the correct answers for it. However, in a large tree, each task may have several correct answers, and it’s likely we’ll miss a few.
    Because of this, a good testing tool should let us (as test administrators) change which answers are correct for each task, either while the test is running or afterward when we’re doing the analysis. We often find that test scores go up substantially when we do this post-test correction. For more, see Cleaning the data in Chapter 12.
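The effect of post-test correction is easy to see with a recount. This is a minimal sketch, assuming we can export each participant’s final answer for a task as a path through the tree; the task, paths and counts are made up purely to illustrate the arithmetic.

```python
def success_rate(final_answers: list[str], correct: set[str]) -> float:
    """Fraction of participants whose final answer is one of the correct nodes."""
    if not final_answers:
        return 0.0
    hits = sum(1 for answer in final_answers if answer in correct)
    return hits / len(final_answers)

# Hypothetical results for one task ("Where would you find contact details?")
answers = (
    ["Home > About us > Contact"] * 12
    + ["Home > Support > Contact us"] * 6  # also correct, but we forgot to mark it
    + ["Home > Products"] * 2
)

correct = {"Home > About us > Contact"}
print(f"Before correction: {success_rate(answers, correct):.0%}")  # 60%

# Post-test correction: mark the missed answer as correct and re-score.
correct.add("Home > Support > Contact us")
print(f"After correction:  {success_rate(answers, correct):.0%}")  # 90%
```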

Very high task scores

Ideally, a high task score means that we did our job well when we created the tree.

Unfortunately, it can also mean that we included a “giveaway” word (and didn’t spot it during piloting). If we did, then this isn’t a fair measure of the real-world effectiveness of the tree.

Again, we shouldn’t edit the task’s wording while the test is running (unless we spot it very early); fix it in the next round.


High drop-out rates

We may find that lots of participants start our study, but many drop out before they finish it.

We’ll always have some drop-off (it’s the nature of online studies), but if it exceeds about 25%, we should investigate.

  • At the explanation page
    If our web ads or email invitations link to an explanation page, we can use web analytics to compare how many people visit that page with how many actually start the tree test itself. A large drop-off here indicates that the explanation page is confusing or hard to scan, or that the “start” link is not obvious. (Is it prominent and above the fold?)

  • During the test itself
    If a person makes it to the tree test itself, we try to find out where they drop out. (Unfortunately, most testing tools do a poor job of helping us in this regard.)

    If it’s during the welcome/instructions stage, they may be finding these pages confusing, too long, or simply not what they expected. We can check this by trying the test with a few people in person to see where the problem lies.

    If they drop out during the tasks, it could be caused by:

    • Having too many tasks (seeing “1 of 26” is daunting)

    • Presenting tasks that are confusing (“Forget this, it's just too hard”)

    • Being new to tree testing and unsure what to do (better instructions may help, but some people will leave no matter how well we explain it)
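Where we can get a count for each stage (web analytics for the explanation page, the testing tool for starts and completions), we can compute the drop-off at each step and flag the ones worth investigating. This is a minimal sketch with hypothetical stage names and counts; applying the rough 25% figure to each individual step, as well as to overall drop-out, is our own simplification.

```python
def drop_off_report(stages: list[tuple[str, int]], threshold: float = 0.25) -> list[str]:
    """Describe the drop-off between each consecutive pair of stages.

    stages: (stage name, number of people who reached it), in funnel order.
    threshold: drop-off fraction above which we flag a step to investigate.
    """
    lines = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        # Fraction of the previous stage's people who didn't reach this one.
        drop = 1 - count / prev_count if prev_count else 0.0
        flag = "  <-- investigate" if drop > threshold else ""
        lines.append(f"{prev_name} -> {name}: {drop:.0%} dropped{flag}")
    return lines

# Hypothetical counts: page views from web analytics, the rest from the testing tool.
funnel = [
    ("Viewed explanation page", 400),
    ("Started tree test", 240),
    ("Finished instructions", 228),
    ("Completed all tasks", 190),
]
for line in drop_off_report(funnel):
    print(line)
# Viewed explanation page -> Started tree test: 40% dropped  <-- investigate
# Started tree test -> Finished instructions: 5% dropped
# Finished instructions -> Completed all tasks: 17% dropped

# Overall drop-out among people who actually started the test.
started, completed = 240, 190
print(f"Overall drop-out after starting: {1 - completed / started:.0%}")  # 21%
```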


Next: Keeping stakeholders informed

