Does Geography Matter?

Today I’ve been writing for the new edition of Handbook of Usability Testing about setting up a test environment. Should you be in the lab or in the field? If you’re in the lab, what should the setup be like and why? These seemed like fairly easy questions to answer. But then I got to a question that I’ve been wondering about myself for years: Does geography matter?

Nielsen says it doesn’t

Jakob Nielsen’s April 30, 2007 Alertbox (http://www.useit.com/alertbox/user-test-locations.html) says that geography doesn’t matter (unless there are international considerations or a single industry dominates the location or a couple of other things). “You get the same insights regardless of where you conduct user testing, so there’s no reason to test in multiple cities. When a city is dominated by your own industry, however, you should definitely test elsewhere.”

I sent my question around to several usability testing experts. Jared Spool sent one of the most interesting responses, and nearly everyone had experience indicating that geography does matter.

Spool, Killam, and James say it does matter

“Remember,” Jared Spool says, “if you know everything [emphasis mine] there is to know about your users, their tasks, and their contexts, then you never need to test in the first place — all you need to do is be really smart and create a simple design. At that point, it boils down to a simple matter of programming.”

Bill Killam, of User-Centered Design, put it this way:

Performance and subjective preference and motivation are all linked, so any change in location that affects one or more of these can be a factor across all of them. But we usually find it appears only in subjective data – not as much in behavioral observations. Even local variations like testing within the client’s office versus a “neutral lab” sometimes have noticeable effects on things like projected responding. However, also consider regional differences in the use or exposure to the product being tested. That will certainly affect results. Not to use too specific an example, but consider testing voting machines in the DC area versus a rural location. Or DC where paper and DREs [direct recording electronic voting machines] already exist versus NY where a full face ballot is used versus Oregon where all votes are by [mail].

Janice James contributed, “I’ve found that it IS important to test across multiple locations because I’ve found that the users do differ in terms of their experience level and exposure to product types, and technology, in general.”

Professor Spool and I continued the conversation by IM:

Dana: Okay, so it seems like your answer and Jakob’s article come from different assumptions. Jakob seems to assume that the field work is done. The team knows the context, etc. You seem to be saying that teams don’t always do the field work, first. By Nielsen’s parking meter example, the design team seems to have some background about the location.

Jared: Except teams always think they know everything.

Dana: I also think Jakob is assuming a fairly mature UX [user experience] group.

Jared: But, Jakob says, except for the few special cases discussed below, we’ve always identified the same usability findings, no matter where we tested. By now, we can clearly conclude that it’s a waste of money to do user testing in more than one city within a country. Good thing he wasn’t testing soda. Or pop. Or coke.

Dana: Yes, to your example, testing IA [information architecture] is a REALLY good reason to test in multiple locations. And the design team always will get some benefit from being on site – usually something that wasn’t predictable.

Dana: And with the audience for this book, I think it’s safe to assume that they won’t have done much (or any) field work before doing usability testing.

Jared: Right.

Jared: Testing in more than one locale is definitely a luxury.

Jared: I wouldn’t not test at all because you can’t get to more than one venue. Another approach is to make it work great for the local community and look to support and other feedback channels to hear if regional differences pop up. It’s the cross-your-fingers approach to design. It’s worked well through the centuries. Another approach is to look at other competitive/comparable designs for things that might be regional. If the designs have elements that seem different, is there a regional explanation?

Jared: Many design issues are just pure human behavior, independent of any cultural or regional issues.

Dana: I believe that.

Jared: Rolf [Molich] and Carolyn [Snyder] did a study where they tested people in two countries on the same sites. They found 80% of the problems were in common. They found regional biases. People in Europe didn’t understand the purpose of a gift registry (and found it to be quite vulgar). But, if you perfected the design for your local venue, you’d nail 80% of the problems found anywhere else, if you extrapolate their results. And that’s a pretty good hit rate for a small budget.

Dana: I agree.

Jared: My guess is that’s what Jakob was trying to say.

Dana: That’s possible.

Jared: It’s hard to say with his shield of impenetrable ego obscuring the real intent.

Dana: Do you mind if I clean up this thread and use it in a blog post?

Jared: Not at all.

Jared: You can even leave in the impenetrable ego comment.

Dana: Makes it more believable that it was a conversation with Jared Spool.

Jared: Remember, all elephants are tall and flat, except for the instances when they are long and skinny.

Dana: That’s right. Anyway, thanks for answering the email and for continuing the discussion. I appreciate it.

Jared: I’m saying his exceptions are the generalized case. And his generalized declaration is rarely executable.

Beware the Hawthorne Effect

In a clear and thoughtful article in the May 3, 2007 Journal of Usability Studies (JUS) put out by the Usability Professionals’ Association, Rich Macefield blasts the popular myths around the legendary Hawthorne effect. He goes on to explain very specifically how no interpretation of the Hawthorne effect applies to usability testing.

Popular myth – and Mayo’s (1933) original conclusion – says that human subjects in any kind of research will perform better just because they’re aware they’re being studied.

Several researchers have reviewed the original study that generated the finding, and they say that’s not what really happened. Parsons (1974) was the first to say that the improvement in performance of subjects in the original study was more likely due to feedback they got from the researchers about their performance and what they learned from getting that feedback.

Why it doesn’t apply to usability tests

Macefield convincingly demonstrates why the Hawthorne effect just doesn’t figure into well-designed and professionally executed usability tests:

  • The Hawthorne studies were longitudinal; most usability tests are not.
  • The Hawthorne subjects were experts; most participants in a usability test are novices at something, because what they are using is new.
  • The metrics used in the Hawthorne studies were different from those used in most usability tests.
  • The subjects in the Hawthorne studies had horrible, boring jobs, so they may have been motivated to perform better because of the attention they got from researchers; in usability tests, it’s possible that participants experience being included as an unwanted interruption or that they’re just doing the test to get paid.
  • The Hawthorne subjects may have thought that taking part in the study would improve their chances for raises or promotions; the days of usability test participants thinking that participating in studies might help them get jobs are probably over.

What about feedback and learning effects?

We want feedback to be part of a good user interface, don’t we? Yes. And we want people to learn from using an interface, don’t we? Again, yes. But, as Macefield says, let’s make sure that all the feedback and learning in a usability test comes from the UI and not from the researcher/moderator. Rather than giving hints, get to the cause of problems through qualitative data, such as the verbal protocol from participants’ thinking aloud, to see how they’re thinking about the problem.

Look at effects across tasks or functions

Macefield suggests that if you’re getting grief, add a control group to compare against and then look at performance across tasks. For example, you might expect the test group (using an “improved” UI) to be more efficient or effective than the control group on every element of the test. But it’s possible that the test group did better on one task while both groups had a similar level of problems on a different task. If this happens, it is unlikely that the moderator gave feedback or prompted learning to create the improved performance, because that effect should be global across tasks and groups.
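To make the across-tasks comparison concrete, here is a minimal sketch in Python. The task names, group labels, and outcomes are invented for illustration; it simply tabulates completion rates per task for a hypothetical control group and test group so you can see whether an improvement is local to one task rather than global.

```python
# Hypothetical session outcomes: True = completed the task, False = did not.
# Task names, group labels, and data are made up for illustration only.
results = {
    "control": {
        "find_store_hours": [True, False, True, True, False],
        "register_account": [True, False, False, True, False],
    },
    "test": {  # the "improved" UI
        "find_store_hours": [True, True, True, True, True],
        "register_account": [True, False, False, True, False],
    },
}

def completion_rate(outcomes):
    """Fraction of participants who completed the task."""
    return sum(outcomes) / len(outcomes)

for task in results["control"]:
    control = completion_rate(results["control"][task])
    test = completion_rate(results["test"][task])
    print(f"{task}: control {control:.0%}, test {test:.0%}")

# If the test group improves on one task while both groups struggle equally
# on another, moderator feedback or learning effects (which would lift
# performance across the board) are an unlikely explanation.
```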

Macefield closes the article with a couple of pages that could be a lesson out of Defense Against the Dark Arts, setting out very specific ways to argue against any assertion that your findings might be “contaminated.” But don’t just zoom to the end of the piece. The value of the article is in knowing the whole story.

Moderating tips and techniques

Getting the right information from the participant can be difficult. As the moderator, you must attend to many things besides what the participant is doing and saying. Focusing on a few specific behaviors of your own will help you have a better test.

Focus your attention on what’s happening now

  • Quickly build rapport with the participant
  • Listen attentively
  • Be open to what might happen in a session – be ready to learn from the participant

Tips for being a better moderator

Be the neutral observer – avoid priming or teaching. If you’re too close to the product or the domain, you may train participants without realizing it by using keywords in your task scenarios or materials.

Observe at the expense of collecting data, if you must. It is difficult to take notes and to watch the participant at the same time. If things are happening quickly or you find yourself missing things the participant is saying or doing, just stop taking notes. Instead, listen and spend time between sessions making notes about what happened. Go through your recordings later if you need to, or ask observers to share their notes.

Play dumb – don’t answer questions. If participants perceive that you are an expert on the product, they may ask you questions about it or look for your approval of their actions. Instead, let them know that you are learning too, and that you’ll note their questions but won’t always be able to answer them.

Flex the script and test plan. Even after you pilot your test, you may have to adjust on the fly when participants do unpredictable things. That’s okay. You’re learning important things that fit into your aggregate patterns of use.

Practice and get feedback. Ask co-workers and observers to give you feedback about how you conduct sessions and how you ask questions.

Your own self-awareness is your best tool for moderating test sessions successfully. Following these guidelines should help you get valid, reliable data from your participants, even if your attention is slightly divided.

Why create a test design?

I get a lot of clients who are in a hurry. They get to a point in their product cycle where they’re supposed to have done some usability activity before exiting the current development phase, and now they find they have to scramble to pull it together. How long can it take to arrange and execute a discount usability test, anyway?

Well, to do a usability test right, it does take a few steps. How much time those steps take depends on your situation. Every step in the process is useful.

The steps of a usability test
Jeff Rubin and I break the process of conducting a usability test into these steps:

  1. Develop a test plan
  2. Set up the testing environment and plan logistics
  3. Find and select participants
  4. Prepare test materials
  5. Conduct the sessions
  6. Debrief participants and observers
  7. Analyze data and observations
  8. Create findings and recommendations

Notice that “develop a test plan” and “prepare test materials” are different steps.

It might seem like a shortcut to go directly to scripting the test session without designing the test. But the test plan is a necessary step.
Test plan or test design?
There’s a planning aspect to this deliverable. Why are you testing? Where will you test? What are the basic characteristics of the participants? What’s the timing for the test? For the tasks? What other logistics are involved in making this particular test happen? Do you need bogus data to play with, user IDs, or other props?

To some of us, a test design would be about experiment design. Will you test a hypothesis or is this an exploratory test? What are your research questions? What task scenarios will get you to the answers? Will you compare anything? If so, is it between subjects or within subjects? Will the moderator sit in the testing room or not? What data will you collect and what are you measuring?

It all goes together.

Why not just script the session without writing a plan?
Having a plan that you’ve thought through is always useful. You can use the test plan to get buy-in from stakeholders, too. As a representation of what the study will be, it’s like reviewing the blueprints and renderings before you give the building contractor approval to start building.

With a test plan, you also have a tool for documenting requirements (a frozen test environment, anyone?) for the test and a set of unambiguous details that define the scope of the test. Here, in a test plan, you define the approach to the research questions. In a session script, you operationalize the research questions. Writing a test plan helps you know what you’re going to collect data about and what you’re going to report on, as well as what the general content of the report will be.
Writing a test plan (or design, or whatever you want to call it) will give you a framework for the test in which a session script will fit. All the other deliverables of a usability test stem from the test plan. If you don’t have a plan, you risk using inappropriate participants and getting unreliable data.

Should you record sessions on video/audio?

Since the beginning of time, the accepted practice among professional usability practitioners has been to record sessions on video. It is something that we tend to do automatically.

There aren’t many obstacles to recording sessions these days. It really only takes a web camera and some relatively inexpensive recording software on the testing PC. (Of course, this assumes that you’re testing software or web sites that run on desktop or laptop computers.)

Recording is inexpensive
The software is pretty easy to use and it doesn’t cause issues with response times or otherwise fool with the user’s experience of using the software or website you’re testing. You get nice, bright colors, picture-in-picture, and you can capture it all digitally. For example, there’s Morae, by TechSmith. (In the interest of full disclosure: I own a license, and I have upgraded to the new version). With Morae, you can capture all sorts of nerdy bits. It’s a good tool.

Even if you decide to use a regular video camera rather than a web cam, or multiple cameras, that technology is cheaper and more accessible all the time. Storage media also is very inexpensive.


But should you record sessions?

Karl Fast on Boxes and Arrows (from August 2002) has a whole treatise on recording usability test sessions: http://www.boxesandarrows.com/view/recording_screen_activity_during_usability_testing. He called it “crucial.” I say Not.

Know why you’re recording
You may want the video recordings for reviewing, or sharing with a research partner. You may want your boss to sit down and watch the recorded sessions as evidence. Most practitioners will say that they use video recordings as backup to notes. You could go back and review the recordings.

Most usability tests have fairly few participants. Say you’re doing a study with 5 to 8 participants. If your notes from so few sessions don’t help you analyze the data, you should work on making better data collection tools for yourself or make it a practice to write notes about what happened immediately following each session. Reviewing recordings is making work for yourself.

But do you actually review the recordings? Rarely. And do people who could not attend the sessions review the recordings later? Again, rarely.

Know how you’re storing recordings and control access to protect the privacy of participants
And let’s consider participant privacy and confidentiality. Digital recordings are easier than ever to manage and archive. However, the longer the recordings hang around your company, the more likely it is that they will a) get lost, b) fall into the wrong hands, or c) be misused in some way. A client once asked me if her company could review a tape of a participant because he was coming in for a job interview. I said absolutely not.

You ask participants to sign a recording waiver that sets out specific purposes of the recording. Someone has to make sure that the waiver is respected. That person is the usability specialist who recorded the session to begin with.

Manage recordings carefully
On the form that you ask study participants to sign giving their permission to record, you should also state in plain language:

  • How the recording will be used
  • Who will use the recording
  • How long you (or your company) will store the recording
  • How the recording will be destroyed

But get it approved by your legal department, of course.

There are some good reasons to record sessions on video. There are a lot of good reasons not to. Should you?

Keeping a rolling list of issues throughout a study

Design teams are often in a hurry to get results from usability studies. How do you support them while giving good data and ensuring that the final findings are valid?

One thing I do is to start a list of observations or issues after the first two or three participants. I go over this list with my observers and get them to help me expand or clarify each item. Then we agree on which participants we saw have that particular problem.

I continue adding to that list the numbers for each participant who had the issue and note any variations on each observation.

For example, in a study I’m working on this week, we noted on the first day of testing that

Participants talked about location, but scrolled past the map without interacting with it to get to the search results (the map may not look clickable)

I went back later and added the participant numbers for those who we observed doing this:

Participants talked about location, but scrolled past the map without interacting with it to get to the search results (the map may not look clickable) PP, P1, P3

Today, I’ll add more participant numbers. At the end of the study, we’ll have a quick summary of the major issues with a good idea of how many participants had each problem.

There are three things that are “rolling” about the list. First, you’re adding participant numbers for each of the issues as you go along. Second, you’re refining the descriptions of the issues as you learn more from each new participant. Third, you’re adding issues to the list as you see new things come up (or things you didn’t notice before, or that seemed like one-off problems).
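If it helps to see the mechanics, here is a minimal sketch in Python of keeping a rolling list. The issue wording and participant IDs are invented for illustration; each issue maps to the set of participants observed having it, and both the wording and the sets get updated as sessions go on.

```python
# Rolling issues list: issue description -> participants observed having it.
# Issue text and participant IDs below are invented for illustration.
issues = {}

def note_issue(description, participant):
    """Add a participant to an issue, creating the issue if it's new."""
    issues.setdefault(description, set()).add(participant)

# After the first day of sessions:
for p in ["PP", "P1", "P3"]:
    note_issue("Scrolled past the map without interacting with it", p)

# Later sessions add more participant numbers, or entirely new issues:
note_issue("Scrolled past the map without interacting with it", "P5")
note_issue("Did not notice the filter controls above the results", "P5")

# Quick summary for the design team at the end of the day:
for description, participants in issues.items():
    print(f"{description}: {', '.join(sorted(participants))}")
```

Refining an issue’s wording is just a matter of moving its set of participants to the new description.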

I will still go back and tally all of the official data that I collected during each session, so there may be slight differences between these debriefing notes and the final report, but I have found that the rolling issues list and the final reports usually match pretty closely.

Doing the rolling list keeps your observers engaged and informed, helps you cross-check your data later, and gives designers and developers something fairly reliable to work from right away.

When to ask participants to think out loud

I was taught that one of the most important aspects of moderating usability study sessions was to encourage participants to think out loud as they worked on tasks. While the technique is good and useful in many usability test situations, it isn’t always the best approach.

Get data about how and why people do things
This “verbal protocol,” as it is known, can be an extremely useful thing to have. If the participant is good at thinking aloud, you will hear how she is forming the task and how she is thinking about reaching her goal. You will hear why she is doing the things she is doing, and the words she uses to describe it all. You also will get verbal feedback about how the participant feels about what is happening because she may say that she’s frustrated, or annoyed, or even happy.

What the data means
Hearing how a participant forms a task tells you whether the designers and the user are thinking of (modeling) the task in the same way.

Hearing why a participant is taking a particular step tells you where your design does and does not support users’ goals.

Hearing the words gives you labels for navigation, links, and buttons. Your information architecture should match the participant’s vocabulary.

Hearing the emotion tells you how severe a problem may be.

These are all good things to know.
How to get a good think-aloud
Some people think aloud naturally, or at least will verbalize their questions and frustrations just because there’s someone else in the room (that would be you, the moderator). But most people need to be primed to do it, and some even need practice doing it.

In your introduction to the session, ask participants to tell you what’s going through their minds as they do tasks.

Consider incorporating a practice task that lasts for half a minute, just to get participants to try it out. Encourage them and quickly move on.

When you describe the task scenario to participants, remind them to think aloud.

During the task, when something seems to be frustrating, annoying or hindering — and the participant isn’t talking — ask her to tell you what she’s thinking.

Know that there’s more going on than you can hear

People filter automatically
Participants can’t tell you everything they’re thinking. And really, you don’t want that. Humans can process on a number of cognitive tracks at the same time. Most study participants will automatically be able to distinguish between what is related to the situation and what isn’t.

This is a test
They also may filter what they tell you beyond this basic distinction. For example, they want to do well. Although you tell participants you are not testing them, a participant might still feel tested, even if she’s just in competition with The Machine.

Participants may fear failure or embarrassment. In usability studies, people often persist at times when they would normally ask for help.

People tend to give positive feedback
Participants want to give you a good session. People are conditioned to say and do things for the approval of others. They want the moderator to approve of their performance.
Participants take responsibility for bad design
People who are novices at a task or are working with something outside their experience may excuse the design by taking responsibility for a design problem. For example, they may say they could do it now that they (have failed and) have done it once. Or they may say they just need more time to learn the site. This is especially common among older adults who are unsure of their computer skills or other relevant abilities.

When you might not want to use think-aloud
There are times when using think-aloud can conflate or dilute your data. There are other situations in which using think-aloud is just difficult, or won’t work for the type of participants you have in your study.

Time on task
If you want to measure how much time it takes people to complete a task because you are particularly concerned with efficiency, introducing think-aloud is probably a bad idea. Talking about what you’re thinking slows you down while you choose words to convey your ideas about what’s happening and why.

Audio feedback in a user interface
Some interfaces incorporate audio feedback to indicate statuses or modes. These auditory cues may be drowned out by the participant talking, so the participant may miss something important happening – or you might. Also, many blind people and people with severe vision impairments use screen readers to use software and web sites. If you’re tuned in, you can learn things by listening to the screen reader as it works. And, although most of the people with visual disabilities who use screen readers whom I have observed can listen and talk at the same time (just as sighted people can see or read and talk at the same time), as a sighted moderator, my auditory channel is challenged by listening to both the screen reader and the participant at once.

You’re interrupting a taxed thought process
People who have short-term memory loss, are medicated, or have other cognitive limitations tend to stop talking when they encounter obstacles to reaching their goals. You might be tempted to prompt these people to “tell me what you’re thinking,” but try not to. They’re concentrating on working around the obstacle. If you watch closely, you can see their solution unfold. After it does, then ask them about how they got to it.

An alternative to think-aloud: Retrospective review
“Retrospective review” is just a fancy name for asking people to tell you what happened after the fact. Go back to a particular point in the task, set the context, and ask the participant to tell you what was happening. For example, say something like this: “When you got to this point on the registration form [pointing to a field], you stopped talking. Tell me about what you were trying to do and what was happening.” The participant may revise what happened, but you will have good notes and the memory of someone who was observing closely rather than trying to perform, so you can pinpoint the issues you thought you saw. Invite the participant to correct your perceptions.

If you have the tools and time available, you can go to the video recording so the participant can see what he did and respond to that by giving you a play-by-play commentary.

It’s a great tool, used at the right time with the right participants
Think-aloud or verbal protocol can give you rich data about vocabulary and effectiveness of design. From it, you can also get some impression of the severity of problems for a particular participant or the level of satisfaction for someone who had a positive experience. Use think-aloud in exploratory or formative studies to help you understand how users are modeling their tasks. Consider carefully whether to use it in other situations, though, to ensure that you’re not adding to the cognitive load that the participant is already experiencing.

The Hardest Part: Getting the right participant in the room

This week has proved to me that nothing — nothing — matters as much as having the right participants.

Without the right participants, it all falls apart
If you don’t have participants who are appropriate, you can’t learn what you want to learn because they don’t behave and think the way real users do. You may get data, but what does it mean? Not much.

Who’s the right participant?
The right participant is a person. Not a set of demographics or psychographic data taken from market segmentations. It’s easy to lose sight of the idea that the person sitting in the chair using the product you’re testing is a person, not a tool for identifying design problems or a substitute for you. He or she is a person with a personality, habits, memories, beliefs, attitudes, abilities, intelligence, experience, and relationships. You want the person to bring those things with them (along with their computer glasses). That’s the stuff of mental models. That’s what makes the sessions interesting and unpredictable.

How do you know?
You should be able to visualize who the participant will be by describing the kinds of things you want them to do in the session. Here’s an example from a study I’m working on right now. We want

Someone who travels at least a few times a year and stays a couple of nights in a hotel on each trip. This person books his own travel because it’s quicker and easier than giving instructions to someone else. He likes to book online because he can see options and amenities that inform his final decisions. This traveler knows where he’s going, how to get there, and what to do on arrival.

There’s a task with a context: booking travel accommodation online. There are motivations: it’s comparatively easy, and there’s decision-making information available that isn’t available otherwise. There is a level of experience in the task domain: traveling a few times a year and staying in hotels.

Visualizing participants this way is a technique I borrowed from User Interface Engineering.

You can create a screening questionnaire from that description that should get you appropriate participants. And look, there are very few selection criteria embedded in the visualization. We don’t care what the annual household income is, or the education level, or even what the person’s job is. Don’t make this too hard for yourself by collecting data you’re not going to use. (Besides, then you have to protect that personal information, but I’ll talk about that later.)
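As an illustration of how few criteria the screener really needs, here is a minimal sketch in Python built only from the visualization above. The question names and cutoffs are my assumptions, not taken from the actual study screener.

```python
# A tiny screener built only from the participant visualization:
# trips per year, hotel nights per trip, books own travel, books online.
# Field names and cutoffs are invented for illustration.
def qualifies(answers):
    """Return True if a candidate matches the participant visualization."""
    return (
        answers["trips_per_year"] >= 3
        and answers["hotel_nights_per_trip"] >= 2
        and answers["books_own_travel"]
        and answers["books_online"]
    )

candidate = {
    "trips_per_year": 4,
    "hotel_nights_per_trip": 2,
    "books_own_travel": True,
    "books_online": True,
}
print(qualifies(candidate))  # True

# Note what's absent: income, education, job title.
# Collect only the data you're actually going to use.
```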

Now, share your test objectives and your visualization of the participant with your recruiter.

Stay tuned for much more about recruiting, like how to work with a recruiter, where to find the right participants, and lessons that Sandy and I have learned through dozens of recruits.

Getting Started with HUT

With this piece, I can officially announce that I am co-authoring with Jeff Rubin a new edition of the Handbook of Usability Testing. (We refer to it affectionately as HUT.) This great little book was published in 1994, but Wiley, the publisher, has seen it continue to sell and so wisely asked Jeff to do another edition after all this time.

It seems that there is still a need for a how-to book for people who don’t do usability testing for a living.

Jeff and I are collecting feedback from readers of the first edition while we work on the new edition, which is due out in spring of 2008. Yes, we’re just getting started. So if you have comments, ideas, or experiences about using the book that you want to share, bring ’em on.