Beware the Hawthorne Effect

In a clear and thoughtful article in the May 3, 2007 Journal of Usability Studies (JUS) put out by the Usability Professionals’ Association, Rich Macefield blasts the popular myths around the legendary Hawthorne effect. He goes on to explain very specifically how no interpretation of the Hawthorne effect applies to usability testing.

Popular myth – and Mayo’s (1933) original conclusion – says that human subjects in any kind of research will perform better just because they’re aware they’re being studied.

Several researchers have reviewed the original study that generated the finding, and they say that’s not what really happened. Parsons (1974) was the first to say that the improvement in performance of subjects in the original study was more likely due to feedback they got from the researchers about their performance and what they learned from getting that feedback.

Why it doesn’t apply to usability tests

Macefield convincingly demonstrates why the Hawthorne effect just doesn’t figure into well-designed and professionally executed usability tests:

  • The Hawthorne studies were longitudinal; most usability tests are not.
  • The Hawthorne subjects were experts at their jobs; most participants in a usability test are novices at something, because what they are using is new.
  • The metrics used in the Hawthorne studies were different from those used in most usability tests.
  • The subjects in the Hawthorne studies had horrible, boring jobs, so they may have been motivated to perform better because of the attention they got from researchers; in usability tests, it’s just as possible that participants are experiencing unwanted interruptions by being included, or that they’re only doing the test to get paid.
  • The Hawthorne subjects may have thought that taking part in the study would improve their chances for raises or promotions; the days when usability test participants thought that taking part in studies might help them get jobs are probably over.

What about feedback and learning effects?

We want feedback to be part of a good user interface, don’t we? Yes. And we want people to learn from using an interface, don’t we? Again, yes. But, as Macefield says, let’s make sure that all the feedback and learning in a usability test comes from the UI and not from the researcher/moderator. To get at the cause of problems, use qualitative data such as the verbal protocol from participants’ thinking aloud to see how they’re thinking about the problem.

Look at effects across tasks or functions

Macefield suggests that if you’re getting grief, add a control group to compare against and then look at performance across tasks. For example, you might expect that the test group (using an “improved” UI) would be more efficient or effective than a control group on all elements of a test. But it’s possible that the test group did better on one task while both groups had a similar level of problems on a different task. If this happens, it is unlikely that the moderator has given feedback or prompted learning to create the effect of improved performance, because such an effect should be global across tasks and across groups.
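To make that concrete, here’s a minimal sketch in Python (my own illustration, with made-up task names and completion data, not anything from Macefield’s article) of tallying completion rates per task for a control group and a test group, so you can see whether an improvement is global or shows up on just one task:

    # Hypothetical completion results: True = completed the task.
    # Task names, group labels, and data are made up for illustration.
    results = {
        "control": {
            "find_product": [True, False, True, True, False],
            "checkout":     [False, False, True, False, True],
        },
        "test": {  # the "improved" UI
            "find_product": [True, True, True, True, True],
            "checkout":     [False, True, False, False, True],
        },
    }

    for group, tasks in results.items():
        for task, outcomes in tasks.items():
            rate = sum(outcomes) / len(outcomes)
            print(f"{group:8s} {task:14s} {rate:.0%} completed")

    # If the test group beats the control group on every task, moderator
    # feedback or learning is one possible explanation (the effect is global).
    # If the gap shows up on only one task, a moderator-driven effect is an
    # unlikely explanation.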

Macefield closes the article with a couple of pages that could be a lesson out of Defense Against the Dark Arts, setting out very specific ways to argue against any assertion that your findings might be “contaminated.” But don’t just zoom to the end of the piece. The value of the article is in knowing the whole story.

Moderating tips and techniques

Getting the right information from the participant can be difficult. As the moderator, you must attend to many things besides what the participant is doing and saying. Focusing on a few specific behaviors of your own will help you run a better test.

Focus your attention on what’s happening now

  • Quickly build rapport with the participant
  • Listen attentively
  • Be open to what might happen in a session – be ready to learn from the participant

Tips for being a better moderator

Be the neutral observer – avoid priming or teaching. If you’re too close to the product or the domain, you may train participants without realizing it by using keywords in your task scenarios or materials.

Observe at the expense of collecting data, if you must. It is difficult to take notes and to watch the participant at the same time. If things are happening quickly or you find yourself missing things the participant is saying or doing, just stop taking notes. Instead, listen and spend time between sessions making notes about what happened. Go through your recordings later if you need to, or ask observers to share their notes.

Play dumb – don’t answer questions. If participants perceive that you are an expert on the product, they may ask you questions about it or look for your approval of their actions. Instead, let them know that you are learning too, and that you’ll note their questions but won’t always be able to answer them.

Flex the script and test plan. Even after you pilot your test, you may have to adjust on the fly when participants do unpredictable things. That’s okay. You’re learning important things that fit into your aggregate patterns of use.

Practice and get feedback. Ask co-workers and observers to give you feedback about how you conduct sessions and how you ask questions.

Your own self-awareness is your best tool for moderating test sessions successfully. Following these guidelines should help you get valid, reliable data from your participants, even if your attention is slightly divided.

Why create a test design?

I get a lot of clients who are in a hurry. They reach a point in their product cycle where they’re supposed to have done some usability activity to exit the development phase they’re in, and now they find they have to scramble to pull it together. How long can it take to arrange and execute a discount usability test, anyway?

Well, to do a usability test right, it does take a few steps. How much time those steps take depends on your situation. Every step in the process is useful.

The steps of a usability test
Jeff Rubin and I break the process for conducting a usability test into these steps:

  1. Develop a test plan
  2. Set up the testing environment and plan logistics
  3. Find and select participants
  4. Prepare test materials
  5. Conduct the sessions
  6. Debrief participants and observers
  7. Analyze data and observations
  8. Create findings and recommendations

Notice that “develop a test plan” and “prepare test materials” are different steps.

It might seem like a shortcut to go directly to scripting the test session without designing the test. But the test plan is a necessary step.
Test plan or test design?
There’s a planning aspect to this deliverable. Why are you testing? Where will you test? What are the basic characteristics of the participants? What’s the timing for the test? For the tasks? What other logistics are involved in making this particular test happen? Do you need bogus data to play with, user IDs, or other props?

To some of us, a test design would be about experiment design. Will you test a hypothesis or is this an exploratory test? What are your research questions? What task scenarios will get you to the answers? Will you compare anything? If so, is it between subjects or within subjects? Will the moderator sit in the testing room or not? What data will you collect and what are you measuring?

It all goes together.

Why not just script the session without writing a plan?
Having a plan that you’ve thought through is always useful. You can use the test plan to get buy-in from stakeholders, too. As a representation of what the study will be, it’s like reviewing the blueprints and renderings before you give the building contractor approval to start building.

With a test plan, you also have a tool for documenting requirements (a frozen test environment, anyone?) for the test and a set of unambiguous details that define the scope of the test. Here, in a test plan, you define the approach to the research questions. In a session script, you operationalize the research questions. Writing a test plan helps you know what you’re going to collect data about and what you’re going to report on, as well as what the general content of the report will be.
Writing a test plan (or design, or whatever you want to call it) will give you a framework for the test in which a session script will fit. All the other deliverables of a usability test stem from the test plan. If you don’t have a plan, you risk using inappropriate participants and getting unreliable data.

Should you record sessions on video/audio?

Recording sessions on video has been the accepted practice among professional usability practitioners since the beginning of time. It is something that we tend to do automatically.

There aren’t many obstacles to recording sessions these days. It really only takes a web camera and some relatively inexpensive recording software on the testing PC. (Of course, this assumes that you’re testing software or web sites that run on desktop or laptop computers.)

Recording is inexpensive
The software is pretty easy to use and it doesn’t cause issues with response times or otherwise fool with the user’s experience of using the software or website you’re testing. You get nice, bright colors, picture-in-picture, and you can capture it all digitally. For example, there’s Morae, by TechSmith. (In the interest of full disclosure: I own a license, and I have upgraded to the new version). With Morae, you can capture all sorts of nerdy bits. It’s a good tool.

Even if you decide to use a regular video camera rather than a web cam, or multiple cameras, that technology gets cheaper and more accessible all the time. Storage media is also very inexpensive.


But should you record sessions?

In an article on Boxes and Arrows from August 2002, Karl Fast wrote a whole treatise on recording usability test sessions: http://www.boxesandarrows.com/view/recording_screen_activity_during_usability_testing. He called recording “crucial.” I say: not.

Know why you’re recording
You may want the video recordings for review, or for sharing with a research partner. You may want your boss to sit down and watch the recorded sessions as evidence. Most practitioners will say that they use video recordings as a backup to their notes: you could always go back and review the recordings.

Most usability tests have fairly few participants. Say you’re doing a study with 5 to 8 participants. If your notes from so few sessions don’t help you analyze the data, you should work on making better data collection tools for yourself or make it a practice to write notes about what happened immediately following each session. Reviewing recordings is making work for yourself.
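As one example of what a better data collection tool might look like, here’s a minimal sketch in Python (the participant ID, task, and note text are hypothetical) of a timestamped observation log that is quick to fill in during a session and easy to scan afterward, so recordings become a true backup rather than a required step:

    import datetime

    session_notes = []  # one list per session

    def log_observation(participant_id, task, note):
        """Record a timestamped observation so notes can stand in for video review."""
        session_notes.append({
            "time": datetime.datetime.now().isoformat(timespec="seconds"),
            "participant": participant_id,
            "task": task,
            "note": note,
        })

    # Hypothetical entry from a session:
    log_observation("P2", "checkout", "Hesitated at the coupon-code field; asked whether it was required")

    for entry in session_notes:
        print(entry["time"], entry["participant"], entry["task"], "-", entry["note"])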

But do you actually review the recordings? Rarely. And do people who could not attend the sessions review the recordings later? Again, rarely.

Know how you’re storing recordings and control access to protect the privacy of participants
And let’s consider participant privacy and confidentiality. Digital recordings are easier than ever to manage and archive. However, the longer the recordings hang around your company, the more likely it is that they will a) get lost, b) fall into the wrong hands, or c) be misused in some way. A client once asked me if her company could review a tape of a participant because he was coming in for a job interview. I said absolutely not.

You ask participants to sign a recording waiver that sets out specific purposes of the recording. Someone has to make sure that the waiver is respected. That person is the usability specialist who recorded the session to begin with.

Manage recordings carefully
In the form that you ask study participants to sign to give their permission to record, you should also state in plain language:

  • How the recording will be used
  • Who will use the recording
  • How long you (or your company) will store the recording
  • How the recording will be destroyed

But get it approved by your legal department, of course.

There are some good reasons to record sessions on video. There are a lot of good reasons not to. Should you?

Keeping a rolling list of issues throughout a study

Design teams are often in a hurry to get results from usability studies. How do you support them while giving good data and ensuring that the final findings are valid?

One thing I do is to start a list of observations or issues after the first two or three participants. I go over this list with my observers and get them to help me expand or clarify each item. Then we agree on which participants we saw have that particular problem.

As the study continues, I add to that list the numbers of each participant who had the issue and note any variations on each observation.

For example, in a study I’m working on this week, we noted on the first day of testing that

Participants talked about location, but scrolled past the map without interacting with it to get to the search results (the map may not look clickable)

I went back later and added the participant numbers for those who we observed doing this:

Participants talked about location, but scrolled past the map without interacting with it to get to the search results (the map may not look clickable) PP, P1, P3

Today, I’ll add more participant numbers. At the end of the study, we’ll have a quick summary of the major issues with a good idea of how many participants had each problem.

There are three things that are “rolling” about the list. First, you’re adding participant numbers for each of the issues as you go along. Second, you’re refining the descriptions of the issues as you learn more from each new participant. Third, you’re adding issues to the list as you see new things come up (or notice things you missed earlier, or that seemed at first like one-off problems).
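If you keep the list electronically, the bookkeeping is trivial. Here’s a minimal sketch in Python of a rolling list as a simple mapping from issue description to the participants observed having it, using the map observation and participant numbers from the example above:

    # A rolling issues list: issue description -> participant IDs observed so far.
    rolling_issues = {}

    def note_issue(description, participant_id):
        """Add a participant to an issue, creating the issue entry if it's new."""
        rolling_issues.setdefault(description, []).append(participant_id)

    # After the first day of testing:
    note_issue("Scrolled past the map without interacting with it", "PP")
    note_issue("Scrolled past the map without interacting with it", "P1")
    note_issue("Scrolled past the map without interacting with it", "P3")

    # As later sessions add participants and new issues, keep calling note_issue().
    for description, participants in rolling_issues.items():
        print(f"{description}: {', '.join(participants)} ({len(participants)} participants)")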

I will still go back and tally all of the official data that I collected during each session, so there may be slight differences between these debriefing notes and the final report, but I have found that the rolling issues list and the final reports usually match pretty closely.

Doing the rolling list keeps your observers engaged and informed, helps you cross-check your data later, and gives designers and developers something to work from right away that is fairly reliable.

When to ask participants to think out loud

I was taught that one of the most important aspects of moderating usability study sessions was to encourage participants to think out loud as they worked on tasks. While the technique is good and useful in many usability test situations, it isn’t always the best approach.

Get data about how and why people do things
This “verbal protocol,” as it is known, can be an extremely useful thing to have. If the participant is good at thinking aloud, you will hear how she is forming the task and how she is thinking about reaching her goal. You will hear why she is doing the things she is doing, and the words she uses to describe it all. You will also get verbal feedback about how the participant feels about what is happening, because she may say that she’s frustrated, annoyed, or even happy.

What the data means
Hearing how a participant forms a task tells you whether the designers and the user are thinking of (modeling) the task in the same way.

Hearing why a participant is taking a particular step tells you where your design does and does not support users’ goals.

Hearing the words gives you labels for navigation, links, and buttons. Your information architecture should match the participant’s vocabulary.

Hearing the emotion tells you how severe a problem may be.

These are all good things to know.
How to get a good think-aloud
Some people think aloud naturally, or at least will verbalize their questions and frustrations just because there’s someone else in the room (that would be you, the moderator). But most people need to be primed to do it, and some even need practice doing it.

In your introduction to the session, ask participants to tell you what’s going through their minds as they do tasks.

Consider incorporating a practice task that lasts for half a minute, just to get participants to try it out. Encourage them and quickly move on.

When you describe the task scenario to participants, remind them to think aloud.

During the task, when something seems to be frustrating, annoying or hindering — and the participant isn’t talking — ask her to tell you what she’s thinking.

Know that there’s more going on than you can hear

People filter automatically
Participants can’t tell you everything they’re thinking. And really, you don’t want that. Humans can process on a number of cognitive tracks at the same time. Most study participants will automatically be able to distinguish between what is related to the situation and what isn’t.

This is a test
They also may filter what they tell you beyond this basic distinction. For example, they want to do well. Although you tell participants you are not testing them, a participant might still feel tested, even if she’s only in competition with The Machine.

Participants may fear failure or embarrassment. In usability studies, people often persist at times when they would normally ask for help.

People tend to give positive feedback
Participants want to give you a good session. People are conditioned to say and do things for the approval of others. They want the moderator to approve of their performance.
Participants take responsibility for bad design
People who are novices at a task or are working with something outside their experience may excuse the design by taking responsibility for a design problem. For example, they may say they could do it now that they (have failed and) have done it once. Or they just need more time to learn the site. This is especially common among older adults who are unsure of their computer skills or other relevant skills.

When you might not want to use think-aloud
There are times when using think-aloud can contaminate or dilute your data. There are other situations in which using think-aloud is just difficult, or won’t work for the type of participants you have in your study.

Time on task
If you want to measure how much time it takes people to complete a task because you are particularly concerned with efficiency, introducing think-aloud is probably a bad idea. Talking about what you’re thinking slows you down while you choose words to convey your ideas about what’s happening and why.

Audio feedback in a user interface
Some interfaces incorporate audio feedback to indicate statuses or modes. These auditory cues may be overlapped by the participant talking, so the participant may miss something important happening – or you might. Also, many blind people and people with severe vision impairments use screen readers to use software and web sites. If you’re tuned in, you can learn things by listening to the screen reader as it works. And although most of the people with visual disabilities who use screen readers whom I have observed can listen and talk at the same time (much as sighted people can read and talk at the same time), as a sighted moderator I find my auditory channel challenged by listening to both the screen reader and the participant at once.

You’re interrupting a taxed thought process
People who have short-term memory loss, are medicated, or have other cognitive limitations tend to stop talking when they encounter obstacles to reaching their goals. You might be tempted to prompt these people to “tell me what you’re thinking,” but try not to. They’re concentrating on working around the obstacle. If you watch closely, you can see their solution unfold. After it does, then ask them about how they got to it.

An alternative to think-aloud: Retrospective review
“Retrospective review” is just a fancy name for asking people to tell you what happened after the fact. Go back to a particular point in the task, set the context, and ask the participant to tell you what was happening. For example, say something like this: “When you got to this point on the registration form [pointing to a field], you stopped talking. Tell me about what you were trying to do and what was happening.” The participant may revise what happened, but you will have good notes and the memory of someone who was observing closely, not trying to perform, so you can pinpoint issues that you thought were happening. Invite the participant to correct your perceptions.

If you have the tools and time available, you can go to the video recording so the participant can see what he did and respond to that by giving you a play-by-play commentary.

It’s a great tool, used at the right time with the right participants
Think-aloud or verbal protocol can give you rich data about vocabulary and effectiveness of design. From it, you can also get some impression of the severity of problems for a particular participant or the level of satisfaction for someone who had a positive experience. Use think-aloud in exploratory or formative studies to help you understand how users are modeling their tasks. Consider carefully whether to use it in other situations, though, to ensure that you’re not adding to the cognitive load that the participant is already experiencing.

The Hardest Part: Getting the right participant in the room

This week has proved to me that nothing — nothing — matters as much as having the right participants.

Without the right participants, it all falls apart
If you don’t have participants who are appropriate, you can’t learn what you want to learn because they don’t behave and think the way real users do. You may get data, but what does it mean? Not much.

Who’s the right participant?
The right participant is a person. Not a set of demographics or psychographic data taken from market segmentations. It’s easy to lose sight of the idea that the person sitting in the chair using the product you’re testing is a person, not just a tool for identifying design problems – a substitute for you. He or she is a person with a personality, habits, memories, beliefs, attitudes, abilities, intelligence, experience, and relationships. You want the person to bring those things with them (along with their computer glasses). That’s the stuff of mental models. That’s what makes the sessions interesting and unpredictable.

How do you know?
You should be able to visualize who the participant-person will be by talking about the kinds of things you want them to do in the session. Here’s an example from a study I’m working on right now. We want

Someone who travels at least a few times a year and stays a couple of nights in a hotel on each trip. This person books his own travel because it’s quicker and easier than giving instructions to someone else. He likes to book online because he can see options and amenities that inform his final decisions. This traveler knows where he’s going, how to get there, and what to do on arrival.

There’s a task with a context: booking travel accommodation online. There are motivations: it’s comparatively easy and there’s decision-making information available that isn’t available otherwise. There is a level of experience in the task domain: traveling a few times a year and staying in hotels.

Visualizing participants this way is a technique I borrowed from User Interface Engineering.

You can create a screening questionnaire from that description that should get you appropriate participants. And look, there are very few selection criteria embedded in the visualization. We don’t care what the annual household income is, or the education level, or even what the person’s job is. Don’t make this too hard for yourself by collecting data you’re not going to use. (Besides, then you have to protect that personal information, but I’ll talk about that later.)
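If it helps to see just how few criteria the visualization really implies, here is a minimal sketch in Python of the screening logic behind such a questionnaire. The field names and exact thresholds are my assumptions for illustration, not part of the actual screener:

    # Hypothetical screener derived from the traveler visualization above.
    # Field names and thresholds are assumptions for illustration only.
    def qualifies(answers):
        """Return True if a candidate matches the traveler description."""
        return (
            answers.get("trips_per_year", 0) >= 3             # travels a few times a year
            and answers.get("hotel_nights_per_trip", 0) >= 2  # stays a couple of nights
            and answers.get("books_own_travel", False)        # books his or her own travel
            and answers.get("books_online", False)            # prefers booking online
        )

    candidate = {
        "trips_per_year": 4,
        "hotel_nights_per_trip": 2,
        "books_own_travel": True,
        "books_online": True,
    }
    print(qualifies(candidate))  # True -> invite to the study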

Now, share your test objectives and your visualization of the participant with your recruiter.

Stay tuned for much more about recruiting, like how to work with a recruiter, where to find the right participants, and lessons that Sandy and I have learned through dozens of recruits.

Getting Started with HUT

With this piece, I can officially announce that I am co-authoring with Jeff Rubin a new edition of the Handbook of Usability Testing. (We refer to it affectionately as HUT.) This great little book was published in 1994, but Wiley, the publisher, has seen it continue to sell and so wisely asked Jeff to do another edition after all this time.

It seems that there is still a need for a how-to book for people who don’t do usability testing for a living.

Jeff and I are collecting feedback from readers of the first edition while we work on the new edition, which is due out in spring of 2008. Yes, we’re just getting started. So if you have comments, ideas, or experiences about using the book that you want to share, bring ’em on.