think aloud | Dana Chisnell

Retrospective review and memory

One of my favorite radio programs (though I listen to it as a podcast) is Radiolab, “a show about science,” which is a production of WNYC hosted by Robert Krulwich and Jad Abumrad and distributed by NPR. This show contemplates lots of interesting things, from reason versus logic in decision making to laughter to lies and deception.

The show I listened to last night was about how memories are formed. Over time, several analogies have developed for human memory that seem to be related to the technology available at that time. Robert said he thinks of his memory as a filing cabinet. But Jad, who is somewhat younger than Robert, described his mind as a computer hard disk. Neurologists and cognitive scientists they talked to, though, said No, memory isn’t like that at all. In fact, we don’t store memories. We recreate them every time we think of them.

Huh, I thought. Knowing this has implications for user research. For example, there are several points at which usability testing relies on memory: the memory of the participant if we’re asking questions about the past behavior; the memory of the facilitator for taking notes, analyzing data, and drawing inferences; the memories of observers in discussions about what happened in sessions and what it means.

Using a think-aloud technique – getting participants to say what they’re thinking while working through a task – avoids some of this. You have a verbal protocol as “evidence.” If there’s disagreement about what happened among the team members, you can go back to the recording to review what the participant said as well as what they did.

But there are times when think-aloud is not the right technique, either because the participant cannot manage the divided attention of doing a task and talking about it at the same time, or because of other circumstances. In those situations, you might think about doing retrospective review, instead.

“Retrospective review” is just a fancy name for asking people to tell you what happened. If you have the tools and time available, you can go to a recording after a session, so the participant can see what she did and respond to that by giving you a play-by-play commentary.

As soon as participants start viewing or listening to the beginning of an episode – up to 48 hours after doing the task – they’ll remember having done it. They probably won’t be able to tell you how it ended. But they will be able to tell you what’s going to happen next.

And that’s the really useful thing about doing retrospective review. As the participant recreates the memory of the task, you can ask, “What happens next? What will you do next and why?” Pause. Listen. Take notes. And then start playing back the recording again. Sure enough, it’ll be like the participant said. Only now you know why.

Asking participants what happens next in their own stories also avoids most revisionist history. That is, if you ask participants to explain what happened after they view it, they may rationalize what they did. This isn’t the same as remembering it.

Beware the Hawthorne Effect

In a clear and thoughtful article in the May 3, 2007 Journal of Usability Studies (JUS) put out by the Usability Professionals’ Association, Rich Macefield blasts the popular myths around the legendary Hawthorne effect. He goes on to explain very specifically how no interpretation of the Hawthorne effect applies to usability testing.

Popular myth – and Mayo’s (1933) original conclusion – says that human subjects in any kind of research will perform better just because they’re aware they’re being studied.

Several researchers have reviewed the original study that generated the finding, and they say that’s not what really happened. Parsons (1974) was the first to say that the improvement in performance of subjects in the original study was more likely due to feedback they got from the researchers about their performance and what they learned from getting that feedback.

Why it doesn’t apply to usability tests

Macefield convincingly demonstrates why the Hawthorne effect just doesn’t figure into well-designed and professionally executed usability tests:

The Hawthorne studies were longitudinal; most usability tests are not.
The subjects were experts; most participants in a usability test are novices at something because what they are using is new.
The metrics used in the Hawthorne studies were different from most usability tests.
The subjects in the Hawthorne studies had horrible, boring jobs, so they may have been motivated to perform better because of attention they got from researchers; in usability tests, it’s possible that participants experience being included as an unwanted interruption or that they’re just doing the test to get paid.
The Hawthorne subjects may have thought that taking part in the study would improve their chances for raises or promotions; the days of usability test participants thinking that their participating in studies might help them get jobs are probably over.

What about feedback and learning effects?

We want feedback to be part of a good user interface, don’t we? Yes. And we want people to learn from using an interface, don’t we? Again, yes. But, as Macefield says, let’s make sure that all the feedback and learning from a usability test comes from the UI and not the researcher/moderator. Instead, get to the cause of problems from qualitative data such as the verbal protocol from participants’ thinking aloud to see how they’re thinking about the problem.

Look at effects across tasks or functions

Macefield suggests that if you’re getting grief, add a control group to compare against and then look at performance across tasks. For example, you might expect that the test group (using an “improved” UI) would be more efficient or effective in all elements of a test than a control group. But it’s possible that the test group did better on one task but both groups had a similar level of problems on a different task. If this happens, it is unlikely that the moderator has given feedback or prompted learning to create the effect of improved performance because the effect should be global across tasks across groups.
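To make that comparison concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the task names, the completion times, and the 10-second threshold are made up, and the helper functions `per_task_gap` and `improved_tasks` are hypothetical, not something from Macefield's article. It only shows the shape of the check: compare the groups task by task and see whether the improvement is global or limited to some tasks.

```python
from statistics import mean

# Hypothetical time-on-task data in seconds (made up for illustration).
times = {
    "register": {"control": [95, 110, 102, 120], "test": [60, 72, 66, 70]},
    "checkout": {"control": [140, 155, 150, 149], "test": [138, 160, 151, 147]},
}

def per_task_gap(times):
    """Return {task: mean(control) - mean(test)}.

    A positive gap means the test group was faster on that task.
    """
    return {
        task: mean(groups["control"]) - mean(groups["test"])
        for task, groups in times.items()
    }

def improved_tasks(gaps, threshold):
    """List the tasks where the test group was faster by at least `threshold` seconds."""
    return sorted(task for task, gap in gaps.items() if gap >= threshold)

gaps = per_task_gap(times)
better = improved_tasks(gaps, threshold=10)  # threshold is an arbitrary assumption

if len(better) < len(gaps):
    print("Improvement is limited to:", better)
else:
    print("Improvement shows up on every task; look harder at moderator effects.")
```

With so few participants, this is a descriptive comparison, not a statistical test. The point is only that an improvement that shows up on some tasks and not others is hard to blame on the moderator.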

Macefield closes the article with a couple of pages that could be a lesson out of Defense Against the Dark Arts, setting out very specific ways to argue against any assertion that your findings might be “contaminated.” But don’t just zoom to the end of the piece. The value of the article is in knowing the whole story.

Should you record sessions on video/audio?

Since the beginning of time, the accepted practice among professional usability practitioners has been to record sessions on video. It is something that we tend to do automatically.

There aren’t many obstacles to recording sessions these days. It really only takes a web camera and some relatively inexpensive recording software on the testing PC. (Of course, this assumes that you’re testing software or web sites that run on desktop or laptop computers.)

Recording is inexpensive
The software is pretty easy to use and it doesn’t cause issues with response times or otherwise fool with the user’s experience of using the software or website you’re testing. You get nice, bright colors, picture-in-picture, and you can capture it all digitally. For example, there’s Morae, by TechSmith. (In the interest of full disclosure: I own a license, and I have upgraded to the new version). With Morae, you can capture all sorts of nerdy bits. It’s a good tool.

Even if you decide to use a regular video camera rather than a web cam, or multiple cameras, that technology is cheaper and more accessible all the time. Storage media also is very inexpensive.

But should you record sessions?
Karl Fast on Boxes and Arrows (from August 2002) has a whole treatise on recording usability test sessions: http://www.boxesandarrows.com/view/recording_screen_activity_during_usability_testing. He called it “crucial.” I say Not.

Know why you’re recording
You may want the video recordings for reviewing, or sharing with a research partner. You may want your boss to sit down and watch the recorded sessions as evidence. Most practitioners will say that they use video recordings as backup to notes. You could go back and review the recordings.

Most usability tests have fairly few participants. Say you’re doing a study with 5 to 8 participants. If your notes from so few sessions don’t help you analyze the data, you should work on making better data collection tools for yourself or make it a practice to write notes about what happened immediately following each session. Reviewing recordings is making work for yourself.

But do you actually review the recordings? Rarely. And do people who could not attend the sessions review the recordings later? Again, rarely.

Know how you’re storing recordings and control access to protect the privacy of participants
And let’s consider participant privacy and confidentiality. Digital recordings are easier than ever to manage and archive. However, the longer the recordings hang around your company, the more likely it is that they will a) get lost, b) fall into the wrong hands, or c) be misused in some way. A client once asked me if her company could review a tape of a participant because he was coming in for a job interview. I said absolutely not.

You ask participants to sign a recording waiver that sets out specific purposes of the recording. Someone has to make sure that the waiver is respected. That person is the usability specialist who recorded the session to begin with.

Manage recordings carefully
In the form that you ask study participants to sign to give their permission to record, you should also state in plain language:

How the recording will be used
Who will use the recording
How long you (or your company) will store the recording
How the recording will be destroyed

But get it approved by your legal department, of course.

There are some good reasons to record sessions on video. There are a lot of good reasons not to. Should you?

When to ask participants to think out loud

I was taught that one of the most important aspects of moderating usability study sessions was to encourage participants to think out loud as they worked on tasks. While the technique is good and useful in many usability test situations, it isn’t always the best approach.

Get data about how and why people do things
This “verbal protocol,” as it is known, can be an extremely useful thing to have. If the participant is good at thinking aloud, you will hear about how she is forming the task and how she is thinking about reaching her goal. You will hear about why she is doing the things she is doing, and the words she uses to describe it all. You also will get verbal feedback about how the participant feels about what is happening because she may say that she’s frustrated, or annoyed, or even happy.

What the data means
Hearing how a participant forms a task tells you whether the designers and the user are thinking of (modeling) the task in the same way.

Hearing why a participant is taking a particular step tells you where your design does and does not support users’ goals.

Hearing the words gives you labels for navigation, links, and buttons. Your information architecture should match the participant’s vocabulary.

Hearing the emotion tells you how severe a problem may be.

These are all good things to know.

How to get a good think-aloud
Some people think aloud naturally, or at least will verbalize their questions and frustrations just because there’s someone else in the room (that would be you, the moderator). But most people need to be primed to do it, and some even need practice doing it.

In your introduction to the session, ask participants to tell you what’s going through their minds as they do tasks.

Consider incorporating a practice task that lasts for half a minute, just to get participants to try it out. Encourage them and quickly move on.

When you describe the task scenario to participants, remind them to think aloud.

During the task, when something seems to be frustrating, annoying or hindering — and the participant isn’t talking — ask her to tell you what she’s thinking.

Know that there’s more going on than you can hear

People filter automatically
Participants can’t tell you everything they’re thinking. And really, you don’t want that. Humans can process on a number of cognitive tracks at the same time. Most study participants will automatically be able to distinguish between what is related to the situation and what isn’t.

This is a test
They also may filter what they tell you beyond this basic distinction. For example, they want to do well. Although you tell participants you are not testing them, a participant might still feel tested to some degree, even if she’s just in competition with The Machine.

Participants may fear failure or embarrassment. In usability studies, people often persist at times when they would normally ask for help.

People tend to give positive feedback
Participants want to give you a good session. People are conditioned to say and do things for the approval of others. They want the moderator to approve of their performance.

Participants take responsibility for bad design
People who are novices at a task or are working with something outside their experience may excuse the design by taking responsibility for a design problem. For example, they may say they could do it now that they (have failed and) have done it once. Or they just need more time to learn the site. This is especially common among older adults who are unsure of their computer or other appropriate skills.

When you might not want to use think-aloud
There are times when using think-aloud can conflate or dilute your data. There are other situations in which using think-aloud is just difficult, or won’t work for the type of participants you have in your study.

Time on task
If you want to measure how much time it takes people to complete a task because you are particularly concerned with efficiency, introducing think-aloud is probably a bad idea. Talking about what you’re thinking slows you down while you choose words to convey your ideas about what’s happening and why.

Audio feedback in a user interface
Some interfaces incorporate audio feedback to indicate statuses or modes. The participant’s talking may overlap these auditory cues, so the participant may miss something important happening – or you might. Also, many blind people and people with severe vision impairments use screen readers to use software and web sites. If you’re tuned in, you can learn things by listening to the screen reader as it works. And although most of the screen reader users with visual disabilities I have observed can listen and talk at the same time (just as sighted people can see or read and talk at the same time), as a sighted moderator, my auditory channel is challenged by listening to both the screen reader and the participant at once.

You’re interrupting a taxed thought process
People who have short-term memory loss, are medicated, or have other cognitive limitations tend to stop talking when they encounter obstacles to reaching their goals. You might be tempted to prompt these people to “tell me what you’re thinking,” but try not to. They’re concentrating on working around the obstacle. If you watch closely, you can see their solution unfold. After it does, then ask them about how they got to it.

An alternative to think-aloud: Retrospective review
“Retrospective review” is just a fancy name for asking people to tell you what happened after the fact. Go back to a particular point in the task, set the context, and ask the participant to tell you what was happening. For example, say something like this: “When you got to this point on the registration form [pointing to a field], you stopped talking. Tell me about what you were trying to do and what was happening.” The participant may revise what happened, but you will have good notes and the memory of someone who was observing closely, not trying to perform, so you can pinpoint issues that you thought were happening. Invite the participant to correct your perceptions.

If you have the tools and time available, you can go to the video recording so the participant can see what he did and respond to that by giving you a play-by-play commentary.

It’s a great tool, used at the right time with the right participants
Think-aloud or verbal protocol can give you rich data about vocabulary and effectiveness of design. From it, you can also get some impression of the severity of problems for a particular participant or the level of satisfaction for someone who had a positive experience. Use think-aloud in exploratory or formative studies to help you understand how users are modeling their tasks. Consider carefully whether to use it in other situations, though, to ensure that you’re not adding to the cognitive load that the participant is already experiencing.