Consensus on observations in real time: Keeping a rolling list of issues

 

Design teams often need results from usability studies yesterday. Teams I work with always want to start working on observations right away. How do you support them while giving them good data and ensuring that the final findings are valid?

Teams that are fully engaged in getting feedback from users – teams that share a vision of the experience they want their users to have – have often helped me gather and evaluate data in the course of a test. In chatting with Livia Labate, I learned that the amazing team at Comcast Interactive Media (CIM) came to the same technique on their own. Crystal Kubitsky of CIM was good enough to share photos of CIM’s progress through one study. Here’s how it works:

 

1. Start noting observations right away

After two or three participants have tried the design, we take a longer break to debrief about what we have observed so far. In that debrief, each team member talks about what he or she has observed. We write it down on a whiteboard and note which participants had the issue. The team works together to articulate what the observation or issue was.

Looking for love: Deciding what to observe for

 

The team I was working with wanted to find out whether a prototype they had designed for a new intranet worked for users. Their new design was a radical change from the site that had been in place for five years and in use by 8,000 users. Going to this new design was a big risk. What if users didn’t like it? Worse, what if they couldn’t use it?

We went on tour. Not to show the prototype, but to test it. Leading up to this moment we had done heaps of user research: stakeholder interviews, field observations (ethnography, contextual inquiry – pick your favorite name), card sorting, taxonomy testing. We learned amazing things, and as our talented interaction designer started translating all that into wireframes, we got pressure to show them. We knew what we were doing. But we wanted to be sure. So we made the wireframes clickable and strung them together to make them feel like they were doing something. And then we asked (among other things)…

Popping the big question(s): How well? How easily? How valuable?

When teams decide to do usability testing on a design, it is often because there’s some design challenge to overcome. Something isn’t working. Or, there’s disagreement among team members about how to implement a feature or a function. Or, the team is trying something risky. Going to the users is a good answer. Otherwise, even great teams can get bogged down. But how do you talk about what you want to find out? Testing with users is not binary – you probably are not going to get an up or down, yes or no answer. It’s a question of degree. Things will happen that were not expected. The team should be prepared to learn and adjust. That is what iterating is for (in spite of how Agile talks about iterations).

Ask: How well
Want to find out whether something fits into the user’s mental model? Think about questions like these:

  • How well does the interaction/information architecture support users’ tasks?
  • How well do headings, links, and labels help users find what they’re looking for?
  • How well does the design support the brand in users’ minds?

Ask: How easily
Want to learn whether users can quickly and easily use what you have designed? Here are some questions to consider:

  • How easily and successfully do users reach their task goals?
  • How easily do users recognize this design as belonging to this company?
  • How easily and successfully do they find the information they’re looking for?
  • How easily do users understand the content?
  • How easily do users recognize that they have found what they were looking for?

Ask: How valuable

  • What do users find useful about the design?
  • What about the design do they value and why?
  • What comments do participants have about the usefulness of the feature?

Ask: What else?

  • What questions do your users have that the content is not answering?
  • What needs do they have that the design is not addressing?
  • Where do users start the task?

Teams that think of their design issues this way find that users show them what to do through the way they perform with a design. Rarely is the result of usability testing an absolute win or loss for a design. Instead, you get clues about what’s working – and what’s not – and why. From that, you can make a great design.

Retrospective review and memory

One of my favorite radio programs (though I listen to it as a podcast) is Radiolab, “a show about science,” which is a production of WNYC hosted by Robert Krulwich and Jad Abumrad and distributed by NPR. The show contemplates lots of interesting things, from reason versus logic in decision making, to laughter, to lies and deception.

The show I listened to last night was about how memories are formed. Over time, several analogies for human memory have developed, and they seem to be tied to the technology available at the time. Robert said he thinks of his memory as a filing cabinet. But Jad, who is somewhat younger than Robert, described his mind as a computer hard disk. The neurologists and cognitive scientists they talked to, though, said no, memory isn’t like that at all. In fact, we don’t store memories. We recreate them every time we think of them.

Huh, I thought. Knowing this has implications for user research. For example, there are several points at which usability testing relies on memory: the memory of the participant, if we’re asking questions about past behavior; the memory of the facilitator, for taking notes, analyzing data, and drawing inferences; and the memories of observers, in discussions about what happened in sessions and what it means.

Using a think-aloud technique – getting participants to say what they’re thinking while working through a task – avoids some of this. You have a verbal protocol as “evidence.” If there’s disagreement about what happened among the team members, you can go back to the recording to review what the participant said as well as what they did.

But there are times when think-aloud is not the right technique, either because the participant cannot manage the divided attention of doing a task and talking about it at the same time, or because of other circumstances. In those situations, you might think about doing retrospective review, instead.

“Retrospective review” is just a fancy name for asking people to tell you what happened. If you have the tools and time available, you can go to a recording after a session, so the participant can see what she did and respond to that by giving you a play-by-play commentary.

As soon as participants start viewing or listening to the beginning of an episode – up to 48 hours after doing the task – they’ll remember having done it. They probably won’t be able to tell you how it ended. But they will be able to tell you what’s going to happen next.

And that’s the really useful thing about doing retrospective review. As the participant recreates the memory of the task, you can ask, “What happens next? What will you do next and why?” Pause. Listen. Take notes. And then start playing back the recording again. Sure enough, it’ll be like the participant said. Only now you know why.

Asking participants what happens next in their own stories also avoids most revisionist history. That is, if you ask participants to explain what happened after they view it, they may rationalize what they did. That isn’t the same as remembering it.

Making it easy to collect the data you want to collect

As I have said before, taking notes is rife with danger. It’s so tempting to just write down everything that happens. But you probably can’t deal with all that data. First, it’s just too much. Second, it’s not organized.

Let’s look at an example research question: Do people make more errors on one version of the system than the other?

And we chose these measures to find out the answer:

  • Count of all incorrect selections (errors)
  • Count and location of incorrect menu choices
  • Count and location of incorrect buttons selected
  • Count of errors of omission
  • Count and location of visits to online help
  • Number and percentage of tasks completed incorrectly
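
Here is a minimal sketch (not part of the original post) of one way to keep each session’s tallies lined up with those measures so they roll up cleanly later. It’s in Python, and every name in it is hypothetical.

    from dataclasses import dataclass, field

    @dataclass
    class SessionTally:
        """Per-participant tallies for the error measures listed above."""
        participant: str
        version: str                               # "A" or "B"
        incorrect_selections: int = 0              # count of all errors
        incorrect_menu_choices: list = field(default_factory=list)  # locations
        incorrect_buttons: list = field(default_factory=list)       # locations
        errors_of_omission: int = 0
        help_visits: list = field(default_factory=list)             # locations
        tasks_attempted: int = 0
        tasks_completed_incorrectly: int = 0

        def percent_tasks_incorrect(self) -> float:
            """Tasks completed incorrectly as a percentage of tasks attempted."""
            if self.tasks_attempted == 0:
                return 0.0
            return 100.0 * self.tasks_completed_incorrectly / self.tasks_attempted

    # Example: one participant's session on version B (made-up numbers)
    p3 = SessionTally(participant="P3", version="B",
                      incorrect_selections=4,
                      incorrect_menu_choices=["Settings > Export"],
                      tasks_attempted=6, tasks_completed_incorrectly=2)
    print(p3.percent_tasks_incorrect())            # 33.33...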


Translating research questions to data

There’s an art to asking a question and then coming up with a way to answer it. I find myself asking, “What do you want to find out?” The next question is, “How do we know what the answer is?”

Maybe the easiest thing is to take you through an example.

Forming the right question

On a study I’m working on now, we have about 10 research questions, but the heart of the research is about this one:

Do people make more errors on one version of the system than the other?

Note that this is not a hypothesis, which would be worded something more like, “We expect people to make more mistakes and to be more likely to not complete tasks on the B version of the system than on the A version of the system.” (Some would argue that there are multiple hypotheses embedded in that statement.)

But in our study, we’re not out to prove or disprove anything. Rather, we just want to compare two versions to see what works well about each one and what doesn’t.

 

Choosing data to answer the question

There are dozens of possible measures you can look at in a usability test. Here are just a few examples:
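
For this question, the measures could be the ones listed above in “Making it easy to collect the data you want to collect.” As a rough sketch (not the author’s worksheet, and with hypothetical names), writing the question-to-measures mapping down might look like this, so the note-taking sheets and the final report stay in sync:

    # Sketch: the question we're answering, and the measures we agreed would answer it
    research_question = (
        "Do people make more errors on one version of the system than the other?"
    )

    measures = {
        "incorrect_selections": "count of all incorrect selections (errors)",
        "incorrect_menu_choices": "count and location of incorrect menu choices",
        "incorrect_buttons": "count and location of incorrect buttons selected",
        "errors_of_omission": "count of errors of omission",
        "help_visits": "count and location of visits to online help",
        "tasks_completed_incorrectly": "number and percentage of tasks completed incorrectly",
    }

    # Each session's note-taking sheet gets one field per measure, so the data
    # rolls up directly into the A-versus-B comparison in the final report.
    for key, description in measures.items():
        print(f"{key}: {description}")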


Data collecting: Tips and tricks for taking notes

A common mistake people make when they’re new to conducting usability tests is taking verbatim notes.

Note taking for summative tests can be pretty straightforward. For those you should have benchmark data that you’re comparing against or at least clear success criteria. In that case, data collecting could (and probably should) be done mostly by the recording software (such as Morae). But for formative or exploratory tests, note taking can be more complex.

Why is it so tempting to write down everything?

Interesting things keep happening! Just last week I was the note taker for a summative test in which I noticed (after about 30 sessions) that women and men seemed to be holding the stylus for marking what we were testing differently, and that this difference seemed to be causing a specific category of errors.

But the test wasn’t about using the hardware. This issue wasn’t something we had listed in our test plan as a measure. It was interesting, but not something we could investigate for this test. We will include it as an incidental observation in the report as something to research later.

Note taking don’ts

  • Don’t take notes yourself while you are moderating the session, if you can help it.
  • Don’t take verbatim notes. Ever. If you want that, record the sessions and get transcripts. (Or do what Steve Krug does, and listen to the recordings and re-dictate them into a speech recognition application.)
  • Don’t take notes on anything that doesn’t line up with your research questions.
  • Don’t take notes on anything that you aren’t going to report on (either because you don’t have time or it isn’t in the scope of the test).

 

Tips and tricks

  • DO get observers to take notes. This is, in part, what observers are for. Give them specific things to look for. Some usability specialists like to get observer notes on large sticky notes, which is handy for the debriefing sessions.
  • DO create pick lists, use screen shots, or draw trails. For example, for one study, I was trying to track a path through a web site to see if the IA worked. I printed out the first three levels of the IA as nested lists in two columns so they fit on one legal-sized sheet of paper. Then I used colored highlighters to draw arrows from one topic label to the next as the participant moved through the site, numbering as I went. It was reasonably easy to transfer this data to Excel spreadsheets later for further analysis (see the sketch after this list).
  • DO get participants to take notes for you. If the session is very formative, get the participants to mark up wireframes, screen flows, or other paper widgets to show where they had issues. For example, you might want to find out if a flow of screens matches the process a user typically follows. Start the session asking the participant to draw a boxes-and-arrows diagram of their process. At the end of the session, ask the participant to revise the diagram to a) get any refinements they may have forgotten, b) see gaps between their process and how the application works, or c) some variation or combination of a and b.
  • DO think backward from the report. If you have written a test plan, you should be able to use that as a basis for the final report. What are you going to report on? (Hint: the answers to your research questions, using the measures you said you were going to collect.)
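
For the trail-drawing tip above, here is a minimal sketch (not the author’s actual workflow) of how a marked-up paper trail might be transcribed into a CSV that Excel can open for further analysis. The file name, labels, and participant ID are all hypothetical.

    import csv

    # Participant P2's numbered trail through the IA, transcribed from the marked-up sheet
    trail = ["Home", "Benefits", "Health plans", "Compare plans", "Enroll"]

    # Write one row per step so the path can be sorted, counted, and compared in Excel
    with open("p2_trail.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["participant", "step", "label"])
        for step, label in enumerate(trail, start=1):
            writer.writerow(["P2", step, label])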

Beware the Hawthorne Effect

In a clear and thoughtful article in the May 3, 2007 Journal of Usability Studies (JUS) put out by the Usability Professionals’ Association, Rich Macefield blasts the popular myths around the legendary Hawthorne effect. He goes on to explain very specifically how no interpretation of the Hawthorne effect applies to usability testing.

Popular myth – and Mayo’s (1933) original conclusion – says that human subjects in any kind of research will perform better just because they’re aware they’re being studied.

Several researchers have reviewed the original study that generated the finding, and they say that’s not what really happened. Parsons (1974) was the first to say that the improvement in performance of subjects in the original study was more likely due to feedback they got from the researchers about their performance and what they learned from getting that feedback.

Why it doesn’t apply to usability tests

Macefield convincingly demonstrates why the Hawthorne effect just doesn’t figure into well-designed and professionally executed usability tests:

  • The Hawthorne studies were longitudinal; most usability tests are not.
  • The Hawthorne subjects were experts at their jobs; most usability test participants are novices at something, because what they are using is new.
  • The metrics used in the Hawthorne studies were different from those used in most usability tests.
  • The subjects in the Hawthorne studies had horrible, boring jobs, so they may have been motivated to perform better because of the attention they got from researchers; in usability tests, it’s possible that participants are experiencing unwanted interruptions by being included, or that they’re just doing the test to get paid.
  • The Hawthorne subjects may have thought that taking part in the study would improve their chances for raises or promotions; the days of usability test participants thinking that participating in studies might help them get jobs are probably over.

What about feedback and learning effects?

We want feedback to be part of a good user interface, don’t we? Yes. And we want people to learn from using an interface, don’t we? Again, yes. But, as Macefield says, let’s make sure that all the feedback and learning in a usability test comes from the UI and not from the researcher/moderator. Instead of giving feedback yourself, get to the cause of problems from qualitative data, such as the verbal protocol from participants’ thinking aloud, which shows how they’re thinking about the problem.

Look at effects across tasks or functions

Macefield suggests that if you’re getting grief, you can add a control group to compare against and then look at performance across tasks. For example, you might expect that the test group (using an “improved” UI) would be more efficient or effective on all elements of a test than a control group. But it’s possible that the test group did better on one task while both groups had a similar level of problems on a different task. If this happens, it is unlikely that the moderator has given feedback or prompted learning to create the effect of improved performance, because that effect should be global across tasks and groups.
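
Here is a small worked example, with made-up numbers and hypothetical task names, of the cross-task check Macefield describes: if moderator feedback or learning effects were driving the test group’s performance, the lift should show up on every task, not just one.

    # Successes out of 8 participants per group, per task (hypothetical data)
    success = {
        # task: (control group, test group)
        "find a store location": (4, 7),    # test group clearly better
        "change account settings": (3, 3),  # both groups struggle about equally
    }

    for task, (control, test) in success.items():
        print(f"{task}: control {control}/8, test {test}/8")

    # The improvement is not global across tasks, so a blanket "the moderator
    # helped them" explanation for the test group's better performance is unlikely.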

Macefield closes the article with a couple of pages that could be a lesson out of Defense Against the Dark Arts, setting out very specific ways to argue against any assertion that your findings might be “contaminated.” But don’t just zoom to the end of the piece. The value of the article is in knowing the whole story.

Should you record sessions on video/audio?

Since the beginning of time, the accepted practice among professional usability practitioners has been to record sessions on video. It is something that we tend to do automatically.

There aren’t many obstacles to recording sessions these days. It really only takes a web camera and some relatively inexpensive recording software on the testing PC. (Of course, this assumes that you’re testing software or web sites that run on desktop or laptop computers.)

Recording is inexpensive
The software is pretty easy to use and it doesn’t cause issues with response times or otherwise fool with the user’s experience of using the software or website you’re testing. You get nice, bright colors, picture-in-picture, and you can capture it all digitally. For example, there’s Morae, by TechSmith. (In the interest of full disclosure: I own a license, and I have upgraded to the new version). With Morae, you can capture all sorts of nerdy bits. It’s a good tool.

Even if you decide to use a regular video camera rather than a web cam, or multiple cameras, that technology gets cheaper and more accessible all the time. Storage media are also very inexpensive.


But should you record sessions?

Karl Fast on Boxes and Arrows (from August 2002) has a whole treatise on recording usability test sessions: http://www.boxesandarrows.com/view/recording_screen_activity_during_usability_testing. He called it “crucial.” I say Not.

Know why you’re recording
You may want the video recordings for reviewing, or for sharing with a research partner. You may want your boss to sit down and watch the recorded sessions as evidence. Most practitioners will say that they use video recordings as a backup to their notes, something they could go back to and review.

Most usability tests have fairly few participants. Say you’re doing a study with 5 to 8 participants. If your notes from so few sessions don’t help you analyze the data, you should work on making better data collection tools for yourself or make it a practice to write notes about what happened immediately following each session. Reviewing recordings is making work for yourself.

But do you actually review the recordings? Rarely. And do people who could not attend the sessions review the recordings later? Again, rarely.

Know how you’re storing recordings and control access to protect the privacy of participants
And let’s consider participant privacy and confidentiality. Digital recordings are easier than ever to manage and archive. However, the longer the recordings hang around your company, the more likely it is that they will a) get lost, b) fall into the wrong hands, or c) be misused in some way. A client once asked me if her company could review a tape of a participant because he was coming in for a job interview. I said absolutely not.

You ask participants to sign a recording waiver that sets out specific purposes of the recording. Someone has to make sure that the waiver is respected. That person is the usability specialist who recorded the session to begin with.

Manage recordings carefully
In the form that you ask study participants to sign giving their permission to record, you should also state in plain language

  • How the recording will be used
  • Who will use the recording
  • How long you (or your company) will store the recording
  • How the recording will be destroyed

But get it approved by your legal department, of course.

There are some good reasons to record sessions on video. There are a lot of good reasons not to. Should you?

Keeping a rolling list of issues throughout a study

Design teams are often in a hurry to get results from usability studies. How do you support them while giving good data and ensuring that the final findings are valid?

One thing I do is to start a list of observations or issues after the first two or three participants. I go over this list with my observers and get them to help me expand or clarify each item. Then we agree on which participants we saw have that particular problem.

I continue adding to that list the number of each participant who has the issue, and I note any variations on each observation.

For example, in a study I’m working on this week, we noted on the first day of testing that

Participants talked about location, but scrolled past the map without interacting with it to get to the search results (the map may not look clickable)

I went back later and added the participant numbers for those who we observed doing this:

Participants talked about location, but scrolled past the map without interacting with it to get to the search results (the map may not look clickable) PP, P1, P3

Today, I’ll add more participant numbers. At the end of the study, we’ll have a quick summary of the major issues with a good idea of how many participants had each problem.

There are three things that are “rolling” about the list. First, you’re adding participant numbers to each of the issues as you go along. Second, you’re refining the descriptions of the issues as you learn more from each new participant. Third, you’re adding new issues to the list as they come up (or as you notice things you missed before, or that seemed like one-off problems).
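
As a minimal sketch (not the author’s actual tool, and with hypothetical names), the rolling list can be kept as a simple mapping from each issue description to the participants observed having it, using the map issue from the example above:

    # Rolling list: issue description -> participant numbers observed having it
    rolling_issues = {}

    def note_issue(description, participant):
        """Add a participant to an issue, creating the issue entry if it's new."""
        participants = rolling_issues.setdefault(description, [])
        if participant not in participants:
            participants.append(participant)

    # After day one of testing
    for p in ["PP", "P1", "P3"]:
        note_issue("Talked about location, but scrolled past the map without "
                   "interacting with it (the map may not look clickable)", p)

    # Quick summary for the debrief at any point in the study
    for issue, participants in rolling_issues.items():
        print(f"{issue}: {', '.join(participants)} ({len(participants)} participants)")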

I will still go back and tally all of the official data that I collected during each session, so there may be slight differences between these debriefing notes and the final report, but I have found that the rolling issues list and the final reports usually match pretty closely.

Doing the rolling list keeps your observers engaged and informed, helps you cross-check your data later, and gives designers and developers something to work from right away that is fairly reliable.