Making it easy to collect the data you want to collect

As I have said before, taking notes is rife with danger. It’s so tempting to just write down everything that happens. But you probably can’t deal with all that data. First, it’s just too much. Second, it’s not organized.

Let’s look at an example research question: Do people make more errors on one version of the system than the other?

And we chose these measures to find out the answer (one way to record them is sketched after the list):

  • Count of all incorrect selections (errors)
  • Count and location of incorrect menu choices
  • Count and location of incorrect buttons selected
  • Count of errors of omission
  • Count and location of visits to online help
  • Number and percentage of tasks completed incorrectly
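
If you’re capturing those counts digitally rather than on paper, it helps to settle the structure of the tally sheet before the first session. Here’s a minimal sketch in Python of one way to do that – the field names and example values are made up for illustration, not taken from the study – which flattens each task’s observations into a row you can drop into a spreadsheet later:

    # Hypothetical tally sheet for the measures above; field names are illustrative.
    import csv
    from dataclasses import dataclass, field, asdict

    @dataclass
    class TaskObservations:
        participant: str
        task: str
        version: str                                   # "A" or "B"
        incorrect_selections: int = 0                  # count of all incorrect selections (errors)
        incorrect_menu_choices: list = field(default_factory=list)  # where each wrong menu choice happened
        incorrect_buttons: list = field(default_factory=list)       # where each wrong button was selected
        omissions: int = 0                             # count of errors of omission
        help_visits: list = field(default_factory=list)             # where online help was opened
        completed_correctly: bool = True               # feeds the tasks-completed-incorrectly percentage

    def to_row(obs):
        """Flatten one task's observations into a spreadsheet-friendly row."""
        row = asdict(obs)
        # Keep both the counts and the locations, since the measures call for each.
        for name in ("incorrect_menu_choices", "incorrect_buttons", "help_visits"):
            row[name + "_count"] = len(row[name])
            row[name] = "; ".join(row[name])
        return row

    # Example: one participant, one task, on version B.
    obs = TaskObservations("P01", "Task 1", "B",
                           incorrect_selections=2,
                           incorrect_menu_choices=["File > Export"],
                           help_visits=["search results page"],
                           completed_correctly=False)
    with open("observations.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(to_row(obs).keys()))
        writer.writeheader()
        writer.writerow(to_row(obs))

The point isn’t the code; it’s that each measure in the list above has one obvious place to go, so the note taker never has to decide mid-session where something belongs.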

Continue reading Making it easy to collect the data you want to collect

Translating research questions to data

There’s an art to asking a question and then coming up with a way to answer it. I find myself asking, “What do you want to find out?” The next question is, “How do we know what the answer is?”

Maybe the easiest thing is to take you through an example.

Forming the right question

On a study I’m working on now, we have about 10 research questions, but the heart of the research is about this one:

Do people make more errors on one version of the system than the other?

Note that this is not a hypothesis, which would be worded more like, “We expect people to make more mistakes and to be less likely to complete tasks on the B version of the system than on the A version.” (Some would argue that there are multiple hypotheses embedded in that statement.)

But in our study, we’re not out to prove or disprove anything. Rather, we just want to compare two versions to see what works well about each one and what doesn’t.

 

Choosing data to answer the question

There are dozens of possible measures you can look at in a usability test. Here are just a few examples:

Continue reading Translating research questions to data

Data collecting: Tips and tricks for taking notes

A common mistake people make when they’re new to conducting usability tests is taking verbatim notes.

Note taking for summative tests can be pretty straightforward. For those, you should have benchmark data that you’re comparing against, or at least clear success criteria. In that case, data collecting could (and probably should) be done mostly by the recording software (such as Morae). But for formative or exploratory tests, note taking can be more complex.

Why is it so tempting to write down everything?

Interesting things keep happening! Just last week I was the note taker for a summative test in which I noticed (after about 30 sessions) that women and men seemed to hold the stylus used for marking in the product we were testing differently, and that the difference seemed to be causing a specific category of errors.

But the test wasn’t about using the hardware. This issue wasn’t something we had listed in our test plan as a measure. It was interesting, but not something we could investigate for this test. We will include it as an incidental observation in the report as something to research later.

Note taking don’ts

  • Don’t take notes yourself while you’re moderating the session, if you can help it.
  • Don’t take verbatim notes. Ever. If you want that, record the sessions and get transcripts. (Or do what Steve Krug does, and listen to the recordings and re-dictate them into a speech recognition application.)
  • Don’t take notes on anything that doesn’t line up with your research questions.
  • Don’t take notes on anything that you aren’t going to report on (either because you don’t have time or it isn’t in the scope of the test).

 

Tips and tricks

  • DO get observers to take notes. This is, in part, what observers are for. Give them specific things to look for. Some usability specialists like to get observer notes on large sticky notes, which is handy for the debriefing sessions.
  • DO create pick lists, use screen shots, or draw trails. For example, for one study, I was trying to track a path through a web site to see if the IA worked. I printed out the first three levels of the IA as nested lists in two columns so it fit on one legal-sized sheet of paper. Then I used colored highlighters to draw arrows from one topic label to the next as the participant moved through the site, numbering as I went. It was reasonably easy to transfer this data to Excel spreadsheets later for further analysis (see the sketch after this list).
  • DO get participants to take notes for you. If the session is very formative, get the participants to mark up wireframes, screen flows, or other paper widgets to show where they had issues. For example, you might want to find out if a flow of screens matches the process a user typically follows. Start the session by asking the participant to draw a boxes-and-arrows diagram of their process. At the end of the session, ask the participant to revise the diagram to a) capture any refinements they may have forgotten, b) see gaps between their process and how the application works, or c) do some combination of the two.
  • DO think backward from the report. If you have written a test plan, you should be able to use that as a basis for the final report. What are you going to report on? (Hint: the answers to your research questions, using the measures you said you were going to collect.)
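
For the trail-drawing example in the second bullet, here’s a rough sketch of what that paper trail might look like once it’s transferred into rows for a spreadsheet, plus a trivial summary you can compute from it. It’s hypothetical – the page labels and the “expected” path are invented:

    # Hypothetical transfer of a hand-drawn navigation trail into spreadsheet rows.
    import csv

    expected_path = ["Home", "Products", "Widgets", "Widget 9000"]             # the path the IA intends
    observed_path = ["Home", "Support", "Products", "Widgets", "Widget 9000"]  # the numbered arrows, in order

    rows = [{"participant": "P01",
             "step": step,
             "label": label,
             "on_expected_path": label in expected_path}
            for step, label in enumerate(observed_path, start=1)]

    with open("trail_P01.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["participant", "step", "label", "on_expected_path"])
        writer.writeheader()
        writer.writerows(rows)

    # How many steps were detours off the intended path?
    detours = sum(1 for r in rows if not r["on_expected_path"])
    print(f"P01 took {len(rows)} steps, {detours} of them off the expected path.")

Once every participant’s trail is in that shape, comparing paths across participants (or counting where people first went astray) becomes a sort-and-filter exercise rather than a transcription job.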

The difference between good UX teams and great ones, with Jared Spool

 

I had a blast in this conversation with Jared Spool (recorded June 7, 2008) about what qualities make user experience design teams great and why some teams just can’t get past good.

 

 

The importance of rehearsal

You have designed a study. Everyone seems to be buying in. Scheduling participants is working out and the mix looks good. What’s left to be done except just doing the sessions? Three things:

  1. Practice.
  2. Practice.
  3. Practice.

There are three rounds of practice that I do before I do a “real” session. Jeez, I can hear you say, why would I need to practice so much? Why would you, Dana, who have been doing usability testing for so many years, need to practice so much? I do it for a couple of reasons:

  • It gives me multiple opportunities to clarify the intent of the test, the tasks, and the data measures.
  • I can focus on observing the participant in each regular session because any kinks have been worked out.

Walk through the script and gather tools and materials
The first round is to walk through my test plan and script. I read the script aloud even though I’m by myself. While I’m doing that, I do two things: adjust the wording to sound more natural, and gather the tools and materials I’ll need to do the sessions.

Do a dress rehearsal or dry run
For the second round of practice, I do a dry run of the now-refined script with someone I know filling the role of the participant. We do everything that would normally happen in a session, from greeting and filling out forms, to doing tasks, to closing the session. I might occasionally stop to adjust the script or to make notes about what to do differently next time. I might even ask the stand-in participant (usually a friend, neighbor, or colleague) whether the test is making sense. It’s a combination of dress rehearsal and “logic and accuracy” test: it gets the sequence down and makes sure you’ve got all the necessary pieces.

Pilot the protocol
Finally, there’s the pilot test session. In this pilot, I work with a “real” participant – someone who was screened and scheduled along with all of the other participants. I conduct the session in the same way I intend to conduct all of the following sessions. The twist this time is that observers from the design team should be present. At the end of the session, I debrief with them about the protocol.

Don’t waste good participant data
There have been times when I was rushed by a client or was just too cavalier going into a usability test and did not rehearse. I paid for it with rough sessions whose data I couldn’t fully use. Every time, it’s a reminder that preparation and practice are as important to getting good data as a good test design is.

Are you doing “user testing” or “usability testing”?

Calling anything user testing just seems bad. Okay, contrary to the usual content on this blog – which I’ve tried to make about method and technique – this discussion is philosophical and political. If you feel it isn’t decent to talk about the politics of user research in public, then you should perhaps click away right now.

I know, talking about “users” opens up another whole discussion that we’re not going to have here, now. In this post, I want to focus on the difference between “usability testing” and “user testing” and why we should be specific.

When I say “usability test,” what I’m talking about is testing a design for how usable it is. Or rather, how unusable it is, because that’s what we can measure: how hard it is to use, how many errors people make, how frustrated people feel when using it. Usability testing is about finding the issues that leave a design lacking. By observing usability test sessions, a team can learn what the issues are, make inferences about why they are happening, and then implement informed design solutions.

If someone says “user testing,” what does that mean? Let’s talk about the two words separately.

First, what’s a “user”? It is true that we ask people who use (or who might use) a design to take part in the study of how usable the design is, and some of us might refer to those people as “users” of the product.

Now, “testing” is about using some specified method for evaluating something. If you call it “user testing,” it sure sounds like you are evaluating users, even though what you probably mean to say is that you’re putting a design in front of users to see how they evaluate it. It’s shorthand, but I think it is the wrong shorthand.

If the point is to observe people interacting with a design to see where the flaws in the design are and why those elements aren’t successful, then you’re going beyond user testing. You’re at usability testing. That’s what I do as part of my user research practice. I try not to test the users in the process.

Should you test in a lab or in the field?

I haven’t been in a usability test lab for about a year. Ironically, even as I was writing a book about usability testing, much of my work was field research to learn about particular audiences and their tasks.

And, though my usual position about labs is that exploratory usability testing is probably better done in the user’s environment, I’m excited about getting back into the lab.

Good reasons to test in a lab
I’m doing these upcoming tests in a lab facility because:

  • The testing is quantitative and summative. That is, I’m doing very specific counts of errors and failures that are strictly defined, so I want to control other aspects of the test, such as the computer setup.
  • I don’t want to interact much with the participants. I only want to tell participants when to start their tasks. Otherwise, I will intervene in the session only at prescribed points, so I will direct the session from a different room from where the participants are working.
  • I may have observers, but I won’t know until the last minute. Though I prefer observers to arrive before the session starts and stay through a whole session, at a facility they can come and go because they observe from a separate room.

Good reasons to test in the field
I recently did a usability study in the field. Why?

  • I wanted to learn about the user’s environment (rather than controlling it). In the exploratory study I’m thinking of, I got the best of both worlds: usability testing data in a realistic situation. I learned about lighting levels, surrounding noise, and what the participant’s desk setup was like. But I also got to observe relationships and interactions the participant had with others, typical interruptions (and recovery from those), and how the thing I was testing fit into the person’s work.
  • It was convenient for the participants. They didn’t have to travel to the testing site, and the interruption to their typical day was minimized.
  • The sessions were informal enough that observers could be present in the room (after they had been properly trained). In fact, people from neighboring cubes often chimed in with comments or questions because they’d overheard what we were talking about. I took this to be a good thing: I learned about that communication dynamic, and those eavesdroppers often contributed information that was useful to me in my study.

In a future post, I’ll talk about what to look for in a lab facility if you’re renting one and how to find one.

It’s here (almost)! Handbook of Usability Testing 2.0

I’m tingling, I’m so excited. I like to think that this is a special event in the user experience world. But every book author probably thinks that.

Handbook of Usability Testing, Second Edition by Jeff Rubin and Dana Chisnell ships on Monday, April 28.

This is not your mother’s HUT. Well, of course not. The first edition was published in 1994. Technology isn’t special anymore; it’s everywhere. (There were DOS examples, for heaven’s sake!) For HUT 2.0, Jeff and I

    • Simplified the organization of the main sections
    • Reordered many chapters to more closely reflect the flow of planning and conducting a test
    • Updated dozens and dozens of examples, samples, and stories
    • Expanded and updated discussions about recruiting participants, whether you need a lab, working with observers, analyzing test data, and (we think) the best way to make recommendations
    • Added a chapter on variations on the basic method
    • Populated www.wiley.com/go/usabilitytesting with
        • electronic versions of many of the deliverables used as examples in the book
        • updated references
        • a (we hope) comprehensive list of other resources, such as conferences and seminars, other books, blogs, and podcasts

 

The drawings and diagrams have been freshened and improved. The layout and format promise to be less nerdy and more accessible, too.

Oh, and we benefited from sage reviews by Janice James, founder of the Usability Professionals’ Association, who served as our technical editor (brava!), and from a foreword by Jared Spool.

Here’s the official cite:
Rubin and Chisnell, Handbook of Usability Testing, Second Edition: How to Plan, Design, and Conduct Effective Tests (Wiley, 0470185481, 450 pages, April 28, 2008).

Recruit based on demographics or behavior?

Recruiting for a usability test is hard. (I’ve said this before.) And it’s the most important thing to get right in a test. So how do you decide who to recruit?

Demographics don’t describe behavior
If you buy the argument of your marketing department, you will look at the demographics of the various segments and try to match their proportions. You’ll know the ages, incomes, education levels, ethnicities, and genders of your participants. But does knowing this help you predict behavior or performance? More importantly, with a sample of, say, eight participants, can you generalize the usability problems you discover to the broader cohort?

Probably not. Here’s an example of why.

Though most video gamers are male, some are female, and the problems and successes they have in using a game are similar. And there will be differences within the genders, too. Though most video gamers are young, there are a lot who aren’t. The problems they have in using a game are not likely due to differences in age if the participants have similar expertise on the platform and with the game (or similar games).

Behavior describes performance
Instead, the differences in behavior (interaction between the person and the technology) and performance (whether the human is successful in completing technology-mediated tasks) are much more likely to stem from differences in expertise.

Being younger or older doesn’t necessarily make you an expert at anything. Having a higher or lower household income doesn’t, either. You could argue that education level might, but it usually doesn’t unless something in the test relates to a particular domain the educated person was specifically trained in.

You want people to be motivated to do the tasks you give them when they get into your test situation. This is one place where recruiting can get easier or harder. For example, if you want to test an online banking service or find out whether someone might sign up for a brokerage account online, the participants are more likely to fall into a “mature” category on the age scale than at the younger end or the very old end. That’s simply because people in the mature range are more likely to have or want something like a mortgage than someone younger who isn’t yet in the market to buy a house, or someone older who would rather have a reverse mortgage. You might find some on either end, too. And you want to see a range of people with different aptitudes and skill levels.

How do you recruit, then?

Minimize the demographics for small tests, focus on knowledge and proficiency
Skip the demographic questionnaire (or minimize it at least) and focus on what participants have done related to what you’re testing.
If you are doing a test of a Web site, you might care about what kinds of things participants do on the Internet and how often they do them. Also, when was the last time? For example, what’s the last thing they bought online? Purchasing at an e-commerce site, no matter how well designed the site is, involves complex interaction. It might be a reasonable proxy for searching, narrowing a search, going through a decision process, filling in online forms, handling error and information messages, understanding where they are in an online process, and so on. But it doesn’t matter how old participants are, how educated they are, or (usually) what their household income is.

If you’re testing how well text messaging works, you want to know whether people already text and how much. If they don’t text, you might want some people in your study who receive messages but don’t send them. By asking what their recent experiences have been related to what you want to test (without giving away your tasks), you can learn about motivation as well as expertise.
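
To make that concrete, here’s a minimal sketch of what a behavior-based screening decision might look like for the text-messaging example. The questions, thresholds, and quota sizes are all made up, so treat it as a shape to adapt rather than a recipe:

    # Hypothetical behavior-based screening for a text-messaging study.
    # Buckets candidates by what they actually do, not by demographics.
    def bucket(candidate):
        sends = candidate.get("texts_sent_per_week", 0)
        receives = candidate.get("receives_texts", False)
        if sends >= 10:
            return "frequent texter"
        if sends >= 1:
            return "occasional texter"
        if receives:
            return "receives but doesn't send"
        return "does not qualify"

    # Illustrative quotas for a small study: mostly texters, plus a couple of
    # people who receive messages but don't send them.
    quotas = {"frequent texter": 3, "occasional texter": 3, "receives but doesn't send": 2}

    print(bucket({"texts_sent_per_week": 12}))                          # frequent texter
    print(bucket({"texts_sent_per_week": 0, "receives_texts": True}))   # receives but doesn't send

Notice that nothing in the screener asks about age, income, or education; the recency and frequency questions do the work.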

And this brings us to a discussion about “novice” versus “expert.” But that’s another post.

Does Geography Matter?

Today I’ve been writing for the new edition of Handbook of Usability Testing about setting up a test environment. Should you be in the lab or in the field? If you’re in the lab, what should the setup be like and why? These seemed like fairly easy questions to answer. But then I got to a question that I’ve been wondering about myself for years: Does geography matter?

Nielsen says it doesn’t

Jakob Nielsen’s April 30, 2007 Alertbox (http://www.useit.com/alertbox/user-test-locations.html) says that geography doesn’t matter (unless there are international considerations or a single industry dominates the location or a couple of other things). “You get the same insights regardless of where you conduct user testing, so there’s no reason to test in multiple cities. When a city is dominated by your own industry, however, you should definitely test elsewhere.”

I sent my question around to several usability testing experts. Jared Spool sent one of the most interesting replies, but nearly everyone had experience indicating that geography does matter.

Spool, Killam, and James say it does matter

“Remember,” Jared Spool says, “if you know everything [emphasis mine] there is to know about your users, their tasks, and their contexts, then you never need to test in the first place — all you need to do is be really smart and create a simple design. At that point, it boils down to a simple matter of programming.”

Bill Killam, of User-Centered Design, put it this way:

Performance and subjective preference and motivation are all linked, so any change in location that affects one or more of these can be a factor across all of them. But we usually find it appears only in subjective data – not as much in behavioral observations. Even local variations like testing within the client’s office versus a “neutral lab” sometimes have noticeable effects on things like projected responding. However, also consider regional differences in the use of or exposure to the product being tested. That will certainly affect results. Not to use too specific an example, but consider testing voting machines in the DC area versus a rural location. Or DC, where paper and DREs [direct recording electronic voting machines] already exist, versus NY, where a full-face ballot is used, versus Oregon, where all votes are by [mail].

Janice James contributed, “I’ve found that it IS important to test across multiple locations because I’ve found that the users do differ in terms of their experience level and exposure to product types, and technology, in general.”

Professor Spool and I continued the conversation by IM:

Dana: Okay, so it seems like your answer and Jakob’s article come from different assumptions. Jakob seems to assume that the field work is done. The team knows the context, etc. You seem to be saying that teams don’t always do the field work, first. By Nielsen’s parking meter example, the design team seems to have some background about the location.

Jared: Except teams always think they know everything.

Dana: I also think Jakob is assuming a fairly mature UX [user experience] group.

Jared: But, Jakob says, “except for the few special cases discussed below, we’ve always identified the same usability findings, no matter where we tested. By now, we can clearly conclude that it’s a waste of money to do user testing in more than one city within a country.” Good thing he wasn’t testing soda. Or pop. Or coke.

Dana: Yes, to your example, testing IA [information architecture] is a REALLY good reason to test in multiple locations. And the design team always will get some benefit from being on site – usually something that wasn’t predictable.

Dana: And with the audience for this book, I think it’s safe to assume that they won’t have done much (or any) field work before doing usability testing.

Jared: Right.

Jared: Testing in more than one locale is definitely a luxury.

Jared: I wouldn’t not test at all because you can’t get to more than one venue. Another approach is to make it work great for the local community and look to support and other feedback channels to hear if regional differences pop up. It’s the cross-your-fingers approach to design. It’s worked well through the centuries. Another approach is to look at other competitive/comparable designs for things that might be regional. If the designs have elements that seem different, is there a regional explanation?

Jared: Many design issues are just pure human behavior, independent of any cultural or regional issues.

Dana: I believe that.

Jared: Rolf [Molich] and Carolyn [Snyder] did a study where they tested people in two countries on the same sites. They found 80% of the problems were in common. They found regional biases. People in Europe didn’t understand the purpose of a gift registry (and found it to be quite vulgar). But if you perfected the design for your local venue, you’d nail 80% of the problems found anywhere else, if you extrapolate their results. And that’s a pretty good hit rate for a small budget.

Dana: I agree.

Jared: My guess is that’s what Jakob was trying to say.

Dana: That’s possible.

Jared: It’s hard to say with his shield of impenetrable ego obscuring the real intent.

Dana: Do you mind if I clean up this thread and use it in a blog post?

Jared: Not at all.

Jared: You can even leave in the impenetrable ego comment.

Dana: Makes it more believable that it was a conversation with Jared Spool.

Jared: Remember, all elephants are tall and flat, except for the instances when they are long and skinny.

Dana: That’s right. Anyway, thanks for answering the email and for continuing the discussion. I appreciate it.

Jared: I’m saying his exceptions are the generalized case. And his generalized declaration is rarely executable.