Usability testing and democracy: evaluating ballot designs makes headlines

Today the Brennan Center for Justice at NYU School of Law released a major report about the impact of poor ballot design and unclear instructions on voters, and about the importance of usability testing.
Among the highlights is an overview of the Usability Professionals’ Association (UPA) usability testing kit for local election officials (the LEO Usability Testing Kit). Members of the UPA Usability in Civic Life Project are working with the Brennan Center to provide direct training for election officials.
The report is titled Better Ballots, and can be found on the Brennan Center site:
http://www.brennancenter.org/

http://www.brennancenter.org/content/resource/better_ballots/
The report was released today, and three articles in USA Today and The New York Times highlight it:
Study: Poor ballot designs still affect U.S. elections
http://www.usatoday.com/news/politics/election2008/2008-07-20-ballots_N.htm

Ballot designs are ‘literacy test for voters’
http://www.usatoday.com/news/politics/election2008/2008-07-20-ballot-inside_N.htm

Influx of Voters Expected to Test New Technology
http://www.nytimes.com/2008/07/21/us/21voting.html

Stop writing reports

I’ve had a few questions from readers lately about standardizing reports of usability test results. Why is there no report template in the Handbook? There’s no “template” for a final report because I think you probably shouldn’t be writing reports. Or at least written reports should be minimal. Mini.mal. Though the outline should basically be what’s in the Handbook, what you put in your report depends on

  • The test design and plan
  • What your team needs and can use

And let’s use “report” in the loosest possible way: delivering information to others. That’s it. Your report doesn’t have to be a long, prose-based, descriptive tome. (Not that there’s anything wrong with that.) And the delivery method doesn’t have to be paper.



That leaves a lot of options, from an email with a bulleted list of items, to a “top line” post on a blog or wiki that lightly covers the main trends and patterns. In the middle of the range might be a classic usability test report that describes results and findings in some detail. (I personally dislike slide decks as reports, but a lot of organizations do them.) Any of these will work for any type of test. For summative tests, you may want to go as far as the CIF, or Common Industry Format, established by the International Organization for Standardization (ISO). But if your team has observed the sessions and attended the debriefs, you probably don’t need much of a report. They won’t read it; everything has been discussed and decided already. Whatever you deliver is simply a record of that set of decisions and agreements.

Making it easy to collect the data you want to collect

As I have said before, taking notes is rife with danger. It’s so tempting to just write down everything that happens. But you probably can’t deal with all that data. First, it’s just too much. Second, it’s not organized.

Let’s look at an example research question: Do people make more errors on one version of the system than the other?

And we chose these measures to find out the answer:

  • Count of all incorrect selections (errors)
  • Count and location of incorrect menu choices
  • Count and location of incorrect buttons selected
  • Count of errors of omission
  • Count and location of visits to online help
  • Number and percentage of tasks completed incorrectly

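To keep note taking tied to those measures, it can help to set up the tally sheet before the sessions start. Here’s a minimal sketch in Python of what that might look like; the measure names, task IDs, and locations are hypothetical, not from any particular study.

    from collections import Counter
    from dataclasses import dataclass, field

    # Per-task tally sheet keyed to the measures above. All names here are
    # hypothetical; adapt them to your own test plan.
    @dataclass
    class TaskTally:
        incorrect_selections: int = 0      # count of all errors
        omissions: int = 0                 # errors of omission
        incorrect_menu_choices: Counter = field(default_factory=Counter)  # location -> count
        incorrect_buttons: Counter = field(default_factory=Counter)       # location -> count
        help_visits: Counter = field(default_factory=Counter)             # location -> count
        completed_incorrectly: bool = False

    # One tally per participant, per task, per version of the system.
    tallies = {("P01", "task-1", "version-A"): TaskTally()}

    # During a session, the note taker only increments counters:
    t = tallies[("P01", "task-1", "version-A")]
    t.incorrect_selections += 1
    t.incorrect_menu_choices["File menu"] += 1

    # Afterward, summarize across participants, e.g. the percentage of
    # tasks completed incorrectly for one version:
    failed = [x.completed_incorrectly for x in tallies.values()]
    print(100 * sum(failed) / len(failed))

The point isn’t the code; it’s that every field maps to a measure you said you would collect, and nothing else.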
Continue reading Making it easy to collect the data you want to collect

Translating research questions to data

There’s an art to asking a question and then coming up with a way to answer it. I find myself asking, “What do you want to find out?” The next question is, “How do we know what the answer is?”

Maybe the easiest thing is to take you through an example.

Forming the right question

On a study I’m working on now, we have about 10 research questions, but the heart of the research is about this one:

Do people make more errors on one version of the system than the other?

Note that this is not a hypothesis, which would be worded something more like, “We expect people to make more mistakes and to be more likely not to complete tasks on the B version of the system than on the A version.” (Some would argue that there are multiple hypotheses embedded in that statement.)

But in our study, we’re not out to prove or disprove anything. Rather, we just want to compare two versions to see what works well about each one and what doesn’t.


Choosing data to answer the question

There are dozens of possible measures you can look at in a usability test. Here are just a few examples:

Continue reading Translating research questions to data

Data collecting: Tips and tricks for taking notes

A common mistake people make when they’re new to conducting usability tests is taking verbatim notes.

Note taking for summative tests can be pretty straightforward. For those you should have benchmark data that you’re comparing against or at least clear success criteria. In that case, data collecting could (and probably should) be done mostly by the recording software (such as Morae). But for formative or exploratory tests, note taking can be more complex.

Why is it so tempting to write down everything?

Interesting things keep happening! Just last week I was the note taker for a summative test, and I noticed (after about 30 sessions) that women and men seemed to hold the stylus differently as they marked what we were testing, and that the difference seemed to be causing a specific category of errors.

But the test wasn’t about using the hardware. This issue wasn’t something we had listed in our test plan as a measure. It was interesting, but not something we could investigate for this test. We will include it as an incidental observation in the report as something to research later.

Note taking don’ts

  • Don’t take notes yourself while you are moderating the session, if you can help it.
  • Don’t take verbatim notes. Ever. If you want that, record the sessions and get transcripts. (Or do what Steve Krug does, and listen to the recordings and re-dictate them into a speech recognition application.)
  • Don’t take notes on anything that doesn’t line up with your research questions.
  • Don’t take notes on anything that you aren’t going to report on (either because you don’t have time or it isn’t in the scope of the test).


Tips and tricks

  • DO get observers to take notes. This is, in part, what observers are for. Give them specific things to look for. Some usability specialists like to get observer notes on large sticky notes, which is handy for the debriefing sessions.
  • DO create pick lists, use screen shots, or draw trails (a rough sketch of capturing a trail follows this list). For example, for one study, I was trying to track a path through a web site to see if the IA worked. I printed out the first 3 levels of the IA as nested lists in two columns so it fit on one legal-sized sheet of paper. Then I used colored highlighters to draw arrows from one topic label to the next as the participant moved through the site, numbering as I went. It was reasonably easy to transfer this data to Excel spreadsheets later for further analysis.
  • DO get participants to take notes for you. If the session is very formative, get the participants to mark up wireframes, screen flows, or other paper widgets to show where they had issues. For example, you might want to find out if a flow of screens matches the process a user typically follows. Start the session by asking the participant to draw a boxes-and-arrows diagram of their process. At the end of the session, ask the participant to revise the diagram to (a) capture any refinements they may have forgotten, (b) reveal gaps between their process and how the application works, or (c) some variation or combination of the two.
  • DO think backward from the report. If you have written a test plan, you should be able to use that as a basis for the final report. What are you going to report on? (Hint: the answers to your research questions, using the measures you said you were going to collect.)
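
For the trail-drawing idea in the second item above, here is a rough sketch, in Python, of capturing the same information electronically: an ordered list of topic labels that can be dumped to a spreadsheet afterward. The labels and file name are made up for illustration.

    import csv

    trail = []  # (step number, topic label), in the order visited

    def visit(label: str) -> None:
        """Record the next topic label the participant moves to."""
        trail.append((len(trail) + 1, label))

    # During the session, the note taker logs each move, including backtracks:
    visit("Home")
    visit("Products")
    visit("Support")

    # After the session, write the trail out for further analysis in Excel:
    with open("p01_trail.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["step", "topic_label"])
        writer.writerows(trail)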

The importance of rehearsal

You have designed a study. Everyone seems to be buying in. Scheduling participants is working out and the mix looks good. What’s left to be done except just doing the sessions? Three things:

  1. Practice.
  2. Practice.
  3. Practice.

There are three rounds of practice that I do before I do a “real” session. Jeez, I can hear you say, why would I need to practice so much? Why would you, Dana, who have been doing usability testing for so many years, need to practice so much? I do it for a couple of reasons:

  • It gives me multiple opportunities to clarify the intent of the test, the tasks, and the data measures.
  • I can focus on observing the participant in each regular session because any kinks have been worked out.

Walk through the script and gather tools and materials
The first is to walk through my test plan and script. I read the script aloud even though I’m by myself. While I’m doing that, I do two things: adjust the wording to sound more natural, and gather tools and materials I’ll need to do the sessions.

Do a dress rehearsal or dry run
For the second round of practice, I do a dry run of the now refined script with someone I know filling the role of the participant. We do everything you would normally do in a session, from greeting and filling out forms, to doing tasks, to closing the session. I might occasionally stop the session to adjust the script or to make notes about what to do differently next time. I might even ask the participant (usually a friend, neighbor, or colleague) questions about whether the test is making sense. It’s a combination of dress rehearsal and “logic and accuracy” test to get the sequence down and to make sure you’ve got all the necessary pieces.

Pilot the protocol
Finally, there’s the pilot test session. In this pilot, I work with a “real” participant – someone who was screened and scheduled along with all of the other participants. I conduct the session in the same way I intend to conduct all of the following sessions. The twist this time is that observers from the design team should be present. At the end of the session, I debrief with them about the protocol.

Don’t waste good participant data
There have been times when I’ve been rushed by a client or was just too cavalier about going into a usability test and did not rehearse. I paid for it by having rough sessions that I couldn’t use all the data from. Every time it’s a reminder that preparation and practice are as important to getting good data as a good test design is.

Are you doing “user testing” or “usability testing”?

Calling anything user testing just seems bad. Okay, contrary to the usual content on this blog – which I’ve tried to make about method and technique – this discussion is philosophical and political. If you feel it isn’t decent to talk about the politics of user research in public, then you should perhaps click away right now.

I know, talking about “users” opens up another whole discussion that we’re not going to have here, now. In this post, I want to focus on the difference between “usability testing” and “user testing” and why we should be specific.

When I say “usability test,” what I’m talking about is testing a design for how usable it is. Or rather, how unusable it is, because that’s what we can measure: how hard it is to use, how many errors people make, how frustrated people feel when using it. Usability testing is about finding the issues that leave a design lacking. By observing usability test sessions, a team can learn what the issues are and make inferences about why they are happening, and then implement informed design solutions.

If someone says “user testing,” what does that mean? Let’s talk about the two words separately.

First, what’s a “user”? It is true that we ask people who use (or who might use) a design to take part in the study of how usable the design is, and some of us might refer to those people as “users” of the product.

Now, “testing” is about using some specified method for evaluating something. If you call it “user testing,” it sure sounds like you are evaluating users, even though what you probably mean to say is that you’re putting a design in front of users to see how they evaluate it. It’s shorthand, but I think it is the wrong shorthand.

If the point is to observe people interacting with a design to see where the flaws in the design are and why those elements aren’t successful, then you’re going beyond user testing. You’re at usability testing. That’s what I do as part of my user research practice. I try not to test the users in the process.

Should you test in a lab or in the field?

I haven’t been in a usability test lab for about a year. Ironically, while I was writing a book about usability testing, much of my work was field research to learn about particular audiences and their tasks.

And, though my usual position about labs is that exploratory usability testing is probably better done in the user’s environment, I’m excited about getting back into the lab.

Good reasons to test in a lab
I’m doing these upcoming tests in a lab facility because

  • The testing is quantitative and summative. That is, I’m doing very specific counts of errors and failures that are strictly defined, so I want to control other aspects of the test such as the computer setup.
  • I don’t want to interact much with the participants. I only want to direct participants when to start their tasks. Otherwise, I will intervene in the session only at prescribed points, so I will direct the session from a different room from where the participants are working. 
  • I may have observers, but I won’t know until the last minute. Though I prefer it if observers arrive before the session starts and stay through a whole session, at a facility they can come and go because they can observe from a separate room.


Good reasons to test in the field
I recently did a usability study in the field. Why?

  • I wanted to learn about the user’s environment (rather than controlling it). In the exploratory study I’m thinking of, I got the best of both worlds: usability testing data in a realistic situation. I learned about lighting levels, surrounding noise, and what the participant’s desk setup was like. But I also got to observe relationships and interactions the participant had with others, typical interruptions (and recovery from those), and how the thing I was testing fit into the person’s work.
  • It was convenient for the participants. They didn’t have to travel to the testing site, and the interruption to their typical day was minimized.
  • The sessions were informal enough that observers could be present in the room (after they had been properly trained). In fact, people from neighboring cubes often chimed in with comments or questions because they’d overheard what we were talking about. I took this to be a good thing: I learned about that communication dynamic, and those eavesdroppers often contributed information that was useful to my study.


In a future post, I’ll talk about what to look for in a lab facility if you’re renting one and how to find one.

It’s here (almost)! Handbook of Usability Testing 2.0

I’m tingling, I’m so excited. I like to think that this is a special event in the user experience world. But every book author probably thinks that.

Handbook of Usability Testing, Second Edition by Jeff Rubin and Dana Chisnell ships on Monday, April 28.

This is not your mother’s HUT. Well, of course not. The first edition was published in 1994. Technology isn’t special anymore; it’s everywhere. (There were DOS examples, for heaven’s sake!) For HUT 2.0, Jeff and I

    • Simplified the organization of the main sections
    • Reordered many chapters to more closely reflect the flow of planning and conducting a test
    • Updated dozens and dozens of examples, samples, and stories
    • Expanded and updated discussions about recruiting participants, whether you need a lab, working with observers, analyzing testing data, and (we think) the best way to make recommendations
    • Added a chapter on variations on the basic method
    • Populated www.wiley.com/go/usabilitytesting with
        • electronic versions of many of the deliverables used as examples in the book
        • updated references
        • a (we hope) comprehensive list of other resources, such as conferences and seminars, other books, blogs, and podcasts


The drawings and diagrams have been freshened and improved. The layout and format promise to be less nerdy and more accessible, too.

Oh, and we benefited from sage reviews by Janice James, founder of the Usability Professionals’ Association, as our technical editor (brava!), and from a foreword by Jared Spool.

Here’s the official cite:
Rubin and Chisnell, Handbook of Usability Testing, Second Edition: How to Plan, Design, and Conduct Effective Tests (Wiley, ISBN 0470185481, 450 pages, April 28, 2008).

Recruit based on demographics or behavior?

Recruiting for a usability test is hard. (I’ve said this before.) And it’s the most important thing to get right in a test. So how do you decide who to recruit?

Demographics don’t describe behavior
If you buy the argument of your marketing department, you will look at the demographics of the various segments and try to match their proportions. You’ll know the ages, incomes, education levels, ethnicities, and genders of your participants. But does knowing this help you predict behavior or performance? More importantly, with a sample of, say, eight participants, can you generalize discovered usability problems to the broader cohort?

Probably not. Here’s an example of why.

Though most video gamers are male, some are female, and the problems and successes the two groups have in using a game are similar. There will be differences within the genders, too. Though most video gamers are young, there are a lot who aren’t. The problems they have in using a game are not likely due to differences in age if the participants have similar expertise on the platform and with the game (or similar games).

Behavior describes performance
Instead, the differences in behavior (interaction between the person and the technology) and performance (whether the human is successful in completing technology-mediated tasks) are much more likely to stem from differences in expertise.

Being younger or older doesn’t make you an expert at anything necessarily. Having a higher or lower household income doesn’t, either. You could argue that education level might, but it usually doesn’t unless there’s something in the test that is related to a particular domain that the educated person was specifically trained for.

You want people to be motivated to do the tasks you ask of them when they get into your test situation, and motivation can make it easier or harder to find people. For example, if you want to test an online banking service or find out whether someone might sign up for a brokerage account online, the participants are more likely to fall into a “mature” category on the age scale than at the younger end or the very old end. That is simply because people in the mature range are more likely to have or want a mortgage than someone younger who isn’t in the market to buy a house, or someone older who would rather have a reverse mortgage. But you might find some at either end, too. And you want to see a range of people with different aptitudes and skill levels.

How do you recruit, then?

Minimize the demographics for small tests, focus on knowledge and proficiency
Skip the demographic questionnaire (or minimize it at least) and focus on what participants have done related to what you’re testing.
If you are doing a test of a Web site, you might care about what kinds of things participants do on the Internet and how often they do them. Also, when was the last time? For example, what’s the last thing they bought online? Purchasing at an e-commerce site, no matter how well designed the site is, involves complex interaction. It might be a reasonable proxy for searching, narrowing a search, going through a decision process, filling in online forms, handling error and information messages, understanding where they are in an online process, and so on. But it doesn’t matter how old participants are, how educated they are, or (usually) what their household income is.

If you’re testing how well text messaging works, you want to know whether people do it already and how much. If they don’t do texting, you might want some people in your study who have received messages but don’t send them. By asking what their recent experiences were related to what you want to test (without giving away your tasks), you can find out about motivation as well as expertise.

And this brings us to a discussion about “novice” versus “expert.” But that’s another post.