How many participants are needed for reliable results?

October 19th, 2009 by Larry

Editor’s Note: This article was originally presented as a poster at UPA 2004.  It was also previously published on our blog on December 19th, 2007.  We recommend that you also review Jakob Nielsen’s response.

Tom Tullis, Fidelity Investments, Inc.

Larry Wood, ParallaxLC

Introduction

As card sorting has become more popular, several methodological questions have arisen, such as the effect of instructions on participants, the differences in results between in-person and remote studies, and the number of participants needed for reliable results.  This report addresses the last of these: how many participants a study needs to produce reliable results.  The study was conducted online by the Usability Department at Fidelity Investments, Inc.  A total of 46 cards (items) were used in the study, many of which represented services offered internally by the Usability Department, such as prototyping, usability testing, and card sorting. Categories were not predefined; each participant created and named their own categories (an “open sort”).

A total of 168 employees participated in the card-sorting study. From their data a similarity matrix was created, showing the number of participants who placed each pair of items in the same category. The maximum possible similarity was therefore 168 (all of the participants placed those two items in the same category), and the minimum was 0 (none of the participants placed the two items in the same category).
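The counting behind such a similarity matrix can be sketched in a few lines of Python. This is only an illustration, not the tool used in the study: the function name `similarity_counts`, the list-of-groups data layout, and the three example participants are all assumptions made for the sketch.

```python
from collections import Counter
from itertools import combinations


def similarity_counts(sorts):
    """Count, for each pair of cards, how many participants grouped them together.

    `sorts` holds one entry per participant; each entry is a list of groups,
    and each group is a collection of card names (a hypothetical layout).
    """
    counts = Counter()
    for groups in sorts:
        for group in groups:
            # sorted() gives every pair a canonical (a, b) key
            for a, b in combinations(sorted(set(group)), 2):
                counts[(a, b)] += 1
    return counts


# Three hypothetical participants sorting four hypothetical cards
sorts = [
    [["prototyping", "usability testing"], ["card sorting", "surveys"]],
    [["prototyping", "usability testing", "card sorting"], ["surveys"]],
    [["prototyping"], ["usability testing", "card sorting", "surveys"]],
]
counts = similarity_counts(sorts)
print(counts[("prototyping", "usability testing")])  # 2
print(counts[("prototyping", "surveys")])  # 0
```

With 168 participants in place of three, each count falls between 0 and 168, matching the range described above.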

Data Analysis

The similarity matrix referred to above is the basis on which a statistical cluster analysis is performed, the result of which effectively “averages” the categorization accumulated across a set of participants. The resulting cluster analysis is then displayed as a hierarchical tree structure (known formally as a dendrogram), which shows clusters of similar items on which organization of content in a web site can be based.
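To make the clustering step concrete, here is a minimal sketch of agglomerative clustering over pairwise similarity counts. The article does not say which clustering method was used, so average linkage is an assumption here, and the function name, the four cards, and their counts are hypothetical.

```python
def average_linkage(cards, similarity):
    """Repeatedly merge the two most similar clusters; return the merges in order.

    `similarity(a, b)` returns the co-occurrence count for two cards.
    Average linkage is an assumption; the article does not name a method.
    """
    clusters = [(card,) for card in cards]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                score = sum(similarity(a, b) for a, b in pairs) / len(pairs)
                if best is None or score > best[0]:
                    best = (score, i, j)
        score, i, j = best
        merges.append((clusters[i], clusters[j], score))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical counts from ten participants, keyed by alphabetically ordered pair
counts = {
    ("prototyping", "usability testing"): 9,
    ("card sorting", "surveys"): 7,
    ("card sorting", "usability testing"): 3,
    ("card sorting", "prototyping"): 2,
    ("surveys", "usability testing"): 2,
    ("prototyping", "surveys"): 1,
}


def similarity(a, b):
    return counts.get(tuple(sorted((a, b))), 0)


cards = ["card sorting", "prototyping", "surveys", "usability testing"]
merges = average_linkage(cards, similarity)
for left, right, score in merges:
    print(left, "+", right, "at similarity", score)
```

Read from first merge to last, the list of merges is the dendrogram: the pairs joined early at high similarity are the tightest clusters, and the final merge joins the broadest groupings.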

The major goal of our research was to assess how similar an organizational tree structure derived from a random sample of participants is to the structure based on the full set of 168 participants. This could then be used to estimate the minimum number of participants needed to produce an effective organization. As a means to that end, correlation coefficients were calculated between the similarity matrix for each sample and the matrix for all 168 participants. The assumption is that the more similar the trees, the higher the correlation between the similarity matrices on which those trees are based. Correlation coefficients between the sample similarity matrices and the full similarity matrix were calculated for 10 random samples at each of the following sizes: 2, 5, 8, 12, 15, 20, 30, 40, 50, 60, and 70 participants. A graph of the resulting mean correlation coefficients is shown in Figure 1.
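The resampling procedure can be sketched as follows. Several details are assumptions, since the article does not specify them: Pearson correlation, sampling without replacement, and a data layout in which each participant's sort is a dictionary mapping card to group label. The participant data is synthetic, generated only so the sketch runs, and does not reproduce the study's results.

```python
import random
from itertools import combinations
from statistics import mean


def similarity_vector(sorts, cards):
    """Flatten pairwise co-occurrence counts into a list with a fixed pair order."""
    return [
        sum(1 for assignment in sorts if assignment[a] == assignment[b])
        for a, b in combinations(cards, 2)
    ]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def mean_correlation(sorts, cards, size, rng, repeats=10):
    """Mean correlation between `repeats` random samples and the full data set."""
    full = similarity_vector(sorts, cards)
    values = []
    for _ in range(repeats):
        sample = rng.sample(sorts, size)  # without replacement (an assumption)
        values.append(pearson(similarity_vector(sample, cards), full))
    return mean(values)


# Synthetic data: 168 hypothetical participants sorting 12 hypothetical cards.
# Each card has an "intended" group and is misplaced at random 30% of the time.
rng = random.Random(0)
cards = [f"card {i}" for i in range(12)]
sorts = [
    {card: (i % 4 if rng.random() < 0.7 else rng.randrange(4))
     for i, card in enumerate(cards)}
    for _ in range(168)
]

for size in (2, 5, 8, 12, 15, 20, 30, 40, 50, 60, 70):
    print(size, round(mean_correlation(sorts, cards, size, rng), 3))
```

Only the pairs above the diagonal are correlated, because the similarity matrix is symmetric and its diagonal carries no information.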

Figure 1. Correlation coefficients for various sample sizes, with error bars.

As shown in the graph, the average correlation is a negatively accelerated, increasing function of sample size: it rises steeply at the smaller sample sizes and then levels off, so that beyond 20-30 participants there is little further increase in the correlation coefficient. Also note that the variability of the values, as indicated by the error bars, is much greater for the smaller samples.

An important question is how the function shown in Figure 1 relates to the similarity of the actual tree structures at each sample size. One plausible reading is that the structures derived from samples of more than 30 participants are very similar to the structure derived from the full set of participants, while structures based on smaller samples diverge increasingly as the sample shrinks. To the extent that this is true, it has implications for determining the minimum number of users needed to obtain valid information.

Conclusions

A general conclusion that can be drawn from this research is that it may not be cost-effective to spend resources gathering information from more than 20-30 participants in a card-sorting study. However, it is important to note that even the trees based on the smallest sample sizes are probably closer to the tree for all 168 participants than a structure obtained from speculation by a designer who is not a potential user of the content or application being organized. As always, we must exercise appropriate caution in generalizing results from one study. Results will differ as a function of the homogeneity of the participants in a sample and of such things as the instructions given to the participants for the card-sorting task.
