Techniques to Identify Themes in Qualitative Data

Gery W. Ryan
RAND
1700 Main Street
P.O. Box 2138
Santa Monica, CA 90407-2138
 
H. Russell Bernard
Department of Anthropology
1350 Turlington Hall
University of Florida
Gainesville, FL 32611

Key Words: Theme Identification, Exploratory Analysis, Open Coding, Text Analysis, Qualitative Research Methods

Abstract

Theme identification is one of the most fundamental tasks in qualitative research. It is also one of the most mysterious. Explicit descriptions of theme discovery rarely appear in articles and reports and, when they do, are often relegated to appendices or footnotes. Techniques are shared among small groups of social scientists, and their spread is often impeded by disciplinary or epistemological boundaries. During the proposal-writing phase of a project, investigators struggle to clearly explain and justify plans for discovering themes. These issues are particularly acute when funding reviewers are unfamiliar with qualitative traditions. In this article we outline a dozen techniques that social scientists have used to discover themes in texts. The techniques are drawn from across epistemological and disciplinary boundaries. They range from quick word counts to laborious, in-depth, line-by-line scrutiny. Some methods work well for short answers to open-ended questions while others are more appropriate for rich, complex narratives. Novices and non-native speakers may find some techniques easier than others. No single technique does it all. To us, these techniques are simply tools to help us do better research.

Authors’ Statement

Gery W. Ryan is an Associate Behavioral Scientist at RAND in Santa Monica, California. H. Russell Bernard is professor of anthropology at the University of Florida. The research on which this article is based is part of a National Science Foundation Grant, on "Methods for Conducting Systematic Text Analysis" (SRB-9811166). We wish to thank Stephen Borgatti for his helpful suggestions and two anonymous reviewers for their invaluable comments on earlier drafts of this paper.

 

Introduction

At the heart of qualitative data analysis is the task of discovering themes. By themes, we mean abstract, often fuzzy, constructs which investigators identify before, during, and after data collection. Where do these themes come from?

They come from reviewing the literature, of course. Richer literatures produce more themes. They come from the characteristics of the phenomena being studied. And they come from already-agreed-upon professional definitions, from local common-sense constructs, and from researchers’ values, theoretical orientation, and personal experience with the subject matter (Bulmer 1979; Strauss 1987; Maxwell 1996).

Mostly, though, researchers who consider themselves part of the qualitative tradition in social science induce themes from texts. This is what grounded theorists call open coding, and what classic content analysts call qualitative analysis (Berelson 1952) or latent coding (Shapiro and Markoff 1997). There are many variations on these methods. Unfortunately, however, they are (a) scattered across journals and books that are read by disparate groups of specialists; and (b) often entangled in the epistemological wars that have divided the social sciences. Our goal in this paper is to cross these boundaries and lay out a variety of theme-dredging methods so that all researchers who deal with texts can use them to solve common research problems.

We outline here a dozen helpful techniques for discovering themes in texts. These techniques are based on: (1) an analysis of words (word repetitions, key-indigenous terms, and key-words-in-context); (2) a careful reading of larger blocks of texts (compare and contrast, social science queries, and searching for missing information); (3) an intentional analysis of linguistic features (metaphors, transitions, connectors); and (4) the physical manipulation of texts (unmarked texts, pawing, and cut and sort procedures).

The list is by no means exhaustive. Social scientists are an enterprising lot. Over the last century they have invented solutions to all kinds of problems for managing and analyzing texts, and they will continue to do so. These bursts of methodological creativity, however, are commonly described perfunctorily, or are relegated to footnotes, and get little notice by colleagues across disciplines. The dozen methods we describe here come from across the social sciences and have been used by positivists and interpretivists alike.

1. Word repetitions

We begin with word-based techniques. Word repetitions, key-indigenous terms, and key-words-in-contexts (KWIC) all draw on a simple observation—if you want to understand what people are talking about, look at the words they use.

Words that occur a lot are often seen as being salient in the minds of respondents. D'Andrade notes that "perhaps the simplest and most direct indication of schematic organization in naturalistic discourse is the repetition of associative linkages" (1991:294). He observes that "indeed, anyone who has listened to long stretches of talk, whether generated by a friend, spouse, workmate, informant, or patient, knows how frequently people circle through the same network of ideas" (1991:287).

Word repetitions can be analyzed formally and informally. In the informal mode, investigators simply read the text and note words or synonyms that people use a lot. For example, while conducting multiple in-depth interviews with Tony, a retired blue collar worker in Connecticut, Claudia Strauss (1992) found that Tony repeatedly referred to ideas associated with greed, money, businessmen, siblings, and "being different." These repetitions indicated to Strauss that these ideas were important, recurring themes in Tony’s life. Strauss displayed the relationships among these ideas by writing the concepts on a piece of paper and connecting them with lines and explanations. Computer programs such as ATLAS.ti and NUD*IST let you do this kind of connect-the-dots exercise by computer.1

A more formal analysis of word frequencies can be done by generating a list of all the unique words in a text and counting the number of times each occurs. Computers can easily generate word-frequency lists from texts, and such lists are a quick and easy way to look for themes. Ryan and Weisner (1996) asked fathers and mothers of adolescents: "Describe your children. In your own words, just tell us about them." Ryan and Weisner produced a list of all the unique words in the set of responses and the number of times each word was used by mothers and by fathers. Mothers were more likely than fathers to use words like friends, creative, time, and honest; fathers were more likely than mothers to use words like school, good, lack, student, enjoys, independent, and extremely. Ryan and Weisner used this information as clues for themes that they would use later in actually coding the texts.
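A word-frequency comparison of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the procedure Ryan and Weisner used: the stopword list, the function names, and the sample responses are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "he", "she", "to", "of",
             "in", "his", "her", "with", "very"}

def word_counts(responses):
    """Count how often each non-stopword appears across a list of responses."""
    counts = Counter()
    for text in responses:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts

def compare_groups(group_a, group_b):
    """Return {word: (count_a, count_b)}, largest absolute difference first."""
    a, b = word_counts(group_a), word_counts(group_b)
    rows = {w: (a[w], b[w]) for w in set(a) | set(b)}
    # Ties are broken alphabetically so the ordering is reproducible.
    return dict(sorted(rows.items(),
                       key=lambda kv: (-abs(kv[1][0] - kv[1][1]), kv[0])))

# Invented responses, for illustration only.
mothers = ["She is creative and honest.", "He spends time with his friends."]
fathers = ["He is a good student.", "She enjoys school and is very independent."]

for word, (m, f) in compare_groups(mothers, fathers).items():
    print(f"{word:12s} mothers={m} fathers={f}")
```

Words at the top of the list are those used most unevenly by the two groups. With corpora of unequal size, raw counts should be converted to rates before comparing.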

2. Indigenous categories

Another way to find themes is to look for local terms that may sound unfamiliar or are used in unfamiliar ways. Patton (1990:306, 393-400) refers to these as "indigenous categories" and contrasts them with "analyst-constructed typologies." Grounded theorists refer to the process of identifying local terms as in vivo coding (Strauss 1987:28-32, Strauss and Corbin 1990:61-74).

Understanding indigenous categories and how they are organized has long been a goal of cognitive anthropologists. The basic idea in this area of research is that experience and expertise are often marked by specialized vocabulary. For example, Spradley (1972) recorded conversations among tramps at informal gatherings, meals, card games, and bull sessions. As the men talked to each other about their experiences, there were many references to making a flop.

Spradley combed through his recorded material and notes looking for verbatim statements made by informants about his topic. On analyzing the statements, he found that most of them could fit into subcategories such as kinds of flops, ways to make flops, ways to make your own flop, kinds of people who bother you when you flop, ways to make a bed, and kinds of beds. Spradley then returned to his informants and sought additional information from them on each of the subcategories. For other classic examples of coding for indigenous categories see Becker’s (1993) description of medical students’ use of the word crock, and Agar’s (1973) description of drug addicts’ understandings of what it means to shoot up.

3. Key-words-in-context (KWIC)

Key-words-in-context (KWIC) are closely associated with indigenous categories. KWIC is based on a simple observation: if you want to understand a concept, then look at how it is used. In this technique, researchers identify key words and then systematically search the corpus of text to find all instances of the word or phrase. Each time they find a word, they make a copy of it and its immediate context. Themes get identified by physically sorting the examples into piles of similar meaning.
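The search-and-copy step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `kwic`, the default window of five words, and the dictionary-of-documents input are our own choices, and the sketch handles single-word keys only, not phrases.

```python
import re

def kwic(texts, keyword, window=5):
    """Return (doc_id, left_context, match, right_context) for each hit.

    `texts` maps a document id to its text. `window` is the number of
    words kept on each side of the key word (an assumed default).
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for doc_id, text in texts.items():
        tokens = text.split()
        for i, tok in enumerate(tokens):
            # Ignore surrounding punctuation when matching the token.
            if pattern.fullmatch(tok.strip(".,;:!?\"'()")):
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((doc_id, left, tok, right))
    return hits

# Invented sentence, for illustration only.
corpus = {"d1": "The critic offered a deconstruction of the film."}
for doc_id, left, word, right in kwic(corpus, "deconstruction", window=2):
    print(f"{doc_id}: {left} [{word}] {right}")
```

Each returned tuple is the electronic equivalent of one copied slip, ready to be sorted into piles of similar meaning.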

Deconstruction is an abstract and often incomprehensible term used by social scientists, literary critics and writers in the popular press. Jacques Derrida, who coined the term, refused to define it. To Derrida, the meaning of any text is inherently unstable and variable. Wiener (1997) was curious as to how the concept of deconstruction was used in the popular press. He used a text-based data set (such as Lexis/Nexis) to find instances of the word in popular publications. He found the term used in everything from Entertainment Weekly to the American Banker. Wiener concludes that:

Most often writers use "deconstruction" as a fancy word for "analysis" or "explanation," or else as an upscale synonym for "destruction." But in some genres, like rock music writing, the term isn't negative at all; it has become a genuinely floating signifier, a verbal gesture that implies a kind of empty intellectual sophistication.

Word-based techniques are typically fast and efficient ways to start looking for themes. We find that they are particularly useful at early stages of theme identification. These techniques are also easy for novice researchers to apply. Nothing, however, beats a careful scrutiny of the texts for finding themes that may be more subtle or that don’t get signified directly in the lexicon of the text. Scrutiny-based techniques are more time-intensive and require a lot of attention to details and nuances.

4. Compare and contrast

The compare and contrast approach is based on the idea that themes represent the ways in which texts are either similar to or different from each other. Glaser and Strauss (1967:101-116) refer to this as the "constant comparison method." [For other good descriptions of the technique see Glaser (1978:56-72) and Strauss and Corbin (1990:84-95).] Typically, grounded theorists begin by conducting a careful line-by-line analysis. They read each line or sentence and ask themselves, "What is this about?" and "How does it differ from the preceding or following statements?" This kind of detailed work keeps the researcher focused on the data themselves rather than on theoretical flights of fancy (Charmaz 1990).

This approach is like interviewing the text and is remarkably similar to the ethnographic interviewing style that Spradley talks about using with his informants (1979:160-172). Researchers compare pairs of texts by asking "How is this text different from the preceding text?" and "What kinds of things are mentioned in both?" They ask hypothetical questions like "What if the informant who produced this text had been a woman instead of a man?" and "How similar is this text to my own experiences?" Bogdan and Biklen (1982:153) recommend reading through passages of text and asking "What does this remind me of?" Like a good journalist, investigators compare answers to questions across people, space, and time.

5. Social science queries

Besides identifying indigenous themes (themes that characterize the experience of informants), researchers are interested in understanding how textual data illuminate questions of importance to social science. Spradley (1979:199–201) suggested searching interviews for evidence of social conflict, cultural contradictions, informal methods of social control, things that people do in managing impersonal social relationships, methods by which people acquire and maintain achieved and ascribed status, and information about how people solve problems. Bogdan and Biklen (1982:156-162) suggested examining the setting and context, the perspectives of the informants, and informants’ ways of thinking about people, objects, processes, activities, events, and relationships. "Moving across substantive areas," says Charmaz, "fosters developing conceptual power, depth, and comprehensiveness" (1990:1163).

Strauss and Corbin (1990:158-175) urge investigators to be more sensitive to conditions, actions/interactions, and consequences of a phenomenon and to order these conditions and consequences into theories. To facilitate this, they offer a useful tool called the conditional matrix. The conditional matrix is a set of concentric circles, each level corresponding to a different unit of influence. At the center are actions and interactions. The inner rings represent individual and small group influences on these actions, and the outer rings represent international and national effects.

Querying the text as a social scientist is a powerful technique because investigators concentrate their efforts on searching for specific kinds of topics – any of which are likely to generate major social and cultural themes. By examining the data from a more theoretical perspective, however, researchers must be careful that they do not overfit the data – that is, find only that for which they are looking. There is a trade-off between bringing a lot of prior theorizing to the theme-identification effort and going at it fresh. Prior theorizing, as Charmaz says (1990), can inhibit the forming of fresh ideas and the making of surprising connections. Assiduous theory-avoidance brings the risk of not making the connection between data and important research questions. Novice researchers may be more comfortable with the tabula rasa approach. More seasoned researchers, who are more familiar with theory issues, may find the social science query approach more compatible with their interests.

6. Searching for missing information

The final scrutiny-based approach we describe works in reverse from typical theme identification techniques. Instead of identifying themes that emerge from the text, investigators search for themes that are missing in the text.

Much can be learned from a text by what is not mentioned. As early as 1959, propaganda analysts found that material not covered in political speeches was sometimes more predictive than material that was covered (George 1959). Sometimes silences indicate areas that people are unwilling or afraid to discuss. For instance, women with strong religious convictions may fail to mention abortion during discussions of birth control. In power-laden interviews, silence may be tied to implicit or explicit domination (Gal 1991). In a study of birth planning in China, Greenhalgh (1994) surveyed 1,011 ever-married women and gathered social and economic histories from 150 families. She conducted in-depth interviews with present and former officials (known as cadres), and collected documentary evidence from local newspapers, journals and other sources. Greenhalgh notes that "Because I was largely constrained from asking direct questions about resistance, the informal record of field notes, interview transcripts, and questionnaire data contains few overt challenges to state policy" (1994:9). Greenhalgh concludes, however, that

I believe that in their conversations with us, both peasants and cadres made strategic use of silence to protest aspects of the policy they did not like. Cadres, for example were loathe to comment on birth-planning campaigns; peasant women were reluctant to talk about sterilization. These silences form one part of the unofficial record of birth planning in the villages. More explicit protests were registered in informal conversations. From these interactions emerged a sense of profound distress of villagers forced to choose between a resistance that was politically risky and a compliance that violated the norms of Chinese culture and of practical reason (1994:9).

Other times, absences may indicate primal assumptions made by respondents. Spradley (1987:314) noted that when people tell stories, they assume that their listeners share many assumptions about how the world works and so they leave out information that "everyone knows." He called this process abbreviating. Price (1987) takes this observation and builds on it. Thus, she looks for what is not said in order to identify underlying cultural assumptions. Price finds the missing pieces by trying to translate what people say in the stories into something that the general public would understand.

Of all the scrutiny-based techniques, searching for missing information is the most difficult. There are many reasons people do not mention topics. In addition to avoiding sensitive issues or assuming the investigator already knows about the topic, people may not trust the interviewer, may not wish to speak when others are present, or may not understand the investigator’s questions. Distinguishing between when informants are unwilling to discuss topics and when they assume the investigator already knows about the topic requires a lot of familiarity with the subject matter.

In addition to word- and scrutiny-based techniques, researchers have used linguistic features such as metaphors, topical transitions, and keyword connectors to help identify themes.

7. Metaphors and analogies

Schema analysts suggest searching through text for metaphors, similes, and analogies (D’Andrade 1995, Quinn and Strauss 1997). The emphasis on metaphor owes much to the pioneering work by Lakoff and Johnson (1980) and the observation that people often represent their thoughts, behaviors, and experiences with analogies.

Naomi Quinn (1997) has analyzed hundreds of hours of interviews to discover concepts underlying American marriage and to show how these concepts are tied together. She began by looking at patterns of speech and at the repetition of key words and phrases, paying particular attention to informants' use of metaphors and the commonalities in their reasoning about marriage. Nan, one of her informants, says that "marriage is a manufactured product." This popular metaphor indicates that Nan sees marriages as something that has properties, like strength and staying power, and as something that requires work to produce. Some marriages are "put together well," while others "fall apart" like so many cars or toys or washing machines (Quinn 1987:174).

The object is to look for metaphors in rhetoric and deduce the schemas, or underlying principles, that might produce patterns in those metaphors. Quinn found that people talk about their surprise at the breakup of a marriage by saying that they thought the couple’s marriage was "like the Rock of Gibraltar" or that they thought the marriage had been "nailed in cement." People use these metaphors because they assume that their listeners know that cement and the Rock of Gibraltar are things that last forever.

But Quinn reasons that if schemas or scripts are what make it possible for people to fill in around the bare bones of a metaphor, then the metaphors must be surface phenomena and cannot themselves be the basis for shared understanding. Quinn found that the hundreds of metaphors in her corpus of texts fit into just eight linked classes that she calls: lastingness, sharedness, compatibility, mutual benefit, difficulty, effort, success (or failure), and risk of failure. For example, Quinn’s informants often compared marriages (their own and those of others) to manufactured and durable products ("it was put together pretty good") and to journeys ("we made it up as we went along; it was a sort of do-it-yourself project"). Quinn sees these metaphors, as well as references to marriage as "a lifetime proposition," as exemplars of the overall expectation of lastingness in marriage.

Other examples of the search for cultural schemas in texts include Holland’s (1985) study of the reasoning that Americans apply to interpersonal problems, Kempton’s (1987) study of ordinary Americans’ theories of home heat control, and Claudia Strauss’s (1997) study of what chemical plant workers and their neighbors think about the free enterprise system.

8. Transitions

Another linguistic approach is to look for naturally occurring shifts in thematic content. Linguistic forms of transition vary between oral and written texts. In written texts, new paragraphs are often used by authors to indicate either subtle or abrupt shifts in topics. In oral speech, pauses, change in tone, or particular phrases may indicate thematic transitions. Linguists who have worked with precisely recorded texts in Native American languages have noticed the recurrence of elements like "Now," "Then," "Now then," and "Now again." These often signal the separation of verses and "once such patterning has been discovered in cases with such markers, it can be discerned in cases without them" (Hymes 1977:439).

For example, Sherzer (1994) presents a detailed analysis of a two-hour performance by Chief Olopinikwa of a traditional San Blas Kuna chant. The chant was recorded in 1970. Like many linguistic anthropologists, Sherzer had taught an assistant, Alberto Campos, to use a phonetic transcription system. After the chant, Sherzer asked Campos to transcribe and translate the tape. Campos put Kuna and Spanish on left- and right-facing pages (1994:907). By studying Campos’s translation against the original Kuna, Sherzer was able to pick out certain recurrent features. Campos left out the chanted utterances of the responding chief (usually something like "so it is"), which turned out to be markers for verse endings in the chant. Campos also left out so-called framing words and phrases (like "Thus" at the beginning of a verse and "it is said, so I pronounce" at the end of a verse). These contribute to the line and verse structure of the chant. Finally, "instead of transposing metaphors and other figurative and allusive language into Spanish" Campos "explains them in his translation" (Sherzer 1994:908).

In two-party and multiparty speech, transitions occur naturally. Conversation or discourse analysts closely examine linguistic features such as turn-taking and speaker interruptions to identify transitions in speech sequences. For a good overview, see Silverman (1993:114-143).

9. Connectors

A third linguistic approach is to look carefully at words and phrases that indicate relationships among things. For example, causal relationships are often indicated by such words and phrases as because, since, and as a result. Words such as if or then, rather than, and instead of often signify conditional relationships. The phrase is a is often associated with taxonomic categories. Time-oriented relationships are expressed with words such as before, after, then, and next. Typically negative characteristics occur less often than positive characteristics. Simply searching for the words not, no, none, or the prefix non may be a quick way to identify themes. Investigators can discover themes by searching on such groups of words and looking to see what kinds of things the words connect.
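A connector search of this kind can be sketched as a small lookup over sentences. The word groups below come from the paragraph above; the group labels, the function name, and the naive sentence splitter are our own assumptions, and the sketch does not handle the prefix non.

```python
import re

# Connector groups from the text; the labels are hypothetical names of our own.
CONNECTORS = {
    "causal": ["because", "since", "as a result"],
    "conditional": ["if", "rather than", "instead of"],
    "taxonomic": ["is a"],
    "temporal": ["before", "after", "then", "next"],
    "negative": ["not", "no", "none"],
}

def find_connectors(text):
    """Return (relation, connector, sentence) for every connector found."""
    # Naive splitter: breaks after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    results = []
    for sentence in sentences:
        for relation, phrases in CONNECTORS.items():
            for phrase in phrases:
                # Word boundaries keep "no" from matching inside "know".
                if re.search(r"\b" + re.escape(phrase) + r"\b",
                             sentence, re.IGNORECASE):
                    results.append((relation, phrase, sentence))
    return results

# Invented sentences, for illustration only.
sample = "I left because it rained. A flop is a place to sleep."
for relation, phrase, sentence in find_connectors(sample):
    print(f"{relation:12s} {phrase:12s} {sentence}")
```

The hits are only candidates. The investigator still has to read each sentence to see what the connector actually links, since a word like since can be temporal as well as causal.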

What other kinds of relationships might be of interest to social scientists? Casagrande and Hale (1967) suggest looking for: attributes (e.g., X is Y), contingencies (e.g., if X, then Y), functions (e.g., X is a means of affecting Y), spatial orientations (e.g., X is close to Y), operational definitions (e.g., X is a tool for doing Y), examples (e.g., X is an instance of Y), comparisons (e.g., X resembles Y), class inclusions (X is a member of class Y), synonyms (e.g., X is equivalent to Y), antonyms (e.g., X is the negation of Y), provenience (e.g., X is the source of Y), and circularity (e.g., X is defined as X). [For lists of kinds of relationships that may be useful for identifying themes see Burton and Kirk (1980:271), Werner and Schoepfle (1987) and Lindsay and Norman (1972).]

Investigators often use the linguistic features described above unconsciously. Metaphors, transitions, and connectors are all part of a native speaker’s ability to grasp meaning in a text. By making these features more explicit, we sharpen our ability to find themes.

Finally, we turn to more tactile approaches for theme discovery. Each of the next three techniques requires some physical manipulation of the text itself.

10. Unmarked texts

One way to identify new themes is to examine any text that is not already associated with a theme (Ryan 1999). This technique requires multiple readings of a text. On the first reading, salient themes are clearly visible and can be quickly and readily marked with different colored pencils or highlighters. In the next stage, the search is for themes that remain unmarked. This tactic, marking obvious themes early and quickly, forces the search for new and less obtrusive themes.
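When texts are coded electronically, the unmarked stretches can be listed automatically. The sketch below assumes, purely for illustration, that each coded segment is stored as a pair of character offsets; the function name and the data layout are hypothetical.

```python
def unmarked_spans(text, coded_spans):
    """Return (start, end, excerpt) for stretches of text not yet coded.

    `coded_spans` is a list of (start, end) character offsets with `end`
    exclusive. Spans may overlap or arrive in any order.
    """
    gaps = []
    cursor = 0
    for start, end in sorted(coded_spans):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < len(text):
        gaps.append((cursor, len(text)))
    # Keep only gaps that contain at least one letter or digit.
    return [(s, e, text[s:e].strip())
            for s, e in gaps
            if any(ch.isalnum() for ch in text[s:e])]

# Invented example: the first clause has already been marked.
passage = "Money is tight. My brother never calls."
for start, end, excerpt in unmarked_spans(passage, [(0, 15)]):
    print(start, end, excerpt)
```

The output is the material that the second reading should concentrate on.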

11. Pawing

We highly recommend pawing through texts and marking them up with different colored highlighter pens. Sandelowski (1995a:373) observes that analysis of texts begins with proofreading the material and simply underlining key phrases "because they make some as yet inchoate sense." Bernard (2000) refers to this as the ocular scan method, otherwise known as eyeballing. In this method, you get a feel for the text by handling your data multiple times. [Bogdan and Biklen (1982:165) suggest reading over the text at least twice.] Researchers have been known to spread their texts out on the floor, tack bunches of them to a bulletin board, and sort them into different file folders. By living with the data, investigators can eventually perform the interocular percussion test—which is where you wait for patterns to hit you between the eyes.

This may not seem like a very scientific way to do things, but it is one of the best ways we know of to begin hunting for patterns in qualitative data. Once you have a feel for the themes and the relations among them, we see no reason to struggle bravely on without a computer. Of course, a computer is required from the outset if the project involves hundreds of interviews, or if it’s part of a multi-site, multi-investigator effort. Even then, there is no substitute for following hunches and intuitions in looking for themes to code in texts (Dey 1993).

12. Cutting and sorting

Cutting and sorting is a more formal way of pawing and a technique we both use quite a bit. It is particularly useful for identifying subthemes. The approach is based on a powerful trick most of us learned in kindergarten and requires paper and scissors. We first read through the text and identify quotes that seem somehow important. We cut out each quote (making sure to maintain some of the context in which it occurred) and paste the material on small index cards. On the back of each card, we then write down the quote’s reference—who said it and where it appeared in the text. Then we lay out the quotes randomly on a big table and sort them into piles of similar quotes. Then we name each pile. These are the themes. This can be done with tag and search software, but we find that nothing beats the ability to manually sort and group the cards.

There are many variations on this pile-sorting technique. The principal investigator on a large project might ask several team members to sort the quotes into named piles independently. This is likely to generate a longer list of possible themes than would be produced by a group discussion. In really large projects, pairs of coders could sort the quotes together and decide on the names for the piles. The pile-sorting exercise should be video- or audiotaped, and investigators should pay close attention to discussions, whether between themselves and coders or between coders, about which quotes belong together and why. These conversations are about as close as we will ever get to witnessing the emergence of themes.

Barkin et al. (1999) interviewed clinicians, community leaders, and parents about what physicians could and did do to prevent violence among youth. These were long, complex interviews, so Barkin et al. broke the coding process into two steps. They started with three major themes that they developed from theory. The principal investigator went through the transcripts and cut out all the quotes that pertained to each of the major themes. Then four other coders independently sorted the quotes from each major theme into piles. The pile sort data were then analyzed with multidimensional scaling and cluster analysis to identify subthemes shared across coders. [See Patterson et al. (1993) for another example.]
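The aggregation step behind this kind of analysis can be sketched as follows. This is not Barkin et al.'s procedure: it is a minimal stand-in that turns several coders' piles into a quote-by-quote similarity table and then applies a simple average-linkage clustering with an assumed cutoff of 0.5. Multidimensional scaling is not shown, and all names and data are invented.

```python
from itertools import combinations

def similarity_matrix(sorts, items):
    """Proportion of coders who put each pair of quotes in the same pile.

    `sorts` has one entry per coder; each entry is a list of piles,
    and each pile is a set of quote ids drawn from `items`.
    """
    counts = {pair: 0 for pair in combinations(sorted(items), 2)}
    for piles in sorts:
        for pile in piles:
            for pair in combinations(sorted(pile), 2):
                counts[pair] += 1
    return {pair: n / len(sorts) for pair, n in counts.items()}

def cluster(sim, items, threshold=0.5):
    """Average-linkage agglomerative clustering.

    Merging stops when no two clusters have a mean similarity of at
    least `threshold` (a hypothetical cutoff chosen for illustration).
    """
    clusters = [frozenset([i]) for i in sorted(items)]

    def mean_sim(a, b):
        pairs = [tuple(sorted((x, y))) for x in a for y in b]
        return sum(sim[p] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda ab: mean_sim(*ab))
        if mean_sim(a, b) < threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return [sorted(c) for c in clusters]

# Invented pile sorts from three coders over four quotes.
quotes = ["q1", "q2", "q3", "q4"]
sorts = [
    [{"q1", "q2"}, {"q3", "q4"}],
    [{"q1", "q2"}, {"q3", "q4"}],
    [{"q1", "q2", "q3"}, {"q4"}],
]
sim = similarity_matrix(sorts, quotes)
print(cluster(sim, quotes))
```

Each resulting cluster is a candidate subtheme on which the coders broadly agree. In practice, an established statistics package would be a safer choice than hand-rolled clustering.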

Jehn and Doucet (1997) had short answers to open-ended questions. They found that several coders could easily sort these paragraph-length descriptions of inter- and intra-ethnic conflict. Then, like Barkin et al., Jehn and Doucet used multidimensional scaling and cluster analysis to identify subthemes of conflict.

Another advantage to the cutting and sorting technique is that the data can be used to systematically describe how such themes are distributed across informants. After the piles have been formed and themes have been named, simply turn over each quote and identify who mentioned each theme. (If the people sorting the quotes are unaware of who the quotes came from, this is an unbiased way of coding.)
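Turning the cards over amounts to building a theme-by-informant table. The sketch below assumes each card is recorded as a small dictionary; the field names, informant labels, and themes are invented for illustration.

```python
from collections import defaultdict

def theme_by_informant(cards):
    """Return {theme: {informant: number of quotes}} from sorted cards."""
    table = defaultdict(lambda: defaultdict(int))
    for card in cards:
        table[card["theme"]][card["informant"]] += 1
    # Convert nested defaultdicts to plain dicts for easy inspection.
    return {theme: dict(row) for theme, row in table.items()}

# Invented cards: the pile name is the theme, the back of the card
# gives the informant.
cards = [
    {"informant": "A", "theme": "trust"},
    {"informant": "A", "theme": "trust"},
    {"informant": "B", "theme": "trust"},
    {"informant": "B", "theme": "money"},
]

for theme, row in theme_by_informant(cards).items():
    print(theme, row)
```

The table shows at a glance which themes are widely shared and which come from only one or two informants.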

Discussion

The variety of methods available for coding texts raises some obvious questions:

(1) Which technique generates more themes?

Frankly, we don’t know. There are just too many factors that influence the number of themes that are generated, including the technique itself, who and how many people are looking for themes, and the kind and amount of texts being analyzed. If the goal is to generate as many themes as possible—which is often the case in initial exploratory phases of research—then more is better. This means using multiple techniques, investigators, and texts.

Nowhere is a multiple technique approach better exemplified than in the work of Jehn and Doucet (1996, 1997). Jehn and Doucet asked 76 U.S. managers who had worked in Sino-American joint ventures to describe recent interpersonal conflicts with business partners. Each person described a situation with a same-culture manager and a different-culture manager. First, Jehn and Doucet generated separate lists of words from the intercultural and intracultural conflict narratives. They asked three expatriate managers to act as judges and to identify all the words that were related to conflict. They settled on a list of 542 conflict words from the intercultural list and 242 words from the intracultural list.

Jehn and Doucet then asked the three judges to sort the words into piles or categories. The experts identified 15 subcategories for the intercultural data—things like conflict, expectations, rules, power, and volatile—and 15 categories for the intracultural data—things like conflict, needs, standards, power, contentious, and lose. Taking into consideration the total number of words in each corpus, conflict words were used more in intracultural interviews and resolution terms were more likely to be used in intercultural interviews.
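Taking corpus size into account usually means comparing rates, not raw counts. The sketch below shows one common normalization, occurrences per 1,000 words. The source does not say which normalization Jehn and Doucet used, and the counts here are invented.

```python
def rate_per_thousand(term_count, corpus_size):
    """Occurrences of a word category per 1,000 words of corpus."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return 1000 * term_count / corpus_size

# Invented figures: a smaller corpus can have fewer raw hits
# and still show the higher rate.
print(rate_per_thousand(120, 10_000))  # smaller corpus
print(rate_per_thousand(150, 20_000))  # larger corpus
```

Here the second corpus contains more conflict words in absolute terms, yet the first uses them more often per word of text.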

Jehn and Doucet (1996, 1997) also used traditional content analysis on their data. They had two coders read the 152 conflict scenarios (76 intracultural and 76 intercultural) and evaluate (on a 5-point scale) each on 27 different themes they had identified from the literature. This produced two 76x27 scenario-by-theme profile matrices, one for the intracultural conflicts and one for the intercultural conflicts. The first three factors from the intercultural matrix reflect: (1) interpersonal animosity and hostility; (2) aggravation; and (3) the volatile nature of the conflict. The first two factors from the intracultural matrix reflect: (1) hatred and animosity with a volatile nature and (2) conflicts conducted calmly with little verbal intensity.

Finally, Jehn and Doucet identified the 30 intracultural and the 30 intercultural scenarios that they felt were the most clear and pithy. They recruited fifty more expatriate managers to assess the similarities (on a 5-point scale) of 60–120 randomly selected pairs of scenarios. When combined across informants, the managers' judgments produced two aggregate, scenario-by-scenario similarity matrices, one for the intracultural conflicts and one for the intercultural conflicts.
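The step of combining judgments across informants can be sketched as follows. We assume here that the ratings for each pair of scenarios are simply averaged over the informants who rated that pair. The source says only that the judgments were "combined," so the averaging rule, the function name, and the example ratings are all hypothetical.

```python
from collections import defaultdict

def aggregate_similarity(judgments, n_scenarios):
    """Average pairwise ratings across informants into a symmetric matrix.

    judgments: iterable of (informant_id, scenario_i, scenario_j, rating).
    Pairs that no informant rated are left as None.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _informant, i, j, rating in judgments:
        key = (min(i, j), max(i, j))  # treat (i, j) and (j, i) as the same pair
        totals[key] += rating
        counts[key] += 1

    matrix = [[None] * n_scenarios for _ in range(n_scenarios)]
    for (i, j), total in totals.items():
        mean = total / counts[(i, j)]
        matrix[i][j] = mean
        matrix[j][i] = mean
    return matrix

# Hypothetical ratings on a 5-point scale (5 = very similar).
judgments = [
    ("m1", 0, 1, 4),
    ("m2", 1, 0, 2),
    ("m1", 1, 2, 5),
]
similarity = aggregate_similarity(judgments, n_scenarios=3)
for row in similarity:
    print(row)
```

Because each informant rates only a random subset of pairs, some cells may rest on few judgments or none at all. A matrix like this one is the kind of input that multidimensional scaling programs expect.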

Multidimensional scaling of the intercultural similarity data identified four dimensions: (1) open versus resistant to change, (2) situational causes versus individual traits, (3) high- versus low-resolution potential based on trust, and (4) high- versus low-resolution potential based on patience. Scaling of the intracultural similarity data identified four different dimensions: (1) high versus low cooperation, (2) high versus low confrontation, (3) problem-solving versus accepting, and (4) resolved versus ongoing.

The work of Jehn and Doucet is impressive because the analysis of the data from these tasks produced different sets of themes. All three emically induced theme sets have some intuitive appeal and all three yield analytic results that are useful. They could have also used the techniques of grounded theory or schema analysis to discover even more themes.

(2) When are the various techniques most appropriate?

The choice of techniques depends minimally on the kind and amount of text, the experience of the researcher, and the goals of the project. Word-based techniques (e.g., word repetitions, indigenous categories, and KWIC) are probably the least labor intensive. Computer software such as Anthropac and Code-A-Text has little trouble generating frequency counts of key words.2 A careful look at the frequency list and maybe some quick pile sorts are often enough to identify quite a few themes. Word-based techniques are also the most versatile. They can easily be used with complex texts such as the complete works of Shakespeare or the Bible, as well as with simple short answers to open-ended questions. They can also be used relatively easily by novice and expert investigators alike. Because they consider words apart from much of their context, however, they are best used in combination with other approaches.
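Readers without access to a dedicated package can produce a basic frequency list with a few lines of code. The sketch below is ours and makes several assumptions: a very small stop list, a simple rule for splitting text into words, and three invented responses. It is not a description of how Anthropac or Code-A-Text works.

```python
import re
from collections import Counter

# A deliberately tiny stop list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "my", "i", "was"}

def word_frequencies(responses, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears across all responses."""
    counts = Counter()
    for response in responses:
        tokens = re.findall(r"[a-z']+", response.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts

# Hypothetical short answers to an open-ended question.
responses = [
    "The clinic was too far and the wait was long",
    "Long wait, and the clinic is far from my house",
    "I could not pay for the clinic",
]
freq = word_frequencies(responses)
for word, count in freq.most_common(5):
    print(f"{count:3d}  {word}")
```

The words at the top of such a list are candidates for themes, not themes in themselves. The investigator still has to look at how each word is used.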

Scrutiny-based techniques (e.g., compare and contrast, querying the text, and examining absences) are most appropriate for rich textual accounts and tend to be overkill for analyzing short answer responses. Investigators who are just beginning to explore a new topical area might want to start with compare-and-contrast techniques before moving on to the more difficult tasks of querying the text or searching for missing information. We do not advise using the latter two techniques unless the investigator is fluent in the language in which the data are collected. If the primary goal of this portion of the investigation is to discover as many themes as possible, then nothing beats using these techniques on a line-by-line basis.

Like scrutiny-based techniques, linguistic-based approaches are better used on narrative-style accounts than on short answer responses. Looking for transitions is the easiest technique to use, especially if the texts are actually written by respondents themselves (rather than transcribed from tape recordings of verbal interviews). Searching for metaphors is also relatively easy once novices have been trained on what kinds of things to look for in the texts. Looking for connecting words and phrases is best used as a secondary wave of finding themes, once the investigator has a more definite idea of what kinds of themes he or she finds most interesting.

In the early stages of exploration, nothing beats a thorough reading and pawing through of the data. This approach is the easiest for novice researchers to master and is particularly good for identifying major themes. As the exploration progresses, investigators often find themselves looking for subthemes within these major themes. The cutting and sorting techniques are most helpful here. Investigators can identify all text passages that are related to a major theme, cut them out, and sort them into subthematic categories. Likewise, if they are marking texts for each newly discovered theme, then they can apply the unmarked text technique as they go. We have seen these three techniques applied successfully to both rich narrative data and simple responses to open-ended questions.
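When texts are marked by computer, the unmarked text technique amounts to finding the stretches of text that no code covers. The sketch below assumes, hypothetically, that each coded passage is stored as a pair of character positions (start, end). The function names and the example text are ours.

```python
def unmarked_spans(text_length, marked):
    """Return (start, end) character ranges not covered by any coded segment.

    marked: list of (start, end) half-open ranges, possibly overlapping.
    """
    gaps = []
    cursor = 0  # end of the text covered so far
    for start, end in sorted(marked):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < text_length:
        gaps.append((cursor, text_length))
    return gaps

def span_of(text, passage):
    """Locate the first occurrence of a passage and return its (start, end)."""
    start = text.index(passage)
    return (start, start + len(passage))

# Hypothetical interview excerpt with two passages already coded.
text = "My boss never listens. The pay is fine. I miss my family back home."
coded = [
    span_of(text, "My boss never listens."),
    span_of(text, "I miss my family back home."),
]
leftovers = [text[s:e].strip() for s, e in unmarked_spans(len(text), coded)]
print(leftovers)
```

The leftover passages are then read again to see whether they suggest themes that earlier passes missed.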

An even more powerful strategy would be to combine multiple techniques in a sequential manner. For example, investigators might begin by pawing through the data to see what kinds of themes just stick out. As part of this process, they might want to make comparisons between paragraphs and across informants. A quick analysis of word repetitions would also be appropriate for identifying themes at such an early stage of the analysis. If key words or indigenous phrases are present, researchers might follow up by conducting more focused KWIC analyses. If the project is examining issues of equality, investigators might also look for texts that are indicative of power differentials and access to resources. Texts representing major themes can be marked either on paper or by computer. Investigators can then search areas that are not already marked for additional themes or cut and sort marked texts into subthemes.
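A focused KWIC (key-word-in-context) analysis of the kind mentioned above can be sketched in a few lines. The window of words shown on each side, the function name, and the example sentences are assumptions made for illustration.

```python
import re

def kwic(texts, keyword, window=4):
    """Return (left context, keyword, right context) for each occurrence."""
    keyword = keyword.lower()
    lines = []
    for text in texts:
        tokens = re.findall(r"[\w']+", text)
        for i, token in enumerate(tokens):
            if token.lower() == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                lines.append((left, token, right))
    return lines

# Hypothetical passages containing an indigenous key word.
texts = [
    "People here say respect is earned slowly",
    "Without respect nobody will work with you",
]
concordance = kwic(texts, "respect", window=2)
for left, word, right in concordance:
    print(f"{left:>20} | {word} | {right}")
```

Reading down the aligned column of contexts makes it easier to see the different senses in which informants use the same word.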

Researchers also might consider beginning by identifying all metaphors and similes, marking them, cutting them out, and sorting them into thematic categories. There is no single way to discover themes. In theme discovery, we assume that more is always better.

(3) How do you know when you’ve found all the themes?

There is no magic formula to answer this question. The problem is similar to asking members of a population to list all the illnesses they know. One can never be sure of the full range of illnesses without interviewing the entire population, because there is always the possibility that the last person interviewed will mention a new disease. We can simplify the process considerably, however, if we are willing to miss rarely mentioned illnesses. One strategy would be to interview people until some number of respondents in a row (say five or more) fail to mention any new illnesses.
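The stopping rule just described is easy to state as a procedure. The sketch below is a minimal illustration, with an invented set of free lists and a run length of three rather than five so that the example stays short. The function name is ours.

```python
def interviews_until_no_new_items(free_lists, run_length=5):
    """Return (number of interviews conducted, set of items seen).

    Stops once `run_length` consecutive respondents add nothing new.
    If that never happens, every list is used.
    """
    seen = set()
    streak = 0  # consecutive respondents who mentioned nothing new
    for count, items in enumerate(free_lists, start=1):
        new_items = set(items) - seen
        seen |= new_items
        streak = 0 if new_items else streak + 1
        if streak >= run_length:
            return count, seen
    return len(free_lists), seen

# Hypothetical free lists of illnesses, one list per respondent.
free_lists = [
    ["flu", "malaria"],
    ["flu"],
    ["measles"],
    ["flu"],
    ["malaria"],
    ["flu", "measles"],
    ["cholera"],
]
result = interviews_until_no_new_items(free_lists, run_length=3)
print(result)
```

In this invented example the rule stops before the seventh respondent, who would have mentioned an illness that no one else named. That is exactly the trade-off described above: the rule saves interviews at the cost of missing rarely mentioned items.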

In text analysis, grounded theorists refer to the point at which no new themes are being identified as theoretical saturation (Strauss and Corbin 1990:188). When and how theoretical saturation is reached, however, depends on the number of texts and their complexity, as well as on investigator experience and fatigue, and the number of investigators examining the texts. Again, more is better. Investigators who have more experience finding themes are likely to reach saturation later than novices. Wilson and Hutchinson warn against premature closure, where the researcher "fails to move beyond the face value of the content in the narrative" (1990:123).

Summary

Theme identification is one of the most fundamental tasks in qualitative research. It is also one of the most mysterious. Explicit descriptions of theme discovery are rarely provided in articles and reports and, when they are, are often relegated to appendices or footnotes. Techniques are shared among small groups of social scientists, and this sharing is often impeded by disciplinary or epistemological boundaries. The lack of clear methodological descriptions is most evident during the grant-writing phase of research. Investigators (ourselves included) struggle to clearly explain and justify plans for discovering themes in the qualitative data. These issues are particularly acute when funding reviewers are unfamiliar with qualitative traditions.

In this article we have outlined a dozen techniques that social scientists have used to discover themes in texts. The techniques are drawn from across epistemological and disciplinary boundaries. They range from quick word counts to laborious, in-depth, line-by-line scrutiny. Some work well for short answers to open-ended questions while others are more appropriate for rich, complex narratives. Novices and non-native speakers may find some techniques easier than others. No single technique does it all. To us, these techniques are simply tools to help us do better research.

 

Notes

1 ATLAS.ti (Scientific Software Development) and Nud•ist (Qualitative Solutions & Research) are qualitative analysis packages distributed in the United States by SCOLARI, Sage Publications, Inc., 2455 Teller Road, Thousand Oaks, CA 91320. Tel: (805) 499-1325. Fax: (805) 499-0871. E-mail: atlasti@scolari.com. Web: www.scolari.com.

2 Anthropac (Analytic Technologies) and Code-A-Text (Cartwright) are software packages that have the capacity to convert free-flowing texts into word-by-document matrices. Code-A-Text is distributed in the United States by SCOLARI, Sage Publications. Anthropac is created and distributed by Analytic Technologies, Inc., 11 Ohlin Lane, Harvard, MA 01451. Tel: (978) 456-7372. Fax: (978) 456-7373. E-mail: sales@analytictech.com. Web: www.analytictech.com.

References Cited

Agar, Michael.

1973 Ripping and running: A formal ethnography of urban heroin addicts. New York: Seminar Press.

Agar, Michael and Jerry Hobbs

1985 How to grow schemata out of interviews. In Directions in Cognitive Anthropology. Janet Dougherty, ed. Pp. 413-431. Urbana, IL: University of Illinois Press.

Barkin, Shari, Gery Ryan, and Lillian Gelberg

1999 What clinicians can do to further youth violence primary prevention: A qualitative study. Injury Prevention, 5:53-58.

Becker, Howard

1993 How I learned what a crock was. Journal of Contemporary Ethnography 22:28-35.

1998 Tricks of the trade: How to think about your research while you’re doing it. Chicago: University of Chicago Press.

Berelson, Bernard

1952 Content analysis in communication research. Glencoe, IL: Free Press.

Bernard, H. Russell

2000 Social Research Methods: Qualitative and Quantitative Approaches. Thousand Oaks, CA: Sage Publications.

Bogdan, Robert, and Sari Knopp Biklen

1992 Qualitative Research for Education: An Introduction to Theory and Methods, 2d ed. Boston: Allyn and Bacon.

Borgatti, Stephen

1999 Elicitation Methods for Cultural Domain Analysis. In The Ethnographer's Toolkit, Volume 3. J. Schensul and M. LeCompte, eds. Pp. 115-151. Walnut Creek: Altamira Press.

Bulmer, Martin

1979 Concepts in the analysis of qualitative data. Sociological Review 27(4):651-677.

Charmaz, Kathy

1990 "Discovering" Chronic Illness: Using Grounded Theory. Social Science and Medicine 30:1161–1172.

2000 Grounded theory: Objectivist and constructivist methods. In Handbook of Qualitative Research, 2nd Edition. Norman Denzin and Yvonna Lincoln, eds. Thousand Oaks, CA: Sage Publications. Pp. 509-536.

D'Andrade, Roy

1995 The development of cognitive anthropology. Cambridge: Cambridge University Press.

Dey, Ian

1993 Qualitative Data Analysis: A User-Friendly Guide for Social Scientists. London: Routledge and Kegan Paul.

Gal, Susan

1991 Between speech and silence: The problematics of research on language and gender. In Gender at the crossroads of knowledge: Feminist anthropology in the postmodern era. Michaela di Leonardo, ed. Berkeley: University of California Press. Pp. 175-203.

George, A. L.

1959 Quantitative and qualitative approaches to content analysis. In Trends in Content Analysis. I. de Sola Pool, ed. Pp. 7-32. Urbana: University of Illinois Press.

Gladwin, Christina

1989 Ethnographic Decision Tree Modeling. Newbury Park, CA: Sage Publications.

Glaser, Barney G. and Anselm Strauss

1967 The Discovery of Grounded Theory: Strategies for Qualitative Research. New York: Aldine.

Greenhalgh, Susan

1994 Controlling births and bodies. American Ethnologist 21:3-30.

Henley, N.M.

1969 A Psychological Study of the Semantics of Animal Terms. Journal of Verbal Learning and Verbal Behavior 8:176-84.

Jehn, Karen A. and Lorna Doucet

1996 Developing Categories from Interview Data: Text Analysis and Multidimensional Scaling. Part 1. Cultural Anthropology Methods Journal 8(2):15–16.

1997 Developing Categories for Interview Data: Consequences of Different Coding and Analysis Strategies in Understanding Text. Part 2. Cultural Anthropology Methods Journal 9(1):1–7.

Lindsay, Peter H. and Donald A Norman

1972 Human information processing: An introduction to psychology. New York: Academic Press.

Maxwell, Joseph

1996 Qualitative research design: An interactive approach. Thousand Oaks, CA: Sage Publications.

Miles, Matthew and A. Michael Huberman

1994 Qualitative Data Analysis, 2d ed. Thousand Oaks, CA: Sage Publications.

Patton, Michael Q.

1990 Qualitative Evaluation and Research Methods. Thousand Oaks, CA: Sage Publications.

Pool, Ithiel de Sola, ed.

1959 Trends in Content Analysis. Urbana: University of Illinois Press.

Price, Laurie

1987 Ecuadorian Illness Stories. In Cultural Models in Language and Thought. D. Holland and N. Quinn, eds. Pp. 313–342. Cambridge: Cambridge University Press.

Ryan, Gery

1999 Measuring the typicality of text: Using multiple coders for more than just reliability and validity checks. Human Organization, 58(3):313-322.

Spradley, James

1972 Adaptive Strategies of Urban Nomads. In Culture and Cognition: Rules, Maps, and Plans. J. P. Spradley, ed. Pp. 235-278. New York: Chandler Publishing Company.

1979 The Ethnographic Interview. New York: Holt, Rinehart and Winston.

Strauss, Claudia

1992 What makes Tony run? Schemas as motives reconsidered. In Human motives and cultural models. R. D'Andrade and C. Strauss, eds. Pp. 191-224. Cambridge: Cambridge University Press.

Strauss, Claudia and Naomi Quinn

1997 A cognitive theory of cultural meaning. Cambridge: Cambridge University Press.

Wiener, Jon

1997 Deconstruction goes pop. (The increasing use of the word 'deconstruction'). The Nation, April 7, 264(13):43-45.

Wilson, Holly Skodol and Sally Hutchinson

1990 Methodologic mistakes in grounded theory. Nursing Research, 45(2):122-124.

Wright, Joanne

1997 Deconstructing development theory: Feminism, the public/private dichotomy and the Mexican maquiladoras. The Canadian Review of Sociology and Anthropology, 34(1):71-92.