CAM, The Cultural Anthropology Methods Journal, Vol. 8 no. 1, 1996

Qualitative Data, Quantitative Analysis


H. Russell Bernard
University of Florida


About Texts

Every cultural anthropologist ought to be interested in finding, or creating, and analyzing texts. By "finding" texts I mean things like diaries, property transactions, food recipes, personal correspondence, and so on. By "creating" texts I mean recording what people say during interviews.

But by creating text, I also mean doing what Franz Boas did with George Hunt, and what Paul Radin did with Sam Blowsnake. In 1893, Boas taught Hunt to write Kwakiutl, Hunt's native language. By the time Hunt died in 1933, he had produced 5,650 pages of text -- a corpus from which Boas produced most of his reports about Kwakiutl life (Rohner 1966).

Sam Blowsnake was a Winnebago who wrote the original manuscript (in Winnebago) that became, in translation, Crashing Thunder: An Autobiography of a Winnebago Indian (Radin 1926). More recently, Fadwa El Guindi (1986), James Sexton (1981), and I (Bernard and Salinas 1989), among others, have helped indigenous people create narratives in their first languages.

Original texts provide us with rich data -- data that can be turned to again and again through the years as new insights and new methods of analysis become available. Robert Lowie's Crow texts and Margaret Mead's hours and hours of cinéma vérité about Balinese dance are clear examples of the value of original text. Theories come and go but, like the Pentateuch, the Christian Gospels, the Qur'an and other holy writ, original texts remain for continued analysis and exegesis.

If we include all the still and moving images created in the natural course of events (all the television sitcoms, for example), and all the sound recordings (all the jazz and rock and country songs, for example), as well as all the books and magazines and newspapers, then most of the recoverable information about human thought and human behavior is naturally-occurring text. In fact, only the tiniest fraction of the data on human thought and behavior was ever collected for the purpose of studying those phenomena. I suppose that if we piled up all the ethnographies and questionnaires in the world we'd have a pretty big hill of data. But it would be dwarfed by the mountain of naturally-occurring texts that are available right now, many of them in machine-readable form.(2)

Qual/Quant and Texts

One of the things I like best about texts is that they are as valuable to positivists as they are to interpretivists. Positivists can tag text and can study regularities across the tags. This is pretty much what content analysis (including cross-cultural hypothesis testing) is about. Interpretivists can study meaning and (among other things) look for the narrative flourishes that authors use in the (sometimes successful, sometimes unsuccessful) attempt to make texts convincing.

Scholars of social change have lots of longitudinal quantitative data available (the Gallup poll for the last 50 years, the Bureau of Labor Statistics surveys for the last couple of decades, baseball statistics for over a hundred years, to name a few well-studied data sets), but longitudinal text data are produced naturally all the time. For a window on American popular culture, take a look at the themes dealt with in country music and in Superman comics over the years.

Or look at sitcoms and product ads from the 1950s and from the 1990s. Notice the differences in, say, the way women are portrayed or in the things people think are funny in different eras. In the 1950s, Lucille Ball created a furor when she became pregnant and dared to continue making episodes of the I Love Lucy show. Now think about almost any episode of Seinfeld. Or scan some of the recent episodes of popular soap operas and compare them to episodes from 30 years ago. Today's sitcoms and soaps contain much more sexual innuendo.

How much more? If you were interested in measuring that, you could code a representative sample of exemplars (sitcoms, soaps) from the 1950s and another representative sample from the 1990s, and compare the codes (content analysis again). Interpretivists, on the other hand, might be more interested in understanding the meaning across time of concepts like "flirtation," "deceit," "betrayal," "sensuality," and "love," or the narrative mechanisms by which any of these concepts is displayed or responded to by various characters.
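As a sketch of what that kind of comparison looks like in practice, consider the following toy example. The episode counts here are invented for illustration only -- a real content analysis would code representative samples with an explicit codebook and trained coders:

```python
# Toy content-analysis comparison: two invented samples of coded
# episodes (counts of sexual-innuendo instances per episode), one
# standing in for the 1950s and one for the 1990s. All numbers are
# hypothetical stand-ins for real coded data.

def mean(xs):
    """Arithmetic mean of a non-empty list of counts."""
    return sum(xs) / len(xs)

# Hypothetical codes: innuendo instances per sampled episode.
codes_1950s = [0, 1, 0, 2, 1, 0, 1]
codes_1990s = [4, 6, 3, 5, 7, 4, 5]

diff = mean(codes_1990s) - mean(codes_1950s)
print(f"1950s mean per episode: {mean(codes_1950s):.2f}")
print(f"1990s mean per episode: {mean(codes_1990s):.2f}")
print(f"difference: {diff:.2f}")
```

The point of the sketch is only that once episodes have been coded, the comparison across eras is ordinary data processing on the codes.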

Suppose you ask a hundred women to describe their last pregnancy and birth, or a hundred labor migrants to describe their last (or most dangerous, or most memorable) illegal crossing of the border, or a hundred hunters (in New Jersey or in the Brazilian Amazon region) to describe their last (or greatest, or most difficult, or most thrilling) kill. In the same way that a hundred episodes of soap operas will contain patterns about culture that are of interest, so will a hundred texts about pregnancies and hunts and border crossings.

The Coding Problem

The difficulty, of course, is in the coding of texts and in finding the patterns. Coding turns qualitative data (texts) into quantitative data (codes), and those codes can be just as arbitrary as the codes we make up in the construction of questionnaires.

When I was in high school, a physics teacher put a bottle of Coca-Cola on his desk and challenged our class to come up with interesting ways to describe that bottle. Each day for weeks that bottle sat on his desk as new physics lessons were reeled off, and each day new suggestions for describing that bottle were dropped on the desk on the way out of class.

I don't remember how many descriptors we came up with, but there were dozens. Some were pretty lame (pour the contents into a beaker and see if the boiling point was higher or lower than that of sea water) and some were pretty imaginative (let's just say that they involved anatomically painful maneuvers), but the point was to show us that there was no end to the number of things we could measure (describe) about that Coke bottle, and the point sank in. I remember it every time I try to code a text.

The QDA Problem

Coding is one of the steps in what is often called "qualitative data analysis," or QDA. Deciding on themes or codes is an unmitigated, qualitative act of analysis in the conduct of a particular study, guided by intuition and experience about what is important and what is unimportant. Once data are coded, statistical treatment is a matter of data processing, followed by further acts of data analysis.

When it comes right down to it, qualitative data (text) and quantitative data (numbers) can be analyzed by quantitative and qualitative methods. In fact, in the phrases "qualitative data analysis" and "quantitative data analysis," it is impossible to tell if the adjectives "qualitative" and "quantitative" modify the simple noun "data" or the compound noun "data analysis." It turns out, of course, that both QDA phrases get used in both ways. Consider the following table:

                              Data
Analysis          Qualitative    Quantitative
Qualitative            a              b
Quantitative           c              d


Cell a is the qualitative analysis of qualitative data. Interpretive studies of texts are of this kind. At the other extreme, studies of the cell d variety involve, for example, the statistical analysis of questionnaire data, as well as more mathematical kinds of analysis.

Cell b is the qualitative analysis of quantitative data. It's the search for, and the presentation of, meaning in the results of quantitative data processing. It's what quantitative analysts do after they get through doing the work in cell d. Without the work in cell b, cell d studies are puerile.

Which leaves cell c, the quantitative analysis of qualitative data. This involves turning the data from words or images into numbers. Scholars in communications, for example, might tag a set of television ads from Mexico and the U.S. in order to test whether consumers are portrayed as older in one country than in the other. Political scientists might code the rhetoric of a presidential debate to look for patterns and predictors. Archeologists might code a set of artifacts to produce emergent categories or styles, or to test whether some intrusive artifacts can be traced to a source. Cultural anthropologists might test hypotheses across cultures by coding data from the million pages of ethnography in the Human Relations Area Files and then doing a statistical analysis on the set of codes.

Strictly speaking, then, there is no such thing as a quantitative analysis of qualitative data. The qualitative data (artifacts, speeches, ethnographies, TV ads) have to be turned first into a matrix, where the rows are units of analysis (artifacts, speeches, cultures, TV ads), the columns are variables, and the cells are values for each unit of analysis on each variable.
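A minimal sketch of that transformation follows. The narratives, themes, and keyword lists are all invented for illustration -- in a real study the coding decisions would be the human analyst's judgment calls, not keyword matches:

```python
# Turn qualitative data (texts) into a units-by-variables matrix:
# rows are units of analysis (here, short invented narratives),
# columns are themes (variables), and cells are presence/absence
# codes. The themes and keywords are hypothetical stand-ins for
# the judgments a trained coder would make.

texts = {
    "narrative_1": "the crossing was dangerous but we trusted the guide",
    "narrative_2": "my family waited while I worked and sent money home",
    "narrative_3": "the guide betrayed us and we turned back in fear",
}

themes = {
    "danger":   {"dangerous", "fear"},
    "betrayal": {"betrayed", "deceived"},
    "family":   {"family", "home"},
}

# Build the matrix: code 1 if any keyword for a theme appears
# in the narrative, else 0.
matrix = {
    unit: {theme: int(bool(keywords & set(text.split())))
           for theme, keywords in themes.items()}
    for unit, text in texts.items()
}

for unit, row in matrix.items():
    print(unit, row)
```

Once the matrix exists, the "quantitative analysis" is an analysis of that matrix of codes, not of the texts themselves -- which is the sense in which the phrase is, strictly speaking, a misnomer.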

On the other hand, the idea of a qualitative analysis of qualitative data is not so clear-cut, either. It's tempting to think that qualitative analysis of text (analysis of text without any recourse to coding and counting) keeps you somehow "close to the data." I've heard a lot of this kind of talk, especially on e-mail lists about working with qualitative data.

Now, when you do a qualitative analysis of a text, you interpret it. You focus on and name themes and tell the story, as you see it, of how the themes got into the text in the first place (perhaps by telling your audience something about the speaker whose text you're analyzing). You talk about how the themes are related to one another. You may deconstruct the text, look for hidden subtexts, and in general try to let your audience know the deeper meaning or the multiple meanings of the text.

In any event, you have to talk about the text and this means you have to produce labels for themes and labels for articulations between themes. All this gets you away from the text, just as surely as numerical coding does. Quantitative analysis involves reducing people (as observed directly or through their texts) to numbers, while qualitative analysis involves reducing people to words -- and your words, at that.

I don't want to belabor this, and I certainly don't want to judge whether one reduction is better or worse than the other. It seems to me that scholars today have at their disposal a tremendous set of tools for collecting, parsing, deconstructing, analyzing, and understanding the meaning of data about human thought and human behavior. Different methods for doing these things lead us to different answers, insights, conclusions and, in the case of policy issues, actions. Those actions have consequences, irrespective of whether our input comes from the analysis of numbers or of words.


1. This was written while I was at the University of Cologne (July 1994-July 1995). I thank the Alexander von Humboldt Foundation, the Institut für Völkerkunde at the University of Cologne, and the College of Arts and Sciences, University of Florida for support during this time.

2. The Human Relations Area Files (HRAF) consists of about one million pages of text on about 550 societies around the world. All the data on a 60-culture sample from that database are now available on CD-ROM. HRAF plans to convert the entire million-page corpus of text to machine-readable form over the next few years. The Center for Electronic Texts in the Humanities at Rutgers University is bringing together hundreds of machine-readable corpora (the Bible, all of Shakespeare's work, all the ancient Greek and Latin plays and epics). Lexis has placed the entire corpus of Supreme Court opinions on line. The list goes on and on. The conversion of text corpora to on-line databases proceeds at a breathtaking pace.