Text Analysis


There are, obviously, many different kinds of text analysis, with very different goals. Literary criticism is text analysis. A professor grading student papers is doing text analysis. A biblical scholar examining word counts to determine whether a given passage is likely to have been authored by the same person who wrote another passage is doing text analysis. The scoring of a Thematic Apperception Test (TAT) is text analysis. The reading and rereading of field notes is text analysis. The use of open-ended questions in a survey is text analysis. And the reading of this handout is text analysis.

Furthermore, all observed behavior can be seen as a kind of text, in which sequences of moves seem to follow underlying scripts or grammars. Thus, the interpretation of behavior can usefully be seen as text analysis.

The different ways of working with text vary along several dimensions. One such dimension is formality. In general, the more formal the method, the more replicable its results, but the less powerful and less sweeping the interpretation.

To my mind, the most important difference among textual analysis methods is whether they are corpus-based or case-based. Corpus-based methods take as input a mass of textual material that is analyzed as a whole, for example one's field notes from a participant observation study, or the transcript from a focus group. Within this frame, there is a smaller unit of analysis, which is the idea. A given sentence of text might contain anything from zero to many ideas. Grounded Theory analysis is usually corpus-based.

Case-based methods view the data as a set of comparable cases that replicate each other. For example, you ask one hundred people a question like "What is a good leader?" Each case corresponds to a respondent, and all are reacting to the same stimulus. Content analysis is usually case-based.

Corpus-based analyses are typically more interpretive, less formal, less replicable, and more subject to biases.

Both case-based and corpus-based analyses make heavy use of coding. Coding is a way of transforming the raw, natural, formless, low-level data into a restricted set of interrelated symbols that we like to think with (see Goodwin's paper on Professional Vision on this point).

[Figure: coding]

See handout on coding.