Introduction to Thematic Coding


The basic purpose of thematic coding (or "tagging") is data retrieval. It is used to classify text according to theme, so that later on, when doing analysis, it is easy to retrieve all passages that relate to a given topic. The essence of thematic coding is classification. Consider, for example, the following passage and the codes assigned to it:


Mixing millennialism, space beliefs and cult charisma can result in a deadly brew, as illustrated by this year’s Heaven’s Gate suicides. Forty people took their lives in the belief that shedding their earthly containers would enable them to board an alien rescue craft headed for the “Next Level” of existence.
  • cult
  • suicide
  • religion
  • ufo
  • spiritual beliefs
  • transformation

The first two codes, <cult> and <suicide>, are also words contained in the text. The remaining codes are interpretations of what the paragraph is about. This illustrates both the power and the danger of codes. If we were to rely solely on the presence of key words in the text to identify relevant passages, we would often miss important material, because an informant can spend quite a bit of time talking about, say, marriage without ever mentioning the word. By coding texts, we make sure that we can retrieve all the relevant passages.
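The difference between keyword search and retrieval by code can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and the function names `keyword_search` and `retrieve_by_code` are hypothetical and do not come from any particular coding package. The passage is the (shortened) example above.

```python
# Each coded passage pairs the raw text with the codes an analyst assigned.
# The layout (a dict with "text" and "codes") is an assumption for this sketch.
passages = [
    {
        "text": (
            "Mixing millennialism, space beliefs and cult charisma can result "
            "in a deadly brew, as illustrated by this year's Heaven's Gate "
            "suicides."
        ),
        "codes": {"cult", "suicide", "religion", "ufo",
                  "spiritual beliefs", "transformation"},
    },
]


def keyword_search(passages, word):
    """Return passages whose text literally contains the word."""
    return [p for p in passages if word.lower() in p["text"].lower()]


def retrieve_by_code(passages, code):
    """Return passages that the analyst tagged with the given code."""
    return [p for p in passages if code in p["codes"]]


# "cult" appears in the text, so both approaches find the passage.
print(len(keyword_search(passages, "cult")), len(retrieve_by_code(passages, "cult")))
# "ufo" never appears in the text, so only the coded lookup finds it.
print(len(keyword_search(passages, "ufo")), len(retrieve_by_code(passages, "ufo")))
```

The coded lookup is only as good as the analyst's judgment: it finds the passage under <ufo> because someone decided the passage was about UFOs, not because the text says so.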

The danger, of course, is that our interpretations may be wrong, idiosyncratic, or otherwise not useful. For example, I coded the passage as <transformation> because I saw the idea of committing suicide so as to enter another level of existence as an instance of transformation. However, the author of the passage might not have seen it that way.

This issue brings up two important questions about thematic coding. First, how will we be using the information? In most cases, thematic codes are used for data retrieval. For example, an anthropologist goes out in the field for 9 months and returns with thousands of pages of field notes. In order to pull together a coherent view of some aspect of life in the village studied, she would like to find all passages in her notes that relate to certain themes. So she codes all the text, then uses a computer to pull together all passages related to any given theme. In this case, the problem of reading too much into a passage when coding, or miscoding entirely, is not very serious. Because of coding errors, she may miss certain passages that are relevant, and she may include some passages that are not really relevant. However, the passages are really just stimuli for thinking. They are not measurements or data. Any errors she makes because certain passages were omitted or misinterpreted are errors she could have made with or without the coding. Any problems with coding pale before the enormous problems of interpreting culture and social behavior!

Sometimes, however, thematic codes are

The second key question is whether our objective is to code emically or etically. To code etically means that we judge what a paragraph relates to according to our own criteria. To code emically means that we judge the topic of a passage according to what the informant (the author) himself believes the topic is. Consider the following passage:

I feel like my work is taking over my life, you know? Even when I don't stay late, I come home tired and don't feel like doing anything, or I can't stop thinking about what's happening [in the office].
  • work
  • dissatisfaction
  • managing boundaries

The code <managing boundaries> is, more than likely, an etic code. The informant probably does not think in such abstract terms. Rather, the concept of managing boundaries is an element from the researcher's culture. This does not make the attribution less valid, as long as we understand that it is not intended as a representation of what the informant was trying to say.

In contrast, the code <dissatisfaction> is plausibly an emic code. It seems likely that the informant himself would agree that in this passage, he was intending to express dissatisfaction. Note that even if a code is emic, it is always inferred. We can be wrong when we assign emic codes. What distinguishes an emic code from an etic one is the intention to capture the informants' statements from the informants' point of view -- using categories that the informant himself would use.
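One way to keep the emic/etic distinction available at analysis time is to record it alongside each code. The sketch below is hypothetical: the `Code` class, its `perspective` field, and the helper `codes_by_perspective` are names invented for illustration. Only the two codes classified in the discussion above are included, and their labels remain the analyst's inference rather than established fact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Code:
    label: str
    perspective: str  # "emic" or "etic"; an analyst's judgment, not a fact


# Codes for the work passage, labeled as the discussion above suggests.
work_passage_codes = [
    Code("dissatisfaction", "emic"),
    Code("managing boundaries", "etic"),
]


def codes_by_perspective(codes, perspective):
    """Return the labels of codes assigned from the given perspective."""
    return [c.label for c in codes if c.perspective == perspective]


print(codes_by_perspective(work_passage_codes, "emic"))
print(codes_by_perspective(work_passage_codes, "etic"))
```

Storing the perspective with the code lets a researcher later retrieve only those passages coded in the informant's own categories, or only those coded in the researcher's.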