Gery Ryan and Thomas Weisner
Dept. of Psychiatry and Biobehavioral Sciences
UCLA, Los Angeles, CA 90095-1759
How much can we learn from a simple word analysis of
qualitative data? Judging from the literature on content analysis
(Krippendorff 1980, Weber 1990) and recent articles in CAM
by Jehn & Doucet (1996) and Schnegg & Bernard (1996), the
answer is "a lot." Here we extend what can be done with
words by examining parents' descriptions of their adolescents. We
ask two questions. First, what do the words parents use in their
descriptions tell us about the goals they have for their
children? Second, what do differences and similarities in word
use tell us about the differences and similarities in informants'
perceptions of their children?
We rely on standard word processing programs and other readily
available software. No special formatting or coding of the data
is required. The methods we describe are useful for discovering
patterns in any body of text, whether fieldnotes or responses to
open-ended questions, and are particularly helpful when used
along with ethnographic data and with other sources of
information. Word analysis can tell us about salience,
patterning, and context of words, and the relationships between
words, but word analysis cannot produce a holistic interpretation
of cultural data.
Systematic Descriptions of Children
We know that parental perceptions of adolescents and children
vary across cultures. When Super and Harkness (1986) asked
Kipsigi parents in western Kenya to describe boys, the
descriptions included the terms "warrior" and
"fierce." When Raghavan (1993) asked South Asian
parents living in the United States about their daughters, the
descriptions included "hospitable" and
"responsible." Such phrases or words would strike most
American parents as unusual or odd. Instead, American parents are
more likely to use such terms as "athletic,"
"independent," "argumentative," and
"well-rounded" -- terms which would seem odd to most
Kipsigis or South Asians.
Seeking to understand parental perceptions and attitudes
toward their adolescent children, we asked parents of adolescents
in the United States to write a short description of their sons
or daughters. This was a relatively easy, comfortable task for
most of the people whom we interviewed. Our informants are all
participants in the Family Lifestyles Project (FLS) -- a 20-year
longitudinal study of nonconventional and countercultural
families and their children. [See Eiduson & Weisner (1978),
Weisner (1986), Weisner & Garnier (1992), and Weisner et al.
(1994) for reviews and key findings from the project].
In 1974 and 1975, investigators contacted 200 mothers during
their third trimester of pregnancy. Mothers were involved in
conventional and nonconventional living arrangements.
Nonconventional arrangements included single mothers, social
contract couples (not legally married), and mothers in communes
or group living situations. Members of the research team have
followed the mothers, their mates, and their child ever since.
Attrition has been remarkably low. In 1992-1994, the FLS
researchers conducted a follow-up study of the adolescent
children and reached 100% of the mothers, 98% of the teenagers,
and 48% of the fathers or other mates. The central question of
the adolescent follow-up was: How did these "children of the
children of the 60's" turn out? Investigators asked parents
about their child's performance in school, personal
relationships, political attitudes, gender identity, drug use,
and other characteristics.
As part of a larger questionnaire, parents were asked,
"What is your teenager like now? Does she or he have any
special qualities or abilities?" Parents wrote their answers
in short phrases. It is on these data that we focus now. How did
parents describe their children? Did mothers and fathers describe
their children differently?
In 82 of the 200 families interviewed, both a male and female
parent independently described their child. Nearly all the
descriptions came from the biological parents. In three cases,
the male parent was a step-father, and in one case the female
parent was a step-mother. Since we are interested in how parents
raise their children, we treat biological and step-parents as
equal in our analysis. We thus have two descriptions for each of
82 children, one from the mother and the other from the father,
for a total of 164 descriptions.
Each of the 82 children is different (some are more artistic,
social, academic, or temperamental than others) but we can make
comparisons across children because: a) we were systematic in how
we asked parents to describe their experiences (we always asked
the exact same question each time); b) each pair of parents
described the same child; and c) we have the same number of
descriptions (82) in each file. Of course, the child is not
"the same" to each parent. Children respond to and identify
with each parent differently, as parents do with each child. So
when parents are asked to describe their child, they are not
reacting to exactly the same stimuli, but rather to a comparable
family situation that has different meanings to each family
Handling the data
We transcribed the parents' verbatim answers into a word
processor (in this case, WordPerfect 6.0). For each
answer, we typed in the family identification number, the type of
family, the sex of the child being described, the sex of the
parent who gave the description, and the complete description.
Each description was followed by a single hard return. Figure 1
shows the first three descriptions in our master file.
To facilitate analysis, we separated each unique
phrase/descriptor by a period and a space. The period/space
combination has two advantages. First, a period indicates the end
of a sentence, and we can then use the word processor or style
checker to count the number of sentences in a document (Harris
1996). Second, we can use the period as a delimiter for importing
the text data into a spreadsheet or a database (like Excel
or Quattro Pro).
ID009. F1030. Boy. Fthr. Loving. Obedient. Maintains own identity. Likes being home. Independent. Anxious to go to California to school.
ID016. F1130. Boy. Fthr. Smart. Energetic. Arrogant. Dependent. Slick. Passive. Lack of imagination. Attraction to inner-city lifestyle.
ID124. F1130. Girl. Mthr. Great kid. Willing to communicate with parents. Listens. Motivated in school. Helpful around the house. Healthy. Active. Lots of friends. She tends to play it safe.
Figure 1. Examples of master file of parents'
descriptions of their children
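The period-and-space convention also makes each record easy to split with a short script. The following is a minimal Python sketch, assuming each record sits on one line laid out as in Figure 1. The function name `parse_record` and the dictionary keys are illustrative choices of ours, not part of the original workflow, which used a word processor and a spreadsheet.

```python
def parse_record(line):
    """Split one master-file record into metadata and descriptors.

    Assumes the Figure 1 layout: four metadata fields (family ID,
    family type, child's sex, parent's sex) followed by the
    descriptors, all separated by a period and a space.
    """
    # Drop surrounding whitespace and the final period, then split on ". ".
    fields = line.strip().rstrip(".").split(". ")
    return {
        "family_id": fields[0],
        "family_type": fields[1],
        "child_sex": fields[2],
        "parent_sex": fields[3],
        "descriptors": fields[4:],
    }


# First record from Figure 1.
record = parse_record(
    "ID009. F1030. Boy. Fthr. Loving. Obedient. Maintains own identity. "
    "Likes being home. Independent. Anxious to go to California to school."
)
print(record["parent_sex"], len(record["descriptors"]))
```

A descriptor that itself contains a period followed by a space would be split in two, so the sketch relies on the same typing discipline the text describes.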
Once we had our master file of descriptions, we sorted the
descriptions by parent's sex. Since we consistently made parent's
sex the fourth word of the paragraph, we can do this with our
word processor. Select all your text, and tell the word processor
to use the fourth word to sort the highlighted paragraphs.(1) (Before sorting, backup your
We then copied mothers' and fathers' responses to separate
files (MOTHER.WP & FATHER.WP). At this point we were only
interested in the descriptors, so we stripped out the extraneous
information in each file. This is easily semi-automated with a
macro that goes to the beginning of each paragraph and deletes
the first four words (ID, family type, child's sex, and parent's
sex). Our two stripped files contained only the verbatim
descriptions provided by mothers and fathers.
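The sort-and-strip step can also be sketched in a few lines of Python. This is only an illustration under the assumption that the parent's sex is always the fourth period-delimited field and is coded "Mthr" or "Fthr" as in Figure 1. The function name `split_by_parent` is hypothetical, and the two sample records are abbreviated from Figure 1.

```python
def split_by_parent(lines):
    """Group descriptions by parent's sex and strip the four metadata fields."""
    groups = {"Mthr": [], "Fthr": []}
    for line in lines:
        fields = line.strip().split(". ")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        parent = fields[3]
        if parent in groups:
            # Re-join only the descriptors, dropping the first four fields.
            groups[parent].append(". ".join(fields[4:]))
    return groups


# Two records abbreviated from Figure 1.
master = [
    "ID009. F1030. Boy. Fthr. Loving. Obedient. Independent.",
    "ID124. F1130. Girl. Mthr. Great kid. Listens. Healthy. Active.",
]
groups = split_by_parent(master)
print(groups["Mthr"])
print(groups["Fthr"])
```

Each list could then be written to its own plain-text file, which plays the role of the stripped MOTHER and FATHER files.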
Simple tricks you can do with a word processor
We used WordPerfect's document information function
to calculate some general statistics.(2)
Document information is located under File on the top menu. Among
other things, it calculates the number of characters, words and
sentences, plus the average word length, the average number of
words per sentence, and the maximum words per sentence. Table 1
compares these statistics for mothers' and fathers' responses.
|Statistic||Mothers||Fathers||Combined|
|Average Word Length||5.76||5.66||5.72|
|Average Words per Sentence||3.20||3.27||3.24|
|Maximum Words per Sentence||14||17||17|
Table 1. Text statistics generated from WordPerfect
These simple statistics tell us that:
1) Mothers use more words to describe their children than do
fathers. Of all the words used to describe the 82 children, 56%
come from mothers and 44% come from fathers.
2) On average, mothers used 28% more sentences than did fathers.
[Mothers used 528/82=6.4 phrases to describe their children,
while fathers used 411/82=5.0 phrases. Mothers and fathers used
about the same number of words per phrase, but mothers said more
things about their children.]
3) Mothers and fathers use roughly the same size words, about
5.7 characters each.
Fathers and mothers are more similar in this sample than they
are different. Mothers use more words, but not many more,
and on other measures, fathers and mothers are about equal.
Clearly, parents used the same "standard social science
questionnaire schema" to answer our questions -- writing a
series of terse phrases and words for a minute or so.
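For readers without a word processor that reports these figures, comparable statistics can be approximated with a short script. The Python sketch below is not how WordPerfect computes them. It assumes that a sentence is any period-terminated run containing at least one word and that a word is a run of letters or apostrophes, so a hyphenated form counts as two words. The function name `text_stats` is ours, and the function expects non-empty text.

```python
import re

# Assumed definition of a word: a run of letters or apostrophes.
WORD = re.compile(r"[A-Za-z']+")


def text_stats(text):
    """Return simple document statistics for period-delimited text."""
    # Keep only period-delimited chunks that contain at least one word.
    sentences = [s for s in text.split(".") if WORD.search(s)]
    per_sentence = [len(WORD.findall(s)) for s in sentences]
    words = WORD.findall(text)
    return {
        "words": len(words),
        "sentences": len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_words_per_sentence": len(words) / len(sentences),
        "max_words_per_sentence": max(per_sentence),
    }


stats = text_stats("Loving. Obedient. Maintains own identity. Likes being home.")
print(stats)
```

Because every program defines "word" and "sentence" a little differently, counts from a sketch like this will not match a given word processor exactly.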
Learning from unique word lists
We next examine whether mothers and fathers use different words
to describe their children. WORDS 2.0 (Johnson 1995) is
a useful program that counts the number of running words in a
text, identifies the number of unique word forms, and lists the
number of occurrences of each unique form.(3)
(See Bernard 1995 for a review of WORDS 2.0.) Other
programs, such as CATPAC, also count the frequency of
unique words. (See Doerfel and Barnett 1996 for a review of CATPAC.)
To get the files ready for WORDS 2.0, we first saved
our WordPerfect files (MOTHER.WP and FATHER.WP) in ASCII
format (calling them MOTHER.ASC and FATHER.ASC so as not to
overwrite the original files). When we analyzed each file, we
used WORDS 2.0's "common word list" to exclude
125 of the most-used English terms. Figure 2 shows a portion of
the two outputs. Each output tells us how many words each file
contained originally,(4) how many
unique words were found (including unique common words), and how
many words were removed when we eliminated the common ones. WORDS
2.0 outputs the list of unique words with their respective
frequency of occurrence. We indicate the rank order of each word
under the # sign. (You can do this in your word processor by
turning on the line numbering option.)(5)
Figure 2 shows that the MOTHER file contained a total of 1,721
words in 734 unique word forms. It contained 542 instances of the
125 common words that were eliminated from further consideration.
In the end, there were 666 unique words in the file and mothers
mentioned the words good, friends, loving, out, and people
at least 11 times. The last word on the mothers' list, zest,
was mentioned only once.
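The kind of count described above can be approximated without a dedicated program. The Python sketch below is not WORDS 2.0 and does not reproduce its tokenizing rules. The short `COMMON_WORDS` set is a hypothetical stand-in for that program's list of 125 common forms, and the sample text is invented for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for a common-word list (WORDS 2.0 uses its own 125 forms).
COMMON_WORDS = {"a", "and", "in", "of", "the", "to", "with"}


def word_frequencies(text, common=COMMON_WORDS):
    """Count running words, unique forms, and substantive word frequencies."""
    tokens = re.findall(r"[a-z']+", text.lower())
    all_counts = Counter(tokens)
    # Keep only the forms that are not on the common-word list.
    substantive = Counter(
        {word: n for word, n in all_counts.items() if word not in common}
    )
    excluded = sum(n for word, n in all_counts.items() if word in common)
    return {
        "running_words": len(tokens),
        "unique_forms": len(all_counts),
        "excluded_occurrences": excluded,
        "counts": substantive,
    }


# Invented sample text, not taken from the study's files.
result = word_frequencies("Good student. Good with friends. Lots of friends. Loving.")
for rank, (word, n) in enumerate(result["counts"].most_common(), start=1):
    print(rank, word, n)
```

The `enumerate` call supplies the rank numbers that the text adds with the word processor's line-numbering option.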
We can think of unique word lists as concentrated
data or, as Tesch (1990:138-139) called them, distillations. We
can produce different measures of concentration and we can
compare those measures across the MOTHER and FATHER files. With
734 unique words in a corpus of 1721 words, mothers have a
type-token rate, or concentration rate, of 57% (1-734/1721).
Fathers have a concentration rate of 55% (1-607/1355). If we use
only the 666 unique substantive words (eliminating all
occurrences of words in the common-word file), then the
concentration rate for mothers is 1-666/1721=61% and for fathers
1-548/1355=60%. Just 207 of the 666 substantive words occur more
than once in the MOTHER file. This produces a concentration rate
of 1-207/1721=88%, identical to the rate (1-159/1355=88%) for fathers.
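The concentration rates above are one minus the ratio of unique forms to running words. As a check on the arithmetic, here is a small Python sketch that uses only the counts reported in the text. The function name `concentration_rate` is our own label for the calculation.

```python
def concentration_rate(unique_forms, running_words):
    """Return 1 - (unique forms / running words), as defined in the text."""
    return 1 - unique_forms / running_words


# Counts reported in the text for the MOTHER and FATHER files.
mothers_all = concentration_rate(734, 1721)
fathers_all = concentration_rate(607, 1355)
mothers_substantive = concentration_rate(666, 1721)
fathers_substantive = concentration_rate(548, 1355)
mothers_repeated = concentration_rate(207, 1721)
fathers_repeated = concentration_rate(159, 1355)

for label, rate in [
    ("mothers, all unique forms", mothers_all),
    ("fathers, all unique forms", fathers_all),
    ("mothers, substantive forms", mothers_substantive),
    ("fathers, substantive forms", fathers_substantive),
    ("mothers, forms used more than once", mothers_repeated),
    ("fathers, forms used more than once", fathers_repeated),
]:
    print(f"{label}: {rate:.0%}")
```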
We lose a lot of information when we examine unique words. We
do not know the context in which the words occurred, nor whether
informants used words negatively or positively. Nor do we know
how the words related to each other. But distillations like these
introduce very little investigator bias (we do have to
choose what words to leave out of the analysis), and they can
help us identify constructs used by parents to describe their children.
The word lists suggest things about parents' values and goals
for their children and the lists can be compared across fathers
and mothers. For example, from Table 1 we do not know if fathers
have less to say about their children or they just have less to
say about all topics. From Figure 2, however, we see that men's
vocabulary for describing children is as rich as women's
vocabulary. (The ratio of unique words to total words is roughly
equivalent for men (607/1355=.45) and for women (734/1721=.43)).
Figure 2 allows us to make crude comparisons between men's and
women's use of different words. (The measures are crude because
they represent rank order data and do not take into consideration
the total number of words used by each group.) Both mothers and
fathers use the word good a lot more than any other
word. Women, for example, use good almost twice as often
as friends, their second most popular word. Antonyms of good
are not prevalent in the word lists, suggesting that parents
may tend to be optimistic in describing their children, may have
a response bias toward positive words on questionnaires, and may
be drawing on a cultural model for describing one's child that
emphasizes positive, growing cultural careers.
Figure 2 also suggests that men and women focus on different
characteristics of their children. A comparison of the
most-frequently-used words shows that friends, loving,
people, and responsible are ranked higher for
women than they are for men. In contrast school, hard,
intelligent, bright, and independent
are ranked higher for men than for women. This suggests that
mothers, on first mention, express concern over interpersonal
issues, while men appear to give priority to achievement-oriented
and individualistic issues.
The rank ordering of word frequencies is somewhat deceptive,
however, since it does not take into consideration the total
number of words mentioned by men and women. We can, though,
standardize the word frequencies according to what we expect to
find if men and women used the same number of words. Table 2
shows the results of such a process.
Table 2. Word frequencies sorted by standardized frequency difference in gender
Figure 2. Counts of words used more than 5 times by mothers and fathers
Mothers' Descriptions
Total number of running words in file: 1,721
Number of unique word forms in file: 734
The following counts exclude 542 occurrences of 125 common word forms.
Fathers' Descriptions
Total number of running words in file: 1,355
Number of unique word forms in file: 607
The following counts exclude 419 occurrences of 125 common word forms.
To create Table 2, we put mothers' and fathers' responses
in a single ASCII file and counted the words again. We then
selected the 131 words that informants mentioned at least four
times. We put these words in the first column of a spreadsheet
and put their frequency counts in the second column. In the third
and fourth columns, we put the number of times each word was
mentioned by women and by men respectively. Next we calculated
the expected word frequencies for men if men had used the
same number of words as women. Since women on average used 1.27
(1721/1355) times as many words as men, we multiplied the observed
men's frequency by 1.27. We put the result in the fifth column.
To compare mothers' and fathers' word use, we subtracted the
expected men's frequencies in column five from the observed
women's frequencies in column three. We put the results in column
six. This gave us a more accurate difference in word use between
the files. Negative numbers in column six mean that the word was
more likely to be used by fathers than used by mothers. Positive
numbers mean that the word was more likely to be used by mothers.
Numbers close to zero mean that there was little difference
between the men's and women's descriptions.
Finally, we sorted the rows in the spreadsheet by the values
in column six. In Table 2, we show some selected results of this
comparison: words whose frequencies varied a lot between men and
women as well as some examples of words whose frequencies varied little.
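The spreadsheet procedure described above (scale the men's counts, subtract them from the women's counts, and sort by the difference) can be sketched in Python. The two file totals come from the text. The word counts in the example are made up for illustration and are not the study's data, and the function name `standardized_differences` is hypothetical.

```python
WOMEN_TOTAL = 1721  # running words in the MOTHER file (from the text)
MEN_TOTAL = 1355    # running words in the FATHER file (from the text)


def standardized_differences(women_counts, men_counts):
    """Return (word, observed women, expected men, difference) rows.

    Expected men's frequency is the observed men's count scaled by the
    ratio of file totals (about 1.27). Rows are sorted by the difference,
    so words favored by fathers come first and words favored by mothers last.
    """
    scale = WOMEN_TOTAL / MEN_TOTAL
    rows = []
    for word in sorted(set(women_counts) | set(men_counts)):
        observed_women = women_counts.get(word, 0)
        expected_men = men_counts.get(word, 0) * scale
        rows.append((word, observed_women, expected_men, observed_women - expected_men))
    return sorted(rows, key=lambda row: row[3])


# Made-up counts for illustration only; these are NOT the study's data.
women = {"friends": 12, "school": 8, "caring": 5}
men = {"friends": 4, "school": 11, "caring": 4}

rows = standardized_differences(women, men)
for word, observed, expected, difference in rows:
    print(f"{word:10s} {observed:3d} {expected:6.2f} {difference:7.2f}")
```

In this invented example, a negative difference marks a word used relatively more by fathers, a positive one a word used relatively more by mothers, and a value near zero a word used about equally.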
When we standardize for the number of words used, we find that
fathers use the words school, good, lack, student, enjoys,
independent, extremely, like, ability, own, wants, high, and
interested more than do mothers. On the other hand,
mothers use the words friends, creative, time, honest, uses,
talented, respect, lots, kid, goes, difficulty, average, and
adult more than do fathers. Men and women, however, are
equally likely to use the words great, mature, humor, times,
attitude, and caring.
Notice the differences between the standardized measures and
the rank orders shown in Figure 2. The rank-order data tell us
about the relative priority of words within each gender while the
standardized data allow us to compare use rates across genders.
For instance, the word good was the most-used word for
both women and men. When we compare across genders, we find that
men tend to use the word more often. In contrast, men and women
are equally likely to use the word caring in their
descriptions but the simple rank-order for the word is higher for
women than it is for men.
Our findings are similar to other research on gender
differences. On many measures, men and women, boys and girls show
substantial overlap in behavioral tendencies. Although mean or
modal differences often are relatively small, specific measures
(in our case, emphasis on different concerns in describing teens)
are quite constant and are found cross culturally (Best et al.
The word counting techniques described here do not require
complex and expensive text analysis programs. These simple
methods help researchers concentrate often-confusing data into a
more manageable form, and are relatively bias-free. The
techniques can be used for exploring central themes and for
systematically comparing within and across groups.
Of course, these are just the first univariate, exploratory
steps in a more detailed qualitative analysis. We still want to
examine the context in which these words occur and how key words
are related to each other. For example, how does the sex of the
teen as well as the parent influence word use? We also want to
explore some of the hypotheses that we have formed in this simple
first step. Treating words as units of analysis offers
researchers a simple way of exploring text and confirming
1. To do this in WordPerfect for Windows 6.1: Select Tools/Sort from the menu. When the menu for sorting appears, tell WordPerfect to sort by paragraph. (Note: WordPerfect assumes that paragraphs are separated by two hard returns). Make sure that the appropriate settings are marked as follows: Type = Alpha, Sort Order = Ascending, Line = 1, Field = 1, and Word = 4. After making the changes, select OK. WordPerfect will put all the fathers' responses on top of the file and all the mothers' responses on the bottom. To do this in Word 6.0: Select Table/Sort Text. Select Options. In the "Separates fields at:" dialogue box, select "Other" and fill in the box with a single period. Select OK. In the "Sort by" dialogue box, select "Field 4." Select OK.
2. Similar statistics can be obtained in Microsoft's Word. Word, however, does not automatically count the number of sentences in a document. To do so, you need to build a macro, as follows:
Sub MAIN
StartOfDocument
count = 0
While SentRight(1, 1) <> 0
If Right$(Selection$(), 1) <> Chr$(13) Then count = count + 1
Wend
MsgBox "Number of sentences in document:" + Str$(count)
End Sub
3. WORDS 2.0 was created by Eric Johnson and is distributed by TEXT Technology, 114 Beadle Hall, Dakota State University, Madison, SD, 57042-1799. Email: firstname.lastname@example.org. For information on other programs that Johnson has created, check out the website http://www.dsu.edu/~johnsone/ericpgms.html.
4. The total number of words identified by WordPerfect 6.0 for the MOTHER.WP file (1,692) differs from the total number of words identified by WORDS 2.0 (1,721). For the same file, Word 6.0 counts 1,731 words. Discrepancies occur because each program has a slightly different definition of what counts as a word. In our case, single hyphens (-) are the leading culprits. WordPerfect 6.0 counts the hyphen as a word while WORDS 2.0 and Word 6.0 do not. Since most of our calculations use the total number of words as a fixed denominator and this denominator tends to be quite large, slight increases or decreases have little effect on the overall analysis. Be aware, however, that these differences do exist -- and are rarely documented.
5. In WordPerfect 6.0 this is found under Format/Line. In Word 6.0 it is located under File/Page Setup/Layout.