Emic Measures of Item Similarity

There are two basic ways of measuring the similarity of items in a domain. You might call the two approaches emic and etic, or direct and indirect. Pilesorts and triads are emic, or direct, methods of measuring similarity among items. We will talk about etic, or indirect, methods later on, but basically they involve measuring each item on a series of attributes and then correlating the items' profiles across those attributes.
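As a rough illustration of the indirect (etic) idea, the sketch below scores each item on a few attributes and correlates the resulting profiles. The items, the three attributes, and the 1 to 5 scores are all invented for the example, and Pearson correlation is only one possible choice of profile similarity; the text does not say which coefficient is intended.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def profile_similarity(profiles):
    """Return {(item_a, item_b): correlation} for every ordered pair of items."""
    items = list(profiles)
    return {(a, b): pearson(profiles[a], profiles[b]) for a in items for b in items}

# Hypothetical data: each item scored on three made-up attributes
# (size, ferocity, domesticity), each on a 1-5 scale.
profiles = {
    "dog":  [3, 2, 5],
    "wolf": [3, 5, 1],
    "cat":  [2, 2, 5],
}

sim = profile_similarity(profiles)
for a, b in [("dog", "cat"), ("dog", "wolf")]:
    print(a, b, round(sim[(a, b)], 2))
```

With these invented scores, the dog and cat profiles correlate positively and the dog and wolf profiles negatively, so the similarity is derived from the attribute measurements rather than judged directly by a respondent.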

One approach to measuring similarity that we have not yet mentioned is the most direct of all: asking the respondent to rate each pair of items on a scale such as 1 to 7, where 7 is as similar as you can get and 1 is totally dissimilar. The questionnaire can either be an item-by-item grid for the respondent to fill in, or it can present all possible pairs of items to the respondent one at a time. The grid is best with savvy respondents, while one-at-a-time presentation is best with less "questionnaire-ready" respondents. Either way, the results can be organized as an item-by-item matrix.
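A minimal sketch of the bookkeeping for a direct rating task: enumerate all possible pairs, then arrange one respondent's ratings as a symmetric item-by-item matrix. The item names and ratings are invented, and filling the diagonal with the scale maximum (7) is an assumption made here for convenience, not something the text prescribes.

```python
from itertools import combinations

def all_pairs(items):
    """Every unordered pair of items, i.e. n * (n - 1) / 2 pairs for n items."""
    return list(combinations(items, 2))

def ratings_to_matrix(items, ratings, diagonal=7):
    """Arrange pair ratings as a symmetric item-by-item matrix (list of lists).

    `ratings` maps (item_a, item_b) to a score; unrated pairs stay None.
    `diagonal` is the value used for an item paired with itself (an assumption).
    """
    index = {item: k for k, item in enumerate(items)}
    n = len(items)
    matrix = [[None] * n for _ in range(n)]
    for k in range(n):
        matrix[k][k] = diagonal
    for (a, b), score in ratings.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = score  # similarity is treated as symmetric
    return matrix

# Hypothetical respondent rating three items on the 1-7 scale.
items = ["dog", "wolf", "cat"]
ratings = {("dog", "wolf"): 5, ("dog", "cat"): 4, ("wolf", "cat"): 2}

pairs = all_pairs(items)
matrix = ratings_to_matrix(items, ratings)
for row in matrix:
    print(row)
```

The number of pairs grows as n(n-1)/2, so a domain of 30 items already requires 435 ratings per respondent.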

A single pilesort can be seen as a special type of similarity rating. For each respondent, the result is an item-by-item matrix of 1s and 0s where a 1 means that a given pair of items was put in the same pile and 0 means it was not. This is really like rating pairs of items on a 0 to 1 scale in which a 1 means similar and a 0 means not-similar. When a person puts items A and B in a pile together they are making a judgement that those items are more similar than dissimilar. And when they put items A and C in different piles, they are saying they are more dissimilar than similar.
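The conversion from one respondent's piles to an item-by-item matrix of 1s and 0s can be sketched as follows. The function name and the example piles are hypothetical, and the sketch assumes every item appears in exactly one pile.

```python
def pilesort_to_matrix(items, piles):
    """Return a 0/1 matrix: 1 if the row item and column item share a pile.

    Assumes each item in `items` appears in exactly one pile.
    """
    pile_of = {item: p for p, pile in enumerate(piles) for item in pile}
    return [[1 if pile_of[a] == pile_of[b] else 0 for b in items] for a in items]

# Hypothetical sort: A and B together, C and D each alone.
items = ["A", "B", "C", "D"]
piles = [["A", "B"], ["C"], ["D"]]

same_pile = pilesort_to_matrix(items, piles)
for item, row in zip(items, same_pile):
    print(item, row)
```

Each item is trivially in the same pile as itself, so the diagonal is all 1s in this construction.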

It is important to realize that the pilesort task is nothing more than a similarity rating: it is not the case that people have piles in their heads that they have been waiting for an anthropologist to come around and extract. It is doubtful that people divide things up categorically within a domain. Rather, it seems that people can construct a partitioning of objects on demand, though it is not necessarily the same each time they do it. So even in a free pilesort, if a person makes 4 piles, you do not conclude that the Burundi see four kinds of animals. In this sense, the pilesort task is not as emic as it looks. This is also why it is not a bad thing to force respondents to make a certain number of piles, such as 5.

Moreover, as a similarity task the simple pilesort introduces certain constraints into the data that are not imposed by a direct rating task. Suppose a person sees items A and B as similar and so puts them into a pile together, and then you show them item C, which they see as similar to A but less similar to B. They will either have to put it in the pile with A and B, in which case it appears that A, B and C are equally similar, or they will have to put it into a different pile, in which case the similarity with A is lost. Mathematically, what is happening is that the pilesort task yields a proximity measure p_ij which is constrained to satisfy transitivity. That is, if p_ij = 1 and p_jk = 1, then p_ik = 1. This transitivity does not necessarily occur in a direct rating task.
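The transitivity constraint can be checked mechanically. The sketch below lists every triple (i, j, k) of distinct items where p_ij = 1 and p_jk = 1 but p_ik is not 1. Both example matrices are invented: the first is what a pilesort with piles {A, B} and {C} would produce, and the second stands in for direct judgements in which A is similar to both B and C while B and C are not similar to each other.

```python
def transitivity_violations(matrix):
    """Return index triples (i, j, k) with p_ij = 1 and p_jk = 1 but p_ik != 1."""
    n = len(matrix)
    return [
        (i, j, k)
        for i in range(n)
        for j in range(n)
        for k in range(n)
        if len({i, j, k}) == 3
        and matrix[i][j] == 1
        and matrix[j][k] == 1
        and matrix[i][k] != 1
    ]

# Rows and columns are ordered A, B, C (hypothetical data).
pilesort = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
direct = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
]

print(transitivity_violations(pilesort))  # no violations: piles are transitive
print(transitivity_violations(direct))    # B~A and A~C, but B and C are not similar
```

Any matrix built from a single partition into piles returns an empty list here, which is exactly the constraint described above; dichotomized direct ratings need not.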

Advantages and disadvantages of the three methods:



Method \ Evaluation

Direct rating
  Advantages:
    - Can be part of paper/pencil questionnaire
    - Imposes few constraints on the data
    - Can get strength of similarity of items at the individual level
  Disadvantages:
    - Tedious for large domains
    - Numerical ratings are cognitively difficult
    - Unless read aloud, requires literacy

Free pilesort
  Advantages:
    - Fun to do
    - Works well with large domains
  Disadvantages:
    - Can't compare across respondents
    - Imposes transitivity on the data
    - Results in 1/0 binary data at the individual level

Pilesort with a fixed number of piles
  Advantages:
    - Fun to do
    - Works well with large domains
    - Can compare across respondents
  Disadvantages:
    - Imposes transitivity on the data
    - Results in 1/0 binary data at the individual level
    - Can be more stressful than free pilesort

[method name missing]
  Advantages:
    - Works well with large domains
    - Generates strengths of similarity at the individual level
    - Data are comparable across respondents
  Disadvantages:
    - Very tricky to administer
    - Imposes transitivity on the data
    - Can be difficult for the respondent; requires some imagination

Triads
  Advantages:
    - Can get (limited) strength of similarity at the individual level
    - Can compare data across respondents
    - Evokes discriminating attributes very well
  Disadvantages:
    - Can be very tedious for respondents
    - Limited to small domains
    - Methods of handling larger domains result in either loss of accuracy or inability to use individual-level data