
PROFIT (PROperty FITting) is a method of testing hypotheses
about the attributes that influence people's judgement of the
similarities among a set of items. As discussed above, there are
two basic approaches to analyzing proximities: searching for
clusters and searching for dimensions. PROFIT is a way of testing
hypotheses about underlying dimensions.
Suppose we have an MDS map, based on perceived similarities
among dog breeds, such as the one shown on the next page (the
data are made up). You might hypothesize that the pattern of
similarities we observe is partly a function of breed size. As we
move from top left to bottom right, the breeds seem to be getting
larger. However, the pattern is not perfect and may in fact
reflect my selective attention more than a real trend.
What we need is an objective assessment of the degree to which
breeds in fact get larger as we move down and right. Also, the
exact direction along which breeds get larger is open to
question. Perhaps it is more left-right than up-down.
The way to do this is to estimate the parameters of a model
that relates breed size to position on the map, or, equivalently,
relates position on the map to breed size. Putting it that way,
it is apparent that what we need to do is regress breed size on
map location. Map location is given by the coordinates of each
breed on the map. If the map is 2-dimensional, then there are two
coordinates for each breed. Hence there are two independent
variables in the regression.
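For a 2-dimensional map, the model can be written as below. Here x_i and y_i are the map coordinates of breed i, s_i is its size, e_i is the error for that breed, and b_0, b_1, b_2 are the regression weights, chosen by least squares to minimize the sum of the squared errors. These symbols are ours, introduced only for illustration.

```latex
s_i = b_0 + b_1 x_i + b_2 y_i + e_i, \qquad i = 1, \ldots, n
```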
The dependent variable is breed size. You could get these data
by looking them up in a dog book. Similarly, if you are scaling
cars and you think that price is a factor in assessing
similarities, then you can look up the price of each car in a
reference book. However, in general this is not a good idea,
because the purpose of running PROFIT is to understand the
criteria that respondents used to assess similarities. To use
book figures is to assume that respondents are aware of those
same figures, which is unlikely. The best thing to do is to
collect new data from a sample of people drawn from the same
population as the respondents who generated the proximities. Have
the new sample rate each item on the attribute you have
hypothesized. In our case, we would ask respondents to indicate
the typical size of each dog breed, either via a rating system
(such as a 7-point scale) or via direct estimation of weight in
pounds or height at the withers, or both. The data are then
averaged across respondents to produce a single value for each
breed.
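Averaging the ratings is a column-wise mean over respondents. The sketch below assumes the ratings sit in a list of rows, one row per respondent and one column per breed; the breed names and ratings are made up for illustration.

```python
from statistics import fmean

# Hypothetical ratings: one row per respondent, one column per breed,
# each entry a judgement of typical size on a 7-point scale.
breeds = ["chihuahua", "beagle", "pitt", "dobie"]
ratings = [
    [1, 3, 4, 6],
    [2, 3, 5, 6],
    [1, 4, 4, 7],
]

# zip(*ratings) turns rows into columns; averaging each column
# gives a single size value per breed.
breed_size = {b: fmean(col) for b, col in zip(breeds, zip(*ratings))}
print(breed_size)
```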
Both the coordinate data and the attribute data (in separate
files) are input to the PROFIT program. PROFIT then performs a
multiple regression using the coordinates as independent
variables and the attribute as the dependent variable. If you
have more than one attribute, such as size, ferocity, retrieving
ability, length of hair, etc., the program performs a separate
regression for each one. For each attribute, there are two key
outputs: an r-square statistic and the direction cosines.
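A minimal sketch of the per-attribute regression, assuming a 2-dimensional map. This is our own least-squares code, not the PROFIT program itself: it centers the data and solves the two normal equations directly. The coordinates and attribute values are hypothetical; the "size" values were chosen to be exactly 10 + 3x + y so the fitted weights are easy to check.

```python
from statistics import fmean

def profit_regression(xs, ys, attr):
    """Regress one attribute on 2-D map coordinates by least squares.

    Returns (intercept, b_x, b_y). Solves the 2x2 normal equations
    on mean-centered data, so it handles 2-dimensional maps only.
    """
    mx, my, ma = fmean(xs), fmean(ys), fmean(attr)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    da = [a - ma for a in attr]
    sxx = sum(u * u for u in dx)
    syy = sum(v * v for v in dy)
    sxy = sum(u * v for u, v in zip(dx, dy))
    sxa = sum(u * w for u, w in zip(dx, da))
    sya = sum(v * w for v, w in zip(dy, da))
    det = sxx * syy - sxy * sxy  # zero if all points lie on one line
    b_x = (sxa * syy - sya * sxy) / det
    b_y = (sya * sxx - sxa * sxy) / det
    return ma - b_x * mx - b_y * my, b_x, b_y

# Hypothetical map coordinates and attribute values for five breeds.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [0.2, -0.4, 0.1, 0.3, -0.2]
attributes = {
    "size": [7.2, 8.1, 10.1, 11.8, 12.8],
    "ferocity": [3.0, 2.5, 6.0, 5.5, 4.0],
}

# A separate regression for each attribute.
results = {name: profit_regression(xs, ys, vals)
           for name, vals in attributes.items()}
for name, (a, bx, by) in results.items():
    print(f"{name}: intercept={a:.2f} b_x={bx:.2f} b_y={by:.2f}")
```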
The r-square tells you whether location on the map was related
to values of the attribute (i.e., does size really increase as
you go from left to right?). The higher the r-square, the closer
the relationship. For domains with fewer than 20 items, the rule
of thumb is that you need an r-square of at least .80 to support
a conclusion that the hypothesized attribute was driving the
perceived similarities among items. (And of course, you can never
prove it, even with an r-square of 1.0. However, a low r-square
does disprove the hypothesis.)
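The r-square is one minus the ratio of the residual sum of squares to the total sum of squares. When `predicted` holds the fitted values from the regression of the attribute on the coordinates, this equals the regression's r-square. The sketch below also applies the .80 rule of thumb; the observed and predicted sizes are hypothetical.

```python
def r_square(observed, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot

# Rule of thumb quoted in the text for domains with fewer than 20 items.
MIN_R_SQUARE = 0.80

# Hypothetical observed breed sizes and the sizes predicted from the map.
observed = [7.0, 8.5, 9.8, 12.0, 12.5]
predicted = [7.2, 8.1, 10.1, 11.8, 12.8]
r2 = r_square(observed, predicted)
supported = len(observed) < 20 and r2 >= MIN_R_SQUARE
print(f"r-square = {r2:.3f}, supported: {supported}")
```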
The direction cosines are rescalings of the regression
coefficients. They give the relative contribution of each axis of
the map to the prediction of the attribute. In other words, they
tell you what precise direction the attribute increases along.
For example, for the dog data, both cosines are positive, which
means that breed size increases as you move both east and north on
the map. However, the cosine for the horizontal (X) axis is
larger than the cosine for the vertical (Y) axis. This means that
the direction of increasing size lies closer to east than to north.
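The text describes the direction cosines only as rescalings of the regression coefficients. A common rescaling, assumed here, divides each weight by the length of the weight vector so that the squared cosines sum to 1. The ratio between the two weights is preserved, so the larger cosine still marks the axis the arrow leans toward. The weights (3, 1) below are hypothetical.

```python
import math

def direction_cosines(b_x, b_y):
    """Rescale the regression weights so their squares sum to 1."""
    length = math.hypot(b_x, b_y)
    return b_x / length, b_y / length

# Hypothetical weights: size rises 3 units per unit east, 1 per unit north.
cx, cy = direction_cosines(3.0, 1.0)
angle = math.degrees(math.atan2(cy, cx))  # angle of the arrow above due east
print(f"cosines=({cx:.3f}, {cy:.3f}), angle={angle:.1f} degrees")
# prints: cosines=(0.949, 0.316), angle=18.4 degrees
```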
We use the direction cosines to draw arrows representing the
attributes on the map. The values of the direction cosines give
the coordinates of the head of the arrow. The middle of the arrow
is always located at the dead center of the map (coordinates
0,0). To draw the arrow, draw a line from the spot indicated by
the direction cosines (the head), through the center of the
map, and out the other side. If the attribute data were coded
in such a way that bigger numbers meant more of the attribute,
then we draw an arrowhead at the spot indicated by the direction
cosines, as shown below. Otherwise, we draw an arrowhead at the
other end of the line. The arrowhead always points in the
direction of increasing attribute values.
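The drawing rule can be written as a small helper that returns the two ends of the line and says which one gets the arrowhead. The helper name and the `higher_is_more` flag are hypothetical. Stopping the line at the mirror image of the cosine point is also our assumption; the text says only to continue "out the other side".

```python
def arrow_endpoints(cx, cy, higher_is_more=True):
    """Return (tail, head) of a PROFIT arrow drawn through the map center.

    The line runs from the point given by the direction cosines,
    through (0, 0), to its mirror image on the other side. If the
    attribute was coded so that bigger numbers mean less of it,
    the arrowhead moves to the mirror-image end.
    """
    cosine_end = (cx, cy)
    other_end = (-cx, -cy)
    if higher_is_more:
        return other_end, cosine_end
    return cosine_end, other_end

tail, head = arrow_endpoints(0.949, 0.316)
print("tail", tail, "head", head)
```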
To interpret the line, do NOT think of it as a boundary
separating dogs above the line from those below the line: this is
totally wrong. Instead, draw perpendicular lines from each dog to
the PROFIT arrow (see map on next page). This is called the
projection of location onto breed size. The length of the line
from the dog to the arrow is utterly irrelevant. It means
absolutely nothing. What matters is where the line from the dog
meets the arrow. If one dog's line meets the arrow closer to the
arrowhead than another dog's line does, then the first dog is
(predicted to be) larger than the second. For example, in the
picture, the doberman
("dobie") is predicted to be larger than the pitbull
("pitt"). The square of the correlation between these
projections and the actual breed sizes equals the r-square
discussed above.
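Dropping a perpendicular onto an arrow through the origin amounts to taking the dot product of a point with the unit-length direction cosines. The result is where the perpendicular meets the arrow, and larger values lie closer to the arrowhead. The squared correlation matches the r-square because the regression's fitted values are the intercept plus a positive multiple of these projections, assuming the cosines come from the same regression. In the sketch below the coordinates and sizes are hypothetical, with sizes made exactly 10 + 3x + y so both quantities equal 1.

```python
import math

def projections(points, cx, cy):
    """Signed position along the arrow of each point's perpendicular foot."""
    return {name: x * cx + y * cy for name, (x, y) in points.items()}

def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical coordinates and sizes (sizes are exactly 10 + 3x + y).
points = {"chihuahua": (-1.0, 0.2), "beagle": (-0.5, -0.4),
          "pitt": (0.0, 0.1), "dobie": (0.5, 0.3), "dane": (1.0, -0.2)}
size = {"chihuahua": 7.2, "beagle": 8.1, "pitt": 10.1,
        "dobie": 11.8, "dane": 12.8}

length = math.hypot(3.0, 1.0)         # regression weights (3, 1) ...
cx, cy = 3.0 / length, 1.0 / length   # ... rescaled to direction cosines
proj = projections(points, cx, cy)
names = list(points)
r = correlation([proj[n] for n in names], [size[n] for n in names])
print(proj["dobie"] > proj["pitt"], round(r * r, 6))
```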