Homework Assignment
Subgroup Analysis


Your Mission

To detect the subgroup structure of the Sampson Monastery dataset.

A Few Notes

The Sampson dataset contains many social relations. Here's a description of the dataset drawn directly from the UCINET Help facility:


SAMPSON MONASTERY


DATASET SAMPSON

DESCRIPTION Ten 18x18 matrices

SAMPLK1 non-symmetric, valued (rankings)
SAMPLK2 non-symmetric, valued (rankings)
SAMPLK3 non-symmetric, valued (rankings)
SAMPDLK non-symmetric, valued (rankings)
SAMPES non-symmetric, valued (rankings)
SAMPDES non-symmetric, valued (rankings)
SAMPIN non-symmetric, valued (rankings)
SAMPNIN non-symmetric, valued (rankings)
SAMPPR non-symmetric, valued (rankings)
SAMPNPR non-symmetric, valued (rankings)

BACKGROUND Sampson recorded the social interactions among a group of monks while resident as an experimenter on vision, and collected numerous sociometric rankings. During his stay, a political "crisis in the cloister" resulted in the expulsion of four monks (Nos. 2, 3, 17, and 18) and the voluntary departure of several others - most immediately, Nos. 1, 7, 14, 15, and 16. (In the end, only 5, 6, 9, and 11 remained).

Most of the present data are retrospective, collected after the breakup occurred. They concern a period during which a new cohort entered the monastery near the end of the study but before the major conflict began. The exceptions are "liking" data gathered at three times (SAMPLK1 to SAMPLK3), which reflect changes in group sentiment over time (SAMPLK3 was collected in the same wave as the data described below). Information about the senior monks was not included.

Four relations are coded, with separate matrices for positive and negative ties on the relation. Each member ranked only his top three choices on that tie. The relations are esteem (SAMPES) and disesteem (SAMPDES), liking (SAMPLK) and disliking (SAMPDLK), positive influence (SAMPIN) and negative influence (SAMPNIN), praise (SAMPPR) and blame (SAMPNPR). In all rankings 3 indicates the highest or first choice and 1 the last choice. (Some subjects offered tied ranks for their top four choices).

REFERENCES Breiger R., Boorman S. and Arabie P. (1975). An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling. Journal of Mathematical Psychology, 12, 328-383.

Sampson, S. (1969). Crisis in a cloister. Unpublished doctoral dissertation, Cornell University.


How you handle the multiple relations is up to you. There are many interesting things to do. For example, you can sum all the positive relations, sum all the negative relations, reverse the negative sum, and then average it with the positive sum to create a single supermatrix. Or you can analyze each relation separately.
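As a rough illustration of the combining option described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the 3x3 matrices are toy data (not the Sampson matrices), the function names are hypothetical, and "reverse" is read as subtracting each off-diagonal value from the maximum plus the minimum, so that strong negative ties become low values. Other readings of "reverse" are possible.

```python
def add_matrices(matrices):
    """Cell-by-cell sum of a list of equally sized square matrices."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


def reverse_values(matrix):
    """Flip the scale of the off-diagonal cells: new = max + min - old.

    Assumption: this is one reading of "reverse"; the diagonal is kept at 0.
    """
    n = len(matrix)
    off_diagonal = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    lo, hi = min(off_diagonal), max(off_diagonal)
    return [[0 if i == j else hi + lo - matrix[i][j] for j in range(n)]
            for i in range(n)]


def average_matrices(a, b):
    """Cell-by-cell mean of two equally sized square matrices."""
    n = len(a)
    return [[(a[i][j] + b[i][j]) / 2 for j in range(n)] for i in range(n)]


# Toy data only: two positive and two negative relations among three actors.
like = [[0, 3, 0], [2, 0, 0], [0, 1, 0]]
esteem = [[0, 2, 0], [3, 0, 0], [0, 0, 0]]
dislike = [[0, 0, 3], [0, 0, 2], [1, 0, 0]]
disesteem = [[0, 0, 1], [0, 0, 3], [2, 0, 0]]

positive = add_matrices([like, esteem])
negative = add_matrices([dislike, disesteem])
reversed_negative = reverse_values(negative)
combined = average_matrices(positive, reversed_negative)

for row in combined:
    print(row)
```

In the combined matrix, higher values indicate a more positive overall tie. Note that under this reading of "reverse", a pair with no negative tie at all receives the highest reversed score, which may or may not be what you want.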

What to do next? With valued data, it is always a good idea to run nonmetric MDS and cluster analysis. Then dichotomize the data at several cutoff levels and try some of the graph-theoretic routines, such as cliques and k-plexes.
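The dichotomize-then-cliques step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 5x5 valued matrix is toy data, the function names are hypothetical, a tie is kept only when both directions meet the cutoff (one of several ways to symmetrize), and cliques are found with a basic Bron-Kerbosch search. It does not cover k-plexes, MDS, or cluster analysis.

```python
def dichotomize(matrix, cutoff):
    """Return a 0/1 matrix: 1 where the value is >= cutoff, diagonal set to 0."""
    n = len(matrix)
    return [[1 if i != j and matrix[i][j] >= cutoff else 0 for j in range(n)]
            for i in range(n)]


def symmetrize_min(binary):
    """Keep a tie only if it is present in both directions (assumed rule)."""
    n = len(binary)
    return [[min(binary[i][j], binary[j][i]) for j in range(n)] for i in range(n)]


def maximal_cliques(binary, min_size=3):
    """List maximal cliques of a symmetric 0/1 matrix (basic Bron-Kerbosch)."""
    n = len(binary)
    neighbors = {i: {j for j in range(n) if binary[i][j]} for i in range(n)}
    found = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            if len(clique) >= min_size:
                found.append(sorted(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & neighbors[v], excluded & neighbors[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(range(n)), set())
    return sorted(found)


# Toy valued data only (not the Sampson matrices).
valued = [
    [0, 3, 2, 0, 0],
    [3, 0, 2, 0, 0],
    [2, 3, 0, 1, 0],
    [0, 0, 1, 0, 3],
    [0, 0, 0, 3, 0],
]

# Try several cutoffs and see how the clique structure changes.
for cutoff in (1, 2, 3):
    ties = symmetrize_min(dichotomize(valued, cutoff))
    print(cutoff, maximal_cliques(ties, min_size=2))
```

Comparing the output across cutoffs shows which groupings are robust and which appear only at a lenient or a strict threshold.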

After you've done a number of runs, you need to do a meta-analysis to put together a coherent picture of the network's structure. How many subgroups are there? Who is in which subgroup?
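One simple way to start such a meta-analysis, offered here only as a sketch, is to count how often each pair of actors lands in the same group across your runs. The run results below are made up for illustration, and the function names and the "stable pair" threshold are hypothetical choices, not a prescribed method.

```python
from itertools import combinations


def comembership_counts(runs, n):
    """Count, for each pair of actors, the number of runs that group them together.

    `runs` is a list of runs; each run is a list of groups (lists of actor indices).
    Groups within a run may overlap (as cliques do); a pair counts once per run.
    """
    counts = [[0] * n for _ in range(n)]
    for groups in runs:
        together = set()
        for group in groups:
            for a, b in combinations(sorted(group), 2):
                together.add((a, b))
        for a, b in together:
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def stable_pairs(counts, min_runs):
    """Pairs that were grouped together in at least `min_runs` runs."""
    n = len(counts)
    return [(a, b) for a in range(n) for b in range(a + 1, n)
            if counts[a][b] >= min_runs]


# Made-up results from three hypothetical runs on five actors.
runs = [
    [[0, 1, 2], [3, 4]],           # e.g. a cluster analysis
    [[0, 1], [2, 3, 4]],           # e.g. a different clustering
    [[0, 1, 2], [2, 3], [3, 4]],   # e.g. cliques at some cutoff
]

counts = comembership_counts(runs, n=5)
for row in counts:
    print(row)
print(stable_pairs(counts, min_runs=3))
```

Pairs that co-occur in every run are good candidates for the core of a subgroup, while actors whose partners change from run to run may sit between subgroups.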

AFTER you have done that, you can look at Breiger et al.'s paper (in the readings for positions/roles) to see what Breiger et al. found and what Sampson, the ethnographer, found.