Contents - Index


PURPOSE Computes a single link hierarchical clustering and a measure of regular equivalence for binary or nominal data using categorical REGE.

DESCRIPTION Two actors are regularly equivalent if they are equally related to equivalent others.  Nominal data is any integer valued adjacency matrix in which the value represents a coding of the relationship in terms of a category.  

For example, we could use 1 to represent close friend, 2 to represent friend and 3 to represent works with.  The values 1, 2 and 3 DO NOT measure the strength of the relationship, they simply refer to the categories.

Two actors are regularly equivalent for nominal data if in addition to the normal regularity condition they relate to equivalent others in the same category.

The categorical REGE algorithm searches for matches in successive neighborhoods.  For binary data in the first iteration, vertices are classified as sinks, sources or repeaters.  At the next iteration the neighborhoods of all the vertices are considered, two vertices would be classified differently if one neighborhood contained a representative of one of these categories while the other did not.  The next iteration classifies the vertices in terms of the neighborhood's classification in the previous iteration.  The process continues until stable (a maximum of n different categories are possible in a graph of n vertices).

For nominal data the initial categories are included at the first iteration.  The process is easily extended to multiple relations.

From this procedure a similarity matrix can be formed with entries which give the value of the iteration at which vertices were separated into different categories.

Initially the procedure places all vertices in the same category;  or into user specified categories.  Subsequent iterations split the groups into hierarchical clusters.

Input dataset
Name of file containing dataset to be analyzed. Data type: Valued graph - integer values.  Multirelational.

Dataset with starting partition (if any):
A null return will initially place all vertices in a single cluster. For user specified partition enter the name of a data file which contains a partition indicator matrix.  A partition indicator matrix has each row as a separate partition.  Each row is of the form (k1,k2,...,ki...) where ki assigns vertex i to partition ki so that (1 1 2 1 2) assigns vertices 1, 2 and 4 to partition 1 and 3 and 5 to partition 2. 

Convert data to geodesic distances: (Default = 'YES')
Yes performs the analysis on the geodesic distance matrix.
No uses the raw adjacencies.
Note for undirected data the partitioning would be trivial and in this case the YES option should be selected.

Diagram Type: (Default = 'Dendrogram')
The clustering diagram can either be a Tree Diagram or a Dendrogram.

(Output) Equivalence matrix: (Default = 'CATREGEQUIV').
Name of file which contains actor by actor regular similarity matrix described in LOG FILE.

Output partition matrix: (Default = 'CATREGPART').
Name of file which contains a partition indicator matrix corresponding to the single link hierarchical clustering displayed in the LOG FILE.  A value of k in a row labeled i and column j means that vertex j is in partition k at level i.  Vertex k is always a member of partition k and is a representative label for the group. This matrix is not displayed in the LOG FILE. 

LOG FILE Single link hierarchical clustering dendrogram (or tree diagram) of the regular similarity measure. The level at which any pair of actors are aggregated is the point at which both can be reached by tracing from the start to the actors from right to left. Each level corresponds to an iteration, level 1 represents the initial clustering specified in PARAMETERS.  The top level gives strict regular equivalence clusters.  The higher the level the greater the degree of regular equivalence The diagram can be printed or saved. Parts of the diagram can be viewed by moving the mouse to the split point in a tree diagram or the beginning of a line in the dendrogram and clicking. The first click will highlight a portion of the diagram and the second click will display just the highlighted portion. To return to the original right click on the mouse. There is also a simple zoom facility simply change the values and then press enter. If the labels need to be edited (particularly the scale labels) then you should take the partition indicator matrix into the spreadsheet editor remove or reduce the labels and then submit the edited data to Tools>Dendrogram>Draw. 

Behind the dendrogram is an alternative cluster diagram. The columns have been rearranged and labeled.  A '' in row labeled i column label j indicates that vertex j is in a singleton cluster at level i.  An 'X' indicates that vertex j is in a non-trivial cluster at level i, all other members of j's cluster are found by tracing along the row labeled i in both directions from column j until a space is encountered in each direction.  The column labels corresponding to an 'X' which are connected to j's X are all members of j's cluster at level i.

An actor by actor exact similarity matrix.  A k in row i column j means that actor i and j were separated at level k, provided k is less than the value on the diagonal.  If k is equal to the value on the diagonal then i and j are regularly equivalent.



REFERENCES Borgatti S P and Everett M G (1989).  The class of all regular equivalences: algebraic structure and computation.  Social Networks 11, 65-88.

Borgatti S P and Everett M G (1993). Two algorithms for computing regular equivalence, Social Networks 15, 361- 376.