Connections 17(2):78-80

# How to Explain Hierarchical Clustering

Stephen P. Borgatti
University of South Carolina

Given a set of N items to be clustered, and an NxN distance (or similarity) matrix, the basic process of Johnson's (1967) hierarchical clustering is this:

1. Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain.
2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one less cluster.
3. Compute distances (similarities) between the new cluster and each of the old clusters.
4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.

Step 3 can be done in different ways, which is what distinguishes single-link from complete-link and average-link clustering. In single-link clustering (also called the connectedness or minimum method), we consider the distance between one cluster and another cluster to be equal to the shortest distance from any member of one cluster to any member of the other cluster. If the data consist of similarities, we consider the similarity between one cluster and another cluster to be equal to the greatest similarity from any member of one cluster to any member of the other cluster. In complete-link clustering (also called the diameter or maximum method), we consider the distance between one cluster and another cluster to be equal to the longest distance from any member of one cluster to any member of the other cluster. In average-link clustering, we consider the distance between one cluster and another cluster to be equal to the average distance from any member of one cluster to any member of the other cluster. A variation on average-link clustering is the UCLUS method of D'Andrade (1978) which uses the median distance.

Example. The following pages trace a hierarchical clustering of distances in miles between U.S. cities. The method of clustering is single-link.

Input distance matrix:

 BOS NY DC MIA CHI SEA SF LA DEN BOS 0 206 429 1504 963 2976 3095 2979 1949 NY 206 0 233 1308 802 2815 2934 2786 1771 DC 429 233 0 1075 671 2684 2799 2631 1616 MIA 1504 1308 1075 0 1329 3273 3053 2687 2037 CHI 963 802 671 1329 0 2013 2142 2054 996 SEA 2976 2815 2684 3273 2013 0 808 1131 1307 SF 3095 2934 2799 3053 2142 808 0 379 1235 LA 2979 2786 2631 2687 2054 1131 379 0 1059 DEN 1949 1771 1616 2037 996 1307 1235 1059 0

The nearest pair of cities is BOS and NY, at distance 206. These are merged into a single cluster called "BOS/NY".

Then we compute the distance from this new compound object to all other objects. In single link clustering the rule is that the distance from the compound object to another object is equal to the shortest distance from any member of the cluster to the outside object. So the distance from "BOS/NY" to DC is chosen to be 233, which is the distance from NY to DC. Similarly, the distance from "BOS/NY" to DEN is chosen to be 1771.

After merging BOS with NY:

 BOS/NY DC MIA CHI SEA SF LA DEN BOS/NY 0 223 1308 802 2815 2934 2786 1771 DC 223 0 1075 671 2684 2799 2631 1616 MIA 1308 1075 0 1329 3273 3053 2687 2037 CHI 802 671 1329 0 2013 2142 2054 996 SEA 2815 2684 3273 2013 0 808 1131 1307 SF 2934 2799 3053 2142 808 0 379 1235 LA 2786 2631 2687 2054 1131 379 0 1059 DEN 1771 1616 2037 996 1307 1235 1059 0

The nearest pair of objects is BOS/NY and DC, at distance 223. These are merged into a single cluster called "BOS/NY/DC". Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix:

After merging DC with BOS-NY:

 BOS/NY/DC MIA CHI SEA SF LA DEN BOS/NY/DC 0 1075 671 2684 2799 2631 1616 MIA 1075 0 1329 3273 3053 2687 2037 CHI 671 1329 0 2013 2142 2054 996 SEA 2684 3273 2013 0 808 1131 1307 SF 2799 3053 2142 808 0 379 1235 LA 2631 2687 2054 1131 379 0 1059 DEN 1616 2037 996 1307 1235 1059 0

Now, the nearest pair of objects is SF and LA, at distance 379. These are merged into a single cluster called "SF/LA". Then we compute the distance from this new cluster to all other objects, to get a new distance matrix:

After merging SF with LA:

 BOS/ NY/DC MIA CHI SEA SF/LA DEN BOS/NY/DC 0 1075 671 2684 2631 1616 MIA 1075 0 1329 3273 2687 2037 CHI 671 1329 0 2013 2054 996 SEA 2684 3273 2013 0 808 1307 SF/LA 2631 2687 2054 808 0 1059 DEN 1616 2037 996 1307 1059 0

Now, the nearest pair of objects is CHI and BOS/NY/DC, at distance 671. These are merged into a single cluster called "BOS/NY/DC/CHI". Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix:

After merging CHI with BOS/NY/DC:

 BOS/NY/DC/ CHI MIA SEA SF/LA DEN BOS/NY/DC/CHI 0 1075 2013 2054 996 MIA 1075 0 3273 2687 2037 SEA 2013 3273 0 808 1307 SF/LA 2054 2687 808 0 1059 DEN 996 2037 1307 1059 0

Now, the nearest pair of objects is SEA and SF/LA, at distance 808. These are merged into a single cluster called "SF/LA/SEA". Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix:

After merging SEA with SF/LA:

 BOS/NY/DC/CHI MIA SF/LA/SEA DEN BOS/NY/DC/CHI 0 1075 2013 996 MIA 1075 0 2687 2037 SF/LA/SEA 2054 2687 0 1059 DEN 996 2037 1059 0

Now, the nearest pair of objects is DEN and BOS/NY/DC/CHI, at distance 996. These are merged into a single cluster called "BOS/NY/DC/CHI/DEN". Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix:

After merging DEN with BOS/NY/DC/CHI:

 BOS/NY/DC/CHI/DEN MIA SF/LA/SEA BOS/NY/DC/CHI/DEN 0 1075 1059 MIA 1075 0 2687 SF/LA/SEA 1059 2687 0

Now, the nearest pair of objects is BOS/NY/DC/CHI/DEN and SF/LA/SEA, at distance 1059. These are merged into a single cluster called "BOS/NY/DC/CHI/DEN/SF/LA/SEA". Then we compute the distance from this new compound object to all other objects, to get a new distance matrix:

After merging SF/LA/SEA with BOS/NY/DC/CHI/DEN:

 BOS/NY/DC/CHI/DEN/SF/LA/SEA MIA BOS/NY/DC/CHI/DEN/SF/LA/SEA 0 1075 MIA 1075 0

Finally, we merge the last two clusters at level 1075. This process is summarized by the clustering diagram printed by many software packages:

In the diagram, the columns are associated with the items and the rows are associated with levels (stages) of clustering. An 'X' is placed between two columns in a given row if the corresponding items are merged at that stage in the clustering.

### References

D'andrade,R. 1978, "U-Statistic Hierarchical Clustering" Psychometrika, 4:58-67.

Johnson,S.C. 1967, "Hierarchical Clustering Schemes" Psychometrika, 2:241-254.

[geneva97/eop.htm]