Connections 17(2):78-80

Copyright 1994 INSNA

Hierarchical Clustering

Stephen P. Borgatti

*University of South Carolina*

Given a set of N items to be clustered, and an NxN distance (or similarity) matrix, the basic process of Johnson's (1967) hierarchical clustering is this:

- Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain.
- Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one less cluster.
- Compute distances (similarities) between the new cluster and each of the old clusters.
- Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.

Step 3 can be done in different ways, which is what distinguishes *single-link*
from *complete-link* and *average-link* clustering. In *single-link*
clustering (also called the *connectedness* or *minimum* method), we
consider the distance between one cluster and another cluster to be equal to the __shortest__
distance from any member of one cluster to any member of the other cluster. If the data
consist of similarities, we consider the similarity between one cluster and another
cluster to be equal to the __greatest__ similarity from any member of one cluster to
any member of the other cluster. In *complete-link* clustering (also called the *diameter*
or *maximum* method), we consider the distance between one cluster and another
cluster to be equal to the __longest__ distance from any member of one cluster to any
member of the other cluster. In *average-link* clustering, we consider the distance
between one cluster and another cluster to be equal to the __average__ distance from
any member of one cluster to any member of the other cluster. A variation on average-link
clustering is the UCLUS method of D'Andrade (1978) which uses the __median__ distance.

**Example.** The following pages trace a hierarchical clustering of
distances in miles between U.S. cities. The method of clustering is *single-link*.

**Input distance matrix:**

BOS | NY | DC | MIA | CHI | SEA | SF | LA | DEN | |

BOS | 0 | 206 | 429 | 1504 | 963 | 2976 | 3095 | 2979 | 1949 |

NY | 206 | 0 | 233 | 1308 | 802 | 2815 | 2934 | 2786 | 1771 |

DC | 429 | 233 | 0 | 1075 | 671 | 2684 | 2799 | 2631 | 1616 |

MIA | 1504 | 1308 | 1075 | 0 | 1329 | 3273 | 3053 | 2687 | 2037 |

CHI | 963 | 802 | 671 | 1329 | 0 | 2013 | 2142 | 2054 | 996 |

SEA | 2976 | 2815 | 2684 | 3273 | 2013 | 0 | 808 | 1131 | 1307 |

SF | 3095 | 2934 | 2799 | 3053 | 2142 | 808 | 0 | 379 | 1235 |

LA | 2979 | 2786 | 2631 | 2687 | 2054 | 1131 | 379 | 0 | 1059 |

DEN | 1949 | 1771 | 1616 | 2037 | 996 | 1307 | 1235 | 1059 | 0 |

The nearest pair of cities is BOS and NY, at distance 206. These are merged into a single cluster called "BOS/NY".

Then we compute the distance from this new compound object to all other objects. In single link clustering the rule is that the distance from the compound object to another object is equal to the shortest distance from any member of the cluster to the outside object. So the distance from "BOS/NY" to DC is chosen to be 233, which is the distance from NY to DC. Similarly, the distance from "BOS/NY" to DEN is chosen to be 1771.

**After merging BOS with NY:**

BOS/NY | DC | MIA | CHI | SEA | SF | LA | DEN | |

BOS/NY | 0 | 223 | 1308 | 802 | 2815 | 2934 | 2786 | 1771 |

DC | 223 | 0 | 1075 | 671 | 2684 | 2799 | 2631 | 1616 |

MIA | 1308 | 1075 | 0 | 1329 | 3273 | 3053 | 2687 | 2037 |

CHI | 802 | 671 | 1329 | 0 | 2013 | 2142 | 2054 | 996 |

SEA | 2815 | 2684 | 3273 | 2013 | 0 | 808 | 1131 | 1307 |

SF | 2934 | 2799 | 3053 | 2142 | 808 | 0 | 379 | 1235 |

LA | 2786 | 2631 | 2687 | 2054 | 1131 | 379 | 0 | 1059 |

DEN | 1771 | 1616 | 2037 | 996 | 1307 | 1235 | 1059 | 0 |

The nearest pair of objects is BOS/NY and DC, at distance 223. These are merged into a single cluster called "BOS/NY/DC". Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix:

**After merging DC with BOS-NY:**

BOS/NY/DC | MIA | CHI | SEA | SF | LA | DEN | |

BOS/NY/DC | 0 | 1075 | 671 | 2684 | 2799 | 2631 | 1616 |

MIA | 1075 | 0 | 1329 | 3273 | 3053 | 2687 | 2037 |

CHI | 671 | 1329 | 0 | 2013 | 2142 | 2054 | 996 |

SEA | 2684 | 3273 | 2013 | 0 | 808 | 1131 | 1307 |

SF | 2799 | 3053 | 2142 | 808 | 0 | 379 | 1235 |

LA | 2631 | 2687 | 2054 | 1131 | 379 | 0 | 1059 |

DEN | 1616 | 2037 | 996 | 1307 | 1235 | 1059 | 0 |

Now, the nearest pair of objects is SF and LA, at distance 379. These are merged into a single cluster called "SF/LA". Then we compute the distance from this new cluster to all other objects, to get a new distance matrix:

**After merging SF with LA:**

BOS/ NY/DC |
MIA | CHI | SEA | SF/LA | DEN | |

BOS/NY/DC | 0 | 1075 | 671 | 2684 | 2631 | 1616 |

MIA | 1075 | 0 | 1329 | 3273 | 2687 | 2037 |

CHI | 671 | 1329 | 0 | 2013 | 2054 | 996 |

SEA | 2684 | 3273 | 2013 | 0 | 808 | 1307 |

SF/LA | 2631 | 2687 | 2054 | 808 | 0 | 1059 |

DEN | 1616 | 2037 | 996 | 1307 | 1059 | 0 |

Now, the nearest pair of objects is CHI and BOS/NY/DC, at distance 671. These are merged into a single cluster called "BOS/NY/DC/CHI". Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix:

**After merging CHI with BOS/NY/DC:**

BOS/NY/DC/ CHI |
MIA | SEA | SF/LA | DEN | |

BOS/NY/DC/CHI | 0 | 1075 | 2013 | 2054 | 996 |

MIA | 1075 | 0 | 3273 | 2687 | 2037 |

SEA | 2013 | 3273 | 0 | 808 | 1307 |

SF/LA | 2054 | 2687 | 808 | 0 | 1059 |

DEN | 996 | 2037 | 1307 | 1059 | 0 |

Now, the nearest pair of objects is SEA and SF/LA, at distance 808. These are merged into a single cluster called "SF/LA/SEA". Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix:

**After merging SEA with SF/LA:**

BOS/NY/DC/CHI | MIA | SF/LA/SEA | DEN | |

BOS/NY/DC/CHI | 0 | 1075 | 2013 | 996 |

MIA | 1075 | 0 | 2687 | 2037 |

SF/LA/SEA | 2054 | 2687 | 0 | 1059 |

DEN | 996 | 2037 | 1059 | 0 |

Now, the nearest pair of objects is DEN and BOS/NY/DC/CHI, at distance 996. These are merged into a single cluster called "BOS/NY/DC/CHI/DEN". Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix:

**After merging DEN with BOS/NY/DC/CHI:**

BOS/NY/DC/CHI/DEN | MIA | SF/LA/SEA | |

BOS/NY/DC/CHI/DEN | 0 | 1075 | 1059 |

MIA | 1075 | 0 | 2687 |

SF/LA/SEA | 1059 | 2687 | 0 |

Now, the nearest pair of objects is BOS/NY/DC/CHI/DEN and SF/LA/SEA, at distance 1059. These are merged into a single cluster called "BOS/NY/DC/CHI/DEN/SF/LA/SEA". Then we compute the distance from this new compound object to all other objects, to get a new distance matrix:

**After merging SF/LA/SEA with BOS/NY/DC/CHI/DEN:**

BOS/NY/DC/CHI/DEN/SF/LA/SEA | MIA | |

BOS/NY/DC/CHI/DEN/SF/LA/SEA | 0 | 1075 |

MIA | 1075 | 0 |

Finally, we merge the last two clusters at level 1075. This process is summarized by the clustering diagram printed by many software packages:

In the diagram, the columns are associated with the items and the rows are associated with levels (stages) of clustering. An 'X' is placed between two columns in a given row if the corresponding items are merged at that stage in the clustering.

D'andrade,R. 1978, "U-Statistic Hierarchical Clustering" *Psychometrika*,
4:58-67.

Johnson,S.C. 1967, "Hierarchical Clustering Schemes" *Psychometrika*,
2:241-254.