NETWORK>SUBGROUPS>MARKOV CLUSTERING

Contents - Index

NETWORK>SUBGROUPS>MARKOV CLUSTERING

PURPOSE Implements the Markov Cluster Algorithm to partition a graph.

DESCRIPTION The Markov clustering algorithm partitions a graph into non-overlapping clusters. The algorithm determines the appropriate number of clusters deduced from the structural properties of the graph. This is an iterative algorithm which is based on a bootstrapping procedure and consists of applying two operations expansion and inflation. Both of these operations are based upon random walks. The adjacency matrix is first normalized so that it is column stochastic, hence the column sums are all unity. Expansion is simply squaring the matrix and inflation is squaring the entries followed by a re-scaling so that the matrix is again column stochastic. The process increases probabilities within clusters and decreases those between. Eventually the graph will consist of disconnected segments and these are the clusters. Informally the expansion step causes flow to dissipate within clusters and inflation prevents flow between clusters. We can increase the inflation operation by using powers larger than 2, this is called the inflation parameter. Larger values result in fewer clusters

PARAMETERS
Input dataset
Name of file containing dataset to be analysed. Data type: Valued graph. The algorithm works best on symmetric or nearly symmetric data.

(Output) Clustering partition
Name of dataset which contains a partition indicator vector. This vector has the form (k1,k2,...ki,...)' where ki assigns vertex i to faction ki so that (1 1 2 1 2)' assigns vertices 1, 2 and 4 to cluster 1 and 3 and 5 to cluster 2

Gamma Inflation Factor: (Default=2)
The power that the elements of the matrix are raised to in the inflation step. Higher powers give fewer clusters.It is recommended that values between 1.5 and 3.0 are the most appropriate.

Diagonal value:(Default =1)
The nature of the algorithm means that diagonal values affect the solution. A value of 1 is neutral since a node needs to be connected to itself.

Maximal Residual:(Default=0.001)
This is an iterative algorithm and this sets the convergence criteria.

Pruning threshold:(Default=0.001)
The value under which edges are finally dropped when their probability is lower than this value. Note the algorithm is sensitive to this parameter. This should be set lower for larger networks.

Maximum iterations:(Default=25)
The number of iterations after which the process is terminated and convergence has deemed to have failed. There are some graphs for which the method fails to converge.

Lump singletons together
Ticking this box (the default) puts all the singletons into their own cluster. If not ticked each will be in its own cluster.

Use sparse matrix methods
If ticked this will implement methods that are more efficient for large sparse graphs. Note for dense or small graphs this is less efficient.

LOG FILE Number of iterations required to obtain convergence.
Partition indicator vector. This vector has the form (k1,k2,...ki,...)' where ki assigns vertex i to faction ki so that (1 1 2 1 2)' assigns vertices 1, 2 and 4 to cluster 1 and 3 and 5 to cluster 2

TIMING This is an iterative algorithm but experiments show it has a quadratic level of convergence in general.

COMMENTS None.

REFERENCES Stijn van Dongen, Graph Clustering by flow simulation. PhD thesis, University of Utrecht, May 2000.
Stijn van Dongen. Graph clustering via a discrete uncoupling process. SIAM Journal on Matrix Analysis and Applications 30, 2008 211-239.