PURPOSE Compute correlation and other similarity measures between entries of two square matrices, and assess the frequency of random measures as large as actually observed.

DESCRIPTION The procedure is principally used to test the association between networks.  Often, one network is an observed network while the other is a model or expected network.  
The algorithm proceeds in two steps.  In the first step, it computes Pearson's correlation coefficient (plus simple matching, Jaccard, Goodman-Kruskal Gamma and Hamming distance) between corresponding cells of the two data matrices.  In the second step, it randomly permutes the rows and columns of one matrix synchronously, that is, with the same permutation applied to both (the observed matrix, if the distinction is relevant), and recomputes the correlation and other measures.

The second step is carried out hundreds of times (500 by default) in order to compute the proportion of permutations in which the random measure is larger than or equal to the observed measure calculated in step 1.  A low proportion (< 0.05) suggests that the observed relationship between the matrices is unlikely to have occurred by chance.
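
The two steps above can be sketched in Python roughly as follows.  This is a minimal illustration for the Pearson correlation only and is not UCINET's implementation: the function names (offdiag_pairs, pearson, qap_correlation) are hypothetical, diagonals are always excluded, and missing values are not handled.

```python
import math
import random

def offdiag_pairs(a, b, perm=None):
    """Pair corresponding off-diagonal cells of a and b.
    perm relabels the rows and columns of a with the same permutation."""
    n = len(a)
    p = perm if perm is not None else list(range(n))
    xs, ys = [], []
    for i in range(n):
        for j in range(n):
            if i != j:
                xs.append(a[p[i]][p[j]])
                ys.append(b[i][j])
    return xs, ys

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def qap_correlation(a, b, n_perm=500, seed=1):
    """Return (observed r, proportion of permuted r >= observed,
    proportion of permuted r <= observed)."""
    rng = random.Random(seed)
    observed = pearson(*offdiag_pairs(a, b))   # step 1
    n = len(a)
    as_large = as_small = 0
    for _ in range(n_perm):                    # step 2, repeated n_perm times
        perm = list(range(n))
        rng.shuffle(perm)
        r = pearson(*offdiag_pairs(a, b, perm))
        if r >= observed:
            as_large += 1
        if r <= observed:
            as_small += 1
    return observed, as_large / n_perm, as_small / n_perm

# Small made-up example matrices (illustrative data only).
a = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
b = [[0, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0]]
print(qap_correlation(a, b, n_perm=500, seed=42))
```

Because rows and columns are permuted together, diagonal cells stay on the diagonal and the off-diagonal values of the permuted matrix are only rearranged, so its mean and variance do not change from one permutation to the next.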

Data Matrix:
Name of dataset containing the first matrix (the observed or dependent matrix, if such distinctions are meaningful). Data type:  Square Matrix.

Structure Matrix:
Name of dataset containing the expected, modeled or independent matrix (if such distinctions are meaningful). Data type: Square Matrix.

Number of random permutations: (Default = 500)
Number of correlations to compute between the data matrix and the randomly permuted structure matrix.  The larger the number of permutations, the better the estimates of standard error and "significance", but the longer the computation time.

Treat diagonals as valid? (Default = NO)
If YES, the values along the main diagonals of each matrix are included in the computation of correlation.  Otherwise, they are treated as missing.
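
To illustrate how the diagonal setting can change the results, the sketch below computes three of the measures for binary matrices with and without the diagonal cells.  It assumes the textbook definitions (simple matching = proportion of cells that agree, Jaccard = cells where both are 1 divided by cells where at least one is 1, Hamming = number of cells that differ); the function names and the diagonal_valid parameter are hypothetical, and UCINET's exact handling may differ.

```python
def cell_pairs(a, b, diagonal_valid=False):
    """Pair corresponding cells of a and b, skipping the diagonal unless it is valid."""
    n = len(a)
    return [(a[i][j], b[i][j])
            for i in range(n) for j in range(n)
            if diagonal_valid or i != j]

def binary_measures(a, b, diagonal_valid=False):
    """Simple matching, Jaccard and Hamming distance for 0/1 matrices."""
    pairs = cell_pairs(a, b, diagonal_valid)
    both = sum(1 for x, y in pairs if x == 1 and y == 1)
    either = sum(1 for x, y in pairs if x == 1 or y == 1)
    differ = sum(1 for x, y in pairs if x != y)
    return {
        "simple_matching": (len(pairs) - differ) / len(pairs),
        "jaccard": both / either if either else float("nan"),
        "hamming": differ,
    }

# Made-up 3x3 example in which the two diagonals disagree in two cells.
a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
b = [[1, 1, 0], [0, 0, 1], [0, 1, 1]]
print(binary_measures(a, b, diagonal_valid=False))
print(binary_measures(a, b, diagonal_valid=True))
```

In this example the two extra diagonal mismatches raise the Hamming distance from 1 to 3 and lower the Jaccard coefficient from 0.75 to 0.5.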

Random number seed:
The seed used to initialize the random permutations.  By default, UCINET generates a different seed each time it is run, so the permutation results may differ between runs.  To repeat an analysis exactly, set the seed explicitly to the value used in the earlier run.  The range is 1 to 32000.

LOG FILE The output consists of some summary statistics of each of the matrices followed by the results. The following sample output is generated:

                               Value   Signif   Avg     SD    P(Large) P(Small)   NPerm
                              -------  ------ ------- ------  -------  -------   ---------
          Pearson Correlation: 0.120   0.101  -0.002   0.086   0.101    0.943    2500.000
              Simple Matching: 0.667   0.101   0.625   0.032   0.101    0.943    2500.000
          Jaccard Coefficient: 0.176   0.101   0.120   0.039   0.101    0.943    2500.000
        Goodman-Kruskal Gamma: 0.319   0.101  -0.021   0.249   0.101    0.943    2500.000
             Hamming Distance:70.000   0.101  78.738   6.348   0.943    0.101    2500.000

The Value column gives the observed value of each measure between the two networks, in this case 0.120 for the correlation and 0.176 for Jaccard.  The average random correlation was almost zero (-0.002), with a standard deviation of 0.086.  The proportion of random correlations that were as large as 0.120 was 0.101, that is, 10.1%.  Hence, of the 2,500 random permutations, just over 250 produced a correlation of 0.120 or higher.  At a typical 0.05 level, this correlation would not be considered significant, since 0.101 > 0.05.

The table gives P(Large) as well as P(Small).  Note that for the Hamming distance it is P(Small) that needs to be considered, as smaller values imply more similarity.  The Signif column attempts to identify the correct value from P(Small) and P(Large).  However, when the observed value is close to zero it can get this wrong, since the selection is based upon whether the observed value is positive or negative.  In this instance, the user should consider the measure used and the type of data.
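
When the Signif column is in doubt, the relevant tail can be picked by hand.  The sketch below does this for two rows of the sample output, assuming the question of interest is whether the matrices are more similar than expected by chance.  The helper one_sided_p and its is_distance flag are hypothetical names for a user-side check and do not describe UCINET's own selection rule.

```python
def one_sided_p(p_large, p_small, is_distance):
    """Pick the tail corresponding to 'more similar than chance':
    P(Small) for a distance, P(Large) for a similarity or correlation."""
    return p_small if is_distance else p_large

# Rows copied from the sample output: (value, P(Large), P(Small), is_distance)
rows = {
    "Pearson Correlation": (0.120, 0.101, 0.943, False),
    "Hamming Distance": (70.000, 0.943, 0.101, True),
}

alpha = 0.05    # conventional significance level
n_perm = 2500   # NPerm column of the sample output

for name, (value, p_large, p_small, is_distance) in rows.items():
    p = one_sided_p(p_large, p_small, is_distance)
    print(f"{name}: value = {value}, p = {p:.3f} "
          f"(about {p * n_perm:.1f} of {n_perm} permutations), "
          f"significant at {alpha}: {p < alpha}")
```

For both rows the chosen proportion is 0.101, which corresponds to roughly 0.101 x 2500 = 252.5 permutations and is above the 0.05 threshold.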

TIMING O(N^2) per permutation.

COMMENTS The program ignores missing values.