Assume that we are measuring the similarity between vector X and vector Y. We
use X* and Y* to refer to the canonical normalizations (or uniformed versions)
of X and Y.
Generic Measure of Similarity
 If X* indicates the uniformed version of X, then the Zegers & ten Berge family
of association measures can all be described by the same equation:
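A hedged sketch of that equation, assuming the standard form of the Zegers & ten Berge identity coefficient applied to the uniformed vectors (the symbol g and the index i are notation chosen here, not taken from these notes):

```latex
g(X, Y) = \frac{2 \sum_i X^*_i Y^*_i}{\sum_i {X^*_i}^2 + \sum_i {Y^*_i}^2}
```

Under this reading, leaving the data as they are (X* = X) gives the identity coefficient, rescaling to unit mean square gives Tucker's congruence, centering gives the coefficient of additivity, and standardizing gives the Pearson correlation.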
Absolute Scale Data
 Identity coefficient. Scale differences are not normalized away
 Not mentioned by Z & ten B is the Euclidean distance coefficient. This
measure is not normed: it varies from 0 to infinity
Ratio Scale Data
 Tucker's congruence = coefficient of proportionality. Differences in
amplitude normalized away
Additive Scale Data
 Coefficient of additivity = Winer's I
Interval Scale Data
 Pearson correlation = coefficient of linearity
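The four metric-scale coefficients above can be sketched as one function plus a choice of uniforming. This is a minimal illustration, assuming the general coefficient is 2*sum(X*Y*) / (sum(X*^2) + sum(Y*^2)) and that uniforming means: nothing for absolute data, division by the root mean square for ratio data, centering for additive data, and both for interval data. The function names are hypothetical.

```python
import math


def general_coefficient(xs, ys):
    """Assumed general form: 2*sum(x*y) / (sum(x^2) + sum(y^2)),
    applied to vectors that have already been uniformed."""
    num = 2 * sum(x * y for x, y in zip(xs, ys))
    den = sum(x * x for x in xs) + sum(y * y for y in ys)
    return num / den


def center(v):
    """Subtract the mean (removes differences in elevation)."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def scale_rms(v):
    """Divide by the root mean square (removes differences in amplitude)."""
    rms = math.sqrt(sum(a * a for a in v) / len(v))
    return [a / rms for a in v]


def uniform(v, scale):
    """Uniformed version of v for a given scale type (assumed mapping)."""
    if scale == "absolute":
        return list(v)               # identity coefficient
    if scale == "ratio":
        return scale_rms(v)          # Tucker's congruence
    if scale == "additive":
        return center(v)             # coefficient of additivity
    if scale == "interval":
        return scale_rms(center(v))  # Pearson correlation
    raise ValueError(f"unknown scale type: {scale}")


def association(x, y, scale):
    return general_coefficient(uniform(x, scale), uniform(y, scale))


x = [1, 2, 3, 4]
y = [2, 4, 6, 8]  # y = 2x: proportional to x, but not identical
for scale in ("absolute", "ratio", "additive", "interval"):
    print(scale, round(association(x, y, scale), 3))
```

With y = 2x, the ratio and interval coefficients both reach 1, while the absolute and additive coefficients stay below 1 because the difference in amplitude is not normalized away.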
Ordinal Data
 Spearman's rho = r(X*,Y*), where X* and Y* are the ranks of X and Y
 Goodman and Kruskal Gamma = (P - Q)/(P + Q), where P is the number of
concordant pairs and Q is the number of discordant pairs
 example:

Case  X  Y
1     1  1
2     1  2
3     2  1
4     2  1
5     3  1
6     3  1
7     3  2

      1  2  3  4  5  6  7
1        n  n  n  n  n  p
2           q  q  q  q  n
3              n  n  n  p
4                 n  n  p
5                    n  n
6                       n
7
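In the matrix, p marks a concordant pair, q a discordant pair, and n a pair
tied on X or Y.
P = 3, Q = 4, gamma = (3 - 4)/(3 + 4) = -1/7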
Or do it via the contingency table of X (rows) by Y (columns):

       Y=1  Y=2
X=1    1    1
X=2    2    0
X=3    2    1

P = 1*(0+1) + 2*(1) = 3
Q = 1*(2+2) + 0*(2) = 4
Gamma = (3 - 4)/(3 + 4) = -1/7
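The pair-by-pair count for this example can be checked with a short sketch. The function names are hypothetical; a pair counts as concordant when X and Y differ in the same direction, as discordant when they differ in opposite directions, and is skipped when it is tied on either variable.

```python
from itertools import combinations


def count_pairs(xs, ys):
    """Return (P, Q): the numbers of concordant and discordant pairs."""
    p = q = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            p += 1  # same ordering on X and Y: concordant
        elif direction < 0:
            q += 1  # opposite ordering: discordant
        # direction == 0: tied on X or Y, contributes to neither count
    return p, q


def gamma(p, q):
    """Goodman and Kruskal gamma; undefined when P + Q = 0."""
    return (p - q) / (p + q)


# The seven cases from the example above.
X = [1, 1, 2, 2, 3, 3, 3]
Y = [1, 2, 1, 1, 1, 1, 2]

P, Q = count_pairs(X, Y)
print(P, Q, gamma(P, Q))
```

For these data the sketch gives P = 3 and Q = 4, so gamma is negative.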
Another example:
City Size/Arenas  Small   Medium  Large
Weak Mayor        a = 10  b = 5   c = 2
Strong Mayor      d = 10  e = 15  f = 20
P = a(e+f) + bf = 10(15+20) + 5*20 = 450
Q = c(d+e) + bd = 2(10+15) + 5*10 = 100
gamma = (P - Q)/(P + Q) = (450 - 100)/(450 + 100) = 0.636
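The contingency-table shortcut generalizes to any table whose rows and columns are both in ascending order. A minimal sketch, with a hypothetical function name: each cell is multiplied by every cell below and to its right (concordant) and by every cell below and to its left (discordant).

```python
def gamma_from_table(table):
    """Return (P, Q, gamma) for a table given as a list of rows.

    Assumes the rows and the columns are both listed in ascending order.
    """
    n_rows, n_cols = len(table), len(table[0])
    p = q = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):   # rows below row i
                for m in range(n_cols):
                    if m > j:                # below and to the right
                        p += table[i][j] * table[k][m]
                    elif m < j:              # below and to the left
                        q += table[i][j] * table[k][m]
    return p, q, (p - q) / (p + q)


# Mayor example: rows are Weak and Strong Mayor, columns Small, Medium, Large.
mayors = [[10, 5, 2],
          [10, 15, 20]]
print(gamma_from_table(mayors))

# Seven-case example: rows are X = 1, 2, 3 and columns are Y = 1, 2.
seven_cases = [[1, 1],
               [2, 0],
               [2, 1]]
print(gamma_from_table(seven_cases))
```

The first call reproduces P = 450 and Q = 100; the second reproduces P = 3 and Q = 4.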
Presence/Absence Data
 Simple matches
 Jaccard
 Gamma / Yule's Q
 (ad - bc)/(ad + bc)
 (OR - 1)/(OR + 1), where OR = ad/bc is the odds ratio
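A minimal sketch of the three presence/absence measures, assuming the usual 2 by 2 labelling: a = present in both, b = present in X only, c = present in Y only, d = absent in both. The simple matching and Jaccard formulas are the standard definitions rather than something stated in these notes, and the function names and the data are made up for illustration.

```python
def two_by_two(x, y):
    """Cell counts (a, b, c, d) for two 0/1 vectors (assumed labelling)."""
    a = sum(1 for u, v in zip(x, y) if u and v)          # present in both
    b = sum(1 for u, v in zip(x, y) if u and not v)      # X only
    c = sum(1 for u, v in zip(x, y) if not u and v)      # Y only
    d = sum(1 for u, v in zip(x, y) if not u and not v)  # absent in both
    return a, b, c, d


def simple_matching(a, b, c, d):
    """Share of cases on which X and Y agree, joint absences included."""
    return (a + d) / (a + b + c + d)


def jaccard(a, b, c, d):
    """Share of agreement once joint absences (d) are ignored."""
    return a / (a + b + c)


def yules_q(a, b, c, d):
    """Gamma for a 2 by 2 table; undefined when ad + bc = 0."""
    return (a * d - b * c) / (a * d + b * c)


def yules_q_from_odds_ratio(a, b, c, d):
    """Same quantity via the odds ratio; needs b*c > 0."""
    odds_ratio = (a * d) / (b * c)
    return (odds_ratio - 1) / (odds_ratio + 1)


# Made-up illustrative data.
x = [1, 1, 1, 0, 0, 1, 0, 0]
y = [1, 1, 0, 0, 0, 1, 1, 0]
cells = two_by_two(x, y)
print(cells, simple_matching(*cells), jaccard(*cells), yules_q(*cells))
```

The two Yule's Q functions agree whenever the odds ratio is defined, which is the point of the (OR - 1)/(OR + 1) form.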
Nominal Data
 (equals phi when the table is 2 by 2)
