QAP – Testing Network Hypotheses

1) Introduction

a) The topic is hypothesis testing, which is problematic in network analysis

i) E.g., testing whether a correlation between variables is significant

b) It is different in network analysis because

i) There are many units of analysis, such as the dyad, the node, and the group (whole network)

ii) The data usually don’t satisfy the assumptions of classical significance tests

c) I will present the permutation, or QAP (quadratic assignment procedure), approach to testing hypotheses

2) Units of analysis

a) Dyadic.

i) Social ties “grease the wheels” for business exchanges

ii) Distance between offices influences likelihood of communication

b) Nodal.

i) Centrality → earlier promotion

ii) Cheerful personality → centrality

c) Group / whole network

i) Teams with core/periphery structure perform better than teams with clumpy structure

d) Mixed Dyadic / Nodal

i) [selection] gender determines who talks to whom (homophily)

ii) [diffusion] interaction leads to similarity in beliefs

3) Problems with statistical methods in network context

a) Measuring association between variables is not a problem; only assessing its significance is.

b) [for dyadic hypotheses] Data are whole matrices

c) Data are rarely random samples – if samples at all

d) Data not normally distributed (or distribution unknown)

e) Observations not independent of each other

4) Why do we need significance, especially if data are not samples?

a) Even when the data are a population, chance plays a role

b) We must know how likely it is to obtain a result at least as strong as ours if the variables are in truth independent.

5) Testing dyadic hypotheses with QAP

a) Correlate corresponding cells of the two adjacency matrices using ordinary Pearson correlation

i) Same as stringing out all off-diagonal values into two vectors, each with N(N-1) values, and correlating them in a package such as SPSS

b) To get significance, we must compare the observed correlation with the distribution of correlations that can occur when the matrices are in truth independent

i) In principle, we could just create a few thousand pairs of random matrices, correlate each pair, and count the proportion of pairs that are at least as strongly correlated as our two observed matrices.

(1) This proportion is the significance: the probability of obtaining a correlation at least as large as the observed one if the matrices are in truth independent (not the probability that the matrices are independent)

ii) However, we want to compare only with matrices that are similar to our observed data – same size, same distribution of values, same properties

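iii) Solution is to randomly permute the rows and columns of one of the adjacency matrices (applying the same permutation to rows and columns, so that each node keeps its own pattern of ties) and correlate it with the other matrix. Repeat this thousands of times and count the proportion of times the permuted correlation is at least as large as the real correlation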
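
The procedure in 5a and 5b(iii) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `offdiag` and `qap_correlation`, the default of 2000 permutations, the fixed seed, and the one-tailed test are all choices made here for the example. The demo matrices are random and stand in for real data.

```python
import numpy as np

def offdiag(m):
    """Return the N(N-1) off-diagonal cells of a square matrix as one vector."""
    mask = ~np.eye(m.shape[0], dtype=bool)
    return m[mask]

def qap_correlation(x, y, n_perm=2000, seed=0):
    """Pearson correlation of two N x N matrices plus a one-tailed QAP p-value.

    n_perm and seed are illustrative defaults (assumptions of this sketch).
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    observed = np.corrcoef(offdiag(x), offdiag(y))[0, 1]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Same permutation for rows and columns: nodes are relabelled,
        # but the structure of the matrix is left intact.
        y_perm = y[np.ix_(p, p)]
        r = np.corrcoef(offdiag(x), offdiag(y_perm))[0, 1]
        if r >= observed:
            count += 1
    # Proportion of permuted correlations at least as large as the real one.
    return observed, count / n_perm

# Demo with made-up data: two unrelated random binary networks of 10 nodes.
demo_rng = np.random.default_rng(42)
friendship = demo_rng.integers(0, 2, size=(10, 10)).astype(float)
advice = demo_rng.integers(0, 2, size=(10, 10)).astype(float)
r_obs, p_value = qap_correlation(friendship, advice)
print(f"observed r = {r_obs:.3f}, QAP p-value = {p_value:.3f}")
```

Only one matrix is permuted because permuting both adds nothing: what matters is the alignment of nodes between the two matrices.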

6) QAP regression

a) Multiple independent variables (relations)
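
The outline does not specify a permutation scheme for QAP regression, so the sketch below assumes the simplest one: fit an ordinary least-squares regression on the off-diagonal cells, then permute the rows and columns of the dependent matrix and refit. The name `qap_regression`, the two-tailed comparison on absolute coefficients, and the defaults are assumptions of this example. Some software uses more refined permutation schemes, which are not shown here.

```python
import numpy as np

def qap_regression(y, xs, n_perm=1000, seed=0):
    """OLS of matrix y on a list of matrices xs, with permutation p-values.

    Returns (coefficients, p_values); index 0 is the intercept.
    Assumes the simple scheme of permuting only the dependent matrix.
    """
    rng = np.random.default_rng(seed)
    n = y.shape[0]
    mask = ~np.eye(n, dtype=bool)  # ignore the diagonal (self-ties)
    design = np.column_stack([np.ones(mask.sum())] + [x[mask] for x in xs])

    def fit(dep):
        coefs, *_ = np.linalg.lstsq(design, dep[mask], rcond=None)
        return coefs

    observed = fit(y)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        p = rng.permutation(n)
        permuted = fit(y[np.ix_(p, p)])  # same permutation on rows and columns
        exceed += np.abs(permuted) >= np.abs(observed)
    return observed, exceed / n_perm

# Demo with made-up data: communication depends on proximity, not on kinship.
demo_rng = np.random.default_rng(7)
proximity = demo_rng.normal(size=(12, 12))
kinship = demo_rng.normal(size=(12, 12))
communication = 0.8 * proximity + demo_rng.normal(scale=0.5, size=(12, 12))
coefs, pvals = qap_regression(communication, [proximity, kinship])
print("coefficients:", np.round(coefs, 3))
print("p-values:    ", np.round(pvals, 3))
```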

7) Testing homophily with a categorical variable

8) Testing homophily with a continuous variable
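
The outline gives no detail for items 7 and 8. One common approach, assumed here, is to turn the node attribute into a dyadic matrix and then test it against the adjacency matrix with QAP correlation. For a categorical attribute the matrix records whether two nodes are in the same category. For a continuous attribute it records the negative absolute difference, so that a positive correlation indicates homophily in both cases. All function names and the example data are hypothetical.

```python
import numpy as np

def same_category_matrix(attr):
    """1.0 where nodes i and j share a category, else 0.0."""
    attr = np.asarray(attr)
    return (attr[:, None] == attr[None, :]).astype(float)

def similarity_matrix(attr):
    """Negative absolute difference: larger values mean more similar nodes."""
    attr = np.asarray(attr, dtype=float)
    return -np.abs(attr[:, None] - attr[None, :])

def qap_test(x, y, n_perm=2000, seed=0):
    """Off-diagonal Pearson correlation with a one-tailed QAP p-value."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    mask = ~np.eye(n, dtype=bool)
    observed = np.corrcoef(x[mask], y[mask])[0, 1]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        r = np.corrcoef(x[mask], y[np.ix_(p, p)][mask])[0, 1]
        if r >= observed:
            hits += 1
    return observed, hits / n_perm

# Made-up example: 12 people, random ties, with gender and age as attributes.
demo_rng = np.random.default_rng(3)
talks_to = demo_rng.integers(0, 2, size=(12, 12)).astype(float)
gender = np.array(["f", "m"] * 6)
age = demo_rng.integers(20, 60, size=12)

print("categorical:", qap_test(talks_to, same_category_matrix(gender)))
print("continuous: ", qap_test(talks_to, similarity_matrix(age)))
```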