Cohesive Subgroups
Overview
Two basic reasons why we are interested in defining and detecting subgroups in network
data. The theoretical is to provide a structural definition of the sociological notion of
group. The practical is to obtain data reduction/description of patterns of observed
dyadic cohesion.
We assume that groups will be more homogeneous with respect to key variables than other
subgraphs either because of either homophily or diffusion.
Data Reduction Approach
a. find dense regions of the network
b. cluster analysis of adjacency matrix (particularly if you have valued data.)
c. cluster analysis of other cohesion index, such as geodesic distance, maximum flow
Theoretical approach
Several formal definitions of subgroups have been advanced. I review a few:
1. clique (Luce & Perry 1949):
 a maximally complete subgraph
 note that density of clique is perfect 1: everyone connected to all others all possible
ties present
 also, distance from everyone to everyone is minimal: 1.
 overlapping
 can be numerous
 too stringent in some cases, but depends on the relation studied
 Some relaxations developed, along two different lines
 based on distances within group (nclique, nclan, nclub)
 based on density/number of ties present (kplex)
2. nclique (Luce 1950):
 a maximal set of nodes such that all pairs of nodes in the set are distance n or less
 but one problem with nclique is that even a 2clique is not very cohesive. In the graph
{1,3,5} is a 2clique, but none are connected to each other.
 also, from substantive point of view is wierd that shortest path between two members
requires outside intermediary
3. nclan: (Alba 19xx) an nclique such that the induced subgraph has diameter n or less
4. nclub: (Mokken 19xx) a maximal subgraph of diameter <= n
 every nclan is an nclub and every nclan is an nclique
 but every nclub is not an nclan or nclique, although it is contained in them (fail
nclique maximality condition)

 2cliques: {1,2,3,4,5}, {2,3,4,5,6};
 2clan {2,3,4,5,6};
 2clubs {1,2,3,4}, {1,2,3,5} and {2,3,4,5,6}

5. kplex (Seidman 19xx):
 a (maximal) subgraph in which each member is connected to at least nk other members.
 when k = 1 each person is connected to all other members = clique.
 2plexes are quite cohesive, at least if we observe some basic rules:
 if you remove a node from a kplex, the residual subgraph is still a kplex (if we ignore
maximality rule)
 kplexes with k < (n+2)/2 vertices cannot contain bridges (but may contain cutpoint)
6. lssets (Seidman):
 Let alpha(K) be the number of ties from members of a set K to nonmembers. Let
alpha(X,Y) be the number of ties from members of X to members of Y. A subset H of V is an
LS set iff for any proper subset K of H, alpha(K) > alpha(H)
 or, H is ls iff alpha(K,HK) > alpha(K,VH)
 1,2,3,4 is ls
 1,2,3,4 is not ls
 1,2,3,4 is a clque but not ls
7. lambda sets
i. one property of ls sets is that each nodes in the set has more edge indep paths to
each other node in the set than to nodes outside the set.
ii. lambda sets have that definition
iii. same as clustering max flow matrix.
Cohesive Regions
A related concept is that of a cohesive region. A cohesion region is an area within
which there are subgroups.
1. Component
 A maximal set of nodes which are mutually reachable.
2. kcores (seidman):
 Let delta(H) represent the minimum number of ties that any node in H has to the other
nodes in H. A KCore is a maximal subgraph in which delta(H) >= k.
 a graph is a tree iff no 2cores
 connected graph has just one 2core
 any kcore contains at least k+1 vertices
 vertices in different kcores cannot be adjacent
 all kplexes are contained with kcores.
 a 1core is just a component.