**Sociograms and Graph Theory**

A graph diagram 2 aims to represent each row or column in an incidence matrix - each of the cases or affiliations under investigation - by a point on the paper. Once the appropriate adjacency matrix has been derived, the 'I' and 'O' entries in the cells of the matrix, representing the presence or absence of a relation, can be indicated by the presence or absence of lines between the points. In Figure 3.5, for example, the symmetrical 4-by-4 adjacency matrix of companies can be drawn as a four point graph containing six lines, which correspond to the non-zero entries in the matrix.

In a graph, it is the *pattern *of connections that is
important, and not the actual positioning of the points on the
page. The graph theorist has no interest in the relative position
of two points on the page, the lengths of the lines which are
drawn between them, or the size of character used to indicate the
points. Graph theory does involve concepts of length and
location, for example, but these do

*68 Social network analysis*

not correspond to those concepts of physical length and location with which we are most familiar. It is usual in a graph diagram to draw all the lines with the same physical length, wherever this is possible, but this is a purely aesthetic convention and a matter of practical convenience. Indeed, it is not always possible to maintain this convention if the graph is to be drawn with any clarity. For this reason, there is no one 'correct' way to draw a graph. The graph diagrams in Figure 4.1, for example, are equally valid ways of drawing the same graph - all convey exactly the same graph theoretical information.

The concepts of graph theory, then, are used to describe the pattern of connections among points. The simplest of graph theoretical concepts refer to the properties of the individual points and lines from which a graph is constructed, and these are the building blocks for more complex structural ideas. In this chapter I shall review these basic concepts and show how they can be used to give an overview of both the ego-centric and the global features of networks. Subsequent chapters will explore some of the more complex concepts.

It is necessary first to consider the types of lines that can
be used in the construction of graphs. Lines can correspond to
any of the types of relational data distinguished in Figure 3.6:
undirected, directed, valued or both directed and valued. The
graphs in Figure 4.1 consist of undirected lines. These graphs
derive from a symmetrical data matrix where it is simply the
presence or absence of a relation which is of importance. If the
relations are directed from one agent to another, then they can
be represented in a **directed graph, **sometimes termed a
'digraph'. A directed graph is represented in drawn form by
attaching an arrow head to each line, the direction of the arrow
indicating the direction of the relation. Figure 4.2 shows a
simple directed graph.

If, on the other hand, the intensity of the relation is an
important consideration and can be represented by a numerical
value, the researcher can construct a **valued graph **in
which numerical values are attached to each of the lines. I have
already shown that a matrix for a directed graph will not usually
be symmetrical, as relations will not normally be reciprocated. A
matrix for a valued graph may or may not be symmetrical, but it
will contain values instead of simple binary entries. 3 An
example of a valued graph is that in Figure 3.5. One of the
simplest and most widely used measures of intensity is the **multiplicity
**of a line. This is simply the number of separate contacts
which make up the relationship. If, for example, two companies
have two directors in common, the relation between the two
companies can be represented by a line of multiplicity 2. If they

*70 Social network analysis*

Figure 4.2 A *directed graph and its matrix*

have three directors in common, the interlocking directorship can be seen as a line of multiplicity 3. The values in a graph can, of course, relate to any other suitable measure of intensity, such as, for example, the frequency of the relation.

The fundamental ideas of graph theory can most easily be understood in relation to simple undirected and un-valued graphs. A number of apparently straightforward words are used to refer to graph theoretical terms, and it may appear pedantic to define these at great length. But these definitional matters are important, as the apparently simple words are used in highly specific and technical ways. It is essential that their meanings are clarified if the power of graph theory is to be understood.

Two points which are connected by a line are said to be **adjacent
**to one another. Adjacency is the graph theoretical expression
of the fact that two agents represented by points are directly
related or connected with one another. Those points to which a
particular point is adjacent are termed its **neighborhood, **and
the total number of other points in its neighbourhood is termed
its **degree **(strictly, its 'degree of connection'). Thus,
the 'degree' of a point is a numerical measure of the size of its
neighbourhood. The degree of a point is shown by the number of
non-zero entries for that point in its row or column entry in the
adjacency matrix. Where the data are binary, as in Figure 4.1,
the degree is simply the row or column sum

for that point. Because each line in a graph connects two points -- is 'incident' to two points -- the total sum of the degrees of all the points in a graph must equal twice the total number of lines in the graph. The reason for this is that each line is counted twice when calculating the degrees of the separate points. This can be confirmed by examining Figure 4. 1. In this graph, point B has a degree of 4 and all the other points have a degree of 2. Thus, the sum of the degrees is 12, which is equal to twice the number of lines (six).

*Lines, direction and density 71*

Points may be directly connected by a line, or they may be
indirectly connected through a sequence of lines. A sequence of
lines in a graph is a 'walk', and a walk in which each point and
each line are distinct is called a **path. **The concept of
the path is, after those of the point and the line, one of the
most basic of all graph theoretical concepts. The **length **of
a path is measured by the number of lines which make it up. In
Figure 4. 1, for example, points A and D are not directly
connected by a line, but they are connected through the path ABD,
which has a length of 2. A particularly important concept in
graph theory is that of 'distance', but neither distance nor
length correspond to their everyday physical meanings. The length
of a path, I have said, is simply the number of lines which it
contains - the number of 'steps' necessary to get from one point
to another. The **distance **between two points is the length
of the shortest path (the 'geodesic') which connects them.

Figure 4.3 *Lines and paths*

Consider the simple graph in Figure 4.3. In this graph, AD is a path of length 1 (it is a line), while ABCD is a path of length 3. The walk ABCAD is not a path, as it passes twice through point A. It can be seen that points A and D are connected by three distinct paths: AD at length 1, ACD at length 2, and ABCD at length 3.' The distance between A and D, however, is the length of the shortest path between them, which, in this case, is 1. The distance between points B and D, on the other hand, is 2. Many of the more complex graph theoretical measures take account only of geodesics, shortest paths, while others consider all the paths in a graph.

These same concepts can be used with directed graphs, though some modifications must be made to them. In a directed graph lines are directed to or from the various points. Each line must be considered along with its direction, and there will not be the symmetry that exists in simple, undirected relational data. The fact that, for example, A chooses B as a friend does not mean that there

*72 Social network analysis*

will be a matching friendship choice from B to A. For this
reason, the 'degree' of a point in a directed graph comprises two
distinct elements, called the 'indegree' and the 'outdegree'.
These are defined by the direction of the lines which represent
the social relations. The **indegree **of a point is the total
number of other points which have lines directed towards it; and
its **outdegree **is the total number of other points to which
it directs lines. The indegree of a point, therefore, is shown by
its column sum in the matrix of the directed graph, while its
outdegree is shown by its row sum. The column sum of point B in
Figure 4.2, for example, is 2, as it ,receives' two lines (from A
and from C). The corresponding sociogram shows clearly that its
indegree is 2. The row sum for B, on the other hand, is 1,
reflecting the fact that it directs just one line, to point C.

A path in a directed graph is a sequence of lines in which all the arrows point in the same direction. The sequence CAB in Figure 4.2, for example, is a path, but CBA is not: the changing direction of the arrows means that it is not possible to 'reach' A from C by 6

passing through B. It can be seen that the criteria for
connection are much stricter in a directed graph, as the
researcher must take account of the direction of the lines rather
than simply the presence or absence of a line. The distance
between two points in a directed graph, for example, must be
measured only along the paths that can be identified when
direction is taken into account. When agents are regarded as
either 'sources' or 'sinks' for the 'flow' of resources or
information through a network, for example, it is sensible to
take serious account of this directional information in analysing
the graph of the network. Sometimes, however, the direction of
the lines can legitimately be ignored. If it *is *the mere
presence or absence of a line which is important, its direction
being a relatively unimportant factor, it is possible to relax
the usual strict criteria of connection and to regard any two
points as connected if there is a sequence of lines between them,
regardless of the directions of the arrows. In such an analysis
it is usual to speak of the presence of a ,semi-path' rather than
a path. CBA in Figure 4.2 is a semi-path. Treating directed data
as if they were undirected, therefore, means that all the usual
measures for undirected data may then be used.

Next Section | Return to Table of Contents |