HANDOUT

Handling Missing Data

It is often said that network analysis is less forgiving of missing data than other forms of research. This is probably not nearly as true as people think (see Borgatti, Carley and Krackhardt, 2006), but there is something to it.

We should distinguish between two forms of missing data: node level and tie level. Node level missing data is where a respondent does not answer the network portion of a survey at all, as if they were not part of the study. Tie level missing data is where they choose not to give an evaluation of a particular actor (such as their boss), but do answer for other actors.

Tie level missing data -- if not excessive -- can be handled by standard imputation approaches, such as the Ward, Hoff and Lofdahl (2003) approach, and we don't discuss it further.

Node level missing data is more problematic. Two main strategies are used to deal with it. First, you can ignore the node entirely, as if it never existed. From a matrix perspective, if the original data matrix had 50 rows and columns, the new matrix will now have 49 rows and columns.
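As a rough illustration of the first strategy, dropping a non-respondent amounts to deleting that actor's row and column from the matrix. The sketch below is plain Python, not UCINET; the function name `drop_node`, the use of `None` for missing values, and the 4-actor data are all assumptions made for the example.

```python
def drop_node(matrix, k):
    """Return a copy of a square matrix without row k and column k."""
    return [
        [value for j, value in enumerate(row) if j != k]
        for i, row in enumerate(matrix)
        if i != k
    ]


# Hypothetical 4-actor network; actor 2 did not respond, so row 2 is missing.
raw = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [None, None, None, None],
    [1, 0, 1, 0],
]

reduced = drop_node(raw, 2)  # 3 x 3 matrix for the remaining actors
```

Note that what the other actors said about the dropped actor (column 2) is discarded along with the missing row.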

Another approach is to impute the missing data, which means to guess what the person would have answered had they had a chance. There are several ways to do this, including fitting an exponential random graph model (ERGM) to the dataset, then filling in the missing data with maximum likelihood estimates based on the parameters of the ERGM. But the simple way is as follows.

### Undirected (logically symmetric data)

First, let's consider the case of an undirected relation (i.e., a social relation that is logically symmetric). In that case, the simple strategy is to assume that if the respondent had answered, they would have responded the same way that others in fact did about them. In short, the person's column in the data matrix (what people say about them) is used to fill in the values of the person's row, which is missing.

The easy way to do this in UCINET is via an undocumented matrix algebra command called replacena. Given input matrices A and B, the replacena routine changes any missing values found in A to the corresponding value found in B, and saves the result in a new matrix C. For example, typing

--> C = replacena(A B)

asks the program to create a new dataset C such that cij = bij if aij is missing, and cij = aij otherwise.
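The cell-by-cell rule can be mimicked outside UCINET. The following is a minimal Python sketch of the same logic, not UCINET's own implementation; the function name `replace_na`, the use of `None` to mark missing cells, and the small matrices are assumptions for illustration.

```python
def replace_na(a, b):
    """Return C with c[i][j] = b[i][j] where a[i][j] is missing, else a[i][j]."""
    return [
        [b_ij if a_ij is None else a_ij for a_ij, b_ij in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


# Hypothetical 2 x 2 inputs; one cell of A is missing.
A = [[0, None],
     [1, 0]]
B = [[9, 8],
     [7, 6]]

C = replace_na(A, B)  # only the missing cell of A is taken from B
```

The inputs are left unchanged; the result is a new matrix, mirroring the way the command saves its output as a separate dataset.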

So how do you use this to replace missing values with what the other person said? The information about what others said about the missing person is given in the missing person's column. So you want cij = aij when person i answered the survey, but cij = aji when they didn't. In other words, you want to fill in from the transpose of the matrix. So if A is the raw data matrix, you want to create a new version of it called, say, A-cleaned as follows:

--> A-cleaned = replacena(A transpose(A))
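To see what filling from the transpose does to a non-respondent's row, here is a small Python sketch under the same assumptions as before (`None` marks a missing cell; the function name `fill_from_transpose` and the 3-actor data are hypothetical).

```python
def fill_from_transpose(a):
    """Fill each missing a[i][j] with a[j][i]; leave observed cells alone."""
    n = len(a)
    return [
        [a[j][i] if a[i][j] is None else a[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical symmetric relation; actor 1 did not respond.
raw = [
    [0, 1, 1],
    [None, None, None],
    [1, 0, 0],
]

cleaned = fill_from_transpose(raw)
```

Actor 1's row now carries what actors 0 and 2 reported about actor 1. The diagonal cell for actor 1 stays missing, because its transpose counterpart is the same missing cell; self-ties are usually ignored anyway.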

### Directed (logically nonsymmetric data)

--> CLEANEDGET = replacena(GET transpose(GIVE))

--> CLEANEDGIVE = replacena(GIVE transpose(GET))
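The two commands above pair each directed relation with the transpose of its counterpart. One way to read this, offered here as an assumption since the datasets are not described, is that GET records whom each respondent says they receive from and GIVE records whom they say they give to, so that "i gets from j" should agree with "j gives to i". A Python sketch of that reading follows; the function name `fill_from_counterpart` and the 3-actor data are hypothetical.

```python
def fill_from_counterpart(a, b):
    """Fill each missing a[i][j] with b[j][i], i.e. from the transpose of b."""
    n = len(a)
    return [
        [b[j][i] if a[i][j] is None else a[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical data; actor 1 answered neither question.
get = [
    [0, 1, 0],
    [None, None, None],
    [1, 1, 0],
]
give = [
    [0, 0, 1],
    [None, None, None],
    [0, 1, 0],
]

cleaned_get = fill_from_counterpart(get, give)   # missing GET row taken from GIVE column
cleaned_give = fill_from_counterpart(give, get)  # missing GIVE row taken from GET column
```

Both cleaned matrices are built from the raw inputs, so the order in which they are computed does not matter.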

### Bibliography

• Borgatti, S.P., Carley, K., and Krackhardt, D. 2006. Robustness of Centrality Measures under Conditions of Imperfect Data. Social Networks 28: 124–136.
• Kossinets, G. 2006. Effects of Missing Data in Social Networks. Social Networks 28(3): 247–268.
• Ward, M.D., Hoff, P.D., and Lofdahl, C.L. 2003. Identifying International Networks: Latent Spaces and Imputation. In Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers, 345–359, Ronald Breiger, Kathleen Carley, and Philippa Pattison, eds. Washington, D.C.: The National Academies Press.