## Cross-Product Matrices

• If X is any matrix (say, a case-by-variable matrix in which the rows are cases and the columns are variables measured on those cases), then both X'X and XX' are cross-product matrices.
• X'X can be thought of as a collection of inner products between all pairs of columns of X. Suppose we extract the ith column of X as a free-standing vector and call this vector I. Do the same for the jth column of X and call it J. Now, if we take the inner product of I and J, we obtain a single number, given by $\sum_k I_k J_k$: in other words, the sum of products of corresponding cells of I and J. If we did this for all possible pairs of columns of X, we would obtain X'X.
• In other words, cell (i,j) in X'X is just the inner product of the ith column and the jth column of X.
• Now, consider what happens if the columns of X are mean-centered. That is, we subtract from each value in a column the mean of that column, obtaining a new set of values whose mean is zero. In this case, the matrix X'X is proportional to the covariance matrix of X. That is, each value $(X'X)_{ij}$ gives the covariance between column i and column j of X, multiplied by a constant (the number of cases, n). How does one know this? The formula for the covariance between variable Y and variable Z is usually written as follows: $\mathrm{Cov}(Y,Z) = \frac{1}{n}\sum_k Y_k Z_k - \left(\frac{1}{n}\sum_k Y_k\right)\left(\frac{1}{n}\sum_k Z_k\right)$. Notice that $\frac{1}{n}\sum_k Y_k$ is just the mean of Y. Since that is zero, the right-hand term disappears, leaving just $\frac{1}{n}\sum_k Y_k Z_k$, which is the inner product of Y and Z divided by n. In short, the covariance of two mean-centered variables is just their inner product divided by the number of cases.
• Similarly, consider what happens if the columns of X are standardized (i.e., forced to have mean 0 and standard deviation 1). Now, the matrix X'X is just the matrix of correlations among the columns of X (well, the correlations times n, so you have to divide by n). How does one know? The formula that defines correlation for a pair of variables Y and Z is $\mathrm{Corr}(Y,Z) = \mathrm{Cov}(Y,Z)/(S_Y S_Z)$, where $S_Y$ is the standard deviation of variable Y and $S_Z$ is the standard deviation of variable Z (both computed with divisor n, to match the covariance formula). Since we have standardized, the standard deviations of both variables are 1, so the correlation equals the covariance, which equals 1/n times the cross-products of the columns of X.
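The three cases above (raw, mean-centered, and standardized columns) can be checked numerically. The following is a minimal sketch in plain Python, not a definitive implementation: the data matrix `X` and the helper names (`column`, `inner`, `cross_product`, `center`, `standardize`) are made up for illustration, and the standard deviation is assumed to use the divisor n so that it matches the covariance formula given in the notes.

```python
from math import sqrt

def column(X, j):
    """Return column j of a list-of-rows matrix as a list."""
    return [row[j] for row in X]

def inner(u, v):
    """Inner product: sum of products of corresponding cells."""
    return sum(a * b for a, b in zip(u, v))

def cross_product(X):
    """X'X: cell (i, j) is the inner product of columns i and j of X."""
    p = len(X[0])
    cols = [column(X, j) for j in range(p)]
    return [[inner(cols[i], cols[j]) for j in range(p)] for i in range(p)]

def center(X):
    """Subtract each column's mean from every value in that column."""
    n, p = len(X), len(X[0])
    means = [sum(column(X, j)) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def standardize(X):
    """Center each column, then divide by its standard deviation (divisor n)."""
    n, p = len(X), len(X[0])
    C = center(X)
    sds = [sqrt(inner(column(C, j), column(C, j)) / n) for j in range(p)]
    return [[row[j] / sds[j] for j in range(p)] for row in C]

# Hypothetical data: 4 cases (rows) by 2 variables (columns).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
n = len(X)

# Raw cross-products: inner products of the columns of X.
raw = cross_product(X)

# Mean-centered columns: X'X divided by n is the covariance matrix.
cov = [[v / n for v in row] for row in cross_product(center(X))]

# Standardized columns: X'X divided by n is the correlation matrix.
corr = [[v / n for v in row] for row in cross_product(standardize(X))]

print(raw)   # raw cross-products
print(cov)   # covariances (divisor n)
print(corr)  # correlations; the diagonal should be approximately 1
```

For this made-up data the off-diagonal covariance works out to 0.75 and each variance to 1.25, so the correlation is approximately 0.75 / 1.25 = 0.6. Many statistics packages divide by n - 1 rather than n by default, in which case their covariances will differ from these by a factor of n / (n - 1).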