Singular Value Decomposition

notes by Steve Borgatti

1. Review of Correlation of Two Vectors

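Suppose A and Y are two column vectors of length n, and suppose they are standardized (mean 0 and standard deviation 1, with the standard deviation computed by dividing by n). What’s the correlation between them? Sum(AiYi)/n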

In matrix or vector notation, this is (1/n)A’Y. You remember that A’Y is basically correlation, right? So what is it if the vectors are mean-centered but not standardized? Sum(AiYi)/(||A||*||Y||), where ||A|| = sqrt(A’A) is the length of A.

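In textbooks, you often see A’Y defined as ||A||*||Y||*cos(theta), where theta is the angle between A and Y in geometric space. This tells you that cos(theta) is a measure of correlation, since cos(theta) = A’Y/(||A||*||Y||), which for mean-centered vectors is exactly the correlation above.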
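A minimal NumPy sketch of the two formulas above, using made-up vectors a and y (hypothetical values, not data from these notes): for standardized vectors the correlation is (1/n)A’Y, and for mean-centered vectors it is the cosine A’Y/(||A||*||Y||). Both should match np.corrcoef.

```python
import numpy as np

# Hypothetical illustrative vectors.
a = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 2.0, 6.0, 8.0])
n = len(a)

# Standardize with the population SD (np.std divides by n by default).
za = (a - a.mean()) / a.std()
zy = (y - y.mean()) / y.std()
r_standardized = za @ zy / n            # (1/n) A'Y

# Mean-center only, then take the cosine of the angle between the vectors.
ca = a - a.mean()
cy = y - y.mean()
r_cosine = ca @ cy / (np.linalg.norm(ca) * np.linalg.norm(cy))

r_numpy = np.corrcoef(a, y)[0, 1]       # reference value
print(r_standardized, r_cosine, r_numpy)
```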

[figure 2.6 in textbook]

A’Y has another interpretation as well: if Y has unit length (Y’Y = 1), A’Y is the length of the projection of A onto the axis defined by Y. In general, that length is A’Y/||Y||.

2. Multiplying a series of row vectors (collected into a matrix) by a vector of weights

Let the matrix be X and let the vector of weights be V. Suppose X has just two columns and n rows, so each row is a row vector (a point in two dimensions). XV is a new column vector Z, with Zi = Xi1*V1 + Xi2*V2.

So if V has unit length (V’V = 1), Z gives the projection of each row of X onto the dimension defined by V.
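A small sketch of XV as a set of projections, with hypothetical points in X and V = (1, 1) rescaled to unit length. The residual check confirms that what remains of each row after removing its projection is perpendicular to V.

```python
import numpy as np

# Hypothetical 2-D points, one per row.
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [-2.0, 0.5]])

# Weight vector, rescaled to unit length so that X @ V gives projection lengths.
V = np.array([1.0, 1.0])
V = V / np.linalg.norm(V)

Z = X @ V                       # Z[i] = coordinate of row i along the axis V
residual = X - np.outer(Z, V)   # part of each row not explained by V
print(Z)
print(residual @ V)             # approximately zero: residuals are perpendicular to V
```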

[figure 2.8 in textbook]

3. Orthogonality

Two vectors are orthogonal if they are at right angles to each other. We can express this mathematically as A’B = 0. In other words, the cosine of the angle between them is 0; if the vectors are mean-centered, this also means the correlation between them is zero.

[figure 2.9 in textbook]

A matrix V of many column vectors is orthogonal if V’V = I, i.e., if every pair of distinct columns satisfies Vj’Vk = 0 and every column has unit length (Vj’Vj = 1). If the columns of V are mean-centered, this means the columns are uncorrelated.
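To see the last point numerically, here is a sketch (random data, an assumption for illustration) that builds a matrix with orthonormal columns from mean-centered columns via NumPy's QR factorization, then checks both V’V = I and the zero correlations.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))          # hypothetical data
A = A - A.mean(axis=0)                # mean-center each column

# Reduced QR: V is 10 x 3 with orthonormal columns spanning the columns of A.
V, _ = np.linalg.qr(A)

print(np.round(V.T @ V, 10))          # identity: V'V = I
# V's columns are combinations of centered columns, so they are centered too;
# orthogonal + centered means uncorrelated.
print(np.round(np.corrcoef(V, rowvar=False), 10))
```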

4. Rotation of Axes

Suppose X is a matrix of column vectors and V is an orthogonal matrix. Say column 1 of V is (.707, .707), a ray at 45 degrees. What vector is orthogonal to it? (-.707, .707). Test it: .707*(-.707) + .707*.707 = 0.

[figure 2.9 again]

Suppose we multiply XV to get Z.

What is Z? The first column is the projection of the row points of X onto the axis defined by column 1 of V; the second column is the projection of the points onto the axis defined by column 2 of V. In other words, Z is X expressed in the new coordinate system defined by V. With the V above, this is a clockwise rotation of the points by 45 degrees, or equivalently a counterclockwise rotation of the axes by 45 degrees.

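This is called a rigid rotation, because distances and angles between points are unchanged. To rotate by any specified angle theta, post-multiply the points by this matrix:

[ cos(theta)   sin(theta) ]
[ -sin(theta)  cos(theta) ]

Post-multiplying row points by this matrix rotates them counterclockwise by theta (equivalently, rotates the axes clockwise by theta); with theta = -45 degrees it gives the clockwise rotation of the points described above.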
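A sketch of the rotation matrix as written above; rotation_matrix is a hypothetical helper name. It confirms the direction convention (a positive theta moves row points counterclockwise), that theta = -45 reproduces the .707 matrix with columns (.707, .707) and (-.707, .707), and the orthogonality check from the 45-degree example.

```python
import numpy as np

def rotation_matrix(theta_deg):
    """Return [[cos, sin], [-sin, cos]] for an angle given in degrees.
    Post-multiplying row points by it rotates them counterclockwise by theta."""
    t = np.radians(theta_deg)
    return np.array([[np.cos(t), np.sin(t)],
                     [-np.sin(t), np.cos(t)]])

p = np.array([[1.0, 0.0]])        # a row point on the horizontal axis

R90 = rotation_matrix(90)
print(p @ R90)                    # approximately [[0, 1]]: moved counterclockwise

R = rotation_matrix(-45)          # approximately [[.707, -.707], [.707, .707]]
print(p @ R)                      # approximately [[.707, -.707]]: moved clockwise
print(R[:, 0] @ R[:, 1])          # approximately 0: the columns are orthogonal
print(np.allclose(R.T @ R, np.eye(2)))  # rigid: R is an orthogonal matrix
```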

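Here's some height and weight data. The columns are: Height and Weight, the raw scores; H-hbar and W-wbar, deviations from the mean height and weight; H* and W*, the standardized scores; HR and WR, the standardized scores rotated by 45 degrees with the .707 matrix given below; and HR* and WR*, HR and WR each divided by its standard deviation (i.e., rotated and then stretched).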

Height	Weight	H-hbar	W-wbar	H*	W*	HR	WR	HR*	WR*
57	93	-6	-31	-1.86532	-2.06601	-2.77945	-0.14189	-2.03443	-0.38922
58	110	-5	-14	-1.54646	-0.91822	-1.74253	0.444163	-1.27546	1.218429
60	99	-3	-25	-0.90874	-1.66091	-1.81674	-0.53178	-1.32978	-1.45878
59	111	-4	-13	-1.2276	-0.85071	-1.46937	0.266464	-1.07551	0.730966
61	115	-2	-9	-0.58989	-0.58064	-0.82756	0.006536	-0.60574	0.01793
60	122	-3	-2	-0.90874	-0.10803	-0.71886	0.566108	-0.52617	1.552949
62	110	-1	-14	-0.27103	-0.91822	-0.8408	-0.45757	-0.61543	-1.2552
61	116	-2	-8	-0.58989	-0.51313	-0.77983	0.05427	-0.5708	0.148875
62	122	-1	-2	-0.27103	-0.10803	-0.26799	0.115243	-0.19616	0.316135
63	128	0	4	0.047829	0.297073	0.243845	0.176215	0.178484	0.483395
62	134	-1	10	-0.27103	0.702172	0.304818	0.688053	0.223113	1.88747
64	117	1	-7	0.366686	-0.44561	-0.0558	-0.57429	-0.04084	-1.5754
63	123	0	-1	0.047829	-0.04051	0.005174	-0.06246	0.003787	-0.17133
65	129	2	5	0.685544	0.364589	0.742444	-0.22692	0.543437	-0.62247
64	135	1	11	0.366686	0.769688	0.803417	0.284922	0.588066	0.7816
66	128	3	4	1.004402	0.297073	0.920143	-0.50008	0.673504	-1.37183
67	135	4	11	1.32326	0.769688	1.479714	-0.39138	1.083086	-1.07362
66	148	3	24	1.004402	1.647403	1.874826	0.454601	1.37229	1.247065
68	142	5	18	1.642118	1.242304	2.039286	-0.28267	1.492668	-0.77542
69	155	6	31	1.960975	2.120018	2.885263	0.112443	2.111885	0.308455

[plot of the mean-centered data]

Now we rotate by post-multiplying by

[ .707  -.707 ]
[ .707   .707 ]
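A quick consistency check against the table, sketched in NumPy: post-multiplying the first three rows of H* and W* by this matrix (using .707 exactly as printed, rather than sqrt(.5)) reproduces the HR and WR columns to rounding.

```python
import numpy as np

# First three rows of the standardized columns H*, W* from the table above.
Xs = np.array([[-1.86532, -2.06601],
               [-1.54646, -0.91822],
               [-0.90874, -1.66091]])

# The rotation matrix above, with .707 as printed.
V = np.array([[0.707, -0.707],
              [0.707,  0.707]])

Z = Xs @ V      # columns: rotated scores, to compare with HR and WR
print(np.round(Z, 5))
```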

5. Stretching and Shrinking

We can stretch a picture up and down or left and right simply by multiplying each column of X by some constant > 0. If the constant is > 1 the picture is stretched along that axis; if it is < 1 it is shrunk. This is usually expressed by storing the constants in a diagonal matrix D: XD multiplies column k by Dkk, and XD^-1 divides it by Dkk.

For example, let D contain the standard deviations of the columns of X (which is mean-centered). Then XD^-1 divides each column by its standard deviation, so the configuration has the same spread (standard deviation 1) along each axis; in other words, XD^-1 is the standardized data.
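A minimal sketch of stretching with a diagonal matrix, on hypothetical mean-centered data: building D from the column standard deviations (dividing by n, as np.std does by default) and computing XD^-1 leaves every column with standard deviation 1.

```python
import numpy as np

# Hypothetical data: two columns on very different scales.
X = np.array([[1.0, 40.0],
              [2.0, 10.0],
              [4.0, 30.0],
              [5.0, 20.0]])
X = X - X.mean(axis=0)                   # mean-center each column

D = np.diag(X.std(axis=0))               # standard deviations on the diagonal
X_std = X @ np.linalg.inv(D)             # X D^-1: divide column k by D[k, k]

print(X_std.std(axis=0))                 # both columns now have SD 1
```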

6. SVD

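Suppose we rotate and then stretch a data matrix X to yield U, i.e., U = XVD^-1. For this to be the svd, V must be the particular rotation that makes the columns of XV mutually orthogonal, and D must hold the lengths of those columns, so that the columns of U are orthonormal (U’U = I).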

Now let’s solve for X.

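UD = XV

Post-multiplying both sides by V^-1, and using the fact that V^-1 = V’ for an orthogonal matrix:

UDV^-1 = UDV’ = X

This is the singular value decomposition (svd) of X; the diagonal entries of D are the singular values.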

X is nxm, U is nxm, D is mxm and V is mxm as is V’.
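The same construction in NumPy, as a sketch on random data (an assumption for illustration). np.linalg.svd returns V’ rather than V and the singular values as a 1-D array, so the sketch rebuilds the diagonal D; full_matrices=False gives the nxm U described above.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))          # hypothetical n x m data, n = 6, m = 3

# Thin svd: U is n x m, s holds the m singular values, Vt is V'.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
D = np.diag(s)
V = Vt.T

print(U.shape, D.shape, V.shape)                 # (6, 3) (3, 3) (3, 3)
print(np.allclose(U @ D @ V.T, X))               # X = U D V'
print(np.allclose(X @ V @ np.linalg.inv(D), U))  # U = X V D^-1: rotate, then stretch
print(np.allclose(U.T @ U, np.eye(3)))           # columns of U are orthonormal
```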

Let’s write that differently.

Xij = SUMk(Uik*Dkk*Vjk)

Xij = U_i1*D₁₁*V_j1 + U_i2*D₂₂*V_j2 + …

Suppose we sort the columns of U, the rows and columns of D, and the columns of V so that the singular values (the diagonal entries Dkk) are in descending order. Then we can drop the terms whose Dkk are small, keeping only the first p, and approximate the matrix:

Xij ≈ U_i1*D₁₁*V_j1 + U_i2*D₂₂*V_j2 + … + U_ip*D_pp*V_jp

Now X is nxm, U is nxp, D is pxp and V is mxp.
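A sketch of the truncated approximation; low_rank_approx is a hypothetical helper, and the data are a random rank-2 matrix plus a little noise. Because the dropped terms are mutually orthogonal, the squared approximation error equals the sum of the squared singular values that were dropped.

```python
import numpy as np

def low_rank_approx(X, p):
    """Keep the p largest singular values: X ~ U[:, :p] D[:p, :p] V[:, :p]'."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :p] @ np.diag(s[:p]) @ Vt[:p, :]

rng = np.random.default_rng(2)
# Hypothetical data: rank-2 structure plus small noise, so D33, D44, ... are small.
X = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(8, 5))

s = np.linalg.svd(X, compute_uv=False)   # singular values, largest first
X2 = low_rank_approx(X, 2)

sq_error = np.sum((X - X2) ** 2)
print(s)                                 # two large values, the rest small
print(sq_error, np.sum(s[2:] ** 2))      # these two numbers agree
```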

7. Generalized inverse

X = UDV’

X^-1 = (UDV’)^-1

X^-1 = (V’)^-1D^-1U^-1

X^-1 = VD^-1U’

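The last step uses the fact that the inverse of an orthogonal matrix is its transpose: (V’)^-1 = V and U^-1 = U’. Strictly, this gives the ordinary inverse only when X is square and nonsingular (every Dkk > 0), so that U and V are square. For any other matrix, VD^+U’, where D^+ replaces each nonzero Dkk by 1/Dkk and leaves the zeros as zeros, is the Moore-Penrose generalized inverse (often written X^+). So we can compute a generalized inverse of any matrix via svd. Presto.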
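A sketch of the generalized inverse via svd; svd_pinv and its tol cutoff are hypothetical choices for illustration. For a square nonsingular matrix it matches np.linalg.inv, and for a rank-deficient 3x2 matrix it matches NumPy's own np.linalg.pinv.

```python
import numpy as np

def svd_pinv(X, tol=1e-12):
    """Generalized inverse V D^+ U', inverting only the nonzero singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_inv = np.zeros_like(s)
    nonzero = s > tol * s.max()          # treat tiny singular values as zero
    s_inv[nonzero] = 1.0 / s[nonzero]
    return Vt.T @ np.diag(s_inv) @ U.T

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])               # square and nonsingular
B = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])               # 3 x 2, rank 1: no ordinary inverse

print(np.allclose(svd_pinv(A), np.linalg.inv(A)))
print(np.allclose(svd_pinv(B), np.linalg.pinv(B)))
```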

8. Principal components

We know any matrix A can be decomposed (via svd) as triple product UDV’.
When A happens to be square, symmetric and positive semidefinite (like a correlation matrix or a cross-products matrix), we will find that U = V, so that A = UDU’ or A = VDV’.
Suppose we compute the cross-products matrix from A. That is, we compute S = A’A. Like any matrix, S can be decomposed via svd into a triple product XGY’. The question is, how does X relate to U, G to D, and Y to V?
Well, if A = UDV’, then A’A = (UDV’)’UDV’ = VDU’UDV’ (D is diagonal, so D’ = D). Since U and V are orthogonal (their columns are mutually perpendicular and of unit length), U’U = I, so VDU’UDV’ = VD²V’. So the svd of A’A gets you VD²V’: X = Y = V and G = D². Similarly, since V’V = I, the svd of AA’ gets you UD²U’.
We call the svd of a cross-products matrix (such as a correlation matrix) the eigenstructure of the matrix. The columns of V (for A’A) and of U (for AA’) are called eigenvectors, and the D² values (the squared singular values of A) are the eigenvalues.
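A numerical check of the derivation above, sketched with NumPy on random data (assumed for illustration). np.linalg.eigh, NumPy's routine for symmetric matrices, returns eigenvalues in ascending order, so the sketch reverses them; eigenvectors are only determined up to sign, so they are compared in absolute value.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(7, 3))                  # hypothetical data matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T
S = A.T @ A                                  # cross-products matrix

evals, evecs = np.linalg.eigh(S)             # ascending order for symmetric S
evals, evecs = evals[::-1], evecs[:, ::-1]   # largest first, to line up with s

print(np.allclose(evals, s ** 2))                       # eigenvalues = D^2
print(np.allclose(np.abs(evecs), np.abs(V)))            # eigenvectors = V, up to sign
print(np.allclose(A @ A.T, U @ np.diag(s ** 2) @ U.T))  # AA' = U D^2 U'
```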

Eigenvectors

Since S = A’A = VD²V’, post-multiplying by V gives SV = VD²V’V = VD². Column by column, this says Svk = dk²vk. So an eigenvector v of a matrix S is any vector that satisfies the equation Sv = λv: multiplying v by S gives back the same vector, merely stretched or shrunk by the scalar λ (the eigenvalue).
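When S is a correlation matrix (or sometimes a covariance matrix), the eigenvectors of S, usually scaled by the square roots of their eigenvalues, are referred to as factor loadings. For a correlation matrix, the scaled values are the correlations between the variables and the principal components.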
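A sketch of both points on hypothetical correlated data: each eigenvector of the correlation matrix R satisfies Rv = λv, and scaling the eigenvectors by the square roots of their eigenvalues gives the correlations between the variables and the component scores. Standardization divides by n here, so that R = Z’Z/n.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(size=(50, 3))
data[:, 1] += data[:, 0]                     # make variables 0 and 1 correlated

Z = (data - data.mean(axis=0)) / data.std(axis=0)   # standardize, dividing by n
R = Z.T @ Z / len(Z)                         # correlation matrix

lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]               # largest eigenvalue first

print(np.allclose(R @ V, V * lam))           # R v = lambda v, column by column

loadings = V * np.sqrt(lam)                  # eigenvectors scaled by sqrt(eigenvalue)
scores = Z @ V                               # principal component scores
corr = np.corrcoef(Z, scores, rowvar=False)[:3, 3:]  # variables x components
print(np.allclose(corr, loadings))
```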

9. Correspondence Analysis

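Step 1. Normalize the data with a square root transformation. Here fij is the frequency in cell (i, j), fi. is the total of row i, f.j is the total of column j, and f.. is the grand total.

Hij = fij/(sqrt(fi.)*sqrt(f.j))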

Step 2. svd of normalized matrix H = UDV’

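Step 3. Rescale the Us and Vs. This part varies; one common choice (standard coordinates) is:

Xik = Uik*sqrt(f../fi.)

Yjk = Vjk*sqrt(f../f.j)

With this scaling the first factor is trivial (d1 = 1 and Xi1*Yj1 = 1 for every i, j), and the frequencies can be reconstituted as:

fij = (fi.*f.j/f..)*(1 + SUMk>1(dk*Xik*Yjk))

where dk is the kth singular value (the kth diagonal entry of D).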

Chisq/n = SUMk>1(dk²), where n = f.. is the grand total (exclude the trivial first factor). This quantity is called the total inertia.
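The three steps, sketched in NumPy on a hypothetical 3x3 table of counts. The scaling in Step 3 is the standard-coordinates choice shown above, one of several in use. The sketch checks that the first singular value is the trivial 1, that the reconstitution formula recovers the original counts, and that chi-square/n equals the sum of the squared non-trivial singular values.

```python
import numpy as np

# Hypothetical contingency table of counts f_ij.
F = np.array([[20.0, 10.0,  5.0],
              [10.0, 25.0, 10.0],
              [ 5.0, 10.0, 30.0]])
n = F.sum()                      # f..
r = F.sum(axis=1)                # row totals f_i.
c = F.sum(axis=0)                # column totals f_.j

# Step 1: H_ij = f_ij / (sqrt(f_i.) * sqrt(f_.j))
H = F / np.sqrt(np.outer(r, c))

# Step 2: svd of H; the first (trivial) singular value is 1.
U, d, Vt = np.linalg.svd(H, full_matrices=False)
V = Vt.T

# Step 3 (standard coordinates): X_ik = U_ik * sqrt(f../f_i.), Y_jk = V_jk * sqrt(f../f_.j)
X = U * np.sqrt(n / r)[:, None]
Y = V * np.sqrt(n / c)[:, None]

# Reconstitution: f_ij = (f_i. f_.j / f..) * (1 + sum over k > 1 of d_k X_ik Y_jk)
F_hat = np.outer(r, c) / n * (1 + (X[:, 1:] * d[1:]) @ Y[:, 1:].T)

# Chi-square / n equals the sum of squared non-trivial singular values.
E = np.outer(r, c) / n
chi2 = ((F - E) ** 2 / E).sum()
print(d[0], np.allclose(F_hat, F), np.isclose(chi2 / n, np.sum(d[1:] ** 2)))
```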