

TOOLS > SCALING/DECOMPOSITION > CORRESPONDENCE
   

PURPOSE Perform a correspondence analysis of a single real-valued matrix.

DESCRIPTION Given a non-negative n-by-m matrix with n ≥ m, this routine represents the n rows and m columns as vectors in a common multidimensional space. The algorithm essentially performs a singular value decomposition of an adjusted data matrix in which rows and columns have been separately normalized by their marginal totals, so that rows and columns with large totals do not dominate the solution.
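The normalize-then-decompose step can be sketched in NumPy. This is a minimal illustration, not this routine's code: it assumes the adjustment divides each proportion p_ij by sqrt(row mass × column mass) with no centering, which is consistent with the trivial constant first factor described under 'Keep the trivial first factor'. The function name correspondence_svd and the example table are hypothetical.

```python
import numpy as np


def correspondence_svd(x):
    """Hypothetical sketch: SVD of a row/column-normalized matrix.

    Assumes x is a non-negative n-by-m array with n >= m and no
    all-zero rows or columns. No centering is applied, so the first
    (trivial) singular value is 1.
    """
    x = np.asarray(x, dtype=float)
    if x.shape[0] < x.shape[1]:
        raise ValueError("need at least as many rows as columns; transpose first")
    p = x / x.sum()            # proportions summing to 1
    r = p.sum(axis=1)          # row masses (marginals)
    c = p.sum(axis=0)          # column masses (marginals)
    # Divide each entry by sqrt(row mass * column mass).
    s = p / np.sqrt(np.outer(r, c))
    u, sigma, vt = np.linalg.svd(s, full_matrices=False)
    return u, sigma, vt.T, r, c


# A small made-up 4-by-3 table for illustration.
x = np.array([[10, 2, 3],
              [4, 8, 1],
              [2, 3, 9],
              [5, 5, 5]])
u, sigma, v, r, c = correspondence_svd(x)
print(np.round(sigma, 4))  # the first singular value is 1 (the trivial factor)
```

Because the normalized matrix maps sqrt(column masses) onto sqrt(row masses), 1 is always its largest singular value; the interesting structure lies in the remaining ones.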

PARAMETERS
Input dataset:
Name of the file containing the matrix to be analyzed. The matrix must have at least as many rows as columns; if it does not, transpose it and resubmit. Data type: Matrix.

How to scale row and column scores: (Default = COORDINATES)
Choices are:

Coordinates - Scores for each point on each dimension, adjusted for both point marginals and dimension weights (eigenvalues).

CGS - Following Carroll, Green and Schaffer, this transformation makes the distance between a row and a column just as interpretable as the distance between two rows or between two columns.

Optimal - Scores for each point are corrected for point marginals, but not dimension weights.

Axes - No rescaling is performed.
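The following NumPy sketch shows one plausible reading of three of these options; the mapping is an assumption, not taken from this routine. Axes is read as the raw singular vectors, Optimal as the singular vectors divided by the square root of each point's mass (standard coordinates), and Coordinates as the Optimal scores multiplied by each dimension's singular value (principal coordinates). CGS is omitted because its formula is not given here. The function name ca_scores is hypothetical, and the trivial first factor is dropped.

```python
import numpy as np


def ca_scores(x, scaling="coordinates"):
    """Hypothetical sketch of row and column scores under three scalings.

    Assumed mapping (not confirmed by the routine's documentation):
      'axes'        -> raw singular vectors, no rescaling
      'optimal'     -> singular vectors divided by sqrt(mass)
      'coordinates' -> 'optimal' scores times each dimension's singular value
    CGS is not implemented. The trivial first factor is dropped.
    """
    x = np.asarray(x, dtype=float)
    p = x / x.sum()
    r = p.sum(axis=1)                  # row masses
    c = p.sum(axis=0)                  # column masses
    s = p / np.sqrt(np.outer(r, c))
    u, sigma, vt = np.linalg.svd(s, full_matrices=False)
    u, sigma, v = u[:, 1:], sigma[1:], vt.T[:, 1:]   # drop the trivial factor
    if scaling == "axes":
        return u, v
    row_std = u / np.sqrt(r)[:, None]  # corrected for row marginals
    col_std = v / np.sqrt(c)[:, None]  # corrected for column marginals
    if scaling == "optimal":
        return row_std, col_std
    if scaling == "coordinates":
        # Additionally weight each dimension by its singular value.
        return row_std * sigma, col_std * sigma
    raise ValueError(f"unsupported scaling: {scaling!r}")
```

Under this reading, the mass-weighted mean of each Optimal dimension is 0 and its mass-weighted variance is 1, while Coordinates stretches each dimension in proportion to its singular value, so that stronger dimensions spread the points further apart.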

Number of factors to save: (Default = 3)
Maximum value of r, the number of eigenvectors used to decompose the matrix.

Reconstruct matrix from factors: (Default = No)
If Yes, the row and column scores are combined to approximate the data matrix using r eigenvectors (see 'Number of factors to save', above). The result is the best possible rank-r approximation of the data matrix under a least-squares criterion.
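A reconstruction of this kind can be sketched by truncating the SVD of the normalized matrix and then undoing the normalization. This is a minimal NumPy sketch, not this routine's implementation. It assumes the trivial first factor is always included, because it carries the row and column totals, and that n_factors counts only the non-trivial dimensions. The function name reconstruct is hypothetical.

```python
import numpy as np


def reconstruct(x, n_factors):
    """Hypothetical sketch: approximate x from the trivial factor plus
    n_factors further dimensions of the normalized matrix.

    Assumption: the trivial first factor is always kept, and n_factors
    counts only the non-trivial dimensions.
    """
    x = np.asarray(x, dtype=float)
    total = x.sum()
    p = x / total
    dr = np.sqrt(p.sum(axis=1))        # sqrt of row masses
    dc = np.sqrt(p.sum(axis=0))        # sqrt of column masses
    s = p / np.outer(dr, dc)
    u, sigma, vt = np.linalg.svd(s, full_matrices=False)
    k = n_factors + 1
    s_hat = (u[:, :k] * sigma[:k]) @ vt[:k, :]   # best rank-k approximation of s
    return total * s_hat * np.outer(dr, dc)      # undo the normalization
```

With n_factors = 0, the result is the independence table (row total times column total divided by the grand total). With all dimensions kept, it reproduces the data exactly. Because the truncation is least-squares optimal for the normalized matrix, on the original scale it minimizes squared errors weighted by 1 / (row total × column total).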

Keep the trivial first factor: (Default = No)
The normalization step prior to the singular value decomposition causes the first eigenvector to be constant. If Yes, this trivial factor is retained and the eigenvalue percentages include it. If No, the factor is dropped and the eigenvalue percentages exclude it.
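The effect of this option on the eigenvalue percentages can be checked numerically. This is a minimal NumPy sketch that assumes the reported eigenvalues are the squared singular values of the normalized matrix; the function name eigen_summary is hypothetical. It also shows that the first singular vector, once divided by sqrt(row mass), takes the same value for every row.

```python
import numpy as np


def eigen_summary(x, keep_trivial=False):
    """Hypothetical sketch: eigenvalues and their percentages,
    with or without the trivial first factor.

    Assumes eigenvalues are the squared singular values of the
    normalized matrix.
    """
    x = np.asarray(x, dtype=float)
    p = x / x.sum()
    r = p.sum(axis=1)
    c = p.sum(axis=0)
    s = p / np.sqrt(np.outer(r, c))
    u, sigma, _ = np.linalg.svd(s, full_matrices=False)
    # Rescaled first eigenvector: the same value (+1 or -1) for every row.
    first_vector = u[:, 0] / np.sqrt(r)
    eig = sigma ** 2
    if not keep_trivial:
        eig = eig[1:]
    return eig, 100.0 * eig / eig.sum(), first_vector
```

The trivial eigenvalue is always 1, while the non-trivial eigenvalues sum to the table's chi-square statistic divided by its grand total. Keeping the trivial factor therefore tends to inflate its own percentage and compress the percentages of the dimensions that carry the actual structure.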

 
(Output) File to contain row scores: (Default = 'CorrespondenceRScores')
Name of dataset to contain coordinates of row points.

(Output) File to contain column scores: (Default = 'CorrespondenceCScores')
Name of dataset to contain coordinates of column points.

(Output) File to contain singular values: (Default = 'CorrespondenceEigen')
Name of dataset to contain the eigenvalue of each dimension.

(Output) File to contain reconstructed matrix: (Default = 'CorrespondenceRecon')
Name of dataset to contain the approximated data matrix (if any).

(Output) File to contain combined row/column scores: (Default = 'CorrespondenceRscores')
Name of dataset to contain the row and column scores concatenated into a single (n+m)-by-r matrix (useful for plotting row and column scores on the same map).
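The layout of the combined dataset can be illustrated with a short NumPy sketch. It assumes the n row scores come first, followed by the m column scores. The function name combined_scores, the labels, and the score values are hypothetical and chosen only for illustration.

```python
import numpy as np


def combined_scores(row_scores, col_scores, row_labels, col_labels):
    """Hypothetical sketch: stack n row scores and m column scores
    into one (n+m)-by-r matrix, rows first, then columns.

    Labels are kept in the same order so that all points can be
    plotted and annotated on a single map.
    """
    row_scores = np.asarray(row_scores, dtype=float)
    col_scores = np.asarray(col_scores, dtype=float)
    if row_scores.shape[1] != col_scores.shape[1]:
        raise ValueError("row and column scores need the same number of dimensions")
    if len(row_labels) != len(row_scores) or len(col_labels) != len(col_scores):
        raise ValueError("one label is needed per row and per column")
    combined = np.vstack([row_scores, col_scores])
    labels = list(row_labels) + list(col_labels)
    return combined, labels


# Made-up scores: 3 rows and 2 columns on r = 2 dimensions.
rows = np.array([[0.1, 0.2], [0.3, -0.1], [-0.4, 0.0]])
cols = np.array([[0.5, 0.1], [-0.2, 0.3]])
scores, labels = combined_scores(rows, cols, ["r1", "r2", "r3"], ["c1", "c2"])
print(scores.shape)  # (5, 2)
```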


LOG FILE The output consists of a log file and a scatterplot displayed in the scatterplot viewer. The viewer shows a 2D scatterplot of the first pair of coordinates; the x-axis is the row coordinate set and the y-axis is the column coordinate set. If more than two factors were selected, any pair can be plotted by choosing them in the x-axis and y-axis drop-down boxes. If the dataset has multiple levels, other levels can be viewed by selecting the required level in the matrix box.

The scatterplot can be saved or printed, and previously stored plots can be opened. The labels can be turned on or off or resized using the options on the right-hand side, and the points can also be turned on or off. Clicking on the arrows flips the plot about the horizontal or vertical axis. Increasing the value in the Label Pos: box moves the labels upward, away from their points; the margins can also be increased or decreased, and the plot can be centred if required. Clicking on the axis scales displays the values on the x- and y-axes. Individual points (with their labels) can be moved by left-clicking and dragging them to the required position, and clicking the R button restores the original positions.
 
The log file has a numeric display of the singular values together with the coordinates (singular vectors) for rows and columns.  

TIMING O(N^3).

COMMENTS See the SVD routine for more information.
This routine produces a plot only if the computer's regional settings are UK or USA. If you use other regional settings and no plot appears, change them in your computer's Control Panel.


REFERENCES None.