Multivariate Statistics: Concepts, Models, and Applications
David W. Stockburger

# Linear Transformations

### THE GENERAL CASE

Most multivariate statistical methods are built on the foundation of linear transformations. A linear transformation is a weighted combination of scores in which each score is first multiplied by a constant and then the products are summed. In its most general form, a linear transformation appears as follows:

Xi' = w0 + w1X1i + w2X2i + … + wkXki

where k is the number of different scores for each subject and Xi' is the linear combination of all the scores for a given individual.

A linear transformation combines a number of scores into a single score. A linear transformation is useful in that it is cognitively simpler to deal with a single number, the transformed score, than it is to deal with many numbers individually. For example, suppose a statistics teacher had records of absences (X1) and number of missed homework assignments (X2) during a semester for six students (N=6).

| Student (i) | Absences (X1i) | Missed assignments (X2i) |
|---|---|---|
| 1 | 10 | 2 |
| 2 | 12 | 4 |
| 3 | 8 | 3 |
| 4 | 14 | 6 |
| 5 | 0 | 0 |
| 6 | 4 | 2 |

The teacher wishes to combine these separate measures into a single measure of student tardiness. The teacher could just add the two numbers together, with implied weights of one for each variable. This solution is rejected, however, as more weight would be given to absences than missed assignments because absences have greater variability. The solution, the teacher decides, is to take the sum of one-half of the absences and twice the missed homework assignments. This would result in a linear transformation of the following form:

Xi' = w0 + w1X1i + w2X2i

where: w0 = 0, w1 = .5, and w2 = 2 giving

Xi' = .5X1i + 2X2i

Application of this transformation to the first subject's scores would result in the following:

Xi' = .5X1i + 2X2i

Xi' = .5*10 + 2*2 = 5 + 4 = 9

The following table results when the linear transformation is applied to all scores for each of the six students:

| Student (i) | Absences (X1i) | Missed assignments (X2i) | Tardiness (X'i) |
|---|---|---|---|
| 1 | 10 | 2 | 9 |
| 2 | 12 | 4 | 14 |
| 3 | 8 | 3 | 10 |
| 4 | 14 | 6 | 19 |
| 5 | 0 | 0 | 0 |
| 6 | 4 | 2 | 6 |
| Mean | 8 | 2.833 | 9.667 |
| Sd. | 5.215 | 2.041 | 6.532 |
| Var. | 27.196 | 4.166 | 42.665 |

As can be seen, student number 4 has the largest measure of tardiness with a score of 19.
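As a quick check of the arithmetic above, the transformation can be applied to all six students in a few lines. (This sketch uses Python; it is not part of the original text.)

```python
# Example data from the table above
absences = [10, 12, 8, 14, 0, 4]
missed = [2, 4, 3, 6, 0, 2]

# Weights chosen by the teacher: w0 = 0, w1 = .5, w2 = 2
w0, w1, w2 = 0, 0.5, 2
tardiness = [w0 + w1 * x1 + w2 * x2 for x1, x2 in zip(absences, missed)]

print(tardiness)  # [9.0, 14.0, 10.0, 19.0, 0.0, 6.0]
```

The largest value, 19.0, belongs to student 4, in agreement with the table.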

### The Mean and Variance of the Transformed Scores

As in the section on simple linear transformations, the mean and standard deviation of the transformed scores are related to the mean and standard deviation of the scores combined to create the transformed score. In addition, when transforming more than a single score into a combined score, the correlation coefficient between the scores affects the size of the resulting transformed variance and standard deviation. The formulas that describe the relationship between the means, standard deviations, and variances of the scores are presented below:

X̄' = w0 + w1X̄1 + w2X̄2

sX'² = w1²s1² + w2²s2² + 2w1w2s1s2r12

Application of these formulas to the example data results in the following, where the correlation (r12) between X1i and X2i is .902:

X̄' = w0 + w1X̄1 + w2X̄2

X̄' = 0 + .5·8 + 2·2.833 = 4 + 5.666 = 9.666

sX'² = w1²s1² + w2²s2² + 2w1w2s1s2r12

sX'² = .5²·5.215² + 2²·2.041² + 2·.5·2·5.215·2.041·.902

= .25·27.196 + 4·4.166 + 19.202 = 42.665

Note that the values computed with these formulas agree with the actual values presented in an earlier table.

When combining two variables in a linear transformation the variance of the transformed scores is a function of the variances of the individual variables and the correlation between the variables. A number of possibilities exist, depending upon sign of the correlation coefficient and the signs of the weights.

1. If the correlation between the variables is zero, then the variance of the transformed score will be a weighted sum of the variances of the individual scores:

   sX'² = w1²s1² + w2²s2²

2. If the correlation between the variables is positive, then the resulting variance will be greater than the weighted sum of the individual variances if both weights are positive or both are negative; otherwise the variance will decrease.
3. If the correlation between the variables is negative, then the resulting variance will be less than the weighted sum of the individual variances if both weights are positive or both are negative; otherwise the variance will increase.

### GRAPHIC REPRESENTATIONS OF TRANSFORMATIONS

The pairs of data may be represented as points on a scatter plot. The linear transformation, defined by the equation Xi' = w0 + w1X1i + w2X2i, may be represented as a line on the scatter plot. For example, the six pairs of example data appear as six points on the following scatter plot.

The linear transformation, Xi' = .5X1i + 2X2i, appears as a straight line on this scatter plot. The line can be drawn by plotting two points that fall on it and connecting them with a straight edge. For example, setting Xi' = 0, when X1 = 0 then X2 = 0 because

0 = .5X1i + 2X2i

0 = .5*0 + 2X2i

0 = 2X2i

0 = X2i

In a similar manner, if X1 = 4, then X2 = -1

0 = .5X1i + 2X2i

0 = .5*4 + 2X2i

0 = 2 + 2X2i

-2 = 2X2i

-1 = X2i

Solving for X2i in terms of X1i

0 = .5X1i + 2X2i

2X2i = -.5X1i

X2i = -.5X1i/2 = -.25X1i

In general where

Xi' = w0 + w1X1i + w2X2i

X2i = - (w0 / w2) - (w1/w2) X1i

This line is drawn on the example scatter plot below.

A line is drawn perpendicular to the line that defines the transformation. In the example below, the perpendicular line is shown as the green line on the graph. The points on the graph are then projected onto the perpendicular line by drawing lines parallel to the line that defines the transformation. On the example below, the red numbers written next to the line (0, 6, 9, 10, 14, 19) indicate the resulting transformed scores. Note that the relative spacing between the projected points mirrors the differences between the transformed scores. For example, the distance between the points 0 and 6 is twice that between 6 and 9.

In general, if X2i = - (w0/w2) - (w1/w2) X1i, then a perpendicular results when X2i = (w2/w1) X1i. In this case, the perpendicular line is defined by

X2i = (w2/w1) X1i = (2/.5) X1i = 4X1i

when X1i = -1, X2i = -4

and when X1i = 1, X2i = 4

### Similar Transformations as Multiples of Weights

Suppose a second transformation is selected that takes the form Xi' = 1.5 X1i + 6 X2i. The transformed values are presented in the table below. Note that the new values of both w1 and w2, w1* and w2*, are three times the values of the original weights. The transformed scores and resulting mean and standard deviation are all three times the size of the first transformation.

| Student (i) | X1i | X2i | X'i |
|---|---|---|---|
| 1 | 10 | 2 | 27 |
| 2 | 12 | 4 | 42 |
| 3 | 8 | 3 | 30 |
| 4 | 14 | 6 | 57 |
| 5 | 0 | 0 | 0 |
| 6 | 4 | 2 | 18 |
| Mean | 8 | 2.833 | 29 |
| Sd. | 5.215 | 2.041 | 19.596 |

The line defining the transformation is similar to the line defining the previous transformation.

X2i = - (w0 / w2) - (w1/w2) X1i

X2i = - (0/6) - (1.5/6) X1i

X2i = - .25 X1i

As before, the line defining the transformation can be drawn by plotting two points and connecting them with a straight edge. For example, when X1 = 0 then X2 = 0, and when X1 = 4, X2 = -1.

The line defining this transformation, then, is identical to the line defining the first transformation. In a like manner, drawing a line perpendicular to this line and projecting the points onto this line results in an identical figure. The only difference is the numbers assigned to the projected values. These numbers, as noted earlier, are three times those of the first transformation.

In some ways, then, all transformations whose weights are multiples of the weights of another transformation are similar. The correlation coefficient between the resulting values of the first and second transformations is 1.0. In general, if w1/w2 = w1*/w2*, then the transformations are similar except for a multiplicative constant.

### The Constant Term

Adding a constant term (w0) to the transformation changes the origin of the perpendicular line, shifting it up or down depending upon whether the value of w0 is positive or negative. The differences between transformed numbers remain unchanged. For example, application of the transformation Xi' = -8 + .5X1i + 2X2i results in the following table.

| Student (i) | X1i | X2i | X'i |
|---|---|---|---|
| 1 | 10 | 2 | 1 |
| 2 | 12 | 4 | 6 |
| 3 | 8 | 3 | 2 |
| 4 | 14 | 6 | 11 |
| 5 | 0 | 0 | -8 |
| 6 | 4 | 2 | -2 |
| Mean | 8 | 2.833 | 1.667 |
| Sd. | 5.215 | 2.041 | 6.532 |

The individual transformed scores of this transformation are eight less than those of the first transformation. The mean is eight less than the mean of the first transformation, while the standard deviation remains unchanged. This agrees in principle with the effect of the additive component in a simple linear transformation.

When this line is drawn on the scatter plot, it appears parallel to the line defining the first transformation. Note, however, that it no longer passes through the origin. The projections of the points on a line perpendicular to the line defining the transformation preserve the relative distances between the projected points of the earlier transformations. The only real difference is the intercept of the line defining the transformation.
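The effect of the constant term on the mean and standard deviation can be confirmed directly. (A Python sketch, not from the original text.)

```python
import statistics

x1 = [10, 12, 8, 14, 0, 4]
x2 = [2, 4, 3, 6, 0, 2]

base = [0.5 * a + 2 * b for a, b in zip(x1, x2)]  # Xi' = .5X1 + 2X2
shifted = [-8 + v for v in base]                  # Xi' = -8 + .5X1 + 2X2

# Mean shifts by exactly w0 = -8; the standard deviation is unchanged
print(round(statistics.mean(base), 3), round(statistics.mean(shifted), 3))    # 9.667 1.667
print(round(statistics.stdev(base), 3), round(statistics.stdev(shifted), 3))  # 6.532 6.532
```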

### Another Example Transformation

The final transformation that will be examined in this section involves different weights. Suppose a linear transformation with weights w1 = 2 and w2 = 1 were applied to the example data. The transformation would be defined by the equation Xi' = 2X1i + X2i and would result in the following table.

| Student (i) | X1i | X2i | X'i |
|---|---|---|---|
| 1 | 10 | 2 | 22 |
| 2 | 12 | 4 | 28 |
| 3 | 8 | 3 | 19 |
| 4 | 14 | 6 | 34 |
| 5 | 0 | 0 | 0 |
| 6 | 4 | 2 | 10 |
| Mean | 8 | 2.833 | 18.833 |
| Sd. | 5.215 | 2.041 | 12.303 |
| Var. | 27.196 | 4.166 | 151.353 |

Verifying that the mean and standard deviation of the transformed scores are correct results in the following computations.

X̄' = w0 + w1X̄1 + w2X̄2

X̄' = 0 + 2·8 + 1·2.833 = 18.833

sX'² = w1²s1² + w2²s2² + 2w1w2s1s2r12

sX'² = 2²·5.215² + 1²·2.041² + 2·2·1·5.215·2.041·.902

sX'² = 151.353 = 12.303²

The numbers agree, within rounding error.

Using the procedures outlined above, the line defining the transformation may be found.

X2i = - (w0 / w2) - (w1/w2) X1i

X2i = - (0/1) - (2/1) X1i = - 2 X1i

Such that when X1 = 0, X2 = 0 and when X1 = 1, X2 = -2

The perpendicular line is drawn as

X2i = (w2/w1) X1i = (1/2) X1i = .5 X1i

Using these equations, the lines and projections may be drawn on the scatter plot.

Note that the line perpendicular to the line defining the transformation seems to "pass through" the points to a much greater extent than in the first transformations. Note also that the variance of the resulting points has increased.

Statisticians are interested in the linear transformation that maximizes the obtained variance. It is obvious, however, that increasing the size of the transformation weights will arbitrarily increase the variance of the obtained transformed scores. In order to control for this artifact, the scores will first be mean centered and the transformation weights will be normalized.

### MEAN CENTERED TRANSFORMATIONS

A linear transformation is called a mean centered transformation if the mean is subtracted from the scores before the linear transformation is done. Mean centering basically allows a cleaner view of the data. The following table presents the results of mean centering the example data and applying the transformation Xi' = .5X1i + 2X2i.

| Score (i) | X1 | X1 − X̄1 | X2 | X2 − X̄2 | X' |
|---|---|---|---|---|---|
| 1 | 10 | 2 | 2 | -.833 | -.667 |
| 2 | 12 | 4 | 4 | 1.167 | 4.333 |
| 3 | 8 | 0 | 3 | .167 | .333 |
| 4 | 14 | 6 | 6 | 3.167 | 9.333 |
| 5 | 0 | -8 | 0 | -2.833 | -9.667 |
| 6 | 4 | -4 | 2 | -.833 | -3.667 |
| Mean | 8 | 0 | 2.833 | 0 | 0 |
| s.d. | 5.215 | 5.215 | 2.041 | 2.041 | 6.531 |
| Variance | 27.2 | 27.2 | 4.167 | 4.167 | 42.650 |
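The centering step and its effect can be reproduced in a few lines. (A Python sketch, not from the original text.)

```python
import statistics

x1 = [10, 12, 8, 14, 0, 4]
x2 = [2, 4, 3, 6, 0, 2]

m1, m2 = statistics.mean(x1), statistics.mean(x2)
x1c = [a - m1 for a in x1]  # mean centered absences
x2c = [b - m2 for b in x2]  # mean centered missed assignments

xprime = [0.5 * a + 2 * b for a, b in zip(x1c, x2c)]

print(abs(statistics.mean(xprime)) < 1e-9)  # True: centered scores transform to a mean of 0
print(round(statistics.stdev(xprime), 3))   # 6.532: the spread is unchanged by centering
```

Mean centering shifts the transformed scores to a mean of zero but leaves their standard deviation the same as in the uncentered transformation.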

### NORMALIZED LINEAR TRANSFORMATIONS

A linear transformation is said to be normalized if the sum of the squared transformation weights, not including w0, is equal to one. In the case of two variables, any transformation where w1² + w2² = 1 would be a normalized linear transformation. For example, the linear transformation X'i = .8X1i + .6X2i would be a normalized linear transformation because w1² + w2² = .8² + .6² = .64 + .36 = 1.

Any linear transformation may be normalized by dividing each weight (other than w0) by the square root of the sum of the squared weights:

wi' = wi / √(w1² + w2² + … + wk²)

For example, the transformation X' = .5X1 + 2X2 could be normalized by transforming the weights to

w1' = .5 / √(.5² + 2²) = .5 / 2.0616 = .2425 and w2' = 2 / 2.0616 = .9701

Note that w1'² + w2'² = .2425² + .9701² = .0588 + .9411 = .9999 and -w1/w2 = -.5/2 = -.25 = -w1'/w2' = -.2425/.9701 = -.25. The first result implies that the transformation is a normalized linear transformation; the second implies that the same line defines both transformations.
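The normalization step can be sketched as a small helper function. (Python assumed; the function name is mine, not from the text.)

```python
import math

def normalize(weights):
    """Divide each weight by the square root of the sum of squared weights."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

w = normalize([0.5, 2])
print([round(v, 4) for v in w])        # [0.2425, 0.9701]
print(round(sum(v * v for v in w), 4)) # 1.0
```

The normalized weights match the values given above, and their squares sum to one.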

The advantages of mean centering and normalizing a linear transformation include:

1. The transformed values will be measured on the same scale as the original variables. In other words, the units of measurement on the projection line will not shrink or grow as they did in the previous examples. The units will remain constant and will be the same as both the x and y axes.
2. The normalized weights are the sine and cosine of the line defining the transformation (reference).

### TWO SIMULTANEOUS NORMALIZED LINEAR TRANSFORMATIONS

Given that a normalized linear transformation, Xi' = w1X1i + w2X2i, has been defined, there exists a second normalized linear transformation, Xi'' = w1'X1i + w2'X2i, such that w1' = -w2 and w2' = w1. A line that is perpendicular to the line defined by the first normalized transformation will define this second normalized transformation.

A line perpendicular to Xi' = w1X1i + w2X2i is defined by X2i = (w2/w1) X1i. The second normalized linear transformation is defined by X2i = -(w1'/w2')X1i, but w1' = -w2 and w2' = w1, thus the second normalized linear transformation is defined by X2i = (w2/w1) X1i, a line perpendicular to the line defining the first normalized linear transformation.
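Equivalently, the two weight vectors are orthogonal: their dot product is zero. This can be checked with the .8/.6 example used later in this section. (A Python sketch, not from the original text.)

```python
# First normalized transformation's weights
w1, w2 = 0.8, 0.6

# Second transformation's weights: w1' = -w2, w2' = w1
w1p, w2p = -w2, w1

dot = w1 * w1p + w2 * w2p
print(dot)  # 0.0: the weight vectors are orthogonal, so the lines are perpendicular
```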

In addition, the sum of the transformed variances will be equal to the sum of the variances of the untransformed scores.

For example, application of the normalized transformation X' = w1X1 + w2X2 = .8X1 + .6X2 and X'' = w'1X1 + w'2X2 = -.6X1 + .8X2 to the mean centered example data results in the following table.

| Score (i) | X1 − X̄1 | X2 − X̄2 | X' | X'' |
|---|---|---|---|---|
| 1 | 2 | -.833 | 1.10 | -1.866 |
| 2 | 4 | 1.167 | 3.90 | -1.466 |
| 3 | 0 | .167 | .10 | .134 |
| 4 | 6 | 3.167 | 6.70 | -1.066 |
| 5 | -8 | -2.833 | -8.10 | 2.534 |
| 6 | -4 | -.833 | -3.70 | 1.737 |
| Mean | 0 | 0 | 0 | 0 |
| s.d. | 5.215 | 2.041 | 5.303 | 1.801 |
| Variance | 27.2 | 4.167 | 28.124 | 3.245 |

Note that both transformations are normalized, as w1² + w2² = .8² + .6² = .64 + .36 = 1.00 and w1'² + w2'² = (-.6)² + .8² = .36 + .64 = 1.00. Note also that the sum of the variances of the untransformed variables (s1² + s2² = 27.2 + 4.167 = 31.367) is equal to the sum of the variances of the transformed variables (s'² + s''² = 28.124 + 3.245 = 31.369), at least within rounding error.
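The invariance of the total variance can be verified numerically before working through the proof. (A Python sketch, not from the original text.)

```python
import statistics

x1 = [10, 12, 8, 14, 0, 4]
x2 = [2, 4, 3, 6, 0, 2]
m1, m2 = statistics.mean(x1), statistics.mean(x2)
x1c = [a - m1 for a in x1]
x2c = [b - m2 for b in x2]

xp = [0.8 * a + 0.6 * b for a, b in zip(x1c, x2c)]    # X'  = .8X1 + .6X2
xpp = [-0.6 * a + 0.8 * b for a, b in zip(x1c, x2c)]  # X'' = -.6X1 + .8X2

before = statistics.variance(x1c) + statistics.variance(x2c)
after = statistics.variance(xp) + statistics.variance(xpp)
print(round(before, 3), round(after, 3))  # 31.367 31.367
```

The two sums agree exactly; the small discrepancy in the text (31.367 versus 31.369) is rounding error from the hand-computed table values.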

The sum of the transformed variances must always equal the sum of the untransformed variances as the following proves.

Where X' = w1X1 + w2X2, X'' = -w2X1 + w1X2, and w1² + w2² = 1.00:

s'² + s''²

= (w1²s1² + w2²s2² + 2w1w2s1s2r12) + ((-w2)²s1² + w1²s2² + 2(-w2)w1s1s2r12)

= w1²s1² + w2²s2² + 2w1w2s1s2r12 + w2²s1² + w1²s2² - 2w2w1s1s2r12

= w1²s1² + w2²s2² + w2²s1² + w1²s2²

= (w1² + w2²)s1² + (w2² + w1²)s2²

= s1² + s2²

As always, if you are unable (or unwilling) to follow the proofs, you must "believe."

### VISUALIZING NORMALIZED LINEAR TRANSFORMATIONS

The two transformations presented above may be visualized in a manner similar to that described earlier. Since the transformations have been normalized, the "scales" on the transformed axes (the blue and green lines) are identical to the scales of the original axes. Conceptually, the axes are "rotated" and the points are "projected" onto the new axes.

It appears the variance of X' might be increased if the axes were rotated clockwise even further than the present transformation. At some point the variance would begin to grow smaller again. Obtaining transformation weights that optimize variance is the problem that the next section addresses.

### EIGENVALUES AND EIGENVECTORS

It was proved earlier that the total variability is unchanged when normalized transformations are done on mean centered data. It was also demonstrated that the distribution of variability changed, that is, X' had greater variance than X''. Mathematically, the question can be asked, "can a transformation be found such that one variable has a maximal amount of variance and the other has a minimal amount of variance?" Obviously the answer is "Yes" or I wouldn't have asked the question. Optimizing linear transformations such that transformed variables contain a maximal amount of variability is the fundamental problem addressed by eigenvalues and eigenvectors.

Eigenvalues are the variances of the transformations when an optimal (maximal variance) linear transformation has been found. Eigenvectors are the transformation weights of optimal linear transformations.

Mathematical procedures are available to compute eigenvalues and eigenvectors and will be presented shortly. Before these methods are presented, however, a manual method using an interactive computer exercise will be discussed.

### USING THE TRANSFORMATION PROGRAM TO FIND APPROXIMATE EIGENVALUES AND EIGENVECTORS

The transformation program has been modified by reducing the data pairs to six and rescaling the axes. After clicking on the "Enter Own Data" button, the first step is to enter the mean centered data. After entering the data, click on the "Compute Own Data" button. The means and variances of the data will appear in the appropriately labeled boxes.

In addition, the following scatter plots, controls, and text boxes will appear. Note that the variances of the transformed variables (X*1 and X*2) are the same as the original variables (X1 and X2). The weights are set at values w1=1 and w2=0 so that the transformed axes are identical to the original axes.

The weights can be changed two different ways. Clicking on the large area of the scroll bar causes a fairly large change in the transformation weights.

Clicking on the triangles on either end causes a small change in the transformation weights.

In either case, the variances of the transformed scores are recomputed and displayed. The points on the scatter plot on the left remain unchanged, but the axes are rotated to display the lines defined by the transformations. The scatter plot on the right displays the plot of the transformed scores.

The goal is to adjust the axes so that the variance of one of the transformed variables is maximized and the other is minimized. This can be accomplished by first changing the weights with fairly large steps. The variance will continue increasing until a certain point has been reached. At this point begin using smaller steps. Continue until the variance begins to decrease. Because of what I believe is rounding error, the program sometimes behaves badly at this level. Be sure to continue in both directions for a number of small steps before deciding that a maximum and minimum variance has been found.

Note that the program automatically normalizes the transformation weights and the sum of the variances remains a constant, no matter what weights are used.

In the example data, the adjustments to the weights were continued until the values in the display were found. Note that the axes pass through the points in the direction that most students intuitively believe is the position of the regression line (it isn't). In this case the eigenvalues would be 30.67 and .69. The two pairs of eigenvectors would be (.936, .352) and (.352, -.936). The transformations corresponding to the above values are presented below.

| Score (i) | X1 − X̄1 | X2 − X̄2 | X' | X'' |
|---|---|---|---|---|
| 1 | 2 | -.833 | 1.579 | 1.48 |
| 2 | 4 | 1.167 | 4.155 | .316 |
| 3 | 0 | .167 | .059 | -.156 |
| 4 | 6 | 3.167 | 6.731 | -.864 |
| 5 | -8 | -2.833 | -8.485 | -.164 |
| 6 | -4 | -.833 | -4.037 | -.628 |
| Mean | 0 | 0 | 0 | 0 |
| s.d. | 5.215 | 2.041 | 5.538 | .835 |
| Variance | 27.2 | 4.167 | 30.672 | .696 |

### USING SPSS/WIN FACTOR ANALYSIS TO FIND EIGENVALUES AND EIGENVECTORS

It should come as no surprise to the student that mathematical procedures have been developed to find exact eigenvalues and eigenvectors for both this relatively simple case of two variables and far more complicated situations involving linear combinations of many variables. The procedures involve matrix algebra and are beyond the scope of this text. The interested reader will find a much more complete and mathematical treatment in Johnson and Wichern (1996).

Eigenvalues and eigenvectors can be found using the Factor Analysis package of SPSS/WIN. Starting with the raw data as variables in a data matrix, the next step is to click on "Statistics", "Data Reduction", and "Factor" in that order. The display should appear as follows:

The program will then display the choices associated with the Factor Analysis package. Select the variables that are to be included in the analysis and click them to the right-hand box. At this point some of the default values associated with the "Extraction" button will need to be modified, so clicking on this button gives the following choices:

Checking the "Covariance matrix" will result in the analysis of raw data rather than standardized scores. In addition, the computer will be told that 2 "factors" will be extracted, rather than allowing the computer to automatically decide how many "factors" will be extracted. Be sure that the "Principal components" is the selected method for factor extraction. Click on "Continue" and the main factor analysis selections should reappear. Click the "Scores" button to modify the output to print tables that will allow the computation of the eigenvectors.

Click on the "Display factor score coefficient matrix" option and then click on "Continue." Back in the main factor analysis display, click on the "OK" button to run the program.

The eigenvalues appear in an output table labeled "Total Variance Explained." Note that the values of 30.676 and .690 closely correspond to what was found by manually rotating the axes.

The eigenvectors do not appear directly on the table. They may be computed by normalizing the "Raw Components" in the following "Component Matrix" table.

While not exact, these values are within rounding error of the values found using the manual approximation procedure. The student may verify that the "Raw Components" for "2" correspond to the second normalized eigenvector.

### APPLICATIONS OF LINEAR TRANSFORMATIONS

Linear transformations are used to simplify the data. In general, if the same amount of information (in this case variance) can be explained by fewer variables, the interpretation will generally be simpler.

Linear transformations are the cornerstone of multivariate statistics. In multiple regression linear transformations are used to find weights that allow many independent variables to predict a single dependent variable. In canonical correlation, both the dependent and independent variables have two or more variables and the goal of the analysis is to find a linear combination of the independent variables which best predicts a linear combination of the dependent variables.

Analysis of Variance is a special case of multiple regression where the independent variable is "dummy coded." Multivariate Analysis of Variance is a special case of canonical correlation where the independent variable(s) are also "dummy coded."

Factor analysis is similarly a linear transformation of many variables. The goal in factor analysis is a description of the variables, rather than prediction of a variable or set of variables. In factor analysis, a combination of weights is selected (extracted), usually with some goal, such as maximizing the variance of the transformed score or maximizing the correlation of the transformed score with all the scores that produce it. A second combination of weights is then selected which meets the goal of the analysis. This process could continue until the number of transformed variables equals the number of original variables, but it usually does not, because after a few meaningful transformations the rest do not make much sense and are discarded. The goal of factor analysis is to explain a set of variables with a few transformed variables.