Multiple regression is an extension of simple linear regression in which more than one independent variable is used to predict a single dependent variable.

With two independent variables the prediction of Y is expressed by the following equation:

Y'_{i} = b_{0} + b_{1}X_{1i} + b_{2}X_{2i}

Note that this transformation is similar to the linear transformation of two variables discussed in the previous chapter, except that the w's have been replaced with b's and X'_{i} has been replaced with Y'_{i}.

The "b" values are called regression weights and are computed in a way that minimizes the sum of squared deviations

in the same manner as in simple linear regression. The difference is that in simple linear regression only two weights, the _{0}) and _{1}), were estimated, while in this case, three weights (b_{0}, b_{1}, and b_{2}) are estimated.

The data used to illustrate the inner workings of multiple regression are presented in the table below.

Y_{1} | Y_{2} | X_{1} | X_{2} | X_{3} | X_{4}
---|---|---|---|---|---
125 | 113 | 13 | 18 | 25 | 11
158 | 115 | 39 | 18 | 59 | 30
207 | 126 | 52 | 50 | 62 | 53
182 | 119 | 29 | 43 | 50 | 29
196 | 107 | 50 | 37 | 65 | 56
175 | 135 | 64 | 19 | 79 | 49
145 | 111 | 11 | 27 | 17 | 14
144 | 130 | 22 | 23 | 31 | 17
160 | 122 | 30 | 18 | 34 | 22
175 | 114 | 51 | 11 | 58 | 40
151 | 121 | 27 | 15 | 29 | 31
161 | 105 | 41 | 22 | 53 | 39
200 | 131 | 51 | 52 | 75 | 36
173 | 123 | 37 | 36 | 44 | 27
175 | 121 | 23 | 48 | 27 | 20
162 | 120 | 43 | 15 | 65 | 36
155 | 109 | 38 | 19 | 62 | 37
230 | 130 | 62 | 56 | 75 | 50
162 | 134 | 28 | 30 | 36 | 20
153 | 124 | 30 | 25 | 41 | 33

The example data can be obtained as a text file and as an SPSS data file.

If a student desires a more concrete description of this data file, meanings could be given to the variables as follows:

Y_{1} - A measure of success in graduate school.

X_{1} - A measure of intellectual ability.

X_{2} - A measure of "work ethic."

X_{3} - A second measure of intellectual ability.

X_{4} - A measure of spatial ability.

Y_{2} - Score on a major review paper.

The first step in the analysis of multivariate data is to obtain a table of means and standard deviations for each variable.

The second step is an analysis of bivariate relationships between variables. This can be done using a correlation matrix.

In the case of the example data, it is noted that all X variables correlate significantly with Y_{1}, while none correlate significantly with Y_{2}. In addition, X_{1} is significantly correlated with X_{3} and X_{4}, but not with X_{2}. Interpreting the variables using the suggested meanings, success in graduate school could be predicted individually from measures of intellectual ability, work ethic, and spatial ability.

A visual presentation of the scatter plots generating the correlation matrix can be produced using a scatter plot matrix.

These graphs may be examined for multivariate outliers that might not be found in the individual bivariate scatter plots.

Visualizing three variables simultaneously is more difficult, and the results are often less than satisfactory. In the three representations that follow, all scores have been standardized. The rotating 3D graph below presents X_{1}, X_{2}, and Y_{1}.

The graph below presents X_{1}, X_{3}, and Y_{1}.

The graph below presents X_{1}, X_{4}, and Y_{2}.

The formulas to compute the regression weights with two independent variables are considerably more complicated than those for simple linear regression, and the computation is best left to a statistical package.

The multiple regression is done in SPSS by selecting "Analyze," "Regression," and then "Linear..." from the menus.

In the first analysis, Y_{1} is the dependent variable and two independent variables are entered in the first block, X_{1} and X_{2}. In addition, under the "Save..." option, both unstandardized predicted values and unstandardized residuals were selected.

The output consists of a number of tables. The "Coefficients" table presents the optimal weights in the regression model, as seen in the following.

Recalling the prediction equation, Y'_{i} = b_{0} + b_{1}X_{1i} + b_{2}X_{2i}, the values for the weights can now be found by observing the "B" column: b_{0} = 101.222, b_{1} = 1.000, and b_{2} = 1.071. The regression equation thus appears as:

Y'_{i} = 101.222 + 1.000X_{1i} + 1.071X_{2i}

The "_{0} is always 0 and not included in the regression equation. The equation and weights for the example data appear below.

Z_{Y} = b_{1}Z_{X1} + b_{2}Z_{X2}

Z_{Y} = .608Z_{X1} + .614Z_{X2}

The standardization of all variables allows a better comparison of regression weights, as the unstandardized weights are a function of the variance of both the Y and the X variables.

The values of Y_{1i} can now be predicted using the following linear transformation.

Y'_{1i} = 101.222 + 1.000X_{1i} + 1.071X_{2i}

Thus, the value of Y_{11} for the first student, where X_{11} = 13 and X_{21} = 18, could be predicted as follows.

Y'_{11} = 101.222 + 1.000X_{11} + 1.071X_{21}

Y'_{11} = 101.222 + 1.000 * 13 + 1.071 * 18

Y'_{11} = 101.222 + 13.000 + 19.278

Y'_{11} = 133.50
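The regression weights and this predicted score can be verified numerically. Below is a minimal sketch in Python using NumPy's least-squares solver (NumPy is an assumption of this example, not a tool used in the chapter); it also recovers the standardized beta weights:

```python
import numpy as np

# Example data: Y1, X1, X2 for the 20 students.
rows = [
    (125, 13, 18), (158, 39, 18), (207, 52, 50), (182, 29, 43),
    (196, 50, 37), (175, 64, 19), (145, 11, 27), (144, 22, 23),
    (160, 30, 18), (175, 51, 11), (151, 27, 15), (161, 41, 22),
    (200, 51, 52), (173, 37, 36), (175, 23, 48), (162, 43, 15),
    (155, 38, 19), (230, 62, 56), (162, 28, 30), (153, 30, 25),
]
data = np.array(rows, dtype=float)
y, x1, x2 = data[:, 0], data[:, 1], data[:, 2]

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones(len(y)), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")

# Predicted score for the first student (X1 = 13, X2 = 18).
y_pred_1 = b0 + b1 * 13 + b2 * 18
print(f"Y'_11 = {y_pred_1:.2f}")

# Standardized (beta) weights: scale each b by sd(X)/sd(Y).
beta1 = b1 * x1.std(ddof=1) / y.std(ddof=1)
beta2 = b2 * x2.std(ddof=1) / y.std(ddof=1)
print(f"beta1 = {beta1:.3f}, beta2 = {beta2:.3f}")
```

The fitted weights agree with the SPSS output (101.222, 1.000, 1.071), the first student's predicted score is 133.50, and the beta weights match the .608 and .614 reported above.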

The scores for all students are presented below, as computed in the SPSS data file. Note that the predicted Y score for the first student is 133.50. The predicted Y and residual values are automatically added to the data file when the "Save..." option is used.

The difference between the observed and predicted score, Y - Y', is called a residual. The sum of squared residuals, Σ(Y - Y')^{2}, may be computed in SPSS by squaring the residuals using a "Compute" transformation and then summing the squares.

The analysis of residuals can be informative. The larger the residual for a given observation, the larger the difference between the observed and predicted value of Y and the greater the error in prediction. In the example data, the regression under-predicted the Y value for observation 10 by a value of 10.98, and over-predicted the value of Y for observation 6 by a value of 10.60. In some cases the analysis of errors of prediction in a given model can direct the search for additional independent variables that might prove valuable in more complete models.

The residuals are assumed to be normally distributed when testing hypotheses using analysis of variance (R^{2} change). Although analysis of variance is fairly robust with respect to this assumption, it is a good idea to examine the distribution of residuals, especially with respect to outliers.

The multiple correlation coefficient (R) is the correlation between the observed values of Y and the values of Y predicted by the regression model.

The value of R can be found in the "Model Summary" table of the SPSS output. In the case of the example data, the value for the multiple R when predicting Y_{1} from X_{1} and X_{2} is .968, a very high value.

The squared multiple correlation coefficient (R^{2}) is also called the *coefficient of determination* and may be interpreted as the proportion of variance in Y accounted for by the regression model. It may be found in the "Model Summary" table of the SPSS output.

The adjustment in the "

The standard error of estimate is a measure of the accuracy of prediction. It is computed as the square root of the sum of squared residuals divided by N - k:

s_{est} = sqrt( Σ(Y - Y')^{2} / (N - k) )

The difference between this formula and the formula presented in an earlier chapter is in the denominator of the equation. In both cases the denominator is N - k, where N is the number of observations and k is the number of parameters which are estimated to find the predicted value of Y. In the case of simple linear regression, the number of parameters needed to be estimated was two, the intercept and the slope, while in the case of the example with two independent variables, the number was three, b_{0}, b_{1}, and b_{2}.

The computation of the standard error of estimate using the definitional formula for the example data is presented below. The numerator, or sum of squared residuals, Σ(Y - Y')^{2}, is found by summing the entries in the (Y - Y')^{2} column.

Note that the value for the standard error of estimate agrees with the value given in the output table of SPSS.
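Both the multiple R and the standard error of estimate can be recovered directly from the residuals of the fitted model. A sketch under the same NumPy assumption as the earlier examples:

```python
import numpy as np

# Example data: Y1, X1, X2 for the 20 students.
rows = [
    (125, 13, 18), (158, 39, 18), (207, 52, 50), (182, 29, 43),
    (196, 50, 37), (175, 64, 19), (145, 11, 27), (144, 22, 23),
    (160, 30, 18), (175, 51, 11), (151, 27, 15), (161, 41, 22),
    (200, 51, 52), (173, 37, 36), (175, 23, 48), (162, 43, 15),
    (155, 38, 19), (230, 62, 56), (162, 28, 30), (153, 30, 25),
]
data = np.array(rows, dtype=float)
y = data[:, 0]
X = np.column_stack([np.ones(len(y)), data[:, 1], data[:, 2]])

b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b
resid = y - y_hat

# Multiple R is the correlation between observed and predicted Y.
R = np.corrcoef(y, y_hat)[0, 1]
print(f"R = {R:.3f}")

# Standard error of estimate: sqrt(SSE / (N - k)), with k = 3 estimated
# parameters (b0, b1, b2) in the two-predictor model.
n, k = len(y), 3
s_est = np.sqrt((resid ** 2).sum() / (n - k))
print(f"standard error of estimate = {s_est:.3f}")
```

The computed R matches the .968 reported above, and R^{2} matches .936.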

The ANOVA table produced when X_{1} and X_{2} are entered in the first block when predicting Y_{1} appears as follows.

Because the "Sig." level is less than .05, it can be concluded that the combination of X_{1} and X_{2} significantly predicted Y_{1}. As described in the chapter on testing hypotheses using regression, the hypothesis test compares a restricted model, Y'_{i} = b_{0}, with the full model, Y'_{i} = b_{0} + b_{1}X_{1i} + b_{2}X_{2i}. The test evaluates the reduction in the sum of squared residuals when the additional weights, b_{1} and b_{2}, were computed.

The following table illustrates the computation of the various sum of squares in the example data.

Note that this table is identical in principle to the table presented in the chapter on testing hypotheses in regression.

When more terms are added to the regression model, the regression weights change as a function of the relationships among all the variables in the model.

A minimal model, predicting Y_{1} from the mean of Y_{1}, results in the following.

Y'_{i} = b_{0}

Y'_{i} = 169.45

A partial model, predicting Y_{1} from X_{1} results in the following model.

Y'_{i} = b_{0} + b_{1}X_{1i}

Y'_{i} = 122.835 + 1.258 X_{1i}

A second partial model, predicting Y_{1} from X_{2}, is the following.

Y'_{i} = b_{0} + b_{2}X_{2i}

Y'_{i} = 130.425 + 1.341 X_{2i}

As established earlier, the full regression model when predicting Y_{1} from X_{1} and X_{2} is

Y'_{i} = b_{0} + b_{1}X_{1i} + b_{2}X_{2i}

Y'_{i} = 101.222 + 1.000X_{1i} + 1.071X_{2i}

As can be observed, the values of both b_{1} and b_{2} change when both X_{1} and X_{2} are included in the regression model. The size and effect of these changes are the foundation for the analysis of R^{2} change.
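The minimal, partial, and full models above can all be reproduced by ordinary least squares. A sketch, again assuming NumPy rather than the chapter's SPSS:

```python
import numpy as np

# Example data: Y1, X1, X2 for the 20 students.
rows = [
    (125, 13, 18), (158, 39, 18), (207, 52, 50), (182, 29, 43),
    (196, 50, 37), (175, 64, 19), (145, 11, 27), (144, 22, 23),
    (160, 30, 18), (175, 51, 11), (151, 27, 15), (161, 41, 22),
    (200, 51, 52), (173, 37, 36), (175, 23, 48), (162, 43, 15),
    (155, 38, 19), (230, 62, 56), (162, 28, 30), (153, 30, 25),
]
data = np.array(rows, dtype=float)
y, x1, x2 = data[:, 0], data[:, 1], data[:, 2]

def fit(x):
    """Ordinary least-squares intercept and slope for one predictor."""
    X = np.column_stack([np.ones(len(y)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

a1, s1 = fit(x1)  # partial model using X1 only
a2, s2 = fit(x2)  # partial model using X2 only
print(f"minimal model: Y' = {y.mean():.2f}")
print(f"partial model 1: Y' = {a1:.3f} + {s1:.3f} X1")
print(f"partial model 2: Y' = {a2:.3f} + {s2:.3f} X2")
```

The fitted values agree with the models above: a mean of 169.45, and partial-model weights of (122.835, 1.258) and (130.425, 1.341).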

The unadjusted R^{2} value will increase with the addition of terms to the regression model. The amount of change in R^{2} is a measure of the increase in predictive power of the independent variable or variables, given the independent variable or variables already in the model. For example, the effect of work ethic (X_{2}) on success in graduate school (Y_{1}) could be assessed given one already has a measure of intellectual ability (X_{1}). The following table presents the results for the example data.

R^{2} and R^{2} change in successive models.

Variables in Equation | R^{2} | Increase in R^{2}
---|---|---
None | 0.00 | -
X_{1} | .584 | .584
X_{1}, X_{2} | .936 | .352

A similar table can be constructed to evaluate the increase in predictive power of X_{3} given X_{1} is already in the model.

Variables in Equation | R^{2} | Increase in R^{2}
---|---|---
None | 0.00 | -
X_{1} | .584 | .584
X_{1}, X_{3} | .592 | .008

As can be seen, although both X_{2} and X_{3} individually correlate significantly with Y_{1}, X_{2} contributes a fairly large increase in predictive power in combination with X_{1}, while X_{3} does not. Because X_{1} and X_{3} are highly correlated with each other, knowledge of one necessarily implies knowledge of the other. In regression analysis terms, X_{2} in combination with X_{1} predicts a significant amount of additional variance in Y_{1}, while X_{3} in combination with X_{1} predicts little additional variance.
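The R^{2} and R^{2} change values in the two tables can be reproduced by fitting each model and computing the proportion of variance accounted for. A sketch assuming NumPy:

```python
import numpy as np

# Example data: Y1, X1, X2, X3 for the 20 students.
rows = [
    (125, 13, 18, 25), (158, 39, 18, 59), (207, 52, 50, 62),
    (182, 29, 43, 50), (196, 50, 37, 65), (175, 64, 19, 79),
    (145, 11, 27, 17), (144, 22, 23, 31), (160, 30, 18, 34),
    (175, 51, 11, 58), (151, 27, 15, 29), (161, 41, 22, 53),
    (200, 51, 52, 75), (173, 37, 36, 44), (175, 23, 48, 27),
    (162, 43, 15, 65), (155, 38, 19, 62), (230, 62, 56, 75),
    (162, 28, 30, 36), (153, 30, 25, 41),
]
data = np.array(rows, dtype=float)
y = data[:, 0]

def r_squared(*columns):
    """R^2 for predicting Y1 from the given predictor columns."""
    X = np.column_stack([np.ones(len(y))] + [data[:, c] for c in columns])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    sse = ((y - X @ b) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    return 1 - sse / sst

r2_x1 = r_squared(1)       # X1 alone
r2_x1x2 = r_squared(1, 2)  # X1 and X2
r2_x1x3 = r_squared(1, 3)  # X1 and X3
print(f"X1: {r2_x1:.3f}; X1+X2: {r2_x1x2:.3f} (change {r2_x1x2 - r2_x1:.3f})")
print(f"X1+X3: {r2_x1x3:.3f} (change {r2_x1x3 - r2_x1:.3f})")
```

The computed values match the tables: adding X_{2} to X_{1} raises R^{2} from .584 to .936, while adding X_{3} raises it only to .592.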

It is possible to do a significance test of the increase in R^{2}. This significance test is the topic of the next section.

In order to test whether a variable adds significant predictive power to a regression model, it is necessary to construct the regression model in stages or blocks. This is accomplished in SPSS by entering the independent variables in separate blocks. If, for example, a test of the significance of adding X_{2} after X_{1} has been entered in the model was desired, then X_{1} would be entered in the first block and X_{2} in the second block. The following demonstrates how to construct these sequential models. The figure below illustrates how X_{1} is entered in the model first.

The next figure illustrates how X_{2} is entered in the second block.

In order to obtain the desired hypothesis test, click on the "Statistics..." button and then select the "R squared change" option.

The additional output obtained by selecting these options includes a model summary,

an ANOVA table,

and a table of coefficients.

The only new information presented in these tables is in the model summary, in the "Sig. F Change" column, which gives the significance level of the R^{2} change for model 2. In this case the change is statistically significant. It could be said that X_{2} adds significant predictive power in predicting Y_{1} after X_{1} has been entered into the regression model.

Conducting a similar hypothesis test for the increase in predictive power of X_{3} when X_{1} is already in the model produces the following model summary table.

Note that in this case the change is not significant. The table of coefficients also presents some interesting relationships.

Note that the "Sig." level for the X_{3} variable in model 2 (.562) is the same as the "Sig. F Change" in the preceding table. The interpretation of the "Sig." level for the "Coefficients" is now apparent. It is the significance of the addition of that variable given all the other independent variables are already in the regression equation. Note also that the "_{1} in Model 2 is .039, still significant, but less than the significance of X_{1} alone (Model 1 with a value of .000). Thus a variable may become "less significant" in combination with another variable than by itself.

The regression equation, Y'_{i} = b_{0} + b_{1}X_{1i} + b_{2}X_{2i}, defines a plane in a three-dimensional space. If Y could be predicted perfectly from X_{1} and X_{2}, all the points would fall on that plane. The relationship can be visualized by taking the (Y, X_{1}, X_{2}) triples of data, plotting these points in a three-dimensional space, and then fitting a plane through the points in the space. The plane is represented in the three-dimensional rotating scatter plot below.

The residuals can be represented as the distance from the points to the plane parallel to the Y-axis.

Graphically, multiple regression with two independent variables fits a plane to a three-dimensional scatter plot such that the sum of squared residuals is minimized. The multiple regression plane is presented below for Y_{1} predicted by X_{1} and X_{2}.

A similar relationship is presented below for Y_{1} predicted by X_{1} and X_{3}.

While humans have difficulty visualizing data with more than three dimensions, mathematicians have no such difficulty thinking about them. When dealing with more than three dimensions, mathematicians talk about fitting a hyperplane in multi-dimensional space.

With three variables involved, X_{1}, X_{2}, and Y, many varieties of relationships between variables are possible. It will prove instructive to explore three such relationships.

In this example, both X_{1} and X_{2} are correlated with Y, and X_{1} and X_{2} are **uncorrelated** with each other. In the example data, X_{1} and X_{2} are correlated with Y_{1} with values of .764 and .769 respectively. The independent variables, X_{1} and X_{2}, are correlated with a value of .255, not exactly zero, but close enough. In this case X_{1} and X_{2} contribute independently to predict the variability in Y. It doesn't matter much which variable is entered into the regression equation first and which variable is entered second.

The following table of R square change predicts Y_{1} with X_{1} and then with both X_{1} and X_{2}.

The next table of R square change predicts Y_{1} with X_{2} and then with both X_{1} and X_{2}.

The value of R square change for X_{1} from Model 1 in the first case (.584) to Model 2 in the second case (.345) is not identical, but fairly close. If the correlation between X_{1} and X_{2} had been 0.0 instead of .255, the R square change values would have been identical.

Because of the structure of the relationships between the variables, changing the order of entry of the variables makes relatively little difference in the results.

In this case, both X_{1} and X_{2} are correlated with Y, and X_{1} and X_{2} are **correlated** with each other. In the example data, X_{1} and X_{3} are correlated with Y_{1} with values of .764 and .687 respectively. The independent variables, X_{1} and X_{3}, are correlated with a value of .940. In this situation it makes a great deal of difference which variable is entered into the regression equation first and which is entered second.

Entering X_{1} first and X_{3} second results in the following R square change table.

Entering X_{3} first and X_{1} second results in the following R square change table.

As before, both tables end up at the same place, in this case with an R^{2} of .592. In this case, however, it makes a great deal of difference whether a variable is entered into the equation first or second. Variable X_{3}, for example, if entered first has an R square change of .561. If entered second after X_{1}, it has an R square change of only .008.

As two independent variables become more highly correlated, the solution to the optimal regression weights becomes unstable. This can be seen in the rotating scatter plots of X_{1}, X_{3}, and Y_{1}. The plane that models the relationship could be modified by rotating around an axis in the middle of the points without greatly changing the degree of fit. The solution to the regression weights becomes unstable. That is, there are any number of solutions for the regression weights that will give only a small difference in the sum of squared residuals. This is called the problem of *multicollinearity* in mathematical vernacular.

One of the many varieties of relationships occurs when neither X_{1} nor X_{2} individually correlates with Y, X_{1} correlates with X_{2}, but X_{1} and X_{2} together correlate highly with Y. This phenomenon may be observed in the relationships of Y_{2}, X_{1}, and X_{4}. In the example data neither X_{1} nor X_{4} is highly correlated with Y_{2}, while X_{1} and X_{4} are correlated with each other with a value of .847. Fitting X_{1} followed by X_{4} results in the following tables.

In this case, the hypothesis tests show that X_{1} and X_{4} are significant predictors when entered together, but insignificant when entered individually. It is also noted that the regression weight for X_{1} is positive (.769) and the regression weight for X_{4} is negative (-.783). In this case the variance in X_{1} that does not account for variance in Y_{2} is cancelled or suppressed by knowledge of X_{4}. Variable X_{4} is called a *suppressor variable*.
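The suppressor pattern can be checked numerically: individually X_{1} and X_{4} correlate weakly with Y_{2}, yet together they predict it much better, with weights of opposite sign. A sketch assuming NumPy:

```python
import numpy as np

# Example data: Y2, X1, X4 for the 20 students.
rows = [
    (113, 13, 11), (115, 39, 30), (126, 52, 53), (119, 29, 29),
    (107, 50, 56), (135, 64, 49), (111, 11, 14), (130, 22, 17),
    (122, 30, 22), (114, 51, 40), (121, 27, 31), (105, 41, 39),
    (131, 51, 36), (123, 37, 27), (121, 23, 20), (120, 43, 36),
    (109, 38, 37), (130, 62, 50), (134, 28, 20), (124, 30, 33),
]
data = np.array(rows, dtype=float)
y2, x1, x4 = data[:, 0], data[:, 1], data[:, 2]

# Individual correlations with Y2 are weak...
r1 = np.corrcoef(y2, x1)[0, 1]
r4 = np.corrcoef(y2, x4)[0, 1]

# ...but the two-predictor model does much better, with weights of
# opposite sign: the suppressor effect described above.
X = np.column_stack([np.ones(len(y2)), x1, x4])
b = np.linalg.lstsq(X, y2, rcond=None)[0]
R = np.corrcoef(y2, X @ b)[0, 1]
print(f"r(X1, Y2) = {r1:.3f}, r(X4, Y2) = {r4:.3f}, multiple R = {R:.3f}")
print(f"weight for X1 = {b[1]:.3f} (positive), weight for X4 = {b[2]:.3f} (negative)")
```

The signs of the fitted weights reproduce the positive/negative pattern described above, and the multiple R exceeds either individual correlation.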

In terms of the descriptions of the variables, if X_{1} is a measure of intellectual ability and X_{4} is a measure of spatial ability, it might reasonably be assumed that X_{1} is composed of both verbal ability and spatial ability. If the score on a major review paper is correlated with verbal ability and not spatial ability, then subtracting spatial ability from general intellectual ability would leave verbal ability. This explains the high multiple R obtained when spatial ability is subtracted from general intellectual ability, and it is for this reason that X_{1} and X_{4}, while not correlated individually with Y_{2}, in combination correlate fairly highly with Y_{2}.

Multiple regression predicting a single dependent variable with two independent variables is conceptually similar to simple linear regression, predicting a single dependent variable with a single independent variable, except that more weights are estimated and, rather than fitting a line in a two-dimensional scatter plot, a plane is fitted to describe a three-dimensional scatter plot. Interpretation of the results is confounded by both the relationship between the two independent variables and their relationship with the dependent variable.

A variety of relationships and interactions between the variables were then explored. The relationships discussed barely scratch the surface of the possibilities. Suffice it to say that the more variables are included in an analysis, the greater the complexity of the analysis. Multiple regression is usually done with more than two independent variables. The next chapter discusses issues related to more complex regression models.