<?xml version='1.0'?>
<?xml:stylesheet type="text/xsl" href="MultiBook.xsl" ?>
<chapter>
<number>5</number>
<author>David W. Stockburger</author>
<title>Multiple Regression with Two Predictor Variables </title>
<modified>03/21/2001</modified>
<URL>mlt06.xml</URL>
<section>
<P>Multiple regression is an extension of <index>simple linear regression</index> in which more than one independent variable (X) is used to predict a single dependent variable (Y). The predicted value of Y is a <index>linear transformation</index> of the X variables chosen so that the sum of squared deviations between the observed and predicted values of Y is a minimum. The computations are more complex, however, because the interrelationships among all the variables must be taken into account in the weights assigned to the variables. For the same reason, the interpretation of the results of a multiple regression analysis is also more complex.</P>
<definition word="dependent variable">the variable to be predicted.</definition> 
<definition word="independent variables">the set of variables used to predict the dependent variable.</definition> 
		<TestItem type="MC">
			<question>The difference between simple linear regression and multiple regression is</question>
			<answer type="correct">multiple regression has more independent measures.</answer>
			<answer>simple linear regression has a single dependent measure.</answer>
			<answer>multiple regression has multiple standard errors of estimate.</answer>
			<answer>multiple regression has many different regression equations.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem> <P>With two independent variables the prediction of Y is expressed by the following equation: </P>
<P>Y'<SUB>i</SUB> = b<SUB>0</SUB> + b<SUB>1</SUB>X<SUB>1i</SUB> + b<SUB>2</SUB>X<SUB>2i</SUB></P>
<P>Note that this transformation is similar to the linear transformation of two variables discussed in the previous chapter except that the w's have been replaced with b's and the X'<SUB>i</SUB> has been replaced with a Y'<SUB>i</SUB>.</P>
<P>The "b" values are called regression weights and are computed in a way that minimizes the sum of squared deviations</P>
<P>
	<figure>
		<description>The sum of squared deviations of the observed and predicted Y values.</description>
		<url>Images/mlt0638.gif </url>
		<width>86</width>
		<height>45</height>
		<align></align>
		<caption></caption>
		<alt>The sum of squared deviations of the observed and predicted Y values.</alt>
	</figure>
</P>
<P>in the same manner as in simple linear regression. The difference is that in simple linear regression only two weights, the <index>intercept</index> (b<SUB>0</SUB>) and <index>slope</index> (b<SUB>1</SUB>), were estimated, while in this case, three weights (b<SUB>0</SUB>, b<SUB>1</SUB>, and b<SUB>2</SUB>) are estimated.</P>
<P><h2>Example Data</h2></P>
<P>The data used to illustrate the inner workings of <index>multiple regression</index> are presented below:</P>
<P> 
<table cellPadding="2" cellSpacing="7" summary = "Example regression assignment data." title=" Example regression assignment data.">
<tcaption> Example Homework Assignment </tcaption> 
<TR><TD>Y<SUB>1</SUB> </TD><TD>Y<SUB>2</SUB> </TD><TD>X<SUB>1</SUB></TD><TD>X<SUB>2</SUB></TD><TD>X<SUB>3</SUB></TD><TD>X<SUB>4</SUB></TD></TR>
<TR><TD>125 </TD><TD>113 </TD><TD>13 </TD><TD>18 </TD><TD>25 </TD><TD>11 </TD></TR>
<TR><TD>158 </TD><TD>115 </TD><TD>39 </TD><TD>18 </TD><TD>59 </TD><TD>30 </TD></TR>
<TR><TD>207 </TD><TD>126 </TD><TD>52 </TD><TD>50 </TD><TD>62 </TD><TD>53 </TD></TR>
<TR><TD>182 </TD><TD>119 </TD><TD>29 </TD><TD>43 </TD><TD>50 </TD><TD>29 </TD></TR>
<TR><TD>196 </TD><TD>107 </TD><TD>50 </TD><TD>37 </TD><TD>65 </TD><TD>56 </TD></TR>
<TR><TD>175 </TD><TD>135 </TD><TD>64 </TD><TD>19 </TD><TD>79 </TD><TD>49 </TD></TR>
<TR><TD>145 </TD><TD>111 </TD><TD>11 </TD><TD>27 </TD><TD>17 </TD><TD>14 </TD></TR>
<TR><TD>144 </TD><TD>130 </TD><TD>22 </TD><TD>23 </TD><TD>31 </TD><TD>17 </TD></TR>
<TR><TD>160 </TD><TD>122 </TD><TD>30 </TD><TD>18 </TD><TD>34 </TD><TD>22 </TD></TR>
<TR><TD>175 </TD><TD>114 </TD><TD>51 </TD><TD>11 </TD><TD>58 </TD><TD>40 </TD></TR>
<TR><TD>151 </TD><TD>121 </TD><TD>27 </TD><TD>15 </TD><TD>29 </TD><TD>31 </TD></TR>
<TR><TD>161 </TD><TD>105 </TD><TD>41 </TD><TD>22 </TD><TD>53 </TD><TD>39 </TD></TR>
<TR><TD>200 </TD><TD>131 </TD><TD>51 </TD><TD>52 </TD><TD>75 </TD><TD>36 </TD></TR>
<TR><TD>173 </TD><TD>123 </TD><TD>37 </TD><TD>36 </TD><TD>44 </TD><TD>27 </TD></TR>
<TR><TD>175 </TD><TD>121 </TD><TD>23 </TD><TD>48 </TD><TD>27 </TD><TD>20 </TD></TR>
<TR><TD>162 </TD><TD>120 </TD><TD>43 </TD><TD>15 </TD><TD>65 </TD><TD>36 </TD></TR>
<TR><TD>155 </TD><TD>109 </TD><TD>38 </TD><TD>19 </TD><TD>62 </TD><TD>37 </TD></TR>
<TR><TD>230 </TD><TD>130 </TD><TD>62 </TD><TD>56 </TD><TD>75 </TD><TD>50 </TD></TR>
<TR><TD>162 </TD><TD>134 </TD><TD>28 </TD><TD>30 </TD><TD>36 </TD><TD>20 </TD></TR>
<TR><TD>153 </TD><TD>124 </TD><TD>30 </TD><TD>25 </TD><TD>41 </TD><TD>33 </TD></TR>
</table>
</P>
<DataFile type="text">
<url>mlt06.txt</url>
	<description>Example data for Multiple Regression with Two Variables</description>
</DataFile>
<DataFile type="SPSS">
	<url>mlt06.dat</url>
	<description> Example data for Multiple Regression with Two Variables </description>
</DataFile>
<P>The example data can be obtained as a text file and as an SPSS data file.</P>
<P>If a student desires a more concrete description of this data file, meaning could be given to the variables as follows:</P>
<P>Y<SUB>1</SUB> - A measure of success in graduate school.</P>
<P>X<SUB>1</SUB> - A measure of intellectual ability.</P>
<P>X<SUB>2</SUB> - A measure of "work ethic."</P>
<P>X<SUB>3</SUB> - A second measure of intellectual ability.</P>
<P>X<SUB>4</SUB> - A measure of spatial ability.</P>
<P>Y<SUB>2</SUB> - Score on a major review paper.</P>
<P><h2>Univariate Analysis</h2></P>
<definition word=" univariate analysis ">analysis of each variable individually</definition>
<P>The first step in the analysis of <index>multivariate data</index> is a table of <index>means</index> and <index>standard deviations</index>. Additional recommended analyses include <index>histograms</index> of all variables, examined for <index>outliers</index>, or scores that fall outside the range of the majority of scores. In a multiple regression analysis, these scores may have a large "<index>influence</index>" on the results of the analysis and are a cause for concern. In the case of the example data, the following means and standard deviations were computed using SPSS by clicking <SPSSCommand>Analyze/Summarize/Descriptives</SPSSCommand>.</P>
<definition word="outlier">a score that falls outside the range of the majority of scores.</definition> <P>
	<figure>
		<description>A table containing the variable name, sample size, minimum, maximum, mean, and standard deviation for all variables in the regression equation is presented.</description>
		<url>Images/mlt0639.gif</url>
		<width>451</width>
		<height>220</height>
		<align></align>
		<caption>Descriptive statistics from the SPSS regression program.</caption>
		<alt> Descriptive statistics from the SPSS regression program </alt>
	</figure>
</P> 
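<P>For readers working outside SPSS, the same descriptive statistics can be sketched in Python. The block below is an illustration only: it assumes the NumPy library, uses just the Y1, X1, and X2 columns of the example table, and the names <I>data</I> and <I>summary</I> are hypothetical. The standard deviation is computed with n - 1 in the denominator, which is the convention SPSS reports.</P>

```python
import numpy as np

# Y1, X1, and X2 columns copied from the example data table.
data = {
    "Y1": [125, 158, 207, 182, 196, 175, 145, 144, 160, 175,
           151, 161, 200, 173, 175, 162, 155, 230, 162, 153],
    "X1": [13, 39, 52, 29, 50, 64, 11, 22, 30, 51,
           27, 41, 51, 37, 23, 43, 38, 62, 28, 30],
    "X2": [18, 18, 50, 43, 37, 19, 27, 23, 18, 11,
           15, 22, 52, 36, 48, 15, 19, 56, 30, 25],
}

summary = {}
for name, values in data.items():
    scores = np.array(values, dtype=float)
    # ddof=1 gives the sample standard deviation (n - 1 denominator).
    summary[name] = (scores.min(), scores.max(),
                     scores.mean(), scores.std(ddof=1))
    low, high, mean, sd = summary[name]
    print(f"{name}: min={low:.0f} max={high:.0f} mean={mean:.2f} sd={sd:.3f}")
```

<P>A minimum or maximum that lies far from the mean relative to the standard deviation is a simple first signal of a possible outlier.</P>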
<P><h2>The Correlation Matrix</h2></P>
<definition word="correlation matrix">a table of all possible correlation coefficients between a set of variables.</definition> 
<P>The second step is an analysis of bivariate relationships between variables. This can be done using a <index>correlation matrix</index>, generated using the <SPSSCommand>Analyze/Correlate/Bivariate</SPSSCommand> commands in SPSS.</P>
<P>
	<figure>
		<description>The correlation matrix from the SPSS regression program is shown. It contains three tables, stacked one on top of the other. All tables list all variables included in the regression analysis as both rows and columns. The top table presents correlation coefficients, the middle table two tailed significance levels, and the third the sample size of each correlation, in this case all sample sizes are twenty.</description>
		<url> Images/mlt0640.gif </url>
		<width>602</width>
		<height>412</height>
		<align></align>
		<caption>Correlation matrix from the SPSS regression program.</caption>
		<alt> Correlation matrix from the SPSS regression program </alt>
	</figure>
</P>
<P>In the case of the example data, it is noted that all X variables correlate significantly with Y<SUB>1</SUB>, while none correlate significantly with Y<SUB>2</SUB>. In addition, X<SUB>1</SUB> is significantly correlated with X<SUB>3</SUB> and X<SUB>4</SUB>, but not with X<SUB>2</SUB>. Interpreting the variables using the suggested meanings, success in graduate school could be predicted individually with measures of <index>intellectual ability</index>, spatial ability, and work ethic. The measures of intellectual ability were correlated with one another. Measures of intellectual ability and work ethic were not highly correlated. The score on the review paper could not be accurately predicted with any of the other variables.</P>
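<P>As a rough cross-check on the correlation matrix, the coefficients for a subset of the variables can be computed in Python. This is a sketch that assumes the NumPy library; it covers only Y1, X1, and X2, and it does not compute the significance levels that SPSS reports. The variable names are hypothetical.</P>

```python
import numpy as np

# Y1, X1, and X2 columns copied from the example data table.
y1 = np.array([125, 158, 207, 182, 196, 175, 145, 144, 160, 175,
               151, 161, 200, 173, 175, 162, 155, 230, 162, 153], dtype=float)
x1 = np.array([13, 39, 52, 29, 50, 64, 11, 22, 30, 51,
               27, 41, 51, 37, 23, 43, 38, 62, 28, 30], dtype=float)
x2 = np.array([18, 18, 50, 43, 37, 19, 27, 23, 18, 11,
               15, 22, 52, 36, 48, 15, 19, 56, 30, 25], dtype=float)

# Each row passed to corrcoef is treated as one variable.
labels = ["Y1", "X1", "X2"]
r = np.corrcoef(np.vstack([y1, x1, x2]))

for label, row in zip(labels, r):
    print(label, " ".join(f"{value:6.3f}" for value in row))
```

<P>The matrix is symmetric with ones on the diagonal, so only the entries above or below the diagonal need to be read.</P>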
<P>The scatter plots underlying the correlation matrix can be displayed using the <SPSSCommand>Graphs/Scatter/Matrix</SPSSCommand> commands in SPSS.</P>
<P>
	<figure>
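		<description>A matrix of scatter plots created using SPSS, with one small scatter plot for each pair of variables in the correlation matrix.</description>
		<url> Images/mlt0641.gif </url>
		<width>725</width>
		<height>580</height>
		<align></align>
		<caption>Scatter plot matrix from SPSS.</caption>
		<alt>Scatter plot matrix from SPSS.</alt>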
	</figure>
</P>
<P>These graphs may be examined for multivariate outliers that might not be found in the <index>univariate view</index>.</P>
<definition word="multivariate outlier">a score that falls outside the standard range in a multivariate relationship.</definition> 
<P><index>Three-dimensional scatter plots</index> also permit a graphical representation of the same information as the multiple scatter plots. Using the <SPSSCommand>Graphs/Scatter/3-D</SPSSCommand> commands in SPSS results in the following two graphs.</P>
<P>
	<figure>
		<description>A three dimensional scatter plot created using SPSS to show the relationship between Y1, X1, and X3. It appears as a box with red dots.</description>
		<url> Images/mlt0642.gif </url>
		<width>363</width>
		<height>291</height>
		<align></align>
		<caption>Predicting Y1 from X1 and X3.</caption>
		<alt> Predicting Y1 from X1 and X3</alt>
	</figure>
</P>
<P>
	<figure>
		<description> A three dimensional scatter plot created using SPSS to show the relationship between Y1, X1, and X2. It appears as a box with red dots.</description>
		<url> Images/mlt0643.gif </url>
		<width>363</width>
		<height>291</height>
		<align></align>
		<caption> Predicting Y1 from X1 and X2.</caption>
		<alt> Predicting Y1 from X1 and X2.</alt>
	</figure>
</P>
<P>The results are less than satisfactory. In the three representations that follow, all scores have been standardized. The rotating 3D graph below presents X<SUB>1</SUB>, X<SUB>2</SUB>, and Y<SUB>1</SUB>. </P>
<P>
<APPLET code="RSWithPlaneDataOnly.class" height="200" id="X12Y" style="LEFT: 0px; TOP: 0px" width="200"></APPLET>
</P>
<P>The graph below presents X<SUB>1</SUB>, X<SUB>3</SUB>, and Y<SUB>1</SUB>. </P>
<P>
<APPLET code="RSWithPlaneDataOnly.class" height="200" id="X13Y" style="LEFT: 0px; TOP: 0px" width="200"></APPLET>
</P>
<P>The graph below presents X<SUB>1</SUB>, X<SUB>4</SUB>, and Y<SUB>2</SUB>.</P>
<P>
<APPLET code="RSWithPlaneDataOnly.class" height="200" id="X14Y" style="LEFT: 0px; TOP: 0px" width="200"></APPLET>
</P>
<P><script language="VBScript">
Sub window_onLoad()
  document.X12Y.ResetData
  document.X13Y.ResetData
  document.X14Y.ResetData
  document.X12Y.DataPoint 12, -1.61510, -.78970, -1.81300 
  document.X12Y.DataPoint 13, .13095, -.78970, -.46702 
  document.X12Y.DataPoint 14, 1.00398, 1.48692, 1.53157 
  document.X12Y.DataPoint 15, -.54060, .98891, .51188 
  document.X12Y.DataPoint 16, .86967, .56204, 1.08290 
  document.X12Y.DataPoint 17, 1.80985, -.71856, .22637 
  document.X12Y.DataPoint 18, -1.74941, -.14940, -.99725 
  document.X12Y.DataPoint 19, -1.01070, -.43398, -1.03804 
  document.X12Y.DataPoint 20, -.47345, -.78970, -.38544 
  document.X12Y.DataPoint 21, .93682, -1.28772, .22637 
  document.X12Y.DataPoint 22, -.67492, -1.00314, -.75253 
  document.X12Y.DataPoint 23, .26527, -.50513, -.34465 
  document.X12Y.DataPoint 24, .93682, 1.62921, 1.24605 
  document.X12Y.DataPoint 25, -.00336, .49090, .14480 
  document.X12Y.DataPoint 26, -.94354, 1.34463, .22637 
  document.X12Y.DataPoint 27, .39958, -1.00314, -.30387 
  document.X12Y.DataPoint 28, .06380, -.71856, -.58938 
  document.X12Y.DataPoint 29, 1.67554, 1.91379, 2.46968 
  document.X12Y.DataPoint 30, -.60776, .06403, -.30387 
  document.X12Y.DataPoint 31, -.47345, -.29169, -.67095 
  document.X13Y.DataPoint 12, -1.61510, -1.05019, -1.81300 
  document.X13Y.DataPoint 13, .13095, .56932, -.46702 
  document.X13Y.DataPoint 14, 1.00398, .71222, 1.53157 
  document.X13Y.DataPoint 15, -.54060, .14063, .51188 
  document.X13Y.DataPoint 16, .86967, .85512, 1.08290 
  document.X13Y.DataPoint 17, 1.80985, 1.52198, .22637 
  document.X13Y.DataPoint 18, -1.74941, -1.43125, -.99725 
  document.X13Y.DataPoint 19, -1.01070, -.76439, -1.03804 
  document.X13Y.DataPoint 20, -.47345, -.62149, -.38544 
  document.X13Y.DataPoint 21, .93682, .52169, .22637 
  document.X13Y.DataPoint 22, -.67492, -.85966, -.75253 
  document.X13Y.DataPoint 23, .26527, .28353, -.34465 
  document.X13Y.DataPoint 24, .93682, 1.33145, 1.24605 
  document.X13Y.DataPoint 25, -.00336, -.14517, .14480 
  document.X13Y.DataPoint 26, -.94354, -.95492, .22637 
  document.X13Y.DataPoint 27, .39958, .85512, -.30387 
  document.X13Y.DataPoint 28, .06380, .71222, -.58938 
  document.X13Y.DataPoint 29, 1.67554, 1.33145, 2.46968 
  document.X13Y.DataPoint 30, -.60776, -.52623, -.30387 
  document.X13Y.DataPoint 31, -.47345, -.28806, -.67095 
  document.X14Y.DataPoint 12, -1.61510, -1.65333, -.84102 
  document.X14Y.DataPoint 13, .13095, -.19225, -.61675 
  document.X14Y.DataPoint 14, 1.00398, 1.57643, .61675 
  document.X14Y.DataPoint 15, -.54060, -.26915, -.16820 
  document.X14Y.DataPoint 16, .86967, 1.80713, -1.51383 
  document.X14Y.DataPoint 17, 1.80985, 1.26884, 1.62597 
  document.X14Y.DataPoint 18, -1.74941, -1.42263, -1.06529 
  document.X14Y.DataPoint 19, -1.01070, -1.19194, 1.06529 
  document.X14Y.DataPoint 20, -.47345, -.80744, .16820 
  document.X14Y.DataPoint 21, .93682, .57674, -.72888 
  document.X14Y.DataPoint 22, -.67492, -.11535, .05607 
  document.X14Y.DataPoint 23, .26527, .49984, -1.73811 
  document.X14Y.DataPoint 24, .93682, .26915, 1.17743 
  document.X14Y.DataPoint 25, -.00336, -.42295, .28034 
  document.X14Y.DataPoint 26, -.94354, -.96124, .05607 
  document.X14Y.DataPoint 27, .39958, .26915, -.05607 
  document.X14Y.DataPoint 28, .06380, .34605, -1.28956 
  document.X14Y.DataPoint 29, 1.67554, 1.34573, 1.06529 
  document.X14Y.DataPoint 30, -.60776, -.96124, 1.51383 
  document.X14Y.DataPoint 31, -.47345, .03845, .39248 
end sub 
</script></P>
</section>
<section>
<P><h2>The Regression Weights</h2></P>
<P>The formulas to compute the <index>regression weights</index> with two independent variables are available from various sources (<ref>Pedhazur, 1997</ref>).  They are messy and do not provide a great deal of insight into the mathematical "meanings" of the terms.  For that reason, computational procedures will be done entirely with a statistical package.</P>
		<ReferenceSource type="book">
			<title></title>
			<ISBN></ISBN>
			<edition></edition>
			<price type="hard/soft/web"></price>
			<refs> Pedhazur, 1997</refs>
			<URL></URL>
			<pubdate type="year/other"></pubdate>
			<author vCard=""><first></first><last>Pedhazur</last></author>
			<location></location>
			<publisher vCard=""></publisher>
			<booknote date=""></booknote>
			<quote page=""></quote>
		</ReferenceSource> 
<P>The multiple regression is done in SPSS by selecting <SPSSCommand>Analyze/Regression/Linear</SPSSCommand>. The interface should appear as follows:</P>
<P>
	<figure>
		<description>The SPSS user interface for the Regression program is shown. The variables X1 and X2 have been clicked over to the independent box and the variable Y1 has been clicked over to the dependent box. The default method of Enter appears in the Method text box.</description>
		<url> Images/mlt0644.gif</url>
		<width>394</width>
		<height>359</height>
		<align></align>
		<caption>SPSS user interface for Regression.</caption>
		<alt> SPSS user interface for Regression.</alt>
	</figure>
</P>
<P>In the first analysis, Y<SUB>1</SUB> is the dependent variable and two independent variables are entered in the first block, X<SUB>1</SUB> and X<SUB>2</SUB>. In addition, under the "Save..." option, both unstandardized predicted values and unstandardized residuals were selected.</P>
<definition word="regression coefficients">the weights in a linear model that optimally predict a dependent variable.</definition> 
<P>The output consists of a number of tables. The "Coefficients" table presents the optimal weights in the regression model, as seen in the following.</P>
<P>
	<figure>
		<description>The coefficients table of SPSS Regression program is shown. It is similar to the coefficients tables described in earlier chapters, except it has three rows rather than two to describe the terms in the model. In this case there are rows for a constant term, X1, and X2. For all rows, columns provide unstandardized coefficients, standardized coefficients, t values, and significance levels.</description>
		<url>Images/mlt0645.gif </url>
		<width>527</width>
		<height>202</height>
		<align></align>
		<caption>Coefficients table of SPSS Regression program.</caption>
		<alt> Coefficients table of SPSS Regression program.</alt>
	</figure>
</P>
<P>Recalling the prediction equation, Y'<SUB>i</SUB> = b<SUB>0</SUB> + b<SUB>1</SUB>X<SUB>1i</SUB> + b<SUB>2</SUB>X<SUB>2i</SUB>, the values for the weights can now be found by observing the "B" column under "<index>Unstandardized Coefficients</index>." They are b<SUB>0</SUB> = 101.222, b<SUB>1</SUB> = 1.000, and b<SUB>2</SUB> = 1.071, and the regression equation appears as:</P>
<P>Y'<SUB>i</SUB> = 101.222 + 1.000X<SUB>1i</SUB> + 1.071X<SUB>2i</SUB></P>
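<P>The weights reported by SPSS can be checked with any least-squares routine. The following is a minimal sketch, assuming the NumPy library and its <I>numpy.linalg.lstsq</I> function rather than the SPSS procedure itself; the variable names are hypothetical. It builds a design matrix with a column of ones for the constant term and solves for the three weights that minimize the sum of squared deviations.</P>

```python
import numpy as np

# Y1, X1, and X2 columns copied from the example data table.
y1 = np.array([125, 158, 207, 182, 196, 175, 145, 144, 160, 175,
               151, 161, 200, 173, 175, 162, 155, 230, 162, 153], dtype=float)
x1 = np.array([13, 39, 52, 29, 50, 64, 11, 22, 30, 51,
               27, 41, 51, 37, 23, 43, 38, 62, 28, 30], dtype=float)
x2 = np.array([18, 18, 50, 43, 37, 19, 27, 23, 18, 11,
               15, 22, 52, 36, 48, 15, 19, 56, 30, 25], dtype=float)

# Design matrix: a column of ones for b0, then X1 and X2.
design = np.column_stack([np.ones_like(x1), x1, x2])

# lstsq returns the weights that minimize sum((Y - Y')**2).
weights, _, _, _ = np.linalg.lstsq(design, y1, rcond=None)
b0, b1, b2 = weights
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

<P>Rounded to three decimal places, the printed weights should agree with the "B" column of the SPSS "Coefficients" table.</P>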
		<TestItem type="MC">
			<question>The standardized regression weights</question>
			<answer type="correct">will always have a constant term of zero.</answer>
			<answer>will equal the correlation coefficients between the independent and dependent variables.</answer>
			<answer>will be the same as the unstandardized coefficients if multicollinearity is assumed.</answer>
			<answer>are difficult to interpret because they are a function of the variability of the independent measure.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem> 
<P>The "<index>Beta</index>" column under "<index>Standardized Coefficients</index>" gives similar information, except all values of X and Y have been standardized (set to mean of zero and standard deviation of one) before the weights are computed. In this case the value of b<SUB>0</SUB> is always 0 and not included in the regression equation. The equation and weights for the example data appear below.</P>
<P>Z<SUB>Y</SUB> = b<SUB>1</SUB>Z<SUB>X1</SUB> + b<SUB>2</SUB>Z<SUB>X2</SUB></P>
<P>Z<SUB>Y</SUB> = .608 Z<SUB>X1</SUB> + .614 Z<SUB>X2</SUB></P>
<P>The standardization of all variables allows a better comparison of regression weights, as the unstandardized weights are a function of the variance of both the Y and the X variables.</P>
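<P>The standardized weights can be illustrated by converting every variable to z-scores before fitting. The sketch below assumes the NumPy library; <I>zscore</I> is a hypothetical helper, not an SPSS function. Because every standardized variable has a mean of zero, no constant term is included in the model.</P>

```python
import numpy as np

# Y1, X1, and X2 columns copied from the example data table.
y1 = np.array([125, 158, 207, 182, 196, 175, 145, 144, 160, 175,
               151, 161, 200, 173, 175, 162, 155, 230, 162, 153], dtype=float)
x1 = np.array([13, 39, 52, 29, 50, 64, 11, 22, 30, 51,
               27, 41, 51, 37, 23, 43, 38, 62, 28, 30], dtype=float)
x2 = np.array([18, 18, 50, 43, 37, 19, 27, 23, 18, 11,
               15, 22, 52, 36, 48, 15, 19, 56, 30, 25], dtype=float)


def zscore(values):
    """Standardize to a mean of zero and a standard deviation of one."""
    return (values - values.mean()) / values.std(ddof=1)


# No column of ones: the constant term is zero for standardized data.
design = np.column_stack([zscore(x1), zscore(x2)])
betas, _, _, _ = np.linalg.lstsq(design, zscore(y1), rcond=None)
print(f"beta1 = {betas[0]:.3f}, beta2 = {betas[1]:.3f}")
```

<P>The same values result from multiplying each unstandardized weight by the ratio of the standard deviation of its X variable to the standard deviation of Y, which shows directly how the unstandardized weights depend on the variability of the measures.</P>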
	<TestItem type="MC">
		<question>The equation Y' = b<sub>0</sub> + b<sub>1</sub>X<sub>1</sub> + b<sub>2</sub>X<sub>2</sub> describes </question>
		<answer type="incorrect">a straight line in three-dimensional space</answer>
		<answer type="incorrect">a curved line in three-dimensional space</answer>
		<answer type="incorrect">a bell curve</answer>
		<answer type="correct">a plane in three-dimensional space</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
</section>
<section>
<P><h2>Predicted and Residual Values</h2></P>
<definition word="residual">the difference between an observed and predicted value.</definition> 
<definition word="squared residual">the squared difference between an observed and predicted value.</definition> 
<P>The values of Y<SUB>1i</SUB> can now be predicted using the following linear transformation.</P>
<P>Y'<SUB>1i</SUB> = 101.222 + 1.000X<SUB>1i</SUB> + 1.071X<SUB>2i</SUB> </P>
<P>Thus, the value of Y<SUB>1i</SUB> where X<SUB>1i</SUB> = 13 and X<SUB>2i</SUB> = 18 for the first student could be predicted as follows.</P>
<P>Y'<SUB>11</SUB> = 101.222 + 1.000X<SUB>11</SUB> + 1.071X<SUB>21</SUB> </P>
<P>Y'<SUB>11</SUB> = 101.222 + 1.000 * 13 + 1.071 * 18 </P>
<P>Y'<SUB>11</SUB> = 101.222 + 13.000 + 19.278 </P>
<P>Y'<SUB>11</SUB> = 133.50</P>
<P>The scores for all students are presented below, as computed in the data file of SPSS. Note that the predicted Y score for the first student is 133.50.  The predicted Y and residual values are automatically added to the data file when the <index>unstandardized predicted values</index> and unstandardized <index>residuals</index> are selected using the "Save" option.</P>
<P>
	<figure>
		<description>A data table report is shown resulting from the use of the save options in the SPSS Regression program. The table contains twenty rows, each with a value for X1, X2, Y1, Predicted Y, Y1 minus predicted Y or the unstandardized residuals, and the squared residuals. The first row for subject one has values of 13, 18, 125, 133.50, -8.50, and 72.30.</description>
		<url> Images/mlt0646.gif </url>
		<width>409</width>
		<height>481</height>
		<align></align>
		<caption>Viewing the saved results of the SPSS Regression program.</caption>
		<alt> Viewing the saved results of the SPSS Regression program.</alt>
	</figure>
</P> 
	<TestItem type="MC">
		<question>A large residual for a given individual means </question>
		<answer type="incorrect">lack of a good fit for that individual</answer>
		<answer type="incorrect">possible greater influence of that individual on the regression</answer>
		<answer type="incorrect">the predicted and observed values of Y are different</answer>
		<answer type="correct">all of the answers are correct</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
	<TestItem type="MC">
		<question>In a regression equation predicting points in a graduate statistics course, unstandardized regression weights were -10.37, 1.33, and .78 for the constant term, a measure of intellectual ability, and a measure of motivation.  What would be the predicted number of points for a student with a score of 123 on the measure of intellectual ability and a score of 154 on the measure of motivation? </question>
		<answer type="incorrect">294.08</answer>
		<answer type="incorrect">283.71</answer>
		<answer type="correct">273.34</answer>
		<answer type="incorrect">-8.26</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
	<TestItem type="MC">
		<question>In a regression equation predicting points in a graduate statistics course, unstandardized regression weights were -10.37, 1.33, and .78 for the constant term, a measure of intellectual ability, and a measure of motivation. For a student with a score of 123 on the measure of intellectual ability, a score of 154 on the measure of motivation, and an observed number of points of 298, the residual would be </question>
		<answer type="incorrect">-10.37</answer>
		<answer type="incorrect">-14.34</answer>
		<answer type="incorrect">73.34</answer>
		<answer type="correct">24.66</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
	<TestItem type="MC">
		<question>Relatively small values for residuals in a multiple regression equation can be interpreted as </question>
		<answer type="incorrect">lack of a good fit for that score</answer>
		<answer type="incorrect">violation of the assumption of normality</answer>
		<answer type="incorrect">multicollinearity</answer>
		<answer type="correct">small error in prediction</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
<P>The difference between the observed and predicted score, Y-Y', is called a <index>residual</index>. This column has been computed, as has the column of <index>squared residuals</index>.  The squared residuals (Y-Y')<SUP>2</SUP> may be computed in SPSS by squaring the residuals using the <SPSSCommand>Transform/Compute</SPSSCommand> commands.</P>
<P>The analysis of residuals can be informative. The larger the residual for a given observation, the larger the difference between the observed and predicted value of Y and the greater the error in prediction. In the example data, the regression under-predicted the Y value for observation 10 by a value of 10.98, and over-predicted the value of Y for observation 6 by a value of 10.60. In some cases the analysis of errors of prediction in a given model can direct the search for additional independent variables that might prove valuable in more complete models.</P>
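<P>The predicted values, residuals, and squared residuals can also be sketched in Python. The block below assumes the NumPy library and uses the weights as rounded in this chapter (101.222, 1.000, and 1.071), so its results may differ in the second decimal place from the SPSS columns, which are based on unrounded weights. The variable names are hypothetical.</P>

```python
import numpy as np

# Y1, X1, and X2 columns copied from the example data table.
y1 = np.array([125, 158, 207, 182, 196, 175, 145, 144, 160, 175,
               151, 161, 200, 173, 175, 162, 155, 230, 162, 153], dtype=float)
x1 = np.array([13, 39, 52, 29, 50, 64, 11, 22, 30, 51,
               27, 41, 51, 37, 23, 43, 38, 62, 28, 30], dtype=float)
x2 = np.array([18, 18, 50, 43, 37, 19, 27, 23, 18, 11,
               15, 22, 52, 36, 48, 15, 19, 56, 30, 25], dtype=float)

# Rounded weights taken from the regression equation in this chapter.
b0, b1, b2 = 101.222, 1.000, 1.071

predicted = b0 + b1 * x1 + b2 * x2      # Y'
residuals = y1 - predicted              # Y - Y'
squared_residuals = residuals ** 2      # (Y - Y')**2

# Observation numbers start at 1; array positions start at 0.
most_under = int(np.argmax(residuals)) + 1
most_over = int(np.argmin(residuals)) + 1
print(f"first predicted value: {predicted[0]:.2f}")
print(f"largest under-prediction: observation {most_under}")
print(f"largest over-prediction: observation {most_over}")
```

<P>A positive residual means the model under-predicted the observed score, and a negative residual means it over-predicted.</P>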
	<TestItem type="MC">
		<question>Assumptions made when using analysis of variance in multiple regression include </question>
		<answer type="incorrect">multicollinearity</answer>
		<answer type="incorrect">small error in prediction</answer>
		<answer type="incorrect">equality of regression weights</answer>
		<answer type="correct">normally distributed residuals</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
<P>The residuals are assumed to be normally distributed when hypotheses are tested using analysis of variance (R<SUP>2</SUP> change). Although analysis of variance is fairly robust with respect to this assumption, it is a good idea to examine the distribution of residuals, especially with respect to <index>outliers</index>. The distribution of residuals for the example data is presented below.</P>
<P>
	<figure>
		<description>A histogram of standardized residuals is shown with a normal curve overlaid on top of the histogram. The x-axis is labeled with numbers from -1.5 to +1.5. The distribution appears reasonably normal with an unexpected peak at the value 1.0.</description>
		<url> Images/mlt0652.gif </url>
		<width>290</width>
		<height>239</height>
		<align></align>
		<caption>Histogram of standardized residuals.</caption>
		<alt> Histogram of standardized residuals.</alt>
	</figure>
</P> 
</section>
<section>
<P><h2>The Multiple Correlation Coefficient</h2></P>
	<TestItem type="MC">
		<question>The correlation coefficient between X and Y (r<sub>xy</sub>) will be different than the multiple correlation coefficient (R<sub>yx</sub>) when </question>
		<answer type="incorrect">the value of b<sub>0</sub> is greater than 100</answer>
		<answer type="correct">the value of b<sub>1</sub> is less than zero</answer>
		<answer type="incorrect">the value of r<sub>xy</sub> is greater than .5</answer>
		<answer type="incorrect">they will never differ.</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
	<TestItem type="MC">
		<question>The value of the multiple correlation coefficient R, is </question>
		<answer type="incorrect">the proportion of variance in Y accounted for by all the X values.</answer>
		<answer type="correct">the correlation between the predicted and observed values of Y.</answer>
		<answer type="incorrect">called the coefficient of determination.</answer>
		<answer type="incorrect">a value between -1 and 1.</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
<definition word="multiple correlation coefficient">the correlation coefficient between the observed and predicted dependent variables.</definition> 
<P>The <index>multiple correlation coefficient</index>, R, is the correlation coefficient between the observed values of Y and the predicted values of Y. For this reason, the value of R will never be negative and will take on a value between zero and one. The direction of the multivariate relationship between the independent and dependent variables can be observed in the sign, positive or negative, of the regression weights. The interpretation of R is similar to the interpretation of the <index>correlation coefficient</index>: the closer the value of R is to one, the greater the linear relationship between the independent variables and the dependent variable.</P>
		<TestItem type="MC">
			<question>A new statistician found a multiple correlation coefficient of .54 when the previous statistician found a multiple correlation coefficient of .32 on the same data but with different variables. You should</question>
			<answer type="correct">give the new statistician a bonus.</answer>
			<answer>fire the new statistician.</answer>
			<answer>fire both statisticians.</answer>
			<answer>only do qualitative research.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem> 
		<TestItem type="MC">
			<question>A new statistician found a multiple correlation coefficient of -.54 when the previous statistician found a multiple correlation coefficient of .32 on the same data but with different variables. You should</question>
			<answer>give the new statistician a bonus.</answer>
			<answer type="correct">fire the new statistician.</answer>
			<answer>fire both statisticians.</answer>
			<answer>only do qualitative research.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem> 
<P>The value of R can be found in the "Model Summary" table of the SPSS output. In the case of the example data, the value for the multiple R when predicting Y<SUB>1</SUB> from X<SUB>1</SUB> and X<SUB>2</SUB> is .968, a very high value.</P>
<P>
	<figure>
		<description>The model summary table of SPSS Regression output is shown. The table contains columns for model, R, R squared, adjusted R squared, and standard error of estimate. In the example, the corresponding values are model 1, .968, .936, .929, and 6.54.</description>
		<url> Images/mlt0647.gif </url>
		<width>379</width>
		<height>154</height>
		<align></align>
		<caption>Model summary table of SPSS Regression output.</caption>
		<alt> Model summary table of SPSS Regression output.</alt>
	</figure>
</P>
<definition word="coefficient of determination">the multiple correlation coefficient squared.</definition> 
<P>The <index>multiple correlation coefficient squared</index> (R<SUP>2</SUP>) is also called the <I><index>coefficient of determination</index></I>. It may be found in the <index>SPSS</index> output alongside the value for R. The interpretation of R<SUP>2</SUP> is similar to the interpretation of r<SUP>2</SUP>, namely the proportion of variance in Y that may be predicted by knowing the value of the X variables. The value for R squared will be less than the value for R whenever R falls strictly between zero and one. In general the value of multiple R is to be preferred over R squared as a measure of relationship because R squared is measured in units of measurement squared while R is in terms of units of measurement.</P>
		<TestItem type="MC">
			<question>The coefficient of determination is</question>
			<answer type="correct">the multiple correlation coefficient squared.</answer>
			<answer>the proportion of variance in Y that cannot be predicted by knowing the values for the independent measures.</answer>
			<answer>the absolute value of the standard error of estimate.</answer>
			<answer>a measure of multicollinearity.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>Which of the following will be the largest?</question>
			<answer type="correct">R</answer>
			<answer>R squared</answer>
			<answer>Adjusted R squared</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>Which of the following will be the smallest?</question>
			<answer> R</answer>
			<answer>R squared</answer>
			<answer type="correct">Adjusted R squared</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
<P>The adjustment in the "<index>Adjusted R Square</index>" value in the output tables is a correction for the number of X variables included in the prediction model. In general, the smaller the N and the larger the number of variables, the greater the adjustment. In the example data, the results could be reported as "92.9% of the variance in the measure of success in graduate school can be predicted by measures of intellectual ability and work ethic."</P>
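<P>The values in the model summary can be reproduced, as a rough check, from the residual and total sums of squares that SPSS reports for the example data in the ANOVA table discussed later in this chapter. The adjustment formula used below, 1 - (1 - R<SUP>2</SUP>)(N - 1)/(N - p - 1) with p predictors, is the commonly used correction and is assumed here rather than taken from the text. The function names are illustrative.</P>

```python
from math import sqrt

def r_squared(ss_residual, ss_total):
    """Proportion of variance in Y predicted by the model."""
    return 1 - ss_residual / ss_total

def adjusted_r_squared(r2, n, predictors):
    """Assumed standard adjustment for the number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - predictors - 1)

# Values reported for the example data (N = 20, predictors X1 and X2).
ss_residual = 727.29
ss_total = 11420.95
n = 20
predictors = 2

r2 = r_squared(ss_residual, ss_total)
multiple_r = sqrt(r2)
adj_r2 = adjusted_r_squared(r2, n, predictors)
print(round(multiple_r, 3), round(r2, 3), round(adj_r2, 3))
```

<P>With a small N and several predictors the factor (N - 1)/(N - p - 1) grows, which is why the adjustment is larger in that situation.</P>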
	<TestItem type="MC">
		<question>In multiple regression, the unadjusted R<sup>2</sup> value will </question>
		<answer type="incorrect">be negative if there is an inverse relationship</answer>
		<answer type="correct">be larger for a full model than a partial model</answer>
		<answer type="incorrect">be positively related to the standard error of estimate</answer>
		<answer type="incorrect">indicate multicollinearity if the hyperplane extends the hyperspace</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
	<TestItem type="MC">
		<question>The "adjustment" in the "Adjusted R Square" is for </question>
		<answer type="correct">the number of variables in the regression equation</answer>
		<answer type="incorrect">the lack of fit of the standard error of estimate</answer>
		<answer type="incorrect">missing data in the predictor variables</answer>
		<answer type="incorrect">repressed memories of childhood trauma</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
</section>
<section>
<P><h2>The Standard Error of Estimate</h2></P>
<definition word="standard error of estimate">a measure of error in prediction.</definition> 
<P>The <index>standard error of estimate</index> is a measure of error of prediction. The definitional formula for the standard error of estimate is an extension of the definitional formula in simple linear regression and is presented below.</P>
<P>
	<figure>
		<description>The definitional formula for the standard error of estimate is shown. It is symbolized by s sub y dot x1, x2, ..., xk and equal to the square root of the sum of the squared residuals divided by N minus K.</description>
		<url> Images/mlt0649.gif </url>
		<width>190</width>
		<height>65</height>
		<align></align>
		<caption>The definitional formula for the standard error of estimate.</caption>
		<alt> The definitional formula for the standard error of estimate.</alt>
	</figure>
</P> 
		<TestItem type="MC">
			<question>The denominator in the definitional formula for the standard error of estimate in linear regression is</question>
			<answer type="correct">the number of scores minus the number of terms in the model.</answer>
			<answer>the number of independent variables in the model.</answer>
			<answer>the multiple correlation coefficient (R) squared.</answer>
			<answer>the number of scores times the number of independent variables.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>The larger the value of the unadjusted R squared the</question>
			<answer type="correct">smaller the standard error of estimate.</answer>
			<answer>smaller the value of the adjusted R squared.</answer>
			<answer>smaller the standardized regression coefficients.</answer>
			<answer>weaker the relationship between the independent variables.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
<P>The difference between this formula and the formula presented in an earlier chapter is in the denominator of the equation. In both cases the denominator is N - k, where N is the number of observations and k is the number of parameters which are estimated to find the predicted value of Y. In the case of simple linear regression, the number of parameters needed to be estimated was two, the intercept and the slope, while in the case of the example with two independent variables, the number was three, b<SUB>0</SUB>, b<SUB>1</SUB>, and b<SUB>2</SUB>.</P>
<definition word="degrees of freedom">the number of values that are free to vary.</definition> 
<P>The computation of the standard error of estimate using the definitional formula for the example data is presented below.  The numerator, or <index>sum of squared residuals</index>, is found by summing the (Y-Y')<SUP>2</SUP> column.</P>
<P>
	<figure>
		<description> The definitional formula for the standard error of estimate is shown. It is symbolized by s sub y dot x1, x2, ..., xk and equal to the square root of the sum of the squared residuals divided by N minus K.</description>
		<url> Images/mlt0650.gif</url>
		<width>204</width>
		<height>65</height>
		<align></align>
		<caption></caption>
		<alt>The definitional formula for the standard error of estimate.</alt>
	</figure>
</P> 
<P>
	<figure>
		<description>A value of 727.26 is substituted for the sum of squared residuals and 17 for the number of scores minus the number of terms in the model (20-3). The calculation results in a value of 6.54.</description>
		<url> Images/mlt0651.gif </url>
		<width>364</width>
		<height>38</height>
		<align></align>
		<caption>Computing the standard error of estimate using the definitional formula.</caption>
		<alt> Computing the standard error of estimate using the definitional formula.</alt>
	</figure>
</P> 
		<TestItem type="MC">
			<question>If the sum of squared residuals was 133.48 in a linear regression model for twenty-five scores, two independent variables, and a constant term, the standard error of estimate would be</question>
			<answer type="correct">2.46</answer>
			<answer>6.07</answer>
			<answer>5.38</answer>
			<answer>11.55</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
<P>Note that the value for the standard error of estimate agrees with the value given in the output table of SPSS.</P>
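<P>The same computation can be sketched in a few lines of Python. The function name is illustrative; the inputs are the sum of squared residuals, the number of scores, and the number of estimated parameters given above.</P>

```python
from math import sqrt

def standard_error_of_estimate(ss_residual, n, parameters):
    """Square root of the sum of squared residuals divided by N - k."""
    return sqrt(ss_residual / (n - parameters))

# Example data: 20 scores and three parameters (b0, b1, b2).
s_est = standard_error_of_estimate(727.26, 20, 3)
print(round(s_est, 2))
```

<P>Passing 2 as the number of parameters gives the simple linear regression form of the formula.</P>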
</section>
<section>
<P><h2>The ANOVA Table</h2></P>
	<TestItem type="MC">
		<question>In the ANOVA table produced when doing a multiple regression with SPSS the Sum of Squares due to Regression is </question>
		<answer type="incorrect">the numerator for the standard error of estimate</answer>
		<answer type="incorrect">the sum of the squared differences between the predicted and observed value for Y</answer>
		<answer type="incorrect">both of the answers</answer>
		<answer type="correct">neither of the answers</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
<P>The <index>ANOVA</index> table output when both X<SUB>1</SUB> and X<SUB>2</SUB> are entered in the first block when predicting Y<SUB>1</SUB> appears as follows.</P>
<P>
	<figure>
		<description> The ANOVA table from the SPSS Regression program is presented. The sum of squares residual is equal to the sum of squared residuals found earlier. The value of the F statistic is 124.979 with a significance level of .000.</description>
		<url> Images/mlt0653.gif </url>
		<width>535</width>
		<height>175</height>
		<align></align>
		<caption>The ANOVA table from the SPSS Regression program.</caption>
		<alt> The ANOVA table from the SPSS Regression program.</alt>
	</figure>
</P> 
<P>Because the <index>exact significance level</index> is less than alpha, in this case assumed to be .05, the model with variables X<SUB>1</SUB> and X<SUB>2</SUB> significantly predicted Y<SUB>1</SUB>. As described in the chapter on testing hypotheses using regression, the <index>Sum of Squares</index> for the <index>residual</index>, 727.29, is the sum of the squared residuals (see the standard error of estimate above). The <index>mean square residual</index>, 42.78, is the squared standard error of estimate. The total sum of squares, 11420.95, is the sum of the squared differences between the observed values of Y and the mean of Y. The regression sum of squares, 10693.66, is the sum of squared differences between the predictions of the minimal model, Y'<SUB>i</SUB> = b<SUB>0</SUB>, and the predictions of the full model, Y'<SUB>i</SUB> = b<SUB>0</SUB> + b<SUB>1</SUB>X<SUB>1i</SUB> + b<SUB>2</SUB>X<SUB>2i</SUB>. The <index>regression sum of squares</index> is also the difference between the total sum of squares and the residual sum of squares, 11420.95 - 727.29 = 10693.66. The regression mean square, 5346.83, is computed by dividing the regression sum of squares by its <index>degrees of freedom</index>. In this case the regression mean square is based on two degrees of freedom because two additional parameters, b<SUB>1</SUB> and b<SUB>2</SUB>, were computed.</P>
<P>The following table illustrates the computation of the various sum of squares in the example data.</P>
<P>
	<figure>
		<description> A data table report is shown resulting from the use of the save options in the SPSS Regression program. The table contains twenty rows, each with a value for Y, Y minus Y bar, Y minus Y bar squared, Predicted Y, Y1 minus predicted Y or the unstandardized residuals, the squared residuals, predicted Y minus Y bar, and predicted Y minus Y bar squared. The sum of Y minus Y bar squared is circled and labeled as sum of squares total. The sum of the squared residuals is also circled and labeled as sum of squares residual. In addition, the sum of the squared predicted values minus the mean of Y, Y bar, is circled and labeled as the sum of squares regression. The values of the three sum of squares columns are 11420.95, 1024.74, and 10693.66.</description>
		<url> Images/mlt0656.gif </url>
		<width>356</width>
		<height>293</height>
		<align></align>
		<caption>Computing the Sums of Squares from a data table.</caption>
		<alt> Computing the Sums of Squares from a data table.</alt>
	</figure>
</P> 
<P>Note that this table is identical in principle to the table presented in the chapter on testing hypotheses in regression.</P>
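<P>The arithmetic that links the entries of the ANOVA table can be sketched as follows. The sums of squares, N, and number of parameters are the values reported above; the variable names are illustrative.</P>

```python
# Values reported in the ANOVA table for the example data.
ss_total = 11420.95
ss_residual = 727.29
n = 20          # number of scores
parameters = 3  # b0, b1, and b2

# Regression sum of squares as total minus residual.
ss_regression = ss_total - ss_residual

# Degrees of freedom: parameters added beyond b0, and N minus all parameters.
df_regression = parameters - 1
df_residual = n - parameters

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

print(round(ss_regression, 2), round(ms_regression, 2),
      round(ms_residual, 2), round(f_ratio, 2))
```

<P>The F ratio obtained this way agrees, within rounding, with the value of 124.979 in the SPSS output.</P>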
</section>
<section>
<P><h2>Changes in the Regression Weights</h2></P>
<P>When more terms are added to the regression model, the <index>regression weights</index> change as a function of the relationships between both the independent variables and the dependent variable. This can be illustrated using the example data.</P>
		<TestItem type="MC">
			<question>When adding additional terms to a multiple regression model</question>
			<answer type="correct">the regression weights in the original model will change in value.</answer>
			<answer>only the regression weights in the original model will not change in value.</answer>
			<answer>the regression weights in the original model will change only if the R squared change is statistically significant.</answer>
			<answer>the regression weights in the original model will not change only if the R squared change is greater than zero.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
<P>A minimal model, predicting Y<SUB>1</SUB> from the mean of Y<SUB>1</SUB>, results in the following.</P>
<P>Y'<SUB>i</SUB> = b<SUB>0</SUB></P>
<P>Y'<SUB>i</SUB> = 169.45</P>
<P>A partial model, predicting Y<SUB>1</SUB> from X<SUB>1</SUB>, results in the following model.</P>
<P>Y'<SUB>i</SUB> = b<SUB>0</SUB> + b<SUB>1</SUB>X<SUB>1i</SUB></P>
<P>Y'<SUB>i</SUB> = 122.835 + 1.258 X<SUB>1i</SUB></P>
<P>A second partial model, predicting Y<SUB>1</SUB> from X<SUB>2</SUB>, is the following.</P>
<P>Y'<SUB>i</SUB> = b<SUB>0</SUB> + b<SUB>2</SUB>X<SUB>2i</SUB></P>
<P>Y'<SUB>i</SUB> = 130.425 + 1.341 X<SUB>2i</SUB></P>
<P>As established earlier, the full regression model when predicting Y<SUB>1</SUB> from X<SUB>1</SUB> and X<SUB>2</SUB> is </P>
<P>Y'<SUB>i</SUB> = b<SUB>0</SUB> + b<SUB>1</SUB>X<SUB>1i</SUB> + b<SUB>2</SUB>X<SUB>2i</SUB></P>
<P>Y'<SUB>i</SUB> = 101.222 + 1.000X<SUB>1i</SUB> + 1.071X<SUB>2i</SUB></P>
<P>As can be observed, the values of both b<SUB>1</SUB> and b<SUB>2</SUB> change when both X<SUB>1</SUB> and X<SUB>2</SUB> are included in the regression model. The size and effect of these changes are the foundation for the <index>significance testing of sequential models</index> in regression.</P>
<definition word="R square change">the difference in the unadjusted multiple correlation coefficient squared between a partial model and a full model.</definition> 
<P><h2>R<SUP>2</SUP> Change</h2></P>
<P>The unadjusted R<SUP>2</SUP> value will increase with the addition of terms to the regression model. The amount of change in R<SUP>2</SUP> is a measure of the increase in predictive power of the independent variable or variables, given the independent variable or variables already in the model. For example, the effect of work ethic (X<SUB>2</SUB>) on success in graduate school (Y<SUB>1</SUB>) could be assessed given one already has a measure of intellectual ability (X<SUB>1</SUB>). The following table presents the results for the example data.</P>
		<TestItem type="MC">
			<question>R<SUP>2</SUP> change</question>
			<answer type="correct">measures the gain in predictive power.</answer>
			<answer>will be negative when the correlation between the dependent and independent variables is negative.</answer>
			<answer>will be larger than the adjusted R<SUP>2</SUP>.</answer>
			<answer>cannot be tested for significance.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem><P><table cellPadding="2" cellSpacing="5" summary = "" title=""><tcaption>R<SUP>2</SUP> and R<SUP>2</SUP> change in successive models.</tcaption>
<TR> <TH scope = "col" colspan = "1" abbr = "variables" >Variables in Equation</TH> <TH scope = "col" colspan = "1" abbr = "R squared" >R<SUP>2</SUP></TH> <TH scope = "col" colspan = "1" abbr = "R squared change" >Increase in R<SUP>2</SUP></TH></TR>
<TR><TD>None</TD><TD>0.00</TD><TD>-</TD></TR>
<TR><TD>X<SUB>1</SUB></TD><TD>.584</TD><TD>.584</TD></TR>
<TR><TD>X<SUB>1</SUB>, X<SUB>2</SUB></TD><TD>.936</TD><TD>.352</TD></TR>
</table></P>
<P>A similar table can be constructed to evaluate the increase in predictive power of X<SUB>3</SUB> given X<SUB>1</SUB> is already in the model.</P>
<table cellPadding="5" cellSpacing="2" summary = "" title=""><tcaption></tcaption>
<TR><TD>Variables in Equation</TD><TD>R<SUP>2</SUP></TD><TD>Increase in R<SUP>2</SUP></TD></TR>
<TR><TD>None</TD><TD>0.00</TD><TD>-</TD></TR>
<TR><TD>X<SUB>1</SUB></TD><TD>.584</TD><TD>.584</TD></TR>
<TR><TD>X<SUB>1</SUB>, X<SUB>3</SUB></TD><TD>.592</TD><TD>.008</TD></TR>
</table>
<P>As can be seen, although both X<SUB>2</SUB> and X<SUB>3</SUB> individually correlate significantly with Y<SUB>1</SUB>, X<SUB>2</SUB> contributes a fairly large increase in predictive power in combination with X<SUB>1</SUB>, while X<SUB>3</SUB> does not. Because X<SUB>1</SUB> and X<SUB>3</SUB> are highly correlated with each other, knowledge of one necessarily implies knowledge of the other. In regression analysis terms, X<SUB>2</SUB> in combination with X<SUB>1</SUB> predicts <index>unique variance</index> in Y<SUB>1</SUB>, while X<SUB>3</SUB> in combination with X<SUB>1</SUB> predicts <index>shared variance</index>.</P>
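<P>The "Increase in R<SUP>2</SUP>" column in the two tables above is simply the difference between successive R<SUP>2</SUP> values. A minimal Python sketch, with an illustrative function name, reproduces both columns from the R<SUP>2</SUP> values in the tables.</P>

```python
def r_squared_changes(r_squared_by_step):
    """Increase in R squared at each step, starting from a model with no predictors."""
    changes = []
    previous = 0.0
    for r2 in r_squared_by_step:
        changes.append(round(r2 - previous, 3))
        previous = r2
    return changes

# R squared after entering X1, then after adding a second variable.
x1_then_x2 = r_squared_changes([0.584, 0.936])
x1_then_x3 = r_squared_changes([0.584, 0.592])
print(x1_then_x2, x1_then_x3)
```

<P>The second entry of each list is the gain attributable to the variable entered last, given that X<SUB>1</SUB> is already in the model.</P>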
<P>It is possible to do <index>significance testing</index> to determine whether the addition of another independent variable to the regression model significantly increases the value of R<SUP>2</SUP>. This significance test is the topic of the next section.</P>
</section>
<section>
<P><h2>Sequential Significance Testing</h2></P>
<P>In order to test whether a variable adds significant predictive power to a regression model, it is necessary to construct the regression model in stages or blocks. This is accomplished in <index>SPSS</index> by entering the independent variables in different <index>blocks</index>. For example, if the increase in predictive power of X<SUB>2</SUB> after X<SUB>1</SUB> has been entered in the model was desired, then X<SUB>1</SUB> would be entered in the first block and X<SUB>2</SUB> in the second block. The following demonstrates how to construct these sequential models. The figure below illustrates how X<SUB>1</SUB> is entered in the model first.</P>
		<TestItem type="MC">
			<question>Sequential hypothesis testing in multiple regression models is done using SPSS by</question>
			<answer type="correct">entering the variables in sequential blocks.</answer>
			<answer>by clicking on the additional test button.</answer>
			<answer>by requesting that the unstandardized predicted values be saved to the data file.</answer>
			<answer>selecting multiple scatter plots.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
<P>
	<figure>
		<description> The SPSS user interface for the Regression program showing sequential model testing is demonstrated. The X1 variable is entered as the independent variable and the Y1 is entered as the dependent variable. The next button is circled in red following the title Block 1 of 1.</description>
		<url> Images/mlt0654.gif </url>
		<width>396</width>
		<height>361</height>
		<align></align>
		<caption>The SPSS user interface for the Regression program showing sequential model testing.</caption>
		<alt> The SPSS user interface for the Regression program showing sequential model testing.</alt>
	</figure>
</P> 
<P>The next figure illustrates how X<SUB>2</SUB> is entered in the second block.</P>
<P>
	<figure>
		<description> The second SPSS user interface for the Regression program showing sequential model testing is demonstrated. In this view, the variable X2 is entered as the independent variables and the interface is titled Block 2 of 2 (circled in red).</description>
		<url> Images/mlt0655.gif</url>
		<width>398</width>
		<height>363</height>
		<align></align>
		<caption>The second SPSS user interface for the Regression program showing sequential model testing.</caption>
		<alt> The second SPSS user interface for the Regression program showing sequential model testing.</alt>
	</figure>
</P> 
<P>In order to obtain the desired hypothesis test, click on the "Statistics..." button and then select the "<index>R squared change</index>" option, as presented below.</P>
<P>
	<figure>
		<description> Selecting the R squared change option in the SPSS Regression command by checking the appropriate box in the SPSS user interface is shown.</description>
		<url> Images/mlt0657.gif </url>
		<width>403</width>
		<height>137</height>
		<align></align>
		<caption>Selecting the R squared change option in the SPSS Regression command.</caption>
		<alt> Selecting the R squared change option in the SPSS Regression command.</alt>
	</figure>
</P> 
<P>The additional output obtained by selecting this option includes a model summary,</P>
<P>
	<figure>
		<description> The model summary table of SPSS Regression output is shown. The table contains columns for model, R, R squared, adjusted R squared, standard error of estimate, R squared change, F change, df1, df2, and significance of F change. In the example, the corresponding values are model 1, .764, .584, .561, 16.25, .584, 25.261, 1, 18, and .000 for the first row and model 2, .968, .936, .929, 6.54, .352, 94.076, 1, 17, .000 for the second.</description>
		<url> Images/mlt0658.gif </url>
		<width>600</width>
		<height>205</height>
		<align></align>
		<caption>The model summary table with R squared change option of SPSS Regression output.</caption>
		<alt> The model summary table with R squared change option of SPSS Regression output.</alt>
	</figure>
</P> 
<P>an ANOVA table,</P>
<P>
	<figure>
		<description> Two ANOVA tables from the SPSS Regression program showing sequential significance testing are presented. The sum of squares residual in the second model is equal to the sum of squared residuals found earlier. The value of the F statistic is 124.979 with a significance level of .000.</description>
		<url> Images/mlt0659.gif </url>
		<width>535</width>
		<height>251</height>
		<align></align>
		<caption>The ANOVA tables in the SPSS Regression program showing sequential hypothesis testing.</caption>
		<alt> The ANOVA tables in the SPSS Regression program showing sequential hypothesis testing.</alt>
	</figure>
</P> 
<P>and a table of coefficients.</P>
<P>
	<figure>
		<description> Two <index>coefficients tables</index> of SPSS Regression program are shown. They are similar to the coefficients tables described in earlier chapters, except they have three rows rather than two to describe the terms in the model. In this case there are rows for a constant term, X1, and X2. For all rows, columns provide <index>unstandardized coefficients</index>, <index>standardized coefficients</index>, <index>t values</index>, and <index>significance levels</index>.</description>
		<url> Images/mlt0660.gif </url>
		<width>527</width>
		<height>239</height>
		<align></align>
		<caption> The coefficients tables in the SPSS Regression program showing sequential hypothesis testing.</caption>
		<alt> The coefficients tables in the SPSS Regression program showing sequential hypothesis testing.</alt>
	</figure>
</P>
		<TestItem type="MC">
			<question>If the R squared change is statistically significant</question>
			<answer type="correct">then the additional independent variables predicted additional variability in the dependent measure better than chance.</answer>
			<answer>then multicollinearity is present.</answer>
			<answer>then the prediction model is also practically significant.</answer>
			<answer>the principal components of the independent variables are similar to the principal components of the dependent variables.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem> 
<P>The only new information presented in these tables is in the model summary and the "<index>Change Statistics</index>" entries. The critical new entry is the test of the significance of R<SUP>2</SUP> change for model 2. In this case the change is statistically significant. It could be said that X<SUB>2</SUB> adds significant predictive power in predicting Y<SUB>1</SUB> after X<SUB>1</SUB> has been entered into the regression model.</P>
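<P>The F change statistic can be approximated from the R<SUP>2</SUP> values alone. The sketch below assumes the usual formula for testing an increment in R<SUP>2</SUP>, F = (R<SUP>2</SUP> change / df1) / ((1 - full-model R<SUP>2</SUP>) / df2); this formula is not stated in the text, and the function name is illustrative.</P>

```python
def f_change(r2_change, r2_full, df1, df2):
    """Assumed standard F test for an increase in R squared.

    df1 is the number of variables added in the block and
    df2 is N minus the number of parameters in the full model.
    """
    return (r2_change / df1) / ((1 - r2_full) / df2)

# Model 2 of the example data: X2 added after X1, N = 20.
f_x2 = f_change(0.352, 0.936, 1, 17)
print(round(f_x2, 1))
```

<P>Because the inputs are rounded to three decimal places, the result is only close to the 94.076 reported by SPSS, which works from unrounded values.</P>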
<P>Conducting a similar hypothesis test for the increase in predictive power of X<SUB>3</SUB> when X<SUB>1</SUB> is already in the model produces the following model summary table.</P>
<P>
	<figure>
		<description> The model summary table of SPSS Regression output is shown. The table contains columns for model, R, R squared, adjusted R squared, standard error of estimate, R squared change, F change, df1, df2, and significance of F change. In the example, the corresponding values are model 1, .764, .584, .561, 16.25, .584, 25.261, 1, 18, and .000 for the first row and model 2, .770, .592, .544, 16.55, .008, .350, 1, 17, .562 for the second.</description>
		<url> Images/mlt0661.gif </url>
		<width>589</width>
		<height>197</height>
		<align></align>
		<caption> The model summary table with R squared change option of SPSS Regression output.</caption>
		<alt> The model summary table with R squared change option of SPSS Regression output.</alt>
	</figure>
</P> 
<P>Note that in this case the change is not significant. The table of coefficients also presents some interesting relationships.</P>
<P>
	<figure>
		<description> Two coefficients tables of SPSS Regression program are shown. They are similar to the coefficients tables described in earlier chapters, except they have three rows rather than two to describe the terms in the model. In this case there are rows for a constant term, X1, and X3. For all rows, columns provide unstandardized coefficients, standardized coefficients, t values, and significance levels.</description>
		<url> Images/mlt0662.gif </url>
		<width>527</width>
		<height>239</height>
		<align></align>
		<caption> The coefficients tables in the SPSS Regression program showing sequential hypothesis testing.</caption>
		<alt> The coefficients tables in the SPSS Regression program showing sequential hypothesis testing.</alt>
	</figure>
</P> 
		<TestItem type="MC">
			<question>The value of the sig. column in the coefficients table of multiple regression</question>
			<answer type="correct">will be the same as the significance level of R squared change if that variable was entered last by itself to the sequential regression model.</answer>
			<answer>will be the same value as the R squared change.</answer>
			<answer>will be the same as the significance level of R squared change if that variable was entered first by itself to the sequential regression model.</answer>
			<answer>can only be interpreted using the standardized regression weights.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
<P>Note that the "Sig." level for the X<SUB>3</SUB> variable in model 2 (.562) is the same as the "Sig. F Change" in the preceding table. The interpretation of the "Sig." level for the "Coefficients" is now apparent. It is the significance of the addition of that variable given all the other independent variables are already in the regression equation. Note also that the "<index>Sig.</index>" value for X<SUB>1</SUB> in Model 2 is .039, which is still significant but is a weaker result than that for X<SUB>1</SUB> alone (Model 1, with a value of .000). Thus a variable may become "less significant" in combination with another variable than by itself.</P>
	<TestItem type="MC">
		<question>The Sig. level provided by SPSS on the Coefficients table in the Multiple Regression procedure </question>
		<answer type="incorrect">is the significance of the correlation of that variable with the value of Y</answer>
		<answer type="incorrect">tells whether that variable in isolation is a significant predictor of Y</answer>
		<answer type="incorrect">can sometimes be negative when that variable is an extremely poor predictor of Y</answer>
		<answer type="correct">will always be the same as the Sig. level of R<sup>2</sup> change when that variable is added to the regression equation last.</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
</section>
<section>
<P><h2>Visual Representation of Multiple Regression</h2></P>
<definition word="plane">a two-dimensional surface in a three-dimensional space.</definition> 
<P>The regression equation, Y'<SUB>i</SUB> = b<SUB>0</SUB> + b<SUB>1</SUB>X<SUB>1i</SUB> + b<SUB>2</SUB>X<SUB>2i</SUB>, defines a <index>plane</index> in a <index>three dimensional space</index>. If Y' were computed for all possible values of X<SUB>1</SUB> and X<SUB>2</SUB>, all the points would fall on a <index>two-dimensional surface</index>. This surface can be found by computing Y' for three arbitrarily chosen (X<SUB>1</SUB>, X<SUB>2</SUB>) pairs that do not lie on a single line, plotting these points in a three-dimensional space, and then fitting a plane through the points in the space. The plane is represented in the three-dimensional <index>rotating scatter plot</index> as a yellow surface.</P>
<P>The residuals can be represented as the distances from the points to the plane, measured parallel to the Y-axis. <index>Residuals</index> are represented in the rotating scatter plot as red lines.</P>
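<P>As a small sketch of these ideas, the following Python code uses the weights of the full regression model for the example data to compute Y' at three (X<SUB>1</SUB>, X<SUB>2</SUB>) pairs, which is enough to locate the plane, and then computes one residual. The three pairs and the observed score of 150 are hypothetical values chosen for illustration, as are the function names.</P>

```python
# Weights of the full model for the example data.
B0, B1, B2 = 101.222, 1.000, 1.071

def predict(x1, x2):
    """Height of the regression plane above the point (x1, x2)."""
    return B0 + B1 * x1 + B2 * x2

def residual(y, x1, x2):
    """Signed distance from an observed Y to the plane, parallel to the Y-axis."""
    return y - predict(x1, x2)

# Three hypothetical pairs that do not lie on a single line.
pairs = [(0, 0), (10, 0), (0, 10)]
plane_points = [(x1, x2, predict(x1, x2)) for x1, x2 in pairs]

# A hypothetical observation: Y = 150 at X1 = 20 and X2 = 30.
example_residual = residual(150.0, 20, 30)
print(plane_points, round(example_residual, 3))
```

<P>A negative residual, as in this hypothetical case, means the observed point lies below the plane.</P>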
<P>Graphically, multiple regression with two independent variables fits a plane to a three-dimensional scatter plot such that the sum of squared residuals is minimized. The multiple <index>regression plane</index> is represented below for Y<SUB>1</SUB> predicted by X<SUB>1</SUB> and X<SUB>2</SUB>.</P>
<P>
<APPLET code="RSWithPlane.class" height="200" id="X12YP" style="LEFT: 0px; TOP: 0px" width="200"></APPLET></P>
<P>A similar relationship is presented below for Y<SUB>1</SUB> predicted by X<SUB>1</SUB> and X<SUB>3</SUB>.</P>
<P>
<APPLET code="RSWithPlane.class" height="200" id="X13YP" style="LEFT: 0px; TOP: 0px" width="200"></APPLET></P>
<P><script language="VBScript">
Sub window_onLoad()
  document.X12YP.RegressionLine 0.6, 0.6, .25, .25
  document.X13YP.RegressionLine 0.6, 0.6, .45, .85
end sub 
</script></P>
<definition word="hyperplane">an (N-1)-dimensional surface in an N-dimensional space.</definition> 
<definition word="hyperspace">an N-dimensional space.</definition> 
<P>While humans have difficulty visualizing data with more than three dimensions, mathematicians have no such problem working with them mathematically. When dealing with more than three dimensions, mathematicians talk about fitting a <index>hyperplane</index> in <index>hyperspace</index>.</P>
		<TestItem type="MC">
			<question>A linear regression equation predicting a single dependent variable from two independent variables can be represented as a</question>
			<answer type="correct">plane.</answer>
			<answer>curved line.</answer>
			<answer>straight line.</answer>
			<answer>rotation of the axis.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem> 
</section>
<section>
<P><h2>Variations of Relationships</h2></P>
<P>With three variables involved, X<SUB>1</SUB>, X<SUB>2</SUB>, and Y, many varieties of relationships between variables are possible. It will prove instructive to explore three such relationships.</P>
<P><h3>Unrelated Independent Variables</h3></P>
<P>In this example, both X<SUB>1</SUB> and X<SUB>2</SUB> are correlated with Y, and X<SUB>1</SUB> and X<SUB>2</SUB> are <B>uncorrelated</B> with each other. In the example data, X<SUB>1</SUB> and X<SUB>2</SUB> are correlated with Y<SUB>1</SUB> with values of .764 and .769 respectively. The independent variables, X<SUB>1</SUB> and X<SUB>2</SUB>, are correlated with a value of .255, not exactly zero, but close enough. In this case X<SUB>1</SUB> and X<SUB>2</SUB> contribute independently to predict the variability in Y. It doesn't matter much which variable is entered into the regression equation first and which variable is entered second.</P>
<P>The following table of R square change predicts Y<SUB>1</SUB> with X<SUB>1</SUB> and then with both X<SUB>1</SUB> and X<SUB>2</SUB>.</P>
		<TestItem type="MC">
			<question>In a sequential multiple regression modeling procedure predicting Dept, adding Indp02 after Indp01 is already in the regression model would most likely</question>
				<figure>
					<description>A correlation matrix with one dependent and five independent variables. Used in the second multivariate test pool.</description>
					<url>RegressCorrMatrix.gif</url>
					<width>553</width>
					<height>349</height>
					<align></align>
					<caption></caption>
					<alt>Correlation matrix for Multivariate Test Two</alt>
				</figure>
			<answer type="correct">increase the multiple R squared, but not significantly.</answer>
			<answer>significantly increase the multiple R squared.</answer>
			<answer>leave the multiple R squared unchanged.</answer>
			<answer>decrease the multiple R squared.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>In a sequential multiple regression modeling procedure predicting Dept, adding Indp04 after Indp01 is already in the regression model would most likely</question>
				<figure>
					<description>A correlation matrix with one dependent and five independent variables. Used in the second multivariate test pool.</description>
					<url>RegressCorrMatrix.gif</url>
					<width>553</width>
					<height>349</height>
					<align></align>
					<caption></caption>
					<alt>Correlation matrix for Multivariate Test Two</alt>
				</figure>
			<answer>increase the multiple R squared, but not significantly.</answer>
			<answer type="correct">significantly increase the multiple R squared.</answer>
			<answer>leave the multiple R squared unchanged.</answer>
			<answer>decrease the multiple R squared.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>In a multiple regression model, which of the following two variables in combination would most likely best predict Dept?</question>
				<figure>
					<description>A correlation matrix with one dependent and five independent variables. Used in the second multivariate test pool.</description>
					<url>RegressCorrMatrix.gif</url>
					<width>553</width>
					<height>349</height>
					<align></align>
					<caption></caption>
					<alt>Correlation matrix for Multivariate Test Two</alt>
				</figure>
			<answer type="correct">Indp03 and Indp04</answer>
			<answer>Indp01 and Indp02</answer>
			<answer>Indp01 and Indp03</answer>
			<answer>Indp02 and Indp03</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>In a sequential multiple regression modeling procedure predicting Dept, adding Indp03 after Indp01 is already in the regression model</question>
				<figure>
					<description>A regression summary table. Used in the second multivariate test pool.</description>
					<url>RegressSummary.gif</url>
					<width>622</width>
					<height>172</height>
					<align></align>
					<caption></caption>
					<alt>Regression Summary for Multivariate Test Two</alt>
				</figure>
			<answer type="correct">increases the multiple R squared, but not significantly.</answer>
			<answer>significantly increases the multiple R squared.</answer>
			<answer>leaves the multiple R squared unchanged.</answer>
			<answer>decreases the adjusted multiple R squared.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>In a sequential multiple regression modeling procedure predicting Dept01, the significance level on the coefficients table for Indp03 when both Indp01 and Indp03 have been entered in the regression model would be</question>
				<figure>
					<description>A regression summary table. Used in the second multivariate test pool.</description>
					<url>RegressSummary.gif</url>
					<width>622</width>
					<height>172</height>
					<align></align>
					<caption></caption>
					<alt>Regression Summary for Multivariate Test Two</alt>
				</figure>
			<answer type="correct">.149.</answer>
			<answer>.000.</answer>
			<answer>.053.</answer>
			<answer>unknown from the information given.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>In a sequential multiple regression modeling procedure predicting Dept01, the significance level on the coefficients table for Indp01 when only Indp01 has been entered in the regression model would be</question>
				<figure>
					<description>A regression summary table. Used in the second multivariate test pool.</description>
					<url>RegressSummary.gif</url>
					<width>622</width>
					<height>172</height>
					<align></align>
					<caption></caption>
					<alt>Regression Summary for Multivariate Test Two</alt>
				</figure>
			<answer>.149.</answer>
			<answer type="correct">.000.</answer>
			<answer>.053.</answer>
			<answer>unknown from the information given.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>In a sequential multiple regression modeling procedure predicting Dept, the significance level for R squared change on the summary table for Indp01 after Indp03 has been entered in the regression model would be</question>
				<figure>
					<description>A regression coefficients table. Used in the second multivariate test pool.</description>
					<url>RegressCoef.gif</url>
					<width>535</width>
					<height>207</height>
					<align></align>
					<caption></caption>
					<alt>Regression Summary for Multivariate Test Two</alt>
				</figure>
			<answer>.149.</answer>
			<answer>.000.</answer>
			<answer type="correct">.692.</answer>
			<answer>unknown from the information given.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>In a sequential multiple regression modeling procedure predicting Dept, the predicted value of Dept01 when Indp01=30 and Indp03=20 would be</question>
				<figure>
					<description>A regression coefficients table. Used in the second multivariate test pool.</description>
					<url>RegressCoef.gif</url>
					<width>535</width>
					<height>207</height>
					<align></align>
					<caption></caption>
					<alt>Regression Summary for Multivariate Test Two</alt>
				</figure>
			<answer type="correct">109.60.</answer>
			<answer>102.685.</answer>
			<answer>100.822.</answer>
			<answer>unknown from the information given.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>In a sequential multiple regression modeling procedure predicting Dept01, the predicted standard score value of Dept01 when the standard score of Indp01=1.30 and the standard score of Indp03=-2.20 would be</question>
				<figure>
					<description>A regression coefficients table. Used in the second multivariate test pool.</description>
					<url>RegressCoef.gif</url>
					<width>535</width>
					<height>207</height>
					<align></align>
					<caption></caption>
					<alt>Regression Summary for Multivariate Test Two</alt>
				</figure>
			<answer type="correct">-1.153.</answer>
			<answer>1.584.</answer>
			<answer>-1.249.</answer>
			<answer>unknown from the information given.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem>
<P>
	<figure>
		<description> The model summary table of SPSS Regression output is shown. The table contains columns for model, R, R squared, adjusted R squared, standard error of estimate, R squared change, F change, df1, df2, and significance of F change. In the example, the corresponding values are model 1, .764, .584, .561, 16.25, .584, 25.261, 1, 18, and .000 for the first row and 2, .968, .936, .929, 6.54, .352, 92.156, 1, 17, .000 for the second.</description>
		<url> Images/mlt0658.gif </url>
		<width>600</width>
		<height>205</height>
		<align></align>
		<caption> The model summary table with R squared change option of SPSS Regression output.</caption>
		<alt> The model summary table with R squared change option of SPSS Regression output </alt>
	</figure>
</P> 
<P>The next table of <index>R square change</index> predicts Y<SUB>1</SUB> with X<SUB>2</SUB> and then with both X<SUB>1</SUB> and X<SUB>2</SUB>.</P>
<P>
	<figure>
		<description> The model summary table of SPSS Regression output is shown. The table contains columns for model, R, R squared, adjusted R squared, standard error of estimate, R squared change, F change, df1, df2, and significance of F change. In the example, the corresponding values are model 1, .769, .591, .568, 16.11, .591, 26.022, 1, 18, and .000 for the first row and 2, .968, .936, .929, 6.54, .345, 92.1560, 1, 17, .000 for the second.</description>
		<url> Images/mlt0663.gif </url>
		<width>593</width>
		<height>196</height>
		<align></align>
		<caption> The model summary table with R squared change option of SPSS Regression output.</caption>
		<alt> The model summary table with R squared change option of SPSS Regression output.</alt>
	</figure>
</P> 
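<P>The R square change for X<SUB>1</SUB> is .584 when it is entered first (Model 1 in the first table) and .345 when it is entered second (Model 2 in the second table). The two values are not identical, but in both orders X<SUB>1</SUB> accounts for a substantial and statistically significant share of the variability in Y<SUB>1</SUB>. If the correlation between X<SUB>1</SUB> and X<SUB>2</SUB> had been 0.0 instead of .255, the R square change values would have been identical.</P>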
<P>Because of the structure of the relationships between the variables, slight changes in the <index>regression weights</index> would rather dramatically increase the errors in the fit of the plane to the points.</P>
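<P>With two predictors, the squared multiple correlation can be computed from the three pairwise correlations alone, which makes the R square change values in this subsection easy to check. The sketch below uses the standard two-predictor formula; the function names are illustrative, and the second call uses hypothetical correlations chosen only to show the case where the predictors are exactly uncorrelated.</P>

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of Y with two predictors,
    computed from the three pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def r_square_changes(r_y1, r_y2, r_12):
    """Return the full R squared and the R square change for the first
    predictor when it is entered first and when it is entered second."""
    full = r_squared_two_predictors(r_y1, r_y2, r_12)
    entered_first = r_y1 ** 2           # nothing else is in the model yet
    entered_second = full - r_y2 ** 2   # what it adds beyond the other predictor
    return full, entered_first, entered_second


# Correlations reported in the text for Y1, X1, and X2.
full, first, second = r_square_changes(0.764, 0.769, 0.255)
print(round(full, 3), round(first, 3), round(second, 3))   # 0.936 0.584 0.345

# Hypothetical correlations with exactly uncorrelated predictors.
full0, first0, second0 = r_square_changes(0.5, 0.6, 0.0)
print(round(full0, 3), round(first0, 3), round(second0, 3))  # 0.61 0.25 0.25
```

<P>The first call reproduces the values in the two model summary tables. The second shows that when the predictors are uncorrelated, the order of entry does not change the R square change.</P>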
<P><h3>Related Predictor Variables</h3></P>
<P>In this case, both X<SUB>1</SUB> and X<SUB>2</SUB> are correlated with Y, and X<SUB>1</SUB> and X<SUB>2</SUB> are <B>correlated</B> with each other. In the example data, X<SUB>1</SUB> and X<SUB>3</SUB> are correlated with Y<SUB>1</SUB> with values of .764 and .687 respectively. The independent variables, X<SUB>1</SUB> and X<SUB>3</SUB>, are correlated with a value of .940. In this situation it makes a great deal of difference which variable is entered into the regression equation first and which is entered second. </P>
<P>Entering X<SUB>1</SUB> first and X<SUB>3</SUB> second results in the following R square change table.</P>
<P>
	<figure>
		<description> The model summary table of SPSS Regression output is shown. The table contains columns for model, R, R squared, adjusted R squared, standard error of estimate, R squared change, F change, df1, df2, and significance of F change. In the example, the corresponding values are model 1, .764, .584, .561, 16.25, .584, 25.261, 1, 18, and .000 for the first row and 2, .770, .592, .544, 16.55, .008, .350, 1, 17, .562 for the second.</description>
		<url> Images/mlt0661.gif </url>
		<width>589</width>
		<height>197</height>
		<align></align>
		<caption> The model summary table with R squared change option of SPSS Regression output.</caption>
		<alt> The model summary table with R squared change option of SPSS Regression output.</alt>
	</figure>
</P> 
<P>Entering X<SUB>3</SUB> first and X<SUB>1</SUB> second results in the following R square change table.</P>
		<TestItem type="MC">
			<question>If two independent variables are highly correlated</question>
			<answer type="correct">then adding the second to the regression model will result in only slightly better prediction.</answer>
			<answer>then the combined regression model will become more stable.</answer>
			<answer>then the multiple R will be high.</answer>
			<answer>then the multiple R will be low.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with two variables</concept>
		</TestItem> 
<P>
	<figure>
		<description> The model summary table of SPSS Regression output is shown. The table contains columns for model, R, R squared, adjusted R squared, standard error of estimate, R squared change, F change, df1, df2, and significance of F change. In the example, the corresponding values are model 1, .687, .472, .442, 18.30, .472, 16.114, 1, 18, and .000 for the first row and 2, .770, .592, .544, 16.55, .008, .350, 1, 17, .562 for the second.</description>
		<url> Images/mlt0664.gif </url>
		<width>595</width>
		<height>196</height>
		<align></align>
		<caption> The model summary table with R squared change option of SPSS Regression output.</caption>
		<alt> The model summary table with R squared change option of SPSS Regression output.</alt>
	</figure>
</P> 
<P>As before, both tables end up at the same place, in this case with an R<SUP>2</SUP> of .592. Here, however, it makes a great deal of difference whether a variable is entered into the equation first or second. Variable X<SUB>3</SUB>, for example, has an R square change of .472 (the square of its correlation with Y<SUB>1</SUB>, .687) if entered first. If entered second, after X<SUB>1</SUB>, it has an <index>R square change</index> of .008. In the first case it is <index>statistically significant</index>, while in the second it is not.</P>
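<P>The F change statistic in the model summary tables can be approximated from the R squared values. The sketch below uses the standard formula for comparing a fuller model against a reduced one; the function name is illustrative, and the inputs are the three-decimal values for X<SUB>3</SUB> given in the text, so the results differ slightly from the SPSS output, which works from unrounded values.</P>

```python
def f_change(r2_full, r2_reduced, df1, df2):
    """F statistic for the increase in R squared when df1 predictors are
    added; df2 is the residual degrees of freedom of the fuller model."""
    return ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)


# X3 entered first: compared against a model with no predictors (R squared 0).
f_first = f_change(0.472, 0.0, 1, 18)

# X3 entered second, after X1 (R squared .584 rises to .592).
f_second = f_change(0.592, 0.584, 1, 17)

print(round(f_first, 2), round(f_second, 2))   # 16.09 0.33
```

<P>These values are close to the 16.114 and .350 shown in the tables. An F change below 1, as in the second case, cannot reach significance at conventional levels.</P>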
<P>As two independent variables become more highly correlated, the solution to the optimal regression weights becomes unstable. This can be seen in the rotating scatter plots of X<SUB>1</SUB>, X<SUB>3</SUB>, and Y<SUB>1</SUB>. The plane that models the relationship could be modified by rotating around an axis in the middle of the points without greatly changing the degree of fit. That is, there are any number of solutions to the regression weights which will give only a small difference in the sum of squared residuals. This is called the problem of <I><index>multicollinearity</index></I> in mathematical vernacular.</P>
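<P>The instability can be illustrated with the correlations reported in this chapter. The sketch below works with standardized variables, for which the mean squared residual depends only on the weights and the three pairwise correlations. It tilts the plane by raising one weight and lowering the other by the same amount, then measures how much the error grows. The function names and the size of the shift (0.2) are arbitrary choices made for illustration.</P>

```python
def standardized_loss(b1, b2, r_y1, r_y2, r_12):
    """Mean squared residual for standardized variables given weights b1, b2."""
    return (1 - 2 * (b1 * r_y1 + b2 * r_y2)
            + b1 ** 2 + b2 ** 2 + 2 * b1 * b2 * r_12)


def optimal_weights(r_y1, r_y2, r_12):
    """Least squares standardized weights for two predictors."""
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


def loss_increase(r_y1, r_y2, r_12, shift=0.2):
    """Extra error when b1 is raised and b2 lowered by `shift` from the optimum."""
    b1, b2 = optimal_weights(r_y1, r_y2, r_12)
    best = standardized_loss(b1, b2, r_y1, r_y2, r_12)
    tilted = standardized_loss(b1 + shift, b2 - shift, r_y1, r_y2, r_12)
    return tilted - best


weak = loss_increase(0.764, 0.769, 0.255)    # X1 and X2, correlated .255
strong = loss_increase(0.764, 0.687, 0.940)  # X1 and X3, correlated .940
print(round(weak, 4), round(strong, 4))      # 0.0596 0.0048
```

<P>The same change in the weights costs roughly twelve times as much error when the predictors are nearly uncorrelated as when they are correlated .940. With highly correlated predictors, quite different weights fit almost equally well.</P>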
<definition word="multicollinearity">the independent variables are highly correlated with one another, or a linear combination of a subset of independent variables correlates highly with another independent variable.</definition> 
<P><h3>Suppressor Variables</h3></P>
<definition word="suppressor variable">an  independent variable that does not by itself correlate highly with the dependent variable, but when included in a set of independent variables causes the set as a whole to be more highly correlated with the dependent variable.</definition> 
<P>One of the many varieties of relationships occurs when neither X<SUB>1</SUB> nor X<SUB>2</SUB> individually correlates highly with Y, X<SUB>1</SUB> correlates with X<SUB>2</SUB>, but X<SUB>1</SUB> and X<SUB>2</SUB> together correlate highly with Y. This phenomenon may be observed in the relationships of Y<SUB>2</SUB>, X<SUB>1</SUB>, and X<SUB>4</SUB>. In the example data neither X<SUB>1</SUB> nor X<SUB>4</SUB> is highly correlated with Y<SUB>2</SUB>, with <index>correlation coefficients</index> of .251 and .018 respectively. Variables X<SUB>1</SUB> and X<SUB>4</SUB> are correlated with a value of .847. Fitting X<SUB>1</SUB> followed by X<SUB>4</SUB> results in the following tables.</P>
<P>
	<figure>
		<description> The model summary table of SPSS Regression output is shown. The table contains columns for model, R, R squared, adjusted R squared, standard error of estimate, R squared change, F change, df1, df2, and significance of F change. In the example, the corresponding values are model 1, .687, .472, .442, 18.30, .472, 16.114, 1, 18, and .000 for the first row and 2, .770, .592, .544, 16.55, .008, .350, 1, 17, .562 for the second.</description>
		<url> Images/mlt0665.gif </url>
		<width>593</width>
		<height>196</height>
		<align></align>
		<caption> The model summary table with R squared change option of SPSS Regression output.</caption>
		<alt> The model summary table with R squared change option of SPSS Regression output.</alt>
	</figure>
</P> 
<P>
	<figure>
		<description> Two <index>coefficients tables</index> of SPSS Regression program are shown. They are similar to the coefficients tables described in earlier chapters, except they have three rows rather than two to describe the terms in the model. In this case there are rows for a constant term, X1, and X4. For all rows, columns provide unstandardized coefficients, standardized coefficients, t values, and significance levels.</description>
		<url> Images/mlt0666.gif </url>
		<width>527</width>
		<height>239</height>
		<align></align>
		<caption> The coefficients tables in the SPSS Regression program showing sequential hypothesis testing with a suppressor variable.</caption>
		<alt>The coefficients tables in the SPSS Regression program showing sequential hypothesis testing with a suppressor variable</alt>
	</figure>
</P> 
	<TestItem type="MC">
		<question>When predicting Y from two predictor variables, if the predictor variables are uncorrelated then </question>
		<answer type="correct">the value of R squared change will be equal to the correlation coefficient squared between the predictor variables and Y</answer>
		<answer type="incorrect">the ANOVA table provided by SPSS/WIN will almost always be significant</answer>
		<answer type="incorrect">multicollinearity is almost always a problem</answer>
		<answer type="incorrect">one of the predictor variables is most likely a suppressor variable</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
	<TestItem type="MC">
		<question> When two predictor variables are highly correlated in multiple regression </question>
		<answer type="incorrect">suppressor variables are unlikely</answer>
		<answer type="correct">collinearity may be a problem</answer>
		<answer type="incorrect">the hyperplane will describe the hyperspace</answer>
		<answer type="incorrect">R<sup>2</sup> change will likely be relatively high.</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
	<TestItem type="MC">
		<question> A suppressor variable relationship is possible when </question>
		<answer type="incorrect">two predictor variables are uncorrelated, neither individually correlates highly with Y, and together they have a small multiple R.</answer>
		<answer type="incorrect">two predictor variables are highly correlated, both individually correlate highly with Y, and together they have a large multiple R.</answer>
		<answer type="correct">two predictor variables are highly correlated, neither individually correlates highly with Y, and together they have a large multiple R.</answer>
		<answer type="incorrect">two predictor variables are moderately correlated, one correlates highly with Y, the other is negatively correlated with Y, and together they have a moderate multiple R.</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept>Linear regression with two variables</concept>
	</TestItem>
<P>In this case, the <index>regression weights</index> of both X<SUB>1</SUB> and X<SUB>4</SUB> are significant when entered together, but not significant when entered individually. Note also that the regression weight for X<SUB>1</SUB> is positive (.769) and the regression weight for X<SUB>4</SUB> is negative (-.783). In this case the variance in X<SUB>1</SUB> that does not account for variance in Y<SUB>2</SUB> is cancelled or suppressed by knowledge of X<SUB>4</SUB>. Variable X<SUB>4</SUB> is called a <I><index>suppressor variable</index></I>. </P>
<P>In terms of the descriptions of the variables, if X<SUB>1</SUB> is a measure of intellectual ability and X<SUB>4</SUB> is a measure of spatial ability, it might be reasonably assumed that X<SUB>1</SUB> is composed of both verbal ability and spatial ability. If the score on a major review paper is correlated with verbal ability and not spatial ability, then subtracting spatial ability from general intellectual ability would leave verbal ability. This accounts for the high multiple R when spatial ability is subtracted from general intellectual ability. It is for this reason that X<SUB>1</SUB> and X<SUB>4</SUB>, while not highly correlated individually with Y<SUB>2</SUB>, in combination correlate fairly highly with Y<SUB>2</SUB>.</P>
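<P>The verbal and spatial ability explanation can be mimicked with simulated scores. The sketch below is a hypothetical construction, not the chapter's data: general ability is built as verbal plus spatial ability, the paper score depends only on verbal ability plus noise, and the sample size, standard deviations, and random seed are arbitrary assumptions. It uses NumPy's least squares routine to fit each model.</P>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical latent abilities, invented for illustration.
verbal = rng.normal(0.0, 1.0, n)
spatial = rng.normal(0.0, 2.0, n)

x1 = verbal + spatial                    # general ability: verbal plus spatial
x4 = spatial                             # spatial ability only
y2 = verbal + rng.normal(0.0, 0.3, n)    # paper score depends on verbal ability


def fit(predictors, outcome):
    """Return (R squared, weights) for a least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    weights, _, _, _ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ weights
    total = outcome - outcome.mean()
    return 1 - (residuals @ residuals) / (total @ total), weights


r2_x1, _ = fit([x1], y2)
r2_x4, _ = fit([x4], y2)
r2_both, w = fit([x1, x4], y2)

print("R squared, X1 alone:", round(r2_x1, 3))
print("R squared, X4 alone:", round(r2_x4, 3))
print("R squared, X1 and X4:", round(r2_both, 3))
print("weights for X1 and X4:", round(w[1], 3), round(w[2], 3))
```

<P>In this construction X<SUB>4</SUB> alone predicts almost nothing, yet adding it to X<SUB>1</SUB> raises R squared sharply, with a positive weight for X<SUB>1</SUB> and a negative weight of similar size for X<SUB>4</SUB>. This is the pattern described for a suppressor variable.</P>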
<P><h2>Summary</h2></P>
<P>Multiple regression predicting a single dependent variable with two independent variables is conceptually similar to simple linear regression, which predicts a single dependent variable with a single independent variable. The differences are that more weights are estimated and that, rather than fitting a line to a two-dimensional scatter plot, a plane is fitted to a three-dimensional scatter plot. Interpretation of the results is complicated by both the relationship between the two independent variables and their relationships with the dependent variable.</P>
<P>A variety of relationships and interactions between the variables were then explored. The relationships discussed barely scratch the surface of the possibilities. Suffice it to say that the more variables that are included in an analysis, the greater the complexity of the analysis. Multiple regression is usually done with more than two independent variables. The next chapter will discuss issues related to more complex regression models.</P>
</section>
</chapter>

