<?xml version="1.0"?>
<?xml:stylesheet type="text/xsl" href="MultiBook.xsl" ?>
<chapter>
	<number>6</number>
	<author>David W. Stockburger</author>
	<title>Multiple Regression with Many Predictor Variables</title>
	<modified>04/09/2001</modified>
	<URL>mlt07.xml</URL>
	<section>
<definition word="multiple regression">a statistical procedure used to predict a single dependent variable from one or more independent variables. The procedure uses a linear transformation of the independent variables to predict the dependent variable. The linear transformation is one that minimizes the sum of the squared differences between the observed and predicted values of the dependent variable.</definition>
<definition word="independent variable">a variable used to predict the dependent variable.</definition>
<definition word="dependent variable">the variable that is predicted.</definition>
		<TestItem type="MC">
			<question>In multiple regression there may be many</question>
			<answer type="correct">independent variables</answer>
			<answer>constant terms</answer>
			<answer>dependent variables</answer>
			<answer>optimal solutions</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
<P>The purpose of <index>multiple regression</index> is to predict a single dependent variable from one or more independent variables.  Multiple regression with many predictor variables is an extension of <index>linear regression</index> with two predictor variables. A linear transformation of the X variables is found such that the sum of squared deviations between the observed and predicted Y values is a minimum. The computations are more complex, however, because the interrelationships among all the variables must be taken into account in the weights assigned to the variables. The interpretation of the results of a multiple regression analysis is also more complex for much the same reason.</P>
<P>The prediction of Y is accomplished by the following equation: </P>
<P><math type="equation">Y'<SUB>i</SUB> = b<SUB>0</SUB> + b<SUB>1</SUB>X<SUB>1i</SUB> + b<SUB>2</SUB>X<SUB>2i </SUB>+ ... + b<SUB>k</SUB>X<SUB>ki</SUB></math></P>
		<TestItem type="MC">
			<question>In multiple regression, the unstandardized regression weights </question>
			<answer type="correct">minimize the sum of squared residuals.</answer>
			<answer>maximize the error.</answer>
			<answer>revolve around the value of the mean of Y.</answer>
			<answer>when squared they must sum to one.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
<P>The "b" values are called <index>regression weights</index> and are computed in a way that minimizes the sum of squared deviations</P>
<P>
	<figure>
		<description>The sum of squared deviations of the observed and predicted Y values.</description>
		<url>Images/mlt0638.gif</url>
		<width>86</width>
		<height>45</height>
		<align></align>
		<caption>The sum of squared deviations of the observed and predicted Y values.</caption>
		<alt>The sum of squared deviations of the observed and predicted Y values.</alt>
	</figure>
</P>
<definition word="predictor variable">see independent variable.</definition>
<P>in the same manner as in simple linear regression. In this case there are K independent or <index>predictor variables</index> rather than two, and K + 1 regression weights must be estimated: one for each of the K predictor variables and one for the constant (b<SUB>0</SUB>) term.</P>
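<P>The least squares criterion can be illustrated with a short Python sketch. It is not part of the SPSS procedure used in this chapter; the function names (fit_least_squares, predict, sum_squared_residuals) and the small data set with K = 2 predictors are made up for illustration. The sketch estimates the K + 1 weights by solving the normal equations and then computes the sum of squared residuals that those weights minimize.</P>

```python
def fit_least_squares(X, y):
    """Return [b0, b1, ..., bK] that minimize the sum of squared residuals."""
    # Prepend a column of ones so that the constant b0 is estimated too.
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])  # K + 1 weights
    # Build the normal equations (A'A) b = A'y as an augmented matrix.
    M = []
    for i in range(p):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(p)]
        rhs = sum(r[i] * yi for r, yi in zip(rows, y))
        M.append(lhs + [rhs])
    # Solve by Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot_row = max(range(c, p), key=lambda i: abs(M[i][c]))
        M[c], M[pivot_row] = M[pivot_row], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for i in range(p):
            if i != c:
                factor = M[i][c]
                M[i] = [vi - factor * vc for vi, vc in zip(M[i], M[c])]
    return [M[i][p] for i in range(p)]


def predict(b, x):
    """Y' = b0 + b1*X1 + ... + bK*XK for one observation."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


def sum_squared_residuals(b, X, y):
    """Sum of squared differences between observed and predicted Y."""
    return sum((yi - predict(b, xi)) ** 2 for xi, yi in zip(X, y))


# Made-up data with K = 2 predictors (not the chapter's data).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9.5, 7.5, 19.0, 18.5, 25.5]
b = fit_least_squares(X, y)
print("weights:", [round(v, 3) for v in b])
print("SS residual:", round(sum_squared_residuals(b, X, y), 3))
```

<P>Changing any of the estimated weights by a small amount and recomputing the sum of squared residuals should give a larger value, which is what "minimizes" means here.</P>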
		<TestItem type="MC">
			<question>In multiple regression, if there are K independent variables and N observations, how many parameters must be estimated?</question>
			<answer type="correct">K + 1</answer>
			<answer>K</answer>
			<answer>N - K - 2</answer>
			<answer>N - K - 1</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
</section>
<section>
<P><h2>Example Data</h2></P>
<P>The data used to illustrate the inner workings of multiple regression will be generated from the "Example Student." The data are presented below:</P>
<P>
<table cellPadding="1" cellSpacing="0" summary = "Data for multiple regression chapter" title="Data for multiple regression chapter.">
  <tcaption>Life Satisfaction Simulated Data</tcaption>
<TR><TH scope = "col" colspan = "1" abbr = "Sub">Subject</TH><TH scope = "col" colspan = "1" abbr = "age" >Age</TH><TH scope = "col" colspan = "1" abbr = "sex" >Gender</TH><TH scope = "col" colspan = "1" abbr = "married" >Married</TH><TH scope = "col" colspan = "1" abbr = "incomec" >IncomeC</TH><TH scope = "col" colspan = "1" abbr = "" >HealthC</TH><TH scope = "col" colspan = "1" abbr = "" >ChildC</TH><TH scope = "col" colspan = "1" abbr = "" >LifeSatC</TH><TH scope = "col" colspan = "1" abbr = "" >SES</TH><TH scope = "col" colspan = "1" abbr = "smoke" >Smoke</TH><TH scope = "col" colspan = "1" abbr = "" >Spirit</TH><TH scope = "col" colspan = "1" abbr = "" >Finish</TH><TH scope = "col" colspan = "1" abbr = "lifesat" >LifeSat7</TH><TH scope = "col" colspan = "1" abbr = "" >Income7</TH></TR>
<TR><TD>1</TD><TD>16</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>38</TD><TD>0</TD><TD>17</TD><TD>17</TD><TD>1</TD><TD>30</TD><TD>1</TD><TD>22</TD><TD>26</TD></TR>
<TR><TD>2</TD><TD>28</TD><TD>1</TD><TD>0</TD><TD>0</TD><TD>38</TD><TD>0</TD><TD>16</TD><TD>21</TD><TD>1</TD><TD>39</TD><TD>1</TD><TD>20</TD><TD>15</TD></TR>
<TR><TD>3</TD><TD>16</TD><TD></TD><TD>1</TD><TD>16</TD><TD>52</TD><TD>1</TD><TD>39</TD><TD>40</TD><TD>0</TD><TD>30</TD><TD>1</TD><TD>42</TD><TD>88</TD></TR>
<TR><TD>4</TD><TD>23</TD><TD>1</TD><TD>0</TD><TD>6</TD><TD>51</TD><TD>0</TD><TD>22</TD><TD>31</TD><TD>0</TD><TD>60</TD><TD>1</TD><TD>48</TD><TD>73</TD></TR>
<TR><TD>5</TD><TD>18</TD><TD>0</TD><TD>1</TD><TD>7</TD><TD>52</TD><TD>0</TD><TD>25</TD><TD>38</TD><TD>0</TD><TD>32</TD><TD>0</TD><TD> </TD><TD>14</TD></TR>
<TR><TD>6</TD><TD>30</TD><TD>0</TD><TD>1</TD><TD>25</TD><TD>43</TD><TD>2</TD><TD>53</TD><TD>36</TD><TD>1</TD><TD>39</TD><TD>0</TD><TD>33</TD><TD>38</TD></TR>
<TR><TD>7</TD><TD>19</TD><TD>0</TD><TD>1</TD><TD>19</TD><TD>55</TD><TD>0</TD><TD>28</TD><TD>41</TD><TD>0</TD><TD>51</TD><TD>1</TD><TD>33</TD><TD>45</TD></TR>
<TR><TD>8</TD><TD>19</TD><TD>1</TD><TD>0</TD><TD>0</TD><TD>52</TD><TD>2</TD><TD>17</TD><TD>52</TD><TD>0</TD><TD>35</TD><TD>1</TD><TD>21</TD><TD>16</TD></TR>
<TR><TD>9</TD><TD>34</TD><TD>0</TD><TD>0</TD><TD>29</TD><TD>60</TD><TD>2</TD><TD>20</TD><TD>56</TD><TD>0</TD><TD>23</TD><TD>1</TD><TD>26</TD><TD>64</TD></TR>
<TR><TD>10</TD><TD>16</TD><TD>1</TD><TD>0</TD><TD>0</TD><TD>53</TD><TD>0</TD><TD>21</TD><TD>27</TD><TD>0</TD><TD>29</TD><TD>0</TD><TD>37</TD><TD>19</TD></TR>
<TR><TD>11</TD><TD>25</TD><TD>1</TD><TD>0</TD><TD>3</TD><TD>39</TD><TD>0</TD><TD>18</TD><TD>34</TD><TD>1</TD><TD>61</TD><TD>1</TD><TD>40</TD><TD>56</TD></TR>
<TR><TD>12</TD><TD>16</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>42</TD><TD>0</TD><TD>31</TD><TD>29</TD><TD>1</TD><TD>58</TD><TD>1</TD><TD>35</TD><TD>70</TD></TR>
<TR><TD>13</TD><TD>16</TD><TD></TD><TD>0</TD><TD>0</TD><TD>43</TD><TD>0</TD><TD>15</TD><TD>28</TD><TD>1</TD><TD>39</TD><TD>1</TD><TD>32</TD><TD>71</TD></TR>
<TR><TD>14</TD><TD>16</TD><TD>0</TD><TD>1</TD><TD>18</TD><TD>54</TD><TD>1</TD><TD>34</TD><TD>38</TD><TD>0</TD><TD>40</TD><TD>0</TD><TD>37</TD><TD>44</TD></TR>
<TR><TD>15</TD><TD>16</TD><TD>1</TD><TD>0</TD><TD>0</TD><TD>52</TD><TD>0</TD><TD>20</TD><TD>38</TD><TD>0</TD><TD>27</TD><TD>1</TD><TD>35</TD><TD>25</TD></TR>
<TR><TD>16</TD><TD>32</TD><TD>1</TD><TD>1</TD><TD>26</TD><TD>54</TD><TD>1</TD><TD>39</TD><TD>37</TD><TD>0</TD><TD>30</TD><TD></TD><TD>47</TD><TD>38</TD></TR>
<TR><TD>17</TD><TD>19</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>46</TD><TD>0</TD><TD>17</TD><TD>25</TD><TD>0</TD><TD>36</TD><TD>1</TD><TD>26</TD><TD>39</TD></TR>
<TR><TD>18</TD><TD>17</TD><TD>1</TD><TD>1</TD><TD>10</TD><TD>55</TD><TD>2</TD><TD>48</TD><TD>53</TD><TD>0</TD><TD>43</TD><TD>0</TD><TD>42</TD><TD>6</TD></TR>
<TR><TD>19</TD><TD>24</TD><TD>0</TD><TD>0</TD><TD>17</TD><TD>52</TD><TD>0</TD><TD>16</TD><TD>36</TD><TD>0</TD><TD>54</TD><TD>1</TD><TD>38</TD><TD>75</TD></TR>
<TR><TD>20</TD><TD>26</TD><TD>1</TD><TD>1</TD><TD> </TD><TD>57</TD><TD>1</TD><TD>39</TD><TD>41</TD><TD>0</TD><TD>32</TD><TD>1</TD><TD>42</TD><TD>67</TD></TR></table>
</P>
<P>
<ul>
<li>Age</li>
<li>Gender (0=Male, 1=Female) </li>
<li>Married (0=No, 1=Yes) </li>
<li>IncomeC Income in College (in thousands) </li>
<li>HealthC Score on Health Inventory in College </li>
<li>ChildC Number of Children while in College </li>
<li>LifeSatC Score on Life Satisfaction Inventory in College </li>
<li>SES Socio Economic Status of Parents </li>
<li>Smoker (0=No, 1=Yes) </li>
<li>SpiritC Score on Spirituality Inventory in College </li>
<li>Finish Finish the program in college (0=No, 1=Yes) </li>
<li>LifeSat7 Score on Life Satisfaction Inventory seven years after College </li>
<li>Income7 Income seven years after College (in thousands) </li>
</ul>
</P>
<DataFile type="text">
	<url>h22.txt</url>
	<description>Life satisfaction data file in text format.</description>
</DataFile>
<DataFile type="SPSS">
	<url>h22.sav</url>
	<description>Life satisfaction data file in SPSS SAV format</description>
</DataFile>
<P>The major interest of this study is the prediction of <index>life satisfaction</index> seven years after college from the variables that can be measured while the student is in college. These data are available both as a text file and as an SPSS data file.</P>
<definition word="outlier">a value that lies outside the range of the other values in the sample.</definition>
<P>After doing a <index>univariate</index> analysis to check for outliers, the first step in the analysis of data such as these is to explore the boundaries of the relationships. The lower boundary will be the bivariate correlations of all possible predictor variables with the dependent measures, LifeSat7 and Income7. The upper boundary will be a linear regression model with all possible predictor variables included.</P>
</section>
<section>
<P><h2>The Correlation Matrix</h2></P>
		<TestItem type="MC">
			<question>The largest multiple R predicting a dependent variable from a single independent variable can be found </question>
			<answer type="correct">using a correlation matrix of the dependent variable with all independent variables.</answer>
			<answer>using a table of partial correlations of the dependent variable with all possible independent variables.</answer>
			<answer>using the step-down regression procedure.</answer>
			<answer>using the coefficients table predicting the dependent variable simultaneously from all independent variables.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
<P>The <index>correlation matrix</index> is given below for all possible predictor variables and the two dependent measures, LifeSat7 and Income7.</P>
<P>
	<figure>
		<description>The SPSS output of a correlation matrix of eleven possible independent variables with the two dependent variables is illustrated. This table has three eleven by two tables shown, with each table having all independent variables listed as rows and the two dependent variables listed as columns. The first table shows correlation coefficients, the second significance levels, and the third sample sizes.</description>
		<url>images/mlt0768.gif</url>
		<width>316</width>
		<height>700</height>
		<align></align>
		<caption>Correlation matrix of independent variables with dependent variables.</caption>
		<alt>Correlation matrix of independent variables with dependent variables.</alt>
	</figure>
</P>
<P>The best and only significant (<alpha/> =.05) predictor of life satisfaction seven years after college was life satisfaction in college with a correlation coefficient of .494. Other relatively high correlation coefficients included: Married (.454), Health in College (.386), Gender (.350 with females showing a generally higher level of life satisfaction), and Smoking (-.349 with Non-smokers showing a generally higher level of life satisfaction).</P>
<P>Income seven years after college was best predicted by knowing whether the student finished the college program or not (.499). Other variables that predicted income included the measure of spirituality (.340) and income in college (.282).</P>
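<P>The correlation of life satisfaction in college with life satisfaction seven years later can be checked by hand from the data table. The following Python sketch is an illustration only: the list names and the pearson_r function are made up, and it assumes that a pair is dropped only when one of its two values is missing (pairwise deletion), which is an assumption about how the SPSS output was produced.</P>

```python
import math

# LifeSatC and LifeSat7 for subjects 1-20, copied from the data table;
# None marks the blank LifeSat7 cell for subject 5.
life_sat_c = [17, 16, 39, 22, 25, 53, 28, 17, 20, 21,
              18, 31, 15, 34, 20, 39, 17, 48, 16, 39]
life_sat_7 = [22, 20, 42, 48, None, 33, 33, 21, 26, 37,
              40, 35, 32, 37, 35, 47, 26, 42, 38, 42]


def pearson_r(x, y):
    """Pearson correlation using only pairs where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(life_sat_c, life_sat_7)
print(round(r, 3))
```

<P>With the nineteen available pairs the result should be close to the value of .494 reported above.</P>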
<P>The matrix of correlations of all predictor variables is presented below.</P>
<P>
	<figure>
		<description>The SPSS output of a correlation matrix of all independent variables is illustrated. In this case there are eleven possible independent variables. This table has three eleven by eleven tables shown, with each table having all variables listed as both columns and rows. The first table shows correlation coefficients, the second significance levels, and the third sample sizes.</description>
		<url>images/mlt0769.gif</url>
		<width>916</width>
		<height>706</height>
		<align></align>
		<caption>Correlation matrix of all independent variables.</caption>
		<alt>Correlation matrix of all independent variables.</alt>
	</figure>
</P>
</section>
<section>
<P><h2>The Full Model</h2></P>
		<TestItem type="MC">
			<question>The largest unadjusted multiple R predicting a dependent variable from a sample of data can be found using</question>
			<answer type="correct">the summary table with all independent variables in the model.</answer>
			<answer>a correlation matrix of the dependent variable with all independent variables.</answer>
			<answer>a table of partial correlations of the dependent variable with all possible independent variables.</answer>
			<answer>the step-up regression procedure.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
<definition word="full model">the regression model that contains all the independent variables.</definition>
<P>The other boundary in multiple regression is called the <index>full model</index>, or model with all possible predictor variables included. To construct the full model, all predictor variables are included in the first block and the "Method" remains on the default value of "Enter." The three tables of output for life satisfaction seven years after college are presented below.</P>
<P>
	<figure>
		<description>The multiple regression summary table predicting life satisfaction seven years after college with eleven independent variables is shown. The value for the multiple R is .976 and the R squared change is .953. The significance level for the R squared change is .094.</description>
		<url>Images/mlt0770.gif</url>
		<width>739</width>
		<height>181</height>
		<align></align>
		<caption>Multiple regression summary table predicting life satisfaction seven years after college with eleven independent variables.</caption>
		<alt>Multiple regression summary table predicting life satisfaction seven years after college with eleven independent variables.</alt>
	</figure>
</P>
<definition word="adjusted R squared">a multiple correlation coefficient squared that has been adjusted for both the number of independent variables in the model and the number of observations.</definition>
		<TestItem type="MC">
			<question>The difference between the unadjusted and adjusted multiple R squared will be greatest when</question>
			<answer type="correct">the number of independent variables and the number of observations are almost equal.</answer>
			<answer>there are a relatively large number of observations.</answer>
			<answer>collinearity exists between the independent variables.</answer>
			<answer>the unadjusted multiple R is less than .30.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
<P>Note that the <index>unadjusted multiple R</index> for these data is .976 (R<SUP>2</SUP> = .953), but that the <index>adjusted R squared</index> is only .779. This rather large change is due to the fact that a relatively small number of observations is being predicted with a relatively large number of variables. Because the full model contains every predictor variable, its unadjusted multiple R is an upper limit: any subset of predictor variables will have a value of multiple R that is no larger than .976.  Note also that these variables in combination do not significantly (Sig. F Change = .094) predict life satisfaction seven years after college.</P>
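<P>The size of the adjustment can be sketched in Python. The function below uses the common adjustment formula, 1 - (1 - R<SUP>2</SUP>)(N - 1)/(N - K - 1); it is assumed, not confirmed by the output shown here, that SPSS uses the same formula. The function name is made up, and N = 15 is the number of complete cases implied by the degrees of freedom in the ANOVA table.</P>

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Shrink R squared for the number of predictors (K) and observations (N)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


r = 0.976  # unadjusted multiple R from the summary table
small_sample = adjusted_r_squared(r ** 2, n_obs=15, n_predictors=11)
# Same R with ten times as many observations (a hypothetical case).
large_sample = adjusted_r_squared(r ** 2, n_obs=150, n_predictors=11)
print(round(small_sample, 3))
print(round(large_sample, 3))
```

<P>With N = 15 and K = 11 the result rounds to .779, the adjusted value in the summary table. With the hypothetical N = 150 the adjusted value stays close to the unadjusted R<SUP>2</SUP>, which shows why the adjustment matters most when the number of predictors approaches the number of observations.</P>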
<P>The middle table, <index>ANOVA</index>, doesn't provide much information beyond the R<SUP>2</SUP> change in the previous table.  Note that the "Sig. F Change" in the preceding table is the same as the "Sig." value in the "ANOVA" table.  This table was more useful in previous incarnations of multiple regression analysis (see <refs>Draper and Smith, 1981</refs>).</P>
<P>
	<figure>
		<description>The multiple regression ANOVA table from the SPSS output predicting life satisfaction seven years after college with eleven independent variables is presented. The significance value for the F statistic in this table is .094.</description>
		<url>images/mlt0771.gif</url>
		<width>534</width>
		<height>189</height>
		<align></align>
		<caption>Multiple regression ANOVA table predicting life satisfaction seven years after college with eleven independent variables.</caption>
		<alt>Multiple regression ANOVA table predicting life satisfaction seven years after college with eleven independent variables.</alt>
	</figure>
</P>
<P>
	<figure>
		<description>The multiple regression coefficients table from the SPSS output predicting life satisfaction seven years after college with eleven independent variables is presented. None of the coefficients are statistically significant at the .05 level.</description>
		<url>images/mlt0772.gif</url>
		<width>527</width>
		<height>366</height>
		<align></align>
		<caption>Multiple regression coefficients table predicting life satisfaction seven years after college with eleven independent variables.</caption>
		<alt>Multiple regression coefficients table predicting life satisfaction seven years after college with eleven independent variables.</alt>
	</figure>
</P>
<P>The full model is not <index>statistically significant</index> (F = 5.493, df = 11,3, sig.= .094), even though life satisfaction in college was statistically significant (p&lt;.05) by itself. The ANOVA table had a total <index>degrees of freedom</index> of 14 because five of the twenty observations had <index>missing data</index> and were not included in the analysis, leaving fifteen. The other degree of freedom corresponds to the <index>intercept</index> (<index>constant</index>) of the regression line. The method of handling missing data is called "<index>listwise</index>" because all data for a particular observation are excluded if the value of any single variable is missing.</P>
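<P>Listwise deletion and the resulting degrees of freedom can be traced with a short Python sketch. The rows are copied from the data table (without the Subject column) and None marks a blank cell; the variable names are made up for illustration. In the table as printed, five rows have a blank in one of the twelve columns used by this model, leaving fifteen complete cases.</P>

```python
# Column order: Age, Gender, Married, IncomeC, HealthC, ChildC, LifeSatC,
# SES, Smoke, Spirit, Finish, LifeSat7, Income7.
data = [
    [16, 0, 0, 0, 38, 0, 17, 17, 1, 30, 1, 22, 26],
    [28, 1, 0, 0, 38, 0, 16, 21, 1, 39, 1, 20, 15],
    [16, None, 1, 16, 52, 1, 39, 40, 0, 30, 1, 42, 88],
    [23, 1, 0, 6, 51, 0, 22, 31, 0, 60, 1, 48, 73],
    [18, 0, 1, 7, 52, 0, 25, 38, 0, 32, 0, None, 14],
    [30, 0, 1, 25, 43, 2, 53, 36, 1, 39, 0, 33, 38],
    [19, 0, 1, 19, 55, 0, 28, 41, 0, 51, 1, 33, 45],
    [19, 1, 0, 0, 52, 2, 17, 52, 0, 35, 1, 21, 16],
    [34, 0, 0, 29, 60, 2, 20, 56, 0, 23, 1, 26, 64],
    [16, 1, 0, 0, 53, 0, 21, 27, 0, 29, 0, 37, 19],
    [25, 1, 0, 3, 39, 0, 18, 34, 1, 61, 1, 40, 56],
    [16, 1, 1, 1, 42, 0, 31, 29, 1, 58, 1, 35, 70],
    [16, None, 0, 0, 43, 0, 15, 28, 1, 39, 1, 32, 71],
    [16, 0, 1, 18, 54, 1, 34, 38, 0, 40, 0, 37, 44],
    [16, 1, 0, 0, 52, 0, 20, 38, 0, 27, 1, 35, 25],
    [32, 1, 1, 26, 54, 1, 39, 37, 0, 30, None, 47, 38],
    [19, 0, 0, 0, 46, 0, 17, 25, 0, 36, 1, 26, 39],
    [17, 1, 1, 10, 55, 2, 48, 53, 0, 43, 0, 42, 6],
    [24, 0, 0, 17, 52, 0, 16, 36, 0, 54, 1, 38, 75],
    [26, 1, 1, None, 57, 1, 39, 41, 0, 32, 1, 42, 67],
]

PREDICTORS = 11  # every column except LifeSat7 and Income7

# The model for LifeSat7 uses the 11 predictors plus LifeSat7 itself.
used = [row[:12] for row in data]
# Listwise deletion: drop a case if any of its values is missing.
complete = [row for row in used if None not in row]

n = len(complete)
df_total = n - 1
df_regression = PREDICTORS
df_residual = n - PREDICTORS - 1
print(n, df_regression, df_residual, df_total)

# F ratio implied by the rounded R squared from the summary table.
r_squared = 0.953
f_ratio = (r_squared / df_regression) / ((1 - r_squared) / df_residual)
print(round(f_ratio, 2))
```

<P>The F ratio computed this way is roughly 5.5; it differs slightly from the reported 5.493 because the R<SUP>2</SUP> used here is rounded to three decimals.</P>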
<P>The "Sig." column on the "Coefficients" table presents the statistical significance of that variable given all the other variables have been entered into the model. Note that no variables are statistically significant in this table. The variable "Married" comes close (Sig. = .055), but close doesn't count in significance testing. </P>
		<TestItem type="MC">
			<question>The value in the significance column on the coefficients table in multiple regression is the</question>
			<answer type="correct">exact significance level of that variable given all the other variables have been entered into the model.</answer>
			<answer>statistical significance of the full model relative to the partial model.</answer>
			<answer>exact significance level of the full model.</answer>
			<answer>probability of the coefficient given that there were real effects.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
<P>Previously it was found that the <index>correlation</index> between being married and life satisfaction seven years after college was relatively high and positive (.454), meaning that individuals who were married in college were generally more satisfied with life seven years later. The <index>regression weight</index> for this same variable in the <index>full model</index> was negative (-20.542), meaning that over twenty points would be subtracted from an individual's predicted life satisfaction score seven years after college if they were married in college! Such are the nuances of multiple regression.</P>
<P>Partial output for the full model predicting the other dependent measure, income seven years after college, is presented below.</P>
<P>
	<figure>
		<description>The multiple regression summary table predicting income seven years after college with eleven independent variables is shown. The value for the multiple R is .905 and the R squared change is .820. The significance level for the R squared change is .333.</description>
		<url>images/mlt0773.gif</url>
		<width>739</width>
		<height>181</height>
		<align></align>
		<caption>Multiple regression summary table predicting income seven years after college with eleven independent variables.</caption>
		<alt>Multiple regression summary table predicting income seven years after college with eleven independent variables.</alt>
	</figure>
</P>
<P>
	<figure>
		<description>The multiple regression coefficients table from the SPSS output predicting income seven years after college with eleven independent variables is presented. None of the coefficients are statistically significant at the .05 level.</description>
		<url>images/mlt0774.gif</url>
		<width>527</width>
		<height>366</height>
		<align></align>
		<caption>Multiple regression coefficients table predicting income seven years after college with eleven independent variables.</caption>
		<alt>Multiple regression coefficients table predicting income seven years after college with eleven independent variables.</alt>
	</figure>
</P>
<P>The results are similar to the prediction on life satisfaction, with an unadjusted multiple R of .905, giving an upper limit to the combined predictive power of all the predictor variables.</P>
</section>
<section>
<P><h2>Fitting Sequential Models</h2></P>
<definition word="sequential regression models">a statistical procedure that uses multiple regression to examine how adding independent variables in stages affects the prediction equation.</definition>
<P>After the boundaries of the regression analysis have been established, the area between the extremes may be examined to get an idea of the interaction between the independent variables with respect to prediction. There are different schools of thought about how this should be accomplished. One school, hierarchical regression, argues that theory should drive the statistical model and that the decision of what and when terms enter the regression model should be determined by theoretical concerns. A second school of thought, stepwise regression, argues that the data can speak for themselves and allows the procedure to select predictor variables to enter the regression equation.</P>
<definition word="hierarchical regression">a sequential regression model procedure that enters the independent variables in a predetermined sequence.</definition>
		<TestItem type="MC">
			<question>Entering blocks of independent variables in an order determined by the statistician is done using</question>
			<answer type="correct">hierarchical regression.</answer>
			<answer>step-up regression.</answer>
			<answer>step-down regression.</answer>
			<answer>step-around regression.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>A mantra that could be associated with the author of the text would be</question>
			<answer type="correct">Let the theory drive the data analysis.</answer>
			<answer>Let the data analysis drive the theory.</answer>
			<answer>Hypothesis testing of model assumptions is a critical first step.</answer>
			<answer>Start with the complex model and work toward the simple model.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
<P><h3>Hierarchical Regression</h3></P>
<P><index>Hierarchical regression</index> adds terms to the regression model in stages. At each stage, an additional term or terms are added to the model and the change in R<SUP>2</SUP> is calculated. A hypothesis test is then done to determine whether the change in R<SUP>2</SUP> is significantly different from zero. </P>
<P>Using the example data, suppose a researcher wishes to examine the prediction of life satisfaction seven years after college in several stages. In the first block, he/she enters demographic variables over which the individual has little or no control: age, gender, and socio-economic status of parents. In the second block, variables are entered over which the individual has at least some control, such as smoking, having children, being married, etc. The third block consists of the two attitudinal variables, life satisfaction and spirituality.  This is accomplished in <index>SPSS</index> by entering the independent variables in blocks.  Be sure the R<SUP>2</SUP> change box is selected as a "Statistics" option.</P>
<P>The first table is a table of what variables were entered or removed at the different stages. The second table is a summary of the results of the different models.</P>
<P>
	<figure>
		<description>The multiple regression summary table of the SPSS output predicting life satisfaction seven years after college with eleven independent variables sequentially entered as three blocks of independent variables (SES, Gender, and Age), (Finish, Married, Smoke, ChildC, HealthC, and IncomeC), and (SpiritC and LifeSatC) is shown. The values for the multiple Rs are .319, .900, and .976 and the corresponding R squared change values are .102, .708, and .143. The significance levels for the R squared change (sig. F change) are .746, .117, and  .124.</description>
		<url>images/mlt0775.gif</url>
		<width>739</width>
		<height>259</height>
		<align></align>
		<caption>Multiple regression summary table predicting life satisfaction seven years after college with eleven independent variables sequentially fitted in three blocks.</caption>
		<alt>Multiple regression summary table predicting life satisfaction seven years after college with eleven independent variables sequentially fitted in three blocks.</alt>
	</figure>
</P>
<P>The largest change in R<SUP>2</SUP> was from model 1 to model 2, with an R<SUP>2</SUP> change of .708 from .102 to .810. This change was not significant, however, nor were the R<SUP>2</SUP> changes associated with either of the other two models. The final model has the same multiple R as the full model presented in an earlier section.</P>
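<P>The test of each R<SUP>2</SUP> change can be sketched in Python using the usual F ratio for comparing nested regression models. The function name and argument names are made up, and N = 15 complete cases is assumed from the degrees of freedom reported for the full model. The R<SUP>2</SUP> values are the rounded ones from the summary table, so the results are approximate.</P>

```python
def f_change(r2_reduced, r2_full, n_added, n_obs, n_predictors_full):
    """F ratio and degrees of freedom for the R squared change
    when n_added terms are entered into the model."""
    df_residual = n_obs - n_predictors_full - 1
    numerator = (r2_full - r2_reduced) / n_added
    denominator = (1 - r2_full) / df_residual
    return numerator / denominator, (n_added, df_residual)


N = 15  # complete cases (assumed)

# (R squared before, R squared after, terms added, predictors in model)
steps = [
    (0.000, 0.102, 3, 3),   # block 1: SES, Gender, Age
    (0.102, 0.810, 6, 9),   # block 2: six variables added
    (0.810, 0.953, 2, 11),  # block 3: SpiritC and LifeSatC
]

results = []
for r2_before, r2_after, added, k in steps:
    f, df = f_change(r2_before, r2_after, added, N, k)
    results.append((f, df))
    print("df =", df, " F change =", round(f, 2))
```

<P>Note that the largest change in R<SUP>2</SUP> does not produce the largest F ratio in this sketch, because each ratio also depends on how many terms were added and how many residual degrees of freedom remain.</P>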
<P>The third table presents the <index>ANOVA</index> significance table for the three models. The fourth table contains the <index>regression weights</index> and significance levels for each model. As before, the "Sig." column is a hypothesis test of the significance of that variable, given that all the other variables at that stage have been entered into the model.</P>
<P>
	<figure>
		<description>The multiple regression coefficients table of the SPSS output predicting life satisfaction seven years after college with eleven independent variables sequentially entered as three blocks of independent variables (SES, Gender, and Age), (Finish, Married, Smoke, ChildC, HealthC, and IncomeC), and (SpiritC and LifeSatC) is shown. As before, none of the coefficients, other than the constant term, are statistically significant.</description>
		<url>images/mlt0776.gif</url>
		<width>527</width>
		<height>621</height>
		<align></align>
		<caption>Multiple regression coefficients table predicting life satisfaction seven years after college with eleven independent variables sequentially fitted in three blocks.</caption>
		<alt>Multiple regression coefficients table predicting life satisfaction seven years after college with eleven independent variables sequentially fitted in three blocks.</alt>
	</figure>
</P>
<P>Note how the values of the regression weights and significance levels change as a function of when they have been entered into the model and what other variables are present.</P>
<P>The fifth table presents information about variables <I>not</I> in the regression equation at any particular stage, called <index>excluded variables</index>.</P>
		<TestItem type="MC">
			<question>As additional independent variables are added to or removed from a multiple regression model, which of the following values will remain constant for a given independent variable?</question>
			<answer>the significance level of the variable in the coefficients table</answer>
			<answer>the standardized regression coefficient</answer>
			<answer>the unstandardized regression coefficient</answer>
			<answer type="correct">none of these values will remain constant</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
<P>
	<figure>
		<description>The multiple regression excluded variables table of the SPSS output predicting life satisfaction seven years after college with eleven independent variables sequentially entered as three blocks of independent variables (SES, Gender, and Age), (Finish, Married, Smoke, ChildC, HealthC, and IncomeC), and (SpiritC and LifeSatC) is shown.</description>
		<url>images/mlt0777.gif</url>
		<width>534</width>
		<height>357</height>
		<align></align>
		<caption>Multiple regression excluded variables table predicting life satisfaction seven years after college with eleven independent variables sequentially fitted in three blocks.</caption>
		<alt>Multiple regression excluded variables table predicting life satisfaction seven years after college with eleven independent variables sequentially fitted in three blocks.</alt>
	</figure>
</P>
<definition word="partial correlation coefficient">the correlation between an independent variable and the dependent variable after the effects of the variables already in the model have been removed from both.</definition>
<definition word="tolerance">the degree to which an independent variable cannot be predicted by other independent variables.</definition>
<definition word="collinearity">the degree to which an independent variable can be predicted by other independent variables.</definition>
<P>The value of "Beta In" is the size of the <index>standardized regression weight</index> that the variable would have if it were entered into the model by itself at the next stage. The "Sig." column is the significance level of the R<SUP>2</SUP> change that would result if the variable were entered into the regression equation. In this case, it can be seen that individually both INCOMEC and SPIRITC would significantly enter the regression model in the second stage. The "<index>Partial Correlation</index>" is the correlation between that variable and the dependent variable after the variables already in the model have been partialled out of both. The larger the absolute value of the partial correlation, the greater the change in R<SUP>2</SUP> if that variable were entered into the equation by itself at the next stage. </P>
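<P>The link between the partial correlation and the change in R<SUP>2</SUP> can be sketched in Python. The sketch assumes the standard definition of the partial correlation, in which both the dependent variable and the candidate variable have the variables already in the model removed; under that definition the change in R<SUP>2</SUP> equals the squared partial correlation times the proportion of variance the current model leaves unexplained. The function name and the three partial correlations are hypothetical, not values from the SPSS output.</P>

```python
def r_squared_change(partial_r, r_squared_previous):
    """Increase in R squared if one excluded variable were entered next."""
    return partial_r ** 2 * (1 - r_squared_previous)


# Hypothetical partial correlations for three excluded variables,
# with R squared = .102 for the model already fitted.
R2_PREVIOUS = 0.102
candidates = [("A", 0.20), ("B", -0.50), ("C", 0.80)]
for name, pr in candidates:
    print(name, round(r_squared_change(pr, R2_PREVIOUS), 3))
```

<P>The sign of the partial correlation does not matter for the size of the change; only its absolute value does.</P>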
<P>As described in the help files of SPSS, the "Collinearity Statistics Tolerance" is "calculated as 1 minus R squared for an independent variable when it is predicted by the other independent variables already included in the analysis." The help files interpret this statistic as follows: "A variable with very low <index>tolerance</index> contributes little information to a model, and can cause computational problems." (SPSS v. 10 help files.) In this case LIFESATC has a low Collinearity Statistics Tolerance (7.835E-02 or .07835) in model 2 and might cause problems if entered into the model at that point.  Problems of <index>collinearity</index> were discussed in an earlier chapter in this text.</P>
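<P>The tolerance calculation quoted above can be sketched in a few lines of Python. This is a minimal illustration, not SPSS's own routine: the helper functions (solve, r_squared, tolerance) and the scores x1, x2, and x3 are hypothetical, and x3 is deliberately built to be almost the sum of x1 and x2.</P>

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(predictors, y):
    """R squared for the least squares regression of y on the predictors."""
    cols = [[1.0] * len(y)] + predictors  # constant term first
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    weights = solve(xtx, xty)
    predicted = [sum(w * col[i] for w, col in zip(weights, cols))
                 for i in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_residual = sum((obs - pred) ** 2 for obs, pred in zip(y, predicted))
    ss_total = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - ss_residual / ss_total


def tolerance(candidate, included):
    """1 minus R squared when the candidate is predicted by the included variables."""
    return 1.0 - r_squared(included, candidate)


# Hypothetical scores, invented for illustration only.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
x3 = [3.2, 2.9, 7.0, 7.1, 10.8, 11.1, 15.0, 14.9]  # nearly x1 + x2

tol_x1 = tolerance(x1, [x2])      # x1 predicted by x2 alone
tol_x3 = tolerance(x3, [x1, x2])  # x3 predicted by x1 and x2
print("tolerance of x1 given x2:", round(tol_x1, 3))
print("tolerance of x3 given x1 and x2:", round(tol_x3, 5))
```

<P>In this sketch the tolerance of x3 is very close to zero because x3 is almost completely predicted by x1 and x2, which is the situation the help files warn about.</P>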
		<TestItem type="MC">
			<question>Collinearity occurs when</question>
			<answer type="correct">some of the independent variables are highly correlated.</answer>
			<answer>the multiple R is less than .30.</answer>
			<answer>an independent variable is highly correlated with the dependent variable.</answer>
			<answer>the residuals are not normally distributed.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>The following figure presents example SPSS output from a study predicting number of offences from various predictor variables.</question>
				<figure>
					<description>Example SPSS output consisting of a model summary and coefficients table used for testing purposes.</description>
					<url>mregress1.gif</url>
					<width>659</width>
					<height>614</height>
					<align></align>
					<caption></caption>
					<alt>Figure for test</alt>
				</figure>
			<answer type="correct"></answer>
			<answer></answer>
			<answer></answer>
			<answer></answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>04/12/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>The following figure presents example SPSS output from a study predicting number of offences from various predictor variables. The analysis could be best described as</question>
				<figure>
					<description>Example SPSS output consisting of a model summary and coefficients table used for testing purposes.</description>
					<url>mregress1.gif</url>
					<width>659</width>
					<height>614</height>
					<align></align>
					<caption></caption>
					<alt>Figure for test</alt>
				</figure>
			<answer type="correct">hierarchical.</answer>
			<answer>step-up.</answer>
			<answer>step-down.</answer>
			<answer>non-linear.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>04/12/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>The following figure presents example SPSS output from a study predicting number of offences from various predictor variables. The best predictor, given all the other variables were already entered in the model would be</question>
				<figure>
					<description>Example SPSS output consisting of a model summary and coefficients table used for testing purposes.</description>
					<url>mregress1.gif</url>
					<width>659</width>
					<height>614</height>
					<align></align>
					<caption></caption>
					<alt>Figure for test</alt>
				</figure>
			<answer type="correct">AGE@REF</answer>
			<answer>VIQ</answer>
			<answer>subabuse</answer>
			<answer>speced</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>04/12/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>The following figure presents example SPSS output from a study predicting number of offences from various predictor variables. The variable that least predicts given all the other variables are entered into the model would be</question>
				<figure>
					<description>Example SPSS output consisting of a model summary and coefficients table used for testing purposes.</description>
					<url>mregress1.gif</url>
					<width>659</width>
					<height>614</height>
					<align></align>
					<caption></caption>
					<alt>Figure for test</alt>
				</figure>
			<answer type="correct">speced</answer>
			<answer>AGE@REF</answer>
			<answer>VIQ</answer>
			<answer>subabuse</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>04/12/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>The following figure presents example SPSS output from a study predicting number of offences from various predictor variables. The conclusion reached with respect to the three IQ measures could best be stated as</question>
				<figure>
					<description>Example SPSS output consisting of a model summary and coefficients table used for testing purposes.</description>
					<url>mregress1.gif</url>
					<width>659</width>
					<height>614</height>
					<align></align>
					<caption></caption>
					<alt>Figure for test</alt>
				</figure>
			<answer type="correct">in combination they significantly increased the predictive power of the model.</answer>
			<answer>in combination they increased the predictive power of the model, but not significantly.</answer>
			<answer>any increase in predictive power was mainly due to VIQ and not FSIQ or PMIQ.</answer>
			<answer>individually all three measures significantly increased the predictive power of the model given that the other IQ measures were already in the model.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>04/12/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>The following figure presents example SPSS output from a study predicting number of offences from various predictor variables. The conclusion reached with respect to AGE@REF measure could best be stated as</question>
				<figure>
					<description>Example SPSS output consisting of a model summary and coefficients table used for testing purposes.</description>
					<url>mregress1.gif</url>
					<width>659</width>
					<height>614</height>
					<align></align>
					<caption></caption>
					<alt>Figure for test</alt>
				</figure>
			<answer type="correct">the variable became a better predictor as more variables were entered into the model.</answer>
			<answer>the variable was statistically significant by itself.</answer>
			<answer>the larger the size of this variable, the fewer the number of offences.</answer>
			<answer>this variable would be eliminated first in a step-down regression procedure.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>04/12/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>

		<TestItem type="MC">
			<question>The following figure presents example SPSS output predicting current salary using a regression analysis of only the clerical workers in the Employees.sav file included with the SPSS package. The analysis could best be described as</question>
				<figure>
					<description>Example SPSS output consisting of a model summary and coefficients table used for testing purposes.</description>
					<url>mregress2.gif</url>
					<width>554</width>
					<height>689</height>
					<align></align>
					<caption></caption>
					<alt>Figure for test</alt>
				</figure>
			<answer type="correct">hierarchical.</answer>
			<answer>step-up.</answer>
			<answer>step-down.</answer>
			<answer>non-linear.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>04/12/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>The following figure presents example SPSS output predicting current salary using a regression analysis of only the clerical workers in the Employees.sav file included with the SPSS package. With respect to the Months since hire variable</question>
				<figure>
					<description>Example SPSS output consisting of a model summary and coefficients table used for testing purposes.</description>
					<url>mregress2.gif</url>
					<width>554</width>
					<height>689</height>
					<align></align>
					<caption></caption>
					<alt>Figure for test</alt>
				</figure>
			<answer type="correct">the fewer the number of months, the less the current salary, given the other variables in the model.</answer>
			<answer>the greater the number of months, the less the current salary, given the other variables in the model.</answer>
			<answer>the direction of the relationship of this variable to the current salary changed as a function of the other variables in the equation.</answer>
			<answer>no conclusions about this data can be reached based on the information given.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>04/12/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>The following figure presents example SPSS output predicting current salary using a regression analysis of only the clerical workers in the Employees.sav file included with the SPSS package. In making a 95% confidence interval for a given score using all the included variables, the range of the confidence interval would be approximately</question>
				<figure>
					<description>Example SPSS output consisting of a model summary and coefficients table used for testing purposes.</description>
					<url>mregress2.gif</url>
					<width>554</width>
					<height>689</height>
					<align></align>
					<caption></caption>
					<alt>Figure for test</alt>
				</figure>
			<answer type="correct">$19098</answer>
			<answer>$4872</answer>
			<answer>$9436</answer>
			<answer>$2593</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>04/12/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>The following figure presents example SPSS output predicting current salary using a regression analysis of only the clerical workers in the Employees.sav file included with the SPSS package. Conclusions with respect to gender and minority status might include</question>
				<figure>
					<description>Example SPSS output consisting of a model summary and coefficients table used for testing purposes.</description>
					<url>mregress2.gif</url>
					<width>554</width>
					<height>689</height>
					<align></align>
					<caption></caption>
					<alt>Figure for test</alt>
				</figure>
			<answer type="correct">everything else being equal, both females and minorities were paid significantly less.</answer>
			<answer>everything else being equal, females were paid less, but not significantly less, and minorities were paid significantly less.</answer>
			<answer>everything else being equal, females were paid significantly less while minorities were paid significantly more.</answer>
			<answer>everything else being equal, both females and minorities were paid significantly more.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>04/12/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>The following figure presents example SPSS output predicting current salary using a regression analysis of only the clerical workers in the Employees.sav file included with the SPSS package. Which of the following variables has results that go against common sense?</question>
				<figure>
					<description>Example SPSS output consisting of a model summary and coefficients table used for testing purposes.</description>
					<url>mregress2.gif</url>
					<width>554</width>
					<height>689</height>
					<align></align>
					<caption></caption>
					<alt>Figure for test</alt>
				</figure>
			<answer type="correct">Previous experience</answer>
			<answer>Months since hire</answer>
			<answer>Beginning salary</answer>
			<answer>Minority classification</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>04/12/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>The following figure presents example SPSS output predicting current salary using a regression analysis of only the clerical workers in the Employees.sav file included with the SPSS package. Everything else being equal, the best predictor of current salary would be</question>
				<figure>
					<description>Example SPSS output consisting of a model summary and coefficients table used for testing purposes.</description>
					<url>mregress2.gif</url>
					<width>554</width>
					<height>689</height>
					<align></align>
					<caption></caption>
					<alt>Figure for test</alt>
				</figure>
			<answer type="correct">Beginning salary</answer>
			<answer>Previous experience</answer>
			<answer>Minority classification</answer>
			<answer>Months since hire</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>04/12/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
</section>
<section>
<P><h3>Step-up Regression</h3></P>
	<TestItem type="MC">
		<question>A full model in multiple regression will likely be "less significant" than a partial model </question>
		<answer type="incorrect">when multicollinearity is present</answer>
		<answer type="correct">when the additional variables only slightly increase predictive power</answer>
		<answer type="incorrect">when the additional variables are uncorrelated with the predictor variables already in the model</answer>
		<answer type="incorrect">it is mathematically impossible</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept></concept>
	</TestItem>
	<TestItem type="MC">
		<question>In a step-up regression, the next variable to be entered into the regression equation </question>
		<answer type="incorrect">will have the largest value of "Beta in" in the "Excluded Variables" table.</answer>
		<answer type="incorrect">will have the smallest value of "Beta in" in the "Excluded Variables" table.</answer>
		<answer type="correct">will have the largest value of "Partial Correlation" in the "Excluded Variables" table.</answer>
		<answer type="incorrect">will have the largest value of "Sig." in the "Coefficients" table.</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept></concept>
	</TestItem>
	<TestItem type="MC">
		<question> The step-up and step-down procedures in multiple regression </question>
		<answer type="incorrect">will always converge on the same solution</answer>
		<answer type="incorrect">will always provide an "optimal" solution using the least-squares criterion</answer>
		<answer type="incorrect">require the assumption of multicollinearity be satisfied for the uncorrelated predictor variables</answer>
		<answer type="correct">are viewed as the mindless application of statistical procedures to multivariate data by some statisticians</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept></concept>
	</TestItem>
		<TestItem type="MC">
			<question>In a step-down regression procedure, the next variable to be removed will have the</question>
			<answer type="correct">highest significance level for that coefficient.</answer>
			<answer>highest standardized coefficient.</answer>
			<answer>highest unstandardized coefficient.</answer>
			<answer>lowest unstandardized coefficient.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
<definition word="step-up regression">a multiple regression modeling procedure that sequentially adds variables to the regression equation based on how much additional predictive power the variable adds to the prediction equation.</definition>
<P>Rather than entering all the variables as a block, <index>step-up regression</index> enters the variables one at a time. At each stage, the variable entered is the one that causes the greatest R<SUP>2</SUP> increase, given the variables already entered into the model. To do a step-up regression using SPSS, enter all the variables in the first block and select "Method" as "Forward."</P>
<P>The results of the step-up regression can be better understood if the <index>correlation</index> coefficients are recomputed between life satisfaction seven years after college and all the predictor variables, using the "<index>Listwise</index>" option for <index>missing data</index>.</P>
<P>
	<figure>
		<description>The SPSS output of a correlation matrix using the listwise deletion of missing data of eleven possible independent variables with life satisfaction seven years after college is illustrated. This table has two eleven by one tables shown, with each table having all independent variables listed as rows and the dependent variable listed as columns. The first table shows correlation coefficients and the second significance levels. All correlations are based on a sample size of 15 scores, those being the scores with complete data on all variables.</description>
		<url>images/mlt0778.gif</url>
		<width>244</width>
		<height>521</height>
		<align></align>
		<caption>Correlation matrix using the listwise deletion of missing data of eleven possible independent variables with life satisfaction seven years after college.</caption>
		<alt>Correlation matrix using the listwise deletion of missing data of eleven possible independent variables with life satisfaction seven years after college.</alt>
	</figure>
</P>
<P>Note that the correlation coefficients have changed from the original table and that the highest correlation is with SPIRITC with a value of .587.  The SPIRITC variable, then, would enter the step-up regression in the first step.  The <index>partial correlation</index> of all the remaining variables and the <index>residual</index> of the first stage model would then be computed.  The variable with the largest partial correlation in absolute value would be entered into the regression at the next step, provided that it was statistically significant.  The criteria for entering variables into the regression model may be optionally adjusted.</P>
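<P>The choice of the second variable can be sketched in Python as follows. This is only an illustration under stated assumptions: the scores are invented toy values (not the study's data), the helper names are hypothetical, and the partial correlation is computed in the standard way, by removing the entered variable from both the candidate and the dependent variable before correlating. The significance test for entry is omitted.</P>

```python
from math import sqrt


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residuals(predictors, y):
    """Observed minus predicted y from a least squares fit on the predictors."""
    cols = [[1.0] * len(y)] + predictors  # constant term first
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    weights = solve(xtx, xty)
    return [y[i] - sum(w * col[i] for w, col in zip(weights, cols))
            for i in range(len(y))]


def pearson(u, v):
    """Pearson correlation coefficient between two lists of scores."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    s_uv = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v))
    s_uu = sum((p - mean_u) ** 2 for p in u)
    s_vv = sum((q - mean_v) ** 2 for q in v)
    return s_uv / sqrt(s_uu * s_vv)


def partial_correlation(candidate, y, entered):
    """Correlate what is left of the candidate and of y after the entered variables."""
    return pearson(residuals(entered, candidate), residuals(entered, y))


# Toy scores invented for illustration; they are NOT the study's data.
spirit = [8, 8, 8, 8, 12, 12, 12, 12]
finish = [0, 0, 1, 1, 0, 0, 1, 1]
health = [5.5, 6.5, 3.5, 4.5, 5.5, 6.5, 3.5, 4.5]
life_sat = [18, 20, 16, 14, 26, 24, 20, 22]

entered = [spirit]  # the first-step model already contains spirit
candidates = {"finish": finish, "health": health}
partials = {name: partial_correlation(scores, life_sat, entered)
            for name, scores in candidates.items()}
# The sign does not matter for entry, so compare absolute values.
next_in = max(partials, key=lambda name: abs(partials[name]))
print(partials, "next variable entered:", next_in)
```

<P>In the toy data the candidate with the negative partial correlation is chosen because its absolute value is the larger of the two.</P>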
<P>
	<figure>
		<description>The multiple regression model summary table of the SPSS output predicting life satisfaction seven years after college with eleven independent variables entered using the step-up option is shown. The table shows two models, the first with the variable SpiritC and the second with the variables SpiritC and Finish. The multiple R for the first model is .587 and the value for the second is .743. Corresponding values for R squared change are .344 and .208 with significance levels of .022 and .036.</description>
		<url>images/mlt0779.gif</url>
		<width>739</width>
		<height>205</height>
		<align></align>
		<caption>Multiple regression summary table predicting life satisfaction seven years after college with eleven independent variables using a step-up procedure.</caption>
		<alt>Multiple regression summary table predicting life satisfaction seven years after college with eleven independent variables using a step-up procedure.</alt>
	</figure>
</P>
<P>The "Model Summary" table shows that two variables, SPIRITC and FINISH, are entered into the prediction model with a <index>multiple R</index> of .743. The SPIRITC variable was entered first (it had the largest correlation with life satisfaction) and FINISH was entered next.</P>
<P>The "Coefficients" table is presented next.</P>
<P>
	<figure>
		<description>The multiple regression coefficients table of the SPSS output predicting life satisfaction seven years after college with eleven independent variables entered using the step-up option is shown. The table shows two models, the first with the variable SpiritC and the second with the variables SpiritC and Finish. The unstandardized regression coefficients for the second model are 20.307 for the constant, .449 for SpiritC, and -8.374 for Finish. Corresponding significance levels are .004, .005, and .036.</description>
		<url>images/mlt0780.gif</url>
		<width>527</width>
		<height>239</height>
		<align></align>
		<caption>Multiple regression coefficients table predicting life satisfaction seven years after college with eleven independent variables using a step-up procedure.</caption>
		<alt>Multiple regression coefficients table predicting life satisfaction seven years after college with eleven independent variables using a step-up procedure.</alt>
	</figure>
</P>
<P>The final table presents information about variables not in the regression equation.</P>
<P>
	<figure>
		<description>The multiple regression excluded variables table of the SPSS output predicting life satisfaction seven years after college with eleven independent variables entered using the step-up option is shown. The table shows two models, the first with the variable SpiritC and the second with the variables SpiritC and Finish. The first table shows all variables except SpiritC and the second shows all variables except SpiritC and Finish. In the first table, both Finish (.036) and HealthC (.044) are statistically significant, while in the second, none of the variables are statistically significant.</description>
		<url>images/mlt0781.gif</url>
		<width>534</width>
		<height>541</height>
		<align></align>
		<caption>Multiple regression excluded variables table predicting life satisfaction seven years after college with eleven independent variables using a step-up procedure.</caption>
		<alt>Multiple regression excluded variables table predicting life satisfaction seven years after college with eleven independent variables using a step-up procedure.</alt>
	</figure>
</P>
<P>At the conclusion of the first model, both FINISH and HEALTHC would significantly (p&lt;.05) enter the regression equation at the next step. Since FINISH had a larger partial correlation in absolute value (-.653) than HEALTHC (.544), it was entered into the equation at the next step. Once FINISH was entered into the equation in model 2, HEALTHC would no longer significantly enter the regression model.</P>
</section>
<section>
<P><h3>Step-down Regression</h3></P>
		<TestItem type="MC">
			<question>The step-up and step-down regression procedures</question>
			<answer type="correct">will seldom result in similar regression models.</answer>
			<answer>will usually result in a regression model with the same variables.</answer>
			<answer>will usually result in a regression model with the same number of variables.</answer>
			<answer>will always result in a regression model with the same variables.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>The next variable to be eliminated in a step-down regression procedure</question>
			<answer type="correct">will have the smallest standardized coefficient.</answer>
			<answer>will have the smallest significance level.</answer>
			<answer>will have the largest unstandardized coefficient.</answer>
			<answer>will have the largest standardized coefficient.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
<definition word="step-down regression">a multiple regression modeling procedure that sequentially subtracts variables from the regression equation based on how little additional predictive power the variable adds to the current prediction equation.</definition>
<P>By starting with a full model and eliminating variables that do not significantly enter the regression equation, a <index>partial model</index> may be found. This can be accomplished in SPSS by selecting a "Method" of "<index>Backward</index>" in the linear regression procedure. As can be seen below, the results of this analysis differ greatly from those obtained with the Forward method.</P>
<P>The Model Summary table is presented below.</P>
<P>
	<figure>
		<description>The multiple regression model summary table of the SPSS output predicting life satisfaction seven years after college with eleven independent variables entered using the step-down option is shown. Starting with all eleven independent variables, five models are shown, sequentially dropping HealthC, Smoke, IncomeC, and Gender. The multiple R drops from a value of .989 to a value of .981. None of the R squared change values are statistically significant.</description>
		<url>images/mlt0782.gif</url>
		<width>739</width>
		<height>353</height>
		<align></align>
		<caption>Multiple regression summary table predicting life satisfaction seven years after college with eleven independent variables using a step-down procedure.</caption>
		<alt>Multiple regression summary table predicting life satisfaction seven years after college with eleven independent variables using a step-down procedure.</alt>
	</figure>
</P>
<P>As the table above illustrates, this method starts with the <index>full model</index> with an R<SUP>2</SUP> of .978. The variable HEALTHC is eliminated at the first step because it has the smallest <index>partial correlation</index> in absolute value of any variable, given that all the other predictor variables are entered into the regression equation. The next variables eliminated, in order, were SMOKE, INCOMEC, and GENDER, resulting in a model with seven predictor variables and a multiple R of .981. Note in the following table that every variable in Model 5 except AGE (p = .058) was significant at the .05 level.</P>
<P>
	<figure>
		<description>The multiple regression coefficients table of the SPSS output predicting life satisfaction seven years after college with eleven independent variables entered using the step-down option is shown. Starting with all eleven independent variables, five models are shown, sequentially dropping HealthC, Smoke, IncomeC, and Gender. In the final model, all coefficients are statistically significant except the constant term (.837) and Age (.058).</description>
		<url>images/mlt0783.gif</url>
		<width>527</width>
		<height>1151</height>
		<align></align>
		<caption>Multiple regression coefficients table predicting life satisfaction seven years after college with eleven independent variables using a step-down procedure.</caption>
		<alt>Multiple regression coefficients table predicting life satisfaction seven years after college with eleven independent variables using a step-down procedure.</alt>
	</figure>
</P>
<P>As before, the table of <index>excluded variables</index> gives information about variables not in the regression equation at each step.</P>
<P>
	<figure>
		<description>The multiple regression excluded variables table of the SPSS output predicting life satisfaction seven years after college with eleven independent variables entered using the step-down option is shown. Starting with all eleven independent variables, five models are shown, sequentially dropping HealthC, Smoke, IncomeC, and Gender. None of the excluded variables are statistically significant.</description>
		<url>images/mlt0784.gif</url>
		<width>534</width>
		<height>445</height>
		<align></align>
		<caption>Multiple regression excluded variables table predicting life satisfaction seven years after college with eleven independent variables using a step-down procedure.</caption>
		<alt>Multiple regression excluded variables table predicting life satisfaction seven years after college with eleven independent variables using a step-down procedure.</alt>
	</figure>
</P>
<P>Note that none of the excluded variables would significantly enter the final model.</P>
</section>
<section>
<P><h3>Caveats and Options</h3></P>
<P>Stepwise procedures allow the data to drive the theory. Some statisticians (I would have to include myself among them) object to the mindless application of statistical procedures to multivariate data.</P>
<P>There is no guarantee that the Forward and Backward procedures would agree on the same model, even if the options were adjusted so that the same number of variables were entered into the model. At some point a variable may no longer contribute to the regression model because of other variables in the model, even if it did contribute at an earlier point in time. For that reason SPSS provides a "STEPWISE" method, which tests at each stage to see if a variable already entered still belongs in the model. This method could be considered a combination of the <index>Forward</index> and <index>Backward</index> methods. (The "REMOVE" method, by contrast, removes a specified block of variables in a single step.) Using STEPWISE still does not guarantee convergence on a single regression model.</P>
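<P>A small Python sketch can show how the two procedures may disagree even when they end with the same number of variables. Everything here is an assumption made for illustration: the function names step_up and step_down are hypothetical, the scores are invented, and a fixed number of steps stands in for the significance tests that SPSS applies when entering or removing variables.</P>

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def model_r2(names, predictors, y):
    """R squared for the model that uses the named predictors."""
    cols = [[1.0] * len(y)] + [predictors[name] for name in names]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    weights = solve(xtx, xty)
    predicted = [sum(w * col[i] for w, col in zip(weights, cols))
                 for i in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_residual = sum((obs - pred) ** 2 for obs, pred in zip(y, predicted))
    ss_total = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - ss_residual / ss_total


def step_up(predictors, y, n_enter):
    """Enter, one at a time, the variable giving the largest R squared."""
    entered = []
    for _ in range(n_enter):
        excluded = [name for name in predictors if name not in entered]
        best = max(excluded,
                   key=lambda name: model_r2(entered + [name], predictors, y))
        entered.append(best)
    return entered


def step_down(predictors, y, n_remove):
    """Remove, one at a time, the variable whose loss hurts R squared least."""
    kept = list(predictors)
    for _ in range(n_remove):
        drop = max(kept, key=lambda name: model_r2(
            [k for k in kept if k != name], predictors, y))
        kept.remove(drop)
    return kept


# Invented scores: y equals x2 + x3 - 1 exactly, while x1 is a noisy copy of y.
predictors = {
    "x1": [5, 9, 7, 11, 9, 13, 11, 15],
    "x2": [2, 6, 6, 2, 10, 6, 6, 10],
    "x3": [6, 2, 4, 8, 2, 6, 8, 4],
}
y = [7, 7, 9, 9, 11, 11, 13, 13]

forward_model = step_up(predictors, y, 2)     # enter two variables
backward_model = step_down(predictors, y, 1)  # start with three, remove one
forward_r2 = model_r2(forward_model, predictors, y)
backward_r2 = model_r2(backward_model, predictors, y)
print(forward_model, round(forward_r2, 3))
print(backward_model, round(backward_r2, 3))
```

<P>With these scores x1 has the largest correlation with y and is entered first by the step-up sketch, yet it is the first variable removed by the step-down sketch, so the two two-variable models contain different predictors.</P>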
</section>
<section>
<P><h2>Cross-validation</h2></P>
	<TestItem type="MC">
		<question>Cross-validation</question>
		<answer type="correct">will almost always result in substantial shrinkage of the multiple R</answer>
		<answer type="incorrect">is easy to apply in real-life applications</answer>
		<answer type="incorrect">is related to mean-validation, except crosses are used instead of means</answer>
		<answer type="incorrect">is viewed as the mindless application of statistical procedures to multivariate data by some statisticians</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept></concept>
	</TestItem>
		<TestItem type="MC">
			<question>Using a similar multiple regression analysis on two separate samples of data</question>
			<answer type="correct">will result in two different regression models.</answer>
			<answer>will result in different unstandardized regression coefficients, but identical standardized regression coefficients.</answer>
			<answer>will result in identical values for the multiple R.</answer>
			<answer>will result in both standardized and unstandardized coefficients with similar absolute values, but different signs.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>Cross-validation</question>
			<answer type="correct">requires two separate samples of data.</answer>
			<answer>can be done using the Probability Calculator.</answer>
			<answer>is a statistical technique to ensure the significance of the analysis.</answer>
			<answer>tests whether the dependent measure is reliable.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Linear regression with many predictor variables</concept>
		</TestItem>
<definition word="shrinkage">the loss of predictive power that occurs when a model is applied to a sample other than the sample used to create the model.</definition>
<definition word="cross-validation">a statistical procedure that checks the accuracy of a model by using the model created by a given sample to predict a different sample.</definition>
<P>The manner in which <index>regression weights</index> are computed guarantees that they will provide an <index>optimal fit</index> with respect to the <index>least squares criterion</index> <I>for the existing set of data</I>. If a statistician wishes to predict a different set of data, the regression weights are no longer optimal. There will almost always be substantial <index>shrinkage</index> in the value of R<SUP>2</SUP> if the weights estimated on one set of data are used on a second set of data. The amount of shrinkage can be estimated using a <index>cross-validation procedure</index>.</P>
<P>In cross-validation, regression weights are estimated using one set of data and are tested on a second set of data. If the regression weights estimated on the first set of data predict the second set of data, the weights are said to be cross-validated.</P>
<P>Suppose an industrial/organizational psychologist wishes to predict job success using four different test scores. The psychologist could collect the four test scores from a randomly selected group of job applicants. All the applicants in the selected group are hired, regardless of their scores on the tests, and after some time a measure of success on the job is taken. Success on the job is then predicted from the four test scores using a multiple regression procedure. Stepwise procedures may be used to eliminate tests that are predicting similar variance in job success. In any case, the psychologist is now ready to predict job success from the test scores for a new set of job applicants.</P>
<P>Not so fast! Careful application of multiple regression methods requires that the regression weights be cross-validated on a different set of job applicants. Another random sample of job applicants is taken. Each applicant is given the test battery and then hired, again regardless of the scores obtained on the tests. After some time on the job a measure of job success is taken. Job success is then predicted by using the regression weights found using the first set of job applicants. If the new data are successfully predicted using the old regression weights, the regression procedure is said to be cross-validated. It is expected that the accuracy of prediction will not be as good for the second set of data. This is because the regression procedure is subject to variation in the data from sample to sample, called "<index>error</index>". The greater the error in the regression procedure, the greater the shrinkage of the value of R<SUP>2</SUP>.</P>
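<P>The two-sample procedure described above can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the applicants are simulated rather than real, the function names (fit_weights, r_squared, simulate_applicants) are hypothetical, the sample sizes and the weights used to generate job success are arbitrary, and R<SUP>2</SUP> on the second sample is computed as one minus the ratio of residual to total sum of squares, which is only one of several possible definitions. The weights are estimated on the first sample only and are then applied unchanged to the second sample.</P>

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations act on both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_weights(xs, ys):
    """Least squares weights (intercept first) from the normal equations."""
    rows = [[1.0] + list(x) for x in xs]  # leading 1.0 is the constant term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, xs):
    """Predicted Y for each row of predictor scores."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], x)) for x in xs]


def r_squared(weights, xs, ys):
    """1 - SS_residual / SS_total for the given weights on the given data."""
    preds = predict(weights, xs)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def simulate_applicants(n, rng):
    """Hypothetical data: four test scores and a job success measure."""
    xs, ys = [], []
    for _ in range(n):
        scores = [rng.gauss(50, 10) for _ in range(4)]
        # Assumed population model: only the first two tests matter.
        success = 0.5 * scores[0] + 0.3 * scores[1] + rng.gauss(0, 10)
        xs.append(scores)
        ys.append(success)
    return xs, ys


rng = random.Random(0)
first_x, first_y = simulate_applicants(40, rng)    # sample used to estimate
second_x, second_y = simulate_applicants(40, rng)  # sample used to validate

weights = fit_weights(first_x, first_y)            # estimated on sample 1 only
r2_first = r_squared(weights, first_x, first_y)
r2_second = r_squared(weights, second_x, second_y) # same weights, new data
shrinkage = r2_first - r2_second

print("R squared, first sample: ", round(r2_first, 3))
print("R squared, second sample:", round(r2_second, 3))
print("Shrinkage:               ", round(shrinkage, 3))
```

<P>The key design choice is that the second sample plays no part in estimating the weights. Re-estimating the weights on the second sample would again give an optimal fit for that sample and would hide the shrinkage the procedure is meant to reveal. The amount of shrinkage in any one run depends on the particular samples drawn.</P>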
<P>The above procedure is an idealized use of multiple regression. In many real-life applications of the procedure, random samples of job applicants are not feasible. There may be considerable pressure from administration to select on the basis of the test battery for the first sample, let alone the second sample needed for cross-validation. In either case the multiple regression procedure is compromised. In most cases application of regression procedures to a selected rather than a random sample will result in poorer predictions. All this must be kept in mind when evaluating research on prediction models.</P>
<P><h2>Summary</h2></P>
<P>Multiple regression provides a powerful method to analyze multivariate data. Considerable caution, however, must be observed when interpreting the results of a multiple regression analysis. Personal recommendations include using a theory to drive the selection of variables and cross-validating the results of the analysis.</P>
</section>
</chapter>

