<?xml version='1.0'?>
<?xml:stylesheet type="text/xsl" href="MultiBook.xsl" ?>
<chapter>
<number>7</number>
<author>David W. Stockburger</author>
<title>Multiple Regression with Categorical Variables</title>
<modified>04/16/2001</modified>
<URL>mlt08.xml</URL>
<section>
<P>When a researcher wishes to include a <index>categorical variable</index> with more than two levels in a multiple regression prediction model, additional steps are needed to ensure that the results are interpretable.  These steps include recoding the categorical variable into a number of separate, dichotomous variables.  This recoding is called "<index>dummy coding</index>." In order for the rest of the chapter to make sense, some specific topics related to multiple regression will be reviewed at this time.</P>
<definition word="dummy coding">recoding a categorical variable with more than two levels into a number of dichotomous variables so that they may be used in a linear model, such as linear regression</definition>
		<TestItem type="MC">
			<question>Dummy coding involves</question>
			<answer type="correct">recoding a categorical variable with more than two levels into a number of dichotomous variables.</answer>
			<answer>transforming a continuous variable, such as age, into a number of discrete units.</answer>
			<answer>recoding a number of categorical variables into a single continuous variable.</answer>
			<answer>reversing the order that the independent variables appear in a multiple regression model.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem><P><h2>The Multiple Regression Model</h2></P>
<P>Multiple regression is a linear transformation of the X variables such that the sum of squared deviations between the observed and predicted Y is minimized. The prediction of Y is accomplished by the following equation: </P>
<P>Y'<SUB>i</SUB> = b<SUB>0</SUB> + b<SUB>1</SUB>X<SUB>1i</SUB> + b<SUB>2</SUB>X<SUB>2i </SUB>+ ... + b<SUB>k</SUB>X<SUB>ki</SUB></P>
<P>The "b" values are called regression weights and are computed in a way that minimizes the sum of squared deviations.</P>
<P>
	<figure>
		<description>The sum of Y observed minus Y predicted squared is shown. It is also called the sum of squared residuals or sum of squared errors.</description>
		<url>Images/mlt0638.gif</url>
		<width>86</width>
		<height>45</height>
		<align></align>
		<caption></caption>
		<alt>Sum of squared residuals.</alt>
	</figure></P>
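<P>The least-squares idea can be sketched in a few lines of Python for the simplest case of a single predictor. The data values and the function names fit_line and sum_squared_residuals are hypothetical and serve only as an illustration. The sketch shows that moving either weight away from its fitted value increases the sum of squared residuals.</P>

```python
def fit_line(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum over cases of (observed Y minus predicted Y) squared."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical illustration data, not taken from the chapter.
x = [0, 1, 2, 3, 4, 5]
y = [2.0, 2.9, 4.2, 4.8, 6.1, 7.0]

b0, b1 = fit_line(x, y)
best = sum_squared_residuals(x, y, b0, b1)
nudged_slope = sum_squared_residuals(x, y, b0, b1 + 0.1)
nudged_intercept = sum_squared_residuals(x, y, b0 + 0.1, b1)
print(best, nudged_slope, nudged_intercept)
```

<P>With more than one predictor the same criterion applies, but the weights are found by solving a set of simultaneous equations rather than by the two closed-form expressions used here.</P>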
<P><h3>Dichotomous Predictor Variables</h3></P>
<definition word="dichotomous predictor variable">a categorical variable with two levels.</definition>
<P>Categorical variables with two levels may be directly entered as predictor or predicted variables in a multiple regression model. Their use in multiple regression is a straightforward extension of their use in simple linear regression. When entered as predictor variables, interpretation of regression weights depends upon how the variable is coded. If the <index>dichotomous variable</index> is coded as 0 and 1, the regression weight changes the predicted value of Y only for the group coded as 1: a positive weight is added to, and a negative weight is subtracted from, that group's predicted value. If the dichotomous variable is coded as -1 and 1, then a positive regression weight is subtracted from the predicted value for the group coded as -1 and added to the predicted value for the group coded as 1. If the regression weight is negative, then addition and subtraction are reversed. Dichotomous variables can be included in hypothesis tests for R<SUP>2</SUP> change like any other variable.</P>
<P><h3>Testing for Blocks of Variables</h3></P>
<P>A block of variables can be entered simultaneously into an <index>hierarchical regression analysis</index> and tested as to whether, as a whole, they significantly increase R<SUP>2</SUP>, given the variables already entered into the regression equation. The numerator degrees of freedom for the R<SUP>2</SUP> change test equal the number of variables entered in the block.</P>
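<P>The R<SUP>2</SUP> change test can be sketched as a small Python function. The formula is the usual F ratio for comparing two nested regression models; the function name, the argument names, and the numerical values are hypothetical and are not taken from the chapter's SPSS output.</P>

```python
def r2_change_f(r2_reduced, r2_full, n, k_added, k_full):
    """F ratio and degrees of freedom for an R-squared change test.

    n        number of cases
    k_added  number of variables entered in the block being tested
    k_full   number of predictors in the model after the block is entered
    """
    df_numerator = k_added
    df_denominator = n - k_full - 1
    f_ratio = ((r2_full - r2_reduced) / df_numerator) / ((1 - r2_full) / df_denominator)
    return f_ratio, df_numerator, df_denominator


# Hypothetical values: a block of 2 variables added to a 1-predictor model, 50 cases.
f_ratio, df1, df2 = r2_change_f(r2_reduced=0.30, r2_full=0.40, n=50, k_added=2, k_full=3)
print(f_ratio, df1, df2)
```

<P>The first degrees of freedom value returned is the number of variables in the block, and the second is the number of cases minus the number of predictors in the full model minus one.</P>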
		<TestItem type="MC">
			<question>When performing an hierarchical regression analysis, one of the degrees of freedom for an R squared change hypothesis test is</question>
			<answer type="correct">the number of variables entered in a block.</answer>
			<answer>the number of scores.</answer>
			<answer>the number of scores minus two.</answer>
			<answer>the mean square change divided by the mean square error.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<P><h3>Correlated and Uncorrelated Predictor Variables</h3></P>
<P>Adding variables to a linear regression model will never decrease the unadjusted R<SUP>2</SUP> value and will almost always increase it. If the additional predictor variables are correlated with the predictor variables already in the model, then the combined results are difficult to predict. In some cases, the combined result will provide only a slightly better prediction, while in other cases, a much better prediction than expected will be the outcome of combining two correlated variables.</P>
		<TestItem type="MC">
			<question>When performing an hierarchical regression analysis, adding a single independent variable that is uncorrelated with all other independent variables will result in an R square change of</question>
			<answer type="correct">the correlation coefficient squared between the added variable and dependent variable.</answer>
			<answer>the square root of the standard error of estimate.</answer>
			<answer>the standardized regression coefficient for that variable.</answer>
			<answer>the squared unstandardized regression coefficient for that variable.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<P>If the additional predictor variables are uncorrelated (r = 0.0) with the predictor variables already in the model, then the result of adding additional variables to the regression model is easy to predict. Namely the R<SUP>2</SUP> change will be equal to the correlation coefficient squared between the added variable and predicted variable. In this case it makes no difference what order the predictor variables are entered into the prediction model. For example, if X<SUB>1</SUB> and X<SUB>2</SUB> were uncorrelated (r<SUB>12</SUB> = 0) and r<SUB>1y</SUB><SUP>2</SUP> = .3 and r<SUB>2y</SUB><SUP>2 </SUP>= .4, then R<SUP>2</SUP> for X<SUB>1</SUB> and X<SUB>2</SUB> would equal .3 + .4 = .7. The value for R<SUP>2</SUP> change for X<SUB>2</SUB> given X<SUB>1</SUB> was in the model would be .4. The value for R<SUP>2</SUP> change for X<SUB>2</SUB> given no variable was in the model would be .4. It would make no difference at what stage X<SUB>2</SUB> was entered into the model, the value for R<SUP>2</SUP> change would always be .4. Similarly, the R<SUP>2</SUP> change value for X<SUB>1</SUB> would always be .3. Because of this relationship, uncorrelated predictor variables will be preferred, when possible.</P>
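<P>The arithmetic in the example above can be checked with a short Python sketch. It relies on the standard formula that gives R<SUP>2</SUP> for two predictors from the three pairwise correlations. The function name and the correlated case (r<SUB>12</SUB> = .5) are hypothetical additions for contrast.</P>

```python
import math


def r2_two_predictors(r1y, r2y, r12):
    """Squared multiple correlation of Y with X1 and X2 from the three pairwise correlations."""
    return (r1y ** 2 + r2y ** 2 - 2 * r1y * r2y * r12) / (1 - r12 ** 2)


r1y = math.sqrt(0.3)
r2y = math.sqrt(0.4)

# Uncorrelated predictors: R-squared is the sum of the squared correlations.
r2_uncorrelated = r2_two_predictors(r1y, r2y, 0.0)
change_x2_entered_last = r2_uncorrelated - r1y ** 2
change_x2_entered_first = r2y ** 2

# Hypothetical correlated case (r12 = .5): the change for X2 now depends on entry order.
r2_correlated = r2_two_predictors(r1y, r2y, 0.5)
change_x2_last_correlated = r2_correlated - r1y ** 2
print(round(r2_uncorrelated, 3), round(change_x2_entered_last, 3), round(change_x2_last_correlated, 3))
```

<P>When r<SUB>12</SUB> is zero, the R<SUP>2</SUP> change for X<SUB>2</SUB> is .4 whether it is entered first or last. When the predictors are correlated, the change for X<SUB>2</SUB> entered last no longer equals its squared correlation with Y.</P>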
		<TestItem type="MC">
			<question>When possible, the statistician generally prefers</question>
			<answer type="correct">uncorrelated predictor variables.</answer>
			<answer>collinear predictor variables.</answer>
			<answer>heteroscedasticity.</answer>
			<answer>a large number of predictor variables.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
</section>
<section>
<P><h2>Example Data</h2></P>
<DataFile type="SPSS">
	<url>h23.sav</url>
	<description>Example data file for multiple regression with categorical variables - SPSS format.</description>
</DataFile>
<DataFile type="SPSS">
	<url>h23.dat</url>
	<description>Example data file for multiple regression with categorical variables - Text format.</description>
</DataFile>
<P>The following is simulated data.</P>
<P><table cellPadding="1" cellSpacing="3" summary = "Faculty Salary Simulated Data" title="Faculty Salary Simulated Data"><tcaption>Faculty Salary Simulated Data</tcaption>
<TR><TH><SPSSVar>Faculty</SPSSVar></TH><TH><SPSSVar>Salary</SPSSVar></TH><TH><SPSSVar>Gender</SPSSVar></TH><TH><SPSSVar>Rank</SPSSVar></TH><TH><SPSSVar>Dept</SPSSVar></TH><TH><SPSSVar>Years</SPSSVar></TH><TH><SPSSVar>Merit</SPSSVar></TH></TR>
<TR><TD>1</TD><TD>38</TD><TD>0</TD><TD>3</TD><TD>1</TD><TD>0</TD><TD>1.47</TD></TR>
<TR><TD>2</TD><TD>58</TD><TD>1</TD><TD>2</TD><TD>2</TD><TD>8</TD><TD>4.38</TD></TR>
<TR><TD>3</TD><TD>80</TD><TD>1</TD><TD>3</TD><TD></TD><TD>9</TD><TD>3.65</TD></TR>
<TR><TD>4</TD><TD>30</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>0</TD><TD>1.64</TD></TR>
<TR><TD>5</TD><TD>50</TD><TD>1</TD><TD>1</TD><TD>3</TD><TD>0</TD><TD>2.54</TD></TR>
<TR><TD>6</TD><TD>49</TD><TD>1</TD><TD>1</TD><TD>3</TD><TD>1</TD><TD>2.06</TD></TR>
<TR><TD>7</TD><TD>45</TD><TD>0</TD><TD>3</TD><TD>1</TD><TD>4</TD><TD>4.76</TD></TR>
<TR><TD>8</TD><TD>42</TD><TD>1</TD><TD>1</TD><TD>2</TD><TD>0</TD><TD>3.05</TD></TR>
<TR><TD>9</TD><TD>59</TD><TD>0</TD><TD>3</TD><TD>3</TD><TD>3</TD><TD>2.73</TD></TR>
<TR><TD>10</TD><TD>47</TD><TD>1</TD><TD>2</TD><TD>1</TD><TD>0</TD><TD>3.14</TD></TR>
<TR><TD>11</TD><TD>34</TD><TD>0</TD><TD>1</TD><TD>1</TD><TD>3</TD><TD>4.42</TD></TR>
<TR><TD>12</TD><TD>53</TD><TD>0</TD><TD>2</TD><TD>3</TD><TD>0</TD><TD>2.36</TD></TR>
<TR><TD>13</TD><TD>35</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>4.29</TD></TR>
<TR><TD>14</TD><TD>42</TD><TD>0</TD><TD>1</TD><TD>2</TD><TD>2</TD><TD>3.81</TD></TR>
<TR><TD>15</TD><TD>42</TD><TD>0</TD><TD>1</TD><TD>2</TD><TD>2</TD><TD>3.84</TD></TR>
<TR><TD>16</TD><TD>51</TD><TD>0</TD><TD>3</TD><TD>2</TD><TD>7</TD><TD>3.15</TD></TR>
<TR><TD>17</TD><TD>51</TD><TD>1</TD><TD>2</TD><TD>1</TD><TD>8</TD><TD>5.07</TD></TR>
<TR><TD>18</TD><TD>40</TD><TD>0</TD><TD>1</TD><TD>2</TD><TD>3</TD><TD>2.73</TD></TR>
<TR><TD>19</TD><TD>48</TD><TD>1</TD><TD>2</TD><TD>1</TD><TD>1</TD><TD>3.56</TD></TR>
<TR><TD>20</TD><TD>34</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>7</TD><TD>3.54</TD></TR>
<TR><TD>21</TD><TD>46</TD><TD>1</TD><TD>2</TD><TD>1</TD><TD>2</TD><TD>2.71</TD></TR>
<TR><TD>22</TD><TD>45</TD><TD>0</TD><TD>1</TD><TD>2</TD><TD>6</TD><TD>5.18</TD></TR>
<TR><TD>23</TD><TD>50</TD><TD>1</TD><TD>1</TD><TD>3</TD><TD>2</TD><TD>2.66</TD></TR>
<TR><TD>24</TD><TD>61</TD><TD>0</TD><TD>3</TD><TD>3</TD><TD>3</TD><TD>3.7</TD></TR>
<TR><TD>25</TD><TD>62</TD><TD>1</TD><TD>3</TD><TD>1</TD><TD>2</TD><TD>3.75</TD></TR>
<TR><TD>26</TD><TD>51</TD><TD>0</TD><TD>1</TD><TD>3</TD><TD>8</TD><TD>3.96</TD></TR>
<TR><TD>27</TD><TD>59</TD><TD>0</TD><TD>3</TD><TD>3</TD><TD>0</TD><TD>2.88</TD></TR>
<TR><TD>28</TD><TD>65</TD><TD>1</TD><TD>2</TD><TD>3</TD><TD>5</TD><TD>3.37</TD></TR>
<TR><TD>29</TD><TD>49</TD><TD>0</TD><TD>1</TD><TD>3</TD><TD>0</TD><TD>2.84</TD></TR>
<TR><TD>30</TD><TD>37</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>9</TD><TD>5.12</TD></TR>
</table></P>
<P><ul>
<li><SPSSVar>Salary</SPSSVar></li>
<li><SPSSVar>Gender</SPSSVar> (0=Male, 1=Female)</li>
<li><SPSSVar>Rank</SPSSVar> (1=Assistant, 2=Associate, 3=Full)</li>
<li><SPSSVar>Dept</SPSSVar> Department (1=Family Studies, 2=Biology, 3=Business)</li>
<li><SPSSVar>Years</SPSSVar> since making Rank</li>
<li>Average <SPSSVar>Merit</SPSSVar> Ranking </li>
</ul></P>
<P>It is fairly clear that Gender could be directly entered into a regression model predicting Salary, because it is dichotomous. The problem is how to deal with the two <index>categorical predictor variables</index> with more than two levels (Rank and Dept).</P>
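<P>As an illustration of entering a dichotomous predictor directly, the following Python sketch regresses <SPSSVar>Salary</SPSSVar> on <SPSSVar>Gender</SPSSVar> from the table above under both the 0 and 1 coding and the -1 and 1 coding. The helper names are hypothetical, and the closed-form solution used applies only to a single predictor.</P>

```python
salary = [38, 58, 80, 30, 50, 49, 45, 42, 59, 47, 34, 53, 35, 42, 42,
          51, 51, 40, 48, 34, 46, 45, 50, 61, 62, 51, 59, 65, 49, 37]
gender = [0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0,
          0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line predicting y from x."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


male_mean = mean([s for s, g in zip(salary, gender) if g == 0])
female_mean = mean([s for s, g in zip(salary, gender) if g == 1])

# Coding 0 = Male, 1 = Female, as in the data file.
b0_01, b1_01 = fit_line(gender, salary)

# Recoding to -1 = Male, 1 = Female.
gender_pm = [1 if g == 1 else -1 for g in gender]
b0_pm, b1_pm = fit_line(gender_pm, salary)

print(male_mean, female_mean)
print(b0_01, b1_01)
print(b0_pm, b1_pm)
```

<P>With the 0 and 1 coding, the constant is the mean of the group coded 0 and the regression weight is the difference between the two group means. With the -1 and 1 coding, the constant is the unweighted average of the two group means and the regression weight is half the difference between them.</P>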
</section>
<section>
<P><h2>Categorical Predictor Variables</h2></P>
<P><h3>Dummy Coding - making many variables out of one </h3></P>
	<TestItem type="MC">
		<question>To use a categorical variable with 8 levels in a multiple regression model, it would be necessary to create __ dummy coded variables </question>
		<answer type="incorrect">2</answer>
		<answer type="correct">7</answer>
		<answer type="incorrect">8</answer>
		<answer type="incorrect">9</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept></concept>
	</TestItem>
	<TestItem type="MC">
		<question> ANOVA is a special case of </question>
		<answer type="incorrect">discriminant function analysis.</answer>
		<answer type="incorrect">simple linear regression.</answer>
		<answer type="correct">multiple regression.</answer>
		<answer type="incorrect">regression to the mean.</answer>
		<answer type="incorrect">a t-test.</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept></concept>
	</TestItem>
<P>Because categorical predictor variables cannot be entered directly into a regression model and be meaningfully interpreted, some other method of dealing with information of this type must be developed. In general, a categorical variable with k levels will be transformed into k-1 variables each with two levels. For example, if a categorical variable had six levels, then five dichotomous variables could be constructed that would contain the same information as the single categorical variable. Dichotomous variables have the advantage that they can be directly entered into the regression model. The process of creating dichotomous variables from categorical variables is called <B><index>dummy coding</index></B>.</P>
<P>Depending upon how the dichotomous variables are constructed, additional information can be gleaned from the analysis. In addition, careful construction will result in uncorrelated dichotomous variables. As discussed earlier, these variables have the advantage of simplicity of interpretation and are preferred to correlated predictor variables.</P>
		<TestItem type="MC">
			<question>If a researcher is interested only in the overall gain in predictive power of a categorical variable with three or more levels when using a multiple regression model,</question>
			<answer type="correct">any system of dummy codes would work.</answer>
			<answer>the dummy codes must be carefully constructed so that they are uncorrelated with each other.</answer>
			<answer>the best-fitting system of dummy codes must be discovered by trial and error.</answer>
			<answer>the dummy codes will correspond to the standardized regression coefficients.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<P><h3>Dummy Coding with three levels</h3></P>
<P>The simplest case of dummy coding is when the categorical variable has three levels and is converted to two dichotomous variables. For example, <SPSSVar>Dept</SPSSVar> in the example data has three levels, 1=Family Studies, 2=Biology, and 3=Business. This variable could be dummy coded into two variables, one called <SPSSVar>FamilyS</SPSSVar> and one called <SPSSVar>Biology</SPSSVar>. If <SPSSVar>Dept</SPSSVar> = 1, then <SPSSVar>FamilyS</SPSSVar> would be coded with a 1 and <SPSSVar>Biology</SPSSVar> with a 0. If <SPSSVar>Dept</SPSSVar>=2, then <SPSSVar>FamilyS</SPSSVar> would be coded with a 0 and <SPSSVar>Biology</SPSSVar> would be coded with a 1. If <SPSSVar>Dept</SPSSVar>=3, then both <SPSSVar>FamilyS</SPSSVar> and <SPSSVar>Biology</SPSSVar> would be coded with a 0. The dummy coding is represented below.</P>
<P><table cellPadding="7" cellSpacing="0" summary = "Dummy Coded Variables" title="A Three-Level Variable Dummy Coded as 0 and 1"><tcaption>A Three-Level Variable Dummy Coded as 0 and 1</tcaption>
<TR><TH></TH><TH><SPSSVar>Dept</SPSSVar></TH><TH><SPSSVar>FamilyS</SPSSVar></TH><TH><SPSSVar>Biology</SPSSVar></TH></TR>
<TR><TD>Family Studies</TD><TD>1</TD><TD>1</TD><TD>0</TD></TR>
<TR><TD>Biology</TD><TD>2</TD><TD>0</TD><TD>1</TD></TR>
<TR><TD>Business</TD><TD>3</TD><TD>0</TD><TD>0</TD></TR>
</table></P>
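<P>Outside of SPSS, the same recoding can be sketched in Python. The function dummy_code and its arguments are hypothetical names; the level left out (here 3, Business) is the one coded 0 on every new variable, and a missing value is left missing.</P>

```python
def dummy_code(values, levels, reference):
    """Return one 0/1 list per non-reference level (k levels give k - 1 variables).

    A missing value (None) stays missing on every dummy variable.
    """
    coded = {}
    for level in levels:
        if level == reference:
            continue
        coded[level] = [None if v is None else int(v == level) for v in values]
    return coded


# Dept codes for the first five faculty in the example data; faculty 3 has no Dept recorded.
dept = [1, 2, None, 1, 3]
dummies = dummy_code(dept, levels=[1, 2, 3], reference=3)
family_s = dummies[1]
biology = dummies[2]
print(family_s)
print(biology)
```

<P>Three levels yield two new variables, in line with the k-1 rule. Choosing a different reference level changes which group the regression weights are compared against, but not the overall fit of the model.</P>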
<P><h3>Using SPSS to Dummy Code Variables</h3></P>
<P>The dummy coding can be done using SPSS and the <SPSSCommand>Transform/Recode/Into Different Variables...</SPSSCommand> options. The <SPSSVar>Dept</SPSSVar> variable is the "Numeric Variable" that is going to be transformed. In this case the <SPSSVar>FamilyS</SPSSVar> variable is going to be created. The window on the screen should appear as follows:</P>
<P>
	<figure>
		<description>The first screen of the SPSS user interface for recoding one variable into a different variable is shown. The Dept variable is being changed into a variable called FamilyS. The word change is circled on the screen to point to the place to change the variable name.</description>
		<url>images/mlt0885.gif</url>
		<width>473</width>
		<height>262</height>
		<align></align>
		<caption>Recoding a variable in SPSS into a different variable - first screen.</caption>
		<alt>Recoding a variable in SPSS into a different variable.</alt>
	</figure></P>
<P>Clicking on the <SPSSCommand>Change</SPSSCommand> button and then on the <SPSSCommand>Old and New Values...</SPSSCommand> button will result in the following window:</P>
<P>	<figure>
		<description>The second screen of the SPSS user interface for recoding one variable into a different variable is shown. The values are being transformed such that a 1=1, 2=0, and 3=0. The Add button is circled in red to illustrate how to change the values in the interface.</description>
		<url>images/mlt0886.gif</url>
		<width>472</width>
		<height>270</height>
		<align></align>
		<caption>Recoding a variable in SPSS into a different variable - second screen.</caption>
		<alt>Recoding a variable in SPSS into a different variable - second screen.</alt>
	</figure></P>
<P>The <SPSSCommand>Old Value</SPSSCommand> is the level of the categorical variable to be changed, the <SPSSCommand>New Value</SPSSCommand> is the value on the transformed variable. In the example window above, a value of 3 on the <SPSSVar>Dept</SPSSVar> variable will be coded as a 0 on the <SPSSVar>FamilyS</SPSSVar> variable. The <SPSSCommand>Add</SPSSCommand> button must be pressed to add the recoding to the list. When all the recodings have been added, click on the <SPSSCommand>Continue</SPSSCommand> button and then the <SPSSCommand>OK</SPSSCommand> button.</P>
<P>The recoding of the <SPSSVar>Biology</SPSSVar> variable is accomplished in the same manner. A listing of the data is presented below.</P>
<P>	<figure>
		<description>The example data file with two new variables coded as 0 and 1 is illustrated. Where the Dept variable was a 1, the FamilyS variable is a 1 and the Biology variable is a 0. Where the Dept variable was a 2, the FamilyS variable is a 0 and the Biology variable is a 1. Where the Dept variable was a 3, both the FamilyS and Biology variables are coded as a 0.</description>
		<url>images/mlt0887.gif</url>
		<width>391</width>
		<height>649</height>
		<align></align>
		<caption>The example data file with dummy coded variables.</caption>
		<alt>The example data file with dummy coded variables.</alt>
	</figure></P>
<P>The correlation matrix of the dummy variables and the <SPSSVar>Salary</SPSSVar> variable is presented below.</P>
<P>	<figure>
		<description>The SPSS output of the two dummy coded variables and Salary is pictured. The correlation between FamilyS and Biology is -.474. The correlation between FamilyS and Salary is -.476 and the correlation between Biology and Salary is -.102.</description>
		<url>images/mlt0888.gif</url>
		<width>387</width>
		<height>248</height>
		<align></align>
		<caption>The correlation matrix of the non-orthogonal dummy coded variables.</caption>
		<alt>The correlation matrix of the non-orthogonal dummy coded variables.</alt>
	</figure></P>
<P>Two things should be observed in the <index>correlation matrix</index>. The first is that the correlation between <SPSSVar>FamilyS</SPSSVar> and <SPSSVar>Biology</SPSSVar> is not zero; rather, it is -.474. The second is that the correlations between the <SPSSVar>Salary</SPSSVar> variable and the two dummy variables are different from zero. The correlation between <SPSSVar>FamilyS</SPSSVar> and <SPSSVar>Salary</SPSSVar> is significantly different from zero.</P>
<definition word="correlation matrix">a table of correlation coefficients.</definition>
<P>The results of predicting <SPSSVar>Salary</SPSSVar> from <SPSSVar>FamilyS</SPSSVar> and <SPSSVar>Biology</SPSSVar> using a <index>multiple regression</index> procedure are presented below. The first table enters <SPSSVar>FamilyS</SPSSVar> in the first block and <SPSSVar>Biology</SPSSVar> in the second. The second table reverses the order that the variables are entered into the regression equation. The model summary tables are presented below.</P>
<P>	<figure>
		<description>The SPSS model summary for the multiple regression program predicting salary from FamilyS and then Biology in an hierarchical regression procedure. In the first stage, the multiple R is .476 with a significance level of .009. In the second stage, the multiple R was .604 with a significance level of .025.</description>
		<url>images/mlt0889.gif</url>
		<width>739</width>
		<height>205</height>
		<align></align>
		<caption>Predicting Salary using FamilyS in the first block and Biology in the second.</caption>
		<alt>Predicting Salary using FamilyS in the first block and Biology in the second.</alt>
	</figure></P>
<P>	<figure>
		<description>The SPSS model summary for the multiple regression program predicting salary from Biology and then FamilyS in an hierarchical regression procedure. In the first stage, the multiple R is .102 with a significance level of .598.  In the second stage, the multiple R was .604 with a significance level of .001.</description>
		<url>images/mlt0890.gif</url>
		<width>739</width>
		<height>205</height>
		<align></align>
		<caption>Predicting Salary using Biology in the first block and FamilyS in the second.</caption>
		<alt>Predicting Salary using Biology in the first block and FamilyS in the second.</alt>
	</figure></P>
<P>In the first table above both <SPSSVar>FamilyS</SPSSVar> and <SPSSVar>Biology</SPSSVar> are significant. In the second, only <SPSSVar>FamilyS</SPSSVar> is statistically significant. Note that both orderings end up with the same value for multiple R (.604). It makes a difference what order the variables are entered into the regression equation in the hierarchical analysis. </P>
		<TestItem type="MC">
			<question>When the dummy codes are correlated with each other,</question>
			<answer type="correct">it makes a difference what order the variables are entered into the regression equation in the hierarchical analysis.</answer>
			<answer>the correlation matrix of the dummy codes will have values of either one or zero.</answer>
			<answer>the standard error of estimate will equal the square root of one minus the multiple R squared.</answer>
			<answer>hypothesis testing of R squared change will generally produce anomalous results.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<P>In the next tables, both <SPSSVar>FamilyS</SPSSVar> and <SPSSVar>Biology</SPSSVar> have been entered in the first block. The model summary table, ANOVA, and Coefficients tables are presented below.</P>
<P>	<figure>
		<description>The model summary table predicting salary using Biology and FamilyS is pictured. The multiple R is .604 with a significance level of .003.</description>
		<url>images/mlt0891.gif</url>
		<width>739</width>
		<height>165</height>
		<align></align>
		<caption>The model summary table predicting salary using Biology and FamilyS.</caption>
		<alt>The model summary table predicting salary using Biology and FamilyS.</alt>
	</figure></P>
<P>	<figure>
		<description>The ANOVA table predicting salary using Biology and FamilyS is shown. The significance level is .003.</description>
		<url>images/mlt0892.gif</url>
		<width>534</width>
		<height>174</height>
		<align></align>
		<caption>The ANOVA table predicting salary using Biology and FamilyS.</caption>
		<alt>The ANOVA table predicting salary using Biology and FamilyS.</alt>
	</figure></P>
<P>	<figure>
		<description>The coefficients table predicting salary using Biology and FamilyS is illustrated. The unstandardized regression model has a constant term of 54.6 and regression weights of -12.350 and -8.886 for FamilyS and Biology, respectively. The standardized regression coefficients are -.676 and -.423 with significance levels of .001 and .025.</description>
		<url>images/mlt0893.gif</url>
		<width>527</width>
		<height>202</height>
		<align></align>
		<caption>The coefficients table predicting salary using Biology and FamilyS.</caption>
		<alt>The coefficients predicting salary using Biology and FamilyS.</alt>
	</figure></P>
<P>The <index>ANOVA</index> and model summary tables contain basically redundant information in this case. The Coefficients table can be interpreted as follows: members of the Biology department make 8.886 thousand dollars less in salary per year than members of the Business department, while members of the Family Studies department make 12.350 thousand dollars less than members of the Business department. Note that the "Sig." levels in the "Coefficients" table are the same as the significance levels of the <index>model summary tables</index> presented earlier when each of the dummy coded variables is entered into the regression equation last.</P>
<P><h3>Similarity of Regression analysis and ANOVA</h3></P>
<P>The results of the preceding analysis can be compared to the results of using the ANOVA procedure in SPSS with <SPSSVar>Salary</SPSSVar> as the dependent measure and <SPSSVar>Dept</SPSSVar> as the independent. The following table presents the table of means and ANOVA table.</P>
<P>	<figure>
		<description>The SPSS output table of means and standard deviations of Salary broken down by Dept is shown. The mean salaries for the three levels of departments are 42.25, 45.71, and 54.60 for Family Studies, Biology, and Business.</description>
		<url>images/mlt0894.gif</url>
		<width>761</width>
		<height>183</height>
		<align></align>
		<caption>Means and standard deviations of Salary broken down by Dept.</caption>
		<alt>Means and standard deviations of Salary broken down by Dept.</alt>
	</figure></P>
<P>	<figure>
		<description>The SPSS output table of the ANOVA of Salary broken down by Dept is shown. The significance level is .003 and the ANOVA table in general is similar to the regression ANOVA table with the dummy coded variables.</description>
		<url>images/mlt0895.gif</url>
		<width>523</width>
		<height>163</height>
		<align></align>
		<caption>The ANOVA table of Salary broken down by Dept.</caption>
		<alt>The ANOVA table of Salary broken down by Dept.</alt>
	</figure></P>
<P>Note first that the ANOVA tables produced using the ANOVA command and the Linear Regression command are identical. ANOVA is a special case of linear regression when the variables have been dummy coded. The second notable comparison of the tables involves the regression weights and the actual differences between the means. Note that the regression weight for <SPSSVar>FamilyS</SPSSVar> in the regression procedure is -12.350 and the difference between the means of the Family Studies department (42.25) and the Business department (54.60) is -12.350.</P>
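<P>The correspondence between the regression weights and the differences between means can be reproduced with a short Python sketch. The salaries are taken from the example data, grouped by <SPSSVar>Dept</SPSSVar>. The helper functions solve and ols are hypothetical names for a plain normal-equations solution and are not how SPSS computes the weights.</P>

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares weights for the predictors in rows; an intercept column is added."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Salary by Dept from the example data (faculty 3 is omitted because Dept is missing).
family_studies = [38, 30, 45, 47, 34, 35, 51, 48, 34, 46, 62, 37]
biology = [58, 42, 42, 42, 51, 40, 45]
business = [50, 49, 59, 53, 50, 61, 51, 59, 65, 49]

salary = family_studies + biology + business
# Each row is (FamilyS, Biology); Business is the reference level coded 0, 0.
codes = [(1, 0)] * len(family_studies) + [(0, 1)] * len(biology) + [(0, 0)] * len(business)

b0, b_family_s, b_biology = ols(codes, salary)
print(round(b0, 3), round(b_family_s, 3), round(b_biology, 3))
```

<P>The constant should come out as the Business mean, and each regression weight as the difference between that department's mean and the Business mean, in agreement with the values reported in the SPSS tables.</P>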
		<TestItem type="MC">
			<question>Which of the following values would most likely be different from the others</question>
			<answer type="correct">the significance level for a dummy coded variable on the coefficients table.</answer>
			<answer>the significance level for R squared change for a block of dummy coded variables.</answer>
			<answer>the significance level in the ANOVA table using the SPSS ANOVA command.</answer>
			<answer>the significance level in the ANOVA table of a regression model for a block of dummy coded variables.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
</section>
<section>
<P><h2>Dummy Coding into Independent Variables</h2></P>
<P>Selection of an appropriate set of <index>dummy codes</index> will result in new variables that are uncorrelated or independent of each other. In the case when the categorical variable has three levels this can be accomplished by creating a new variable where one level of the categorical variable is assigned the value of -2 and the other levels are assigned the value of 1. The signs are arbitrary and may be reversed, that is, values of 2 and -1 would work equally well. The second variable created as a dummy code will have the level of the categorical variable coded as -2 given the value of 0 and the other values recoded as 1 and -1. In all cases the sum of the dummy coded variable will be zero. Trust me, this is actually much easier than it sounds.</P>
<P>Interpretation is straightforward. Each of the new dummy coded variables, called a <I><index>contrast</index></I>, compares levels coded with a positive number to levels coded with a negative number. Levels coded with a zero are not included in the interpretation.</P>
		<TestItem type="MC">
			<question>Dummy codes are sometimes called</question>
			<answer type="correct">contrasts.</answer>
			<answer>uncorrelated regression weights.</answer>
			<answer>between subjects analytical analysis.</answer>
			<answer>principal components.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<P>For example, <SPSSVar>Dept </SPSSVar>in the example data has three levels, 1=Family Studies, 2=Biology, and 3=Business. This variable could be dummy coded into two variables, one called <SPSSVar>Business</SPSSVar> (comparing the Business Department with the other two departments) and one called <SPSSVar>FSvsBio </SPSSVar>(for Family Studies versus Biology). The <SPSSVar>Business</SPSSVar> contrast would create a variable where all members of the Business Department would be given a value of -2 and all members of the other two departments would be given a value of 1. The <SPSSVar>FSvsBio</SPSSVar> contrast would assign a value of 0 to members of the Business Department, 1 divided by the number of members of the Family Studies Department to each member of the Family Studies Department, and -1 divided by the number of members of the Biology Department to each member of the Biology Department. The <SPSSVar>FSvsBio</SPSSVar> variable could be coded as 1 and -1 for Family Studies and Biology respectively, but the recoded variable would no longer be uncorrelated with the first dummy coded variable (<SPSSVar>Business</SPSSVar>).  In most practical applications, it makes little difference whether the variables are correlated or not, so the simpler 1 and -1 coding is generally preferred.  The contrasts are summarized in the following table.</P>
		<TestItem type="MC">
			<question>When a set of dummy codes are orthogonal,</question>
			<answer type="correct">they will be uncorrelated with each other.</answer>
			<answer>they will describe interaction effects.</answer>
			<answer>the order the variables are entered into the regression equation is critical.</answer>
			<answer>interpretation of the results of the analysis becomes much more difficult.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<P><table cellPadding="7" cellSpacing="0" summary = "Orthogonal dummy coded Variables." title="Orthogonal dummy coded Variables."><tcaption>Orthogonal dummy coded Variables.</tcaption>
<TR><TH></TH><TH><SPSSVar>Dept</SPSSVar></TH><TH><SPSSVar>Business</SPSSVar></TH><TH><SPSSVar>FSvsBio</SPSSVar></TH></TR>
<TR><TD>Family Studies</TD><TD>1</TD><TD>1</TD><TD>1/N<SUB>1</SUB> = 1/12 = .0833</TD></TR>
<TR><TD>Biology</TD><TD>2</TD><TD>1</TD><TD>-1/N<SUB>2</SUB> = -1/7 = -.1429</TD></TR>
<TR><TD>Business</TD><TD>3</TD><TD>-2</TD><TD>0</TD></TR>
</table></P>
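<P>The effect of the two coding choices can be checked with a minimal Python sketch. The department sizes of 12 and 7 come from the table above; the Business Department size of 5 is a hypothetical value chosen only for illustration, and <SPSSVar>pearson</SPSSVar> is a helper written for this sketch rather than an SPSS function.</P>

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Sizes for Family Studies and Biology are taken from the table;
# the Business size is an assumption made for this example only.
n_fs, n_bio, n_bus = 12, 7, 5
dept = [1] * n_fs + [2] * n_bio + [3] * n_bus

# Business contrast: -2 for Business, 1 for the other two departments.
business = [-2 if d == 3 else 1 for d in dept]

# FSvsBio coded two ways: weighted by department size, and simple 1 / -1.
weighted_codes = {1: 1 / n_fs, 2: -1 / n_bio, 3: 0.0}
simple_codes = {1: 1, 2: -1, 3: 0}
fs_vs_bio_weighted = [weighted_codes[d] for d in dept]
fs_vs_bio_simple = [simple_codes[d] for d in dept]

r_weighted = pearson(business, fs_vs_bio_weighted)
r_simple = pearson(business, fs_vs_bio_simple)
print("weighted coding r =", round(r_weighted, 6))
print("simple coding r   =", round(r_simple, 3))
```

<P>Under these assumptions the weighted coding should give a correlation of zero (up to rounding error) with the Business contrast, while the simple 1 and -1 coding gives a small nonzero correlation because the two departments differ in size.</P>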
<P>The data matrix with the dummy coded variables would appear as follows.</P>
<P>	<figure>
		<description>The example data file with two new orthogonal variables is illustrated. Where the Dept variable was a 1, the Business variable is coded as a 1 and the FSvsBio variable is coded as .0833. Where the Dept variable was a 2, the Business variable is a 1 and the FSvsBio variable is a -.1429. Where the Dept variable was a 3, the Business variable is coded as a -2 and the FSvsBio variable is coded as a 0.</description>
		<url>images/mlt0896.gif</url>
		<width>423</width>
		<height>602</height>
		<align></align>
		<caption>The example data matrix of orthogonal dummy coded variables.</caption>
		<alt>The example data matrix of orthogonal dummy coded variables.</alt>
	</figure></P>
<P>The correlation matrix containing the two contrasts and the <SPSSVar>Salary</SPSSVar> variable is presented below.</P>
<P>	<figure>
		<description>The SPSS output of the two orthogonal dummy coded variables and Salary is pictured. The correlation between FSvsBio and Business is 0.0. The correlation between FSvsBio and Salary is -.150 and the correlation between Business and Salary is -.585.</description>
		<url>images/mlt0897.gif</url>
		<width>401</width>
		<height>248</height>
		<align></align>
		<caption>The correlation matrix of the orthogonal dummy coded variables.</caption>
		<alt>The correlation matrix of the orthogonal dummy coded variables.</alt>
	</figure></P>
<P>Note that the <index>correlation coefficient</index> between the two contrasts is zero. The correlation between the <SPSSVar>Business</SPSSVar> contrast and <SPSSVar>Salary</SPSSVar> is -.585 with a squared correlation coefficient of .342. This correlation coefficient has a significance level of .001. The correlation coefficient between the <SPSSVar>FSvsBio</SPSSVar> contrast and <SPSSVar>Salary</SPSSVar> is -.150 with a squared value of .023. </P>
<P>In this case entering <SPSSVar>Business</SPSSVar> or <SPSSVar>FSvsBio</SPSSVar> first makes no difference in the results of the regression analysis.</P>
<P>	<figure>
		<description>The SPSS model summary for the multiple regression program predicting salary from Business and then FSvsBio in a hierarchical regression procedure. In the first stage, the multiple R is .585 with a significance level of .001.  In the second stage, the multiple R was .604 with a significance level of .345.</description>
		<url>images/mlt0898.gif</url>
		<width>739</width>
		<height>205</height>
		<align></align>
		<caption>Predicting Salary using Business in the first block and FSvsBio in the second.</caption>
		<alt>Predicting Salary using Business in the first block and FSvsBio in the second.</alt>
	</figure></P>
<P>	<figure>
		<description>The SPSS model summary for the multiple regression program predicting salary from FSvsBio and then Business in a hierarchical regression procedure. In the first stage, the multiple R is .150 with a significance level of .436.  In the second stage, the multiple R was .604 with a significance level of .001.</description>
		<url>images/mlt0899.gif</url>
		<width>739</width>
		<height>205</height>
		<align></align>
		<caption>Predicting Salary using FSvsBio in the first block and Business in the second.</caption>
		<alt>Predicting Salary using FSvsBio in the first block and Business in the second.</alt>
	</figure></P>
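<P>The reason the order of entry does not matter can be sketched numerically. When predictors are uncorrelated with one another, the squared multiple R is the sum of the squared correlations of each predictor with the dependent variable. The following minimal Python sketch applies that rule to the two correlations reported above; the function name is hypothetical and the inputs are the rounded values from the correlation matrix.</P>

```python
from math import sqrt

def multiple_r_orthogonal(correlations):
    """Multiple R for mutually uncorrelated predictors.

    Each predictor contributes its own squared correlation with the
    dependent variable, so the order of entry cannot change the total.
    """
    return sqrt(sum(r ** 2 for r in correlations))

# Correlations with Salary, as reported in the correlation matrix.
r_business = -0.585
r_fs_vs_bio = -0.150

multiple_r = multiple_r_orthogonal([r_business, r_fs_vs_bio])
same_in_reverse = multiple_r_orthogonal([r_fs_vs_bio, r_business])
print("multiple R =", round(multiple_r, 3))
```

<P>The result should agree, within rounding, with the multiple R of .604 reported in the second stage of both model summaries.</P>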
<P>Entering both contrasts simultaneously into the regression equation produces the following ANOVA table.</P>
<P>	<figure>
		<description>The ANOVA table predicting salary using FSvsBio and Business is shown. The significance level is .003.</description>
		<url>images/mlt08100.gif</url>
		<width>534</width>
		<height>174</height>
		<align></align>
		<caption>The ANOVA table predicting Salary using FSvsBio and Business.</caption>
		<alt>The ANOVA table predicting Salary using FSvsBio and Business.</alt>
	</figure></P>
<P>Note that this table is identical to the two ANOVA tables presented in the previous section. It may be concluded that it makes no difference which set of contrasts is selected when only the overall test of significance is desired. The choice of contrasts does make a difference, however, if a meaningful interpretation of each contrast is desired.</P>
		<TestItem type="MC">
			<question>The selection of a set of contrasts makes a difference</question>
			<answer type="correct">if it is desired to make a meaningful interpretation of each contrast.</answer>
			<answer>if only the overall test of significance is desired.</answer>
			<answer>in the significance levels of the R squared change.</answer>
			<answer>in categorical variables with seven or more levels.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<P>The <index>coefficient table</index> for the simultaneous entry of both contrasts is presented below.</P>
<P>	<figure>
		<description>The coefficients table predicting salary using Business and FSvsBio is illustrated. The unstandardized regression model has a constant term of 47.218 and regression weights of -3.691 and -15.316 for Business and FSvsBio, respectively. The standardized regression coefficients are -.585 and -.150 with significance levels of .001 and .345.</description>
		<url>images/mlt08101.gif</url>
		<width>530</width>
		<height>202</height>
		<align></align>
		<caption>The coefficients table predicting Salary using Business and FSvsBio.</caption>
		<alt>The coefficients table predicting Salary using Business and FSvsBio.</alt>
	</figure></P>
<P>Note that the "Sig." level is identical to the value when each contrast was entered last into the regression model. In this case the <SPSSVar>Business</SPSSVar> contrast was significant and the <SPSSVar>FSvsBio</SPSSVar> contrast was not. The interpretation of these results would be that the Business Department was paid significantly more than the Family Studies and Biology Departments, but that no significant differences in salary were found between the Family Studies and Biology Departments.</P>
<P>By carefully selecting the set of contrasts to be used in the regression with categorical variables, it is possible to construct tests of specific hypotheses. The hypotheses to be tested are generated by the theory used when designing the study.</P>
		<TestItem type="MC">
			<question>In general, the set of contrasts selected and tested</question>
			<answer type="correct">are generated by the theory used when designing the study.</answer>
			<answer>are generated using step-up or step-down regression procedures.</answer>
			<answer>are generally similar for any analysis.</answer>
			<answer>are selected to result in the largest R squared change.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
</section>
<section>
<P><h2>Categorical Predictor Variables with Six Levels</h2></P>
<P>If a categorical variable had six levels, five dummy coded contrasts would be necessary to use the categorical variable in a regression analysis. For example, suppose that a researcher at a headache care center did a study with six groups of four patients each (N is being deliberately kept small). The dependent measure is subjective experience of pain. The six groups consisted of six different treatment conditions.</P>
<P><table cellPadding="7" cellSpacing="0" summary = "The six treatment conditions of the second example." title="The six treatment conditions of the second example."><tcaption>The six treatment conditions of the second example.</tcaption>
<TR><TH>Group</TH><TH>Treatment</TH></TR>
<TR><TD>1</TD><TD>None</TD></TR>
<TR><TD>2</TD><TD>Placebo</TD></TR>
<TR><TD>3</TD><TD>Psychotherapy</TD></TR>
<TR><TD>4</TD><TD>Acupuncture</TD></TR>
<TR><TD>5</TD><TD>Drug 1</TD></TR>
<TR><TD>6</TD><TD>Drug 2</TD></TR>
</table></P>

<P>An independent contrast is a contrast that is not a linear combination of any other set of contrasts. Any set of independent contrasts would work equally well if the end result was the simultaneous test of the five contrasts, as in an ANOVA. One of the many possible examples is presented below.</P>
<P><table cellPadding="7" cellSpacing="0" summary = "Non-orthogonal dummy codes for six groups." title="Non-orthogonal dummy codes for six groups."><tcaption>Non-orthogonal dummy codes for six groups.</tcaption>
<TR><TH></TH><TH>Group</TH><TH>C1</TH><TH>C2</TH><TH>C3</TH><TH>C4</TH><TH>C5</TH></TR>
<TR><TD>None</TD><TD>1</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>0</TD></TR>
<TR><TD>Placebo</TD><TD>2</TD><TD>1</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>0</TD></TR>
<TR><TD>Psychotherapy</TD><TD>3</TD><TD>0</TD><TD>1</TD><TD>0</TD><TD>0</TD><TD>0</TD></TR>
<TR><TD>Acupuncture</TD><TD>4</TD><TD>0</TD><TD>0</TD><TD>1</TD><TD>0</TD><TD>0</TD></TR>
<TR><TD>Drug 1</TD><TD>5</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>1</TD><TD>0</TD></TR>
<TR><TD>Drug 2</TD><TD>6</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>1</TD></TR>
</table></P>
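<P>The coding in this table can be generated mechanically. The following is a minimal Python sketch, assuming group 1 (None) serves as the reference level that receives zeros on every contrast; the function name <SPSSVar>indicator_codes</SPSSVar> is hypothetical.</P>

```python
def indicator_codes(group, n_levels, reference=1):
    """Return n_levels - 1 zero/one codes for one group.

    The reference level is coded zero on every contrast; each other
    level is coded one on exactly one contrast.
    """
    levels = [k for k in range(1, n_levels + 1) if k != reference]
    return [1 if group == k else 0 for k in levels]

treatments = ["None", "Placebo", "Psychotherapy",
              "Acupuncture", "Drug 1", "Drug 2"]

# Build the C1..C5 codes for each of the six treatment groups.
table = {name: indicator_codes(g, 6)
         for g, name in enumerate(treatments, start=1)}

for name, codes in table.items():
    print(name, codes)
```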
<P>Application of this dummy coding in a regression model entering all contrasts in a single block would result in an ANOVA table similar to the one obtained using Means, ANOVA, or General Linear Model programs in SPSS.</P>
		<TestItem type="MC">
			<question>It is conventional to construct contrasts with a sum equal to</question>
			<answer type="correct">zero.</answer>
			<answer>one.</answer>
			<answer>N.</answer>
			<answer>the number of levels of the categorical variable.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<P>This solution would not be ideal, however, because there is considerable information available by setting the contrasts to test specific hypotheses. The levels of the categorical variable generally dictate the structure of the contrasts. In the example study, it makes sense to contrast the two control groups (1 and 2) with the other four experimental groups (3, 4, 5, and 6). Any two numbers would work, one assigned to groups 1 and 2 and the other assigned to the other four groups, but it is conventional to have the sum of the contrasts equal to zero. One contrast that meets this criterion would be (-2, -2, 1, 1, 1, 1).</P>
<P>Generally it is easiest to set up contrasts within subgroups of the first contrast. For example, a second contrast might test whether there are differences between the two control groups. This contrast would appear as (1, -1, 0, 0, 0, 0). A third contrast might compare non-drug vs. drug treatment groups, groups 3 and 4 vs. groups 5 and 6 (0, 0, 1, 1, -1, -1). As can be seen, this would be a contrast within the experimental treatment groups. Within the non-drug treatment, a contrast comparing Group 3 with Group 4 might be appropriate (0, 0, 1, -1, 0, 0). Within the drug treatment conditions, a contrast comparing the two drug treatments would be the last contrast (0, 0, 0, 0, 1, -1). Combined, the contrasts are given in the following table.</P>
<P><table cellPadding="7" cellSpacing="0" summary = "Orthogonal dummy codes for six groups." title="Orthogonal dummy codes for six groups."><tcaption>Orthogonal dummy codes for six groups.</tcaption>
<TR><TH></TH><TH>Group</TH><TH>C1</TH><TH>C2</TH><TH>C3</TH><TH>C4</TH><TH>C5</TH></TR>
<TR><TD>None</TD><TD>1</TD><TD>-2</TD><TD>1</TD><TD>0</TD><TD>0</TD><TD>0</TD></TR>
<TR><TD>Placebo</TD><TD>2</TD><TD>-2</TD><TD>-1</TD><TD>0</TD><TD>0</TD><TD>0</TD></TR>
<TR><TD>Psychotherapy</TD><TD>3</TD><TD>1</TD><TD>0</TD><TD>1</TD><TD>1</TD><TD>0</TD></TR>
<TR><TD>Acupuncture</TD><TD>4</TD><TD>1</TD><TD>0</TD><TD>1</TD><TD>-1</TD><TD>0</TD></TR>
<TR><TD>Drug 1</TD><TD>5</TD><TD>1</TD><TD>0</TD><TD>-1</TD><TD>0</TD><TD>1</TD></TR>
<TR><TD>Drug 2</TD><TD>6</TD><TD>1</TD><TD>0</TD><TD>-1</TD><TD>0</TD><TD>-1</TD></TR>
</table></P>
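<P>Two properties of this set of contrasts can be checked directly: each contrast sums to zero, and the sum of cross-products for every pair of contrasts is zero. With an equal number of subjects in each group, these two properties together imply that the contrasts are uncorrelated. A minimal Python sketch of the check follows.</P>

```python
from itertools import combinations

# Contrast weights for groups 1..6, copied from the table above.
contrasts = {
    "C1": [-2, -2, 1, 1, 1, 1],   # controls vs. treatments
    "C2": [1, -1, 0, 0, 0, 0],    # None vs. Placebo
    "C3": [0, 0, 1, 1, -1, -1],   # non-drug vs. drug
    "C4": [0, 0, 1, -1, 0, 0],    # Psychotherapy vs. Acupuncture
    "C5": [0, 0, 0, 0, 1, -1],    # Drug 1 vs. Drug 2
}

# Each contrast should sum to zero.
sums = {name: sum(weights) for name, weights in contrasts.items()}

# Each pair of contrasts should have a zero sum of cross-products.
cross_products = {
    (a, b): sum(p * q for p, q in zip(contrasts[a], contrasts[b]))
    for a, b in combinations(contrasts, 2)
}

print(sums)
print(cross_products)
```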

<P>The following table presents example data and dummy coded contrasts for this hypothetical study.</P>
<definition word="orthogonal">independent, uncorrelated</definition>
<P>	<figure>
		<description>The example data file with five new orthogonal variables is illustrated. Where Group is equal to Control, the five new variables, labeled C1, C2, C3, C4, and C5 have values of -2, 1, 0, 0, and 0. For Group = Placebo, the new values are -2, -1, 0, 0, and 0. For Group = Psychotherapy the new values are 1, 0, 1, 1, and 0. For Group = Acupuncture the new values are 1, 0, 1, -1, and 0. For Group = Drug1 the new values are 1, 0, -1, 0, and 1. For Group = Drug2 the new values are 1, 0, -1, 0, -1.</description>
		<url>images/mlt08102.gif</url>
		<width>477</width>
		<height>493</height>
		<align></align>
		<caption>The data matrix for orthogonal contrasts of six groups.</caption>
		<alt>The data matrix for orthogonal contrasts of six groups.</alt>
	</figure></P>
<P>The correlation matrix of the five contrasts and the pain variable is presented below.</P>
<P>	<figure>
		<description>The correlation matrix from SPSS output for orthogonal contrasts of six groups is shown. The first variable, Pain, is correlated with the other five variables, C1, C2, C3, C4, and C5 with values of -.651, .230, .351, .390, and .212, respectively. All correlations between C1 to C5 are zero, except on the diagonal, where they are 1.0.</description>
		<url>images/mlt08103.gif</url>
		<width>602</width>
		<height>324</height>
		<align></align>
		<caption>The correlation matrix for orthogonal contrasts of six groups.</caption>
		<alt>The correlation matrix for orthogonal contrasts of six groups.</alt>
	</figure></P>
<P>Note that the <index>correlation coefficients</index> between the five contrasts are all zero.  This occurs because the contrasts are orthogonal and all groups have an equal number of subjects.</P>
<P>Using pain as the dependent variable and the five <index>contrasts</index> as the independent variables, the regression results tables entering all variables in block 1 are presented below.</P>
<P>	<figure>
		<description>The model summary for orthogonal contrasts of six groups is illustrated. The multiple R is .892 and the standard error of estimate is 2.12.</description>
		<url>images/mlt08104.gif</url>
		<width>379</width>
		<height>133</height>
		<align></align>
		<caption>The model summary for orthogonal contrasts of six groups.</caption>
		<alt>The model summary for orthogonal contrasts of six groups.</alt>
	</figure></P>
<P>	<figure>
		<description>The ANOVA table for orthogonal contrasts of six groups is shown. Note that the F ratio is 14.062 with a significance level of .000. Note also that the value of the Mean Square Residual, 4.514, is the square of the standard error of estimate.</description>
		<url>images/mlt08105.gif</url>
		<width>524</width>
		<height>174</height>
		<align></align>
		<caption>The ANOVA table for orthogonal contrasts of six groups.</caption>
		<alt>The ANOVA table for orthogonal contrasts of six groups.</alt>
	</figure></P>
		<TestItem type="MC">
			<question>When using a set of orthogonal contrasts, the correlation coefficient between the contrast and the dependent variable will equal</question>
			<answer type="correct">the standardized regression coefficient.</answer>
			<answer>zero.</answer>
			<answer>one.</answer>
			<answer>the square root of one minus the multiple R squared.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<P>	<figure>
		<description>The coefficients table for orthogonal contrasts of six groups is pictured. Note that the standardized coefficients are equal to the correlation coefficients presented in an earlier illustration. The significance levels are different, however, because the error term is different, being calculated from the overall Mean Square Residual.</description>
		<url>images/mlt08106.gif</url>
		<width>527</width>
		<height>257</height>
		<align></align>
		<caption>The coefficients table for orthogonal contrasts of six groups.</caption>
		<alt>The coefficients table for orthogonal contrasts of six groups.</alt>
	</figure></P>
<P>Of major interest is the "Sig." column on the "Coefficients" table. Note that all contrasts are statistically significant except C5. This can be interpreted as follows: (1) the treatment conditions were more effective than the control conditions; (2) the two control conditions differed significantly from one another, with placebo more effective than no treatment; (3) the drug groups were more effective in reducing pain than the non-drug conditions; (4) Acupuncture was significantly more effective than Psychotherapy; and (5) the two drug treatments were not significantly different from one another.</P>
<P>The output from the "General Linear Model, Simple Factorial" program in SPSS is presented below.</P>
<P>	<figure>
		<description>The output from the General Linear Model, Simple Factorial program in SPSS for the data in the orthogonal contrasts of six groups is presented. Note that the table is almost identical to the ANOVA table calculated using the linear regression program with the dummy coded variables.</description>
		<url>images/mlt08107.gif</url>
		<width>626</width>
		<height>211</height>
		<align></align>
		<caption>The data for orthogonal contrasts of six groups analyzed using an ANOVA procedure.</caption>
		<alt>The data for orthogonal contrasts of six groups analyzed using an ANOVA procedure.</alt>
	</figure></P>
		<TestItem type="MC">
			<question>The General Linear Model program in SPSS</question>
			<answer type="correct">automatically selects a set of contrasts.</answer>
			<answer>requires the users to enter a set of contrasts.</answer>
			<answer>has no flexibility with respect to the contrasts selected.</answer>
			<answer>is much more difficult to use than the multiple regression program with dummy coded variables.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<P>Note that it is for practical purposes identical to the <index>ANOVA table</index> produced using the multiple regression program with the dummy coded contrasts. In effect what the <index>General Linear Model</index> program does is to automatically select a set of contrasts and then perform a regression analysis with those contrasts. The General Linear Model program allows the user to specify a special set of contrasts so that an analysis like the one done with dummy coding of contrasts in multiple regression might be performed. It is left for the reader to explore SPSS for this ability.</P>
<definition word="general linear model">a general-purpose conceptual framework for many different statistical techniques utilizing a linear model, e.g. multiple regression, ANOVA, discriminant function analysis, and canonical correlation</definition>
</section>
<section>
<P><h2>Combinations of Categorical Predictor Variables</h2></P>
<P>In the original example data set for this chapter there were three obvious categorical variables, <SPSSVar>Gender</SPSSVar>, <SPSSVar>Rank</SPSSVar>, and <SPSSVar>Dept</SPSSVar>. <SPSSVar>Gender</SPSSVar> could be directly entered into the regression model. After dummy coding into two contrasts each, <SPSSVar>Rank</SPSSVar> and <SPSSVar>Dept</SPSSVar> could be directly entered into the regression model. Difficulties arise, however, when combinations of these categorical variables must be considered. For example, consider <SPSSVar>Gender</SPSSVar> and <SPSSVar>Dept</SPSSVar>. Rather than two groups and three groups, this combination of categorical variables must be considered as six groups: Male Family Studies, Female Family Studies, Male Biology, Female Biology, Male Business, and Female Business. Dummy coding these data would require five dummy coded contrasts. Three exist, one for <SPSSVar>Gender</SPSSVar> and two for <SPSSVar>Dept</SPSSVar>, but there is no accounting for the two additional contrasts. They will be the focus of the next topic, <I>interaction effects</I>.</P>
		<TestItem type="MC">
			<question>When two categorical variables are used in a multiple regression model, the total number of groups will be</question>
			<answer type="correct">the number of levels of the first categorical variable times the number of levels of the second.</answer>
			<answer>the number of levels of the first categorical variable plus the number of levels of the second.</answer>
			<answer>the number of subjects divided by the degrees of freedom.</answer>
			<answer>the number of levels of the first categorical variable times the number of levels of the second minus the number of levels of the first categorical variable plus the number of levels of the second.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<definition word="main effect">changes in a dependent variable at different levels of another variable</definition>
<definition word="simple main effect">changes in a dependent variable at different levels of another variable, given the value of another variable remains constant</definition>
<definition word="interaction effect">a change in the simple main effect of one variable over levels of another variable or combination of variables.</definition>
<P><h3>Equal Sample Size</h3></P>
		<TestItem type="MC">
			<question>When dummy coding for two or more categorical variables, interaction contrasts can be found by</question>
			<answer type="correct">multiplying the dummy codes for the main effects.</answer>
			<answer>only by trial and error.</answer>
			<answer>by reversing the signs of the dummy codes for the main effects and then adding them together.</answer>
			<answer>by reordering the levels within the variables.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<P>Because everything works out much more cleanly when <index>equal sample sizes</index> are assumed, this case will be presented first. The example data set has been reduced to twelve subjects, two for each combination of <SPSSVar>Gender</SPSSVar> and <SPSSVar>Dept</SPSSVar>. The reduced data set is presented below.</P>
<P><table cellPadding="4" cellSpacing="0" summary = "Data matrix for combinations of categorical predictor variables." title="Data matrix for combinations of categorical predictor variables."><tcaption>Data matrix for combinations of categorical predictor variables.</tcaption>
<TR><TH>Faculty</TH><TH>Salary</TH><TH>Gender</TH><TH>Dept</TH></TR>
<TR><TD>7</TD><TD>45</TD><TD>0</TD><TD>1</TD></TR>
<TR><TD>11</TD><TD>34</TD><TD>0</TD><TD>1</TD></TR>
<TR><TD>14</TD><TD>42</TD><TD>0</TD><TD>2</TD></TR>
<TR><TD>15</TD><TD>42</TD><TD>0</TD><TD>2</TD></TR>
<TR><TD>9</TD><TD>59</TD><TD>0</TD><TD>3</TD></TR>
<TR><TD>12</TD><TD>53</TD><TD>0</TD><TD>3</TD></TR>
<TR><TD>4</TD><TD>30</TD><TD>1</TD><TD>1</TD></TR>
<TR><TD>10</TD><TD>47</TD><TD>1</TD><TD>1</TD></TR>
<TR><TD>8</TD><TD>42</TD><TD>1</TD><TD>2</TD></TR>
<TR><TD>2</TD><TD>58</TD><TD>1</TD><TD>2</TD></TR>
<TR><TD>5</TD><TD>50</TD><TD>1</TD><TD>3</TD></TR>
<TR><TD>6</TD><TD>49</TD><TD>1</TD><TD>3</TD></TR>
</table></P>

<P>The levels of Gender and Dept will now be combined to produce six groups.</P>
<P><table cellPadding="4" cellSpacing="0" summary = "Data matrix for combinations of categorical predictor variables as a single group." title="Data matrix for combinations of categorical predictor variables as a single group."><tcaption>Data matrix for combinations of categorical predictor variables as a single group.</tcaption>
<TR><TH>Salary</TH><TH>Gender</TH><TH>Dept</TH><TH>Group</TH></TR>
<TR><TD>45</TD><TD>0</TD><TD>1</TD><TD>1</TD></TR>
<TR><TD>34</TD><TD>0</TD><TD>1</TD><TD>1</TD></TR>
<TR><TD>42</TD><TD>0</TD><TD>2</TD><TD>2</TD></TR>
<TR><TD>42</TD><TD>0</TD><TD>2</TD><TD>2</TD></TR>
<TR><TD>59</TD><TD>0</TD><TD>3</TD><TD>3</TD></TR>
<TR><TD>53</TD><TD>0</TD><TD>3</TD><TD>3</TD></TR>
<TR><TD>30</TD><TD>1</TD><TD>1</TD><TD>4</TD></TR>
<TR><TD>47</TD><TD>1</TD><TD>1</TD><TD>4</TD></TR>
<TR><TD>42</TD><TD>1</TD><TD>2</TD><TD>5</TD></TR>
<TR><TD>58</TD><TD>1</TD><TD>2</TD><TD>5</TD></TR>
<TR><TD>50</TD><TD>1</TD><TD>3</TD><TD>6</TD></TR>
<TR><TD>49</TD><TD>1</TD><TD>3</TD><TD>6</TD></TR>
</table></P>
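<P>The Group column can be derived from the other two columns. The following minimal Python sketch assumes the coding shown in the table, with <SPSSVar>Gender</SPSSVar> coded 0 or 1 and <SPSSVar>Dept</SPSSVar> coded 1 to 3; the variable names are chosen for this illustration.</P>

```python
# Gender and Dept values in the row order of the table above.
gender = [0] * 6 + [1] * 6
dept = [1, 1, 2, 2, 3, 3] * 2

n_depts = 3

# Each Gender/Dept combination receives its own group number, 1..6.
group = [g * n_depts + d for g, d in zip(gender, dept)]

print(group)
```

<P>The number of groups is the number of levels of the first variable times the number of levels of the second, here 2 times 3, or six.</P>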

<P>The situation is now analogous to the earlier case when the categorical variable had six levels.</P>
<P><h4>Main Effects</h4></P>
<P>A <index>categorical variable</index> with six levels can be dummy coded into five contrasts. The first three contrasts have already been discussed. The first of these contrasts will compare males with females and will comprise the <SPSSVar>Gender Main Effect</SPSSVar>. The next two will compare the salaries of the three departments over levels of gender and will be called the <SPSSVar>Department Main Effect</SPSSVar>. The dummy codes for these main effects are presented below.</P>
<P><table cellPadding="4" cellSpacing="0" summary = "Orthogonal contrasts for data matrix for combinations of categorical predictor." title="Orthogonal contrasts for data matrix for combinations of categorical predictor."><tcaption>Orthogonal contrasts for main effects for data matrix for combinations of categorical predictor.</tcaption>
<TR><TH>Salary</TH><TH>Group</TH><TH>Gender Main Effect</TH><TH scope = "col" colspan = "2">Department Main Effect</TH></TR>
<TR><TD>45</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>1</TD></TR>
<TR><TD>34</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>1</TD></TR>
<TR><TD>42</TD><TD>2</TD><TD>1</TD><TD>1</TD><TD>-1</TD></TR>
<TR><TD>42</TD><TD>2</TD><TD>1</TD><TD>1</TD><TD>-1</TD></TR>
<TR><TD>59</TD><TD>3</TD><TD>1</TD><TD>-2</TD><TD>0</TD></TR>
<TR><TD>53</TD><TD>3</TD><TD>1</TD><TD>-2</TD><TD>0</TD></TR>
<TR><TD>30</TD><TD>4</TD><TD>-1</TD><TD>1</TD><TD>1</TD></TR>
<TR><TD>47</TD><TD>4</TD><TD>-1</TD><TD>1</TD><TD>1</TD></TR>
<TR><TD>42</TD><TD>5</TD><TD>-1</TD><TD>1</TD><TD>-1</TD></TR>
<TR><TD>58</TD><TD>5</TD><TD>-1</TD><TD>1</TD><TD>-1</TD></TR>
<TR><TD>50</TD><TD>6</TD><TD>-1</TD><TD>-2</TD><TD>0</TD></TR>
<TR><TD>49</TD><TD>6</TD><TD>-1</TD><TD>-2</TD><TD>0</TD></TR>
</table></P>
		<TestItem type="MC">
			<question>When there are equal sample sizes in each combination of two categorical variables the order of entry of blocks of main effects and interaction effects</question>
			<answer type="correct">will result in similar R squared changes.</answer>
			<answer>will result in similar significance levels for R squared changes.</answer>
			<answer>will result in different R squared changes.</answer>
			<answer>will result in a different total multiple R.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
		<TestItem type="MC">
			<question>When there are unequal sample sizes in each combination of two categorical variables the order of entry of blocks of main effects and interaction effects</question>
			<answer>will result in similar R squared changes.</answer>
			<answer>will result in similar significance levels for R squared changes.</answer>
			<answer type="correct">will result in different R squared changes.</answer>
			<answer>will result in a different total multiple R.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<P>This is basically the same coding as discussed earlier, except it is simplified because of the equal number of subjects in each cell. It will later be demonstrated that the correlation coefficients between these dummy coded variables are zero.</P>
<P><h4>Interaction Effects</h4></P>
<P>Two additional dummy coded variables are needed to account fully for the six-level categorical variable. These contrasts will comprise the <SPSSVar>Interaction Effect</SPSSVar>. In this case the easiest way to find the needed contrasts is to multiply the <index>dummy coded contrast</index> for gender by each of the dummy coded contrasts for Department. This has the result of changing the sign of the department contrasts for one gender but not the other. The results of this operation appear below.</P>
<P><table cellPadding="4" cellSpacing="0" summary = "Orthogonal contrasts for main and interaction effects for data matrix for combinations of categorical predictor." title="Orthogonal contrasts for main and interaction effects for data matrix for combinations of categorical predictor."><tcaption>Orthogonal contrasts for main and interaction effects for data matrix for combinations of categorical predictor.</tcaption>
<TR><TH>Salary</TH><TH>Group</TH><TH>Gender Main Effect</TH><TH scope = "col" colspan = "2">Department Main Effect</TH><TH scope = "col" colspan = "2">Interaction Effect</TH></TR>
<TR><TD>45</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>1</TD></TR>
<TR><TD>34</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>1</TD></TR>
<TR><TD>42</TD><TD>2</TD><TD>1</TD><TD>1</TD><TD>-1</TD><TD>1</TD><TD>-1</TD></TR>
<TR><TD>42</TD><TD>2</TD><TD>1</TD><TD>1</TD><TD>-1</TD><TD>1</TD><TD>-1</TD></TR>
<TR><TD>59</TD><TD>3</TD><TD>1</TD><TD>-2</TD><TD>0</TD><TD>-2</TD><TD>0</TD></TR>
<TR><TD>53</TD><TD>3</TD><TD>1</TD><TD>-2</TD><TD>0</TD><TD>-2</TD><TD>0</TD></TR>
<TR><TD>30</TD><TD>4</TD><TD>-1</TD><TD>1</TD><TD>1</TD><TD>-1</TD><TD>-1</TD></TR>
<TR><TD>47</TD><TD>4</TD><TD>-1</TD><TD>1</TD><TD>1</TD><TD>-1</TD><TD>-1</TD></TR>
<TR><TD>42</TD><TD>5</TD><TD>-1</TD><TD>1</TD><TD>-1</TD><TD>-1</TD><TD>1</TD></TR>
<TR><TD>58</TD><TD>5</TD><TD>-1</TD><TD>1</TD><TD>-1</TD><TD>-1</TD><TD>1</TD></TR>
<TR><TD>50</TD><TD>6</TD><TD>-1</TD><TD>-2</TD><TD>0</TD><TD>2</TD><TD>0</TD></TR>
<TR><TD>49</TD><TD>6</TD><TD>-1</TD><TD>-2</TD><TD>0</TD><TD>2</TD><TD>0</TD></TR>
</table></P>
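<P>The construction of the interaction contrasts can be sketched in a few lines of Python. The code dictionaries below reproduce the main effect codes from the table; the names C1 to C5 follow the labels used for these contrasts in the correlation matrix, and the other variable names are chosen for this illustration.</P>

```python
from itertools import combinations

# Gender and Dept values in the row order of the table above.
gender = [0] * 6 + [1] * 6
dept = [1, 1, 2, 2, 3, 3] * 2

# Main effect codes, as given in the table.
gender_code = {0: 1, 1: -1}
dept_code_a = {1: 1, 2: 1, 3: -2}   # Business vs. the other two
dept_code_b = {1: 1, 2: -1, 3: 0}   # Family Studies vs. Biology

c1 = [gender_code[g] for g in gender]
c2 = [dept_code_a[d] for d in dept]
c3 = [dept_code_b[d] for d in dept]

# Interaction codes: the gender code times each department code.
c4 = [a * b for a, b in zip(c1, c2)]
c5 = [a * b for a, b in zip(c1, c3)]

columns = {"C1": c1, "C2": c2, "C3": c3, "C4": c4, "C5": c5}

# Zero sums and zero cross-products mean the columns are uncorrelated.
sums = {name: sum(col) for name, col in columns.items()}
cross_products = {
    (a, b): sum(p * q for p, q in zip(columns[a], columns[b]))
    for a, b in combinations(columns, 2)
}

print(c4)
print(c5)
print(sums)
print(cross_products)
```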

<P></P>
<P>The <index>correlation matrix</index> for this data set is presented below.</P>
<P>	<figure>
		<description>The correlation matrix from SPSS output for orthogonal contrasts of six groups with main and interaction effects is shown. The first variable, Salary, is correlated with the other five variables, C1, C2, C3, C4, and C5 with values of -.010, -.579, -.342, -.282, and .220, respectively. All correlations between C1 to C5 are zero, except on the diagonal, where they are 1.0.</description>
		<url>images/mlt08108.gif</url>
		<width>602</width>
		<height>322</height>
		<align></align>
		<caption>The correlation matrix for orthogonal contrasts of six groups with interaction and main effects.</caption>
		<alt>The correlation matrix for orthogonal contrasts of six groups with interaction and main effects.</alt>
	</figure></P>
<P>Note that the contrasts all have a correlation coefficient of zero among themselves. The contrasts will be entered into the regression equation predicting salary in three blocks. The first block will contain C1, the second will contain C2 and C3, while the third will contain C4 and C5. The results of this analysis are presented below.</P>
<P>	<figure>
		<description>Model summary for orthogonal contrasts of main and interaction effects with variables entered in three blocks of C1, C2 and C3, and C4 and C5. The R squared changes for the three blocks are .000, .452, and .128.</description>
		<url>images/mlt08109.gif</url>
		<width>739</width>
		<height>244</height>
		<align></align>
		<caption>Model summary for orthogonal contrasts of main and interaction effects with variables entered in three blocks of C1, C2 and C3, and C4 and C5.</caption>
		<alt>Model summary for orthogonal contrasts of main and interaction effects with variables entered in three blocks of C1, C2 and C3, and C4 and C5.</alt>
	</figure></P>
<P>Entering the contrasts in the opposite order has no effect on R Square Change.</P>
<P>	<figure>
		<description>Model summary for orthogonal contrasts of main and interaction effects with variables entered in three blocks of C4 and C5, C2 and C3, and C1. The R squared changes for the three blocks are .128, .452, and .000</description>
		<url>images/mlt08110.gif</url>
		<width>739</width>
		<height>244</height>
		<align></align>
		<caption>Model summary for orthogonal contrasts of main and interaction effects with variables entered in three blocks of C4 and C5, C2 and C3, and C1.</caption>
		<alt>Model summary for orthogonal contrasts of main and interaction effects with variables entered in three blocks of C4 and C5, C2 and C3, and C1.</alt>
	</figure></P>
<P>The values for "F Change" and "Sig. F Change" are different, however, because different error terms are employed in each case. In this subset of the data, none of the contrasts are significant. The interpretation of the <index>main effects</index> and <index>interaction effects</index> will be the topic of discussion of the next chapter.</P>
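<P>The block-entry analysis can be approximated outside SPSS. The following is a minimal sketch, assuming NumPy and using the twelve observations from the table of orthogonal contrasts; the helper names <SPSSVar>r_squared</SPSSVar> and <SPSSVar>block_changes</SPSSVar> are hypothetical. The F change at each step is computed with the error term of the model fitted at that step, which is an assumption about how the SPSS figures are formed.</P>

```python
import numpy as np

salary = np.array([45, 34, 42, 42, 59, 53, 30, 47, 42, 58, 50, 49], dtype=float)
c1 = np.array([1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1])
c2 = np.array([1, 1, 1, 1, -2, -2, 1, 1, 1, 1, -2, -2])
c3 = np.array([1, 1, -1, -1, 0, 0, 1, 1, -1, -1, 0, 0])
c4, c5 = c1 * c2, c1 * c3


def r_squared(y, columns):
    """R squared from an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    total = y - y.mean()
    return 1.0 - (residual @ residual) / (total @ total)


def block_changes(y, blocks):
    """Return (R squared change, F change) for each block, in entry order."""
    entered, previous, results = [], 0.0, []
    for block in blocks:
        entered = entered + list(block)
        current = r_squared(y, entered)
        change = current - previous
        error_df = len(y) - len(entered) - 1
        # The error term comes from the model fitted at this step.
        f_change = (change / len(block)) / ((1.0 - current) / error_df)
        results.append((change, f_change))
        previous = current
    return results


forward = block_changes(salary, [[c1], [c2, c3], [c4, c5]])
reverse = block_changes(salary, [[c4, c5], [c2, c3], [c1]])
for label, results in (("forward", forward), ("reverse", reverse)):
    for change, f_change in results:
        print(label, round(change, 3), round(f_change, 3))
```

<P>Because the contrasts are uncorrelated, each block receives the same R<SUP>2</SUP> change in either order, while its F change shifts with the number of terms already in the model.</P>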
<P><h3>Unequal Sample Size</h3></P>
		<TestItem type="MC">
			<question>In most real-life regression analyses with two categorical variables one may expect to find</question>
			<answer type="correct">unequal sample sizes.</answer>
			<answer>equal sample sizes.</answer>
			<answer>proportional sample sizes.</answer>
			<answer>no missing data.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<P>Equal sample size is seldom achieved in the real world, even in the best-designed experiments. <index>Unequal sample size</index> makes the effects no longer independent. This implies that, in hypothesis testing, it makes a difference whether an effect is added to the model first, in the middle, or last.</P>
<P>The same dummy coding that was applied to equal sample sizes will now be applied to the original data with unequal sample sizes. The simplest way to do this is to recode GENDER into C1 and DEPARTMENT into C2 and C3, and then to compute C4 and C5 by multiplying the corresponding main effect contrasts.  For example, C4 could be created by multiplying C1 * C2 and C5 could be created by multiplying C1 * C3.  The data and dummy coded contrasts appear below.</P>
<P>	<figure>
		<description>Data and contrasts for main and interaction effects for data matrix for combinations of categorical predictor. The data is similar to that presented in an earlier table except that more subjects are included and the number of subjects in each combination of factors is not equal.</description>
		<url>images/mlt08111.gif</url>
		<width>528</width>
		<height>648</height>
		<align></align>
		<caption>Data and contrasts for main and interaction effects for data matrix for combinations of categorical predictor.</caption>
		<alt>Data and contrasts for main and interaction effects for data matrix for combinations of categorical predictor.</alt>
	</figure></P>
<P>The correlation matrix of the contrasts is presented below.</P>
<P>	<figure>
		<description>Correlation matrix for main and interaction effects for data matrix for combinations of categorical predictor. In this case the correlations between the contrasts are not zero.</description>
		<url>images/mlt08112.gif</url>
		<width>602</width>
		<height>344</height>
		<align></align>
		<caption>Correlation matrix for main and interaction effects for data matrix for combinations of categorical predictor.</caption>
		<alt>Correlation matrix for main and interaction effects for data matrix for combinations of categorical predictor.</alt>
	</figure></P>
<P>Note that the <index>correlation coefficients</index> between the <index>contrasts</index> are not zero. This has the effect of changing the value of R<SUP>2 </SUP>Change for a term depending upon when that term was entered into the model. This is illustrated by entering the two contrasts associated with <SPSSVar>Dept</SPSSVar> (C2 and C3) first, second, and last.</P>
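<P>The effect of unequal cell sizes can be sketched with a small hypothetical example. The data below are NOT the unequal-n data shown in the figures; they are the twelve equal-n observations used earlier with the last case dropped, chosen only so that one cell is smaller than the others. NumPy is assumed and the helper <SPSSVar>r_squared</SPSSVar> is a hypothetical name.</P>

```python
import numpy as np

# HYPOTHETICAL data: the equal-n observations with the last case removed,
# so the cell sizes are no longer equal. Not the data in the figures.
salary = np.array([45, 34, 42, 42, 59, 53, 30, 47, 42, 58, 50], dtype=float)
c1 = np.array([1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1])
c2 = np.array([1, 1, 1, 1, -2, -2, 1, 1, 1, 1, -2])
c3 = np.array([1, 1, -1, -1, 0, 0, 1, 1, -1, -1, 0])
c4, c5 = c1 * c2, c1 * c3


def r_squared(y, columns):
    """R squared from an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    total = y - y.mean()
    return 1.0 - (residual @ residual) / (total @ total)


# Correlations among the five contrasts (columns are variables).
contrast_corr = np.corrcoef(np.column_stack([c1, c2, c3, c4, c5]), rowvar=False)

# R squared change for Dept (C2 and C3) at three different entry points.
dept_first = r_squared(salary, [c2, c3])
dept_after_gender = r_squared(salary, [c1, c2, c3]) - r_squared(salary, [c1])
dept_last = (r_squared(salary, [c1, c2, c3, c4, c5])
             - r_squared(salary, [c1, c4, c5]))

print(np.round(contrast_corr, 3))
print(round(dept_first, 3), round(dept_after_gender, 3), round(dept_last, 3))
```

<P>In this sketch some off-diagonal correlations are no longer zero, and the R<SUP>2</SUP> change credited to Dept depends on which contrasts were already in the model.</P>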
<P><h4>Main Effects of Dept Entered First</h4></P>
<P>	<figure>
		<description>Model summary for contrasts of main and interaction effects with variables entered in three blocks of C2 and C3, C1, and C4 and C5.</description>
		<url>images/mlt08113.gif</url>
		<width>739</width>
		<height>244</height>
		<align></align>
		<caption>Model summary for contrasts of main and interaction effects with variables entered in three blocks of C2 and C3, C1, and C4 and C5.</caption>
		<alt>Model summary for contrasts of main and interaction effects with variables entered in three blocks of C2 and C3, C1, and C4 and C5.</alt>
	</figure></P>
<P><h4>Main Effects of Dept Entered Second</h4></P>
<P>There are two different ways in which the main effect of <SPSSVar>Dept</SPSSVar> may be entered second in the regression model. The first is after <SPSSVar>Gender</SPSSVar> and is presented below.</P>
<P>	<figure>
		<description>Model summary for contrasts of main and interaction effects with variables entered in three blocks of C1, C2 and C3, and C4 and C5.</description>
		<url>images/mlt08114.gif</url>
		<width>739</width>
		<height>244</height>
		<align></align>
		<caption>Model summary for contrasts of main and interaction effects with variables entered in three blocks of C1, C2 and C3, and C4 and C5.</caption>
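		<alt>Model summary for contrasts of main and interaction effects with variables entered in three blocks of C1, C2 and C3, and C4 and C5.</alt>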
	</figure></P>
<P>As can be seen, the value of R<SUP>2</SUP> change for adding C2 and C3 changes only slightly, from .379 to .376. A slightly greater difference in the R<SUP>2</SUP> change value is observed if the interaction contrasts (C4 and C5) are entered before the main effect of department.</P>
<P>	<figure>
		<description>Model summary for contrasts of main and interaction effects with variables entered in three blocks of C4 and C5, C2 and C3, and C1.</description>
		<url>images/mlt08115.gif</url>
		<width>739</width>
		<height>244</height>
		<align></align>
		<caption>Model summary for contrasts of main and interaction effects with variables entered in three blocks of C4 and C5, C2 and C3, and C1.</caption>
		<alt>Model summary for contrasts of main and interaction effects with variables entered in three blocks of C4 and C5, C2 and C3, and C1.</alt>
	</figure></P>
<P>Note that the value of R<SUP>2</SUP> change is greater for Gender (C1) if it is entered last, rather than first.</P>
<P><h4>Main Effects of Dept Entered Third</h4></P>
<P>	<figure>
		<description>Model summary for contrasts of main and interaction effects with variables entered in three blocks of C4 and C5, C1, and C2 and C3.</description>
		<url>images/mlt08116.gif</url>
		<width>739</width>
		<height>244</height>
		<align></align>
		<caption>Model summary for contrasts of main and interaction effects with variables entered in three blocks of C4 and C5, C1, and C2 and C3.</caption>
		<alt>Model summary for contrasts of main and interaction effects with variables entered in three blocks of C4 and C5, C1, and C2 and C3.</alt>
	</figure></P>
<P>Note that the value of R<SUP>2</SUP> change for <SPSSVar>Dept</SPSSVar> changes only slightly depending upon when the effect is entered into the model. The pattern of results of the <index>significance tests</index> would not change.</P>
<P><h4>Main Effect of Gender Given Rank, Dept, Gender X Rank, Gender X Dept, Years, Merit</h4></P>
<P>The dummy coded contrasts can be used like any other variables in a multiple regression analysis. In order to find the significance of the effect of <SPSSVar>Gender</SPSSVar> given <SPSSVar>Rank</SPSSVar>, <SPSSVar>Dept</SPSSVar>, <SPSSVar>Gender X Rank</SPSSVar>, <SPSSVar>Gender X Dept</SPSSVar>, <SPSSVar>Years</SPSSVar>, and <SPSSVar>Merit</SPSSVar>, the <SPSSVar>Rank</SPSSVar> and <SPSSVar>Gender X Rank</SPSSVar> effects must be created as <index>dummy coded contrasts</index>. In the following data file the <SPSSVar>Rank</SPSSVar> main effect consists of two contrasts: C2a contrasting Full professors with Assistant and Associate professors and C3a contrasting Assistant with Associate professors. The <SPSSVar>Gender X Rank</SPSSVar> interaction contrasts (C4a and C5a) are constructed by multiplying the <SPSSVar>Gender</SPSSVar> contrast (C1) times the two contrasts for the main effect for Rank.</P>
<P></P>
<P><table cellPadding="4" cellSpacing="0" summary = "Contrast coding for rank, gender, and rank X gender interaction." title="Contrast coding for rank, gender, and rank X gender interaction."><tcaption>Contrast coding for rank, gender, and rank X gender interaction.</tcaption>
<TR><TH><SPSSVar>Gender</SPSSVar></TH><TH><SPSSVar>Rank</SPSSVar></TH><TH><SPSSVar>C1</SPSSVar></TH><TH><SPSSVar>C2a</SPSSVar></TH><TH><SPSSVar>C3a</SPSSVar></TH><TH><SPSSVar>C4a</SPSSVar></TH><TH><SPSSVar>C5a</SPSSVar></TH></TR>
<TR><TD>0</TD><TD>1</TD><TD>-1</TD><TD>1</TD><TD>1</TD><TD>-1</TD><TD>-1</TD></TR>
<TR><TD>0</TD><TD>2</TD><TD>-1</TD><TD>1</TD><TD>-1</TD><TD>-1</TD><TD>1</TD></TR>
<TR><TD>0</TD><TD>3</TD><TD>-1</TD><TD>-2</TD><TD>0</TD><TD>2</TD><TD>0</TD></TR>
<TR><TD>1</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>1</TD><TD>1</TD></TR>
<TR><TD>1</TD><TD>2</TD><TD>1</TD><TD>1</TD><TD>-1</TD><TD>1</TD><TD>-1</TD></TR>
<TR><TD>1</TD><TD>3</TD><TD>1</TD><TD>-2</TD><TD>0</TD><TD>-2</TD><TD>0</TD></TR>
</table></P>
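<P>The recoding in the table above can be expressed as a small lookup. The following Python sketch uses hypothetical names and assumes only the codes shown in the table (Gender coded 0 or 1, Rank coded 1, 2, or 3).</P>

```python
# Codes taken from the table above; the names are hypothetical.
GENDER_CONTRAST = {0: -1, 1: 1}                         # C1
RANK_CONTRASTS = {1: (1, 1), 2: (1, -1), 3: (-2, 0)}    # (C2a, C3a)


def contrast_row(gender, rank):
    """Return (C1, C2a, C3a, C4a, C5a) for one case."""
    c1 = GENDER_CONTRAST[gender]
    c2a, c3a = RANK_CONTRASTS[rank]
    # The interaction contrasts are products of C1 with the Rank contrasts.
    return (c1, c2a, c3a, c1 * c2a, c1 * c3a)


for gender in (0, 1):
    for rank in (1, 2, 3):
        print(gender, rank, contrast_row(gender, rank))
```

<P>Each printed row should match the corresponding row of the table, so the same function could be applied case by case to build the contrast columns of a data file.</P>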

<P>The additional dummy coded variables are added to the data file as shown below.</P>
<P>	<figure>
		<description>Data matrix with contrast coding for rank, gender, and rank X gender interaction.</description>
		<url>images/mlt08117.gif</url>
		<width>480</width>
		<height>648</height>
		<align></align>
		<caption>Data matrix with contrast coding for rank, gender, and rank X gender interaction.</caption>
		<alt>Data matrix with contrast coding for rank, gender, and rank X gender interaction.</alt>
	</figure></P>
<P>Salary is predicted in six blocks (only two are really needed) in the following multiple regression analysis.  In a simplified analysis, the first block would contain all variables except Gender (C1) and the second would contain only Gender (C1).</P>
<P>	<figure>
		<description>Model summary with contrast coding for rank, gender, and rank X gender interaction.</description>
		<url>images/mlt08118.gif</url>
		<width>739</width>
		<height>362</height>
		<align></align>
		<caption>Model summary with contrast coding for rank, gender, and rank X gender interaction.</caption>
		<alt>Model summary with contrast coding for rank, gender, and rank X gender interaction.</alt>
	</figure></P>
<P>As can be seen, the R<SUP>2</SUP> change for Gender has increased to a value of .120, which is significant. The value of multiple R is not really 1.000, but it is very close to 1.000. For that reason the error variance is extremely small, resulting in significant effects. This illustrates the problem of fitting too few data points with too many parameters.</P>
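<P>The problem of too many parameters can be illustrated with simulated data. The sketch below is not based on the salary data; it assumes NumPy, uses a fixed random seed, and regresses pure noise on an increasing number of unrelated random predictors.</P>

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed so the sketch repeats
n = 12
y = rng.normal(size=n)                     # outcome that is pure noise
predictors = rng.normal(size=(n, n - 1))   # predictors unrelated to y


def r_squared(y, x):
    """R squared from an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    total = y - y.mean()
    return 1.0 - (residual @ residual) / (total @ total)


# Fit models with 1, 2, ..., n - 1 predictors and record R squared.
fits = [r_squared(y, predictors[:, :k]) for k in range(1, n)]
for k, value in enumerate(fits, start=1):
    print(k, round(value, 3))
```

<P>R<SUP>2</SUP> never decreases as predictors are added, and once the number of parameters equals the number of data points the fit is essentially perfect even though the predictors carry no information about the outcome.</P>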
		<TestItem type="MC">
			<question>When fitting too many parameters with too few data points</question>
			<answer type="correct">the value of the unadjusted multiple R will be close to 1.0.</answer>
			<answer>interaction effects will seldom be significant.</answer>
			<answer>the standard error of estimate will become very large.</answer>
			<answer>collinearity will be a problem.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/24/2001</date>
			<concept>Multiple Regression with Categorical Variables</concept>
		</TestItem>
<P>If all the effects mentioned above are entered into the model in a single block, the coefficients table appears as follows.</P>
<P>	<figure>
		<description>Coefficients table with contrast coding for rank, gender, and rank X gender interaction.</description>
		<url>images/mlt08119.gif</url>
		<width>527</width>
		<height>366</height>
		<align></align>
		<caption>Coefficients table with contrast coding for rank, gender, and rank X gender interaction.</caption>
		<alt>Coefficients table with contrast coding for rank, gender, and rank X gender interaction.</alt>
	</figure></P>
<P>As has been described earlier, the "Sig." column is the significance level of that variable if it is entered last in the regression model. Since t<SUP>2</SUP> = F, it is noted that 77.205<SUP>2</SUP> is equal to 5960.619, within rounding error. In this case, every variable except C4 and Years is statistically significant.</P>
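<P>The arithmetic behind the t<SUP>2</SUP> = F check is easy to reproduce. This short Python sketch uses only the two numbers quoted in the text; the variable names are hypothetical.</P>

```python
t_value = 77.205        # t value quoted in the text
f_reported = 5960.619   # F value quoted in the text

f_from_t = t_value ** 2
print(round(f_from_t, 3), round(abs(f_from_t - f_reported), 3))
```

<P>The squared t value is about 5960.612, so the two figures agree to within the rounding of the three-decimal t value.</P>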
<P>The alert reader has probably noted that other <index>interaction terms</index> could be created and entered into the regression model. For example, four dummy coded contrasts could be created such that a <SPSSVar>Rank X Dept</SPSSVar> interaction could be found. Multiplying this by the <SPSSVar>Gender</SPSSVar> contrast (C1) would result in a three-way <SPSSVar>Gender X Rank X Dept</SPSSVar> interaction.</P>
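<P>The number of contrasts needed for each effect follows from the number of levels: a factor with k levels needs k - 1 contrasts, and an interaction needs the product of those counts over its factors. The sketch below tabulates this for the three factors discussed in this chapter; the function name is hypothetical, and the level counts assume two genders, three ranks, and three departments.</P>

```python
from itertools import combinations
from math import prod

# Assumed numbers of levels for the three categorical variables.
levels = {"Gender": 2, "Rank": 3, "Dept": 3}


def contrasts_per_effect(levels):
    """Map each main effect and interaction to its number of contrasts."""
    counts = {}
    names = list(levels)
    for size in range(1, len(names) + 1):
        for effect in combinations(names, size):
            label = " X ".join(effect)
            counts[label] = prod(levels[name] - 1 for name in effect)
    return counts


counts = contrasts_per_effect(levels)
for effect, count in counts.items():
    print(effect, count)
print("total", sum(counts.values()))
```

<P>The total is always one less than the number of cells, so a complete set of main effect and interaction contrasts accounts for all of the differences among the cell means.</P>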
</section>
<section>
<P><h2>ANOVA using General Linear Model in SPSS</h2></P>
<P>Although the dummy coding of variables in multiple regression results in considerable flexibility in the analysis of categorical variables, it can also be tedious to program. For this reason most statistical packages have made a program available that automatically creates dummy coded variables and performs the appropriate statistical analysis. In most cases the user is unaware of the calculations being performed in the computer program. This is the case with the <index>General Linear Model program in SPSS</index>.</P>
<P>This program is selected in SPSS by <SPSSCommand>Analyze/General Linear Model/GLM - General Factorial...</SPSSCommand>. To perform the Gender by Department analysis discussed earlier in this section, enter <SPSSVar>Salary</SPSSVar> as the dependent measure and <SPSSVar>Gender</SPSSVar> and <SPSSVar>Dept</SPSSVar> as fixed factors. The screen should appear as follows.</P>
<P>	<figure>
		<description>The general linear model user interface for SPSS is illustrated. The variable Salary has been entered as the dependent variable and Gender and Dept have been entered as fixed factors.</description>
		<url>images/mlt08120.gif</url>
		<width>357</width>
		<height>360</height>
		<align></align>
		<caption>SPSS user interface for factorial ANOVA.</caption>
		<alt>SPSS user interface for factorial ANOVA.</alt>
	</figure>
</P>
<P>Click <SPSSCommand>OK</SPSSCommand> and the results are as follows.</P>
<P>	<figure>
		<description>SPSS output for the factorial ANOVA is shown. The table has entries for Gender, Dept, Gender X Dept, error, and total. The significance levels for the various effects are: .390, .005, and .569.</description>
		<url>images/mlt08121.gif</url>
		<width>562</width>
		<height>300</height>
		<align></align>
		<caption>SPSS output for the factorial ANOVA.</caption>
		<alt>SPSS output for the factorial ANOVA.</alt>
	</figure></P>
<P>Note that the "F" and "Sig." columns are identical to the results of the R<SUP>2</SUP> change analysis presented earlier in this chapter if each of the effects is entered last. This is the meaning of the default "<index>Type III Sum of Squares</index>."</P>
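<P>The "entered last" logic can be sketched directly. The example below is an illustration only: it assumes NumPy, uses the twelve equal-n observations from the earlier table of orthogonal contrasts rather than the data set analyzed by SPSS above, and the helper names are hypothetical. Each effect is tested by the drop in R<SUP>2</SUP> when its contrasts are removed from the full model.</P>

```python
import numpy as np

salary = np.array([45, 34, 42, 42, 59, 53, 30, 47, 42, 58, 50, 49], dtype=float)
c1 = np.array([1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1])
c2 = np.array([1, 1, 1, 1, -2, -2, 1, 1, 1, 1, -2, -2])
c3 = np.array([1, 1, -1, -1, 0, 0, 1, 1, -1, -1, 0, 0])
c4, c5 = c1 * c2, c1 * c3


def r_squared(y, columns):
    """R squared from an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    total = y - y.mean()
    return 1.0 - (residual @ residual) / (total @ total)


effects = {"Gender": [c1], "Dept": [c2, c3], "Gender X Dept": [c4, c5]}
all_columns = [col for cols in effects.values() for col in cols]
full = r_squared(salary, all_columns)
error_df = len(salary) - len(all_columns) - 1

# F for each effect when it is the last one entered into the full model.
f_last = {}
for name, cols in effects.items():
    others = [col for other, oc in effects.items() if other != name for col in oc]
    change = full - r_squared(salary, others)
    f_last[name] = (change / len(cols)) / ((1.0 - full) / error_df)
    print(name, round(f_last[name], 3))
```

<P>Every effect is tested against the error term of the full model, which is why the order of entry no longer matters for these F values.</P>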
<P>The interpretation of "effects," the result of the dummy coding of categorical variables, is the subject of the next chapter.</P>
	<TestItem type="MC">
		<question>Which of the following statements is true?</question>
		<answer type="incorrect">Multiple Regression is a special case of ANOVA.</answer>
		<answer type="correct">Any independent system of dummy codes will work equally well in the overall ANOVA.</answer>
		<answer type="incorrect">Dummy codes are unnecessary if the categorical variables are uncorrelated.</answer>
		<answer type="incorrect">Dummy codes are a special case of suppressor variables.</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept></concept>
	</TestItem>
	<TestItem type="MC">
		<question>Changing the order that main and interaction effects are entered into a multiple regression equation will have an effect on the R<sup>2</sup> change values if </question>
		<answer type="incorrect">the contrasts are uncorrelated</answer>
		<answer type="correct">the cell sizes are not equal</answer>
		<answer type="incorrect">suppressor variables are entered first</answer>
		<answer type="incorrect">within groups mean square variance is greater than 37.54</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept></concept>
	</TestItem>
	<TestItem type="MC">
		<question>In a study with two categorical variables, A and B, if A had 5 levels and B had 4 levels, how many contrasts would be necessary to explain the variance predicted by the combinations of these variables?</question>
		<answer type="incorrect">15</answer>
		<answer type="correct">19</answer>
		<answer type="incorrect">20</answer>
		<answer type="incorrect">45</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept></concept>
	</TestItem>
	<TestItem type="MC">
		<question>Which contrast compared the two types of music?</question>
		<figure>
			<description>A description of an experiment, a contrast table, a table of regression coefficients, and a table of means and standard deviations.</description>
			<url>Contrast1.gif</url>
			<width>523</width>
			<height>363</height>
			<align></align>
			<caption></caption>
			<alt>Contrast Tables</alt>
		</figure>
		<answer type="incorrect">c1</answer>
		<answer type="incorrect">c2</answer>
		<answer type="incorrect">c3</answer>
		<answer type="incorrect">c4</answer>
		<answer type="correct">c5</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept></concept>
	</TestItem>
	<TestItem type="MC">
		<question>Which contrast was least significant?</question>
		<figure>
			<description>A description of an experiment, a contrast table, a table of regression coefficients, and a table of means and standard deviations.</description>
			<url>Contrast1.gif</url>
			<width>523</width>
			<height>363</height>
			<align></align>
			<caption></caption>
			<alt>Contrast Tables</alt>
		</figure>
		<answer type="incorrect">c1</answer>
		<answer type="incorrect">c2</answer>
		<answer type="correct">c3</answer>
		<answer type="incorrect">c4</answer>
		<answer type="incorrect">c5</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept></concept>
	</TestItem>
	<TestItem type="MC">
		<question>White noise was significantly different from street noise </question>
		<figure>
			<description>A description of an experiment, a contrast table, a table of regression coefficients, and a table of means and standard deviations.</description>
			<url>Contrast1.gif</url>
			<width>523</width>
			<height>363</height>
			<align></align>
			<caption></caption>
			<alt>Contrast Tables</alt>
		</figure>
		<answer type="incorrect">true</answer>
		<answer type="correct">false</answer>
		<answer type="incorrect">cannot tell from the given data</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept></concept>
	</TestItem>
	<TestItem type="MC">
		<question>White noise was significantly different from absolute quiet </question>
		<figure>
			<description>A description of an experiment, a contrast table, a table of regression coefficients, and a table of means and standard deviations.</description>
			<url>Contrast1.gif</url>
			<width>523</width>
			<height>363</height>
			<align></align>
			<caption></caption>
			<alt>Contrast Tables</alt>
		</figure>
		<answer type="incorrect">true</answer>
		<answer type="incorrect">false</answer>
		<answer type="correct">cannot tell from the given data</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept></concept>
	</TestItem>
	<TestItem type="MC">
		<question> The music and crying baby were more disruptive than the other conditions </question>
		<figure>
			<description>A description of an experiment, a contrast table, a table of regression coefficients, and a table of means and standard deviations.</description>
			<url>Contrast1.gif</url>
			<width>523</width>
			<height>363</height>
			<align></align>
			<caption></caption>
			<alt>Contrast Tables</alt>
		</figure>
		<answer type="correct">true</answer>
		<answer type="incorrect">false</answer>
		<answer type="incorrect">cannot tell from the given data</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author>David Stockburger</author>
		<date>03/05/2001</date>
		<concept></concept>
	</TestItem>
</section>
<section>
<P><h2>Summary</h2></P>
<P>This chapter discussed how categorical variables with more than two levels could be used in a multiple regression prediction model. The procedure is called dummy coding and involves creating a number of dichotomous categorical variables from a single categorical variable with more than two levels. The text showed how any number of different coding systems would result in similar overall statistical decisions. It was also argued that some coding systems contain greater information about specific statistical decisions and are to be preferred over coding systems that provide less information. The usefulness of one type of coding system, that of main and interaction effects, was demonstrated when there were combinations of categorical variables. The similarity of this system of dummy coding and multifactor ANOVA was demonstrated.</P>
</section>
</chapter>

