<?xml version='1.0'?>
<?xml:stylesheet type="text/xsl" href="MultiBook.xsl" ?>
<chapter>
<number>2</number>
<author>David W. Stockburger</author>
<title>Discriminant Function Analysis</title>
<modified>8/31/2000</modified>
<URL>mlt03.xml</URL>
<section>
<definition word="discriminant function analysis">a procedure to predict group membership of preexisting groups based on linear combinations of interval variables.</definition>
<P>The main purpose of a <index>discriminant function analysis</index> is to predict group membership based on a linear combination of the interval variables. The procedure begins with a set of observations where both <index>group membership</index> and the values of the interval variables are known.  The end result of the procedure is a model that allows prediction of group membership when only the interval variables are known. A second purpose of discriminant function analysis is an understanding of the data set, as a careful examination of the prediction model that results from the procedure can give insight into the relationship between group membership and the variables used to predict group membership. </P>
		<TestItem type="MC">
			<question>Discriminant function analysis</question>
			<answer type="correct">can give insight into the relationship between group membership and the variables used to predict group membership.</answer>
			<answer>allows prediction of group membership with categorical information.</answer>
			<answer>can categorize individuals into groups when the structure of the groups is unknown, but the relationships between individuals are known.</answer>
			<answer>can provide values for interval variables based on group membership.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
	<TestItem type="MC">
		<question>The method of choice for classifying individuals into known groups is</question>
		<answer type="correct">discriminant function analysis</answer>
		<answer type="incorrect">Regression</answer>
		<answer type="incorrect">Cluster analysis</answer>
		<answer type="incorrect">t-tests</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author></author>
		<date></date>
		<concept></concept>
	</TestItem>
</section>
<section>
<P><h3>Examples</h3></P>
<P>For example, a graduate admissions committee might divide a set of past graduate students into two groups: students who finished the program in five years or less and those who did not.  Discriminant function analysis could be used to predict successful completion of the <index>graduate program</index> based on GRE score and undergraduate grade point average. Examination of the prediction model might provide insights into how each predictor individually and in combination predicted completion or non-completion of a graduate program. </P>
<P>Another example might predict whether patients recovered from a <index>coma</index> or not based on combinations of demographic and treatment variables. The predictor variables might include age, sex, general health, time between incident and arrival at hospital, various interventions, etc. In this case the creation of the prediction model would allow a <index>medical practitioner</index> to assess the chance of recovery based on observed variables. The medical practitioner might be using the results of a discriminant function analysis when telling a patient that she has an 80% chance of recovery, even though the medical practitioner may have obtained this value by entering numbers into a computer program and have little concept of how it was computed. The prediction model might also give insight into how the variables interact in predicting recovery. </P>
		<TestItem type="MC">
			<question>When predicting whether a student would pass, withdraw, or fail a class, the statistical method of choice would be</question>
			<answer type="correct">discriminant function analysis.</answer>
			<answer>ANOVA.</answer>
			<answer>multiple regression with a dichotomous dependent variable.</answer>
			<answer>cluster analysis.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question>Results of a discriminant function analysis</question>
			<answer type="correct">can be used by individuals without much statistical training.</answer>
			<answer>can only be understood by the statistically sophisticated.</answer>
			<answer>have theoretical, but not practical applications.</answer>
			<answer>are often expressed in terms of standard deviation units.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem></section>
<section>
<P><h3>The Simplest Case</h3></P>
<definition word="dichotomous group membership">the unit belongs to one of two groups.</definition>
<P>The simplest case of discriminant function analysis is the prediction of <index>dichotomous group membership</index> based on a single variable. An example of the simplest case is the prediction of successful completion of a graduate program based on the GRE verbal score. In this case, since the prediction model includes only a single variable, it gives little insight into how variables interact with each other in prediction. Thus prediction of group membership will be the major focus of the next section of this chapter.</P>
		<TestItem type="MC">
			<question>The simplest case of discriminant function analysis has</question>
			<answer type="correct">dichotomous groups and a single interval variable.</answer>
			<answer>a single dichotomous independent variable and three or more groups.</answer>
			<answer>a single dichotomous independent variable and two or more interval variables.</answer>
			<answer>a single interval independent and dependent variable.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<P>With respect to the data file and purpose of analysis, this simplest case is identical to the case of <index>linear regression</index> with <index>dichotomous dependent variables</index>. As discussed previously, data of this type may be represented in any number of different forms: scatter plots, tables of means and standard deviations, and overlapping frequency polygons. Because <index>overlapping frequency polygons</index> have such an intuitive appeal, they will be used to describe how discriminant function analysis works.</P>
		<TestItem type="MC">
			<question>The simplest discriminant function analysis is identical to </question>
			<answer type="correct">simple linear regression with a dichotomous dependent variable.</answer>
			<answer>cluster analysis with a single variable.</answer>
			<answer>multiple t-tests.</answer>
			<answer>chi-square with dichotomous variables.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
</section>
<section>
<P><h3>Prediction Accuracy</h3></P>
<P>A single interval variable might discriminate between groups in an almost perfect fashion, not at all, or somewhere in between. For example, if one wished to differentiate adult males and females, one could collect information on how many <index>bras</index> the person owned, score on the last statistics test, and height. In the case of the number of bras, the discrimination would be very good, but not perfect (some women don't own any bras, some men do). In the case of the score on the last statistics test, little discrimination would be possible because <index>males</index> and <index>females</index> generally score about the same. In the case of height, some discrimination between adult males and females would be possible, but it would be far from perfect.</P>
<P>In general, the larger the difference between the means of the two groups relative to the within-group variability, the better the discrimination between the groups. The following program allows the student to explore data sets with different degrees of discriminability.</P>
<activity URL=" exercises/mlt03a.htm ">
	<description>Frequency Polygons and Means in Discriminant Analysis</description>
</activity>
		<TestItem type="MC">
			<question>The variable that might best discriminate between male and female students is</question>
			<answer type="correct">number of dresses hanging in their closet.</answer>
			<answer>amount of tuition paid.</answer>
			<answer>general intellectual ability.</answer>
			<answer>hours spent studying for last test.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<P>The figure below shows the results of the program when the discrimination is set to low.</P>
<P>
	<figure>
		<description>Two overlapping frequency distributions appear in the top graph.  One, drawn in green, is on the left-hand side of the screen.  The other, drawn in blue, is on the right-hand side.  There is considerable overlap between the two graphs. The option under the graph for discrimination is set to low. The mean and standard deviation for the green graph are 12 and 4.5, respectively.  The mean and standard deviation for the blue graph are 14.8 and 2.3, respectively.</description>
		<url>Images/mlt0362.gif</url>
		<width>514</width>
		<height>372</height>
		<align></align>
		<caption>Differences between frequency polygons when discrimination is set to low</caption>
		<alt> Frequency Polygons and Means in Discriminant Analysis Example</alt>
	</figure> 
</P>
<P>The next figure shows the results when the discrimination is set to high.</P>
<P> 	<figure>
		<description>Two overlapping frequency distributions appear in the top graph.  One, drawn in green, is on the left-hand side of the screen.  The other, drawn in blue, is on the right-hand side.  There is very little overlap between the two graphs. The option under the graph for discrimination is set to high. The mean and standard deviation for the green graph are 10.5 and 5.28, respectively.  The mean and standard deviation for the blue graph are 18.17 and 1.27, respectively.</description>
		<url> Images/mlt0357.gif </url>
		<width>508</width>
		<height>366</height>
		<align></align>
		<caption> Differences between frequency polygons when discrimination is set to high</caption>
		<alt> Frequency Polygons and Means in Discriminant Analysis Example </alt>
	</figure></P>
<definition word="discriminability">ability to separate into groups.</definition>
<P>Note that the two frequency polygons overlap a great deal when there is little or no <index>discriminability</index> between groups and hardly at all when there is high discriminability.  In the same vein, the means are fairly similar relative to their standard deviations in the low discriminability condition and different in the high discriminability condition.</P>
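<P>The comparison of means relative to standard deviations can be made concrete with a small calculation. The following Python sketch is only an illustration, not part of the exercise program: the function name <I>separation</I> is hypothetical, and it assumes equal group sizes so that the two variances can be averaged without weights. The means and standard deviations are those quoted in the two figure descriptions above.</P>
<pre>
```python
import math

def separation(mean_a, sd_a, mean_b, sd_b):
    """Absolute mean difference divided by the root mean square of the two SDs.

    Assumption: equal group sizes, so the variances are averaged without weights.
    """
    rms_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / rms_sd

# Means and standard deviations quoted in the figure descriptions.
low = separation(12, 4.5, 14.8, 2.3)        # discrimination set to low
high = separation(10.5, 5.28, 18.17, 1.27)  # discrimination set to high

print("low discrimination: ", round(low, 2))
print("high discrimination:", round(high, 2))
```
</pre>
<P>Under these assumptions the means are less than one pooled standard deviation apart in the low condition and roughly two apart in the high condition, which matches the visual impression of overlap in the frequency polygons.</P>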
		<TestItem type="MC">
			<question>When discrimination between groups is low in overlapping relative frequency polygons</question>
			<answer type="correct"> means are fairly similar relative to their standard deviations.</answer>
			<answer> standard deviations are fairly similar relative to their means.</answer>
			<answer>there is very little overlap between the distributions.</answer>
			<answer>both distributions are assumed to have the same standard deviation (homogeneity of variance).</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
</section>
<section>
<P><h3>Modeling the Data</h3></P>
<P>Discriminant function analysis is based on modeling the interval variable for each group with a <index>normal curve</index>. The mean of each group is used as an estimate of mu for that group. Sigma for each group can be estimated either from the weighted mean of the within-group variances or from the standard deviation of that group alone. In the case of the weighted mean, the variances are weighted by sample size (strictly, by each group's degrees of freedom, n - 1), and sigma can be calculated either as the pooled standard deviation that appears in the denominator of a nested t-test or as the square root of the <index>Mean Squares Within Groups</index> in an <index>ANOVA</index>; the two calculations provide identical estimates. When using the weighted mean of the variances, one must assume that the <index>generating function</index> for each group produces numbers that in the long run have the same variability. </P>
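<P>The estimation of mu and sigma can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used by any particular statistical package: the function name <I>pooled_sd</I> and the two small data sets are hypothetical. Each group's variance is weighted by its degrees of freedom (n - 1), which is what the Mean Square Within Groups does.</P>
<pre>
```python
import math
from statistics import mean, variance

def pooled_sd(groups):
    """Square root of the Mean Square Within Groups for a list of samples."""
    # Sum of squares within: each sample variance times its degrees of freedom.
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    df_within = sum(len(g) - 1 for g in groups)
    return math.sqrt(ss_within / df_within)

# Hypothetical scores, made up for illustration only.
group_1 = [11, 12, 13, 14, 15]
group_2 = [15, 16, 17, 18, 19]

mus = [mean(g) for g in (group_1, group_2)]  # estimates of mu for each group
sigma = pooled_sd([group_1, group_2])        # single estimate of sigma

print("estimated mus:", mus)
print("pooled sigma: ", round(sigma, 3))
```
</pre>
<P>Using a single pooled value of sigma for both groups is appropriate only under the equal-variability assumption described above.</P>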
		<TestItem type="MC">
			<question>In discriminant function analysis, the underlying assumption is that the distribution of the interval variable(s) is modeled by</question>
			<answer type="correct">a normal distribution.</answer>
			<answer>a uniform distribution.</answer>
			<answer>a variation of the F distribution.</answer>
			<answer>a variation of the chi-square distribution.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
	<TestItem type="MC">
		<question>Assumptions made when computing probabilities of group membership in discriminant function analysis include </question>
		<answer type="incorrect">equality of means for each group</answer>
		<answer type="correct">equality of variances for each group</answer>
		<answer type="incorrect">multicollinearity</answer>
		<answer type="incorrect">heteroscedasticity</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author></author>
		<date></date>
		<concept></concept>
	</TestItem>
<P>In the simple case of dichotomous groups and a single predictor variable, it does not make a great deal of difference in the complexity of the model whether the variability of each group is assumed to be equal or not. This is not true, however, when more groups and more predictor variables are added to the model. For that reason, the <index>assumption of equality of within group variance</index> is almost universal in discriminant function analysis.</P>
<P>The following program allows the student to explore the relationship between different generating functions (poor, medium, or good discrimination; equal or unequal variances), sample size, and resulting model based on the sample. The student should verify that larger sample sizes provide resulting models that are more similar to the generating model. The student should also explore the effect of <index>violations of the equality of variance assumptions</index> on the resulting model.</P>
<definition word="generating model">a theoretical probability model that is used to generate random data.</definition>
<activity URL=" exercises/mlt03b.htm ">
	<description> Modeling of Frequency Polygons in Discriminant Analysis </description>
</activity>
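<P>The effect of sample size on the resulting model can also be explored with a short simulation. The Python sketch below is an assumption-laden stand-in for the exercise, not its actual code: the function name <I>fit_model</I>, the generating values (mu = 13, sigma = 2), and the fixed random seed are all chosen here for illustration.</P>
<pre>
```python
import random
from statistics import mean, stdev

def fit_model(mu, sigma, n, seed=0):
    """Draw n scores from a normal generating model and estimate mu and sigma."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    return mean(sample), stdev(sample)

# Hypothetical generating model: mu = 13, sigma = 2.
for n in (10, 100, 10000):
    est_mu, est_sigma = fit_model(13, 2, n)
    print(n, round(est_mu, 2), round(est_sigma, 2))
```
</pre>
<P>With larger values of n, the estimated mu and sigma should generally fall closer to the generating values, which is the pattern the student is asked to verify in the exercise.</P>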
		<TestItem type="MC">
			<question>In the discriminant function analysis program that allows the student to explore the relationship between different generating functions (poor, medium, or good discrimination; equal or unequal variances), sample size, and the resulting model based on the sample, the larger the sample size of each group</question>
			<answer type="correct">the more similar the resulting model is to the generating model.</answer>
			<answer>the more unequal the variances.</answer>
			<answer>the better the discrimination between the groups.</answer>
			<answer>the more equal the variances.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<P>
	<figure>
		<description>Three graphs are shown, labeled theory, actual, and model.  The theory graph illustrates the hypothetical distribution from which the data were selected.  The actual graph presents a graph of the randomly selected sample from the hypothetical distribution.  The model is a graph of the hypothetical distribution estimated from the actual data.  The figure also presents the various options available when recomputing the results.  Options include discrimination index, sample size, true variances, and assumptions about variances.  The student should verify that the larger the sample size, the closer the model is to the theoretical distribution.  In addition, if the assumption of equal variance is met, the model should be closer to the theory.</description>
		<url> Images/mlt0358.gif </url>
		<width>456</width>
		<height>399</height>
		<align></align>
		<caption>Example of computer exercise on modeling of frequency polygons in discriminant analysis.</caption>
		<alt> Example of computer exercise on modeling of frequency polygons in discriminant analysis.</alt>
	</figure>
</P> 
</section>
<section>
<P><h3>Probabilities of Group Membership</h3></P>
<P>The <index>normal curve</index> models of the predictor variable for each group can be used to provide probability estimates of a particular score given membership in a particular group. In <index>discriminant function analysis</index>, the area in the two tails of the normal curve model for a given group, beyond the two points equally distant from <index>mu</index>, is the probability of either point given that group. This probability is symbolized as <index>P(D/G) </index> on SPSS output.</P>
<definition word="P(D/G)">the probability of the data given the group.</definition>
<definition word="P(G/D)">the probability of the group given the data.</definition>
<P>For example, suppose that the normal curve model for a given group has a value of 13 for mu and 2 for sigma. The <index>probability</index> of a score of 16 would be the area in the tails of this normal curve below 10 and above 16. The value of 10 was selected as the low score because 16 is three units above the mean (16-13 = 3) and 10 is three units below the mean (13-3 = 10). This is all much easier visualized than stated.</P>
		<TestItem type="MC">
			<question>In discriminant function analysis, suppose that the model for a given group had a value of 201 for mu and 10 for sigma, what would be the probability of the data given the group for a score of 213?</question>
			<answer type="correct">.23</answer>
			<answer>.77</answer>
			<answer>.213</answer>
			<answer>.201</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question>In discriminant function analysis, suppose that the model for a given group had a value of 201 for mu and 18 for sigma, what would be the probability of the data given the group for a score of 213?</question>
			<answer type="correct">.5</answer>
			<answer>.77</answer>
			<answer>.218</answer>
			<answer>.447</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question>In discriminant function analysis, suppose that the model for a given group had a value of 201 for mu and 10 for sigma, what score would have an identical probability of the data given the group as a score of 213?</question>
			<answer type="correct">189</answer>
			<answer>200</answer>
			<answer>211</answer>
			<answer>191</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<P>
	<figure>
		<description>A normal curve where mu equals 13 and sigma equals two is illustrated.  The area left of a score of 10 and right of a score of 16 is colored yellow.  The yellow area is .134, while the white area in the middle of the curve is labeled with a value of .866.</description>
		<url> Images/Mlt0354.gif </url>
		<width>416</width>
		<height>171</height>
		<align></align>
		<caption>Two-tailed probability of a score of 10 on a normal curve with mu=13 and sigma=2.</caption>
		<alt> Two-tailed probability of a score of 10 on a normal curve with mu=13 and sigma=2.</alt>
	</figure>
</P> 
<P>This area could be found using the <index>probability calculator</index> by finding the <index>two-tailed significance level</index> for a score of 10 under a normal curve with mu equal to 13 and sigma equal to 2.</P>
<P>
	<figure>
		<description>The probability calculator is shown. The two-tailed significance level under the normal curve button is selected.  A value of 13 is entered for mu, 2 for sigma, and 10 for the score text box. After entering mu, sigma, and the score, the right arrow is clicked and a value of .134 (rounded) can be viewed in the right hand text box labeled probability.</description>
		<url> Images/Mlt0351.gif </url>
		<width>509</width>
		<height>289</height>
		<align></align>
		<caption>Finding P(D/G) using the Probability Calculator.</caption>
		<alt> Finding P(D/G) using the Probability Calculator </alt>
	</figure>
</P> 
<ProbabilityCalculator/> 
<P>A score of 10 would have the same probability as a score of 16 because it is an equal distance from mu.</P>
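<P>The same two-tailed area can be computed directly. The following Python sketch is one possible way to do so, not the code behind the probability calculator: the function name <I>p_data_given_group</I> is hypothetical, and it relies on the identity that the two-tailed area beyond z standard deviations under a normal curve equals erfc(z / sqrt(2)).</P>
<pre>
```python
import math

def p_data_given_group(score, mu, sigma):
    """Two-tailed area beyond |score - mu| under a normal curve: P(D/G) as described in the text."""
    z = abs(score - mu) / sigma
    return math.erfc(z / math.sqrt(2))

# Example from the text: mu = 13, sigma = 2.
print(round(p_data_given_group(16, 13, 2), 3))  # 0.134
print(round(p_data_given_group(10, 13, 2), 3))  # 0.134, same distance from mu
```
</pre>
<P>Because only the absolute distance from mu enters the calculation, scores of 10 and 16 necessarily give the same value.</P>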
<P>In a similar manner, the probability of a score of 11 given membership in this group could be found as the area in the tails of a <index>normal curve</index> with mu=13 and sigma=2 below a score of 11 and above a score of 15.</P>
<P>
	<figure>
		<description> A normal curve where mu equals 13 and sigma equals two is illustrated.  The area left of a score of 11 and right of a score of 15 is colored yellow.  The yellow area is .317, while the white area in the middle of the curve is labeled with a value of .683.</description>
		<url> Images/Mlt0355.gif </url>
		<width>423</width>
		<height>178</height>
		<align></align>
		<caption> Two-tailed probability of a score of 11 on a normal curve with mu=13 and sigma=2.</caption>
		<alt> Two-tailed probability of a score of 11 on a normal curve with mu=13 and sigma=2.</alt>
	</figure>
</P> 
<P>
	<figure>
		<description> The probability calculator is shown. The two-tailed significance level under the normal curve button is selected.  A value of 13 is entered for mu, 2 for sigma, and 11 for the score text box. After entering mu, sigma, and the score, the right arrow is clicked and a value of .317 (rounded) can be viewed in the right hand text box labeled probability.</description>
		<url> Images/Mlt0352.gif </url>
		<width>508</width>
		<height>291</height>
		<align></align>
		<caption> Finding P(D/G) using the Probability Calculator.</caption>
		<alt> Finding P(D/G) using the Probability Calculator.</alt>
	</figure>
</P> 
<P>If the second group had a mean of 17 and the same value for sigma (2), then the probability of a score of 16 would be equal to .617 and could be found and visualized as follows:</P>
<P>
	<figure>
		<description> A normal curve where mu equals 17 and sigma equals two is illustrated.  The area left of a score of 16 and right of a score of 18 is colored yellow.  The yellow area is .617, while the white area in the middle of the curve is labeled with a value of .383.</description>
		<url> Images/Mlt0341.gif </url>
		<width>360</width>
		<height>164</height>
		<align></align>
		<caption> Two-tailed probability of a score of 16 on a normal curve with mu=17 and sigma=2.</caption>
		<alt> Two-tailed probability of a score of 16 on a normal curve with mu=17 and sigma=2.</alt>
	</figure>
</P> 
<P>
	<figure>
		<description> The probability calculator is shown. The two-tailed significance level under the normal curve button is selected.  A value of 17 is entered for mu, 2 for sigma, and 16 for the score text box. After entering mu, sigma, and the score, the right arrow is clicked and a value of .617 (rounded) can be viewed in the right hand text box labeled probability.</description>
		<url> Images/Mlt0353.gif </url>
		<width>505</width>
		<height>293</height>
		<align></align>
		<caption> Finding P(D/G) using the Probability Calculator.</caption>
		<alt> Finding P(D/G) using the Probability Calculator.</alt>
	</figure>
</P> 
<P>Computation of low and high scores for large numbers of points could be tedious and is best left to computers. A program to compute P(D/G) for two groups requires that the user enter the means for the two groups, the value for <index>Mean Square Within Groups</index> (from the <index>ANOVA</index> source table), and the score value. Using the preceding examples with group means of 13 and 17, a Mean Square Within of 4 (2<SUP>2</SUP>), and a score of 16 would generate the following results:</P>
<definition word="mean squares within">the denominator of the F ratio. A measure of error within groups.</definition>
<activity URL=" exercises/discrim.htm ">
	<description> Discriminant Analysis Probabilities Program </description>
</activity>
<P>
	<figure>
		<description>A form labeled Discriminant Function is shown.  To use this form, the user enters the means of the two groups, the prior probabilities, the mean square within from the ANOVA table, and a score. After clicking on the Compute button, the P(D/G) and P(G/D) are shown for both groups.  In the example, the means for the two groups are 13 and 17, the prior probabilities are .1 and .9, the score is equal to 16, and the mean square within is 4. The values of P(D/G) are .134 and .617 for the two groups, and the corresponding values for P(G/D) are .024 and .976.</description>
		<url> Images/Mlt0345.gif </url>
		<width>316</width>
		<height>186</height>
		<align></align>
		<caption>Using the Discriminant Function Calculator to find P(D/G) and P(G/D).</caption>
		<alt> Using the Discriminant Function Calculator to find P(D/G) and P(G/D).</alt>
	</figure>
</P> 
<P><index>Interpretation of P(G/D)</index> is straightforward: it is the likelihood of membership in a group given a particular score. In some cases involving extreme scores, the probability of the score given the group, P(D/G), will be small for every group, even though the values of P(G/D) still sum to one across groups. In other cases involving scores that fall almost equidistant from the group means, the likelihood of belonging to each group will be similar. Rather than simply observing predicted group membership, the student is advised to check probabilities of membership in all groups.</P>
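<P>The values in the calculator figure above can be approximated with a short Python sketch. This is a reconstruction under stated assumptions, not the program's actual code: the function names are hypothetical, P(D/G) is taken to be the two-tailed area described earlier, P(G/D) is taken to be the prior times P(D/G) divided by the sum of such products over groups, and the prior probabilities are assumed to be .1 for the group with mean 13 and .9 for the group with mean 17, the assignment that reproduces the quoted P(G/D) values.</P>
<pre>
```python
import math

def two_tailed(score, mu, sigma):
    """Two-tailed normal-curve area beyond |score - mu|, used here as P(D/G)."""
    z = abs(score - mu) / sigma
    return math.erfc(z / math.sqrt(2))

def discriminant_probabilities(means, priors, ms_within, score):
    """Return (P(D/G) list, P(G/D) list) for one score and any number of groups."""
    sigma = math.sqrt(ms_within)  # common sigma from Mean Square Within
    p_d_g = [two_tailed(score, m, sigma) for m in means]
    joint = [prior * p for prior, p in zip(priors, p_d_g)]
    total = sum(joint)
    p_g_d = [j / total for j in joint]
    return p_d_g, p_g_d

# Assumed inputs: means 13 and 17, priors .1 and .9, MS within 4, score 16.
p_d_g, p_g_d = discriminant_probabilities([13, 17], [0.1, 0.9], 4, 16)
print([round(p, 3) for p in p_d_g])
print([round(p, 3) for p in p_g_d])
```
</pre>
<P>Under these assumptions P(D/G) comes out at about .134 and .617, and P(G/D) at roughly .02 and .98, within rounding of the values shown in the figure. Note that the P(G/D) values always sum to one, whereas the P(D/G) values need not.</P>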
		<TestItem type="MC">
			<question>P(G/D) may be interpreted as</question>
			<answer type="correct">the likelihood of membership in a group given a particular score.</answer>
			<answer> the likelihood of a particular score given membership in a particular group.</answer>
			<answer>the probability of D divided by G.</answer>
			<answer>the likelihood of the obtained mean given the true value for mu.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question>P(D/G) may be interpreted as</question>
			<answer>the likelihood of membership in a group given a particular score.</answer>
			<answer type="correct"> the likelihood of a particular score given membership in a particular group.</answer>
			<answer>the probability of D divided by G.</answer>
			<answer>the likelihood of the obtained mean given the true value for mu.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
</section>
<section>
<P><h3>Prior Probabilities</h3></P>
	<TestItem type="MC">
		<question>In general, in discriminant function analysis, the larger the difference between the means of the groups, </question>
		<answer type="incorrect">the higher the value of the estimated error term</answer>
		<answer type="incorrect">the poorer the prediction accuracy</answer>
		<answer type="incorrect">the larger the difference in the prior probabilities</answer>
		<answer type="correct">the better the discrimination between groups</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author></author>
		<date></date>
		<concept></concept>
	</TestItem>
	<TestItem type="MC">
		<question>In a recent election, the local bond issue for the school system was soundly defeated.  If a researcher wished to predict voting behavior (for or against) using discriminant function analysis, she would set the prior probabilities </question>
		<answer type="incorrect">as equal.</answer>
		<answer type="incorrect">higher for the "for" group than for the "against" group</answer>
		<answer type="incorrect">as close as possible to one</answer>
		<answer type="correct">higher for the "against" group than for the "for" group</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author></author>
		<date></date>
		<concept></concept>
	</TestItem>
	<TestItem type="MC">
		<question>A high posterior probability </question>
		<answer type="incorrect">will always result from a high prior probability.</answer>
		<answer type="incorrect">will result when the MS error is high.</answer>
		<answer type="correct">can be obtained when it is unlikely that the score can belong to either group.</answer>
		<answer type="incorrect">will result when discriminability is low.</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author></author>
		<date></date>
		<concept></concept>
	</TestItem>
<definition word="prior probabilities">the likelihood of belonging to a particular group when no other information is available. They are symbolized by P(G).</definition>
<P><index>Prior probabilities</index> are the likelihoods of belonging to a particular group when no information about the person is available. In the classical testing literature prior probabilities are called <I><index>base rates</index></I>. Prior probabilities influence our decisions about group membership. Prior probabilities will be symbolized as <index>P(G)</index>. For example, P(G<SUB>1</SUB>) is the prior probability of a score belonging to group 1.</P>
		<TestItem type="MC">
			<question>Prior probabilities</question>
			<answer type="correct">are also called base rates.</answer>
			<answer>are the likelihood of the group given the data.</answer>
			<answer>are the likelihood of the data given the group.</answer>
			<answer>have no influence on decisions about group membership.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<P>Consider the case of prediction of completion of graduate school using GRE verbal scores. Suppose that 99% of the students who started the graduate program successfully completed the program (it was a really easy program). Even if a student scored considerably lower than the mean of the successful group, program completion would most likely be predicted because almost everyone finishes. Likewise, in a graduate program where less than 10% of beginning students complete the program, the likelihood of completion will be fairly low no matter how high the GRE verbal score.</P>
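<P>The pull of the prior probabilities can be illustrated numerically. The Python sketch below uses Bayes theorem, which is introduced in the paragraphs that follow, for the two-group case. The function name <I>posterior</I> and all of the numbers are hypothetical and chosen only to mirror the two graduate programs described above.</P>
<pre>
```python
def posterior(prior_a, p_d_a, p_d_b):
    """P(group A given the data) for two groups, via Bayes theorem."""
    prior_b = 1 - prior_a
    joint_a = prior_a * p_d_a  # prior times P(D/G) for group A
    joint_b = prior_b * p_d_b  # prior times P(D/G) for group B
    return joint_a / (joint_a + joint_b)

# Hypothetical P(D/G) values, not taken from real data.
# Easy program: 99% complete, but the score looks unlike a completer's.
easy = posterior(0.99, 0.05, 0.60)
# Hard program: 5% complete, and the score looks very much like a completer's.
hard = posterior(0.05, 0.60, 0.05)

print("easy program, P(complete given score):", round(easy, 2))
print("hard program, P(complete given score):", round(hard, 2))
```
</pre>
<P>With these made-up values, completion remains the more likely outcome in the easy program despite the unfavorable score, and non-completion remains the more likely outcome in the hard program despite the favorable one.</P>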
		<TestItem type="MC">
			<question>If the prior probability of belonging to a particular group is very low</question>
			<answer type="correct">the posterior probability for that group will most likely be low.</answer>
			<answer>the probability of the data given the group will most likely be low.</answer>
			<answer>the probability of the data given the group will most likely be high.</answer>
			<answer>the posterior probability for that group will be unaffected.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<definition word="Bayes theorem">a formula to combine conditional and prior probabilities to compute posterior probabilities. </definition>
<definition word="posterior probabilities">the probability of belonging to a particular group given additional information about the unit.</definition>
<P>As discussed in the chapter on probabilities in the introductory text, <index>Bayes Theorem</index> provides a means to transform prior probabilities into posterior probabilities. In the case of <index>discriminant function analysis</index>, prior probabilities P(G) are transformed into the <index>posterior probabilities</index> of group membership given a particular score <index>P(G/D)</index> using information about the discriminating variables. The formula for computing P(G/D) using Bayes Theorem is as follows:</P>
		<TestItem type="MC">
			<question>In discriminant function analysis, transformation from prior to posterior probabilities is</question>
			<answer type="correct">accomplished using Bayes Theorem.</answer>
			<answer>a result of a standard score transformation.</answer>
			<answer>expressed as a ratio of variances.</answer>
			<answer>seldom done in the application of the technique because of the rigorous underlying assumptions.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<P>
	<figure>
		<description>Bayes theorem is illustrated. The probability of P(G sub 1/D) is equal to the P(D/G sub 1) times P(G sub 1) divided by the quantity P(D/G sub 1) times P(G sub 1) plus P(D/G sub 2) times P(G sub 2) </description>
		<url> Images/Mlt0346.gif </url>
		<width>294</width>
		<height>70</height>
		<align></align>
		<caption> Bayes theorem. </caption>
		<alt> Bayes theorem. </alt>
	</figure>
</P> 
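<P>For readers who cannot view the image, the two-group formula described in the figure above can be written out as follows, keeping this chapter's notation in which the slash is read as "given". The second line is the standard generalization to k groups; it is not shown in the figure and is included here only as a reminder.</P>

```latex
\[
P(G_1/D) = \frac{P(D/G_1)\,P(G_1)}{P(D/G_1)\,P(G_1) + P(D/G_2)\,P(G_2)}
\]
\[
P(G_i/D) = \frac{P(D/G_i)\,P(G_i)}{\sum_{j=1}^{k} P(D/G_j)\,P(G_j)}
\]
```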
<P>For example, if the prior probability of membership in group 1 was .10, P(G<SUB>1</SUB>)=.10, and the probability of a score of X=16 given membership in group 1 was .134, P(D=16/G<SUB>1</SUB>)=.134, then, taking P(G<SUB>2</SUB>)=.90 and P(D=16/G<SUB>2</SUB>)=.617 for group 2, the posterior probability, P(G<SUB>1</SUB>/D=16), would be .024. Computation of this result can be seen below.</P>
		<TestItem type="MC">
			<question>For two groups, if the prior probability was .7 and .3, respectively, and the P(D/G) was .5 and .3, what would be the P(G/D) for group 1?</question>
			<answer type="correct">.795</answer>
			<answer>.595</answer>
			<answer>.625</answer>
			<answer>.7</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question>For two groups, if the prior probability was .7 and .3, respectively, and the P(D/G) was .5 and .3, what would be the P(G/D) for group 2?</question>
			<answer>.405</answer>
			<answer type="correct">.205</answer>
			<answer>.625</answer>
			<answer>.7</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<P>
	<figure>
		<description> Bayes theorem is illustrated. The probability of P(G sub 1/D) is equal to the P(D/G sub 1) times P(G sub 1) divided by the quantity P(D/G sub 1) times P(G sub 1) plus P(D/G sub 2) times P(G sub 2). In this case P(D/G sub 1) equals .134, P(D/G sub 2) equals .617, P(G sub 1) equals .1 and P(G sub 2) equals .9. The expression evaluates to .024.</description>
		<url> Images/Mlt0348.gif </url>
		<width>317</width>
		<height>106</height>
		<align></align>
		<caption>Application of Bayes Theorem.</caption>
		<alt> Application of Bayes Theorem </alt>
	</figure>
</P> 
<P>In a similar manner, if the prior probability of membership in group 2 was .90, P(G<SUB>2</SUB>)=.90, and the probability of a score of X=16 given membership in group 2 was .617, P(D=16/G<SUB>2</SUB>)=.617, then the posterior probability, P(G<SUB>2</SUB>/D=16), would be .976. Computation of this result can be seen below.</P>
<P>
	<figure>
		<description> Bayes theorem is illustrated. The probability of P(G sub 1/D) is equal to the P(D/G sub 1) times P(G sub 1) divided by the quantity P(D/G sub 1) times P(G sub 1) plus P(D/G sub 2) times P(G sub 2). In this case P(D/G sub 1) equals .617, P(D/G sub 2) equals .134, P(G sub 1) equals .9 and P(G sub 2) equals .1. The expression evaluates to .976.</description>
		<url> Images/Mlt0349.gif </url>
		<width>316</width>
		<height>106</height>
		<align></align>
		<caption> Application of Bayes Theorem.</caption>
		<alt> Application of Bayes Theorem.</alt>
	</figure>
</P> 
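<P>The two computations above can also be checked with a few lines of code. The following is a minimal sketch in Python, not part of the original course software; the function name posterior_probabilities is a hypothetical helper introduced for illustration, and the input values are the ones given in the text.</P>

```python
def posterior_probabilities(priors, likelihoods):
    """Return P(G/D) for every group using Bayes Theorem.

    priors      -- P(G) for each group, in group order
    likelihoods -- P(D/G) for each group, in the same order
    """
    # Numerator of Bayes Theorem for each group: P(D/G) * P(G)
    joint = [p_d_g * p_g for p_g, p_d_g in zip(priors, likelihoods)]
    # Denominator: the sum of the numerators over all groups
    total = sum(joint)
    return [value / total for value in joint]


# Values from the text: P(G1)=.10, P(G2)=.90, P(D=16/G1)=.134, P(D=16/G2)=.617
post = posterior_probabilities([0.10, 0.90], [0.134, 0.617])
print([round(p, 3) for p in post])  # [0.024, 0.976]
```

<P>Because every numerator is divided by the same total, the function works unchanged for more than two groups, and the returned values always sum to one.</P>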
<P>These posterior probabilities can be compared with those obtained when the prior probabilities of group membership are equal in order to understand the effect of setting <index>different prior probabilities</index> on the values of the posterior probabilities.</P>
		<TestItem type="MC">
			<question>The posterior probabilities of two groups will be equal to their prior probabilities if</question>
			<answer type="correct">P(D/G) is the same for both groups.</answer>
			<answer>P(G) is the same for both groups.</answer>
			<answer>P(G/D) is the same for both groups.</answer>
			<answer>P(G)/P(G/D) is the same for both groups</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question>Everything else being equal, increasing the prior probability for group 1 in a two-groups discriminant function analysis</question>
			<answer type="correct">decreases the posterior probability for group 2.</answer>
			<answer>increases the P(D/G) for group 1.</answer>
			<answer>decreases the P(G/D) for group 1.</answer>
			<answer>will have no effect on the posterior probabilities.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem><P>
	<figure>
		<description> A form labeled Discriminant Function is shown.  To use this form, the user enters the means of the two groups, the prior probabilities, the mean square within from the ANOVA table, and a score. After clicking on the Compute button, P(D/G) and P(G/D) are shown for both groups.  In the example, the means for the two groups are 13 and 17, the prior probabilities are .5 and .5, the score is equal to 16, and the mean square within is 4. The values of P(D/G) are .134 and .617 for the two groups and the corresponding values of P(G/D) are .178 and .822.</description>
		<url> Images/Mlt0347.gif </url>
		<width>318</width>
		<height>189</height>
		<align></align>
		<caption> Using the Discriminant Function Calculator to understand how changes in prior probabilities affect P(G/D).</caption>
		<alt>Using the Discriminant Function Calculator to find P(D/G) and P(G/D).</alt>
	</figure>
</P> 
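<P>The internals of the calculator are not given in the text. The values in the figure above are reproduced, however, if one assumes that P(D/G) is the two-tailed area under a normal curve centered on the group mean with variance equal to the mean square within. The Python sketch below rests on that assumption; the function name two_tailed_p is hypothetical.</P>

```python
import math


def two_tailed_p(score, mean, ms_within):
    """Two-tailed normal-curve area at least as far from the mean as the score.

    Assumption: the standard deviation is the square root of mean square within.
    """
    z = (score - mean) / math.sqrt(ms_within)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal curve
    return math.erfc(abs(z) / math.sqrt(2))


means = [13, 17]       # group means entered in the form
priors = [0.5, 0.5]    # equal prior probabilities
p_d_g = [two_tailed_p(16, m, 4) for m in means]

# Bayes Theorem: weight each P(D/G) by its prior and rescale to sum to one
joint = [pd * pg for pd, pg in zip(p_d_g, priors)]
p_g_d = [j / sum(joint) for j in joint]

print([round(p, 3) for p in p_d_g])  # [0.134, 0.617]
print([round(p, 3) for p in p_g_d])  # [0.178, 0.822]
```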
<P>The <index>sum of the posterior probabilities</index> for all groups will necessarily be one. In the previous example P(G<SUB>1</SUB>/D=16) + P(G<SUB>2</SUB>/D=16) = .178 + .822 = 1.0. Predicted group membership involves comparing posterior probabilities and selecting the group that has the largest value. It is possible for a score to be unlikely to belong to either group, yet have a high posterior probability of belonging to one group or the other, as can be seen below.</P>
		<TestItem type="MC">
			<question>The sum of the posterior probabilities for all groups will</question>
			<answer type="correct">equal one.</answer>
			<answer>equal zero.</answer>
			<answer>be different depending upon the prior probabilities.</answer>
			<answer>be different depending upon P(D/G).</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question>The sum of the prior probabilities for all groups will</question>
			<answer type="correct">equal one.</answer>
			<answer>equal zero.</answer>
			<answer>be different depending upon P(D/G).</answer>
			<answer>be proportional to the sum of the variances.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<P>
	<figure>
		<description> A form labeled Discriminant Function is shown.  To use this form, the user enters the means of the two groups, the prior probabilities, the mean square within from the ANOVA table, and a score. After clicking on the Compute button, P(D/G) and P(G/D) are shown for both groups.  In the example, the means for the two groups are 15 and 17, the prior probabilities are .5 and .5, the score is equal to 21, and the mean square within is 4. The values of P(D/G) are .004 and .046 for the two groups and the corresponding values of P(G/D) are .061 and .936.</description>
		<url> Images/Mlt0350.gif </url>
		<width>323</width>
		<height>194</height>
		<align></align>
		<caption> Using the Discriminant Function Calculator to view how very small values of P(D/G) may result in large values of P(G/D).</caption>
		<alt> Using the Discriminant Function Calculator to view how very small values of P(D/G) may result in large values of P(G/D).</alt>
	</figure>
</P> 
<P>Posterior probabilities are included in the <index>program to compute probabilities in discriminant function analysis</index>. Note that the default value for prior probabilities is .5 for each group. The SPSS discriminant function analysis program also defaults to equally likely priors and allows the user to optionally supply different prior probabilities for group membership.</P>
		<TestItem type="MC">
			<question>The default value for prior probabilities in SPSS Discriminant Function Analysis is</question>
			<answer type="correct">equally likely.</answer>
			<answer>proportional to group membership in the sample.</answer>
			<answer>proportional to group means.</answer>
			<answer>proportional to group variances.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<P>The effect of setting different prior probabilities can be seen in the following examples.</P>
<P>
	<figure>
		<description> A form labeled Discriminant Function is shown.  To use this form, the user enters the means of the two groups, the prior probabilities, the mean square within from the ANOVA table, and a score. After clicking on the Compute button, P(D/G) and P(G/D) are shown for both groups.  In the example, the means for the two groups are 15 and 17, the prior probabilities are .9 and .1, the score is equal to 21, and the mean square within is 4. The values of P(D/G) are .004 and .046 for the two groups and the corresponding values of P(G/D) are .662 and .338.</description>
		<url> Images/mlt0360.gif </url>
		<width>343</width>
		<height>191</height>
		<align></align>
		<caption> Using the Discriminant Function Calculator to understand how changes in prior probabilities affect P(G/D).</caption>
		<alt> Using the Discriminant Function Calculator to understand how changes in prior probabilities affect P(G/D).</alt>
	</figure>
</P> 
<P>
	<figure>
		<description> A form labeled Discriminant Function is shown.  To use this form, the user enters the means of the two groups, the prior probabilities, the mean square within from the ANOVA table, and a score. After clicking on the Compute button, P(D/G) and P(G/D) are shown for both groups.  In the example, the means for the two groups are 15 and 17, the prior probabilities are .1 and .9, the score is equal to 21, and the mean square within is 4. The values of P(D/G) are .004 and .046 for the two groups and the corresponding values of P(G/D) are .024 and .976.</description>
		<url> Images/mlt0361.gif </url>
		<width>343</width>
		<height>199</height>
		<align></align>
		<caption> Using the Discriminant Function Calculator to understand how changes in prior probabilities affect P(G/D).</caption>
		<alt> Using the Discriminant Function Calculator to understand how changes in prior probabilities affect P(G/D).</alt>
	</figure>
</P> 
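<P>The way posterior probabilities shift with the priors can also be tabulated directly. The following Python sketch holds P(D/G) fixed at the values given earlier for a score of 16 with group means of 13 and 17 (.134 and .617) and varies only the prior for group 1. The function name posterior_group1 is a hypothetical helper, and the sketch illustrates the principle rather than reproducing any particular screen of the calculator.</P>

```python
# P(D/G) for a score of 16 with group means of 13 and 17 (values from the text)
p_d_g = (0.134, 0.617)


def posterior_group1(prior_group1):
    """P(G1/D) for a two-group problem given the prior probability of group 1."""
    prior_group2 = 1 - prior_group1  # the two priors must sum to one
    numerator = p_d_g[0] * prior_group1
    return numerator / (numerator + p_d_g[1] * prior_group2)


for prior in (0.1, 0.5, 0.9):
    p1 = posterior_group1(prior)
    print(f"P(G1)={prior:.1f}  P(G1/D)={p1:.3f}  P(G2/D)={1 - p1:.3f}")
# P(G1)=0.1  P(G1/D)=0.024  P(G2/D)=0.976
# P(G1)=0.5  P(G1/D)=0.178  P(G2/D)=0.822
# P(G1)=0.9  P(G1/D)=0.662  P(G2/D)=0.338
```

<P>With the data held constant, raising the prior for group 1 from .1 to .9 is enough to reverse the predicted group.</P>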
</section>
<section>
<P><h3>Running a Discriminant Function Analysis Using SPSS</h3></P>
<P>The following <index>animated illustration</index> has been prepared to show the steps necessary to run a discriminant function analysis using <index>SPSS</index>. The data file includes a dichotomous dependent variable labeled "Y" and an interval independent variable labeled "X". Some options have been selected to provide additional output.</P>
<activity URL=" images/spss-discrim.gif ">
	<description>Animated illustration of Discriminant Function Analysis using SPSS</description>
</activity>
<h3>Output From SPSS/WIN Discriminant Function Analysis Program</h3>
<P>Given the following raw data.</P>
<P>
	<figure>
		<description>A table of data for two groups is presented.</description>
		<url> Images/Mlt0344.gif </url>
		<width>527</width>
		<height>103</height>
		<align></align>
		<caption>Raw data for discriminant function analysis.</caption>
		<alt> Raw data for discriminant function analysis.</alt>
	</figure>
</P> 
	<TestItem type="MC">
		<question>The score value which most likely generated this table is </question>
		<figure>
			<description>A table of five columns with headers Group, Mean, probability of D given G, probability of G, and probability of G given D.  The values for row one are 1, 17, 0.617, .9, and 0.976.  The values for row two are 2, 13, 0.134, .1, and 0.024.</description>
			<url>Discrim1a.gif</url>
			<width>303</width>
			<height>48</height>
			<align></align>
			<caption></caption>
			<alt></alt>
		</figure>
		<answer type="incorrect">10</answer>
		<answer type="incorrect">12</answer>
		<answer type="incorrect">14</answer>
		<answer type="correct">16</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author></author>
		<date></date>
		<concept></concept>
	</TestItem>
	<TestItem type="MC">
		<question>The prior probabilities of group membership in this table </question>
		<figure>
			<description>A table of five columns with headers Group, Mean, probability of D given G, probability of G, and probability of G given D.  The values for row one are 1, 17, 0.617, .9, and 0.976.  The values for row two are 2, 13, 0.134, .1, and 0.024.</description>
			<url>Discrim1a.gif</url>
			<width>303</width>
			<height>48</height>
			<align></align>
			<caption></caption>
			<alt></alt>
		</figure>
		<answer type="incorrect">were assumed to be equal</answer>
		<answer type="correct">favored Group 1</answer>
		<answer type="incorrect">are a function of P(D/G)</answer>
		<answer type="incorrect">cannot be determined</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author></author>
		<date></date>
		<concept></concept>
	</TestItem>
	<TestItem type="MC">
		<question>An individual making the score which generated this table</question>
		<figure>
			<description>A table of five columns with headers Group, Mean, probability of D given G, probability of G, and probability of G given D.  The values for row one are 1, 17, 0.617, .9, and 0.976.  The values for row two are 2, 13, 0.134, .1, and 0.024.</description>
			<url>Discrim1a.gif</url>
			<width>303</width>
			<height>48</height>
			<align></align>
			<caption></caption>
			<alt></alt>
		</figure>
		<answer type="correct">would be classified as belonging to Group 1</answer>
		<answer type="incorrect">would be classified as belonging to Group 2</answer>
		<answer type="incorrect">would be equally likely to belong to Group 1 or Group 2</answer>
		<answer type="incorrect">cannot be determined</answer>
		<difficulty></difficulty>
		<discriminability></discriminability>
		<author></author>
		<date></date>
		<concept></concept>
	</TestItem>
<P>A discriminant function analysis was done using SPSS.  The <index>output from the discriminant function analysis program of SPSS</index> is not easy to read, nor is it particularly informative for the case of a single dichotomous dependent variable. One can only hope that future versions of the program will produce improved output. The output has been cut up and rearranged here. It is highly recommended that the student have a copy of the complete output from this analysis, in addition to this text, in order to locate the portion of the output being discussed.</P>
		<TestItem type="MC">
			<question>Classification probabilities in SPSS discriminant function analysis output do not include</question>
			<answer type="correct">P(D/G) for the second highest group.</answer>
			<answer>P(D/G) for the highest group.</answer>
			<answer>P(G/D) for the second highest group.</answer>
			<answer>P(G/D) for the highest group.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<P><index>Classification probabilities</index> for each score are optional output that must be requested; they are presented in the following table. SPSS does not provide P(D/G) for the 2<SUP>nd</SUP> highest group.</P>
<P>
	<figure>
		<description>The output from the use of the SPSS discriminant function analysis is presented. It contains sixteen rows (one for each score) and eight columns. The eight columns are labeled: Case number, actual group, highest group, P(D/G) and P(G/D) for the highest group, P(G/D) for the second highest group, and Discrim Scores.</description>
		<url> Images/Mlt0333.gif </url>
		<width>440</width>
		<height>285</height>
		<align></align>
		<caption>Classification probabilities in SPSS output for Discriminant Function Analysis.</caption>
		<alt> Classification probabilities in SPSS output for Discriminant Function Analysis.</alt>
	</figure>
</P> 
<P>A rather crude frequency polygon is also provided if it is requested.</P>
<P>
	<figure>
		<description>A frequency polygon that is optionally provided as part of the SPSS output from the discriminant function analysis is illustrated.  It is a text graph with group membership illustrated by either a one or two as part of the graph.</description>
		<url> Images/Mlt0334.gif </url>
		<width>515</width>
		<height>310</height>
		<align></align>
		<caption>Frequency polygon of Canonical Discriminant Function from SPSS for a Dichotomous Variable.</caption>
		<alt> Frequency polygon of Canonical Discriminant Function from SPSS for a Dichotomous Variable.</alt>
	</figure>
</P>
<definition word="unstandardized canonical discriminant function coefficients">weights in a linear model used to combine variables or scores to predict group membership. These weights are optimized to maximally discriminate among groups.</definition> 
<P>The <index>unstandardized canonical discriminant function coefficients</index> are the regression weights for prediction of a <index>dichotomous dependent variable</index>. The following compares this portion of the output of the discriminant function analysis program</P>
		<TestItem type="MC">
			<question>When using dichotomous groups and a single dependent variable, the unstandardized canonical discriminant function coefficients presented in the SPSS discriminant function analysis are </question>
			<answer type="correct"> the regression weights for prediction of a dichotomous dependent variable.</answer>
			<answer>standard scores in a standard normal distribution.</answer>
			<answer>estimates of the standard error of estimate using simple linear regression.</answer>
			<answer>correlation coefficients predicting group membership.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<P>
	<figure>
		<description>A table labeled unstandardized canonical discriminant function coefficients is shown.  It contains the value .379 for X and -4.55 for the constant term.</description>
		<url> Images/Mlt0332.gif </url>
		<width>418</width>
		<height>115</height>
		<align></align>
		<caption>Unstandardized canonical discriminant function coefficients.</caption>
		<alt> Unstandardized canonical discriminant function coefficients </alt>
	</figure>
</P> 
<P>with a similar portion of the output from the regression package where the Y variable has been recoded to values of 7 and -9.</P>
<P>
	<figure>
		<description>A table of regression coefficients from a regression analysis of the example discriminant function analysis data is illustrated.  The unstandardized coefficients are -16.5 for the constant term and 1.375 for the X variable.  The standardized coefficient is .49 for the X variable.  The Sig. term for the X variable is .054.</description>
		<url> Images/Mlt0342.gif </url>
		<width>492</width>
		<height>184</height>
		<align></align>
		<caption>Regression coefficients from regression analysis of discriminant function example data.</caption>
		<alt> Regression coefficients from regression analysis of discriminant function example data </alt>
	</figure>
</P> 
<P>The regression weights are a multiple of each other. For example the constant terms in the two equations are related by the constant value of -16.50/-4.55=3.63. Likewise, the slopes of the two equations are related by the same constant value of 1.375/.379 =3.63. A proper rescaling of the Y variable would result in equivalent equations <ref>(Flury and Riedwyl, 1988)</ref>.</P>
		<ReferenceSource type="book">
			<title></title>
			<ISBN></ISBN>
			<edition></edition>
			<price type="hard/soft/web"></price>
			<refs> Flury and Riedwyl, 1988</refs>
			<URL></URL>
			<pubdate type="year/other"></pubdate>
			<author vCard=""><first></first><last></last></author>
			<author vCard=""><first></first><last></last></author>
			<location></location>
			<publisher vCard=""></publisher>
			<booknote date=""></booknote>
			<quote page=""></quote>
		</ReferenceSource> 
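<P>The proportionality of the two sets of weights is simple arithmetic and can be verified as follows. This Python sketch uses only the coefficients reported in the two tables above; the dictionary names are chosen for illustration.</P>

```python
# Coefficients reported in the two tables of output
discriminant = {"constant": -4.55, "X": 0.379}
regression = {"constant": -16.5, "X": 1.375}

# Ratio of each regression weight to the matching discriminant weight
ratios = {name: regression[name] / discriminant[name] for name in discriminant}
print({name: round(value, 2) for name, value in ratios.items()})
# {'constant': 3.63, 'X': 3.63}
```

<P>The two ratios agree to two decimal places; the small difference in later digits reflects rounding in the printed coefficients.</P>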
<definition word="Wilks' Lambda">a measure of relationship between group membership and interval variables.</definition> 
<P>Since there is only one independent variable in the prediction equation, the <index>canonical correlation coefficient</index> is equal to the correlation coefficient (.49). Note also that the significance level of Wilks' Lambda (.054) is the same as the significance level for the t-test for the b<SUB>1</SUB> value in the table above. The following is from the discriminant function analysis program.</P>
		<TestItem type="MC">
			<question> When using dichotomous groups and a single dependent variable, the  canonical correlation coefficient presented in the SPSS discriminant function analysis </question>
			<answer type="correct">is equal to the correlation between the dependent and independent variables.</answer>
			<answer>will be equal to one if group membership is perfectly predicted.</answer>
			<answer>can never be less than .273.</answer>
			<answer>will increase as sample size increases.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<P>
	<figure>
		<description>The canonical discriminant functions from SPSS output are shown.  The output consists of two separate parts. The first, on the left-hand side, contains a single function with an eigenvalue of .3159, a percent of variance of 100, a cumulative percent of variance of 100, and a canonical correlation of .49.  The second part contains a significance test of the first canonical function.  It has a Wilks' lambda value of .7599 and a significance level of .054.</description>
		<url> Images/Mlt0331.gif </url>
		<width>548</width>
		<height>152</height>
		<align></align>
		<caption>Canonical discriminant functions from SPSS output.</caption>
		<alt> Canonical discriminant functions from SPSS output </alt>
	</figure>
</P> 
<P>The following is from the regression program.</P>
<P>
	<figure>
		<description>A multiple R of .49 is obtained when predicting group membership using a linear regression analysis for the example data.</description>
		<url> Images/Mlt0343.gif </url>
		<width>488</width>
		<height>214</height>
		<align></align>
		<caption>Model summary for regression analysis of example data.</caption>
		<alt> Model summary for regression analysis of example data </alt>
	</figure>
</P> 
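<P>The eigenvalue, canonical correlation, and Wilks' Lambda reported by the discriminant function analysis program are not independent pieces of information. When there is a single discriminant function, the standard relationships are that the squared canonical correlation equals the eigenvalue divided by one plus the eigenvalue, and that Wilks' Lambda equals one minus the squared canonical correlation. The Python sketch below checks these relationships against the values in the output; the variable names are chosen for illustration.</P>

```python
import math

eigenvalue = 0.3159  # from the SPSS canonical discriminant function table

# Single-function case: r squared = eigenvalue / (1 + eigenvalue)
canonical_r = math.sqrt(eigenvalue / (1 + eigenvalue))
# Wilks' Lambda = 1 / (1 + eigenvalue) = 1 - r squared
wilks_lambda = 1 / (1 + eigenvalue)

print(round(canonical_r, 2))   # 0.49
print(round(wilks_lambda, 4))  # 0.7599
```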
<P>A table of actual and predicted group membership may also be optionally requested.</P>
<P>
	<figure>
		<description>Actual and predicted group membership from discriminant analysis program in SPSS.  Six of the nine members of group 1 were correctly predicted, as were four of the seven members of group 2. Overall, 62.5 percent of the cases were correctly predicted using the discriminant function analysis. </description>
		<url> Images/Mlt0336.gif </url>
		<width>412</width>
		<height>203</height>
		<align></align>
		<caption> Actual and predicted group membership from discriminant analysis program in SPSS.</caption>
		<alt> Actual and predicted group membership from discriminant analysis program in SPSS </alt>
	</figure>
</P> 
		<TestItem type="MC">
			<question>Using the SPSS discriminant function analysis program, accuracy in prediction can be best assessed using</question>
			<answer type="correct">percentage of correctly predicted group membership.</answer>
			<answer>canonical discriminant function coefficient.</answer>
			<answer>eigenvalue of the highest discriminant function.</answer>
			<answer>canonical correlation coefficient.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<P>It can be seen that the procedure is slightly more accurate in predicting group membership in group 1 (66.7%) than in group 2 (57.1%).</P>
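<P>The percentages follow directly from the counts in the classification table: 6 of 9 cases correct in group 1 and 4 of 7 correct in group 2. A short Python sketch of the arithmetic, with variable names chosen for illustration:</P>

```python
# Counts from the classification table: (correctly predicted, group size)
counts = {1: (6, 9), 2: (4, 7)}

for group, (correct, size) in counts.items():
    print(f"group {group}: {100 * correct / size:.1f}% correct")

# Overall accuracy pools the correct predictions across both groups
total_correct = sum(correct for correct, _ in counts.values())
total_cases = sum(size for _, size in counts.values())
overall = 100 * total_correct / total_cases
print(f"overall: {overall:.1f}% correct")
# group 1: 66.7% correct
# group 2: 57.1% correct
# overall: 62.5% correct
```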
</section>
<section>
<P><h3>An Alternative Method of Computing Probability of Group Membership</h3></P>
<P>For reasons that I have not been able to fully comprehend, <index>SPSS</index> computes <index>probability of group membership</index> as the relative height of the normal curve at a given point (<ref>Tatsuoka, 1971</ref>).  The following figure gives an example of the calculation of probabilities based on the height of the normal curve.</P>
		<ReferenceSource type="book">
			<title></title>
			<ISBN></ISBN>
			<edition></edition>
			<price type="hard/soft/web"></price>
			<refs> Tatsuoka, 1971</refs>
			<URL></URL>
			<pubdate type="year/other"></pubdate>
			<author vCard=""><first></first><last></last></author>
			<location></location>
			<publisher vCard=""></publisher>
			<booknote date=""></booknote>
			<quote page=""></quote>
		</ReferenceSource> 
<P>
	<figure>
		<description>Two overlapping normal curves illustrate the computation of probabilities as relative heights of normal curves rather than areas.  In each curve, at a given point on the x-axis, one curve is illustrated with a red line and the other by a green line. The probabilities of each can be calculated by dividing the length of each line by the total length of the two lines summed.</description>
		<url> Images/mlt0359.gif </url>
		<width>467</width>
		<height>163</height>
		<align></align>
		<caption>P(G/D) computed as a relative height of a normal curve.</caption>
		<alt> P(G/D) computed as a relative height of a normal curve </alt>
	</figure>
</P> 
<P>The following program will generate probabilities for discriminant function analysis based on the relative height of the normal curve at any point.  This program should be used to match the probabilities generated using SPSS.</P>
<activity URL=" exercises/discrima.htm ">
	<description> Discriminant Analysis Probabilities Program using SPSS computational method </description>
</activity> 
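<P>The relative-height computation described above can be sketched in a few lines of Python. This is an illustration of the idea in the figure, not the code of the linked program or of SPSS. It assumes equal prior probabilities and a common standard deviation for both groups, and it borrows the means (13 and 17), mean square within (4), and score (16) from the earlier calculator example. The function names are hypothetical.</P>

```python
import math


def normal_height(score, mean, sd):
    """Height (density) of the normal curve at the score."""
    z = (score - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def relative_heights(score, means, sd):
    """P(G/D) as each curve's share of the summed heights at the score.

    Assumption: equal priors and one common standard deviation.
    """
    heights = [normal_height(score, m, sd) for m in means]
    total = sum(heights)
    return [h / total for h in heights]


p_g_d = relative_heights(16, [13, 17], math.sqrt(4))
print([round(p, 3) for p in p_g_d])  # [0.269, 0.731]
```

<P>For the same inputs the area-based method used earlier gave .178 and .822, so the two methods lead to the same predicted group here but to different probabilities.</P>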
</section>
<section>
<activity URL=" exercises/discrima.htm ">
	<description> Discriminant Analysis Probabilities Program using SPSS computational method </description>
</activity> 
<DataFile type="SPSS">
	<url>data/Mlt03.sav</url>
	<description>Data file for the example discriminant function analysis homework assignment.</description>
</DataFile>
<P><h3>Discriminant Function Homework Assignment</h3></P>
<P>This section will guide the reader through the discriminant function analysis homework assignment. It will demonstrate the use of both the Discriminant Function Probabilities program and the SPSS discriminant function analysis option. The exercise consists of two parts: the raw data and classification section appearing below</P>
<P>
	<figure>
		<description>Example 1 of the Discriminant Analysis homework assignment is illustrated. It contains a table with seven columns: X, Y, Predicted Group, P of D given group one, P of group one given D, P of D given group 2 and P of group two given D.  The first two columns contain numbers, the X variable ranging from 11 to 18 in ascending order and the Y variable is either a zero or a one.  The first and the last rows of the table contain a form for the student to enter numbers when submitting for grading.</description>
		<url> Images/mlt0363.gif </url>
		<width>523</width>
		<height>565</height>
		<align></align>
		<caption>Example Discriminant Function Assignment - Part I: Classification</caption>
		<alt> Example Discriminant Function Assignment </alt>
	</figure>
</P>
<P>and a statistics section, shown as follows.</P>
<P>
	<figure>
		<description>The second part of the discriminant function homework assignment is illustrated.  It contains two tables.  The first is a table of sample size, means and standard deviations of the X variable for the two levels of the Y variable. The second is an ANOVA table with columns for degrees of freedom, mean squares, F ratio, and significance level.</description>
		<url> Images/mlt0364.gif </url>
		<width>390</width>
		<height>252</height>
		<align></align>
		<caption> Example Discriminant Function Assignment - Part II: Statistics </caption>
		<alt> Example Discriminant Function Assignment - Part II: Statistics </alt>
	</figure>
</P>
<P>The first step is the creation of a data file in SPSS. The data editor for the example data appears below.</P>
<P>
	<figure>
		<description>This illustration of an SPSS data file of example discriminant function analysis homework assignment shows a data file of twenty rows and two columns, labeled X and Y. The numbers in the data file correspond to the data in the example assignment.</description>
		<url> Images/mlt0367.gif </url>
		<width>205</width>
		<height>412</height>
		<align></align>
		<caption> SPSS data file of example discriminant function analysis homework assignment.</caption>
		<alt>SPSS data file of example discriminant function analysis homework assignment.</alt>
	</figure>
</P>
<P>After creating the data file, the first step in the analysis procedure is finding the means and variances of the two groups and performing an ANOVA on the results. This can be done with a statistical calculator or using the SPSS <SPSSCommand>Analyze/Compare Means/Means</SPSSCommand> command. The screen snapshot below shows the interface for the Means command and the example data. Note that an "ANOVA Table and Eta" has been optionally requested.</P>
<P>
	<figure>
		<description>Finding means for the discriminant analysis homework example using SPSS. In this example the Analyze/Compare Means/Means options have been selected from the SPSS toolbar.  The ANOVA table and Eta option has been selected using the Options button on the Means command and checking the appropriate box.</description>
		<url> Images/mlt0368.gif </url>
		<width>691</width>
		<height>386</height>
		<align></align>
		<caption>Finding means for the discriminant analysis homework example using SPSS.</caption>
		<alt> Finding means for the discriminant analysis homework example using SPSS.</alt>
	</figure>
</P>
<P>The output resulting from running this command is shown below.</P>
<P>
	<figure>
		<description>The SPSS output for the Analyze/Compare Means/Means command for the example discriminant analysis homework assignment is illustrated.  Two tables are shown, a table of means and an ANOVA summary table.  The table of means has two means, 12.88 and 16, two Ns, 8 and 12, and two standard deviations, 1.46 and 1.65, for the two values of Y, 0 and 1. The ANOVA table has a mean square within value of 2.493.</description>
		<url> Images/mlt0369.gif </url>
		<width>643</width>
		<height>226</height>
		<align></align>
		<caption> SPSS output for the Analyze/Compare Means/Means command for the example discriminant analysis homework assignment. </caption>
		<alt> SPSS output for the Analyze/Compare Means/Means command for the example discriminant analysis homework assignment </alt>
	</figure>
</P>
<P>This output can be copied directly into the statistical section of the homework assignment and is necessary before the classification section can be completed. The completed statistical section is shown below.</P>
<P>
	<figure>
		<description> The second part of the discriminant function homework assignment is illustrated.  It contains two tables.  The first is a table of sample size, means and standard deviations of the X variable for the two levels of the Y variable. In this example, the sample sizes are 8 and 12, the means are 12.88 and 16, and the standard deviations are 1.46 and 1.65 for the two groups.  The second table is an ANOVA table with columns for degrees of freedom, mean squares, F ratio, and significance level.  The within groups row has a mean square value of 2.49.</description>
		<url> Images/mlt0366.gif </url>
		<width>390</width>
		<height>262</height>
		<align></align>
		<caption> Answer Key for Example Discriminant Function Assignment - Part II: Statistics</caption>
		<alt> Answer Key for Example Discriminant Function Assignment - Part II: Statistics </alt>
	</figure>
</P>
<P>At this point the normal curve option of the Probability Calculator could be used to find P(D/G) for each score for both groups, and Bayes' theorem could then be used to find P(G/D) for each group. It is far easier to use the Discriminant Function Probabilities program supplied with this text. The probabilities for this assignment are based on a direct application of Bayes' theorem, not on the SPSS probabilities, so do not choose the SPSS probabilities option. After calling the program, enter the two means and the mean square within found in the previous step into the appropriate text boxes in the program interface. This needs to be done only once. Then enter each different score into the Score text box and click on the Compute button. The results for the first score of 11 should appear as follows.</P>
<P>
	<figure>
		<description> Finding probabilities for the first row of the example discriminant function analysis homework assignment using the discriminant function probabilities calculator is illustrated. In this example, a score of 11 is entered in the score text box, means of 12.88 and 16 in the boxes for the means of groups 1 and 2, respectively, and 2.493 for the mean square within.  Following that, clicking on the Compute button results in P(D/G) of .234 and .002 and P(G/D) of .987 and .013 for the two groups.</description>
		<url> Images/mlt0370.gif </url>
		<width>445</width>
		<height>184</height>
		<align></align>
		<caption>Finding probabilities for the first row of the example discriminant function analysis homework assignment using the discriminant function probabilities calculator.</caption>
		<alt> Finding probabilities for the first row of the example discriminant function analysis homework assignment using the discriminant function probabilities calculator.</alt>
	</figure>
</P>
<P>The results of the final score of 18 appear as follows.</P>
<P>
	<figure>
		<description>Finding probabilities for the last row of the example discriminant function analysis homework assignment using the discriminant function probabilities calculator. In this example, a score of 18 is entered in the score text box, means of 12.88 and 16 in the boxes for the means of groups 1 and 2, respectively, and 2.493 for the mean square within.  Following that, clicking on the Compute button results in P(D/G) of .001 and .205 and P(G/D) of .011 and .989 for the two groups.</description>
		<url> Images/mlt0371.gif </url>
		<width>442</width>
		<height>176</height>
		<align></align>
		<caption> Finding probabilities for the last row of the example discriminant function analysis homework assignment using the discriminant function probabilities calculator.</caption>
		<alt>Finding probabilities for the last row of the example discriminant function analysis homework assignment using the discriminant function probabilities calculator.</alt>
	</figure>
</P>
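<P>The two sets of results above can be approximated with a short Python sketch. It rests on assumptions that are not confirmed by the program itself: P(D/G) is taken to be the two-tailed normal curve area beyond the score, P(G/D) is found with Bayes' theorem from the heights of the two normal curves with equal prior probabilities, and both groups share a standard deviation equal to the square root of the mean square within. The function names are hypothetical.</P>
<P><![CDATA[
```python
import math


def two_tailed_area(score, mean, sd):
    """Area in both tails of a normal curve beyond the observed score."""
    z = abs(score - mean) / sd
    return math.erfc(z / math.sqrt(2.0))


def normal_height(score, mean, sd):
    """Height (density) of the normal curve at the observed score."""
    z = (score - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def group_probabilities(score, mean1, mean2, ms_within):
    """Return ((P(D/G1), P(D/G2)), (P(G1/D), P(G2/D))).

    Assumes equal prior probabilities and a common standard deviation.
    """
    sd = math.sqrt(ms_within)
    p_d_g = (two_tailed_area(score, mean1, sd),
             two_tailed_area(score, mean2, sd))
    h1 = normal_height(score, mean1, sd)
    h2 = normal_height(score, mean2, sd)
    p_g_d = (h1 / (h1 + h2), h2 / (h1 + h2))
    return p_d_g, p_g_d


# Means and mean square within taken from the example assignment.
for score in (11, 18):
    p_d_g, p_g_d = group_probabilities(score, 12.88, 16.0, 2.493)
    print(score,
          [round(p, 3) for p in p_d_g],
          [round(p, 3) for p in p_g_d])
```
]]></P>
<P>Under these assumptions the sketch agrees with the values shown in the two figures to three decimal places, which suggests, but does not prove, that the program works in a similar way.</P>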
<P>The probabilities found using the Discriminant Function Probabilities program can be entered directly on the assignment. The predicted group is found by comparing the P(G/D) values for the two groups: the group with the higher P(G/D) is the predicted group. At some point in the data, the predicted group will switch from group 0 to group 1. In the example data below, a score of 14 is the highest score predicted to belong to group 0.</P>
<P>
	<figure>
		<description>Example 1 of the answer key for the Discriminant Analysis homework assignment is illustrated. It contains a table with seven columns: X, Y, Predicted Group, P of D given group one, P of group one given D, P of D given group 2 and P of group two given D.  The first two columns contain numbers, the X variable ranging from 11 to 18 in ascending order and the Y variable is either a zero or a one.  The first and the last rows of the table contain a form for the student to enter numbers when submitting for grading. In this illustration, the table is filled with numbers.</description>
		<url> Images/mlt0365.gif </url>
		<width>465</width>
		<height>555</height>
		<align></align>
		<caption>Answer Key for Example Discriminant Function Assignment - Part I: Classification </caption>
		<alt> Answer Key for Example Discriminant Function Assignment - Part I: Classification </alt>
	</figure>
</P>
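<P>The switch in predicted group can be sketched in Python as follows. The sketch assumes equal prior probabilities and a common variance equal to the mean square within, in which case the group with the higher P(G/D) is simply the group whose mean is closer to the score, and the switch occurs at the midpoint of the two means. The function names are hypothetical, and groups are labeled 0 and 1 to match the coding of Y.</P>
<P><![CDATA[
```python
import math


def posterior_group0(score, mean0, mean1, ms_within):
    """P(G/D) for group 0, assuming equal priors and a common variance."""
    h0 = math.exp(-((score - mean0) ** 2) / (2.0 * ms_within))
    h1 = math.exp(-((score - mean1) ** 2) / (2.0 * ms_within))
    # The normalizing constants of the two curves cancel because the
    # variances are assumed equal.
    return h0 / (h0 + h1)


def predicted_group(score, mean0=12.88, mean1=16.0, ms_within=2.493):
    """Return 0 or 1, whichever group has the higher P(G/D)."""
    if posterior_group0(score, mean0, mean1, ms_within) >= 0.5:
        return 0
    return 1


scores = list(range(11, 19))          # X ranges from 11 to 18
predictions = [predicted_group(x) for x in scores]
for x, g in zip(scores, predictions):
    print(x, g)

# With equal priors and equal variances the cutoff is the midpoint.
print("midpoint of the two means:", (12.88 + 16.0) / 2.0)
```
]]></P>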
<P>The results could be submitted for grading at this point, but a few minutes spent submitting the data to the Discriminant Function Analysis program in SPSS will allow you to verify your results and better understand the SPSS output. Use the <SPSSCommand>Analyze/Classify/Discriminant Analysis</SPSSCommand> command with the example data. The X variable is selected as the independent variable and Y is the grouping variable. You must tell this SPSS command how you coded your groups by clicking on the Define Range button. In this case the groups were coded as 0 and 1, so these values are entered in the corresponding text boxes. The interface and example data analysis are shown below.</P>
<P>
	<figure>
		<description>The user interface for the SPSS discriminant function analysis command is illustrated.  In this example, the variable X has been selected as the independent variable and Y for the grouping variable.  The additional box showing the interface when the Define Range button is selected also appears.  In this example, the minimum value is 0 and the maximum value is 1. Following this entry, the Continue, Classify, and OK buttons are clicked to continue the analysis.</description>
		<url> Images/mlt0372.gif </url>
		<width>505</width>
		<height>283</height>
		<align></align>
		<caption>Performing a discriminant function analysis using SPSS.</caption>
		<alt> Performing a discriminant function analysis using SPSS.</alt>
	</figure>
</P>
<P>Some additional optional output should be requested. Click on the Classify button and select casewise results, summary table, and territorial map in the resulting screen.  The selection of these options is illustrated below.</P>
<P>
	<figure>
		<description>The user interface in SPSS showing the classification options in the discriminant function analysis program is illustrated. The casewise results and summary table options are selected under the Display grouping, while the territorial map is selected as a plot option.</description>
		<url> Images/mlt0373.gif </url>
		<width>460</width>
		<height>250</height>
		<align></align>
		<caption>Selecting the Classify options in the discriminant function analysis command in SPSS.</caption>
		<alt> Selecting the Classify options in the discriminant function analysis command in SPSS.</alt>
	</figure>
</P>
<P>The optional casewise results output from the SPSS discriminant function analysis program is presented below. Note that the P(D/G) matches the P(D/G) probabilities on the assignment except that after the tenth score (the red line on the figure) the results refer to different groups. The P(G/D) is presented for both groups in the output, but it will differ from that on the assignment because it is computed using a different method, as discussed earlier in this text. </P>
<P>
	<figure>
		<description>The SPSS output of casewise results of the discriminant function analysis program for the example data is illustrated. A fairly complicated table with eleven columns is shown. The twenty rows of the table correspond to the twenty rows in the data matrix. Two major headings, titled highest group and second highest group, appear. Under both major headings, a column labeled P of G given D appears. In the example output, the first ten rows, corresponding to a predicted group of 0, under the highest group are marked in purple while the second ten rows of the same column are marked in green. The opposite color scheme, green then purple, appears under the same column of the second highest group.</description>
		<url> Images/mlt0374.gif </url>
		<width>712</width>
		<height>564</height>
		<align></align>
		<caption>SPSS output of casewise results of the discriminant function analysis program for the example data.</caption>
		<alt> SPSS output of casewise results of the discriminant function analysis program for the example data.</alt>
	</figure>
</P>
<P>The optional summary classification table is presented below. Note that 80% of the total sample was correctly classified.</P>
<P>
	<figure>
		<description>The SPSS output of summary classification results of the discriminant function analysis program for the example data is illustrated.  In this example, 80%, or sixteen of the twenty scores were correctly classified. Of the eight individuals belonging to group one, seven were correctly classified. Of the twelve individuals belonging to group 2, nine were correctly classified.</description>
		<url> Images/mlt0375.gif </url>
		<width>398</width>
		<height>192</height>
		<align></align>
		<caption>SPSS output of summary classification results of the discriminant function analysis program for the example data </caption>
		<alt> SPSS output of summary classification results of the discriminant function analysis program for the example data.</alt>
	</figure>
</P>
</section>
<section>
<DataFile type="SPSS">
	<url>data/Mlt03a.sav</url>
	<description>Data file for the example discriminant function analysis homework assignment.</description>
</DataFile>
<P><h3>Discriminant Analysis with More than Two Groups</h3></P>
<P>The extension of discriminant function analysis to situations where there are three or more groups is straightforward. Probabilities of the observed score given group membership are computed in a manner similar to the case of dichotomous groups. The only difference is that there are more probabilities to compute. The following illustrates overlapping probability models for five groups.</P>
<P>
	<figure>
		<description>Five overlapping normal curves are drawn on a single axis.</description>
		<url> Images/Mlt0356.gif </url>
		<width>576</width>
		<height>171</height>
		<align></align>
		<caption>Discriminant function analysis with five groups and a single variable.</caption>
		<alt> Discriminant function analysis with five groups and a single variable.</alt>
	</figure>
</P> 
<P>In this case some groups could be discriminated more easily than others. For example, group 1 differs from the others, while groups 2 and 3 could be discriminated from groups 4 and 5. It would be difficult to discriminate membership in group 2 from group 3 using this single variable.</P>
</section>
<section>
<P><h3>An Example Discriminant Function Analysis with Three Groups and Five Variables</h3></P>
<P>A real-life application of discriminant function analysis will now be presented to illustrate the potential usefulness of this technique.</P>
<P>In the criminal justice system in the United States, defense attorneys often attempt to have their clients declared incompetent to stand trial. Being incompetent to stand trial basically means that the accused has little knowledge of criminal justice procedure and is incapable of assisting in his or her own defense. Being declared incompetent to stand trial does not absolve someone of criminal activity, but it can often delay the trial. The longer a case takes to go to trial, the more likely witnesses are to disappear, the less publicity surrounds the crime, and the less likely the government is to prosecute. For these reasons, criminals sometimes wish to be found incompetent to stand trial when in fact they are not. To achieve this end they malinger, or fake, their answers on psychological tests. One of the functions of psychologists in the prison system in the United States is to testify as to the competence of various accused persons to stand trial. They are asked to give an opinion as to whether the accused is truly incompetent or malingering.</P>
<definition word="malingering">faking, usually pretending to be worse off than one really is.</definition> 
<P>In the example study <ref>????(2001)</ref>, one hundred and twenty prisoners completed a psychological inventory called the MMPI, or Minnesota Multiphasic Personality Inventory. This inventory consists of 571 true/false questions such as "I often bite my nails." Forty of the prisoners were told to malinger and pretend they were incompetent to stand trial, forty were truly incompetent, and forty were competent, but psychiatric, patients. The purpose of the study was to create a new scale that could differentiate between the three groups to assist the psychologist in testifying as to the true competence to stand trial of the person being accused.</P>
<P>For demonstration purposes, the data set has been reduced to five items on the MMPI. These items have been coded as 1=false and 2=true in the example data set. A portion of the data editor file is shown below.</P>
<P>
	<figure>
		<description>An example SPSS data file for discriminant analysis with three groups and five variables is shown. Subjects 33 through 47 are shown, each with seven variables.  The variables, arranged in columns, include subject, group, item147, item149, item170, item175, and item325. There is no missing data and the scores for the items are either ones or twos.</description>
		<url> Images/mlt0376.gif </url>
		<width>435</width>
		<height>409</height>
		<align></align>
		<caption>Example SPSS data file for discriminant analysis with three groups and five variables.</caption>
		<alt> Example SPSS data file for discriminant analysis with three groups and five variables.</alt>
	</figure>
</P>
<P>The Group variable is coded with the following value labels.</P>
<P>
	<figure>
		<description>The value labels for groups in the example SPSS data file for discriminant analysis with three groups and five variables are shown. For the three groups, group one is called malingering neuro, group two is called non-malingering neuro, and group three is labeled psychiatric.</description>
		<url> Images/mlt0377.gif </url>
		<width>369</width>
		<height>193</height>
		<align></align>
		<caption>Value labels for groups in example SPSS data file for discriminant analysis with three groups and five variables.</caption>
		<alt> Value labels for groups in example SPSS data file for discriminant analysis with three groups and five variables </alt>
	</figure>
</P>
<P>Using the example data, a discriminant function analysis was performed using the SPSS <SPSSCommand>Analyze/Classify/Discriminant Function</SPSSCommand> command sequence. The five items were used as independent variables and the group variable was selected as the grouping variable. A minimum value of 1 and a maximum value of 3 were selected as the range of the grouping variable. Optional classification results and plots were requested.
</P>
<P>The procedure will return more than one discriminant function when there are more than two groups and at least two independent variables. The number of possible discriminant functions is the smaller of the number of groups minus one and the number of independent variables. In this case the number of groups minus one is two and the number of independent variables is five, so there can be two discriminant functions.
</P>
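<P>As a minimal sketch, the rule for the number of possible discriminant functions can be written as a one-line Python function. The function name is hypothetical.</P>
<P><![CDATA[
```python
def max_discriminant_functions(n_groups, n_variables):
    """Smaller of (number of groups minus one) and number of variables."""
    if n_groups < 2 or n_variables < 1:
        raise ValueError("need at least two groups and one variable")
    return min(n_groups - 1, n_variables)


# Three groups and five MMPI items, as in the example study.
print(max_discriminant_functions(3, 5))
```
]]></P>
<P>For the two-group homework example with a single X variable, the same rule gives one discriminant function.</P>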
<definition word="eigenvalues">combined variance.</definition> 
<P>Not all the discriminant functions will be equally useful in discriminating between groups, and the next step in the analysis is often to decide how many to use. This decision is aided by the following table of eigenvalues. The term eigenvalue will be discussed in detail in a later chapter. For the purposes of this discussion, examination of the relative percent of variance accounted for by each discriminant function will provide useful information. Here it can be seen that the first discriminant function accounts for about 76 percent of the variance, about three times the 24 percent accounted for by the second discriminant function. Although both functions will be examined in the following discussion, only the first is likely to discriminate between the groups with any accuracy.
</P>
<P>
	<figure>
		<description>The figure shows a table of eigenvalues from the example discriminant function analysis in SPSS. There are two discriminant functions, the first with an eigenvalue of .662 accounting for 76 percent of the variance. The second eigenvalue was .208, accounting for 24 percent of the variance.</description>
		<url> Images/mlt0388.gif </url>
		<width>440</width>
		<height>150</height>
		<align></align>
		<caption>Eigenvalues for the example discriminant function analysis.</caption>
		<alt> Eigenvalues for the example discriminant function analysis </alt>
	</figure>
</P>
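<P>The percent of variance figures in the table follow directly from the eigenvalues: each eigenvalue is divided by the sum of all the eigenvalues. A short Python sketch, using the two eigenvalues reported in the table and a hypothetical function name, is given here.</P>
<P><![CDATA[
```python
def percent_of_variance(eigenvalues):
    """Express each eigenvalue as a percentage of their sum."""
    total = sum(eigenvalues)
    return [100.0 * value / total for value in eigenvalues]


# Eigenvalues of the two discriminant functions in the example output.
eigenvalues = [0.662, 0.208]
percents = percent_of_variance(eigenvalues)
for index, pct in enumerate(percents, start=1):
    print("function", index, round(pct, 1))
```
]]></P>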
<P>The next table of interest in the discriminant function output of SPSS is the table of standardized discriminant function coefficients presented below.
</P>
<P>
	<figure>
		<description>A table labeled standardized canonical discriminant function coefficients is shown. There are three columns in the table. The first column lists the five independent variables, items 147, 149, 170, 175 and 325. The second lists the coefficients for the first discriminant function; .716, .433, -.147, .475, and -.054.  The third column lists the coefficients for the second discriminant function; -.347, -.414, .676, .269, and .715.  </description>
		<url> Images/mlt0387.gif </url>
		<width>369</width>
		<height>171</height>
		<align></align>
		<caption>Standardized canonical discriminant function coefficients for SPSS data file for discriminant analysis with three groups and five variables.</caption>
		<alt> Standardized canonical discriminant function coefficients for SPSS data file for discriminant analysis with three groups and five variables.</alt>
	</figure>
</P>
<P>These coefficients allow the computation of new variables that can be used to discriminate between groups in the same manner as the single variable in the earlier presentation. Because the coefficients are standardized, the first step in the computation is to convert all the independent variables to standard scores. This can be easily accomplished with the <SPSSCommand>Analyze/Descriptive Statistics/Descriptives</SPSSCommand> command by selecting the save standard scores option.
</P>
<P>The second step is to multiply each standard score by its standardized canonical discriminant function coefficient and then sum the results. For example, finding the first discriminant function for subject 33 could be done as follows:
</P>
<P>(.716*1.198) + (.433*.856) + (-.147*.799) + (.475*.932) + (-.054*.165) = 1.46</P>
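<P>The same weighted sum can be sketched in Python, using the coefficients and standard scores exactly as printed above. The function name is hypothetical. With these rounded inputs the sum comes to roughly 1.54 rather than 1.46; the reported value would be reproduced if the last standard score were 1.65 instead of .165, so one of the printed values may contain a transcription slip and the inputs should be treated as illustrative.</P>
<P><![CDATA[
```python
def discriminant_score(weights, z_scores):
    """Sum of each standardized coefficient times its standard score."""
    if len(weights) != len(z_scores):
        raise ValueError("one coefficient is needed for each standard score")
    return sum(w * z for w, z in zip(weights, z_scores))


# Coefficients of the first discriminant function for items
# 147, 149, 170, 175, and 325, in that order.
coefficients = [0.716, 0.433, -0.147, 0.475, -0.054]

# Standard scores for subject 33 as printed in the text.
standard_scores = [1.198, 0.856, 0.799, 0.932, 0.165]

df1z = discriminant_score(coefficients, standard_scores)
print(round(df1z, 2))
```
]]></P>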
<P>The computation is done using the Compute command in SPSS with two new variables added to the data file named df1z and df2z. The resulting data file is presented in the following figure.</P>
<P>
	<figure>
		<description>A portion of the SPSS data file showing the standardized item variables and the computed discriminant functions is shown. Shown are subjects 33 through 39. The first subject shown, number 33, illustrates the computation of the first discriminant function just described in the text as (.716*1.198) + (.433*.856) + (-.147*.799) + (.475*.932) + (-.054*.165) = 1.46. The value for the first discriminant function for this subject is 1.46.</description>
		<url> Images/mlt0379.gif </url>
		<width>562</width>
		<height>157</height>
		<align></align>
		<caption>Portion of SPSS data file showing standardized variables and computed discriminant functions.</caption>
		<alt> Portion of SPSS data file showing standardized variables and computed discriminant functions </alt>
	</figure>
</P>
<P>The resulting discriminant functions can be plotted in a scatter plot as an option in the SPSS discriminant function analysis program to give a visual representation of the results. In the following plot the large red dots indicate the centroids, or means, of each of the three groups on the computed discriminant function variables. Group membership is indicated by the use of different markers and colors for each subject. The scatter plot is presented below.
</P>
<definition word="centroid">mean</definition> 
<P>
	<figure>
		<description>A graph of the canonical discriminant functions for SPSS data file for discriminant analysis with three groups and five variables is illustrated. It is a scatter plot of scores. The centroids, or means, of the discriminant functions are shown with large red dots and labeled with the group name. The centroids form a triangle with the malingering and psychiatric groups at the bottom vertices and the non-malingering group at the top vertex. The computed discriminant functions for each subject are plotted on the scatter plot using a different color and marker style to indicate original group membership.</description>
		<url> Images/mlt0378.gif </url>
		<width>458</width>
		<height>378</height>
		<align></align>
		<caption>Graph of canonical discriminant functions for SPSS data file for discriminant analysis with three groups and five variables.</caption>
		<alt> Graph of canonical discriminant functions for SPSS data file for discriminant analysis with three groups and five variables </alt>
	</figure>
</P>
<P>By projecting the points onto the x-axis you can get a good idea of how the first discriminant function works to discriminate between groups. The malingering group scores the lowest, the non-malingering group is in the middle, and the psychiatric group scores the highest on this variable.
</P>
<P>In a similar manner, the discrimination of the second discriminant function can be seen by projecting the points onto the y-axis. On this function the non-malingering group can be distinguished from the malingering and psychiatric groups, but the latter two groups cannot be differentiated from each other because their centroids project to nearly the same value on the y-axis.
</P>
<P>Each of the computed discriminant functions can be used individually or in combination to compute the P(D/G) for each subject and each group. The discriminant function probabilities calculator cannot be used because it was designed for use with only two groups, but in concept the methods are identical. After the values of P(D/G) are found, the values of P(G/D) may be found using Bayes' theorem. Because SPSS finds these values automatically as part of the output, there is little reason to do so by hand.
</P>
<P>The summary table of classification by the original discriminant functions is part of the optional SPSS output of the discriminant function program and is presented below for the example data.
</P>
<P>
	<figure>
		<description> The classification results for original discriminant functions are shown. Of the malingering group, 27 were correctly identified, 8 were classified as non-malingering, and 5 were classified as psychiatric.  Of the non-malingering group, 20 were correctly classified, 11 were classified as malingering and 9 were classified as psychiatric.  Of the psychiatric group, 31 were correctly classified, 4 were classified as malingering and 5 were classified as non-malingering. The overall correct classification percentage was 65.</description>
		<url> Images/mlt0389.gif </url>
		<width>603</width>
		<height>226</height>
		<align></align>
		<caption>Classification results for original discriminant functions.</caption>
		<alt> Classification results for original discriminant functions </alt>
	</figure>
</P>
<P>
The overall correct classification rate was 65 percent. Psychiatric patients were most likely to be correctly classified (77.5%), while non-malingering patients were least likely to be correctly classified (50%). Depending upon the cost of each type of misclassification error, the statistician might want to adjust the classification cutoff points.
</P>
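<P>The percentages quoted above can be recomputed from the counts in the classification table. The following Python sketch uses the counts given in the figure description, with rows as actual groups and columns as predicted groups; the function names are hypothetical.</P>
<P><![CDATA[
```python
labels = ["malingering", "non-malingering", "psychiatric"]

# Rows are actual groups, columns are predicted groups, in the order
# given by labels. Counts are taken from the classification table.
counts = [
    [27, 8, 5],
    [11, 20, 9],
    [4, 5, 31],
]


def percent_correct_by_group(matrix):
    """Percent of each actual group that was classified correctly."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_percent_correct(matrix):
    """Percent of all cases on the diagonal of the table."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


by_group = percent_correct_by_group(counts)
for label, pct in zip(labels, by_group):
    print(label, pct)
print("overall", overall_percent_correct(counts))
```
]]></P>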
<P>Implementing standardized scores and standardized canonical discriminant function weights can be cumbersome. Most often the statistician would use the results of the discriminant function analysis to construct a new scale, generally ignoring item weights. For example, the first discriminant function has three items whose weights have a fairly high absolute value: items 147 (.716), 149 (.433), and 175 (.475). The items were first recoded to a value of 0=false and 1=true and then added together using the compute statement in the following illustration.
</P>
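<P>As a rough Python sketch of the recode-and-sum idea (not the SPSS Compute statement itself), the unweighted scale could be formed as follows. The variable names follow the data file columns; the function names and the example responses are hypothetical and are not taken from the study data.</P>
<P><![CDATA[
```python
# Items with fairly high weights on the first discriminant function.
SCALE_ITEMS = ("item147", "item149", "item175")


def recode(value):
    """Recode 1=false, 2=true to 0=false, 1=true."""
    if value not in (1, 2):
        raise ValueError("responses must be coded 1 (false) or 2 (true)")
    return value - 1


def scale_score(responses):
    """Unweighted sum of the recoded scale items for one subject."""
    return sum(recode(responses[item]) for item in SCALE_ITEMS)


# Made-up responses for a single subject, for illustration only.
example = {"item147": 2, "item149": 1, "item170": 2,
           "item175": 2, "item325": 1}
print(scale_score(example))
```
]]></P>
<P>Scores on such a scale range from 0 to 3, with each item contributing equally regardless of its discriminant function weight.</P>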
		<TestItem type="MC">
			<question>In the following SPSS output, a discriminant function analysis predicting program of study from movie preferences, discriminant function one best discriminated between</question>
				<figure>
					<description>A scatter plot of two discriminant functions predicting program of study from movie preferences. Below the scatter plot a table of standardized canonical discriminant function coefficients is presented. The values for discriminant function one are .231 for Sixth Sense, -.670 for Pinocchio, .790 for Nutty Professor, 1.043 for Thelma and Louise, and .062 for Little Big Man. Similar values for discriminant function two were .290, -.015, .153, .186, and 1.016.</description>
					<url>discrim.gif</url>
					<width>474</width>
					<height>516</height>
					<align></align>
					<caption>Discriminant functions for test items.</caption>
					<alt> Discriminant functions for test items.</alt>
				</figure>
			<answer type="correct">clinical and other.</answer>
			<answer>IO and other.</answer>
			<answer>IO and clinical.</answer>
			<answer>all three groups.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question>In the following SPSS output, a discriminant function analysis predicting program of study from movie preferences, discriminant function two best discriminated between</question>
				<figure>
					<description>A scatter plot of two discriminant functions predicting program of study from movie preferences. Below the scatter plot a table of standardized canonical discriminant function coefficients is presented. The values for discriminant function one are .231 for Sixth Sense, -.670 for Pinocchio, .790 for Nutty Professor, 1.043 for Thelma and Louise, and .062 for Little Big Man. Similar values for discriminant function two were .290, -.015, .153, .186, and 1.016.</description>
					<url>discrim.gif</url>
					<width>474</width>
					<height>516</height>
					<align></align>
					<caption>Discriminant functions for test items.</caption>
					<alt> Discriminant functions for test items.</alt>
				</figure>
			<answer>clinical and other.</answer>
			<answer>IO and other.</answer>
			<answer type="correct">IO and clinical.</answer>
			<answer>all three groups.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question>In the following SPSS output, a discriminant function analysis predicting program of study from movie preferences, discriminant function two discriminated worst between</question>
				<figure>
					<description>A scatter plot of two discriminant functions predicting program of study from movie preferences. Below the scatter plot a table of standardized canonical discriminant function coefficients is presented. The values for discriminant function one are .231 for Sixth Sense, -.670 for Pinocchio, .790 for Nutty Professor, 1.043 for Thelma and Louise, and .062 for Little Big Man. Similar values for discriminant function two were .290, -.015, .153, .186, and 1.016.</description>
					<url>discrim.gif</url>
					<width>474</width>
					<height>516</height>
					<align></align>
					<caption>Discriminant functions for test items.</caption>
					<alt> Discriminant functions for test items.</alt>
				</figure>
			<answer type="correct">clinical and other.</answer>
			<answer>IO and other.</answer>
			<answer>IO and clinical.</answer>
			<answer>all three groups.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question> In the following SPSS output, a discriminant function analysis predicting program of study from movie preferences, suppose a student had standard scores of 0.0 for Sixth Sense, -1.5 for Pinocchio, 1.3 for Nutty Professor, 1.1 for Thelma and Louise, and 0.0 for Little Big Man. This student would most likely be classified as belonging to which program?</question>
				<figure>
					<description>A scatter plot of two discriminant functions predicting program of study from movie preferences. Below the scatter plot a table of standardized canonical discriminant function coefficients is presented. The values for discriminant function one are .231 for Sixth Sense, -.670 for Pinocchio, .790 for Nutty Professor, 1.043 for Thelma and Louise, and .062 for Little Big Man. Similar values for discriminant function two were .290, -.015, .153, .186, and 1.016.</description>
					<url>discrim.gif</url>
					<width>474</width>
					<height>516</height>
					<align></align>
					<caption>Discriminant functions for test items.</caption>
					<alt> Discriminant functions for test items.</alt>
				</figure>
			<answer type="correct">clinical</answer>
			<answer>IO</answer>
			<answer>other</answer>
			<answer>no prediction is possible</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question> In the following SPSS output from a discriminant function analysis predicting program of study from movie preferences, suppose a student had standard scores of 0.0 for Sixth Sense, 1.0 for Pinocchio, 0.3 for Nutty Professor, -1.1 for Thelma and Louise, and 0.0 for Little Big Man. This student would most likely be classified as belonging to what program?</question>
				<figure>
					<description>A scatter plot of two discriminant functions predicting program of study from movie preferences. Below the scatter plot a table of standardized canonical discriminant function coefficients is presented. The values for discriminant function one are .231 for Sixth Sense, -.670 for Pinocchio, .790 for Nutty Professor, 1.043 for Thelma and Louise, and .062 for Little Big Man. Similar values for discriminant function two were .290, -.015, .153, .186, and 1.016.</description>
					<url>discrim.gif</url>
					<width>474</width>
					<height>516</height>
					<align></align>
					<caption>Discriminant functions for test items.</caption>
					<alt> Discriminant functions for test items.</alt>
				</figure>
			<answer>clinical</answer>
			<answer>IO</answer>
			<answer type="correct">other</answer>
			<answer>no prediction is possible</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question> In the following SPSS output from a discriminant function analysis predicting program of study from movie preferences, the group with the largest number of predicted members was</question>
				<figure>
					<description>The SPSS classification table output from the discriminant function analysis program is presented. Rows are original group membership, columns are predicted group membership. Of the 6 clinical students, five were correctly classified and one was classified as IO. Of the 10 IO students, 2 were classified as clinical, 1 as other, and 7 as IO. Of the 6 other students, 5 were correctly classified and one was classified as IO. A total correct classification rate of 77.3% was obtained.</description>
					<url>discrim1.gif</url>
					<width>499</width>
					<height>211</height>
					<align></align>
					<caption>Classification results for test item.</caption>
					<alt>Classification results for test item.</alt>
				</figure>
			<answer>more than one group.</answer>
			<answer type="correct">IO.</answer>
			<answer>clinical.</answer>
			<answer>other.</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question> In the following SPSS output from a discriminant function analysis predicting program of study from movie preferences, the group with the highest correct classification rate was </question>
				<figure>
					<description>The SPSS classification table output from the discriminant function analysis program is presented. Rows are original group membership, columns are predicted group membership. Of the 6 clinical students, five were correctly classified and one was classified as IO. Of the 10 IO students, 2 were classified as clinical, 1 as other, and 7 as IO. Of the 6 other students, 5 were correctly classified and one was classified as IO. A total correct classification rate of 77.3% was obtained.</description>
					<url>discrim1.gif</url>
					<width>499</width>
					<height>211</height>
					<align></align>
					<caption>Classification results for test item.</caption>
					<alt>Classification results for test item.</alt>
				</figure>
			<answer type="correct">more than one group.</answer>
			<answer>IO</answer>
			<answer>clinical</answer>
			<answer>other</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question> In the following SPSS output from a discriminant function analysis predicting program of study from movie preferences, the overall correct classification rate was </question>
				<figure>
					<description>The SPSS classification table output from the discriminant function analysis program is presented. Rows are original group membership, columns are predicted group membership. Of the 6 clinical students, five were correctly classified and one was classified as IO. Of the 10 IO students, 2 were classified as clinical, 1 as other, and 7 as IO. Of the 6 other students, 5 were correctly classified and one was classified as IO. A total correct classification rate of 77.3% was obtained.</description>
					<url>discrim1.gif</url>
					<width>499</width>
					<height>211</height>
					<align></align>
					<caption>Classification results for test item.</caption>
					<alt>Classification results for test item.</alt>
				</figure>
			<answer type="correct">77.3%</answer>
			<answer>83.3%</answer>
			<answer>70%</answer>
			<answer>100%</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
		<TestItem type="MC">
			<question> In the following SPSS output from a discriminant function analysis predicting program of study from movie preferences, suppose a student had standard scores of 0.0 for Sixth Sense, 0.0 for Pinocchio, 0.0 for Nutty Professor, 0.0 for Thelma and Louise, and -1.0 for Little Big Man. This student would most likely be classified as belonging to what program?</question>
				<figure>
					<description>A scatter plot of two discriminant functions predicting program of study from movie preferences. Below the scatter plot a table of standardized canonical discriminant function coefficients is presented. The values for discriminant function one are .231 for Sixth Sense, -.670 for Pinocchio, .790 for Nutty Professor, 1.043 for Thelma and Louise, and .062 for Little Big Man. Similar values for discriminant function two were .290, -.015, .153, .186, and 1.016.</description>
					<url>discrim.gif</url>
					<width>474</width>
					<height>516</height>
					<align></align>
					<caption>Discriminant functions for test items.</caption>
					<alt> Discriminant functions for test items.</alt>
				</figure>
			<answer>clinical</answer>
			<answer type="correct">IO</answer>
			<answer>other</answer>
			<answer>no prediction is possible</answer>
			<difficulty></difficulty>
			<discriminability></discriminability>
			<author>David Stockburger</author>
			<date>03/05/2001</date>
			<concept>Discriminant Function Analysis</concept>
		</TestItem>
<P>
	<figure>
		<description>A compute command interface illustrating the computation of a new variable in SPSS corresponding to the first discriminant function is shown. The computed variable will be called df1, for discriminant function one, and this is entered in the target variable box. The numeric expression adds the variables item147, item149, and item175 together.</description>
		<url> Images/mlt0380.gif </url>
		<width>508</width>
		<height>299</height>
		<align></align>
		<caption>Computing a new variable in SPSS corresponding to the first discriminant function.</caption>
		<alt> Computing a new variable in SPSS corresponding to the first discriminant function.</alt>
	</figure>
</P>
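<P>
The compute step illustrated above can also be sketched outside SPSS. The following Python fragment is a minimal sketch, assuming the three items are coded 0 for false and 1 for true; the patient records and the helper name compute_df1 are hypothetical and serve only to show the arithmetic.
</P>

```python
# Item names follow the compute dialog described above.
DF1_ITEMS = ("item147", "item149", "item175")

# Hypothetical responses for three patients; 1 = true, 0 = false.
patients = [
    {"item147": 1, "item149": 0, "item175": 1},
    {"item147": 0, "item149": 0, "item175": 0},
    {"item147": 1, "item149": 1, "item175": 1},
]


def compute_df1(responses):
    """Count how many of the three items were answered true."""
    return sum(responses[item] for item in DF1_ITEMS)


df1_scores = [compute_df1(p) for p in patients]
print(df1_scores)
```

<P>
Because each item contributes either 0 or 1, the sum is simply a count of true responses.
</P>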
<P>
The resulting value, called df1 for discriminant function one, is a number between zero and three corresponding to the number of times true was selected in these three items. Creating a summary classification table using the SPSS <SPSSCommand>Analyze/Descriptives/Crosstabs</SPSSCommand> command resulted in the following table.
</P>
<P>
	<figure>
		<description>A contingency table of group membership and computed discriminant function one is shown. The three groups, malingering, non-malingering, and psychiatric, appear as row headers and scores 0 through 3 as column headers. A red box is drawn around the scores in the columns with headers 1 and 2.</description>
		<url> Images/mlt0382.gif </url>
		<width>617</width>
		<height>171</height>
		<align></align>
		<caption> Crosstabs of group membership and computed discriminant function one for example data with three groups and five variables.</caption>
		<alt> Crosstabs of group membership and computed discriminant function one for example data with three groups and five variables.</alt>
	</figure>
</P>
<P>
It can be seen in the contingency table that the scores of 1 and 2 on the df1 variable could be combined into a prediction of non-malingering with a slight loss of information (the red box defines the new classification). Recoding the df1 variable into predicted group membership using discriminant function one resulted in the following summary classification table.
</P>
<P>
	<figure>
		<description>A contingency table of group membership and predicted group membership based on discriminant function one is shown. Of the 40 malingering subjects, 19 were classified as malingering and 21 as non-malingering. Of the 40 non-malingering subjects, 26 were correctly classified as non-malingering, 7 as malingering, and 7 as psychiatric. Of the 40 psychiatric patients, one was classified as malingering, 19 as non-malingering, and 20 as psychiatric. A correct classification ratio of 54.2% was found.</description>
		<url> Images/mlt0383.gif </url>
		<width>478</width>
		<height>126</height>
		<align></align>
		<caption> Crosstabs of group membership and predicted group membership based on discriminant function one for example data with three groups and five variables.</caption>
		<alt> Crosstabs of group membership and predicted group membership based on discriminant function one for example data with three groups and five variables </alt>
	</figure>
</P>
<P>An overall correct classification rate of 54.2% (65 of 120) was found. You can see that this classification system was biased in favor of a classification of non-malingering, with 66 out of 120 classified in this category. This bias resulted in 26 out of 40, or 65%, of the non-malingering group being correctly classified.
</P>
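<P>
The bias toward the non-malingering category can be checked directly from the counts in the classification table. The following Python fragment is a sketch only: the counts are copied from the table description above, a zero is assumed for the one cell the description does not mention, and the function names are hypothetical.
</P>

```python
GROUPS = ("malingering", "non-malingering", "psychiatric")

# Rows are actual groups and columns are predicted groups, both in GROUPS order.
# The psychiatric cell of the malingering row is assumed to be zero.
df1_table = [
    [19, 21, 0],
    [7, 26, 7],
    [1, 19, 20],
]


def predicted_totals(table):
    """Column sums: how many units were predicted into each group."""
    return [sum(row[j] for row in table) for j in range(len(table[0]))]


def group_hit_rates(table):
    """Diagonal count divided by the row total for each actual group."""
    return [row[i] / sum(row) for i, row in enumerate(table)]


totals = predicted_totals(df1_table)
rates = group_hit_rates(df1_table)
for name, total, rate in zip(GROUPS, totals, rates):
    print(f"{name}: predicted {total}, correctly classified {rate:.1%}")
```

<P>
The column totals show how many patients the rule places in each category, and the per-group rates show which group benefits from that imbalance.
</P>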
<P>A similar recoding and computing of the second discriminant function could be done, but the likelihood of success is small based on the relatively small eigenvalue for this function. Based on the sign and absolute values of the standardized canonical discriminant function coefficients, this variable (df2 for discriminant function two) will be constructed by subtracting items 145 and 147 and adding items 170, 175, and 325. Because these variables have been recoded to values of 0=false and 1=true, df2 can take on values between -2 and 3. In general, the higher the value of this variable, the more likely the patient belongs to the non-malingering group. The compute command for this variable is illustrated below.
</P>
<P>
	<figure>
		<description> A compute command interface illustrating the computation of a new variable in SPSS corresponding to the second discriminant function is shown. The computed variable will be called df2, for discriminant function two, and this is entered in the target variable box. The numeric expression subtracts the variables item147 and item149, and adds item170, item175, and item325.</description>
		<url> Images/mlt0381.gif </url>
		<width>506</width>
		<height>298</height>
		<align></align>
		<caption> Computing a new variable in SPSS corresponding to the second discriminant function.</caption>
		<alt> Computing a new variable in SPSS corresponding to the second discriminant function.</alt>
	</figure>
</P>
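<P>
The same kind of sketch applies to the second function. The Python fragment below assumes items coded 0 for false and 1 for true and uses the item list given in the text (items 145 and 147 subtracted; items 170, 175, and 325 added); that choice of items, the sample patients, and the helper name compute_df2 are assumptions made for illustration.
</P>

```python
# Item lists as named in the text; treat them as an assumption.
SUBTRACTED = ("item145", "item147")
ADDED = ("item170", "item175", "item325")


def compute_df2(responses):
    """Added items minus subtracted items, each coded 0 or 1."""
    plus = sum(responses[item] for item in ADDED)
    minus = sum(responses[item] for item in SUBTRACTED)
    return plus - minus


# Two hypothetical patients at opposite extremes of the scale.
high = {"item145": 0, "item147": 0, "item170": 1, "item175": 1, "item325": 1}
low = {"item145": 1, "item147": 1, "item170": 0, "item175": 0, "item325": 0}

print(compute_df2(high), compute_df2(low))
```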
<P>A contingency table of group membership and computed discriminant function two is shown below.
</P>
<P>
	<figure>
		<description> A contingency table of group membership and computed discriminant function two is shown. The three groups, malingering, non-malingering, and psychiatric appear as row headers and scores -2 through 2 as column headers.</description>
		<url> Images/mlt0384.gif </url>
		<width>689</width>
		<height>171</height>
		<align></align>
		<caption> Crosstabs of group membership and computed discriminant function two for example data with three groups and five variables.</caption>
		<alt> Crosstabs of group membership and computed discriminant function two for example data with three groups and five variables.</alt>
	</figure>
</P>
<P>Examination of this contingency table indicates that this variable might be useful in discriminating between the non-malingering group and the other two groups. A score of 1 was selected as the cutoff for inclusion in the non-malingering group. In other words, a patient with a score of -2, -1, or 0 would be placed in the malingering or psychiatric group, whereas a patient with a score of 1, 2, or 3 would be placed in the non-malingering group. The recode command that transforms the variable into predicted group membership is shown below.
</P>
<P>
	<figure>
		<description>The SPSS recode command interface is illustrated. A variable labeled df2, for discriminant function two, is being recoded into a variable labeled df2pred, for discriminant function two predicted. Scores of lowest through 0 will be recoded as 1 on df2pred and scores of 1 through highest will be recoded as 2 on df2pred.</description>
		<url> Images/mlt0385.gif </url>
		<width>522</width>
		<height>358</height>
		<align></align>
		<caption>Recoding discriminant function two scores into predicted group membership.</caption>
		<alt> Recoding discriminant function two scores into predicted group membership.</alt>
	</figure>
</P>
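<P>
The recode rule in the dialog above amounts to a single cutoff. A minimal Python sketch, assuming the same coding as the dialog (1 for scores of 0 or lower, 2 for scores of 1 or higher) and a hypothetical function name, might look like this:
</P>

```python
def recode_df2(score):
    """Return 2 (non-malingering) for scores of 1 or more, otherwise 1."""
    return 2 if score >= 1 else 1


# Apply the rule to every score from -2 through 3.
df2pred = {score: recode_df2(score) for score in range(-2, 4)}
print(df2pred)
```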
<P>
The resulting classification summary table is presented below.
</P>
<P>
	<figure>
		<description>A contingency table of group membership and predicted group membership based on discriminant function two is shown. Of the 40 malingering subjects, 26 were classified as malingering and 14 as non-malingering. Of the 40 non-malingering subjects, 28 were correctly classified as non-malingering and 12 as malingering. Of the 40 psychiatric patients, 25 were classified as malingering and 15 as non-malingering. A correct classification ratio of 67.5% was found for classifying the malingering and non-malingering groups.</description>
		<url> Images/mlt0386.gif </url>
		<width>405</width>
		<height>114</height>
		<align></align>
		<caption>Crosstabs of group membership and predicted group membership based on discriminant function two for example data with three groups and five variables.</caption>
		<alt> Crosstabs of group membership and predicted group membership based on discriminant function two for example data with three groups and five variables </alt>
	</figure>
</P>
<P>
Of the 40 malingering subjects, 26 were classified as malingering and 14 as non-malingering. Of the 40 non-malingering subjects, 28 were correctly classified as non-malingering and 12 as malingering. Of the 40 psychiatric patients, 25 were classified as malingering and 15 as non-malingering. A correct classification ratio of 67.5% was found for classifying the malingering and non-malingering groups. This discriminant function does not discriminate between the malingering and psychiatric groups, however, and is of limited use.
</P>
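<P>
The figures in this paragraph can be reproduced from the table counts. The Python fragment below is a sketch under the assumption that the correct classification ratio is computed over the malingering and non-malingering rows only, as the text states; the data structure and function names are hypothetical.
</P>

```python
# Actual group -> (predicted malingering, predicted non-malingering),
# with counts copied from the classification table described above.
df2_table = {
    "malingering": (26, 14),
    "non-malingering": (12, 28),
    "psychiatric": (25, 15),
}


def two_group_hit_rate(table):
    """Correct classifications over the malingering and non-malingering rows."""
    hits = table["malingering"][0] + table["non-malingering"][1]
    total = sum(table["malingering"]) + sum(table["non-malingering"])
    return hits / total


def share_predicted_malingering(table, group):
    """Proportion of one actual group that was predicted as malingering."""
    return table[group][0] / sum(table[group])


rate = two_group_hit_rate(df2_table)
print(f"two-group correct classification: {rate:.1%}")
for group in df2_table:
    share = share_predicted_malingering(df2_table, group)
    print(f"{group}: {share:.1%} predicted malingering")
```

<P>
The malingering and psychiatric rows yield nearly the same share of malingering predictions, which is the sense in which this function fails to separate those two groups.
</P>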
</section>
<section>
<P><h3>Summary</h3></P>
<P>
Discriminant function analysis is a useful statistical technique for classifying units, usually individuals in psychology, into known groups based on linear combinations of interval score variables. The procedure works by solving for weights which, when multiplied by the variables and summed, provide maximum discrimination between groups. The weighted combination of scores is called a linear discriminant function.
</P>
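<P>
The weighted combination can be illustrated with a few lines of Python. This is a sketch only: the weights and scores below are hypothetical values chosen for easy arithmetic, not output from any analysis, and the function name is an assumption.
</P>

```python
def discriminant_score(weights, scores):
    """Multiply each score by its weight and sum the products."""
    if len(weights) != len(scores):
        raise ValueError("weights and scores must have the same length")
    return sum(w * x for w, x in zip(weights, scores))


weights = [0.5, -0.25, 1.0]   # hypothetical discriminant weights
unit = [2.0, 4.0, 1.0]        # hypothetical scores for one unit

score = discriminant_score(weights, unit)
print(score)
```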
<P>
The linear discriminant functions can then be used to compute the probability of the data given each group, P(D/G), for each unit. A normal curve model, with an assumption of equal variances across groups, is usually used to compute these probabilities. In more complex analyses involving multiple groups and multiple independent variables (scores), more stringent assumptions involving multivariate normal distributions are usually made.
</P>
<P>
After the probability of the data given each group is estimated for each unit and the prior probabilities are set, Bayes' theorem may be used to estimate the probability of each group given the data, P(G/D), for each unit. The unit is predicted to belong to the group with the highest P(G/D). Goodness of fit of the model is usually assessed by examining the overall percentage of units correctly classified into groups.
</P>
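<P>
The last two steps can be sketched for the simplest case of one discriminant score and two groups. The Python fragment below assumes normal curves with a common standard deviation; the group labels, means, standard deviation, priors, and function names are all hypothetical and are not taken from the examples in this chapter.
</P>

```python
import math


def normal_density(x, mean, sd):
    """Height of the normal curve at x, used here as P(D/G)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probabilities(x, group_means, sd, priors):
    """Bayes' theorem: P(G/D) is P(D/G) times P(G), divided by the sum over groups."""
    joint = {g: normal_density(x, m, sd) * priors[g] for g, m in group_means.items()}
    evidence = sum(joint.values())
    return {g: value / evidence for g, value in joint.items()}


group_means = {"A": -1.0, "B": 1.0}   # hypothetical group means on the function
priors = {"A": 0.5, "B": 0.5}         # hypothetical equal prior probabilities

posterior = posterior_probabilities(0.5, group_means, 1.0, priors)
predicted = max(posterior, key=posterior.get)
print(posterior, predicted)
```

<P>
The unit is assigned to whichever group has the larger posterior probability; changing the priors shifts the point on the discriminant function at which the prediction changes.
</P>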
</section>
</chapter>


