Multivariate Statistics: Concepts, Models, and Applications
David W. Stockburger
The main purpose of a discriminant function analysis is to predict group membership based on a linear combination of the interval variables. The procedure begins with a set of observations where both group membership and the values of the interval variables are known. The end result of the procedure is a model that allows prediction of group membership when only the interval variables are known. A second purpose of discriminant function analysis is an understanding of the data set, as a careful examination of the prediction model that results from the procedure can give insight into the relationship between group membership and the variables used to predict group membership.
For example, a graduate admissions committee might divide a set of past graduate students into two groups: students who finished the program in five years or less and those who did not. Discriminant function analysis could be used to predict successful completion of the graduate program based on GRE score and undergraduate grade point average. Examination of the prediction model might provide insights into how each predictor individually and in combination predicted completion or non-completion of a graduate program.
Another example might predict whether patients recovered from a coma or not based on combinations of demographic and treatment variables. The predictor variables might include age, sex, general health, time between incident and arrival at hospital, various interventions, etc. In this case the creation of the prediction model would allow a medical practitioner to assess the chance of recovery based on observed variables. The prediction model might also give insight into how the variables interact in predicting recovery.
The simplest case of discriminant function analysis is the prediction of dichotomous group membership based on a single variable. An example of the simplest case is the prediction of successful completion of a graduate program based on the GRE verbal score. In this case, since the prediction model includes only a single variable, it gives little insight into how variables interact with each other in prediction. Thus prediction of group membership will be the major focus of the next section of this chapter.
With respect to the data file and purpose of analysis, this simplest case is identical to the case of linear regression with dichotomous dependent variables. As discussed previously, data of this type may be represented in any number of different forms: scatterplots, tables of means and standard deviations, and overlapping frequency polygons. Because overlapping frequency polygons have such an intuitive appeal, they will be used to describe how discriminant function analysis works.
A single interval variable might discriminate between groups in an almost perfect fashion, not at all, or somewhere in between. For example, if one wished to differentiate adult males and females, one could collect information on how many bras the person owned, score on the last statistics test, and height. In the case of the number of bras, the discrimination would be very good, but not perfect (some women don't own any bras, some men do). In the case of the score on the last statistics test, little discrimination would be possible because males and females generally score about the same. In the case of height, some discrimination between adult males and females would be possible, but it would be far from perfect.
In general, the larger the difference between the means of the two groups relative to the within groups variability, the better the discrimination between the groups. The following program allows the student to explore data sets with different degrees of discrimination ability.
Frequency Polygons and Means in Discriminant Analysis
The figure below shows the results of the program when the discrimination is set to low.
The next figure shows the results when the discrimination is set to high.
Note that the two frequency polygons overlap a great deal when there is little or no discriminability between groups and hardly at all when there is high discriminability. In the same vein, the means are fairly similar relative to their standard deviations in the low discriminability condition and different in the high discriminability condition.
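To put a number on this idea, the following minimal Python sketch expresses the difference between two group means in within-group standard deviation units. Assuming normal curves with a common sigma and equally likely groups, it also gives the proportion of cases that a cutoff halfway between the means would classify correctly. The function names and example values are illustrative assumptions and are not taken from the program described above.

```python
from statistics import NormalDist


def separation(mean_a, mean_b, sigma):
    """Difference between group means in within-group standard deviation units."""
    return abs(mean_a - mean_b) / sigma


def midpoint_accuracy(mean_a, mean_b, sigma):
    """Proportion classified correctly by a cutoff halfway between the means.

    Assumes both groups are normal with the same sigma and are equally likely.
    """
    return NormalDist().cdf(separation(mean_a, mean_b, sigma) / 2)


# Hypothetical low- and high-discrimination settings.
low = midpoint_accuracy(13, 14, 2)   # means half a sigma apart
high = midpoint_accuracy(13, 17, 2)  # means two sigmas apart
print(f"low: {low:.3f}, high: {high:.3f}")
```

With these values the midpoint cutoff classifies about 60% of cases correctly in the low setting and about 84% in the high setting.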
Discriminant function analysis is based on modeling the interval variable for each group with a normal curve. The mean of each group is used as an estimate of mu for that group. Sigma for each group can be estimated either from a weighted mean of the within-group variances or from the standard deviation of that group alone. In the case of the weighted mean, the variances are weighted by sample size, and the resulting estimate of sigma can be calculated either as the pooled standard deviation used in the denominator of a nested t-test or as the square root of the Mean Square Within Groups in an ANOVA; the two provide identical estimates. When using the weighted mean of the variances, one must assume that the generating function for each group produces numbers that in the long run have the same variability.
In the simple case of dichotomous groups and a single predictor variable, it really does not make a great deal of difference in the complexity of the model whether the variability of each group is assumed to be equal or not. This is not true, however, when more groups and more predictor variables are added to the model. For that reason, the assumption of equality of within group variance is almost universal in discriminant function analysis.
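The pooled estimate of sigma can be sketched in a few lines of Python. This sketch assumes the usual ANOVA convention, in which each group variance is weighted by its degrees of freedom (n - 1) so that the result equals the square root of Mean Square Within Groups. The function name and the scores are hypothetical.

```python
from math import sqrt
from statistics import variance


def pooled_sigma(*groups):
    """Square root of Mean Square Within: variances weighted by degrees of freedom."""
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    df_within = sum(len(g) - 1 for g in groups)
    return sqrt(ss_within / df_within)


# Hypothetical scores for two groups (not data from this chapter).
group_1 = [10, 12, 13, 14, 16]
group_2 = [14, 16, 17, 18, 20]
print(pooled_sigma(group_1, group_2))
```

Both hypothetical groups have a variance of 5, so the pooled sigma is the square root of 5, about 2.24.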
The following program allows the student to explore the relationship between different generating functions (poor, medium, or good discrimination; equal or unequal variances), sample size, and resulting model based on the sample. The student should verify that larger sample sizes provide resulting models that are more similar to the generating model. The student should also explore the effect of violations of the equality of variance assumptions on the resulting model.
Modeling of Frequency Polygons in Discriminant Analysis
The normal curve models of the predictor variable for each group can be used to provide probability estimates of a particular score given membership in a particular group. In discriminant function analysis, the probability of a score given a group is the area in the two tails of that group's normal curve model, beyond the pair of points that lie equally distant from mu on either side. Both points therefore have the same probability. This probability is symbolized as P(D/G) on SPSS output.
For example, suppose that the normal curve model for a given group has a value of 13 for mu and 2 for sigma. The probability of a score of 16 would be the area in the tails of this normal curve, below 10 and above 16. The value of 10 was selected as the low score because 16 is three units above the mean (16-13 = 3) and 10 is three units below the mean (13-3 = 10). This is all much easier visualized than stated.
This area could be found using the normal curve program by subtracting the area between scores of 10 and 16 from one ( 1 - .866 = .134) as follows:
A score of 10 would have the same probability as a score of 16 because it is an equal distance from mu.
In a similar manner, the probability of a score of 11 given membership in this group could be found as the area in the tails of a normal curve with mu=13 and sigma=2, outside the interval between scores of 11 and 15. This area would be (1 - .683) or .317.
If the second group had a mean of 17 and the same value for sigma (2), then the probability of a score of 16 would be one minus the area between 16 and 18 (1 - .383 = .617) and could be found and visualized as follows:
Computation of low and high scores for large numbers of points could be tedious and is best left to computers. A program to compute P(D/G) for two groups requires that the user enter the means for the two groups, the value for Mean Square Within Groups (from the ANOVA source table), and the score value. Using the preceding examples with group means of 13 and 17, a Mean Square Within of 4 (2²), and a score of 16 would generate the following results:
Discriminant Analysis Probabilities Program
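A minimal Python sketch of such a computation is given here. It is not the program referred to above; the function name `p_d_given_g` is an assumption made for illustration. It takes a score, a group mean, and Mean Square Within Groups, and returns the two-tailed area beyond the score's distance from the mean.

```python
from math import sqrt
from statistics import NormalDist


def p_d_given_g(score, mean, ms_within):
    """Two-tailed area beyond |score - mean| under a normal curve for the group."""
    sigma = sqrt(ms_within)
    distance = abs(score - mean)
    upper_tail = 1 - NormalDist(mean, sigma).cdf(mean + distance)
    return 2 * upper_tail


means = {"group 1": 13, "group 2": 17}
for name, mean in means.items():
    print(name, round(p_d_given_g(16, mean, ms_within=4), 3))
```

For a score of 16 this prints .134 for the group with a mean of 13 and .617 for the group with a mean of 17, matching the values worked out by hand.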
P(D/G) is interpreted as the likelihood of a particular score given membership in a group. In some cases involving extreme scores, the score will be unlikely under either group. In other cases involving scores that fall almost equidistant from either mean, the likelihood will be similar for both groups. Rather than simply observing predicted group membership, the student is advised to check probabilities of membership in all groups.
Prior probabilities are the likelihoods of belonging to a particular group when no information about the person is available. In the classical testing literature prior probabilities are called base rates. Prior probabilities influence our decisions about group membership. Prior probabilities will be symbolized as P(G). For example, P(G1) is the prior probability of a score belonging to group 1.
Consider the case of prediction of completion of graduate school using GRE verbal scores. Suppose that 99% of the students who started the graduate program successfully completed the program (it was a really easy program). Even if a student scored considerably lower than the mean of the successful group, program completion would most likely be predicted because almost everyone finishes. Likewise, in a graduate program where less than 10% of beginning students complete the program, the likelihood of completion will be fairly low no matter how high the GRE verbal score.
As discussed in the chapter on probabilities in the introductory text, Bayes Theorem provides a means to transform prior probabilities into posterior probabilities. In the case of discriminant function analysis, prior probabilities P(G) are transformed into the posterior probabilities of group membership given a particular score P(G/D) using information about the discriminating variables. The formula for computing P(G/D) using Bayes Theorem is as follows:
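Written in this chapter's notation, the standard form of Bayes Theorem is given below. The index j, which runs over all k groups, is introduced here for illustration.

```latex
P(G_i/D) = \frac{P(D/G_i)\,P(G_i)}{\sum_{j=1}^{k} P(D/G_j)\,P(G_j)}
```

For two groups the denominator is simply P(D/G1)P(G1) + P(D/G2)P(G2).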
For example, if the prior probability of membership in group 1 was .10, P(G1)=.10, and the probability of a score of X=16 given membership in group 1 was .134, P(D=16/G1)=.134, then the posterior probability, P(G1/D=16), would be .024. Computation of this result can be seen below.
In a similar manner, if the prior probability of membership in group 2 was .90, P(G2)=.90, and the probability of a score of X=16 given membership in group 2 was .617, P(D=16/G2)=.617, then the posterior probability, P(G2/D=16), would be .976. Computation of this result can be seen below.
These posterior probabilities can be compared with those obtained when the prior probabilities of group membership are equal, in order to understand the effect of setting different prior probabilities on the values of the posterior probabilities.
The sum of the posterior probabilities for all groups will necessarily be one. In the previous example with equal prior probabilities, P(G1/D=16) + P(G2/D=16) = .178 + .822 = 1.0. Predicted group membership involves comparing posterior probabilities and selecting the group that has the largest value. It is possible for a score to be unlikely to belong to either group, yet have a high posterior probability of belonging to one group or the other, as can be seen below.
Posterior probabilities are included in the program to compute probabilities in discriminant function analysis. Note that the default value for prior probabilities is .5 for each group. The SPSS/WIN discriminant function analysis program also defaults to equally likely priors and allows the user to optionally supply different prior probabilities for group membership.
The effect of setting different prior probabilities can be seen in the following examples.
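One way to reproduce this effect numerically is the short Python sketch below, which applies Bayes Theorem to the P(D/G) values from the running example (means of 13 and 17, sigma of 2, score of 16). It uses equal priors first and then priors of .10 and .90. The function name `posteriors` is an illustrative assumption.

```python
def posteriors(likelihoods, priors):
    """Apply Bayes Theorem: weight each P(D/G) by P(G) and rescale to sum to one."""
    weighted = [like * prior for like, prior in zip(likelihoods, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]


# P(D=16/G1) and P(D=16/G2) from the example with means 13 and 17, sigma 2.
likelihoods = [0.134, 0.617]

for priors in ([0.5, 0.5], [0.10, 0.90]):
    result = posteriors(likelihoods, priors)
    print(priors, [round(p, 3) for p in result])
```

Equal priors give posterior probabilities of .178 and .822. Priors of .10 and .90 move them to .024 and .976.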
The following animated illustration has been prepared to show the steps necessary to run a discriminant function analysis using SPSS/WIN. The data file includes a dichotomous dependent variable labeled "Y" and an interval independent variable labeled "X". Some options have been selected to provide additional output.
Given the following raw data.
A discriminant function analysis was done using SPSS/WIN. The output from the discriminant function analysis program of SPSS/WIN is not easy to read, nor is it particularly informative for the case of a single dichotomous dependent variable. One can only hope that future versions of the program will include improved output. The output has been cut up and rearranged. It is highly recommended that the student have a copy of the complete output from this analysis in addition to this text in order to locate the portion of the output that is being discussed.
Classification probabilities for each score must be optionally requested and are presented in the following table. SPSS/WIN does not provide P(D/G) for the 2nd highest group.
A rather crude frequency polygon is also provided if it is requested.
The unstandardized canonical discriminant function coefficients are the regression weights for prediction of a dichotomous dependent variable. The following compares this portion of the output of the discriminant function analysis program
with a similar portion of the output from the regression package where the Y variable has been recoded to values of 7 and -9.
The two sets of regression weights are multiples of each other. For example, the constant terms in the two equations are related by the constant value of -16.50/-4.55=3.63. Likewise, the slopes of the two equations are related by the same constant value of 1.375/.379=3.63. A proper rescaling of the Y variable would result in equivalent equations (Flury and Riedwyl, 1988).
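The proportionality can be checked on any data set. The Python sketch below uses hypothetical scores for groups of 9 and 7 cases, not the data analyzed in this chapter. It assumes the common scaling convention in which discriminant scores have a pooled within-group variance of one and a grand mean of zero. The sign of the discriminant coefficients is arbitrary, so the common ratio may be positive or negative.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical predictor scores for two groups of 9 and 7 cases.
x_a = [8, 9, 10, 10, 11, 12, 12, 13, 14]
x_b = [11, 12, 13, 13, 14, 15, 16]
x = x_a + x_b
# Recode Y so that it sums to zero: 7 for the group of 9, -9 for the group of 7.
y = [len(x_b)] * len(x_a) + [-len(x_a)] * len(x_b)

# Least-squares regression of the recoded Y on X.
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Discriminant function scaled to unit pooled within-group variance (assumed convention).
ms_within = (((len(x_a) - 1) * variance(x_a) + (len(x_b) - 1) * variance(x_b))
             / (len(x) - 2))
d_slope = 1 / sqrt(ms_within)
d_constant = -x_bar / sqrt(ms_within)

print(intercept / d_constant, slope / d_slope)  # the two ratios agree
```

Because the recoded Y has a mean of zero, both equations cross zero at the grand mean of X, which is why a single constant relates the two sets of weights.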
Since there is only one independent variable in the prediction equation, the canonical correlation coefficient is equal to the correlation coefficient (.49). Note also that the significance level of Wilks' Lambda (.054) is the same as the significance level for the t-test for the b1 value in the table above. The following is from the discriminant function analysis program.
The following is from the regression program.
A table of actual and predicted group membership may also be optionally requested.
It can be seen that the procedure is slightly more accurate (66.7%) in predicting group membership in group 0 than in group 2 (57.1%).
For reasons that I have not been able to fully comprehend, SPSS/WIN computes probability of group membership as the relative height of the normal curve at a given point (Tatsuoka, 1971). The following figure gives an example of the calculation of probabilities based on the height of the normal curve.
Discriminant Analysis Probabilities Program using SPSS/WIN computational method.
The following program will generate probabilities for discriminant function analysis based on the relative height of the normal curve at any point. This program should be used to match the probabilities generated using SPSS/WIN.
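The relative-height calculation can be sketched in Python as follows. This is a hedged illustration of the idea, not the program referred to above and not SPSS code. The function name `height_posteriors` and its arguments are assumptions. Each group's normal curve height at the score is weighted by its prior probability, and the results are rescaled to sum to one.

```python
from math import sqrt
from statistics import NormalDist


def height_posteriors(score, means, ms_within, priors=None):
    """Posterior probabilities from prior-weighted normal curve heights at the score."""
    sigma = sqrt(ms_within)
    if priors is None:
        priors = [1 / len(means)] * len(means)  # equally likely groups by default
    heights = [NormalDist(m, sigma).pdf(score) * p for m, p in zip(means, priors)]
    total = sum(heights)
    return [h / total for h in heights]


print([round(p, 3) for p in height_posteriors(16, [13, 17], ms_within=4)])
```

For the running example (means of 13 and 17, Mean Square Within of 4, score of 16, equal priors) this gives .269 and .731, which differ from the tail-area values of .178 and .822.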
The extension of discriminant function analysis to situations where there are three or more groups is straightforward. Probabilities of the observed score given group membership are computed in a manner similar to the case of dichotomous groups. The only difference is that there are more probabilities to compute. The following illustrates overlapping probability models for five groups.
In this case some groups could be discriminated more easily than others. For example, group 1 differs from the others, while groups 2 and 3 could be discriminated from groups 4 and 5. It would be difficult to discriminate membership in group 2 from group 3 using this single variable.
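A small Python sketch shows how the same calculation extends to several groups. The five group means, the common sigma, and the score are hypothetical values chosen to resemble the situation described, with groups 2 and 3 close together. Equal prior probabilities and the tail-area definition of P(D/G) are assumed.

```python
from statistics import NormalDist


def tail_area(score, mean, sigma):
    """P(D/G): area in both tails beyond the score's distance from the group mean."""
    return 2 * (1 - NormalDist(mean, sigma).cdf(mean + abs(score - mean)))


# Hypothetical means for five groups sharing one sigma; equal priors assumed.
means = [5, 12, 13, 19, 20]
sigma = 2
score = 12.4

likelihoods = [tail_area(score, m, sigma) for m in means]
posterior = [p / sum(likelihoods) for p in likelihoods]
predicted = posterior.index(max(posterior)) + 1  # groups are numbered from 1

for group, (like, post) in enumerate(zip(likelihoods, posterior), start=1):
    print(group, round(like, 3), round(post, 3))
print("predicted group:", predicted)
```

With these values group 2 is predicted, but its posterior probability is only slightly larger than that of group 3, while groups 1, 4, and 5 are effectively ruled out.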