Before an experiment is performed, the question of experimental design must be addressed. Experimental design refers to the manner in which the experiment will be set up, specifically the way the treatments will be administered to subjects. Treatments will be defined as quantitatively or qualitatively different levels of experience. For example, in an experiment on the effects of caffeine, the treatment levels might be exposure to different amounts of caffeine, from 0 to .0375 milligram. In a very simple experiment there are two levels of treatment: none, called the control condition, and some, called the experimental condition.
The type of analysis or hypothesis test used is dependent upon the type of experimental design employed. The two basic types of experimental designs are crossed and nested.
In a crossed design each subject sees each level of the treatment conditions. In a very simple experiment, such as one that studies the effects of caffeine on alertness, each subject would be exposed to both a caffeine condition and a no caffeine condition. For example, using the members of a statistics class as subjects, the experiment might be conducted as follows. On the first day of the experiment, the class is divided in half with one half of the class getting coffee with caffeine and the other half getting coffee without caffeine. A measure of alertness is taken for each individual, such as the number of yawns during the class period. On the second day the conditions are reversed; that is, the individuals who received coffee with caffeine on the first day are now given coffee without and vice-versa. The size of the effect will be the difference of alertness on the days with and without caffeine.
The distinguishing feature of crossed designs is that each individual will have more than one score. The effect occurs within each subject; thus these designs are sometimes referred to as WITHIN SUBJECTS designs. In SPSS the analysis is called a Paired-Samples T-Test.
Crossed designs have two advantages: they generally require fewer subjects, because each subject is used a number of times in the experiment, and they are more likely to result in a significant effect, given that the effects are real.
Crossed designs also have some disadvantages. For one, the experimenter must be concerned about carry-over effects. For example, individuals not used to caffeine may still feel the effects of caffeine on the second day, even though they did not receive the drug. Secondly, the first measurements taken may influence the second. For example, if the measurement of interest was a score on a statistics test, taking the test once may influence performance the second time the test is taken. Finally, the assumptions necessary when more than two treatment levels are employed in a crossed design may be restrictive.
In a nested design, each subject receives one, and only one, treatment condition. The critical difference in the simple experiment described earlier is that the experiment would be performed on a single day, with half the individuals receiving coffee with caffeine and half receiving coffee without caffeine. The size of effect in this case is determined by comparing average alertness between the two groups.
The major distinguishing feature of nested designs is that each subject has a single score. The effect, if any, occurs between groups of subjects and thus the name BETWEEN SUBJECTS is sometimes given to these designs. In SPSS the analysis of nested designs with two groups is called Independent-Samples T-Test.
The relative advantages and disadvantages of nested designs are the opposite of those of crossed designs. The advantage is that carry-over effects are not a problem, because individuals are measured only once; the disadvantage is that the number of subjects needed to discover effects is greater than with crossed designs.
Some treatments are nested by their nature. The effect of sex, for example, is necessarily nested. One is either a male or a female, but not both. Current religious preference is another example. Effects that rely on a pre-existing condition are sometimes called demographic or blocking factors and subjects will always be nested within these factors.
As discussed earlier, a crossed design occurs when each subject sees each treatment level, that is, when there is more than one score per subject. The purpose of the analysis is to determine if the effects of the treatment are real, or greater than expected by chance alone. Let's go through a sample experiment.
An experimenter is interested in the difference of finger-tapping speed by the right and left hands. She believes that if a difference is found, it will confirm a theory about hemispheric differences (left versus right) in the brain.
A sample of thirteen subjects (N=13) is taken from a population of adults. Six subjects tap for fifteen seconds with their right-hand ring finger, and seven subjects tap with their left-hand ring finger. After the number of taps has been recorded, the subjects tap again, but with the opposite hand. Thus each subject appears in each level of the treatment condition: tapping with both the right hand and the left hand.
After the data is collected, it is usually arranged in a table like the following:
Note that the two scores for each subject are written in the same column.
In analysis of crossed designs, first calculate the difference scores for each subject. These scores become the basic unit of analysis. Add a row to the example data table for the difference scores:
The difference scores will be symbolized by Di to differentiate them from raw scores, symbolized by Xi.
The next step is to enter the difference scores into the calculator and calculate the mean and standard deviation. For example, the mean and standard deviation of the difference scores in the preceding table (-2, 5, 7, and so on) are:
Mean = $\bar{D}$ = 4.69
Standard Deviation = $s_D$ = 7.146
The mean and standard deviation of the difference scores will be represented by $\bar{D}$ and $s_D$, respectively.
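The steps so far can be sketched in a few lines of Python. The tap counts below are made up for illustration and are not the chapter's data; the variable names are likewise assumptions. The sketch computes one difference score per subject and then the mean and sample standard deviation of those differences.

```python
from statistics import mean, stdev

# Hypothetical tap counts for five subjects (NOT the chapter's data).
right = [63, 68, 49, 51, 54]
left = [65, 63, 42, 45, 58]

# One difference score per subject: right hand minus left hand.
diffs = [r - l for r, l in zip(right, left)]

d_bar = mean(diffs)  # mean of the difference scores
s_d = stdev(diffs)   # sample standard deviation (divides by N - 1)

print(diffs)
print(round(d_bar, 2), round(s_d, 3))
```

Note that the subtraction order (right minus left) fixes the meaning of the sign of the mean difference later on.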
The Null Hypothesis states that there are no effects. That is, if the experiment were conducted with an infinite number of subjects, the average difference between the right and left hand would be zero ($\mu_D = 0$).
If the experiment using thirteen subjects were repeated an infinite number of times, assuming the Null Hypothesis was true, then a model could be created of the means of these experiments. This model would be the sampling distribution of the mean difference scores. The central limit theorem states that the sampling distribution of the mean has a mean equal to the mean of the population model, which in this case would equal 0.0, and a standard error, represented by $\sigma_{\bar{D}}$. The standard error of the mean difference scores could be computed by the following formula:
$$\sigma_{\bar{D}} = \frac{\sigma_D}{\sqrt{N}}$$
The only difficulty in this case is that the standard deviation of the population model ($\sigma_D$) is not known. It can, however, be estimated using the sample standard deviation, $s_D$. This estimation adds error to the procedure and requires the use of the t-distribution rather than the normal curve. The t-distribution will be discussed in greater detail later in this chapter.
The standard error of the mean difference score, $\sigma_{\bar{D}}$, is estimated by $s_{\bar{D}}$, which is calculated by the following formula:
$$s_{\bar{D}} = \frac{s_D}{\sqrt{N}}$$
Using this formula on the example data yields:
$$s_{\bar{D}} = \frac{7.146}{\sqrt{13}} = 1.982$$
Before the exact significance level can be found using the Probability Calculator, an additional parameter, called the degrees of freedom, must be calculated. Degrees of freedom, symbolized as df, are basically the number of values that are free to vary. In a sample of N scores, each score can take on a different value, so there are N degrees of freedom. If the mean of the scores is computed and used in a computation, as it is when finding the standard deviation, then one degree of freedom is lost because given that both the mean and N-1 of the scores are known, the Nth score can always be found. Since this is the situation in the case of a crossed t-test, the degrees of freedom are found using the following formula:
$$df = N - 1$$
In this case, it results in the following:
$$df = 13 - 1 = 12$$
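The claim that the Nth score can always be found once the mean and the other N-1 scores are known can be checked directly. The scores below are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical difference scores; pretend the last one has been lost.
scores = [-2, 5, 7, 6, -4]
n = len(scores)
mean_d = sum(scores) / n

known = scores[:-1]                  # the N - 1 scores that are free to vary
recovered = n * mean_d - sum(known)  # the Nth score is forced by the mean

df = n - 1
print(recovered, df)
```

Because the last score is fully determined by the rest, only N-1 of the values carry independent information.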
The exact significance level of this statistic, that is, the probability of obtaining a mean difference equal to or greater than the one we obtained given that chance alone was operating, may be found using the Probability Calculator, as shown in the following figure. First, select the Two-tailed Sig Level option under t-Distribution and click on the t Distribution button. Next, enter a value of 12 in the df (degrees of freedom) box. The value of mu for the distribution when the null hypothesis is true is zero, so enter a 0 for mu. Sigma is the estimated standard error of the mean difference scores, 1.982; enter that value in the sigma box. Enter the mean difference score, 4.69, in the Score box. Click the right-arrow button to generate the result.
Since the exact significance level equals .036 and is less than the value of alpha (.05), the model under the Null Hypothesis is rejected and the hypothesis of real effects is accepted. The mean difference is said to be statistically significant. In this case the conclusion would be made that the right hand taps faster than the left hand, because the mean difference is greater than zero and the number of taps using the left hand was subtracted from the number of taps using the right. Had the mean difference been negative, the conclusion would have been that the left hand tapped faster than the right.
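For readers without the Probability Calculator, the whole crossed t-test can be sketched in Python from the summary values given above (N=13, mean difference 4.69, standard deviation 7.146). The helper functions `t_pdf` and `t_upper_tail` are assumptions of this sketch: they integrate the t density numerically with Simpson's rule so that only the standard library is needed. A statistics package would normally supply the tail area directly.

```python
import math

def t_pdf(x, df):
    """Density of the standard t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area above t: 0.5 minus the Simpson's-rule integral from 0 to t."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n = 13        # number of subjects (from the text)
d_bar = 4.69  # mean difference score (from the text)
s_d = 7.146   # standard deviation of the difference scores (from the text)

se = s_d / math.sqrt(n)   # estimated standard error of the mean difference
df = n - 1                # degrees of freedom
t_obs = (d_bar - 0) / se  # null hypothesis: the mean difference is 0
p_two_tailed = 2 * t_upper_tail(abs(t_obs), df)

print(round(se, 3), df, round(t_obs, 2), round(p_two_tailed, 3))
```

The printed standard error and significance level should agree, within rounding, with the 1.982 and .036 found above.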
The data is entered into an SPSS data file with each subject as a row and the two variables for each subject as columns. In the example data, there would be thirteen rows and two columns, one each for the right and left hand data. The data file looks like the following figure:
What I call a crossed t-test, the SPSS package calls a paired-samples T Test. This statistical procedure is accessed by selecting Analyze/Compare Means/Paired-Samples T Test, as the following figure shows:
You must then tell SPSS which variables are to be compared by double-clicking on them. After selecting the two variables to be included in the analysis, click on the OK button. The example data analysis proceeds as follows:
The first table of SPSS output from this procedure, shown here, includes the means:
The second SPSS output table includes the t-test results:
Note that the results from using SPSS are within rounding error of the results computed by using the formulas, a hand-held calculator, and the Probability Calculator.
A nested t-test is the appropriate hypothesis test when there are two groups and different subjects are used in the treatment groups. Two examples of nested t-tests will be presented in this chapter, one detailing the logic underlying the analysis and one illustrating the manner in which the analysis will be performed in practice.
In a comparison of the finger-tapping speed of males and females the following data was collected:
The design is necessarily nested because each subject has only one score and appears in a single treatment condition. Through the marvels of modern medicine it might be possible to treat sex as a crossed design, but finding subjects might be somewhat difficult. The next step is to find the mean and variance of the two groups:
The critical statistic is the difference between the two means, $\bar{X}_1 - \bar{X}_2$. In the example, it is noted that females, on the average, tapped faster than males. A difference of 43.33 - 57.33 or -14.00 was observed between the two means. The question addressed by the hypothesis test is whether this difference is large enough to consider the effect to be due to a real difference between males and females, rather than a chance happening (the sample of females just happened to be faster than the sample of males).
The analysis proceeds in a manner similar to all hypothesis tests. An assumption is made that there is no difference in the tapping speed of males and females. The experiment is then carried out an infinite number of times, each time finding the difference between means, creating a model of the world when the Null Hypothesis is true. The difference between the means that was obtained from the experiment is compared to the difference that would be expected on the basis of the model of no effects. If the difference that was found is unlikely given the model, the model of no effects is rejected and the difference is said to be significant.
In the case of a nested design, the sampling distribution is that of the difference between means. It consists of an infinite number of differences between means, one from each hypothetical repetition of the experiment.
The sampling distribution of this statistic is characterized by the parameters $\mu_{\bar{X}_1 - \bar{X}_2}$ and $\sigma_{\bar{X}_1 - \bar{X}_2}$. In the case of $\mu_{\bar{X}_1 - \bar{X}_2}$, if the null hypothesis is true, then the mean of the sampling distribution would be equal to zero (0). The value of $\sigma_{\bar{X}_1 - \bar{X}_2}$ is not known, but may be estimated. In each of the following formulas, the assumption is made that the population variances, $\sigma^2$, of the two groups are similar.
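The idea of repeating the experiment "an infinite number of times" can be approximated by simulation. The sketch below assumes a hypothetical population (mean 50, standard deviation 10, normally distributed) shared by both groups, so the null hypothesis is true by construction; only the group sizes 6 and 9 are taken from the example. It draws many pairs of samples and summarizes the resulting differences between means.

```python
import math
import random

random.seed(1)          # fixed seed so the run is repeatable
MU, SIGMA = 50.0, 10.0  # hypothetical common population (null hypothesis true)
N1, N2 = 6, 9           # group sizes from the example
REPS = 10000            # stands in for "an infinite number of times"

diffs = []
for _ in range(REPS):
    g1 = [random.gauss(MU, SIGMA) for _ in range(N1)]
    g2 = [random.gauss(MU, SIGMA) for _ in range(N2)]
    diffs.append(sum(g1) / N1 - sum(g2) / N2)

sim_mean = sum(diffs) / REPS
sim_se = math.sqrt(sum((d - sim_mean) ** 2 for d in diffs) / (REPS - 1))

# Standard error when sigma is known and equal in both groups.
theory_se = SIGMA * math.sqrt(1 / N1 + 1 / N2)

print(round(sim_mean, 2), round(sim_se, 2), round(theory_se, 2))
```

The simulated mean should be close to zero and the simulated standard deviation close to the theoretical standard error, which is what the model under the null hypothesis predicts.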
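The computational formula for the estimate of the standard error of the difference between means is:
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$$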
The procedure for finding the value of this statistic using a statistical calculator with parentheses is:
If each group has the same number of scores, the preceding formula may be simplified. If $N_1 = N_2 = N$, then
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2 + s_2^2}{N}}$$
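As a check on the two versions of the formula, here is a minimal Python sketch, assuming the usual pooled-variance estimate of the standard error under the similar-variances assumption. The function names and the sample variances passed to them are hypothetical.

```python
import math

def pooled_se(n1, var1, n2, var2):
    """Estimated standard error of the difference between two means,
    pooling the two sample variances (assumes similar population variances)."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def equal_n_se(n, var1, var2):
    """Simplified form, valid only when both groups have n scores."""
    return math.sqrt((var1 + var2) / n)

# Hypothetical sample variances of 20 and 30.
same_n_long = pooled_se(8, 20.0, 8, 30.0)
same_n_short = equal_n_se(8, 20.0, 30.0)
unequal_n = pooled_se(6, 20.0, 9, 30.0)  # the long form is required here

print(same_n_long, same_n_short, round(unequal_n, 3))
```

With equal group sizes the two functions agree; with unequal sizes only the longer formula applies.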
Because the example problem has different numbers in each group, the longer formula must be used to find the standard error of the differences between means. Computation proceeds as follows:
The sampling distribution of the differences between means when in fact the null hypothesis is true may now be estimated as illustrated in the following:
The degrees of freedom for a nested t-test are N1 + N2 - 2. In this case there would be 6 + 9 - 2 = 13 degrees of freedom.
The probability (the exact significance level) of obtaining the observed difference between the means ($\bar{X}_1 - \bar{X}_2$ = 43.33 - 57.33 = -14.00) given that the preceding model is true is obtained by means of the Probability Calculator. Here are the steps:
Since the value of the exact significance level is less than alpha, the model would be rejected along with the null hypothesis. The alternative hypothesis, that there were real effects, would be accepted. Because the mean for females is larger than the mean for males, in this case the effect would be that females tapped faster than males.
To compute a nested t-test using SPSS, the data for each subject must first be entered with one variable identifying to which group that subject belongs, and a second with the actual score for that subject. In the example data, one variable identifies the gender of the subject and the other identifies the number of finger taps that subject did. A 1 indicates the male subject group and a 2 indicates the female subject group in the gender column of the example data file, as shown in the following figure:
The SPSS package does what I call a nested t-test with the Independent-Samples T Test procedure, which is accessed by selecting Analyze/Compare Means/Independent-Samples T Test, as the following figure shows.
Then you must describe both the dependent (Test) variable and the independent (Grouping) variable to the procedure. The levels of the grouping variable are further described by clicking on the Define Groups button. You can see example input to the procedure in the following figure:
Clicking Continue and then OK produces both a table of means, like this one:
and the results of the nested t-test, like this:
The SPSS output produces more detailed results than those presented earlier in the chapter. The Levene's Test for Equality of Variances columns present a test of the assumption that the theoretical variances of the two groups are equal. If this statistic is significant, that is, the value of Sig. is less than .05 (or whatever the value for alpha), then some statisticians argue that a different procedure must be used to compute and test for differences between means. The SPSS output gives results of statistical procedures both assuming equal and not equal variances. In the case of the example analysis, the test for equal variances was not significant, indicating that the first t-test would be appropriate.
Note that the procedures described earlier in this chapter produce results within rounding error of those assuming equal variances. When in doubt, you should probably opt for the more conservative (less likely to find results) t-test assuming unequal variances.
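To make the difference between the two lines of output concrete, the sketch below computes the t statistic and degrees of freedom both ways from summary statistics. It assumes that the unequal-variances line corresponds to Welch's t-test with the Welch-Satterthwaite degrees of freedom, which is the common choice; the function names and all numbers are hypothetical.

```python
import math

def pooled_t(m1, v1, n1, m2, v2, n2):
    """t statistic and df assuming equal population variances."""
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2

def welch_t(m1, v1, n1, m2, v2, n2):
    """t statistic and Welch-Satterthwaite df, not assuming equal variances."""
    a, b = v1 / n1, v2 / n2
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return (m1 - m2) / se, df

# Hypothetical means, variances, and group sizes.
t_eq, df_eq = pooled_t(20.0, 16.0, 6, 25.0, 36.0, 9)
t_uneq, df_uneq = welch_t(20.0, 16.0, 6, 25.0, 36.0, 9)

print(round(t_eq, 3), df_eq)
print(round(t_uneq, 3), round(df_uneq, 2))
```

The unequal-variances degrees of freedom never exceed N1 + N2 - 2 and are generally not a whole number.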
In an early morning class, one-half the students are given coffee with caffeine and one-half coffee without caffeine. The number of times each student yawns during the lecture is recorded with the following results:
Because each student participated in one, and only one, treatment condition, the experiment has subjects nested within treatments.
The degrees of freedom are N1 + N2 - 2; in this case, 8 + 8 - 2 = 14.
The null hypothesis is rejected because the value for the exact significance level is less than the value of alpha. In this case it is possible to say that caffeine had a real effect.
The t distribution is a theoretical probability distribution. It is symmetrical, bell-shaped, and similar to the normal curve. It differs from the normal curve, however, in that it has an additional parameter, called degrees of freedom, that changes its shape.
Degrees of freedom, usually symbolized by df, is a parameter of the t distribution which can be any real number greater than zero (0.0). Setting the value of df in addition to the values of mu and sigma defines a particular member of the family of t distributions. A member of the family of t distributions with a smaller df has more area in the tails of the distribution than one with a larger df.
The effect of df on the t distribution is illustrated in the three t distributions in the following figure.
Note that the smaller the df, the flatter the shape of the distribution, resulting in greater area in the tails of the distribution.
The astute reader will no doubt observe that the t distribution looks very similar to the normal curve. As the df increase, the t distribution approaches the normal distribution with similar values for mu and sigma. The normal curve is a special case of the t distribution when df=∞. For practical purposes, the t distribution approaches the normal distribution relatively quickly, such that when df=30 the two are almost identical.
The t-distribution rather than the normal curve is used to test hypotheses when the value of sigma must be estimated rather than being known. Estimating the value of sigma adds additional uncertainty or error to the hypothesis testing procedure. Statisticians attempt to correct for this error by making the hypothesis test more conservative, that is, less likely to find results, and this is accomplished by using the t-distribution rather than the normal curve.
For example, the exact significance level when mu=0, sigma=2, and the value=3.5 is .080 for the normal curve, .106 for the t-distribution with df=12, .141 for the t-distribution with df=5, and .330 for the t-distribution with df=1. Because the decision to reject the Null Hypothesis is made by comparing the exact significance level to alpha, the smaller the degrees of freedom, the less likely we are to reject the Null Hypothesis. Thus the hypothesis test becomes more conservative the smaller the degrees of freedom. Basically, the smaller the sample size, the smaller the degrees of freedom, the greater the likelihood of error in estimating the value of sigma, the greater the correction that is made by using the t-distribution rather than the normal curve.
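The comparison above can be reproduced with a short Python sketch, assuming the quoted figures are two-tailed significance levels. The normal curve comes from the standard library's `statistics.NormalDist`; the helpers `t_pdf` and `t_upper_tail` are assumptions of this sketch that integrate the t density numerically in place of a statistics package.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the standard t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area above t: 0.5 minus the Simpson's-rule integral from 0 to t."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

mu, sigma, value = 0.0, 2.0, 3.5  # values from the text
z = (value - mu) / sigma          # standardized score

results = {"normal": 2 * (1 - NormalDist().cdf(abs(z)))}
for df in (12, 5, 1):
    results[df] = 2 * t_upper_tail(abs(z), df)

for label, p in results.items():
    print(label, round(p, 3))
```

The significance level grows as the degrees of freedom shrink, which is the sense in which the test becomes more conservative.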
A choice of a one- or two-tailed t-test determines how the exact significance level is computed. The one-tailed t-test is performed if the results are interesting only if they turn out in a particular direction. The two-tailed t-test is performed if the results would be interesting in either direction. The choice of a one- or two-tailed t-test must be made in the design stage of the study and affects the hypothesis testing procedure in a number of different ways.
A two-tailed t-test computes the exact significance level by summing the areas in two tails under the t-distribution. If the score is greater than mu, the areas summed would be the area under the curve above the score and the area below a mirror image of the score under the same distribution. In the following example, df=5, mu=0, sigma=3.47, and the score=4.11. The exact significance level is computed by summing the areas above 4.11 and below -4.11.
If the score was less than mu, the areas summed would be the area below the score plus the area above a mirror image of the score. In the following example, df=5, mu=0, sigma=3.47, and the score=-2.39. The exact significance level is computed by summing the areas below -2.39 and above 2.39.
In either case, because of the symmetry of the t-distribution, the exact significance level will be twice the area either above or below the score, whichever is less.
When doing a two-tailed t-test, the null hypothesis is a particular value, and there are two alternative hypotheses, one positive and one negative. When the null hypothesis is rejected, one of the alternative hypotheses is accepted and the other rejected, depending upon the direction of the results.
For example, consider the earlier example study comparing the finger-tapping speeds of males and females. If a two-tailed nested t-test was done comparing the mean finger-tapping speeds of males and females, the null hypothesis would be that males and females tapped equally fast. One alternative hypothesis would be that males tapped faster than females and the other would be that females tapped faster than males. If the null hypothesis that males and females tapped equally fast was rejected based on the hypothesis test, then one of the alternative hypotheses would be accepted and the other rejected. If the mean for males was larger than the mean for females, the alternative hypothesis that males tapped faster than females would be accepted. If the opposite was the case, then the alternative hypothesis that females tapped faster than males would be accepted.
In a one-tailed t-test, the exact significance level is calculated by finding the area under the t-distribution either above or below the score. Two different one-tailed t-tests exist: one for area above and one for area below. The decision about whether to find the area above or below the score must be made before the study is conducted and corresponds with the direction of the score value if the results come out as expected.
In a one-tailed t-test in the positive direction the exact significance level is computed by finding the area above the score value. In the following example, df=5, mu=0, sigma=3.47, and the score=4.11. The exact significance level is computed by finding the area above 4.11.
In a second example, this time with a negative score, df=5, mu=0, sigma=3.47, and the score=-2.39. The exact significance level is computed by finding the area above -2.39. In this case, the results of the study were opposite the prediction and the exact significance level is very large at .7392.
In a one-tailed t-test in the negative direction the exact significance level is computed by finding the area below the score value. In the following example, df=5, mu=0, sigma=3.47, and the score=4.11. The exact significance level is computed by finding the area below 4.11. In this case, because the score value is opposite the prediction, the exact significance level is large at .8553.
In a second example, this time with a negative score, df=5, mu=0, sigma=3.47, and the score=-2.39. The exact significance level is computed by finding the area below -2.39. In this case, the results of the study were congruent with the prediction and the exact significance level is .2608.
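The one- and two-tailed areas in these examples can be sketched in Python as follows, using df=5, mu=0, and sigma=3.47 from the text. The function names are hypothetical, and `t_pdf` and `t_upper_tail` are helpers assumed for this sketch that integrate the t density numerically rather than calling a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of the standard t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area above t: 0.5 minus the Simpson's-rule integral from 0 to t."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

DF, MU, SIGMA = 5, 0.0, 3.47  # values from the chapter's examples

def area_above(score):
    """One-tailed significance level in the positive direction."""
    return t_upper_tail((score - MU) / SIGMA, DF)

def area_below(score):
    """One-tailed significance level in the negative direction."""
    return 1.0 - area_above(score)

def two_tailed(score):
    """Twice the smaller of the two one-tailed areas."""
    return 2.0 * min(area_above(score), area_below(score))

for score in (4.11, -2.39):
    print(score, round(area_above(score), 4),
          round(area_below(score), 4), round(two_tailed(score), 4))
```

A one-tailed level in the predicted direction is half the two-tailed level, while a result in the wrong direction gives a level above .5.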
The selection of a one- or two-tailed t-test must be made before the experiment is performed. It is not "cricket" to find that the score of -6.5 would only be significant with a one-tailed test in the negative direction, and then say "I really meant to do a one-tailed t-test." Because reviewers of articles submitted for publication are sometimes suspicious when a one-tailed t-test is done, the recommendation is that if there is any doubt, a two-tailed test should be done.
When only two levels or groups are used in an experiment or study, hypothesis testing about means can be done using a t-test. A different procedure will be used for the t test depending upon the type of experimental design employed.
When each subject sees each level of treatment, a crossed design results. In a crossed design the effects are found as mean difference scores. The difference scores between the two treatment levels are first calculated for each subject, followed by the mean and standard deviation of the difference scores. Following this, the standard error of the mean difference scores is estimated and the exact significance level found by comparing the mean difference score found in the study with a theoretical distribution of mean difference scores given there were no effects. The exact significance level is then compared with the value of alpha that was set before the study was conducted. If the exact significance level is less than alpha, the Null Hypothesis is rejected and the hypothesis of real effects is accepted. In the opposite case, where the exact significance level is greater than alpha, the null hypothesis is retained with no conclusion about the reality of effects.
When the experimental design has subjects appearing in one, and only one, of two treatment conditions, a nested t-test is the appropriate analysis. In a nested design the effects are found using the differences between the means. The results of the study are compared with a model of the distribution of differences between sample means under the null hypothesis, resulting in an exact significance level. As in all hypothesis tests, if the model is unlikely given the data, the model and the null hypothesis are rejected and the alternative accepted.
The t distribution is used to create the model under the null hypothesis rather than the normal distribution because error is added to the hypothesis testing procedure when the standard errors are estimated rather than known. The t distribution is very similar to the normal distribution, except that an additional parameter, called the degrees of freedom, is needed to describe a particular member of this family of distributions. The larger the degrees of freedom, the more similar the t distribution is to the normal distribution and the smaller the correction for error.
The statistician has the choice of using either a single tail or both tails when doing a t test. In a non-directional or two-tailed t test, alpha is divided in half and each half is placed in one tail of the distribution. The effects may be significant in either direction. In a directional t test, alpha is placed in a single tail of the distribution, and results are significant only if they fall in that tail. Results are more likely to be significant in a directional test, but because it is difficult to verify that the statistician decided before the study to use a one-tailed t test, the two-tailed t test is most often used.