INTRODUCTORY STATISTICS: CONCEPTS, MODELS, AND APPLICATIONS
WWW Version 1.0
First Published 7/15/96
Revised 8/5/97
Revised 2/19/98
David W. Stockburger
Southwest Missouri State University
Copyright © 1996 by David W. Stockburger
Table of Contents
INTRODUCTION
INSTRUCTOR
WHO SHOULD TAKE THIS COURSE
FEELINGS ABOUT THE COURSE
A BRIEF HISTORY OF THE TEACHING OF STATISTICS
TEACHING STATISTICS
REQUIREMENTS OF THE COURSE
GRADE DETERMINATION
OBJECTIVES OF THE COURSE
COURSE OUTLINE
ICONS
PREFACE
A MAYORAL FANTASY
A FULL-BLOWN FIASCO
ORGANIZING AND DESCRIBING THE DATA - AN ALTERNATIVE ENDING
SAMPLING - AN ALTERNATIVE APPROACH AND A HAPPY ENDING
DOES CAFFEINE MAKE PEOPLE MORE ALERT?
MODELS
DEFINITION OF A MODEL
CHARACTERISTICS OF MODELS
THE LANGUAGE OF MODELS
MODEL-BUILDING IN SCIENCE
ADEQUACY AND GOODNESS OF MODELS
MATHEMATICAL MODELS
Building a Better Boat - Example of Model-Building
THE LANGUAGE OF ALGEBRA
THE SYMBOL SET OF ALGEBRA
NUMBERS
VARIABLES
OPERATORS
DELIMITERS
SYNTAX OF THE LANGUAGE OF ALGEBRA
Creating Sentences
Eliminating Parentheses
TRANSFORMATIONS
Numbers
Fractions
Exponential Notation
Binomial Expansion
Multiplication of Phrases
Factoring
SEQUENTIAL APPLICATION OF REWRITING RULES TO SIMPLIFY AN EXPRESSION
EVALUATING ALGEBRAIC SENTENCES
A NOTE TO THE STUDENT
MEASUREMENT
DEFINITION
PROPERTIES OF MEASUREMENT SYSTEMS
Magnitude
Intervals
Rational Zero
SCALE TYPES
Nominal Scales
Ordinal Scales
Interval Scales
Ratio Scales
EXERCISES IN CLASSIFYING SCALE TYPES
THE MYTH OF SCALE TYPES
The Ruler and Scale Types
TOWARD A RECONCEPTUALIZATION OF SCALE TYPES
FREQUENCY DISTRIBUTIONS
FREQUENCY TABLES
FREQUENCY DISTRIBUTIONS
Histograms
Absolute Frequency Polygons
Relative Frequency Polygon
Absolute Cumulative Frequency Polygons
Relative Cumulative Polygon
COMPARING FREQUENCY DISTRIBUTIONS
OVERLAPPING FREQUENCY TABLES
OVERLAPPING FREQUENCY POLYGONS
CONTINGENCY TABLES
CONCLUSION
GROUPED FREQUENCY DISTRIBUTIONS
SELECTING THE INTERVAL SIZE
COMPUTING THE FREQUENCY TABLE
SELECTING ANOTHER INTERVAL SIZE
SELECTING THE APPROPRIATE INTERVAL SIZE
MODELS OF DISTRIBUTIONS
1.) A belief in the eminent simplicity of the world
2.) Not all the data can be collected
VARIATIONS OF PROBABILITY MODELS
The Uniform or Rectangular Distribution
The Negative Exponential Distribution
The Triangular Distribution
The Normal Distribution or Normal Curve
PROPERTIES OF PROBABILITY DISTRIBUTIONS
1) Parameters
2) The Area Underneath the Curve Is Equal to One (1.0)
3) The Area Under the Curve Between Any Two Scores Is a PROBABILITY
THE NORMAL CURVE
A FAMILY OF DISTRIBUTIONS
Similarity of Members of the Family of Normal Curves
AREA UNDER A CURVE
DRAWING A MEMBER OF THE FAMILY OF NORMAL CURVES
DIFFERENCES IN MEMBERS OF THE FAMILY OF NORMAL CURVES
FINDING AREA UNDER NORMAL CURVES
FINDING SCORES FROM AREA
THE STANDARD NORMAL CURVE
CONCLUSION
THE SUMMATION SIGN
SUBSCRIPTED VARIABLES
THE SUMMATION SIGN
SUMMATION OF ALGEBRAIC EXPRESSIONS
The General Rule
Exceptions to the General Rule
Solving Algebraic Expressions with Summation Notation
STATISTICS
MEASURES OF CENTRAL TENDENCY
The Mode
The Median
The Mean
Kiwi Bird Problem
Skewed Distributions and Measures of Central Tendency
MEASURES OF VARIABILITY
The Range
The Variance and the Standard Deviation
Calculating Statistics with a Statistical Calculator
Calculating Statistics Using SPSS
INTERPRETING A MEAN AND STANDARD DEVIATION
SUMMARY
SCORE TRANSFORMATIONS
Why Do We Need to Transform Scores?
PERCENTILE RANKS BASED ON THE SAMPLE
Computing Percentile Ranks Based on the Sample with SPSS
PERCENTILE RANKS BASED ON THE NORMAL CURVE
Computing Percentile Ranks Based on the Normal Curve with SPSS
Comparing the Two Methods of Computing Percentile Ranks
An Unfortunate Property
LINEAR TRANSFORMATIONS
THE ADDITIVE COMPONENT
THE MULTIPLICATIVE COMPONENT
LINEAR TRANSFORMATIONS - EFFECT ON MEAN AND STANDARD DEVIATION
LINEAR TRANSFORMATIONS - FINDING a AND b GIVEN \bar{X}' AND s_{X'}
STANDARD SCORES OR Z-SCORES
FINDING LINEAR TRANSFORMATIONS USING SPSS
CONCLUSION AND SUMMARY
REGRESSION MODELS
EXAMPLE USES OF REGRESSION MODELS
Selecting Colleges
Pregnancy
Selection and Placement During the World Wars
Manufacturing Widgets
PROCEDURE FOR CONSTRUCTION OF A REGRESSION MODEL
THE LEAST-SQUARES CRITERIA FOR GOODNESS-OF-FIT
THE REGRESSION MODEL
SOLVING FOR PARAMETER VALUES WHICH SATISFY THE LEAST-SQUARES CRITERION
USING STATISTICAL CALCULATORS TO SOLVE FOR REGRESSION PARAMETERS
DEMONSTRATION OF "OPTIMAL" PARAMETER ESTIMATES
SCATTER PLOTS AND THE REGRESSION LINE
THE STANDARD ERROR OF ESTIMATE
CONDITIONAL DISTRIBUTIONS
INTERVAL ESTIMATES
REGRESSION ANALYSIS USING SPSS
CONCLUSION
CORRELATION
DEFINITION
UNDERSTANDING AND INTERPRETING THE CORRELATION COEFFICIENT
Scatterplots
Slope of the Regression Line of z-scores
Variance Interpretation
CALCULATION OF THE CORRELATION COEFFICIENT
The Correlation Matrix
CAUTIONS ABOUT INTERPRETING CORRELATION COEFFICIENTS
Appropriate Data Type
Effect of Outliers
CORRELATION AND CAUSATION
SUMMARY AND CONCLUSION
HYPOTHESIS TESTING
DEFINITION
Rational Decisions
Effects
GENERAL PRINCIPLES
The Model
Probability
THE SAMPLING DISTRIBUTION
DEFINITION
THE SAMPLE DISTRIBUTION
PROBABILITY MODELS - POPULATION DISTRIBUTIONS
THE SAMPLING DISTRIBUTION
THE SAMPLING DISTRIBUTION OF THE MEAN
MICROCOMPUTER SIMULATION OF SAMPLING DISTRIBUTIONS
SUMMARY
TESTING HYPOTHESES ABOUT SINGLE MEANS
THE HEADSTART EXPERIMENT
POPULATION DISTRIBUTION ASSUMING NO EFFECTS
SAMPLING DISTRIBUTION ASSUMING NO EFFECTS AND N = 64
RESULTS OF THE EXPERIMENT
HEADSTART EXPERIMENT REDONE
POPULATION DISTRIBUTION ASSUMING NO EFFECTS
SAMPLING DISTRIBUTION ASSUMING NO EFFECTS AND N = 400
RESULTS OF THE EXPERIMENT
IN SUMMARY
EXPERIMENTAL DESIGNS
CROSSED DESIGNS
NESTED DESIGNS
ANALYSIS OF CROSSED DESIGNS
CROSSED DESIGNS
EXAMPLE DESIGN
RAW SCORES
STEP ONE - FIND THE DIFFERENCE SCORES
STEP TWO - FIND THE MEAN AND STANDARD DEVIATION OF D_{i}
THE MODEL UNDER THE NULL HYPOTHESIS
STEP THREE - ESTIMATE THE STANDARD ERROR
STEP FOUR - CALCULATE t_{OBS}
STEP FIVE - FIND t_{CRIT}
USING SPSS TO COMPUTE A CROSSED t-TEST
CONCLUSION
NESTED t-TESTS
ANALYSIS AND LOGIC UNDERLYING NESTED DESIGNS
AN EXAMPLE WITHOUT EXPLANATIONS
CONCLUSION
THE t DISTRIBUTION
DEFINITION
DEGREES OF FREEDOM
RELATIONSHIP TO THE NORMAL CURVE
ONE AND TWOTAILED tTESTS.................................................................................................................................. 201
TWOTAILED tTESTS................................................................................................................................................ 201
ONETAILED tTESTS................................................................................................................................................. 201
Comparison of One and
Twotailed ttests.................................................................................................................... 202
ERRORS IN HYPOTHESIS TESTING............................................................................................................................ 203
A SECOND CHANCE................................................................................................................................................... 207
THE ANALYSIS
GENERALIZED TO ALL EXPERIMENTS................................................................................... 208
CONCLUSION............................................................................................................................................................... 209
ANOVA............................................................................................................................................................................... 210
Why Multiple
Comparisons Using ttests is NOT the Analysis of Choice.................................................................. 210
The Bottom Line 
Results and Interpretation of ANOVA............................................................................................ 211
HYPOTHESIS TESTING
THEORY UNDERLYING ANOVA.................................................................................. 213
The Sampling
Distribution Reviewed........................................................................................................................ 213
Two Ways of Estimating
the Population Parameter ............................................................................................... 214
THE WITHIN METHOD........................................................................................................................................ 215
THE BETWEEN METHOD.................................................................................................................................... 215
The Fratio and
Fdistribution................................................................................................................................... 216
Nonsignificant and
Significant Fratios..................................................................................................................... 219
Similarity of ANOVA and
ttest..................................................................................................................................... 221
Computing the ttest.................................................................................................................................................. 221
Computing the ANOVA............................................................................................................................................ 222
Comparing the results................................................................................................................................................ 222
EXAMPLE OF A NONSIGNIFICANT
ONEWAY ANOVA.................................................................................. 222
EXAMPLE OF A
SIGNIFICANT ONEWAY ANOVA............................................................................................. 223
CHISQUARE AND TESTS OF CONTINGENCY TABLES........................................................................................ 225
REVIEW OF CONTINGENCY
TABLES..................................................................................................................... 225
HYPOTHESIS TESTING WITH
CONTINGENCY TABLES.................................................................................... 227
COMPUTATION OF THE
CHISQUARED STATISTIC......................................................................................... 227
THE THEORETICAL
DISTRIBUTION OF CHISQUARED WHEN NO EFFECTS ARE PRESENT.................. 231
COMPARING THE OBSERVED
AND EXPECTED VALUE OF CHISQUARE................................................... 232
SUMMARY................................................................................................................................................................... 233
TESTING A SINGLE CORRELATION COEFFICIENT.............................................................................................. 234
THE HYPOTHESIS AND
NATURE OF THE EFFECT............................................................................................. 234
THE MODEL OF NO EFFECTS.................................................................................................................................. 235
THE SAMPLING
DISTRIBUTION OF A CORRELATION COEFFICIENT..................................................... 235
COMPARING THE OBTAINED
STATISTIC WITH THE MODEL....................................................................... 236
Bibliography..................................................................................................................................................................... 237
WHO SHOULD TAKE THIS COURSE
This course is designed for individuals who want to learn about some very important tools the behavioral sciences use to understand the world. Some degree of mathematical sophistication is necessary to understand the material in this course. The prerequisite for Psy 200 is a first course in algebra, preferably at the college level. Students have succeeded with less, but doing so requires a greater than average amount of time and effort. If there is any doubt about whether you should be in the course, it is recommended that you attempt the chapter on the Review of Algebra. If you master that chapter, you will most likely also master the material in the rest of the course.
FEELINGS ABOUT THE COURSE
It is not unusual to hear a student describe their past experience with a mathematics course with something like the following: "I had an algebra class 10 years ago, I hated it, I got a 'D' in it, I put this course off until the last possible semester in my college career...". Many students initially greet statistics with mixed feelings of fear and anger: fear because of the reputation of mathematics courses, and anger because the course is required for a major or minor, so the student has had no choice in taking it. It is my feeling that these emotions inhibit the learning of the material. In my experience, before any real learning can take place the student must relax, have some success with the material, and accept the course. It is the responsibility of the instructor to deal with these negative emotions. If this is not done, the course might just as well be taught by a combination of books and computers.
Another difficulty sometimes encountered by the instructor of a statistics course is the student who has done very well in almost all other courses and wants to do very well in statistics too. In some cases this is the first time the material has not come easily to the student, and the student does not understand everything the instructor says. Panic sets in, tears flow, or perhaps the student is simply never seen in a statistics classroom again.
The student must be willing to accept the fact that a complete understanding of statistics may not be obtained in a single introductory course. Additional study, perhaps graduate work, is necessary to more fully grasp the area of statistics. This does not mean, however, that the student has achieved nothing of value in the course. Statistics may be understood at many different levels; the purpose of this course is to introduce the material in such a way that the student achieves the first level.
A BRIEF HISTORY OF THE TEACHING OF STATISTICS
Emphasis in the teaching of statistics in the behavioral sciences has shifted in several different directions during the past several decades. The first phase, during the 1950s and perhaps the early 1960s, saw a focus on computation. During this time, large electromechanical calculators were available in many statistics laboratories. These calculators usually had ten rows and ten columns of number keys, plus addition, subtraction, multiplication, and division keys. If one was willing to pay enough money, one could get two accumulators on the top: one for the sum and one for the sum of squares. They weighed between 50 and 100 pounds, filled a desktop, made a lot of noise, and cost over $1000. Needless to say, not very many students carried one around in their backpack.
Because the focus was on computation, the writers of introductory statistics textbooks made much effort to reduce the work needed to perform statistical computations on these behemoths. This was the period during which computational formulas were developed. These are formulas that simplify computation but give little insight into the meaning of the statistic. In contrast, definitional formulas better describe the meaning of the statistic but are often a nightmare when doing large-scale computation. To make a long story short, students during this phase of the teaching of introductory statistics ended up knowing how to do the computations, but had little insight into what they were doing, why they did it, or when to use it.
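To make the contrast concrete, here is a minimal Python sketch that computes the sum of squared deviations (SS) both ways. Using SS as the example statistic, and the particular scores, are illustrative assumptions; the passage does not name a specific formula. The definitional version follows the meaning (how far each score lies from the mean), while the computational shortcut, SS = ΣX² - (ΣX)²/N, needs only two running totals: the sum and the sum of squares, the same quantities the two accumulators described above kept.

```python
def ss_definitional(scores):
    """Sum of squared deviations, computed from its definition: sum of (X - mean)^2."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def ss_computational(scores):
    """Same quantity via the shortcut: sum(X^2) - (sum X)^2 / N."""
    n = len(scores)
    total = sum(scores)                      # first accumulator: sum of X
    total_sq = sum(x * x for x in scores)    # second accumulator: sum of X^2
    return total_sq - total * total / n


scores = [3, 7, 5, 9, 6]   # illustrative scores
print(ss_definitional(scores))
print(ss_computational(scores))
```

Both calls should give the same value (20.0 for these scores); the shortcut simply trades insight into what SS measures for fewer operations per score.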
The next phase was a reaction to the first. Rather than computation, the emphasis was on meaning. This was also the time period of the "new math" in grade and high schools, when a strong mathematical treatment of the material was attempted. Unfortunately, many of the students in the behavioral sciences were unprepared for such an approach and ended up knowing neither the theory nor the computational procedure.
Calculators available during this time were electronic, with some statistical functions available. They were still very expensive, over $1000, and would generally not fit in a briefcase. In most cases, the statistics texts still retained the old computational formulas.
The current trend is to make statistics as simple for the student as possible. An attitude of "I can make it easier, or more humorous, or flashier than anyone else has in the past" seems to exist among the authors of many introductory statistics textbooks. In some cases this has resulted in the student sitting down for dinner and being served a hot fudge sundae. The goal is admirable, and in some cases achieved, but the fact remains that the material, and the logic underlying it, is difficult for most students.
TEACHING STATISTICS
My philosophy is that the statistical calculator and statistical computer packages have eliminated the need for computational formulas; thus they have been eliminated from this text. Definitional formulas have been retained and the student is asked to compute the statistic once or twice "by hand." Following that, all computation is done using the statistical features on the calculator.
This is analogous to the square root function on a calculator. How often does anyone use the complex algorithm learned in high school to find a square root? Seldom or never. The present author argues that the square root procedure should be eliminated from the mathematics classroom. It gives little or no insight into what a square root is or how it should be used. Since it takes only a few minutes to teach a student how to find a square root on a calculator, it is much better to spend the remaining classroom time discussing the meaning and the usefulness of the square root.
In addition, an attempt has been made to tie together the various aspects of statistics into a theoretical whole by closely examining the scientific method and its relationship to statistics. In particular, this is achieved by introducing the concept of models early in the course, and by demonstrating throughout the course how the various topics are all aspects of the same approach to knowing about the world.
REQUIREMENTS OF THE COURSE
· CALCULATOR - Obtaining or purchasing a statistical calculator is required to complete the assignments. A number of calculators that will perform the necessary functions are available, and many cost less than twenty dollars. The student is responsible for learning how the calculator functions, so a manual is absolutely necessary. The calculator must be able to do bivariate statistical analysis. If the index of the manual has an entry for linear regression, regression, or correlation, then the calculator will do the functions necessary to complete the assignments. A few old calculators are available for students to borrow during the lectures and laboratories; they must remain in the classroom, however.
· HOMEWORK - Most semesters will include 18 homework assignments. The individualized assignments will have two due dates: an unofficial one, by which the student is expected to have attempted the problems and to bring any questions to class; and an official one, when the assignments will be turned in. Assignments may be turned in up to one week late, with a penalty. The student may obtain assistance in order to do the homework. Laboratory assistants are available during the statistics laboratory hours, and any student who experiences difficulty during the course is encouraged to take advantage of them. Small groups working together have also proved valuable in doing homework. The homework assignments may be generated by selecting the appropriate assignment on the Web Home Page, finding and selecting your name in the name box, clicking on the "Generate" button, and printing the resulting text. Note that Internet Explorer 3.0 or later is necessary to generate the homework assignments, and the mouse must be clicked somewhere outside the "Generate" button before the assignments will print. The three lowest homework grades will be dropped automatically in computing the student's grade, so 15 assignments count toward the final total. Since each is worth a maximum of eight points, the homework assignments provide a total of 120 points. The student is warned that missing more than three assignments may be very detrimental to the final course grade.
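The drop-the-lowest-three rule can be sketched in a few lines of Python. This is a hypothetical helper, not part of any course software; it assumes a missing assignment is recorded as a score of zero, and the scores below are invented.

```python
def homework_total(scores, drop=3):
    """Sum homework scores after discarding the `drop` lowest ones."""
    kept = sorted(scores)[drop:]   # ascending sort, then skip the lowest `drop`
    return sum(kept)


# 18 hypothetical assignments, each worth up to 8 points; 0 = missing (assumption)
scores = [8, 7, 8, 6, 0, 8, 7, 5, 8, 8, 6, 7, 0, 8, 7, 8, 4, 8]
print(homework_total(scores))        # lowest three (0, 0, 4) are dropped
print(homework_total([8] * 18))      # a perfect record: 15 x 8 points
```

The first call drops the two missing assignments and the 4, leaving 109 of the possible 120 points; a fourth missing assignment would start costing a full eight points.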
· TESTS - There will be three tests: two during the semester and a final given during the regularly scheduled finals period. Each test will take approximately one and one-half to two hours to complete and will consist of both a problem section and a multiple-choice section. The problem part of the test will be individualized like the homework. Cheating is discouraged and, if discovered, will result in the loss of all points for the given assignment or test. Examples of the problem part of the tests, with answer keys, may be generated in the same manner as the homework assignments.
· PROJECT - Three of the eighteen homework assignments will be allotted to an individual project. An individual project is required to earn an "A" in the course. Otherwise, the project is optional; not doing it will not affect the student's grade in either a positive or negative direction, other than counting as three missing homework assignments. The project will be due the last day of regularly scheduled classes at 5:00 PM. A fifteen-minute conference with the instructor will be scheduled approximately three to six weeks before the project is due to discuss a proposal for the project. Example projects can be requested from the instructor.
GRADE DETERMINATION
Grades will be assigned on a point system based on total points at the end of the semester. The cutoffs for A, B, C, and D are 90%, 80%, 70%, and 60%, respectively, of the highest total score in the class; totals below the D cutoff earn an F. Some adjustment may be made at the discretion of the instructor. The different components of the course will be weighted with the following values:
· 25% Homework Assignments
· 25% First Test
· 25% Second Test
· 25% Final
In past semesters, a total of 310 to 315 points has been necessary to earn an "A" in the course, 270 to 275 to earn a "B", and so on. The total number of points varies slightly from semester to semester because different numbers of multiple-choice questions have been included on the tests.
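A minimal sketch of the cutoff rule, under two assumptions the text does not spell out: a total at or above a cutoff earns that grade, and no discretionary adjustment is applied. The highest score of 345 points is hypothetical.

```python
def letter_grade(points, highest_score):
    """Return a letter grade using cutoffs at 90/80/70/60% of the class's highest score."""
    cutoffs = [(0.90, "A"), (0.80, "B"), (0.70, "C"), (0.60, "D")]
    for fraction, letter in cutoffs:
        # Assumption: reaching a cutoff exactly earns that grade.
        if points >= fraction * highest_score:
            return letter
    return "F"


highest = 345  # hypothetical top total in the class
for total in (330, 280, 250, 200):
    print(total, letter_grade(total, highest))
```

With a top score of 345, the cutoffs fall at roughly 310.5, 276, 241.5, and 207 points, so the four totals above earn an A, B, C, and F.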
SOME ADVICE - Classroom attendance is strongly encouraged. Roll will be taken during the first part of the class until the instructor learns the names of the students. Attendance does not directly affect your grade, although, on the basis of past experience, only the truly exceptional student can afford to miss more than two or three classes. Getting behind is almost always fatal to completing the course and to the final grade.
The correct way to study the material is to read the text before coming to class, listen carefully in class and follow along with the problems, take notes in the margins and in the space provided in this text, carefully reread the text at home, follow the examples, and finally, do the assigned homework.
OBJECTIVES OF THE COURSE
· To understand the relationship between statistics and the scientific method and how it applies to psychology and the behavioral sciences.
· To be able to read and understand the statistics presented in the professional literature.
· To be able to calculate and communicate statistical information to others.
COURSE OUTLINE
· Introduction
· Review of Algebra
· Measurement
· Frequency Distributions
· The Normal Curve
· Statistics
· First Test
· Interpretation of Scores
· Regression
· Correlation
· Second Test
· Logic of Inferential Statistics
· The Sampling Distribution
· Some Hypothesis Tests
· The t-tests
· Additional Topics
· Final
Throughout the book various icons or pictures will be placed in the left margin. They should be loosely interpreted as follows:
· Important Definitional Formula - MEMORIZE
· A Computational Procedure
· An Example of a Procedure
I wrote this book for a number of reasons, the most important one being my students. As I taught over a period of years, my approach to teaching introductory statistics began to deviate more and more from traditional textbooks. All too often students would come up to me and say that they seemed to understand the material in class and thought they took good notes, but when they got home the notes didn't seem to make much sense. Because the textbook I was using didn't seem to help much, I wrote this book. I took my lectures, added some documentation, and stirred everything together with a word processor; this book is the result.
This book is dedicated to all the students I have had over the years. Some made me think about the material in ways that I had not previously considered, questioning the very basis of what this course was all about. Others presented a different challenge: how to explain what I knew and understood in a way they could comprehend. All have had an impact in one way or another.
Three students had a more direct input into the book and deserve special mention. Eve Shellenberger, an ex-English teacher, earned many quarters discovering various errors in earlier editions of the text. Craig Shealy took his editorial pencil to a very early draft and improved its quality greatly. Wendy Hoyt has corrected many errors in the Web Edition. I am especially grateful to all three.
I wish to thank my former dean, Dr. Jim Layton, and my department head, Dr. Fred Maxwell, both of whom found the necessary funds for hardware and software acquisition to enable this project. Recently I have received funds from the Southwest Missouri State University academic vice president, Dr. Bruno Schmidt, to assist me in the transfer of this text from paper to Web format.
Imagine, if you will, that you have just been elected mayor of a medium-sized city. You like your job; people recognize you on the street and show you the proper amount of respect. You are always being asked out to lunch and dinner, etc. You want to keep your job as long as possible.
In addition to electing you mayor, the electorate voted for a new income tax at the last election. In an unprecedented show of support for your administration, the amount of the tax was left unspecified, to be decided by you (this is a fantasy!). You know the people of the city fairly well, however, and they would throw you out of office in a minute if you taxed them too much. If you set the tax rate too low, the consequences might not be as immediate, since it takes some time for the city and fire departments to deteriorate, but they would be just as certain.
You have a pretty good idea of the amount of money needed to run the city. You do not, however, have more than a foggy notion of the distribution of income in your city. The IRS, being the IRS, refuses to cooperate. You decide to conduct a survey to find the necessary information.
Since there are approximately 150,000 people in your city, you hire 150 students to conduct 1000 surveys each. It takes considerable time to hire and train the students to conduct the surveys. You decide to pay them $5.00 a survey, a considerable sum when the person being surveyed is a child with no income, but not much for the richest man in town who employs an army of CPAs. The bottom line is that it will cost approximately $750,000, or close to threequarters of a million dollars to conduct this survey.
After a considerable period of time has elapsed (it takes time to conduct that many surveys), your secretary announces that the task is complete. Boxes and boxes of surveys are placed on your desk.
You begin your task of examining the figures. The first one is $33,967, the next is $13,048, the third is $309,339, and so on. Now, the capacity of human short-term memory is approximately five to nine chunks (7 plus or minus 2, [Miller, 1963]). This means that by the time you are examining the tenth income, you have forgotten one of the previous incomes, unless you put the incomes in long-term memory. Placing 150,000 numbers in long-term memory is slightly overwhelming, so you do not attempt that task. By the time you have reached the 100,000th number you have an attack of nervous exhaustion, are carried away by the men in white, and are never seen again.
ORGANIZING AND DESCRIBING THE DATA - AN ALTERNATIVE ENDING
In an alternative ending to the fantasy, you had at one time in your college career made it through the first half of an introductory statistics course. This part of the course covered the DESCRIPTIVE function of statistics, that is, procedures for organizing and describing sets of data.
Basically, there are two methods of describing data: pictures and numbers. Pictures of data are called frequency distributions and make the task of understanding sets of numbers cognitively palatable. Summary numbers may also be used to describe other numbers, and are called statistics. An understanding of what two or three of these summary numbers mean allows you to have a pretty good understanding of what the distribution of numbers looks like. In any case, it is easier to deal with two or three numbers than with 150,000.
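As a small illustration of the two methods, the Python sketch below takes a handful of incomes (the first three are the figures quoted above; the rest are invented), draws a crude text frequency distribution using $20,000-wide intervals (an arbitrary choice), and then reduces the list to two summary numbers.

```python
import statistics
from collections import Counter

# First three incomes are from the fantasy; the remaining seven are invented.
incomes = [33_967, 13_048, 309_339, 41_200, 27_500,
           52_800, 18_900, 36_100, 24_750, 61_300]

# "Picture": count how many incomes fall in each $20,000-wide interval.
bins = Counter((x // 20_000) * 20_000 for x in incomes)
for lower in sorted(bins):
    print(f"{lower:>7,}-{lower + 19_999:>7,}: {'*' * bins[lower]}")

# "Numbers": a couple of summary statistics stand in for the whole list.
print("mean:  ", round(statistics.mean(incomes)))
print("median:", statistics.median(incomes))
```

Here the single very large income pulls the mean (about $61,890) well above the median ($35,033.50), which is one reason more than one summary number is usually reported.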
After organizing and describing the data, you make a decision about the amount of tax to implement. Everything seems to be going well until an investigative reporter from the local newspaper prints a story about the survey's cost of three-quarters of a million dollars. The irate citizens immediately start a recall petition. You resign the office in disgrace before you are thrown out.
SAMPLING - AN ALTERNATIVE APPROACH AND A HAPPY ENDING
If you had only completed the last part of the statistics course in which you were enrolled, you would have understood the basic principles of the INFERENTIAL function of statistics. Using inferential statistics, you can take a sample from the population, describe the numbers of the sample using descriptive statistics, and infer the population distribution. Granted, there is a risk of error involved, but if the risk can be minimized, the savings in time, effort, and money are well worth the risk.
In the preceding fantasy, suppose that rather than surveying the entire population, you randomly selected 1000 people to survey. This procedure is called SAMPLING from the population and the individuals selected are called a SAMPLE. If each individual in the population is equally likely to be included in the sample, the sample is called a RANDOM SAMPLE.
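A minimal sketch of drawing a random sample, assuming a synthetic population: the 150,000 incomes below are generated from a log-normal distribution purely for illustration, not taken from any real city. Python's `random.sample` selects without replacement, so every individual has the same chance of ending up in the sample of 1000.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 150,000 synthetic incomes (not real data).
population = [round(random.lognormvariate(10.3, 0.7)) for _ in range(150_000)]

# Simple random sample of 1000 people, drawn without replacement.
sample = random.sample(population, k=1000)

print(f"population mean: {statistics.fmean(population):,.0f}")
print(f"sample mean:     {statistics.fmean(sample):,.0f}")
```

The two means will usually be close but not identical; that gap is the sampling error the following paragraphs discuss.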
Now, instead of 150 student surveyors, you only need to hire 10 surveyors, who each survey 100 citizens. The time taken to collect the data is a fraction of that taken to survey the entire population. Equally important, now the survey costs approximately $5000, an amount that the taxpayers are more likely to accept.
At the completion of the data collection, the descriptive function of statistics is used to describe the 1000 numbers, but an additional analysis must be carried out to generalize (infer) from the sample to the population.
Some reflection on your part suggests that it is possible that the sample contained 1000 of the richest individuals in your city. If this were the case, then the estimate of the amount of income to tax would be too high. Equally possible is the situation where 1000 of the poorest individuals were included in the survey (the bums on skid row), in which case the estimate would be too low. These possibilities exist through no fault of yours or the procedure utilized. They are said to be due to chance; a distorted sample just happened to be selected.
The beauty of inferential statistics is that the amount of probable error, or the likelihood of either of the above possibilities, may be specified. In this case, the possibility of either of the above extreme situations actually occurring is so remote that it may be dismissed. However, the chance that there will be some error in our estimation procedure is pretty good. Inferential statistics will allow you to specify the amount of error with statements like, "I am 95 percent sure that the estimate will be within $200 of the true value." You are willing to trade the risk of error and inexact information because the savings in time, effort, and money are so great.
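The idea of specifying the amount of error can be checked by simulation. The sketch below repeats the 1000-person survey many times on a synthetic population and builds an approximate 95% interval each time, using the common normal-approximation rule of the sample mean plus or minus 1.96 estimated standard errors. That rule is an assumption here, not necessarily the procedure developed later in the book, and the width of the interval depends on how spread out the incomes are, so the sketch does not attempt to reproduce the $200 figure.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% interval for the population mean (normal approximation)."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # estimated standard error
    return m - z * se, m + z * se


random.seed(2)
# Hypothetical population of 150,000 synthetic incomes (not real data).
population = [random.lognormvariate(10.3, 0.7) for _ in range(150_000)]
true_mean = statistics.fmean(population)

# Repeat the survey many times and count how often the interval captures the truth.
trials = 200
hits = 0
for _ in range(trials):
    low, high = mean_confidence_interval(random.sample(population, k=1000))
    hits += low <= true_mean <= high
print(f"{hits} of {trials} intervals contain the population mean")
```

Roughly 95% of the intervals should contain the population mean; the few that miss correspond to the distorted samples that chance occasionally produces.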
At the conclusion of the fantasy a grateful citizenry makes you king (or queen). You receive a large salary increase and are elected to the position for life. You may continue this story any way that you like at this point ...
DOES CAFFEINE MAKE PEOPLE MORE ALERT?
Does the coffee I drink almost every morning really make me more alert? If all the students drank a cup of coffee before class, would the time spent sleeping in class decrease? These questions may be answered using experimental methodology and hypothesis testing procedures.
The last part of the text is concerned with HYPOTHESIS TESTING, or procedures to make rational decisions about the reality of effects. The purpose of hypothesis testing is perhaps best illustrated by an example.
To test the effect of caffeine on alertness in people, one experimental design would divide the classroom students into two groups; one group receiving coffee with caffeine, the other coffee without caffeine. The second group gets coffee without caffeine rather than nothing to drink because the effect of caffeine is the effect of interest, rather than the effect of ingesting liquids. The number of minutes that students sleep during that class would be recorded.
Suppose the group that got coffee with caffeine sleeps less, on average, than the group that drank coffee without caffeine. On the basis of this evidence, the researcher argues that caffeine had the predicted effect.
A statistician, learning of the study, argues that such a conclusion is not warranted without performing a hypothesis test. The reasoning for this argument goes as follows: Suppose that caffeine really had no effect. Isn't it possible that the difference between the average alertness of the two groups was due to chance? That is, perhaps the individuals who belonged to the caffeine group had gotten a better night's sleep, were more interested in the class, etc., than the no-caffeine group. If the class had been divided in a different manner, the differences would disappear.
The purpose of the hypothesis test is to make a rational decision between the hypotheses of real effects and chance explanations. The scientist is never able to totally eliminate the chance explanation, but may decide that the difference between the two groups is so large that it makes the chance explanation unlikely. If this is the case, the decision would be made that the effects are real. A hypothesis test specifies how large the differences must be in order to make a decision that the effects are real.
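The statistician's argument, that a different division of the class might make the difference vanish, can be imitated directly. The sketch below is a randomization (permutation) test, swapped in here because it mirrors that argument; it is not the t-test procedure the book develops later. The minutes-slept data are invented. The code re-divides the same sixteen students into two groups of eight thousands of times and counts how often chance alone produces a difference at least as large as the one observed.

```python
import random
import statistics

# Hypothetical minutes slept in class (invented numbers for illustration).
caffeine = [2, 0, 5, 1, 3, 0, 4, 2]
no_caffeine = [6, 3, 8, 4, 7, 2, 5, 9]

observed = statistics.fmean(no_caffeine) - statistics.fmean(caffeine)

# Re-divide the same students at random, as if caffeine had no effect,
# and count how often the "no caffeine" half sleeps at least this much more.
random.seed(0)
pooled = caffeine + no_caffeine   # new list, so the original groups stay intact
n = len(caffeine)
trials = 10_000
as_large = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.fmean(pooled[n:]) - statistics.fmean(pooled[:n])
    if diff >= observed:
        as_large += 1
p_value = as_large / trials

print(f"observed difference: {observed:.3f} minutes")
print(f"proportion of random divisions at least as large: {p_value:.4f}")
```

With these made-up numbers the observed difference is 3.375 minutes and only a small fraction of random divisions match it, so the chance explanation looks unlikely; a hypothesis test puts a formal threshold on how small that fraction must be.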
At the conclusion of the experiment, then, one of two decisions will be made depending upon the size of the differences between the caffeine and no-caffeine groups. The decision will either be that caffeine has an effect, making people more alert, or that chance factors (the composition of the group) could explain the result. The purpose of the hypothesis test is to eliminate false scientific conclusions as much as possible.
The knowledge and understanding that the scientist has about the world is often represented in the form of models. The scientific method is basically one of creating, verifying, and modifying models of the world. The goal of the scientific method is to simplify and explain the complexity and confusion of the world. The applied scientist and technologist then use the models of science to predict and control the world.
This book is about a particular set of models, called statistics, which social and behavioral scientists have found extremely useful. In fact, most of what social scientists know about the world rests on the foundations of statistical models. It is important, therefore, that social science students understand both the reasoning behind the models, and their application in the world.
A model is a representation containing the essential structure of some object or event in the real world.
The representation may take two major forms:
1) Physical, as in a model airplane or an architect's model of a building, or
2) Symbolic, as in a natural language, a computer program, or a set of mathematical equations.
In either form, certain characteristics are present by the nature of the definition of a model.
1. Models are necessarily incomplete.
Because it is a representation, no model includes every aspect of the real world. If it did, it would no longer be a model. In order to create a model, a scientist must first make some assumptions about the essential structure and relationships of objects and/or events in the real world. These assumptions are about what is necessary or important to explain the phenomena.
For example, a behavioral scientist might wish to model the time it takes a rat to run a maze. In creating the model the scientist might include such factors as how hungry the rat was, how often the rat had previously run the maze, and the activity level of the rat during the previous day. The model builder would also have to decide how these factors interacted when constructing the model. The scientist does not assume that only factors included in the model affect the behavior. Other factors might be the time of day, the experimenter who ran the rat, and the intelligence of the rat. The scientist might assume that these are not part of the "essential structure" of the time it takes a rat to run a maze. All the factors that are not included in the model will contribute to error in the predictions of the model.
2. The model may be changed or manipulated with relative ease.
To be useful, the model must be easier to manipulate than the real world. The scientist or technician changes the model and observes the result, rather than performing a similar operation in the real world, because doing so is simpler and more convenient, and because manipulating the real world might have catastrophic results.
A race car designer, for example, might build a small model of a new design and test the model in a wind tunnel. Depending upon the results, the designer can then modify the model and retest the design. This process is much easier than building a complete car for every new design. The usefulness of this technique, however, depends on whether the essential structure of the wind resistance of the design was captured by the wind tunnel model.
Changing symbolic models is generally much easier than changing physical models. All that is required is rewriting the model using different symbols. Determining the effects of such changes is not always so easily accomplished. In fact, much of the discipline of mathematics is concerned with the effects of symbolic manipulation.
If the race car designer was able to capture the essential structure of the wind resistance of the design with a mathematical model or computer program, he or she would not have to build a physical model every time a new design was to be tested. All that would be required would be the substitution of different numbers or symbols into the mathematical model or computer program. As before, to be useful the model must capture the essential structure of the wind resistance.
The values that may be changed in a model to create different models are called parameters. In physical models, parameters are physical things. In the race car example, the designer might vary the length, degree of curvature, or weight distribution of the model. In symbolic models parameters are represented by symbols. For example, in mathematical models parameters are most often represented by variables. Changes in the numbers assigned to the variables change the model.
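A symbolic model with parameters can be sketched in a few lines of Python. The linear form Y = a + bX and the names `linear_model`, `a`, and `b` are illustrative assumptions rather than a model from this text; the point is only that the symbolic form stays fixed while the numbers assigned to the parameters select one particular model.

```python
def linear_model(x: float, a: float, b: float) -> float:
    """Hypothetical model: predict Y from X using parameters a (intercept) and b (slope)."""
    return a + b * x

# The same symbolic form with different numbers assigned to the parameters
# produces two different models.
model_1 = [linear_model(x, a=2.0, b=0.5) for x in range(4)]
model_2 = [linear_model(x, a=0.0, b=3.0) for x in range(4)]
print(model_1)  # [2.0, 2.5, 3.0, 3.5]
print(model_2)  # [0.0, 3.0, 6.0, 9.0]
```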
Of the two types of models, physical and symbolic, the latter is used much more often in science. Symbolic models are constructed using either a natural or formal language (Kain, 1972). Examples of natural languages include English, German, and Spanish. Examples of formal languages include mathematics, logic, and computer languages. Statistics as a model is constructed in a branch of the formal language of mathematics, algebra.
Natural and formal languages share a number of commonalities. First, they are both composed of a set of symbols, called the vocabulary of the language. English symbols take the form of words, such as those that appear on this page. Algebraic symbols include the following as a partial list: 1, 3.42, X, +, =, >.
The language consists of strings of symbols from the symbol set. Not all possible strings of symbols belong to the language. For instance, the following string of words is not recognized as a sentence, "Of is probability a model uncertainty," while the string of words "Probability is a model of uncertainty." is recognized almost immediately as being a sentence in the language. The set of rules to determine which strings of symbols form sentences and which do not is called the syntax of the language.
The syntax of natural languages is generally defined by common usage. That is, people who speak the language ordinarily agree on what is, and what is not, a sentence in the language. The rules of syntax are most often stated informally and imperfectly, for example, "noun phrase, verb, noun phrase."
The syntax of a formal language, on the other hand, may be stated with formal rules. Thus it is possible to determine whether or not a string of symbols forms a sentence in the language without resorting to users of the language. For example, the string "x + / y =" does not form a sentence in the language of algebra. It violates two rules in algebra: sentences cannot end in "=" and the symbols "+" and "/" cannot follow one another. The rules of syntax of algebra may be stated much more succinctly as will be seen in the next chapter.
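The two syntax rules named in the preceding paragraph are simple enough to check mechanically. The sketch below is hypothetical (the function name `violates_stated_rules` is made up, and the input is assumed to be already split into space-separated tokens); it tests only those two rules and is not a full grammar of algebra.

```python
def violates_stated_rules(tokens: list[str]) -> list[str]:
    """Report violations of only two rules: a sentence may not end in "=",
    and "+" and "/" may not follow one another (in either order)."""
    problems = []
    if tokens and tokens[-1] == "=":
        problems.append('sentence ends in "="')
    # Look at each pair of adjacent symbols.
    for left, right in zip(tokens, tokens[1:]):
        if {left, right} == {"+", "/"}:
            problems.append(f'"{left}" followed by "{right}"')
    return problems

print(violates_stated_rules("x + / y =".split()))
# ['sentence ends in "="', '"+" followed by "/"']
print(violates_stated_rules("x + y = z".split()))  # []
```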
Both natural and formal languages are characterized by the ability to transform a sentence in the language into a different sentence without changing the meaning of the string. For example, the active voice sentence "The dog chased the cat," may be transformed to the sentence "The cat was chased by the dog," by using the passive voice transformation. This transformation does not change the meaning of the sentence. In an analogous manner, the sentence "ax + ay" in algebra may be transformed to the sentence "a(x+y)" without changing the meaning of the sentence. Much of what has been taught as algebra consists of learning appropriate transformations, and the order in which to apply them.
The transformation process exists entirely within the realm of the language. The word proof will be reserved for this process. That is, it will be possible to prove that one algebraic sentence equals another. It will not be possible, however, to prove that a model is correct, because a model is never complete.
The scientific method is a procedure for the construction and verification of models. After a problem is formulated, the process consists of four stages.
1. Simplification/Idealization.
As mentioned previously, a model contains the essential structure of objects or events. The first stage identifies the relevant features of the real world.
2. Representation/Measurement.
The symbols in a formal language are given meaning as objects, events, or relationships in the real world. This is the process used in translating "word problems" to algebraic expressions in high school algebra. This process is called representation of the world. In statistics, the symbols of algebra (numbers) are given meaning in a process called measurement.
3. Manipulation/Transformation.
Sentences in the language are transformed into other statements in the language. In this manner implications of the model are derived.
4. Verification.
Selected implications derived in the previous stage are compared with experiments or observations in the real world. Because of the idealization and simplification of the model-building process, no model can ever be in perfect agreement with the real world. In all cases, the important question is not whether the model is true, but whether the model is adequate for the purpose at hand. Model-building in science is a continuing process. New and more powerful models replace less powerful models, with "truth" being a closer approximation to the real world.
These four stages and their relationship to one another are illustrated below.
ADEQUACY AND GOODNESS OF MODELS
In general, the greater the number of simplifying assumptions made about the essential structure of the real world, the simpler the model. The goal of the scientist is to create simple models that have a great deal of explanatory power. Such models are called parsimonious models. In most cases, however, simple yet powerful models are not available to the social scientist. A tradeoff occurs between the power of the model and the number of simplifying assumptions made about the world. A social or behavioral scientist must decide at what point the gain in the explanatory power of the model no longer warrants the additional complexity of the model.
The power of the mathematical model is derived from a number of sources. First, the language has been used extensively in the past and many models exist as examples. Some very general models exist which may describe a large number of real world situations. In statistics, for example, the normal curve and the general linear model often serve the social scientist in many different situations. Second, many transformations are available in the language of mathematics.
Third, mathematics permits thoughts which are not easily expressed in other languages. For example, "What if I could travel at a speed approaching the speed of light?" or "What if I could flip this coin an infinite number of times?" In statistics these "what if" questions often take the form of questions like "What would happen if I took an infinite number of infinitely precise measurements?" or "What would happen if I repeated this experiment an infinite number of times?"
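A computer cannot flip a coin an infinite number of times, but it can approximate the "what if" by flipping a simulated coin a very large number of times. This sketch assumes a fair coin (probability of heads 0.5) simulated with Python's `random` module; the function name `proportion_heads` is hypothetical. As the number of flips grows, the proportion of heads tends to settle closer to 0.5.

```python
import random

def proportion_heads(n_flips: int, seed: int = 1) -> float:
    """Flip a simulated fair coin n_flips times and return the proportion of heads."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same flips
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

for n in (10, 1_000, 100_000):
    print(n, proportion_heads(n))
```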
Finally, it is often possible to maximize or minimize the form of the model. Given that the essence of the real world has been captured by the model, what values of the parameters optimize (minimize or maximize) the model? For example, if the design of a race car can be accurately modeled using mathematics, what changes in design will result in the least possible wind resistance? Mathematical procedures are available which make these kinds of transformations possible.
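Optimizing a model can be illustrated with a deliberately simple example. The function `toy_drag` below is invented for illustration and is not a real wind-resistance model, and the search method is a plain grid search (evaluate many parameter values and keep the one with the smallest result) rather than one of the more refined mathematical procedures alluded to above.

```python
def toy_drag(curvature: float) -> float:
    """Hypothetical wind-resistance model, smallest when curvature is 0.3.

    The formula is invented for illustration only.
    """
    return 1.0 + (curvature - 0.3) ** 2

# Grid search: try curvature values 0.00, 0.01, ..., 1.00 and keep the one with least drag.
candidates = [i / 100 for i in range(101)]
best = min(candidates, key=toy_drag)
print(best, toy_drag(best))  # 0.3 1.0
```

The search can only find the best value among the parameter values it is allowed to try: a better shape outside the range 0 to 1, or one that depends on a feature the model does not include, would never be found.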
Building a Better Boat: Example of Model-Building
Suppose for a minute that you had lots of money, time, and sailing experience. Your goal in life is to build and race a 12-meter yacht that would win the America's Cup competition. How would you go about doing it?
Twelve-meter racing yachts do not have to be identical to compete in the America's Cup race. There are certain restrictions on the length, weight, sail area, and other aspects of boat design. Within these restrictions, there are variations that will change the handling and speed of the yacht through the water. The following two figures (Letcher, Marshall, Oliver, and Salvesen, 1987) illustrate different approaches to keel design. The designer has the option of whether to install a wing on the keel. If a wing is chosen, the decision of where it will be placed must be made.
You could hire a designer, have him or her draw up the plans, build the yacht, train a crew to sail it, and then compete in yachting races. All this would be fine, except it is a very time-consuming and expensive process. What happens if you don't have a very good boat? Do you start the whole process over again?
The scientific method suggests a different approach. Suppose a physical model of the hull were constructed and attached, through a pulley system, to a string connected to weights; the time it took the weights to drag the model from point A to point B could then be measured. The hull shape could be changed using a knife and various weights. In this manner, many more shapes could be tried than if a whole new yacht had to be built to test every shape.
One of the problems with this physical model approach is that the designer never knows when to stop. That is, the designer never knows whether a slightly different shape might be faster than any of the shapes attempted up to that point. In any case, the designer has to stop testing models and build the boat at some point.
Suppose the fastest hull shape was selected and the full-scale yacht was built. Suppose also that it didn't win. Does that make the model-building method wrong? Not necessarily. Perhaps the model did not represent enough of the essential structure of the real world to be useful. In examining the real world, it is noticed that racing yachts do not sail standing straight up in the water, but at some angle, depending upon the strength of the wind. In addition, the ocean has waves which necessarily change the dynamics of the movement of a hull through water.
If the physical model was pulled through a pool of wavy water at an angle, then the simulation would more closely mirror the real world. Assume that this is done, and the full-scale yacht built. It still doesn't win. What next?
One possible solution is the use of symbolic or mathematical models in the design of the hull and keel. Letcher et al. (1987) describe how various mathematical models were employed in the design of Stars and Stripes. The mathematical model uses parameters which allow one to change the shape of the simulated hull and keel by setting the values of the parameters to particular numbers. That is, a mathematical model of a hull and keel shape does not describe a particular shape, but a large number of possible shapes. When the parameters of the mathematical model of the hull shape are set to particular numbers, one of the possible hull shapes is specified. By sailing the simulated hull shape through simulated water, and measuring the simulated time it takes, the potential speed of a hull shape may be evaluated.
The advantage of creating a symbolic model over a physical model is that many more shapes may be assessed. By turning a computer on and letting it run all night, hundreds of shapes may be tested. It is sometimes possible to use mathematical techniques to find an optimal model, one that guarantees that within the modeling framework, no other hull shape will be faster. However, if the model does not include the possibility of a winged keel, it will never be discovered.
Suppose that these techniques are employed, and the yacht is built, but it still does not win. It may be that not enough of the real world was represented in the symbolic model. Perhaps the simulated hull must travel at an angle to the water and sail through waves. Capturing these conditions makes the model more complex, but they are necessary if the model is going to be useful.
In building Stars and Stripes, all the above modeling techniques were employed (Letcher et al., 1987). After initial computer simulation, a one-third scale model was constructed to work out the details of the design. The result of the model-building design process is history.
In conclusion, the scientific method of model-building is a very powerful tool in knowing and dealing with the world. The main advantage of the process is that the model may be manipulated where it is often difficult or impossible to manipulate the real world. Because manipulation is the key to the process, symbolic models have advantages over physical models.
This section is intended as a review of the algebra necessary to understand the rest of this book, allowing the student to gauge his or her mathematical sophistication relative to what is needed for the course. The individual without adequate mathematical training will need to spend more time with this chapter. The review of algebra is presented in a slightly different manner than has probably been experienced by most students, and may prove valuable even to the mathematically sophisticated reader.
Algebra is a formal symbolic language, composed of strings of symbols. Some strings of symbols form sentences within the language (X + Y = Z), while others do not (X += Y Z). The set of rules that determines which strings belong to the language and which do not is called the syntax of the language. Transformational rules change a given sentence in the language into another sentence without changing the meaning of the sentence. This chapter will first examine the symbol set of algebra, followed by a discussion of syntax and transformational rules.
The symbol set of algebra includes numbers, variables, operators, and delimiters. In combination they define all possible sentences which may be created in the language.
Numbers are analogous to proper nouns in English, such as the names of dogs: Spot, Buttons, Puppy, Boots, etc. Some examples of numbers are:
1, 2, 3, 4.89, 0.8965, 10090897.294, 0, π, e.
Numbers may be either positive (+) or negative (-). If no sign is included, the number is positive. The two numbers at the end of the example list are called universal constants. The values for these constants are π = 3.1416... and e = 2.718....
Variables are symbols that stand for any number. They are the common nouns within the language of algebra: dog, cat, student, etc. Letters in the English alphabet most often represent variables, although Greek letters are sometimes used. Some example variables are:
X, Y, Z, W, a, b, c, k, m, r.
 Other symbols, called operators, signify relationships between numbers and/or variables. Operators serve the same function as verbs in the English language. Some example operators in the language of algebra are:
+, -, /, *, =, >, <, ≥.
Note that the "*" symbol is used for multiplication instead of the "×" or "·" symbol. This is common to many computer languages. The symbol "≥" is read as "greater than or equal" and "≤" is read as "less than or equal."
 Delimiters are the punctuation marks in algebra. They let the reader know where one phrase or sentence ends and another begins. Example delimiters used in algebra are:
( ), [ ], { }.
In this course, only the "( )" symbols are used as delimiters, with the algebraic expressions being read from the innermost parenthesis out.
Many statements in algebra can be constructed using only the symbols mentioned thus far, although other symbols exist. Some of these symbols will be discussed either later in this chapter or later in the book.
SYNTAX OF THE LANGUAGE OF ALGEBRA
Sentences in algebra can be constructed using a few simple rules. The rules can be stated as replacement statements and are as follows:
Delimiters (parentheses) surround each phrase in order to keep the structure of the sentence straight. Sentences are constructed by creating a lower-order phrase and sequentially adding greater complexity, moving to higher-order levels. For example, the construction of a complex sentence is illustrated below:
X + 3
7 * (X + 3)
(7 * (X + 3)) / (X * Y)
(P + Q) - ((7 * (X + 3)) / (X * Y))
((P + Q) - ((7 * (X + 3)) / (X * Y))) - 5.45
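The step-by-step construction above can be mimicked by a short program that treats each phrase as a string and wraps any compound phrase in parentheses when it becomes part of a larger phrase. This is a sketch under one simplifying assumption: a phrase is treated as compound exactly when it contains a space, which holds here because `combine` (a hypothetical helper, as is `wrap`) always puts spaces around the operator.

```python
def wrap(phrase: str) -> str:
    """Delimit a compound phrase with parentheses; leave a single number or variable bare."""
    # Assumption: a phrase is compound exactly when it contains a space,
    # because combine() always puts spaces around the operator.
    return f"({phrase})" if " " in phrase else phrase

def combine(left: str, op: str, right: str) -> str:
    """Build a higher-order phrase from two lower-order phrases and an operator."""
    return f"{wrap(left)} {op} {wrap(right)}"

s1 = combine("X", "+", "3")
s2 = combine("7", "*", s1)
s3 = combine(s2, "/", combine("X", "*", "Y"))
s4 = combine(combine("P", "+", "Q"), "-", s3)
s5 = combine(s4, "-", "5.45")

for step in (s1, s2, s3, s4, s5):
    print(step)
```

Printing s1 through s5 reproduces the five construction steps, with "-" as the operator in the last two.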
Statements such as this are rarely seen in algebra texts because rules exist to eliminate some of the parentheses and symbols in order to make reading the sentences easier. In some cases these are rules of precedence where some operations (* and /) take precedence over others (+ and -).
The following rules permit sentences written in the full form of algebra to be rewritten to make reading easier. Note that they are not always permitted when writing statements in computer languages such as PASCAL or BASIC.
1. The "*" symbol can be eliminated, along with the parentheses surrounding the phrase if the phrase does not include two numbers as subphrases. For example, (X * (Y - Z)) may be rewritten as X (Y - Z). However, 7*9 may not be rewritten as 79.
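2. Any phrase connected with "+" or "-" may be rewritten without parentheses if the inside phrase is also connected with "+" or "-", provided the parenthesized phrase is not preceded by a "-" sign (X - (Y - Z), for instance, is not equal to X - Y - Z). For example, ((X + Y) - 3) + Z may be rewritten as (X + Y) - 3 + Z. Continued application of this rule would result in the sentence X + Y - 3 + Z.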
Sequential application of these rules may result in what appears to be a simpler sentence. For example, the sentence created in the earlier example may be rewritten as:
((P + Q) - ((7 * (X + 3)) / (X * Y))) - 5.45
Rule 1
((P + Q) - 7(X + 3) / XY) - 5.45
Rule 2
P + Q - 7(X + 3) / XY - 5.45
Often these transformations are taken for granted and already applied to algebraic sentences before they appear in algebra texts.
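Because the full and rewritten sentences have the same meaning, they should evaluate to the same number for any values of P, Q, X, and Y (with X * Y not zero). The sketch below checks this at a few arbitrary points; the function names are hypothetical. As the text notes for PASCAL and BASIC, Python does not accept the shorthand: `7(X + 3) / XY` has to be written with explicit `*` signs and with parentheses around `X * Y`, and omitting those parentheses silently changes the meaning.

```python
import math

def full_form(P, Q, X, Y):
    # The sentence with every phrase delimited by parentheses.
    return ((P + Q) - ((7 * (X + 3)) / (X * Y))) - 5.45

def simplified_form(P, Q, X, Y):
    # P + Q - 7(X + 3) / XY - 5.45, with "*" written out and X * Y kept together.
    return P + Q - 7 * (X + 3) / (X * Y) - 5.45

def careless_form(P, Q, X, Y):
    # Dropping the parentheses around X * Y: Python reads (7 * (X + 3) / X) * Y.
    return P + Q - 7 * (X + 3) / X * Y - 5.45

for values in [(1, 2, 3, 4), (-2.5, 0.5, 1.5, -6), (10, 10, 2, 2)]:
    assert math.isclose(full_form(*values), simplified_form(*values))

print(full_form(1, 2, 3, 4), careless_form(1, 2, 3, 4))
```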
Transformations are rules for rewriting sentences in the language of algebra without changing their meaning, or truth value. Much of what is taught in an algebra course consists of transformations.
When a phrase contains only numbers and an operator (e.g., 8*2), that phrase may be replaced by a single number (e.g., 16). These are the same rules that have been drilled into grade school students, at least up until the time of new math. The rule is to perform the operation and delete the parentheses. For example:
(8 + 2) = 10
(16 / 5) = 3.2
(3.875 - 2.624) = 1.251
The rules for dealing with negative numbers are sometimes imperfectly learned by students, and will now be reviewed.
1. An even number of negative signs results in a positive number; an odd number of negative signs results in a negative number. For example:
-(-8) = -1 * -8 = 8 or +8
-(-(-2)) = -1 * -1 * -2 = -2
-8 * -9 * 2 = 144
-96 / 32 = -3
2. Adding a negative number is the same as subtracting a positive number, and subtracting a negative number is the same as adding a positive number.
8 + (-2) = 8 - 2 = 6
-10 - (-7) = -10 + 7 = -3
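Both sign rules can be checked directly in Python. The helper `sign_by_counting` is a hypothetical name; it applies the even/odd rule by counting negative factors (assuming none of the factors is zero), and its answer is compared with the sign of the actual product. The factor lists are modeled on the examples above.

```python
import math

def sign_by_counting(factors):
    """Rule 1: an even number of negative factors gives +1, an odd number gives -1.

    Assumes no factor is zero.
    """
    negatives = sum(1 for f in factors if f < 0)
    return 1 if negatives % 2 == 0 else -1

for factors in [(-1, -8), (-1, -1, -2), (-8, -9, 2), (-8, 9, 2)]:
    product = math.prod(factors)
    # The counting rule and the actual product always agree on the sign.
    assert (product > 0) == (sign_by_counting(factors) == 1)
    print(factors, product)

# Rule 2: adding a negative is subtracting a positive; subtracting a negative is adding.
assert 8 + (-2) == 8 - 2 == 6
assert -10 - (-7) == -10 + 7 == -3
```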
A second area that sometimes proves troublesome to students is that of fractions. A fraction is an algebraic phrase involving two numbers connected by the operator "/"; for example, 7/8. The top number or phrase is called the numerator, and the bottom number or phrase the denominator. One method of dealing with fractions that has gained considerable popularity since inexpensive calculators have become available is to do the division operation and then deal with decimal numbers. In what follows, two methods of dealing with fractions will be illustrated. The student should select the method that is easiest for him or her.
Multiplication of fractions is relatively straightforward: multiply the numerators for the new numerator and the denominators for the new denominator. For example:
7/8 * 3/11 = (7*3)/(8*11) = 21/88 = 0.2386
Using decimals the result would be:
7/8 * 3/11 = .875 * .2727 = 0.2386
Division is similar to multiplication except the rule is to invert and multiply. An example is:
(5/6) / (4/9) = (5/6) * (9/4) = (5*9)/(6*4) = 45/24 = 1.8750
or in decimal form:
.83333 / .44444 = 1.8750
Addition and subtraction with fractions first requires finding the least common denominator, adding (or subtracting) the numerators, and then placing the result over the least common denominator. For example:
(3/4) + (5/9) = (1*(3/4)) + (1*(5/9)) = ((9/9)*(3/4)) + ((4/4)*(5/9)) = 27/36 + 20/36 = 47/36 = 1.3056
Decimal form is simpler, in my opinion, with the preceding being:
3/4 + 5/9 = .75 + .5556 = 1.3056
Fractions have a special rewriting rule that sometimes allows an expression to be transformed to a simpler expression. If a similar phrase appears in both the numerator and the denominator of the fraction and these similar phrases are connected at the highest level by multiplication, then the similar phrases may be canceled. The rule is actually easier to demonstrate than to state:
CORRECT
8X / 9X = 8/9
((X+3)*(XAY+Z))/((X+3)*(ZX)) = (XAY+Z) / (ZX)
The following is an incorrect application of the above rule, because the numerator is connected at the highest level by addition rather than multiplication:
(X + Y) / X = Y
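Evaluating both sides of a rewrite at a few numbers is a quick way to catch an incorrect cancellation, although agreement at a few points never proves a rewrite correct: at X = 2, Y = 2, for instance, the incorrect version happens to give the right number. The sample values below are arbitrary; the correct rewriting of (X + Y) / X is 1 + Y/X.

```python
import math

for X, Y in [(2.0, 3.0), (-4.0, 7.5), (10.0, 1.0)]:
    # Correct: X multiplies the whole numerator and the whole denominator.
    assert math.isclose((8 * X) / (9 * X), 8 / 9)
    # The correct rewriting of (X + Y) / X agrees at every point tried...
    assert math.isclose((X + Y) / X, 1 + Y / X)
    # ...while the incorrect cancellation (X + Y) / X = Y does not.
    print(X, Y, (X + Y) / X, Y)
```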
A number of rewriting rules exist within algebra to simplify an expression with a shorthand notation. Exponential notation is an example of a shorthand notational scheme. If a series of similar algebraic phrases are multiplied by one another, the expression may be rewritten with the phrase raised to a power. The power is the number of times the phrase is multiplied by itself and is written as a superscript of the phrase. For example:
8 * 8 * 8 * 8 * 8 * 8 = 8^{6}
(X - 4Y) * (X - 4Y) * (X - 4Y) * (X - 4Y) = (X - 4Y)^{4}
Some special rules apply to exponents. A negative exponent may be transformed to a positive exponent if the base is changed to one divided by the base. A numerical example follows:
5^{-3} = (1/5)^{3} = 0.008
A root of a number may be expressed as a base (the number) raised to the inverse of the root. For example:
√16 = SQRT(16) = 16^{1/2}
⁵√576 = 576^{1/5}
When two phrases that have the same base are multiplied, the product is equal to the base raised to the sum of their exponents. The following examples illustrate this principle.
18^{2} * 18^{5} = 18^{5+2} = 18^{7}
(X+3)^{8} * (X+3)^{-7} = (X+3)^{8-7} = (X+3)^{1} = X+3
It is possible to raise a decimal number to a decimal power; that is, "funny" numbers may be raised to "funny" powers. For example:
3.44565^{1.234678} = 4.60635
245.967^{.7843} = 75.017867
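Each exponent rule above can be spot-checked with Python's `**` operator and `math.sqrt`. Floating-point results are compared with `math.isclose` rather than `==`, since values such as 576^{1/5} cannot be represented exactly; the value x = 2.5 is an arbitrary choice for this sketch.

```python
import math

# Negative exponent: change the base to one divided by the base.
assert math.isclose(5 ** -3, (1 / 5) ** 3) and math.isclose(5 ** -3, 0.008)

# Roots as fractional exponents.
assert math.isclose(math.sqrt(16), 16 ** (1 / 2))
fifth_root = 576 ** (1 / 5)
assert math.isclose(fifth_root ** 5, 576)

# Same base: add the exponents (integers, so the comparison is exact).
assert 18 ** 2 * 18 ** 5 == 18 ** 7
x = 2.5
assert math.isclose((x + 3) ** 8 * (x + 3) ** -7, x + 3)

# Decimal numbers raised to decimal powers.
print(3.44565 ** 1.234678, 245.967 ** 0.7843)
```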
A special form of exponential notation, called binomial expansion, occurs when a phrase connected with addition or subtraction operators is raised to the second power. Binomial expansion is illustrated below:
(X + Y)^{2}
= (X + Y) * (X + Y)
= X^{2} + XY + XY + Y^{2}
= X^{2} + 2XY + Y^{2}
(X - Y)^{2}
= (X - Y) * (X - Y)
= X^{2} - XY - XY + Y^{2}
= X^{2} - 2XY + Y^{2}
A more complex example of the preceding occurs when the phrase being squared has more than two terms.
(Y - a - bX)^{2} = Y^{2} + a^{2} + (bX)^{2} - 2aY - 2bXY + 2abX
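The expansions above can be spot-checked numerically by evaluating each squared phrase and its expansion at randomly chosen values. This is a sanity check rather than a proof; in the sense used earlier, proof is the transformation process itself. The function name `expansions_agree`, the value range, and the tolerances are assumptions of this sketch.

```python
import math
import random

def expansions_agree(X: float, Y: float, a: float, b: float) -> bool:
    """Compare each squared phrase with its expanded form at one set of values."""
    checks = [
        ((X + Y) ** 2, X**2 + 2*X*Y + Y**2),
        ((X - Y) ** 2, X**2 - 2*X*Y + Y**2),
        ((Y - a - b*X) ** 2, Y**2 + a**2 + (b*X)**2 - 2*a*Y - 2*b*X*Y + 2*a*b*X),
    ]
    # abs_tol absorbs rounding error when a squared phrase is close to zero.
    return all(math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9) for lhs, rhs in checks)

rng = random.Random(0)
print(all(expansions_agree(*(rng.uniform(-10, 10) for _ in range(4))) for _ in range(1000)))
```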
When two expressions are connected with the multiplication operator, it is often possible to "multiply through" and change the expression. In its simplest form, if a number or variable is multiplied by a phrase connected at the highest level with the addition or subtraction operator, the phrase may be rewritten as the variable or number times each term in the phrase. Again, it is easier to illustrate than to describe the rule:
a * (x + y + z) = ax + ay + az
If the number or variable is negative, the sign must be carried through all the resulting terms as seen in the following example:
-y * (p + q - z)
= -yp - yq - (-yz)
= -yp - yq + yz
Another example of the application of this rule occurs when the number -1 is not written in the expression, but rather inferred:
-(a + b - c)
= (-1) * (a + b - c)
= -a - b - (-c)
= -a - b + c.
When two additive phrases are connected by multiplication, a more complex form of this rewriting rule may be applied. In this case one phrase is multiplied by each term of the other phrase:
(c - d) * (x + y)
= c * (x + y) - d * (x + y)
= cx + cy - dx - dy
Note that the binomial expansion discussed earlier was an application of this rewriting rule.
A corollary to the previously discussed rewriting rule for multiplication of phrases is factoring, or combining like terms. The rule may be stated as follows: If each term in a phrase connected at the highest level with the addition or subtraction operator contains a similar term, the similar term(s) may be factored out and multiplied times the remaining terms. It is the opposite of "multiplying through." Two examples follow:
ax + az - axy = a * (x + z - xy)
(a+z) * (p-q) - (a+z) * (x+y-2z) = (a+z) * (p-q-x-y+2z)
SEQUENTIAL APPLICATION OF REWRITING RULES TO SIMPLIFY AN EXPRESSION
Much of what is learned as algebra in high school and college consists of learning when to apply which rewriting rule to a sentence to simplify that sentence. Application of rewriting rules changes the form of the sentence, but not its meaning or truth value. Sometimes a sentence in algebra must be expanded before it may be simplified. Knowing when to apply a rewriting rule is often a matter of experience. As an exercise, simplify the following expression:
( (X + Y)^{2} + (X - Y)^{2} ) / ( X^{2} + Y^{2} )
EVALUATING ALGEBRAIC SENTENCES
A sentence in algebra is evaluated when the variables in the sentence are given specific values, or numbers. Two sentences are said to have the same truth value if they always evaluate to equal values for all possible numbers. For example, the sentence in the immediately preceding example may be evaluated where X = 4 and Y = 6 to yield the following result:
( (X + Y)^{2} + (X - Y)^{2} ) / ( X^{2} + Y^{2} )
( (4 + 6)^{2} + (4 - 6)^{2} ) / ( 4^{2} + 6^{2} )
( 10^{2} + (-2)^{2} ) / ( 16 + 36 )
( 100 + 4 ) / ( 52 )
104 / 52
2
The result should not surprise the student who correctly solved the preceding simplification. The result must ALWAYS be equal to 2 as long as X and Y are not both zero. Note that the sentences are evaluated from the "innermost parenthesis out", meaning that the evaluation of the sentence takes place in stages: phrases that are nested within the innermost or inside parentheses are evaluated before phrases that contain other phrases.
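The same evaluation can be repeated for other values of X and Y. The sketch below evaluates the exercise expression at X = 4 and Y = 6 and at several arbitrary pairs; the function name `exercise` is hypothetical. Agreement at many points is consistent with the result always being 2, but it is the algebraic simplification, not the spot check, that establishes it. When X and Y are both zero, the denominator is zero and Python raises an error.

```python
import random

def exercise(X: float, Y: float) -> float:
    """The expression from the exercise, written with Python operators."""
    return ((X + Y) ** 2 + (X - Y) ** 2) / (X ** 2 + Y ** 2)

print(exercise(4, 6))  # 2.0, matching the hand evaluation

rng = random.Random(42)
for _ in range(5):
    X, Y = rng.uniform(-100, 100), rng.uniform(-100, 100)
    print(round(exercise(X, Y), 12))

try:
    exercise(0, 0)
except ZeroDivisionError:
    print("undefined when X and Y are both zero")
```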
The preceding review of algebra is not meant to take the place of an algebra course; rather, it is presented to "knock some of the cobwebs loose," and to let the reader know the level of mathematical sophistication necessary to tackle the rest of the book. It has been the author's experience that familiarity with simplifying algebraic expressions is a useful but not absolutely necessary skill to continue. At some points all students will be asked to "believe"; those with lesser mathematical background will be asked to "believe" more often than those with greater. It is the author's belief, however, that students can learn about statistics even though every argument is not completely understood.
Measurement consists of rules for assigning numbers to attributes of objects.
The language of algebra has no meaning in and of itself. The theoretical mathematician deals entirely within the realm of the formal language and is concerned with the structure and relationships within the language. The applied mathematician or statistician, on the other hand, is concerned not only with the language, but also with the relationship of the symbols in the language to real world objects and events. The concern about the meaning of mathematical symbols (numbers) is a concern about measurement.
By definition any set of rules for assigning numbers to attributes of objects is measurement. Not all measurement techniques are equally useful in dealing with the world, however, and it is the function of the scientist to select those that are more useful. The physical and biological scientists generally have well-established, standardized systems of measurement. A scientist knows, for example, what is meant when a "ghundefelder fish" is described as 10.23 centimeters long and weighing 34.23 grams. The social scientist does not, as a general rule, have such established and recognized systems. A description of an individual as having 23 "units" of need for achievement does not evoke a great deal of recognition from most scientists. For this reason the social scientist, more than the physical or biological scientist, has been concerned about the nature and meaning of measurement systems.
PROPERTIES OF MEASUREMENT SYSTEMS
S. S. Stevens (1951) described properties of measurement systems that allowed decisions about the quality or goodness of a measurement technique. A property of a measurement system deals with the extent to which the relationships that exist between the attributes of objects in the "real world" are preserved in the numbers that are assigned to these objects. For an example of relationships existing in the "real world", if the attribute in question is height, then objects (people) in the "real world" have more or less of the attribute (height) than other objects (people). In a similar manner, numbers have relationships to other numbers. For example 59 is less than 62, 48 equals 48, and 73 is greater than 68. One property of a measurement system that measures height, then, is whether the relationships between heights in the "real world" are preserved in the numbers which are assigned to heights; that is, whether taller individuals are given bigger numbers.
Before describing in detail the properties of measurement systems, a means of symbolizing the preceding situation will be presented. The student need not comprehend the following formalism to understand the issues involved in measurement, but mathematical formalism has a certain "beauty" which some students appreciate.
Objects in the real world may be represented by O_{i}, where "O" is a shorthand notation for "object" and "i" is a subscript referring to which object is being described; i may take on any positive integer value. For example, O_{1} is the first object, O_{2} the second, O_{3} the third, and so on. The symbol M(O_{i}) will be used to symbolize the number, or measure (M), that the system of rules assigns to a particular object; M(O_{1}) is the number assigned to the first object, M(O_{2}) the number assigned to the second, and so on. The expression O_{1} > O_{2} means that the first object has more of something in the "real world" than does the second. The expression M(O_{1}) > M(O_{2}) means that the number assigned to the first object is greater than that assigned to the second.
In mathematical terms measurement is a functional mapping from the set of objects {O_{i}} to the set of real numbers {M(O_{i})}.
The goal of measurement systems is to structure the rule for assigning numbers to objects in such a way that the relationship between the objects is preserved in the numbers assigned to the objects. The different kinds of relationships preserved are called properties of the measurement system.
The property of magnitude exists when an object that has more of the attribute than another object is given a bigger number by the rule system. This relationship must hold for all objects in the "real world". Mathematically speaking, this property may be described as follows:
The property of MAGNITUDE exists
when for all i, j if O_{i} > O_{j}, then M(O_{i}) > M(O_{j}).
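The MAGNITUDE definition can be checked mechanically when the "real world" amounts are known, which in practice they never are. The following is a minimal Python sketch, not part of the original text: the dictionaries stand in for the objects {O_{i}} and the mapping M, and the names `has_magnitude`, `heights`, `inches`, and `jersey`, like the heights themselves, are hypothetical.

```python
from itertools import permutations


def has_magnitude(true_amount, measure):
    """Return True if, for all i, j, O_i > O_j implies M(O_i) > M(O_j).

    true_amount: dict mapping each object to its (hypothetical) real-world amount
    measure:     dict mapping each object to the number assigned by the rule system
    """
    for i, j in permutations(true_amount, 2):
        if true_amount[i] > true_amount[j] and not measure[i] > measure[j]:
            return False
    return True


# Hypothetical heights (the "real world") and two rules for assigning numbers.
heights = {"Ann": 59, "Bob": 62, "Cal": 68, "Dee": 73}
inches = {"Ann": 59, "Bob": 62, "Cal": 68, "Dee": 73}   # taller -> bigger number
jersey = {"Ann": 12, "Bob": 7, "Cal": 30, "Dee": 3}     # arbitrary labels

print(has_magnitude(heights, inches))  # True
print(has_magnitude(heights, jersey))  # False: Bob is taller than Ann but 7 < 12
```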
The property of intervals is concerned with the relationship of differences between objects. If a measurement system possesses the property of intervals, the unit of measurement means the same thing throughout the scale of numbers. That is, an inch is an inch is an inch, no matter where it falls: immediately ahead or a mile down the road.
More precisely, an equal difference between two numbers reflects an equal difference in the "real world" between the objects that were assigned the numbers. In order to define the property of intervals in the mathematical notation, four objects are required: O_{i}, O_{j}, O_{k}, and O_{l}. The difference between objects is represented by the "-" sign; O_{i} - O_{j} refers to the actual "real world" difference between object i and object j, while M(O_{i}) - M(O_{j}) refers to differences between numbers.
The property of INTERVALS exists when, for all i, j, k, l,
if O_{i} - O_{j} ≥ O_{k} - O_{l}, then M(O_{i}) - M(O_{j}) ≥ M(O_{k}) - M(O_{l}).
A corollary to the preceding definition is that if the numbers assigned to two pairs of objects are equally different, then the pairs of objects must be equally different in the real world. Mathematically, it may be stated:
If the property of INTERVALS exists, then for all i, j, k, l,
if M(O_{i}) - M(O_{j}) = M(O_{k}) - M(O_{l}), then O_{i} - O_{j} = O_{k} - O_{l}.
This provides the means to test whether a measurement system possesses the interval property: if two pairs of objects are assigned numbers equally distant on the number scale, then it must be assumed that the objects are equally different in the real world. For example, in order for the first test in a statistics class to possess the interval property, the difference between scores of 23 and 28 must reflect the same difference in knowledge of statistics as the difference between scores of 30 and 35.
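The interval definition can be illustrated the same way. In this hypothetical Python sketch, `knowledge` plays the role of the unobservable "real world" amount of statistics knowledge for four students. A rule that is a linear function of knowledge keeps equal steps equal, while squaring keeps the order (magnitude) but stretches the steps, so scores of 23 and 28 no longer differ by the same amount as 30 and 35. All names and values are assumptions for illustration only.

```python
from itertools import product


def has_intervals(true_amount, measure, tol=1e-9):
    """Return True if, for all i, j, k, l,
    O_i - O_j >= O_k - O_l implies M(O_i) - M(O_j) >= M(O_k) - M(O_l).

    `tol` absorbs floating-point noise when the numbers are not integers.
    """
    objects = list(true_amount)
    for i, j, k, l in product(objects, repeat=4):
        real_ij = true_amount[i] - true_amount[j]
        real_kl = true_amount[k] - true_amount[l]
        num_ij = measure[i] - measure[j]
        num_kl = measure[k] - measure[l]
        if real_ij >= real_kl and num_ij < num_kl - tol:
            return False
    return True


# Hypothetical "true" knowledge of statistics for four students.
knowledge = {"A": 23, "B": 28, "C": 30, "D": 35}

linear = {s: 2 * v + 10 for s, v in knowledge.items()}   # equal steps stay equal
squared = {s: v ** 2 for s, v in knowledge.items()}      # order kept, steps stretched

print(has_intervals(knowledge, linear))   # True
print(has_intervals(knowledge, squared))  # False: 28**2 - 23**2 = 255, 35**2 - 30**2 = 325
```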
The property of intervals is critical in terms of the ability to meaningfully use the mathematical operations "+" and "-". To the extent to which the property of intervals is not satisfied, any statistic that is produced by adding or subtracting numbers will be in error.
A measurement system possesses a rational zero if an object that has none of the attribute in question is assigned the number zero by the system of rules. The object does not need to really exist in the "real world", as it is somewhat difficult to visualize a "man with no height". The requirement for a rational zero is this: if objects with none of the attribute did exist, would they be given the value zero? Defining O_{0} as the object with none of the attribute in question, the definition of a rational zero becomes:
The property of RATIONAL ZERO exists if M(O_{0}) = 0.
The property of rational zero is necessary for ratios between numbers to be meaningful. Only in a measurement system with a rational zero would it make sense to argue that a person with a score of 30 has twice as much of the attribute as a person with a score of 15. In many applications of statistics, however, this property is not necessary to make meaningful inferences.
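A small numeric illustration, assuming hypothetical heights of 66 and 50 inches for John and Mary (chosen to give the ratio of 1.32 used later in this chapter): changing the unit leaves a ratio unchanged, but moving the zero point, here by measuring "inches above 40", changes it.

```python
# Hypothetical heights in inches; zero inches would mean "no height" (rational zero).
john_in, mary_in = 66.0, 50.0

# Changing the unit to centimeters multiplies every score by 2.54,
# so the ratio of two heights is unchanged.
john_cm, mary_cm = john_in * 2.54, mary_in * 2.54

# "Inches above 40" keeps equal intervals but moves the zero point, so zero
# no longer means "none of the attribute" and the ratio changes.
john_40, mary_40 = john_in - 40, mary_in - 40

print(john_in / mary_in)   # 1.32
print(john_cm / mary_cm)   # about 1.32
print(john_40 / mary_40)   # 2.6
```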
In the same article in which he proposed the properties of measurement systems, S. S. Stevens (1951) proposed four scale types. These scale types were Nominal, Ordinal, Interval, and Ratio, and each possessed different properties of measurement systems.
Nominal scales are measurement systems that possess none of the three properties discussed earlier. Nominal scales may be further subdivided into two groups: Renaming and Categorical.
Nominal-renaming occurs when each object in the set is assigned a different number, that is, renamed with a number. Examples of nominal-renaming are social security numbers or the numbers on the backs of baseball players' uniforms. The former are necessary because different individuals can have the same name, e.g., Mary Smith, and because computers deal more easily with numbers than with alphanumeric characters.
Nominal-categorical occurs when objects are grouped into subgroups and each object within a subgroup is given the same number. The subgroups must be mutually exclusive, that is, an object may not belong to more than one category or subgroup. An example of nominal-categorical measurement is grouping people into categories based upon stated political party preference (Republican, Democrat, or Other) or upon sex (Male or Female). In the political party preference system Republicans might be assigned the number "1", Democrats "2", and Others "3", while in the latter, females might be assigned the number "1" and males "2".
In general it is meaningless to find means, standard deviations, correlation coefficients, etc., when the data are nominal-categorical. If the mean for a sample based on the above system of political party preferences was 1.89, one would not know whether most respondents were Democrats or whether Republicans and Others were about equally split. This does not mean, however, that such systems of measurement are useless, for in combination with other measures they can provide a great deal of information.
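The ambiguity of a mean of 1.89 can be made concrete. In this Python sketch, two hypothetical samples of nine respondents are coded as in the text; one sample is mostly Democrats and the other contains no Democrats at all, yet the two means are identical.

```python
from statistics import mean

# Coding from the text: Republican = 1, Democrat = 2, Other = 3.
# Two hypothetical samples of nine respondents each.
mostly_democrats = [1] + [2] * 8       # 1 Republican, 8 Democrats
split_rep_other = [1] * 5 + [3] * 4    # 5 Republicans, 4 Others, no Democrats

print(round(mean(mostly_democrats), 2))  # 1.89
print(round(mean(split_rep_other), 2))   # 1.89
```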
An exception to the rule of not computing statistics on nominal-categorical scale types is when the data are dichotomous, or have two levels, such as Females = 1 and Males = 2. In this case it is appropriate both to compute and to interpret statistics that assume the interval property is met, because the single interval involved satisfies the requirement of the interval property.
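For a dichotomous variable the mean has a direct interpretation: with a 1/2 coding, the mean minus 1 is the proportion of cases coded 2. A sketch using the 15 gender codes from the shoe-size example in the next chapter (9 females, 6 males), coded Females = 1 and Males = 2 as above; the variable names are illustrative.

```python
from statistics import mean

# Gender of the 15 students in the shoe-size example, in data-file order.
genders = ["M", "F", "M", "F", "F", "M", "F", "M",
           "F", "M", "F", "F", "M", "F", "F"]
codes = [1 if g == "F" else 2 for g in genders]   # Females = 1, Males = 2

m = mean(codes)               # (9 * 1 + 6 * 2) / 15
proportion_male = m - 1       # with a 1/2 coding, mean - 1 = share coded 2
print(f"mean = {m:.2f}, proportion male = {proportion_male:.2f}")
# mean = 1.40, proportion male = 0.40
```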
Ordinal scales are measurement systems that possess the property of magnitude, but not the property of intervals. The property of rational zero is not important if the property of intervals is not satisfied. Any time ordering, ranking, or rank ordering is involved, the possibility of an ordinal scale should be examined. As with a nominal scale, computation of most of the statistics described in the rest of the book is not appropriate when the scale type is ordinal. Rank ordering people in a classroom according to height and assigning the shortest person the number "1", the next shortest person the number "2", and so on, is an example of an ordinal scale.
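Rank ordering can be sketched in a few lines of Python. The heights below are hypothetical and assume no ties; the output shows that ranks preserve magnitude but not intervals, since a 5-inch difference and a 1-inch difference both become a one-rank difference.

```python
# Hypothetical heights in inches; assumes no two students are the same height.
heights = {"Ann": 62, "Bob": 70, "Cal": 59, "Dee": 71, "Eve": 65}

# Shortest person gets rank 1, next shortest rank 2, and so on.
order = sorted(heights, key=heights.get)
ranks = {name: rank for rank, name in enumerate(order, start=1)}
print(ranks)  # {'Cal': 1, 'Ann': 2, 'Eve': 3, 'Bob': 4, 'Dee': 5}

# Magnitude is preserved, intervals are not: a 5-inch gap and a 1-inch gap
# both become a gap of one rank.
print(heights["Bob"] - heights["Eve"], ranks["Bob"] - ranks["Eve"])  # 5 1
print(heights["Dee"] - heights["Bob"], ranks["Dee"] - ranks["Bob"])  # 1 1
```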
Interval scales are measurement systems that possess the properties of magnitude and intervals, but not the property of rational zero. It is appropriate to compute the statistics described in the rest of the book when the scale type is interval.
Ratio scales are measurement systems that possess all three properties: magnitude, intervals, and rational zero. The added power of a rational zero allows ratios of numbers to be meaningfully interpreted; for example, the statement that the ratio of John's height to Mary's height is 1.32 is meaningful, whereas ratio statements are not meaningful with interval scales.
It is at this point that most introductory statistics textbooks end the discussion of measurement systems, and in most cases never discuss the topic again. Taking an opposite tack, some books organize the entire text around what is believed to be the appropriate statistical analysis for a particular scale type. The organization of measurement systems into a rigorous scale type classification leads to considerable difficulties. The remainder of this chapter points out those difficulties and suggests a possible reconceptualization of measurement systems.
EXERCISES IN CLASSIFYING SCALE TYPES
The following is a list of different attributes and rules for assigning numbers to objects. Try to classify the different measurement systems into one of the four types of scales before reading any further.
1. Your checking account number as a name for your account.
2. Your checking account balance as a measure of the amount of money you have in that account.
3. Your checking account balance as a measure of your wealth.
4. The number you get from a machine (32, 33, ...) as a measure of the time you arrived in line.
5. The order in which you were eliminated in a spelling bee as a measure of your spelling ability.
6. Your score on the first statistics test as a measure of your knowledge of statistics.
7. Your score on an individual intelligence test as a measure of your intelligence.
8. The distance around your forehead measured with a tape measure as a measure of your intelligence.
9. A response to the statement "Abortion is a woman's right" where "Strongly Disagree" = 1, "Disagree" = 2, "No Opinion" = 3, "Agree" = 4, and "Strongly Agree" = 5, as a measure of attitude toward abortion.
If you encountered difficulty in classifying some of the above measurement systems, it does not mean that you lack understanding of the scale types. The problem resides in the method used to describe measurement systems; it simply does not work in many applied systems. Michell (1986) presents a recent discussion of the controversy, still present in psychology, involving scale types and statistics.
Usually the difficulty begins in situation #3. Is it not possible that John has less money in the bank than Mary, but John has more wealth? Perhaps John has a pot of gold buried in his back yard, or perhaps he just bought a new car. Therefore the measurement system must be nominal-renaming. But if Mary has $1,000,000 in her account and John has $10, isn't it more likely that Mary has more wealth? Doesn't knowing a person's bank account balance tell you something about their wealth? It just doesn't fit within the system.
Similar types of arguments may be presented with respect to the testing situations. Is it not possible that someone might score higher on a test yet know less or be less intelligent? Of course, maybe they didn't get enough sleep the night before or simply studied the wrong thing. On the other hand, maybe they were lucky; whenever they guessed they got the item correct. Should test scores not be used because they do not meet the requirements of an interval scale?
What about measuring intelligence with a tape measure? Many psychologists would argue that it is interval or ratio measurement, because it involves distance measured with a standard instrument. Not so. If your child were going to be placed in a special education classroom, would you prefer that the decision be made based on the results of a standardized intelligence test, or the distance around his or her forehead? The latter measure is nonsense, or almost entirely error.
Suppose a ruler was constructed in a non-industrialized country in the following manner: peasants were given a stick of wood and sat painting interval markings and numbers on the wood. Would anything measured with this ruler be an interval scale? No, because no matter how good the peasant was at this task, the intervals would simply not be equal.
Suppose one purchases a ruler at a local department store. This ruler has been mass-produced at a factory in the United States. Would anything measured with this ruler be measured on an interval scale? No again, although it may be argued that it would certainly be closer than the painted ruler.
Finally, suppose one purchases a very precise Swiss caliper, measuring to the nearest 1/10,000 of an inch. Would it be possible to measure anything on precisely an interval scale using this instrument? Again the answer is "no", although it is certainly the closest system presented thus far.
Suppose a molecular biochemist wished to measure the distance between molecules. Would the Swiss caliper work? Is it not possible to think of situations where the painted ruler might work? Certainly the ruler made in the United States would be accurate enough to measure a room to decide how much carpet to order. The point is that in reality, unless a system is based on simple counting, an interval scale does not exist. The requirement that all measures be an interval or ratio scale before performing statistical operations makes no sense at all.
TOWARD A RECONCEPTUALIZATION OF SCALE TYPES
Measurement, as a process in which the symbols of the language of algebra are given meaning, is one aspect of the modeling process described in the chapter on models. Remembering the definition of a model as a representation of the "essential structure" of the real world, and not the complete structure, one would not expect that any system of measurement would be perfect. The argument that an interval scale does not exist in reality is not surprising viewed from this perspective.
The critical question is not whether a scale is nominal, ordinal, interval, or ratio, but rather whether it is useful for the purposes at hand. A measurement system may be "better" than another system because it more closely approximates the properties necessary for algebraic manipulation or costs less in terms of money or time, but no measurement system is perfect. In the view of the author, S. S. Stevens has greatly added to the understanding of measurement with the discussion of properties of measurement, but an unquestioning acceptance of scale types has blurred important concepts related to measurement for decades.
A discussion of error in measurement systems is perhaps a more fruitful manner of viewing measurement than scale types.
Different measurement systems exhibit greater or lesser amounts of different types of error. A complete discussion of measurement error remains for future study by the student.
The bottom line with respect to the theory of measurement that is of concern to the introductory statistics student is that certain assumptions are made, but never completely satisfied, with respect to the meaning of numbers. An understanding of these assumptions will allow the student to better evaluate whether a particular system will be useful for the purposes intended.
As discussed earlier, there are two major means of summarizing a set of numbers: pictures and summary numbers. Each method has advantages and disadvantages and use of one method need not exclude the use of the other. This chapter describes drawing pictures of data, which are called frequency distributions.
The first step in drawing a frequency distribution is to construct a frequency table. A frequency table is a way of organizing the data by listing every possible score (including those not actually obtained in the sample) as a column of numbers and the frequency of occurrence of each score as another. Computing the frequency of a score is simply a matter of counting the number of times that score appears in the set of data. It is necessary to include scores with zero frequency in order to draw the frequency polygons correctly.
For example, consider the following set of 15 scores which were obtained by asking a class of students their shoe size, shoe width, and sex (male or female).
Example Data
Shoe Size   Shoe Width   Gender
10.5        B            M
6.0         B            F
9.5         D            M
8.5         A            F
7.0         B            F
10.5        C            M
7.0         C            F
8.5         D            M
6.5         B            F
9.5         C            M
7.0         B            F
7.5         B            F
9.0         D            M
6.5         A            F
7.5         B            F
The same data entered into a data file in SPSS appears as follows:
To construct a frequency table, start with the smallest shoe size and list every possible shoe size, up to the largest, as a column of numbers. The frequency of occurrence of each shoe size is written to its right.
Frequency Table of Example Data
Shoe Size   Absolute Frequency
6.0         1
6.5         2
7.0         3
7.5         2
8.0         0
8.5         2
9.0         1
9.5         2
10.0        0
10.5        2
Total       15
Note that the sum of the column of frequencies is equal to the number of scores or size of the sample (N = 15). This is a necessary, but not sufficient, condition for ensuring that the frequency table has been correctly calculated. It is not sufficient because two errors could have been made, canceling each other out.
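The procedure can also be sketched in Python as an alternative to the SPSS steps that follow. This is a minimal illustration, not the author's method; the helper `frequency_table` and its `step` parameter are hypothetical, and the sketch assumes the step between possible scores is exactly representable as a floating-point number, which 0.5 is.

```python
from collections import Counter

# Shoe sizes from the example data (N = 15).
shoe_sizes = [10.5, 6.0, 9.5, 8.5, 7.0, 10.5, 7.0, 8.5,
              6.5, 9.5, 7.0, 7.5, 9.0, 6.5, 7.5]


def frequency_table(scores, step):
    """List every possible score from the smallest to the largest in
    increments of `step`, including scores that were never observed."""
    counts = Counter(scores)
    low, high = min(scores), max(scores)
    n_values = round((high - low) / step) + 1
    values = [low + k * step for k in range(n_values)]
    return [(v, counts.get(v, 0)) for v in values]


table = frequency_table(shoe_sizes, step=0.5)
for size, freq in table:
    print(f"{size:5.1f} {freq:3d}")
print("Total", sum(freq for _, freq in table))  # Total 15
```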
While people think of their shoe size as a discrete unit, a shoe size is actually an interval of sizes. A given shoe size may be considered the midpoint of the interval. The real limits of the interval, the two points which function as cutoff points for a given shoe size, are the midpoints between adjacent shoe sizes. For example, a shoe size of 8.0 is really an interval of shoe sizes ranging from 7.75 to 8.25. The smaller value is called the lower real limit, while the larger is called the upper real limit. In each case, the limit is found by taking the midpoint between the nearest score values. For example, the lower limit of 7.75 was found as the average (midpoint) of 7.5 and 8.0 by adding the values together and dividing by two: (7.5 + 8.0) / 2 = 15.5 / 2 = 7.75. A similar operation was performed to find the upper real limit of 8.25, that is, the midpoint of 8.0 and 8.5.
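The real-limit arithmetic can be written as a small function. This sketch assumes that adjacent possible scores are a constant `step` apart (0.5 for shoe sizes); `real_limits` is a hypothetical name.

```python
def real_limits(score, step=0.5):
    """Return (lower, upper) real limits: the midpoints between `score`
    and the neighbouring possible scores one `step` below and above it."""
    below, above = score - step, score + step
    return (below + score) / 2, (score + above) / 2


print(real_limits(8.0))   # (7.75, 8.25)
print(real_limits(10.5))  # (10.25, 10.75)
```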
To generate a frequency table using the SPSS package, select STATISTICS and FREQUENCIES as illustrated below:
In the frequencies box, select the variable name used for shoe size and the following choices:
The listing of the results of the analysis should contain the following:
The information contained in the frequency table may be transformed to a graphical or pictorial form. No information is gained or lost in this transformation, but the human information processing system often finds the graphical or pictorial presentation easier to comprehend. There are two major means of drawing a graph: histograms and frequency polygons. The choice of method is often a matter of convention, although there are times when one or the other is clearly the appropriate choice.
A histogram is drawn by plotting the scores (midpoints) on the X-axis and the frequencies on the Y-axis. A bar is drawn for each score value, the width of the bar corresponding to the real limits of the interval and the height corresponding to the frequency of the occurrence of the score value. An example histogram is presented below for the book example. Note that although there were no individuals in the example with shoe sizes of 8.0 or 10.0, those values are still included on the X-axis, with the bar for these values having no height.
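As a sketch of the same construction without SPSS, the following Python computes one bar per score from the frequency table of the example data, running from its lower to its upper real limit, with height equal to the frequency, and prints a rough sideways text version. The variable names are hypothetical; a plotting library could draw the bars from the same `(left, right, height)` triples.

```python
# Frequency table of the example data (shoe size, absolute frequency),
# including the zero-frequency sizes 8.0 and 10.0.
table = [(6.0, 1), (6.5, 2), (7.0, 3), (7.5, 2), (8.0, 0),
         (8.5, 2), (9.0, 1), (9.5, 2), (10.0, 0), (10.5, 2)]
step = 0.5  # distance between adjacent possible shoe sizes

# Each bar runs from the lower to the upper real limit; its height is the frequency.
bars = [(size - step / 2, size + step / 2, freq) for size, freq in table]

# A quick sideways text rendering: one row per score, one '#' per case.
for left, right, height in bars:
    print(f"{left:5.2f}-{right:5.2f} | {'#' * height}")
```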
The figure above was drawn using the SPSS computer package. Included in the output from the frequencies command described above was a histogram of shoe size. Unfortunately, the program automatically groups the data into intervals as described in Chapter 9. In order to generate a figure like the one above, the figure on the listing must be edited. To edit a figure in the listing file, place the cursor (arrow) on the figure and click the right mouse button. When a menu appears, select the last entry on the list as follows:
Edit the graph selecting the following options:
If the data are nominal-categorical in form, the histogram is similar, except that the bars do not touch. The example below presents the data for shoe width, assuming that it is not interval in nature. The example was drawn using the example SPSS data file and the Bar Graph command.
When the data are nominal-categorical in form, the histogram is the only appropriate form for the picture of the data. When the data may be assumed to be interval, the histogram can sometimes have a large number of lines, called data ink, which make comprehension of the graph difficult. A frequency polygon is often preferred in these cases because much less ink is needed to present the same amount of information.
In some instances artists attempt to "enhance" a histogram by adding extraneous data ink. Two examples of this sort of excess were taken from the local newspaper. In the first, the arm and building add no information to the illustration; San Francisco is practically hidden, and no building is presented for Honolulu. In the second, the later date is presented spatially before the earlier date, and the size of the "bar", or window in this case, has no relationship to the number being portrayed. These types of renderings should be avoided at all costs by anyone who, by the slightest stretch of the imagination, might call themselves "statistically sophisticated." An excellent source of information about the visual display of quantitative information is Tufte (1983).
An absolute frequency polygon is drawn exactly like a histogram except that points are drawn rather than bars. The X-axis begins with the midpoint of the interval immediately below the lowest interval and ends with the midpoint of the interval immediately above the highest interval. In the example, this means that the score values of 5.5 and 11.0 appear on the X-axis. The frequency polygon is drawn by plotting a point on the graph at the intersection of the midpoint of each interval and the height of its frequency. When the points are plotted, the dots are connected with lines, resulting in a frequency polygon. An absolute frequency polygon of the data in the book example is presented below.
Note that when the frequency for a score is zero, as is the case for the shoe sizes of 8.0 and 10.0, the line goes down to the X-axis. Failing to go down to the X-axis when the frequency is zero is the most common error students make in drawing non-cumulative frequency polygons.
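The padding rule for a non-cumulative polygon can be sketched as follows, assuming the frequency table of the example data and a constant step of 0.5 between possible shoe sizes. `polygon_points` is a hypothetical helper whose output could be passed to any line-plotting tool.

```python
# Frequency table of the example data, zero-frequency sizes included.
table = [(6.0, 1), (6.5, 2), (7.0, 3), (7.5, 2), (8.0, 0),
         (8.5, 2), (9.0, 1), (9.5, 2), (10.0, 0), (10.5, 2)]
step = 0.5


def polygon_points(table, step):
    """Points (midpoint, frequency) for a non-cumulative frequency polygon,
    padded with a zero-frequency point one step below the lowest score and
    one step above the highest, so the line starts and ends on the X-axis."""
    first, last = table[0][0], table[-1][0]
    return [(first - step, 0)] + list(table) + [(last + step, 0)]


points = polygon_points(table, step)
print(points[0], points[-1])  # (5.5, 0) (11.0, 0)
```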
As of yet, I have been unable to find a means to directly draw a frequency polygon using the SPSS graphics commands. It was not possible to instruct the computer package to include the points on the X-axis where the frequency goes down to zero. (I might be willing to reward with extra credit the student who discovers a direct method.)
The absolute frequency polygon drawn above used an indirect method in SPSS. A new data set was constructed from the frequency table as follows:
The graph was drawn by selecting graphics and then line as follows (note that the case button is selected):
The next screen selects the columns to use in the display. All the following graphs will be created in a similar manner by selecting different variables as rows and columns.
In order to draw a relative frequency polygon, the relative frequency of each score interval must first be calculated and placed in the appropriate column in the frequency table.
The relative frequency of a score is another name for the proportion of scores that have a particular value. The relative frequency is computed by dividing the frequency of a score by the number of scores (N). The additional column of relative frequencies is presented below for the data in the book example.
Frequency Table of Example Data
Shoe Size   Absolute Frequency   Relative Frequency
6.0         1                    .07
6.5         2                    .13
7.0         3                    .20
7.5         2                    .13
8.0         0                    .00
8.5         2                    .13
9.0         1                    .07
9.5         2                    .13
10.0        0                    .00
10.5        2                    .13
Total       15                   .99
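The relative frequency column, and the reverse transformation discussed in the text, can be sketched in Python using the example's absolute frequencies. The rounded proportions sum to .99, as in the table, only because each entry was rounded to two places.

```python
table = [(6.0, 1), (6.5, 2), (7.0, 3), (7.5, 2), (8.0, 0),
         (8.5, 2), (9.0, 1), (9.5, 2), (10.0, 0), (10.5, 2)]
n = sum(freq for _, freq in table)                      # N = 15

relative = [(size, freq / n) for size, freq in table]   # proportion at each size
rounded = [round(rf, 2) for _, rf in relative]
print(rounded)

# The rounded column sums to .99, as in the table; the unrounded values sum to 1.
print(round(sum(rounded), 2), round(sum(rf for _, rf in relative), 2))

# Inverse transformation: multiplying by N recovers the absolute frequencies.
back = [round(rf * n) for _, rf in relative]
print(back)
```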
The relative frequency polygon is drawn exactly like the absolute frequency polygon except the Y-axis is labeled and incremented with relative frequency rather than absolute frequency. The frequency distribution pictured below is a relative frequency polygon. Note that it appears almost identical to the absolute frequency polygon.
A relative frequency may be transformed back into an absolute frequency by the inverse transformation, that is, by multiplying by the number of scores (N). For this reason the size of the sample on which the relative frequency is based is usually presented somewhere on the graph. Generally speaking, relative frequency is more useful than absolute frequency, because the size of the sample has been taken into account.
Absolute Cumulative Frequency Polygons
An absolute cumulative frequency is the number of scores which fall at or below a given score value. It is computed by adding up the number of scores which are equal to or less than a given score value. The cumulative frequency may be found from the absolute frequency by either adding up the absolute frequencies of all scores smaller than or equal to the score of interest, or by adding the absolute frequency of a score value to the cumulative frequency of the score value immediately below it. The following is presented in tabular form.
Frequency Table of Example Data
Shoe Size   Absolute Frequency   Absolute Cumulative Frequency
6.0         1                    1
6.5         2                    3
7.0         3                    6
7.5         2                    8
8.0         0                    8
8.5         2                    10
9.0         1                    11
9.5         2                    13
10.0        0                    13
10.5        2                    15
Total       15
Note that the cumulative frequency of the largest score (10.5) is equal to the number of scores (N = 15). This will always be the case if the cumulative frequency is computed correctly. The computation of the cumulative frequency for the score value of 7.5 could be done by either adding up the absolute frequencies for the scores of 7.5, 7.0, 6.5, and 6.0, respectively 2 + 3 + 2 + 1 = 8, or adding the absolute frequency of 7.5, which is 2, to the absolute cumulative frequency of 7.0, which is 6, to get a value of 8.
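Both methods in the preceding paragraph give the same running total; Python's `itertools.accumulate` implements the second (each frequency added to the running total of the scores below it). A sketch using the example frequencies:

```python
from itertools import accumulate

sizes = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5]
freqs = [1, 2, 3, 2, 0, 2, 1, 2, 0, 2]

# Each cumulative frequency = this score's frequency + the cumulative
# frequency of the score immediately below it.
cum_freqs = list(accumulate(freqs))
for size, f, cf in zip(sizes, freqs, cum_freqs):
    print(f"{size:5.1f} {f:3d} {cf:3d}")

print(cum_freqs[-1] == sum(freqs))  # True: the last value equals N
```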
Plotting scores on the X-axis and the absolute cumulative frequency on the Y-axis draws the cumulative frequency polygon. The points are plotted at the intersection of the upper real limit of the interval and the absolute cumulative frequency. The upper real limit is used in all cumulative frequency polygons because of the assumption that not all of the scores in an interval are accounted for until the upper real limit is reached. The book example of an absolute cumulative frequency polygon is presented below.
A cumulative frequency polygon will always be monotonically increasing, a mathematician's way of saying that the line will never go down, but will either stay at the same level or increase. The line will be horizontal when the absolute frequency of the score is zero, as is the case for the score value of 8.0 in the book example. When the highest score is reached, i.e., at 10.5, the line continues horizontally forever from that point. The cumulative frequency polygon, while displaying exactly the same amount of information as the absolute frequency distribution, expresses the information as a rate of change. The steeper the slope of the cumulative frequency polygon, the greater the rate of change. The slope of the example cumulative polygon is steepest between the values of 6.75 and 7.25, indicating the greatest number of scores between those values.
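The placement of the cumulative points and the steepest segment can be checked numerically. This sketch places points at the upper real limit of each interval, as the text does, and additionally assumes that the polygon starts at zero at the lower real limit of the lowest interval (5.75); the variable names are hypothetical.

```python
sizes = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5]
cum_freqs = [1, 3, 6, 8, 8, 10, 11, 13, 13, 15]
step = 0.5

# Cumulative points sit at the UPPER real limit of each interval; the polygon
# is assumed to start at zero at the lower real limit of the lowest interval.
xs = [sizes[0] - step / 2] + [s + step / 2 for s in sizes]
ys = [0] + cum_freqs

# The polygon never goes down (monotonically increasing in the text's sense).
assert all(b >= a for a, b in zip(ys, ys[1:]))

# Slope of each segment; the steepest one marks the greatest rate of change.
slopes = [(ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k]) for k in range(len(xs) - 1)]
k = max(range(len(slopes)), key=slopes.__getitem__)
print(xs[k], xs[k + 1])  # 6.75 7.25
```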
Rate of change information may be easier to comprehend if the score values involve a measure of time. The cumulative records of rats' bar pressing drawn by behavioral psychologists are absolute cumulative polygons, as are some of the graphs in developmental psychology, such as those of the cumulative vocabulary of children.
The first step in drawing the relative cumulative polygon is computing the relative cumulative frequency; that is, dividing the absolute cumulative frequency by the number of scores (N). The result is the proportion of scores that fall at or below a given score. The relative cumulative frequency becomes:
Frequency Table of Example Data
Shoe Size   Absolute Frequency   Absolute Cum Freq   Relative Cum Freq
6.0         1                    1                   .07
6.5         2                    3                   .20
7.0         3                    6                   .40
7.5         2                    8                   .53
8.0         0                    8                   .53
8.5         2                    10                  .67
9.0         1                    11                  .73
9.5         2                    13                  .87
10.0        0                    13                  .87
10.5        2                    15                  1.00
Total       15
Drawing the X-axis as before and the relative cumulative frequency on the Y-axis draws the relative cumulative frequency polygon directly from the preceding table. Points are plotted at the intersection of the upper real limit and the relative cumulative frequency. The graph that results from the book example is presented below.
Note that the absolute and relative cumulative frequency polygons are identical except for the Y-axis. Note also that the value of 1.00 is the largest relative cumulative frequency, and the highest point on the polygon.
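The relative cumulative column can be computed from the absolute cumulative frequencies by dividing each by N. A short Python sketch using the example values; note that 1/15 = .0667 rounds to .07:

```python
cum_freqs = [1, 3, 6, 8, 8, 10, 11, 13, 13, 15]
n = cum_freqs[-1]  # the last cumulative frequency equals N

rel_cum = [round(cf / n, 2) for cf in cum_freqs]
print(rel_cum)
# [0.07, 0.2, 0.4, 0.53, 0.53, 0.67, 0.73, 0.87, 0.87, 1.0]
```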
COMPARING FREQUENCY DISTRIBUTIONS
When one variable is assumed to be measured on an interval scale, and another is dichotomous, that is, has only two levels, it is possible to illustrate the relationship between the variables by drawing overlapping frequency distributions. In the data presented in the preceding chapter, shoe size could be treated as an interval measure and sex as a dichotomous variable with two levels, male and female. The relationship between sex and shoe size is thus an appropriate candidate for overlapping frequency distributions. Overlapping frequency distributions would be useful for two reasons: males wear different styles of shoes than females, and male and female shoe sizes are measured using different scales.
The first step in drawing overlapping frequency distributions is to partition the measured variable into two subsamples, one for each level of the dichotomous variable. In the example, shoe sizes are grouped into male and female groups as follows:
Males: 10.5, 9.5, 10.5, 8.5, 9.5, 9.0
Females: 6.0, 8.5, 7.0, 7.0, 6.5, 7.0, 7.5, 6.5, 7.5
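The partition and the per-group frequency tables described in the text can be sketched in Python. The `(shoe size, gender)` pairs are the example data; the dictionary layout and names are hypothetical. Relative frequencies are divided by each group's own size (6 males, 9 females), not by the overall N of 15.

```python
from collections import Counter

# (shoe size, gender) pairs from the example data.
data = [(10.5, "M"), (6.0, "F"), (9.5, "M"), (8.5, "F"), (7.0, "F"),
        (10.5, "M"), (7.0, "F"), (8.5, "M"), (6.5, "F"), (9.5, "M"),
        (7.0, "F"), (7.5, "F"), (9.0, "M"), (6.5, "F"), (7.5, "F")]
sizes = [6.0 + 0.5 * k for k in range(10)]  # every possible size, 6.0 to 10.5

# Step 1: partition the measured variable by the dichotomous variable.
groups = {g: [s for s, sex in data if sex == g] for g in ("M", "F")}

# Step 2: a separate frequency table for each subsample; relative
# frequencies use that GROUP's size as the denominator.
rel_freq = {}
for g, scores in groups.items():
    counts = Counter(scores)
    rel_freq[g] = [round(counts.get(s, 0) / len(scores), 2) for s in sizes]
    print(g, len(scores), rel_freq[g])
```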
A separate frequency table is then computed for each subsample. The example results in the following frequency tables.
Shoe Size Broken Down by Gender
            Males                 Females
Shoe Size   Abs Freq   Rel Freq   Abs Freq   Rel Freq
6.0         0          .00        1          .11
6.5         0          .00        2          .22
7.0         0          .00        3          .33
7.5         0          .00        2          .22
8.0         0          .00        0          .00
8.5         1          .17        1          .11
9.0         1          .17        0          .00
9.5         2          .33        0          .00
10.0        0          .00        0          .00
10.5        2          .33        0          .00
Total       6          1.00       9          .99
Note that the relative frequency is computed by dividing the absolute frequency by the number of scores in that group. For example, the relative frequency of shoe size 9.5 for males is .33, found by dividing 2, the number of males wearing a size 9.5 shoe, by 6, the total number of males. The sum of the relative frequencies for each gender must equal 1.00, within rounding error.
To draw overlapping relative frequency polygons using SPSS/WIN, enter the relative frequency table as data. The example appears below:
OVERLAPPING FREQUENCY POLYGONS
The overlapping relative frequency polygons are simply the two polygons for each group drawn on the same set of axes, distinguished with different types of lines. If conflicts appear, they may be resolved by drawing the lines next to one another. An example of overlapping relative frequency polygons is presented below.
When polygons are drawn in this form, they may be easily compared with respect to their centers, shapes, continuity, etc. In addition, overlapping relative cumulative frequency polygons may give additional information about how two distributions are similar or different. In many ways the overlapping cumulative frequency polygons are easier to interpret because the lines do not jump up and down as much as in the non-cumulative polygons.
The procedure to construct the preceding graphs in SPSS/WIN is first to enter a frequency table as described above and then to select the GRAPHICS and line... options from the toolbar. A multiline graph will generate the desired results. The commands necessary to generate the overlapping relative cumulative frequency polygons are illustrated below:
Frequency tables of two variables presented simultaneously are called contingency tables. Although this rule is sometimes broken, contingency tables are generally appropriate for variables that have five or fewer levels, or different values.
More than five levels, while not technologically incorrect, may result in tables that are very difficult to read and should be used with caution. Contingency tables are constructed by listing all the levels of one variable as rows in a table and the levels of the other variables as columns. For example, the labeling of the contingency table of sex by shoe width is presented below. The second step in computing the contingency table is to find the joint or cell frequency for each cell. For example, the cell in the upper left corner contains the number of males who had shoe width of "A", which in this case is zero. In turn, each cell has its frequency counted and placed in the appropriate cell. The cell frequencies are then summed across both the rows and the columns. The sums are placed in the margins, the values of which are called marginal frequencies. The lower right hand corner value contains the sum of either the row or column marginal frequencies, which both of which must be equal to N. An example is presented below. The above is an absolute frequency table and may be converted to a relative frequency table by dividing the absolute cell frequency by the number of scores, which may be row marginals, column marginals, or overall frequency (N). In the case of the above example, computing relative frequencies with respect to the row marginals results in the following table. This table gives the proportion of males or females who have a given shoe width, and would probably be most useful in ordering shoes. Computing the cell proportions using the column marginals, expressing the proportion of each shoe width which was male or female, is probably not as useful, but is shown below as a second possibility. Contingency tables are a convenient means of showing the relationship between two variables. When relative frequencies are computed, useful information about the distribution of a single variable over levels of another variable may be presented. 
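The steps for building a contingency table (count each cell, sum the rows and columns into marginals, check that both sets of marginals total N, then divide by a marginal) can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure the text describes; the function names and the (sex, width) observations are hypothetical and are not the data behind the tables above. Dividing by the row marginal is the same operation that produced the per-gender relative frequencies for shoe size.

```python
from collections import Counter

def contingency_table(pairs, row_levels, col_levels):
    """Cross-tabulate (row, column) pairs into cell frequencies plus marginals."""
    cells = Counter(pairs)  # missing cells count as 0
    table = {r: {c: cells[(r, c)] for c in col_levels} for r in row_levels}
    row_marg = {r: sum(table[r].values()) for r in row_levels}
    col_marg = {c: sum(table[r][c] for r in row_levels) for c in col_levels}
    n = sum(row_marg.values())
    # Both sets of marginal frequencies must add up to N.
    assert n == sum(col_marg.values()) == len(pairs)
    return table, row_marg, col_marg, n

def row_relative(table, row_marg):
    """Divide each cell by its row marginal (proportion of each sex with a given width)."""
    return {r: {c: (f / row_marg[r] if row_marg[r] else 0.0) for c, f in cols.items()}
            for r, cols in table.items()}

def col_relative(table, col_marg):
    """Divide each cell by its column marginal (proportion of each width that is male or female)."""
    return {r: {c: (f / col_marg[c] if col_marg[c] else 0.0) for c, f in cols.items()}
            for r, cols in table.items()}

# Hypothetical observations for illustration only.
data = [("Male", "C"), ("Male", "D"), ("Male", "D"), ("Male", "E"),
        ("Female", "A"), ("Female", "B"), ("Female", "B"), ("Female", "C")]
sexes, widths = ["Male", "Female"], ["A", "B", "C", "D", "E"]

table, row_marg, col_marg, n = contingency_table(data, sexes, widths)
rel = row_relative(table, row_marg)
for s in sexes:
    print(s, table[s], row_marg[s], {w: round(p, 2) for w, p in rel[s].items()})
print("column marginals:", col_marg, "N =", n)
```

Listing the levels explicitly (rather than taking them from the data) keeps empty cells, such as males with width "A", in the table with a frequency of zero.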

GROUPED FREQUENCY DISTRIBUTIONS
An investigator interested in finger-tapping behavior conducts the following study: Students are asked to tap as fast as they can with their ring finger. The hand is cupped and all fingers except the one being tapped are placed on the surface. Either the right or the left hand is used, at the preference of the student. At the end of 15 seconds, the number of taps for each student is recorded. Example data using 18 subjects are presented below:
53 35 67 48 63 42 48 55 33 50 46 45 59 40 47 51 66 53
A data file in SPSS corresponding to the example data is presented below:
The frequency table resulting from this data would list 35 different score values, from the low score (33) through the high score (67); subtracting the low score from the high score gives the range, 34. A portion of this table is presented below:
# Taps   Absolute Frequency
33       1
34       0
35       1
...      ...
65       0
66       1
67       1
Total    18
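An ungrouped frequency table like the one above can be produced by counting every integer score from the lowest to the highest. The following is a minimal Python sketch using the 18 tapping scores from the text; `frequency_table` is a hypothetical helper, not part of SPSS. Listing every integer from 33 through 67 gives 35 rows, one more than the range of 34, because both endpoints are included.

```python
from collections import Counter

# Finger-tapping counts for the 18 students in the example.
taps = [53, 35, 67, 48, 63, 42, 48, 55, 33, 50, 46, 45, 59, 40, 47, 51, 66, 53]

def frequency_table(scores):
    """Absolute frequency of every integer score from the lowest to the highest."""
    counts = Counter(scores)  # scores that never occur get a count of 0
    return [(x, counts[x]) for x in range(min(scores), max(scores) + 1)]

table = frequency_table(taps)
print(len(table), "score values; range =", max(taps) - min(taps))
for x, f in table[:3] + table[-3:]:  # the first and last three rows
    print(x, f)
print("Total", sum(f for _, f in table))
```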
A histogram drawn using this data would appear as follows:
The above table and graph present all the information possible given the data. The problem is that so much information is presented that it is difficult to discern what the data is really like, or to "cognitively digest" the data. The graph is given the term "sawtoothed" because the many ups and downs give it the appearance of teeth on a saw. The great amount of data ink relative to the amount of information in the graph makes an alternative approach desirable. It is possible to give up some information (precision) about the data in order to gain understanding about distributions. This is the function of grouping data into intervals and drawing grouped frequency polygons.
The process of drawing grouped frequency distributions can be broken down into a number of interrelated steps: selecting the interval size, computing the frequency table, and drawing the grouped frequency histogram or polygon. Each will now be discussed in turn.
Selecting the interval size is more art than science. In order to find a starting interval size, the first step is to find the range of the data by subtracting the smallest score from the largest. In the case of the example data, the range was 67 - 33 = 34. The range is then divided by the number of desired intervals, with a suggested starting number of intervals being ten (10). In the example, the result would be 34/10 = 3.4. The nearest odd integer value is used as the starting point for the selection of the interval size. In the example the nearest odd integer would be 3.
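The rule of thumb for a starting interval size (range divided by ten, then moved to the nearest odd integer) can be written as a short function. This is a sketch under one assumption the text does not address: when the quotient is exactly an even number, so that two odd integers are equally near, the larger one is chosen. The function name is hypothetical.

```python
import math

def starting_interval_size(scores, n_intervals=10):
    """Return (range, range / n_intervals, nearest odd integer) as a starting interval size."""
    rng = max(scores) - min(scores)
    raw = rng / n_intervals
    # Every number in [2k, 2k + 2) lies within 1 of the odd integer 2k + 1,
    # so flooring raw / 2 picks the nearest odd value (ties go upward).
    return rng, raw, math.floor(raw / 2) * 2 + 1

taps = [53, 35, 67, 48, 63, 42, 48, 55, 33, 50, 46, 45, 59, 40, 47, 51, 66, 53]
print(starting_interval_size(taps))  # (34, 3.4, 3)
```

The result is only a starting point; as the text goes on to show, other interval sizes may express the data better for a given audience.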
After the interval size has been selected, the scale is then grouped into equal-sized intervals based on the interval size. The first interval will begin with a multiple of the interval size equal to, or smaller than, the smallest score. In the example the first interval would begin with the value of 33, a multiple of the interval size (3 * 11). In this case the beginning of the first interval equals the smallest score value.
The ending value of the first interval is computed by adding the interval size to the beginning of the first interval and subtracting the unit of measurement. In the example, the beginning of the first interval (33) plus the interval size (3) minus the unit of measurement (1) results in a value of 33 + 3 - 1 or 35. Thus the first interval would be 33 to 35. Sequentially adding the interval size to these values results in all other intervals, for example 36 to 38, 39 to 41, etc.
The values for the intervals just constructed are called the apparent limits of the intervals. In the first interval, for example, the value of 33 would be called the apparent lower limit, and the value of 35 would be the apparent upper limit.
The midpoints of the intervals are computed by adding the two apparent limits together and dividing by two. The midpoint for the interval 33 to 35 would thus be (33 + 35)/2 or 34. The midpoint for the second interval (36-38) would be 37.
The midpoints between midpoints are called real limits. Each interval has a real lower limit and a real upper limit. The interval 36-38 would therefore have a real lower limit of 35.5 and a real upper limit of 38.5. Please note that the difference between the real limits of an interval is equal to the interval size, that is 38.5 - 35.5 = 3. All this is easier than it first appears, as can be seen in the following grouping:
Interval   Apparent      Apparent      Real          Real          Midpoint
           Lower Limit   Upper Limit   Lower Limit   Upper Limit
33-35      33            35            32.5          35.5          34
36-38      36            38            35.5          38.5          37
39-41      39            41            38.5          41.5          40
42-44      42            44            41.5          44.5          43
45-47      45            47            44.5          47.5          46
48-50      48            50            47.5          50.5          49
51-53      51            53            50.5          53.5          52
54-56      54            56            53.5          56.5          55
57-59      57            59            56.5          59.5          58
60-62      60            62            59.5          62.5          61
63-65      63            65            62.5          65.5          64
66-68      66            68            65.5          68.5          67
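The grouping rules just described (start at a multiple of the interval size, end each interval one unit of measurement before the next begins, put real limits half a unit outside the apparent limits, and average the apparent limits for the midpoint) are mechanical enough to sketch in Python. This is a minimal illustration; `build_intervals` is a hypothetical helper, and it assumes the first interval starts at the largest multiple of the interval size that does not exceed the smallest score, which matches every example in the text.

```python
def build_intervals(scores, size, unit=1):
    """Apparent limits, real limits, and midpoints for equal-sized intervals."""
    low = (min(scores) // size) * size      # largest multiple of size <= smallest score
    rows = []
    while low <= max(scores):
        high = low + size - unit            # apparent upper limit
        rows.append({
            "interval": f"{low}-{high}",
            "apparent": (low, high),
            "real": (low - unit / 2, high + unit / 2),
            "midpoint": (low + high) / 2,
        })
        low += size                         # next interval starts one size later
    return rows

taps = [53, 35, 67, 48, 63, 42, 48, 55, 33, 50, 46, 45, 59, 40, 47, 51, 66, 53]
rows = build_intervals(taps, 3)
for r in rows:
    print(f"{r['interval']:<7}{r['apparent']}  {r['real']}  {r['midpoint']}")
```

For an interval size of 3 this reproduces the twelve rows of the table, from 33-35 (real limits 32.5 to 35.5, midpoint 34) through 66-68.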
The hard work is finished when the intervals have been selected. All that remains is the counting of the frequency of scores for each interval, and, if needed, computing the relative, cumulative, and relative cumulative frequencies for the intervals. The frequency table for intervals of size three for the example data is presented below:
Interval   Absolute Frequency
33-35      2
36-38      0
39-41      1
42-44      1
45-47      3
48-50      3
51-53      3
54-56      1
57-59      1
60-62      0
63-65      1
66-68      2
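Counting the scores that fall in each interval, and then deriving the relative, cumulative, and relative cumulative frequencies mentioned above, can be sketched as follows. The helper name is hypothetical; the columns printed are interval, absolute frequency, relative frequency, cumulative frequency, and relative cumulative frequency.

```python
from itertools import accumulate

def grouped_frequencies(scores, size, unit=1):
    """Interval labels and absolute frequencies for equal-sized intervals."""
    low = (min(scores) // size) * size
    labels, freqs = [], []
    while low <= max(scores):
        high = low + size - unit
        labels.append(f"{low}-{high}")
        freqs.append(sum(low <= x <= high for x in scores))  # scores inside the apparent limits
        low += size
    return labels, freqs

taps = [53, 35, 67, 48, 63, 42, 48, 55, 33, 50, 46, 45, 59, 40, 47, 51, 66, 53]
labels, freqs = grouped_frequencies(taps, 3)
n = sum(freqs)
cum = list(accumulate(freqs))  # running totals give the cumulative frequencies
for label, f, c in zip(labels, freqs, cum):
    print(f"{label:<6} {f:>2} {f / n:>5.2f} {c:>3} {c / n:>5.2f}")
```

The last cumulative frequency equals the number of scores (18), and the last relative cumulative frequency is therefore 1.00.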
The frequency histogram or polygon is drawn using the midpoints of the intervals plotted on the X-axis and the frequency on the Y-axis. An absolute frequency polygon of the example data is presented below:
The above histogram was generated using SPSS graphic commands. The graph was first generated by selecting the Graphics and histogram... commands. In order to select the appropriate interval, the resulting image was edited and the category axis was changed as follows:
All of the following histograms were generated in a similar manner. Selecting the appropriate interval size and real lower limit will produce the desired result.
SELECTING ANOTHER INTERVAL SIZE
The first interval selected might not be the interval which best expresses or illustrates the data. A larger interval will condense and simplify the data; a smaller interval will expand the data and make the picture more detailed. An alternative frequency table for the example data with an interval of 6 is presented below:
Interval   Apparent      Apparent      Real          Real          Midpoint   Absolute
           Lower Limit   Upper Limit   Lower Limit   Upper Limit              Frequency
30-35      30            35            29.5          35.5          32.5       2
36-41      36            41            35.5          41.5          38.5       1
42-47      42            47            41.5          47.5          44.5       4
48-53      48            53            47.5          53.5          50.5       6
54-59      54            59            53.5          59.5          56.5       2
60-65      60            65            59.5          65.5          62.5       1
66-71      66            71            65.5          71.5          68.5       2
Total                                                                         18
Note that for the first interval, the apparent lower limit is 30, the apparent upper limit is 35, the real lower limit is 29.5, the real upper limit is 35.5, and the midpoint is 32.5. The midpoint is not a unit of measurement, like 33, but a half unit, 32.5. A midpoint that falls between units of measurement is the result of the even interval size, six in this case. For this reason, odd interval sizes are preferred.
SELECTING THE APPROPRIATE INTERVAL SIZE
Selection of the appropriate interval size requires that the intended audience of the graph be constantly kept in mind. If the persons reading the graph are likely to give the picture a cursory glance, then the information must be condensed by selecting a larger interval size. If detailed information is necessary, then a smaller interval size must be selected. The selection of the interval size, therefore, is a trade-off between the amount of information present in the graph and the difficulty of reading the information.
Factors other than the interval size, such as the number of scores and the nature of the data, also affect the difficulty of the graph. Because of this, my recommendation is to select more than one interval size, draw the associated polygon for each, and use the resulting graph which best expresses the data for the purposes of the given audience. In this sense there are no absolutes in drawing frequency polygons.
An interactive exercise is available to explore how changes in interval size affect the frequency table, relative frequency polygon, and relative cumulative frequency polygon.
The frequency table and resulting histogram for the example data and an interval of size 5 are presented below:
Interval   Apparent      Apparent      Real          Real          Midpoint   Absolute
           Lower Limit   Upper Limit   Lower Limit   Upper Limit              Frequency
30-34      30            34            29.5          34.5          32         1
35-39      35            39            34.5          39.5          37         1
40-44      40            44            39.5          44.5          42         2
45-49      45            49            44.5          49.5          47         5
50-54      50            54            49.5          54.5          52         4
55-59      55            59            54.5          59.5          57         2
60-64      60            64            59.5          64.5          62         1
65-69      65            69            64.5          69.5          67         2
Total                                                                         18
In a like manner, the histograms for intervals of 7, 9, and 11 are now presented.
As can be seen, the shape of the distribution changes as different interval sizes are selected. In some cases, the distribution appears almost symmetric, while in others, the distribution appears skewed.
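One way to follow the recommendation of trying several interval sizes is to generate all of the grouped tables at once and compare them. The sketch below is an illustration only; `grouped_table` is a hypothetical helper, and it again assumes each first interval starts at the largest multiple of the interval size not exceeding the smallest score. For sizes 5 and 6 it reproduces the frequencies and midpoints in the tables above. The histograms for sizes 7, 9, and 11 are not reproduced here, so the printed counts for those sizes come only from this starting-point rule.

```python
def grouped_table(scores, size, unit=1):
    """Rows of (apparent lower, apparent upper, midpoint, frequency) for one interval size."""
    low = (min(scores) // size) * size
    rows = []
    while low <= max(scores):
        high = low + size - unit
        freq = sum(low <= x <= high for x in scores)
        rows.append((low, high, (low + high) / 2, freq))
        low += size
    return rows

taps = [53, 35, 67, 48, 63, 42, 48, 55, 33, 50, 46, 45, 59, 40, 47, 51, 66, 53]
tables = {size: grouped_table(taps, size) for size in (5, 6, 7, 9, 11)}
for size, rows in tables.items():
    # Even sizes give half-unit midpoints such as 32.5; odd sizes give whole units.
    print(f"size {size}:", [(f"{lo}-{hi}", mid, f) for lo, hi, mid, f in rows])
```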
A model of a frequency distribution is an algebraic expression describing the relative frequency (height of the curve) for every possible score. The questions that sometimes come to the mind of the student are "What is the advantage of this level of abstraction? Why is all this necessary?" The answers to these questions may be found in the following.
1.) A belief in the eminent simplicity of the world.
The belief that a few algebraic expressions can adequately model a large number of very different real world phenomena underlies much statistical thought. The fact that these models often work justifies this philosophical approach.
2.) Not all the data can be collected.
In almost all cases in the social sciences, it is not feasible to collect data on the entire population in which the researcher is interested. For instance, the individual opening a shoe store might want to know the shoe sizes of all persons living within a certain distance from the store, but not be able to collect all the data. If data from a subset or sample of the population of interest is used rather than the entire population, then repeating the data collection procedure would most likely result in a different set of numbers. A model of the distribution is used to give some consistency to the results.
For example, suppose that the distribution of shoe sizes collected from a sample of fifteen individuals resulted in the following relative frequency polygon.
If there are no individuals in the sample who wear size eight shoes, does that mean that the store owner should not stock the shelves with any size eight shoes? If a different sample was taken, would an individual who wore a size eight likely be included? Because the answer to the first question is no and the answer to the second is probably yes, some method of ordering shoes other than directly from the sample distribution must be used.
The alternative is a model of the frequency distribution, sometimes called a probability model or probability distribution. For example, suppose that the frequency polygon of shoe size for women actually looked like the following:
If this were the case the proportion (.12) or percentage (12%) of size eight shoes could be computed by finding the relative area between the real limits for a size eight shoe (7.75 to 8.25). The relative area is called probability.
The probability model attempts to capture the essential structure of the real world by asking what the world might look like if an infinite number of scores were obtained and each score was measured infinitely precisely. Nothing in the real world is exactly distributed as a probability model. However, a probability model often describes the world well enough to be useful in making decisions.
VARIATIONS OF PROBABILITY MODELS
The statistician has at his or her disposal a number of probability models to describe the world. Different models are selected for practical or theoretical reasons. Some examples of probability models follow.
The Uniform or Rectangular Distribution
The uniform distribution is shaped like a rectangle, where each score is equally likely. An example is presented below.
If the uniform distribution was used to model shoe size, it would mean that between the two extremes, each shoe size would be equally likely. If the store owner was ordering shoes, it would mean that an equal number of each shoe size would be ordered. In most cases this would be a very poor model of the real world, because at the end of the year a large number of large or small shoe sizes would remain on the shelves and the middle sizes would be sold out.
The Negative Exponential Distribution
The negative exponential distribution is often used to model real world events which are relatively rare, such as the occurrence of earthquakes. The negative exponential distribution is presented below:
Although less commonly used than the others, a triangular distribution could be created as follows:
It may be useful for describing some real world phenomena, but exactly what that would be is not known for sure.
The Normal Distribution or Normal Curve
The normal curve is one of a large number of possible distributions. It is very important in the social sciences and will be described in detail in the next chapter. An example of a normal curve was presented earlier as a model of shoe size.
PROPERTIES OF PROBABILITY DISTRIBUTIONS
1) Parameters.
Almost all of the useful models contain parameters. Recall from the chapter on models that parameters are variables within the model that must be set before the model is completely specified. Changing the parameters of a probability model changes the shape of the curve. The use of parameters allows a single general-purpose model to describe a wide variety of real world phenomena.
2) The Area Underneath the Curve Is Equal to One (1.0).
In order for an algebraic expression to qualify as a legitimate model of a distribution, the total area under the curve must be equal to one. This property is necessary for the same reason that the sum of the relative frequencies of a sample frequency table was equal to one. This property must hold true no matter what values are selected for the parameters of the model.
3) The Area Under the Curve Between any Two Scores is a PROBABILITY.
Although probability is a common term in the natural language, meaning likelihood or chance of occurrence, statisticians define it much more precisely. The probability of an event is the theoretical relative frequency of the event in a model of the population.
The models that have been discussed up to this point assume continuous measurement. That is, every score on the continuum of scores is possible, or there are an infinite number of scores. In this case, no single score can have a relative frequency because if it did, the total area would necessarily be greater than one. For that reason probability is defined over a range of scores rather than a single score. Thus a shoe size of 8.00 would not have a specific probability associated with it, although the interval of shoe sizes between 7.75 and 8.25 would.
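The point that probability belongs to a range of scores rather than a single score can be checked numerically: as the interval around 8.00 shrinks, the area under the curve shrinks toward zero. The sketch below uses Python's standard-library `statistics.NormalDist` as a stand-in model; the parameters (mean 7.0, standard deviation 1.1) are an assumption borrowed from the women's shoe size example in the next chapter, not from the figure above.

```python
from statistics import NormalDist

# Assumed model for illustration: normal, mu = 7.0, sigma = 1.1.
shoe = NormalDist(mu=7.0, sigma=1.1)

probs = []
for half_width in (0.25, 0.05, 0.01, 0.001):
    lo, hi = 8.0 - half_width, 8.0 + half_width
    p = shoe.cdf(hi) - shoe.cdf(lo)   # area between the two scores
    probs.append(p)
    print(f"P({lo:.3f} <= X <= {hi:.3f}) = {p:.5f}")

# A single point has no width, and therefore no area.
print("single score 8.00:", shoe.cdf(8.0) - shoe.cdf(8.0))
```

The first line uses the real limits of a size eight shoe (7.75 to 8.25); under this assumed model its probability is close to .12.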
As discussed in the previous chapter, the normal curve is one of a number of possible models of probability distributions. Because it is widely used and an important theoretical tool, it is given special status as a separate chapter.
The normal curve is not a single curve, rather it is an infinite number of possible curves, all described by the same algebraic expression:
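The expression itself is not reproduced in this version of the text. For reference, the normal curve is conventionally written as follows, with X the score, μ (mu) and σ (sigma) the two parameters, and π and e the constants discussed below:

```latex
f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^{2}}{2\sigma^{2}}}
```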
Upon viewing this expression for the first time the initial reaction of the student is usually to panic. Don't. In general it is not necessary to "know" this formula to appreciate and use the normal curve. It is, however, useful to examine this expression for an understanding of how the normal curve operates.
First, some symbols in the expression are simply numbers. These symbols include "2", "π", and "e". The latter two are irrational numbers whose decimal expansions never end, π equaling 3.1416... and e equaling 2.7183.... As discussed in the chapter on the review of algebra, it is possible to raise a "funny number", in this case "e", to a "funny power".
The second set of symbols which are of some interest includes the symbol "X", which is a variable corresponding to the score value. The height of the curve at any point is a function of X.
Thirdly, the final two symbols in the equation, "μ" (mu) and "σ" (sigma), are called PARAMETERS, or values which, when set to particular numbers, define which of the infinite number of possible normal curves one is dealing with. The concept of parameters is very important and considerable attention will be given them in the rest of this chapter.
The normal curve is called a family of distributions. Each member of the family is determined by setting the parameters (μ and σ) of the model to a particular value (number). Because the μ parameter can take on any value, positive or negative, and the σ parameter can take on any positive value, the family of normal curves is quite large, consisting of an infinite number of members. This makes the normal curve a general-purpose model, able to describe a large number of naturally occurring phenomena, from test scores to the size of the stars.
Similarity of Members of the Family of Normal Curves
All the members of the family of normal curves, although different, have a number of properties in common. These properties include: shape, symmetry, tails approaching but never touching the X-axis, and area under the curve.
All members of the family of normal curves share the same bell shape, given the X-axis is scaled properly. Most of the area under the curve falls in the middle. The tails of the distribution (ends) approach the X-axis but never touch, with very little of the area under them.
All members of the family of normal curves are bilaterally symmetrical. That is, if any normal curve was drawn on a two-dimensional surface (a piece of paper), cut out, and folded through the third dimension, the two sides would be exactly alike. Human beings are approximately bilaterally symmetrical, with a right and left side.
All members of the family of normal curves have tails that approach, but never touch, the X-axis. The implication of this property is that no matter how far one travels along the number line, in either the positive or negative direction, there will still be some area under any normal curve. Thus, in order to draw the entire normal curve one must have an infinitely long line. Because most of the area under any normal curve falls within a limited range of the number line, only that part of the line segment is drawn for a particular normal curve.
All members of the family of normal curves have a total area of one (1.00) under the curve, as do all probability models or models of frequency distributions. This property, in addition to the property of symmetry, implies that the area in each half of the distribution is .50 or one half.
Because area under a curve may seem like a strange concept to many introductory statistics students, a short intermission is proposed at this point to introduce the concept.
Area is a familiar concept. For example, the area of a square is s^2, or side squared; the area of a rectangle is length times height; the area of a right triangle is one-half base times height; and the area of a circle is π * r^2. It is valuable to know these formulas if one is purchasing such things as carpeting, shingles, etc.
Areas may be added or subtracted from one another to find some resultant area. For example, suppose one had an L-shaped room and wished to purchase new carpet. One could find the area by taking the total area of the larger rectangle and subtracting the area of the rectangle that was not needed, or one could divide the area into two rectangles, find the area of each, and add the areas together. Both procedures are illustrated below:
Finding the area under a curve poses a slightly different problem. In some cases there are formulas which directly give the area between any two points; finding these formulas is what integral calculus is all about. In other cases the areas must be approximated. All of the above procedures share a common theoretical underpinning, however.
Suppose a curve was divided into equally spaced intervals on the X-axis and a rectangle drawn corresponding to the height of the curve at any of the intervals. The rectangles may be drawn either smaller than the curve, or larger, as in the two illustrations below:
In either case, if the areas of all the rectangles were added together, the sum would be an approximation of the total area under the curve. In the case of the smaller rectangles, the sum would be too small; in the case of the larger rectangles, it would be too big. Taking the average would give a better approximation, but mathematical methods provide a better way.
A better approximation may be achieved by making the intervals on the X-axis smaller. Such an approximation is illustrated below, more closely approximating the actual area under the curve.
The actual area under the curve may be calculated by making the intervals infinitely small (letting the width of each interval shrink toward zero) and then computing the area. If this last statement seems a bit bewildering, you share the bewilderment with millions of introductory calculus students. At this point the introductory statistics student must say "I believe" and trust the mathematician or enroll in an introductory calculus course.
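The rectangle argument can be tried out directly. The sketch below computes the "smaller" and "larger" rectangle sums for the standard normal curve between -3 and +3, their average, and, for comparison, the area reported by Python's `statistics.NormalDist.cdf`. The helper name is hypothetical, and it assumes the curve rises or falls steadily within each interval, which holds here because 0 (the peak) is one of the interval edges.

```python
from statistics import NormalDist

def rectangle_areas(dist, a, b, n):
    """Sums of the smaller and the larger rectangles under dist's curve on [a, b], n intervals."""
    width = (b - a) / n
    smaller = larger = 0.0
    for i in range(n):
        left_h = dist.pdf(a + i * width)
        right_h = dist.pdf(a + (i + 1) * width)
        smaller += min(left_h, right_h) * width  # rectangle stays under the curve
        larger += max(left_h, right_h) * width   # rectangle reaches above the curve
    return smaller, larger

z = NormalDist()  # standard normal curve
exact = z.cdf(3) - z.cdf(-3)
for n in (6, 60, 600):
    lo, hi = rectangle_areas(z, -3, 3, n)
    print(f"n={n:>3}: smaller={lo:.4f} larger={hi:.4f} average={(lo + hi) / 2:.4f}")
print(f"area from the cdf: {exact:.4f}")
```

Each tenfold increase in the number of intervals shrinks the gap between the two sums by about a factor of ten, which is the limiting process the paragraph above describes.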
DRAWING A MEMBER OF THE FAMILY OF NORMAL CURVES
The standard procedure for drawing a normal curve is to draw a bell-shaped curve and an X-axis. A tick is placed on the X-axis corresponding to the highest point (middle) of the curve. Three ticks are then placed to both the right and left of the middle point. These ticks are equally spaced and include all but a very small portion of the area under the curve. The middle tick is labeled with the value of μ; sequential ticks to the right are labeled by adding the value of σ. Ticks to the left are labeled by subtracting the value of σ from μ for the three values. For example, if μ = 52 and σ = 12, then the middle value would be labeled with 52, points to the right would have the values of 64 (52 + 12), 76, and 88, and points to the left would have the values 40, 28, and 16. An example is presented below:
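The tick labels follow directly from the parameters: μ plus or minus one, two, and three times σ. The short sketch below generates them (the function name is hypothetical) and uses Python's `statistics.NormalDist` to confirm how much of the area the seven ticks span.

```python
from statistics import NormalDist

def normal_axis_ticks(mu, sigma, k=3):
    """Tick values mu - k*sigma, ..., mu, ..., mu + k*sigma for sketching a normal curve."""
    return [mu + i * sigma for i in range(-k, k + 1)]

print(normal_axis_ticks(52, 12))  # [16, 28, 40, 52, 64, 76, 88]

# Share of the total area lying between the outermost ticks (same for every normal curve).
z = NormalDist()
print(round(z.cdf(3) - z.cdf(-3), 4))  # 0.9973
```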
DIFFERENCES IN MEMBERS OF THE FAMILY OF NORMAL CURVES
Differences in members of the family of normal curves are a direct result of differences in values for parameters. The two parameters, μ and σ, each change the distribution in a different manner.
The first, μ, determines where the midpoint of the distribution falls. Changes in μ, without changes in σ, result in moving the distribution to the right or left, depending upon whether the new value of μ is larger or smaller than the previous value, but do not change the shape of the distribution. An example of how changes in μ affect the normal curve is presented below:
Changes in the value of σ, on the other hand, change the shape of the distribution without affecting the midpoint, because σ affects the spread or the dispersion of scores. The larger the value of σ, the more dispersed the scores; the smaller the value, the less dispersed. Perhaps the easiest way to understand how σ affects the distribution is graphically. The distribution below demonstrates the effect of increasing the value of σ:
Since this distribution was drawn according to the procedure described earlier, it appears similar to the previous normal curve, except for the values on the X-axis. This procedure effectively changes the scale and hides the real effect of changes in σ. Suppose the second distribution was drawn on a rubber sheet instead of a sheet of paper and stretched to twice its original length in order to make the two scales similar. Drawing the two distributions on the same scale results in the following graphic:
Note that the shape of the second distribution has changed dramatically, being much flatter than the original distribution. It cannot be as high as the original distribution because the total area under the curve must be constant, that is, 1.00. The second curve is still a normal curve; it is simply drawn on a different scale on the X-axis.
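The flattening can be quantified: the peak of a normal curve sits at μ and has height 1 / (σ√(2π)), so doubling σ halves the height, exactly what is needed to keep the total area at 1.00 when the curve is twice as wide. The sketch below checks this with Python's `statistics.NormalDist`, using μ = 52 and σ = 12 from the drawing example and an assumed doubled σ of 24.

```python
import math
from statistics import NormalDist

narrow = NormalDist(mu=52, sigma=12)
wide = NormalDist(mu=52, sigma=24)  # assumed: sigma doubled, mu unchanged

# Height of each curve at its midpoint, compared with 1 / (sigma * sqrt(2 * pi)).
print(narrow.pdf(52), 1 / (12 * math.sqrt(2 * math.pi)))
print(wide.pdf(52) / narrow.pdf(52))  # about 0.5: twice as wide, half as high
```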
A different effect on the distribution may be observed if the size of σ is decreased. Below, the new distribution is drawn according to the standard procedure for drawing normal curves:
Now both distributions are drawn on the same scale, as outlined immediately above, except in this case the sheet is stretched before the distribution is drawn and then released in order that the two distributions are drawn on similar scales:
Note that the distribution is much higher in order to maintain the constant area of 1.00, and the scores are much more closely clustered around the value of μ, or the midpoint, than before.
An interactive exercise is provided to demonstrate how the normal curve changes as a function of changes in μ and σ. The exercise starts by presenting a curve with μ = 70 and σ = 10. The student may change the value of μ from 50 to 90 by moving the scroll bar on the bottom of the graph. In a similar manner, the value of σ can be adjusted from 5 to 15 by changing the scroll bar on the right side of the graph.
FINDING AREA UNDER NORMAL CURVES
Suppose that when ordering shoes to restock the shelves in the store one knew that female shoe sizes were normally distributed with μ = 7.0 and σ = 1.1. Don't worry about where these values came from at this point; there will be plenty about that later. If the area under this distribution between 7.75 and 8.25 could be found, then one would know the proportion of size eight shoes to order. The values of 7.75 and 8.25 are the real limits of the interval of size eight shoes.
Finding the areas on the curve above is easy; simply enter the value of mu, sigma, and the score or scores into the correct boxes and click on a button on the display and the area appears. The following is an example of the use of the Normal Curve Area program and the reader should verify how the program works by entering the values in a separate screen.
To find the area below 7.75 on a normal curve with mu =7.0 and sigma=1.1 enter the following information and click on the button pointed to with the red arrow.
To find the area between scores, enter the low and high scores in the lower boxes and click on the box pointing to the "Area Between."
The area above a given score could be found with the above program by subtracting the area below the score from 1.00, the total area under the curve, or by entering the value as a "Low Score" in the bottom boxes and a correspondingly very large value as the "High Score." The following illustrates the latter method. The value of "12" is more than three sigma units above the mu of 7.0, so the area will include all but the smallest fraction of the desired area.
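Readers without access to the Normal Curve Area program can get the same kinds of areas from Python's standard-library `statistics.NormalDist` (Python 3.8 or later), whose `cdf` method returns the area below a score. This is a swapped-in tool, not the program the text describes; the sketch covers the area below a score, the area between two scores, and the area above a score computed both ways the text mentions.

```python
from statistics import NormalDist

# Women's shoe sizes modeled as normal with mu = 7.0 and sigma = 1.1 (values from the text).
shoes = NormalDist(mu=7.0, sigma=1.1)

below = shoes.cdf(7.75)                       # area below 7.75
between = shoes.cdf(8.25) - shoes.cdf(7.75)   # area between the real limits of size 8
above = 1.0 - shoes.cdf(8.25)                 # area above 8.25, via the total area of 1.00
above_12 = shoes.cdf(12) - shoes.cdf(8.25)    # the "very large High Score" method
print(f"below 7.75: {below:.4f}")
print(f"between 7.75 and 8.25: {between:.4f}")
print(f"above 8.25: {above:.4f} (using 12 as the High Score: {above_12:.4f})")
```

The area between 7.75 and 8.25 rounds to .12, and the two ways of finding the area above 8.25 agree to four decimal places because so little area lies beyond 12.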
In some applications of the normal curve, it will be necessary to find the scores that cut off some proportion or percentage of area of the normal distribution. For example, suppose one wished to know what two scores cut off the middle 75% of a normal distribution with μ = 123 and σ = 23. In order to answer questions of this nature, the Normal Curve Area program can be used as follows:
The results can be visualized as follows:
In a similar manner, the score value which cuts off a given bottom proportion of a normal curve can be found using the program. For example, a score of 138.52 cuts off the bottom .75 of a normal curve with mu = 123 and sigma = 23. This score was found using the Normal Curve Area program in the following manner.
The results can be visualized as follows:
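Finding the score that cuts off a given area is the inverse of finding an area, and `statistics.NormalDist.inv_cdf` does exactly that. In the sketch below (again a stand-in for the text's program), the middle 75% leaves 12.5% in each tail, so the cutoffs are the scores with .125 and .875 of the area below them.

```python
from statistics import NormalDist

dist = NormalDist(mu=123, sigma=23)  # parameters from the text's example

# Middle 75%: leave (1 - .75) / 2 = .125 of the area in each tail.
low, high = dist.inv_cdf(0.125), dist.inv_cdf(0.875)
print(f"middle 75%: {low:.2f} to {high:.2f}")

# Score with .75 of the area below it.
print(f"bottom .75: {dist.inv_cdf(0.75):.2f}")
```

The direct computation gives 96.54 and 149.46 for the middle 75% and 138.51 for the bottom .75; results from other programs may differ in the last digit because of rounding.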
The standard normal curve is a member of the family of normal curves with μ = 0.0 and σ = 1.0. The value of 0.0 was selected because the normal curve is symmetrical around μ and the number system is symmetrical around 0.0. The value of 1.0 for σ is simply a unit value. The X-axis on a standard normal curve is often relabeled and called Z scores.
There are three areas on a standard normal curve that all introductory statistics students should know. The first is that the total area below 0.0 is .50, as the standard normal curve is symmetrical like all normal curves. This result generalizes to all normal curves in that the total area below the value of mu is .50 on any member of the family of normal curves.
The second area that should be memorized is between Z-scores of -1.00 and +1.00. It is .68 or 68%.
The total area between plus and minus one sigma unit on any member of the family of normal curves is also .68.
The third area is between Z-scores of -2.00 and +2.00 and is .95 or 95%.
This area (.95) also generalizes to plus and minus two sigma units on any normal curve.
Knowing these areas allows computation of additional areas. For example, the area between a Z-score of 0.0 and 1.0 may be found by taking 1/2 the area between Z-scores of -1.0 and 1.0, because the distribution is symmetrical between those two points. The answer in this case is .34 or 34%. A similar logic and answer is found for the area between -1.0 and 0.0 because the standard normal distribution is symmetrical around the value of 0.0.
The area below a Z-score of 1.0 may be computed by adding .34 and .50 to get .84. The area above a Z-score of 1.0 may now be computed by subtracting the area just obtained from the total area under the distribution (1.00), giving a result of 1.00 - .84 or .16 or 16%.
The area between 1.0 and 2.0 requires additional computation. First, the area between 0.0 and 2.0 is 1/2 of .95 or .475. Because the .475 includes too much area, the area between 0.0 and 1.0 (.34) must be subtracted in order to obtain the desired result. The correct answer is .475 - .34 or .135.
Using a similar kind of logic to find the area between Z-scores of .5 and 1.0 will result in an incorrect answer because the curve is not symmetrical around .5. The correct answer must be something less than .17 (half of .34), because the curve is lower between .5 and 1.0 than between 0.0 and .5, so the area from 0.0 to 1.0 is not split evenly at .5. Because of this difficulty, such areas are best found using the program included in this text. Entering the following information will produce the correct answer:
The result can be seen graphically in the following:
The following formula is used to transform a given normal distribution into the standard normal distribution. It was much more useful when areas between and below scores were available only in tables of the standard normal distribution. It is included here both for historical reasons and because it will appear in a different form later in this text.
$$Z = \frac{X - \mu}{\sigma}$$
The normal curve is a family of distributions: an infinite number of possible probability models. Each member of the family is described by setting the parameters (μ and σ) of the distribution to particular values. The members of the family are similar in that they share the same shape, are symmetrical, and have a total area underneath of 1.00. They differ in where the midpoint of the distribution falls, determined by μ, and in the variability of scores around the midpoint, determined by σ. The area between any two scores, and the scores which cut off a given area, on any given normal distribution can be easily found using the program provided with this text.
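The program supplied with the text is not reproduced here. As a stand-in, here is a minimal sketch using Python's `statistics.NormalDist` (Python 3.8+) that shows both uses on an arbitrary member of the family: each score is converted with Z = (X - μ)/σ and the standard normal area is looked up, which should agree with asking the non-standard curve directly. The values μ = 100 and σ = 15 are illustrative assumptions, not from the text, and `z_score` is a hypothetical helper.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Transform a score on a normal curve with (mu, sigma) to the standard normal scale."""
    return (x - mu) / sigma

std_normal = NormalDist()

# Area between Z-scores of .5 and 1.0: less than .17, as argued in the text
area = std_normal.cdf(1.0) - std_normal.cdf(0.5)
print(round(area, 4))  # 0.1499

# A hypothetical member of the family: mu = 100, sigma = 15
mu, sigma = 100.0, 15.0
dist = NormalDist(mu, sigma)

# Area between scores of 115 and 130, computed two ways
via_z = std_normal.cdf(z_score(130, mu, sigma)) - std_normal.cdf(z_score(115, mu, sigma))
direct = dist.cdf(130) - dist.cdf(115)
print(round(via_z, 4), round(direct, 4))  # both about .1359

# The score that cuts off the lowest 95% of the area
cutoff = dist.inv_cdf(0.95)
print(round(cutoff, 1))  # about 124.7
```

`inv_cdf` answers the second kind of question in the paragraph above (the score that cuts off a given area), while `cdf` answers the first.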
It is necessary to enhance the language of algebra with an additional notational system in order to efficiently write some of the expressions which will be encountered in the next chapter on statistics. The notational scheme provides a means of representing both a large number of variables and the summation of an algebraic expression.
Suppose the following were scores made on the first homework assignment for five students in the class: 5, 7, 7, 6, and 8. These scores could be represented in the language of algebra by the symbols: V, W, X, Y, and Z. This method of representing a set of scores becomes unwieldy when the number of scores is greater than 26, so some other method of representation is necessary. The method of choice is called subscripted variables, written as X_{i}, where the X is the variable name and the i is the subscript. The subscript (i) is a "dummy" or counter variable in that it may take on values from 1 to N, where N is the number of scores, to represent which score is being described. In the case of the example scores, then, X_{1}=5, X_{2}=7, X_{3}=7, X_{4}=6, and X_{5}=8.
If one wished to represent the scores made on the second homework by these same students, the symbol Y_{i} could be used. The variable Y_{1} would be the score made by the first student, Y_{2} the second student, etc.
Very often in statistics an algebraic expression of the form X_{1}+X_{2}+X_{3}+...+X_{N} is used in a formula to compute a statistic. The three dots in the preceding expression mean that something is left out of the sequence and should be filled in when interpretation is done. It is tedious to write an expression like this very often, so mathematicians have developed a shorthand notation to represent a sum of scores, called the summation notation.
In what follows, the expression to the left of the equals sign is summation notation; the expression to the right gives its meaning in "longhand" notation:
$$\sum_{i=1}^{N} X_i = X_1 + X_2 + X_3 + \cdots + X_N$$
The expression is read, "the sum of X sub i from i equals 1 to N." It means "add up all the numbers." In the example set of five numbers, where N = 5, the summation could be written:
$$\sum_{i=1}^{5} X_i = 5 + 7 + 7 + 6 + 8 = 33$$
The "i=1" at the bottom of the summation sign tells where to begin the sequence of summation. If the expression were written with "i=3", the summation would start with the third number in the set. For example:
$$\sum_{i=3}^{N} X_i = X_3 + X_4 + \cdots + X_N$$
In the example set of numbers, this would give the following result:
$$\sum_{i=3}^{5} X_i = 7 + 6 + 8 = 21$$
The "N" at the top of the summation sign tells where to end the sequence of summation. If there were only three scores, then the summation and example would be:
$$\sum_{i=1}^{3} X_i = X_1 + X_2 + X_3 = 5 + 7 + 7 = 19$$
Sometimes, when the summation notation must be written a number of times, as in a proof, a shorthand notation for the shorthand notation is employed. When the summation sign "Σ" is used without additional notation, then "i=1" and "N" are assumed. For example:
$$\sum X = \sum_{i=1}^{N} X_i$$
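Summation notation maps directly onto summing a slice of a list. The sketch below uses a hypothetical helper, `summation(x, start, end)`, that follows the text's 1-based subscripts; because Python lists are 0-based, the slice begins at `start - 1`. Omitting `end` mirrors the convention that "N" is assumed.

```python
def summation(x, start=1, end=None):
    """Sum of X_i for i = start..end, using 1-based subscripts as in the text."""
    if end is None:
        end = len(x)  # upper limit omitted: "N" is assumed
    return sum(x[start - 1:end])

scores = [5, 7, 7, 6, 8]        # X_1 ... X_5 from the first homework
print(summation(scores))        # i = 1 to N: 33
print(summation(scores, 3))     # i = 3 to N: 21
print(summation(scores, 1, 3))  # i = 1 to 3: 19
```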
SUMMATION OF ALGEBRAIC EXPRESSIONS
The summation notation may be used not only with single variables, but with algebraic expressions containing more than one variable. When these expressions are encountered, considerable attention must be paid to where the parentheses are located. If the parentheses are located after the summation sign, then the general rule is: DO THE ALGEBRAIC OPERATION AND THEN SUM. For example, suppose that X is the score on first homework and Y is the score for the second and that the gradebook is as follows:
X    Y
5    6
7    7
7    8
6    7
8    8
The sum of the product of the two variables could be written:
$$\sum_{i=1}^{N} X_i Y_i = X_1 Y_1 + X_2 Y_2 + \cdots + X_N Y_N$$
The preceding sum may be most easily computed by creating a third column on the data table above:
X    Y    X * Y
5    6    30
7    7    49
7    8    56
6    7    42
8    8    64
33   36   241   (column sums)
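Note that a change in the position of the parentheses dramatically changes the result:
$$\left(\sum_{i=1}^{N} X_i\right)\left(\sum_{i=1}^{N} Y_i\right) = 33 \times 36 = 1188$$
A similar distinction is made between $\sum X_i^2$ and $\left(\sum X_i\right)^2$. In the former the sum would be 223, while the latter would be 33^{2} or 1089.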
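The rule "do the algebraic operation and then sum" can be contrasted with "sum and then operate" in a few lines. This sketch uses the two homework columns from the table above.

```python
x = [5, 7, 7, 6, 8]  # first homework
y = [6, 7, 8, 7, 8]  # second homework

sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # multiply, then sum
product_of_sums = sum(x) * sum(y)              # sum, then multiply
sum_x_squared = sum(xi ** 2 for xi in x)       # square, then sum
square_of_sum = sum(x) ** 2                    # sum, then square

print(sum_xy, product_of_sums)       # 241 1188
print(sum_x_squared, square_of_sum)  # 223 1089
```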
Exceptions to the General Rule
Three exceptions to the general rule provide the foundation for some simplification and statistical properties to be discussed later. The three exceptions are:
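1. When the expression being summed contains a "+" or "-" at the highest level, then the summation sign may be taken inside the parentheses. The rule may be more concisely written:
$$\sum_{i=1}^{N} (X_i + Y_i) = \sum_{i=1}^{N} X_i + \sum_{i=1}^{N} Y_i \qquad \sum_{i=1}^{N} (X_i - Y_i) = \sum_{i=1}^{N} X_i - \sum_{i=1}^{N} Y_i$$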
Computing both sides from a table with example data yields:
X    Y    X + Y    X - Y
5    6    11       -1
7    7    14        0
7    8    15       -1
6    7    13       -1
8    8    16        0
33   36   69       -3   (column sums)
Note that the sum of the X + Y column is equal to the sum of X plus the sum of Y. Similar results hold for the X - Y column: -3 = 33 - 36.
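The first exception can be checked numerically with the same data. Summing the row-by-row sums (or differences) gives the same answer as combining the two column sums.

```python
x = [5, 7, 7, 6, 8]
y = [6, 7, 8, 7, 8]

sum_of_sums = sum(xi + yi for xi, yi in zip(x, y))
sum_of_diffs = sum(xi - yi for xi, yi in zip(x, y))

print(sum_of_sums, sum(x) + sum(y))   # 69 69
print(sum_of_diffs, sum(x) - sum(y))  # -3 -3
```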
2. The sum of a constant times a variable is equal to the constant times the sum of the variable.
A constant is a value, such as a number, that does not change with the different values of the counter variable "i". If every score is multiplied by the same number and then summed, the result equals the sum of the original scores times the constant. Constants are usually identified in the statement of a problem, often represented by the letters "c" or "k". If c is a constant, then, as before, this exception to the rule may be written in algebraic form:
$$\sum_{i=1}^{N} c X_i = c \sum_{i=1}^{N} X_i$$
For example, suppose that the constant was equal to 5. Using the example data produces the result:
X    c * X (c = 5)
5    25
7    35
7    35
6    30
8    40
33   165   (column sums)
Note that c * 33 = 165, the same as the sum of the second column.
3. The sum of a constant is equal to N times the constant.
If no subscripted (non-constant) variables appear to the right of a summation sign, then the sum equals the number of scores multiplied by the constant appearing after the summation sign. Writing this exception to the rule in algebraic notation:
$$\sum_{i=1}^{N} c = N c$$
For example, if c = 8 and N = 5, then:
$$\sum_{i=1}^{5} 8 = 8 + 8 + 8 + 8 + 8 = 5 \times 8 = 40$$
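The second and third exceptions can be verified the same way. This sketch reuses the first-homework scores, with c = 5 for exception 2 and c = 8 for exception 3, matching the examples above.

```python
x = [5, 7, 7, 6, 8]  # first homework scores
n = len(x)           # N = 5

# Exception 2: the sum of a constant times a variable
c = 5
sum_of_cx = sum(c * xi for xi in x)
print(sum_of_cx, c * sum(x))  # 165 165

# Exception 3: the sum of a constant over N terms
c = 8
sum_of_constant = sum(c for _ in range(n))
print(sum_of_constant, n * c)  # 40 40
```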
Solving Algebraic Expressions with Summation Notation
When algebraic expressions include summation notation, simplification can be performed if a few rules are remembered.
1. The expression to the right of the summation sign may be simplified using any of the algebraic rewriting rules.
2. The entire expression including the summation sign may be treated as a phrase in the language.
3. The summation sign is NOT a variable, and may not be treated as one (for example, it may not be cancelled).
4. The three exceptions to the general rule may be used whenever applicable.
Two examples follow with X and Y as variables and c, k, and N as constants:
A statistic is an algebraic expression combining scores into a single number. Statistics serve two functions: they estimate parameters in population models and they describe the data. The statistics discussed in this chapter will be used both as estimates of μ and σ and as measures of central tendency and variability. There are a large number of possible statistics, but some are more useful than others.
Central tendency is a typical or representative score. If the mayor is asked to provide a single value which best describes the income level of the city, he or she would answer with a measure of central tendency. The three measures of central tendency that will be discussed this semester are the mode, median, and mean.
The mode, symbolized by M_{o}, is the most frequently occurring score value. If the scores for a given sample distribution are:
32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45
then the mode would be 39 because a score of 39 occurs 3 times, more than any other score. The mode may be seen on a frequency distribution as the score value which corresponds to the highest point. For example, the following is a frequency polygon of the data presented above:
A distribution may have more than one mode if the two most frequently occurring scores occur the same number of times. For example, if the earlier score distribution were modified as follows:
32, 32, 32, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45
then there would be two modes, 32 and 39. Such distributions are called bimodal. The frequency polygon of a bimodal distribution is presented below.
In an extreme case there may be no unique mode, as in the case of a rectangular distribution.
The mode is not sensitive to extreme scores. Suppose the original distribution was modified by changing the last number, 45, to 55 as follows:
32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 55
The mode would still be 39.
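Finding the mode amounts to counting each score value and keeping the values with the highest count. The sketch below uses `collections.Counter`; `modes` is a hypothetical helper that returns every tied value, so a bimodal distribution yields two modes and a rectangular one yields every value. (Python 3.8+ also offers `statistics.multimode`, which behaves similarly.)

```python
from collections import Counter

def modes(scores):
    """Return every score value tied for the highest frequency, in ascending order."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

original = [32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]
bimodal = [32, 32, 32, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]
extreme = original[:-1] + [55]  # last score changed from 45 to 55

print(modes(original))      # [39]
print(modes(bimodal))       # [32, 39]
print(modes(extreme))       # [39]
print(modes([1, 2, 3, 4]))  # rectangular: [1, 2, 3, 4], no unique mode
```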
In any case, the mode is a quick and dirty measure of central tendency. Quick, because it is easily and quickly computed. Dirty because it is not very useful; that is, it does not give much information about the distribution.
The median, symbolized by M_{d}, is the score value which cuts the distribution in half, such that half the scores fall above the median and half fall below it. Computation of the median is relatively straightforward. The first step is to rank order the scores from lowest to highest. The procedure branches at the next step: one way if there is an odd number of scores in the sample distribution, another if there is an even number of scores.
If there is an odd number of scores as in the distribution below:
32, 32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 45, 46
then the median is simply the middle number. In the case above the median would be the number 38, because there are 15 scores all together with 7 scores smaller and 7 larger.
If there is an even number of scores, as in the distribution below:
32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45
then the median is the midpoint between the two middle scores: in this case the value 38.5. It was found by adding the two middle scores together and dividing by two (38 + 39)/2 = 38.5. If the two middle scores are the same value then the median is that value.
In the above system, no account is taken of duplicate scores around the median. Some systems apply a slight correction for grouped data, but since the correction is slight and the data are generally not grouped for computation in calculators or computers, it is not presented here.
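The two branches of the median procedure can be sketched directly. This version (a hypothetical `median` helper, without any grouped-data correction) sorts the scores, returns the middle score when N is odd, and the midpoint of the two middle scores when N is even; Python's `statistics.median` uses the same rule.

```python
def median(scores):
    """Median by the rank-order procedure described in the text."""
    ordered = sorted(scores)  # step one: rank order from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:            # odd number of scores: the middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: midpoint of the two middle scores

odd = [32, 32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 45, 46]
even = [32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]

print(median(odd))   # 38
print(median(even))  # 38.5
```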
The median, like the mode, is not affected by extreme scores, as the following distribution of scores indicates:
32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 55
The median is still the value of 38.5. The median is not as quick and dirty as the mode, but generally it is not the preferred measure of central tendency.
The mean, symbolized by X̄ ("X-bar"), is the sum of the scores divided by the number of scores. The following formula both defines and describes the procedure for finding the mean:
$$\bar{X} = \frac{\sum X}{N}$$
where ΣX is the sum of the scores and N is the number of scores. Application of this formula to the following data
32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45
yields the following results:
$$\bar{X} = \frac{500}{13} = 38.46$$
Use of means as a way of describing a set of scores is fairly common; batting average, bowling average, grade point average, and average points scored per game are all means. Note the use of the word "average" in all of the above terms. In most cases when the term "average" is used, it refers to the mean, although not necessarily. When a politician uses the term "average income", for example, he or she may be referring to the mean, median, or mode.
The mean is sensitive to extreme scores. For example, the mean of the following data is 510 / 13 = 39.23, somewhat larger than the preceding example, even though only the largest score changed (from 45 to 55).
32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 55
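The mean formula and its sensitivity to one extreme score can be checked with a few lines, using the two data sets just given.

```python
scores = [32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]
with_extreme = scores[:-1] + [55]  # only the largest score changes

def mean(x):
    """Sum of the scores divided by the number of scores."""
    return sum(x) / len(x)

print(sum(scores), len(scores), round(mean(scores), 2))  # 500 13 38.46
print(round(mean(with_extreme), 2))                      # 39.23
```

Unlike the mode and median examples, changing a single score moves the mean.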
In most cases the mean is the preferred measure of central tendency, both as a description of the data and as an estimate of the parameter. In order for the mean to be meaningful, however, the acceptance of the interval property of measurement is necessary. When this property is obviously violated, it is inappropriate and misleading to compute a mean. Such is the case, for example, when the data are clearly nominal categorical. An example would be political party preference where 1 = Republican, 2 = Democrat, and 3 = Independent. The special case of dichotomous nominal categorical variables allows meaningful interpretation of means. For example, if only two levels of political party preference were allowed, 1 = Republican and 2 = Democrat, then the mean of this variable could be interpreted. In such cases it is preferable to code one level of the variable with a 0 and the other level with a 1, so that the mean is the proportion of the second level in the sample. For example, if gender were coded with 0 = Males and 1 = Females, then the mean of this variable would be the proportion of females in the sample.
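A tiny illustration of why 0/1 coding works: the sum counts the 1s, so dividing by N gives their proportion. The eight codes below are hypothetical, made up only for this sketch.

```python
# Hypothetical sample, coded 0 = Male, 1 = Female
gender = [0, 1, 1, 0, 1, 1, 0, 1]

proportion_female = sum(gender) / len(gender)  # the mean of the 0/1 codes
print(proportion_female)  # 0.625, i.e. 5 of the 8 are coded 1
```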
As is commonly known, KIWI-birds are native to New Zealand. They are born exactly one foot tall and grow in one foot intervals. That is, one moment they are one foot tall and the next they are two feet tall. They are also very rare. An investigator goes to New Zealand and finds four birds. The mean of the four birds is 4, the median is 3, and the mode is 2. What are the heights of the four birds?
Hint: examine the constraints of the mode first, the median second, and the mean last.
Skewed Distributions and Measures of Central Tendency
Skewness refers to the asymmetry of the distribution, such that a symmetrical distribution exhibits no skewness. In a symmetrical distribution the mean, median, and mode all fall at the same point, as in the following distribution.
An exception to this is the case of a bimodal symmetrical distribution. In this case the mean and the median fall at the same point, while the two modes correspond to the two highest points of the distribution. An example follows:
A positively skewed distribution is asymmetrical and points in the positive direction. If a test was very difficult and almost everyone in the class did very poorly on it, the resulting distribution would most likely be positively skewed.
In the case of a positively skewed distribution, the mode is smaller than the median, which is smaller than the mean. This relationship exists because the mode is the point on the x-axis corresponding to the highest point, that is, the score with the greatest frequency. The median is the point on the x-axis that cuts the distribution in half, such that 50% of the area falls on each side.
The mean is the balance point of the distribution. Because points further away from the balance point change the center of balance, the mean is pulled in the direction the distribution is skewed. For example, if the distribution is positively skewed, the mean would be pulled in the direction of the skewness, or be pulled toward larger numbers.
One way to remember the order of the mean, median, and mode in a skewed distribution is to remember that the mean is pulled in the direction of the extreme scores. In a positively skewed distribution, the extreme scores are larger, thus the mean is larger than the median.
A negatively skewed distribution is asymmetrical and points in the negative direction, such as would result with a very easy test. On an easy test, almost all students would perform well and only a few would do poorly.
The order of the measures of central tendency would be the opposite of the positively skewed distribution, with the mean being smaller than the median, which is smaller than the mode.
Variability refers to the spread or dispersion of scores. A distribution of scores is said to be highly variable if the scores differ widely from one another. Three statistics will be discussed which measure variability: the range, the variance, and the standard deviation. The latter two are very closely related and will be discussed in the same section.
The range is the largest score minus the smallest score. It is a quick and dirty measure of variability, although when a test is given back to students they very often wish to know the range of scores. Because the range is greatly affected by extreme scores, it may give a distorted picture of the scores. The following two distributions have the same range, 13, yet appear to differ greatly in the amount of variability.
Distribution 1: 32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45
Distribution 2: 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45
For this reason, among others, the range is not the most important measure of variability.
The Variance and The Standard Deviation
The variance, symbolized by "s^{2}", is a measure of variability. The standard deviation, symbolized by "s", is the positive square root of the variance. It is easier to define the variance with an algebraic expression than with words, thus the following formula:
$$s^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}$$
Note that the variance would be the average squared deviation around the mean if the expression were divided by N rather than N - 1. It is divided by N - 1, called the degrees of freedom (df), for theoretical reasons. If the mean is known, as it must be to compute the numerator of the expression, then only N - 1 scores are free to vary. That is, if the mean and N - 1 scores are known, then it is possible to figure out the Nth score. One needs only recall the KIWI-bird problem to convince oneself that this is in fact true.
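The degrees-of-freedom argument can be made concrete: given the mean and any N - 1 scores, the last score is forced. The sketch below uses a hypothetical helper, `missing_score`, and the five scores 8, 8, 9, 12, 13 from the worked example in this section (mean 10). It also contrasts Python's `statistics.pvariance`, which divides by N, with `statistics.variance`, which divides by N - 1 as the formula above does.

```python
import statistics

def missing_score(mean, known_scores):
    """Given the mean of N scores and N - 1 of them, return the remaining score."""
    n = len(known_scores) + 1
    return n * mean - sum(known_scores)

# A mean of 10 with four known scores forces the fifth score to be 13.
print(missing_score(10, [8, 8, 9, 12]))  # 13

scores = [8, 8, 9, 12, 13]
print(statistics.pvariance(scores))  # divides by N:     22 / 5 = 4.4
print(statistics.variance(scores))   # divides by N - 1: 22 / 4 = 5.5
```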
The formula for the variance presented above is a definitional formula; it defines what the variance means. The variance may be computed from this formula, but in practice this is rarely done. It is done here to better describe what the formula means. The computation is performed in a number of steps, which are presented below:
Step One: Find the mean of the scores.
Step Two: Subtract the mean from every score.
Step Three: Square the results of step two.
Step Four: Sum the results of step three.
Step Five: Divide the results of step four by N - 1.
Step Six: Take the square root of step five.
The result at step five is the sample variance, at step six, the sample standard deviation.
X     X - X̄    (X - X̄)^{2}
8     -2       4
8     -2       4
9     -1       1
12     2       4
13     3       9
50     0       22   (column sums)
Step One: Find the mean of the scores. Result: X̄ = 50 / 5 = 10
Step Two: Subtract the mean from every score. Result: the second column above.
Step Three: Square the results of step two. Result: the third column above.
Step Four: Sum the results of step three. Result: 22
Step Five: Divide the results of step four by N - 1. Result: s^{2} = 22 / 4 = 5.5
Step Six: Take the square root of step five. Result: s = 2.345
Note that the sum of the second column is zero. This must be the case if the calculations are performed correctly up to that point.
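The six steps translate line for line into code. This sketch follows the definitional formula on the same five scores and, as a cross-check, compares the result with `statistics.variance` and `statistics.stdev`, which also divide by N - 1.

```python
import math
import statistics

scores = [8, 8, 9, 12, 13]

mean = sum(scores) / len(scores)            # Step one: 10.0
deviations = [x - mean for x in scores]     # Step two: -2, -2, -1, 2, 3
squared = [d ** 2 for d in deviations]      # Step three: 4, 4, 1, 4, 9
sum_squares = sum(squared)                  # Step four: 22.0
variance = sum_squares / (len(scores) - 1)  # Step five: 5.5
sd = math.sqrt(variance)                    # Step six: about 2.345

print(sum(deviations))         # 0.0, as noted in the text
print(variance, round(sd, 3))  # 5.5 2.345
print(statistics.variance(scores), round(statistics.stdev(scores), 3))  # 5.5 2.345
```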
The standard deviation measures variability in units of measurement, while the variance does so in units of measurement squared. For example, if one measured height in inches, then the standard deviation would be in inches, while the variance would be in inches squared. For this reason, the standard deviation is usually the preferred measure when describing the variability of distributions. The variance, however, has some unique properties which make it very useful later on in the course.
Calculating Statistics with a Statistical Calculator
Calculations may be checked by using the statistical functions of a statistical calculator. This is the way the variance and standard deviation are usually computed in practice. The calculator has the definitional formula for the variance automatically programmed internally, and all that is necessary for its use are the following steps:
Step One: Select the statistics mode.
Step Two: Clear the statistical registers.
Step Three: Enter the data.
Step Four: Make sure the correct number of scores has been entered.
Step Five: Hit the key that displays the mean.
Step Six: Hit the key that displays the standard deviation.
Note that when using the calculator the standard deviation is found before the variance, while the opposite is the case using the definitional formula. The results using the calculator and the definitional formula should agree, within rounding error.
Calculating Statistics using SPSS
More often than not, statistics are computed using a computer package such as SPSS. It may initially seem like a lot more time and trouble to use the computer to do such simple calculations, but the student will most likely appreciate the savings in time and effort later.
The first step is to enter the data into a form the computer can recognize. A data file with the example numbers is illustrated below:
Any number of statistical commands in SPSS would result in the computation of simple measures of central tendency and variability. The example below illustrates the use of the FREQUENCIES command.
The results of the above procedure are presented below:
INTERPRETING A MEAN AND STANDARD DEVIATION
An analysis, called a breakdown, gives the means and standard deviations of a variable for each level of another variable. The means and standard deviations may then be compared to see if they are different. Going back to the example of shoe sizes, the raw data appeared as follows:
Shoe Size   Shoe Width   Sex
10.5        B            M
6.0         B            F
9.5         D            M
8.5         A            F
7.0         B            F
10.5        C            M
7.0         C            F
8.5         D            M
6.5         B            F
9.5         C            M
7.0         B            F
7.5         B            F
9.0         D            M
6.5         A            F
7.5         B            F
The corresponding data file in SPSS would appear as follows:
It is possible to compare the shoe sizes of males and females by first finding the mean and standard deviation of males only and then for females only. In order to do this the original shoe sizes would be partitioned into two sets, one of males and one of females, as follows:
Males   Females
10.5    6.0
9.5     8.5
10.5    7.0
8.5     7.0
9.5     6.5
9.0     7.0
        7.5
        6.5
        7.5
The means and standard deviations of the males and females are organized into a table as follows:
Sex       N    Mean   Standard Deviation
Males     6    9.58   0.80
Females   9    7.06   0.73
Total     15   8.07   1.47
It can be seen that the males had larger shoe sizes, as evidenced by the larger mean. It can also be seen that males had somewhat greater variability, as evidenced by their larger standard deviation (0.80 versus 0.73). In addition, the variability WITHIN GROUPS (males and females separately) was considerably less than the TOTAL variability (both sexes combined).
The analysis described above may be done in SPSS with the MEANS command. The following illustrates the selection and output of the MEANS command in SPSS.
A similar kind of breakdown could be performed for shoe size broken down by shoe width, which would produce the following table:
Shoe Width   N    Mean   Standard Deviation
A            2    7.5    1.41
B            7    7.43   1.46
C            3    9.0    1.80
D            3    9.0    0.50
Total        15   8.07   1.47
A breakdown is a very powerful tool in examining the relationships between variables. It can express a great deal of information in a relatively small space. Tables of means such as the one presented above are central to understanding Analysis of Variance (ANOVA).
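A breakdown is essentially a group-by followed by per-group statistics. The sketch below, assuming Python 3.8+, reproduces both breakdown tables with the standard `statistics` module (`stdev` divides by N - 1, as the text's standard deviation does). The helper `breakdown` is a hypothetical name, and the fourth row is entered as 8.5, the value that reproduces the tabled means and standard deviations.

```python
import statistics
from collections import defaultdict

# (shoe size, shoe width, sex) rows from the raw data table
shoes = [
    (10.5, "B", "M"), (6.0, "B", "F"), (9.5, "D", "M"), (8.5, "A", "F"),
    (7.0, "B", "F"), (10.5, "C", "M"), (7.0, "C", "F"), (8.5, "D", "M"),
    (6.5, "B", "F"), (9.5, "C", "M"), (7.0, "B", "F"), (7.5, "B", "F"),
    (9.0, "D", "M"), (6.5, "A", "F"), (7.5, "B", "F"),
]

def breakdown(rows, key_index):
    """Group shoe sizes by one column; return {level: (N, mean, SD)}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[0])
    return {
        level: (len(sizes), statistics.mean(sizes), statistics.stdev(sizes))
        for level, sizes in sorted(groups.items())
    }

for label, column in (("Sex", 2), ("Shoe Width", 1)):
    print(label)
    for level, (n, m, s) in breakdown(shoes, column).items():
        print(f"  {level}: N = {n}, mean = {m:.2f}, SD = {s:.2f}")
```

Running it should print the same N, means, and standard deviations as the two tables, within rounding.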
Statistics serve to estimate model parameters and describe the data. Two categories of statistics were described in this chapter: measures of central tendency and measures of variability. In the former category were the mean, median, and mode. In the latter were the range, standard deviation, and variance. Measures of central tendency describe a typical or representative score, while measures of variability describe the spread or dispersion of scores. Both definitional examples of computational procedures and procedures for obtaining the statistics from a calculator were presented.
If a student, upon viewing a recently returned test, found that he or she had made a score of 33, would that be a good score or a poor score? Based only on the information given, it would be impossible to tell. The 33 could be out of 35 possible questions and be the highest score in the class, or it could be out of 100 possible points and be the lowest score, or anywhere in between. The score that is given is called a raw score. The purpose of this chapter is to describe procedures to transform raw scores into transformed scores.
Why Do We Need to Transform Scores?
Transforming scores from raw scores into transformed scores has two purposes: 1) It gives meaning to the scores and allows some kind of interpretation of the scores, 2) It allows direct comparison of two scores. For example, a score of 33 on the first test might not mean the same thing as a score of 33 on the second test.
The transformations discussed in this section belong to two general types: percentile ranks and linear transformations. Percentile ranks are advantageous in that the average person has an easier time understanding and interpreting their meaning. However, percentile ranks also have a rather unfortunate statistical property which makes their use generally unacceptable among the statistically sophisticated. Each will now be discussed in turn.
PERCENTILE RANKS BASED ON THE SAMPLE
A percentile rank is the percentage of scores that fall below a given score. For example, a raw score of 33 on a test might be transformed into a percentile rank of 98 and interpreted as "You did better than 98% of the students who took this test." In that case the student would feel pretty good about the test. If, on the other hand, a percentile rank of 3 was obtained, the student might wonder what he or she was doing wrong.
The procedure for finding the percentile rank is as follows. First, rank order the scores from lowest to highest. Next, for each different score, add the percentage of scores that fall below the score to one-half the percentage of scores that fall at the score. The result is the percentile rank for that score.
It's actually easier to demonstrate and perform the procedure than it sounds. For example, suppose the obtained scores from 11 students were:
33, 28, 29, 37, 31, 33, 25, 33, 29, 32, 35
The first step would be to rank order the scores from lowest to highest.
25, 28, 29, 29, 31, 32, 33, 33, 33, 35, 37
Computing the percentage falling below a score of 31, for example, gives the value 4/11 = .364 or 36.4%. The four in the numerator reflects that four scores (25, 28, 29, and 29) were less than 31. The 11 in the denominator is N, the number of scores. The percentage falling at a score of 31 would be 1/11 = .0909 or 9.09%, the numerator being the number of scores with a value of 31 and the denominator again being the number of scores. One-half of 9.09 would be 4.55. Adding the percentage below to one-half the percentage at the score yields a percentile rank of 36.4 + 4.55 or 40.95%.
Similarly, for a score of 33, the percentile rank would be computed by adding the percentage below (6/11 = .5454 or 54.54%) to one-half the percentage at the score (1/2 * 3/11 = .1364 or 13.64%), producing a percentile rank of 68.18%. The 6 in the numerator of the percentage below indicates that 6 scores were smaller than 33, while the 3 in the percentage at the score indicates that 3 scores had the value 33. All three scores of 33 would have the same percentile rank of 68.18%.
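The preceding procedure can be described in an algebraic expression as follows, where f_b is the frequency below (the number of scores smaller than the given score), f_w is the frequency within (the number of scores equal to it), and N is the number of scores:
$$PR = \frac{f_b + \frac{1}{2} f_w}{N} \times 100$$
Application of this algebraic procedure to the score values of 31 and 33 gives the following results:
$$PR_{31} = \frac{4 + \frac{1}{2}(1)}{11} \times 100 = 40.91 \qquad PR_{33} = \frac{6 + \frac{1}{2}(3)}{11} \times 100 = 68.18$$
Note that these results are within rounding error of the percentile rank computed earlier using the procedure described in words.
When computing the percentile rank for the smallest score, the frequency below is zero (0), because no scores are smaller than it. Using the formula to compute the percentile rank of the score of 25:
$$PR_{25} = \frac{0 + \frac{1}{2}(1)}{11} \times 100 = 4.55$$
Computing the percentile rank for the largest score, 37, gives:
$$PR_{37} = \frac{10 + \frac{1}{2}(1)}{11} \times 100 = 95.45$$
These last two cases demonstrate that a score can never have a percentile rank equal to or less than zero, or equal to or greater than 100. Percentile ranks closer to zero or one hundred than these could be obtained only if the number of scores were increased.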
The percentile ranks for all the scores in the example data may be computed as follows:
Score   Percentile Rank
25      4.6
28      13.6
29      27.3
29      27.3
31      40.9
32      50
33      68.2
33      68.2
33      68.2
35      86.4
37      95.4
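The formula can be applied to every score at once. This sketch uses a hypothetical helper, `percentile_rank`, that counts the scores below and at a given value, following the definition in the text.

```python
def percentile_rank(score, scores):
    """Percent of scores below the score plus one-half the percent at the score."""
    n = len(scores)
    below = sum(1 for s in scores if s < score)   # frequency below
    at = sum(1 for s in scores if s == score)     # frequency within
    return 100 * (below + 0.5 * at) / n

scores = [33, 28, 29, 37, 31, 33, 25, 33, 29, 32, 35]
for value in sorted(set(scores)):
    print(value, round(percentile_rank(value, scores), 1))
```

The printed values agree with the table within rounding; for example, 25 prints as 4.5 because 4.545... rounds down at one decimal place.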
Computing Percentile Ranks Based on the Sample with SPSS
Although the computation of percentile ranks based on a sample using SPSS is not exactly direct, the steps are relatively straightforward. First, enter the scores as a data file. The data used in the above example is illustrated below.
Next, sort the data from lowest to highest.
This ranks the scores from lowest to highest as follows.
Use the TRANSFORM and RANK options as follows.
The results of the preceding operation appear as a new variable with the same name as the original variable except it begins with an "r". For example, the original variable was named "scores" and the new variable appears as "rscores".
In order to compute percentile ranks as described in the earlier section, a new variable must be created using the COMPUTE command. The new variable is constructed by subtracting .5 from the new rank variable and dividing the result by N; the result is a proportion, which, multiplied by 100, gives the percentile rank. In the example below the new variable is titled "prsamp" and it is computed by subtracting .5 from "rscores" and dividing the result by 11, the number of scores.
The student should verify that the values of the new variable are within rounding error of those in the table presented earlier in this chapter.
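The rank-then-compute recipe works because, when tied scores receive the mean of the ranks they occupy, (rank - .5)/N equals the proportion given by the percentile rank formula. The sketch below replicates the idea without SPSS. It assumes tied scores get the mean rank (assumed here to match SPSS's default handling of ties in RANK); `mean_ranks` is a hypothetical helper, and `prsamp` simply mirrors the variable name used above.

```python
def mean_ranks(scores):
    """Rank scores from 1 (lowest) upward, giving tied scores the mean of their ranks."""
    first_position = {}
    count = {}
    for position, value in enumerate(sorted(scores), start=1):
        first_position.setdefault(value, position)
        count[value] = count.get(value, 0) + 1
    # A run of tied scores occupies ranks first .. first + count - 1; take their mean.
    return [first_position[v] + (count[v] - 1) / 2 for v in scores]

scores = [33, 28, 29, 37, 31, 33, 25, 33, 29, 32, 35]
n = len(scores)
prsamp = [(r - 0.5) / n for r in mean_ranks(scores)]  # proportions

for value, p in sorted(zip(scores, prsamp)):
    print(value, round(100 * p, 1))
```

For example, the three scores of 33 occupy ranks 7, 8, and 9, so each gets rank 8 and (8 - .5)/11 = .6818, the same 68.18% found by the formula.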
PERCENTILE RANKS BASED ON THE NORMAL CURVE
The percent of area below a score on a normal curve with a given mu and sigma provides an estimate of the percentile rank of a score. The mean and standard deviation of the sample estimate the values of mu and sigma. Percentile ranks can be found by entering the mean, standard deviation, and score in the mu, sigma, and score boxes of the Normal Curve Area program.
25, 28, 29, 29, 31, 32, 33, 33, 33, 35, 37
In the example raw scores given above, the sample mean is 31.364 and the sample standard deviation is 3.414. Entering these values and a score of 29 in the Normal Curve Area program yields a percentile rank based on the normal curve of 24%, as demonstrated below.
Percentile ranks based on normal curve area for all the example scores are presented in the table below.
Score   Percentile Rank (Normal Curve)
25      3
28      16
29      24
29      24
31      46
32      57
33      68
33      68
33      68
35      86
37      95
Computing Percentile Ranks Based on the Normal Curve with SPSS
Percentile ranks based on normal area can be computed in SPSS by using the "Compute Variable" option under the "Transform" command. The "CDFNORM" function returns the area below the standard normal curve. In the following example, "x" is the name of the variable describing the raw scores and "PRnorma" is a new variable created by the "Compute Variable" command. The name "PRnorma" was chosen as a shortened form of "Percentile Rank Normal Scores," the longer name being appropriate for a variable label. In the parentheses following the CDFNORM name is an algebraic expression of the form (Variable Name - Mean)/Standard Deviation. In the case of the example data, the variable was named "x", the mean was 31.364, and the standard deviation was 3.414. The resulting expression becomes "(x-31.364)/3.414".
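Written as syntax, the command is:

COMPUTE PRnorma = CDFNORM((x - 31.364)/3.414) .

This command would create a new variable called "PRnorma" to be included in the data table. Because CDFNORM returns a proportion between 0 and 1, the values of PRnorma must be multiplied by 100 to be read as percentile ranks.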
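For students working outside SPSS, the same calculation can be sketched in Python. This is a minimal illustration, not part of the original materials: it assumes Python 3.8 or later, whose standard statistics module provides NormalDist, and its cdf() method plays the role of the Normal Curve Area program (or CDFNORM). The sample mean and standard deviation stand in for mu and sigma, as in the text.

```python
from statistics import NormalDist, mean, stdev

# The example raw scores from the text.
scores = [25, 28, 29, 29, 31, 32, 33, 33, 33, 35, 37]

# Estimate mu and sigma from the sample; stdev() divides by N - 1.
mu_hat = mean(scores)
sigma_hat = stdev(scores)
model = NormalDist(mu=mu_hat, sigma=sigma_hat)

# cdf() returns the proportion of the curve below a score (like CDFNORM
# applied to the z-score); multiply by 100 and round for a percentile rank.
pr_normal = [round(100 * model.cdf(x)) for x in scores]

print(f"mean = {mu_hat:.3f}, sd = {sigma_hat:.3f}")
print(pr_normal)
```

Rounded to whole percents, the values should agree with the normal-area column of the table earlier in this section.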
Comparing the Two Methods of Computing Percentile Ranks
The astute student will observe that the percentile ranks obtained in this manner are somewhat different from those obtained using the procedure described in an earlier section. That is because the two procedures give percentile ranks that are interpreted somewhat differently.
Raw Score   Sample %ile   Normal Area %ile
    25          4.6               3
    28         13.6              16
    29         27.3              24
    29         27.3              24
    31         40.9              46
    32         50                57
    33         68.2              68
    33         68.2              68
    33         68.2              68
    35         86.4              86
    37         95.4              95
The percentile rank based on the sample describes where a score falls relative to the scores in the sample distribution. That is, if a score has a percentile rank of 34 using this procedure, then it can be said that 34% of the scores in the sample distribution fall below it.
The percentile rank based on the normal curve, on the other hand, describes where the score falls relative to a hypothetical model of a distribution. That is, a percentile rank of 34 based on the normal curve says that 34% of an infinite number of scores obtained using a similar method would fall below that score. The additional power of this last statement is not bought without cost, however: one must assume that the normal curve is an accurate model of the sample distribution, and that the sample mean and standard deviation are accurate estimates of the model parameters mu and sigma. If one is willing to accept these assumptions, then the percentile rank based on normal area describes the relative standing of a score within an infinite population of scores.
Percentile ranks, as the name implies, are a system of ranking. Using the system destroys the interval property of the measurement system. That is, even if the scores could be assumed to have the interval property before they were transformed, they would not have it after transformation. The interval property is critical to the interpretation of many of the statistics described in this text, e.g. the mean, standard deviation, variance, and range, so transformation to percentile ranks does not permit meaningful analysis of the transformed scores with these statistics.
If an additional assumption of an underlying normal distribution is made, not only do percentile ranks destroy the interval property, but they also distort the information in a particular manner. If the scores are distributed normally, then percentile ranks underestimate large differences in the tails of the distribution and overestimate small differences in the middle of the distribution. This is most easily understood in an illustration:
In the above illustration two standardized achievement tests, each with μ = 500 and σ = 100, were given. On the first, an English test, Suzy made a score of 500 and Johnny made a score of 600, a one-hundred-point difference between their raw scores. On the second, a Math test, Suzy made a score of 800 and Johnny made a score of 700, again a one-hundred-point difference in raw scores. It can be said, then, that the differences between the scores on the two tests were equal, one hundred points each.
When converted to percentile ranks, however, the differences are no longer equal. On the English test Suzy receives a percentile rank of 50 while Johnny gets an 84, a difference of 34 percentile rank points. On the Math test, Johnny's score is transformed to a percentile rank of about 97.7 while Suzy's percentile rank is about 99.9, a difference of only about two percentile rank points.
It can be seen, then, that a difference in percentile ranks has a different meaning depending upon whether it occurs in the middle or in the tails of a normal distribution. Differences in the middle of the distribution are magnified; differences in the tails are minimized.
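The Suzy and Johnny example can be checked numerically. The sketch below is an illustration assuming Python's statistics.NormalDist as the model of both tests (mu = 500, sigma = 100); the helper name percentile_rank is hypothetical. The English gap comes out near 34 percentile points and the Math gap near 2, even though both raw-score gaps are 100 points.

```python
from statistics import NormalDist

# Both achievement tests are modeled as normal with mu = 500, sigma = 100.
test = NormalDist(mu=500, sigma=100)

def percentile_rank(score):
    """Percent of the normal curve that falls below the score."""
    return 100 * test.cdf(score)

# Two 100-point raw-score gaps: one at the middle, one out in the tail.
english_gap = percentile_rank(600) - percentile_rank(500)  # Johnny - Suzy
math_gap = percentile_rank(800) - percentile_rank(700)     # Suzy - Johnny

print(f"English gap: {english_gap:.1f} percentile points")
print(f"Math gap:    {math_gap:.1f} percentile points")
```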
Because it destroys the interval property, the percentile rank transformation is avoided by sophisticated statisticians when scores are to be analyzed further. Percentile ranks will remain in widespread use for interpreting scores to the layman, but the statistician must help in interpreting them correctly. Because of this unfortunate property, a different type of transformation is needed, one which does not destroy the interval property. This leads directly into the next topic: linear transformations.
A linear transformation is a transformation of the form X' = a + bX. If a measurement system approximated an interval scale before the linear transformation, it will approximate it to the same degree after the linear transformation. Other properties of the distribution are similarly unaffected. For example, if a distribution was positively skewed before the transformation, it will be positively skewed after, provided b is positive (a negative b reverses the order of the scores, so positive skew becomes negative skew).
The symbols in the transformation equation, X'_{i} = a + bX_{i}, have the following meaning. The raw score is denoted by X_{i}, the score after the transformation is denoted by X'_{i}, read X prime or X transformed. The "b" is the multiplicative component of the linear transformation, sometimes called the slope, and the "a" is the additive component, sometimes referred to as the intercept. The "a" and "b" of the transformation are set to real values to specify a transformation.
The transformation is performed by first multiplying every score value by the multiplicative component "b" and then adding the additive component "a" to it. For example, the following set of data is linearly transformed with the transformation X'_{i} = 20 + 3*X_{i}, where a = 20 and b = 3.
Linear Transformation: a = 20, b = 3
     X     X' = a + bX
    12         56
    15         65
    15         65
    20         80
    22         86
The score value of 12, for example, is transformed by first multiplying it by 3 to get 36 and then adding 20 to this product to get the result of 56.
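The two-step recipe, multiply by b and then add a, is short enough to express as a one-line function. This is an illustrative Python sketch; linear_transform is a hypothetical helper name, and the data are the five example scores from the table above.

```python
def linear_transform(scores, a, b):
    """Apply X' = a + bX: multiply each score by b, then add a."""
    return [a + b * x for x in scores]

x = [12, 15, 15, 20, 22]
print(linear_transform(x, a=20, b=3))
```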
The effect of the linear transformation on the mean and standard deviation of the scores is of considerable interest. For that reason, the additive and multiplicative components of the transformation will be examined separately for their effects.
If the multiplicative component is set equal to one, the linear transformation becomes X' = a + X, so that the effect of the additive component may be examined. With this transformation, a constant is added to every score. An example additive transformation is shown below:
Linear Transformation: a = 20, b = 1
     X     X' = a + bX
    12         32
    15         35
    15         35
    20         40
    22         42
Mean:   X̄ = 16.8        X̄' = 36.8
SD:     s_{X} = 4.09     s_{X'} = 4.09
The transformed mean, X̄', is equal to the original mean, X̄, plus the transformation constant, in this case a = 20. The standard deviation does not change. It is as if the distribution were lifted up and placed back down to the right or left, depending upon whether the additive component was positive or negative. The effect of the additive component is graphically presented below.
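A quick numerical check of the additive case is offered here as an illustrative Python sketch using the standard statistics module, whose stdev() uses the N - 1 denominator and so matches the 4.09 in the table:

```python
from statistics import mean, stdev

x = [12, 15, 15, 20, 22]
shifted = [20 + xi for xi in x]   # a = 20, b = 1

# Adding a constant moves the mean by exactly that constant ...
print(f"means: {mean(x):.1f} -> {mean(shifted):.1f}")
# ... while every deviation from the mean, and hence the SD, is unchanged.
print(f"SDs:   {stdev(x):.2f} -> {stdev(shifted):.2f}")
```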
The effect of the multiplicative component "b" may be examined separately if the additive component is set equal to zero. The transformation equation becomes X' = bX, which is the type of transformation done when the scale is changed, for example from feet to inches. In that case, the value of b would be equal to 12 because there are 12 inches to the foot. Similarly, transformations to and from the metric system, e.g. from pounds to kilograms and back again, are multiplicative transformations.
An example multiplicative transformation is presented below, where b=3:
Linear Transformation: a = 0, b = 3
     X     X' = a + bX
    12         36
    15         45
    15         45
    20         60
    22         66
Mean:   X̄ = 16.8        X̄' = 50.4
SD:     s_{X} = 4.09     s_{X'} = 12.26
Note that both the mean and the standard deviation of the transformed scores are three times their original values, which is precisely the amount of the multiplicative component. The multiplicative component, then, affects both the mean and the standard deviation in proportion to its size, as illustrated below:
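The same kind of check for the multiplicative case, again an illustrative Python sketch on the example data, shows both ratios equal to b = 3:

```python
from statistics import mean, stdev

x = [12, 15, 15, 20, 22]
scaled = [3 * xi for xi in x]   # a = 0, b = 3

# Both summary statistics are multiplied by b.
print(f"mean ratio: {mean(scaled) / mean(x):.3f}")
print(f"SD ratio:   {stdev(scaled) / stdev(x):.3f}")
print(f"transformed mean = {mean(scaled):.1f}, SD = {stdev(scaled):.2f}")
```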
LINEAR TRANSFORMATIONS: EFFECT ON MEAN AND STANDARD DEVIATION
Putting the separate effects of the additive and multiplicative components together in a linear transformation, it would be expected that the standard deviation would be affected only by the multiplicative component and the mean by both. The following two equations express the relationships (for a positive b; if b is negative, the standard deviation is multiplied by the absolute value of b):

\bar{X}' = a + b\bar{X}
s_{X'} = b\,s_{X}
For example, for the original distribution and the linear transformation X' = 20 + 3X, the transformed mean and standard deviation would be expected to be:

\bar{X}' = 20 + 3(16.8) = 70.4
s_{X'} = 3(4.09) = 12.27
If the transformation is done and the new mean and standard deviation computed, this is exactly what is found, within rounding error.
Linear Transformation: a = 20, b = 3
     X     X' = a + bX
    12         56
    15         65
    15         65
    20         80
    22         86
Mean:   X̄ = 16.8        X̄' = 70.4
SD:     s_{X} = 4.09     s_{X'} = 12.26
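The two equations can also be checked against directly computed statistics for several transformations at once. In this illustrative Python sketch the helper names are hypothetical, and the pair (10, -2) is an added case not in the text: it uses abs(b) for the standard deviation, since a negative multiplier reverses the scores but leaves their spread positive.

```python
from statistics import mean, stdev

def predicted_stats(scores, a, b):
    """Mean and SD of a + bX predicted from the original mean and SD."""
    # abs(b) covers negative multipliers, which flip the scores but
    # cannot make a standard deviation negative.
    return a + b * mean(scores), abs(b) * stdev(scores)

def observed_stats(scores, a, b):
    """Mean and SD computed directly from the transformed scores."""
    transformed = [a + b * x for x in scores]
    return mean(transformed), stdev(transformed)

x = [12, 15, 15, 20, 22]
for a, b in [(20, 3), (20, 1), (0, 3), (10, -2)]:
    p_mean, p_sd = predicted_stats(x, a, b)
    o_mean, o_sd = observed_stats(x, a, b)
    print(f"a={a:>3}, b={b:>3}: mean {p_mean:7.2f} vs {o_mean:7.2f}, "
          f"SD {p_sd:6.2f} vs {o_sd:6.2f}")
```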
The question students most often ask at this point is where the values of "a" and "b" come from. "Are you just making them up as you go along?" is the most common form of the question. That is exactly what has been done up to this point. Now a procedure will be presented for finding a and b such that the new mean and standard deviation take on given values.
LINEAR TRANSFORMATIONS: FINDING a AND b GIVEN X̄' AND s_{X'}
Suppose that the original scores were raw scores from an intelligence test. Historically, IQs, or intelligence test scores, have a mean of 100 and a standard deviation of either 15 or 16, depending upon the test selected. In order to convert the raw scores to the IQ scale, a linear transformation is performed such that the transformed mean and standard deviation are 100 and 16, respectively. The problem is summarized in the following table:
Linear Transformation: a = ?, b = ?
     X     X' = a + bX
    12          ?
    15          ?
    15          ?
    20          ?
    22          ?
Mean:   X̄ = 16.8        X̄' = 100.0
SD:     s_{X} = 4.09     s_{X'} = 16.0
This problem may be solved by first recalling how the mean and standard deviation are affected by the linear transformation:

\bar{X}' = a + b\bar{X}
s_{X'} = b\,s_{X}
These two equations contain two unknowns, a and b. Because the equations are independent, they may be solved for a and b in the following manner. First, solving for b by dividing both sides of the second equation by s_{X} produces:

b = \frac{s_{X'}}{s_{X}}
Thus the value of "b" is found by dividing the new or transformed standard deviation by the original standard deviation.
After finding the value of b, the value of a may be found by subtracting b times the original mean from both sides of the first equation:

a = \bar{X}' - b\bar{X}
In this case the product of the value of b times the original mean is subtracted from the new or transformed mean.
These two equations can be summarized as follows:

b = \frac{s_{X'}}{s_{X}} \qquad a = \bar{X}' - b\bar{X}
Application of these equations to the original problem, where X̄' = 100 and s_{X'} = 16, produces the following results:

b = \frac{16}{4.09} = 3.91
a = 100 - (3.91)(16.8) = 34.31
Plugging these values into the original problem produces the desired results, within rounding error.
Linear Transformation: a = 34.31, b = 3.91
     X     X' = a + bX
    12        81.23
    15        92.96
    15        92.96
    20       112.51
    22       120.33
Mean:   X̄ = 16.8        X̄' = 100.00
SD:     s_{X} = 4.09     s_{X'} = 15.98
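The procedure for finding a and b can be written as a small function. This is a sketch under stated assumptions: find_a_b is a hypothetical name, and unlike the hand computation above it does not round s_X to 4.09 first. As a result it should give a ≈ 34.22 and b ≈ 3.915 rather than 34.31 and 3.91, and the transformed mean and standard deviation come out at 100 and 16 to within floating-point precision.

```python
from statistics import mean, stdev

def find_a_b(scores, new_mean, new_sd):
    """Solve b = s_X' / s_X and a = new mean - b * old mean."""
    b = new_sd / stdev(scores)
    a = new_mean - b * mean(scores)
    return a, b

x = [12, 15, 15, 20, 22]
a, b = find_a_b(x, new_mean=100, new_sd=16)
iq = [a + b * xi for xi in x]

print(f"a = {a:.2f}, b = {b:.3f}")
print([round(v, 2) for v in iq])
print(f"mean = {mean(iq):.2f}, SD = {stdev(iq):.2f}")
```

The same function covers the other standard scales mentioned below, for example find_a_b(x, 50, 10) for a scale with mean 50 and standard deviation 10.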
The transformed scores are now on an IQ scale. Before they may be considered IQs, however, the test must be validated as an IQ test. If the original test had little or nothing to do with intelligence, then the IQs which result from a linear transformation such as the one above would be meaningless.
Using the above procedure, a distribution with a given mean and standard deviation may be transformed into another distribution with any given mean and standard deviation. In order to bring some order to this flexibility, a standard scale has to be selected. The IQ scale is one such standard, but its use is largely limited to intelligence tests. Another standard is the T-score scale, where scores are transformed to a distribution with a mean of 50 and a standard deviation of 10. This transformation has the advantage that the transformed scores are almost always positive and between 1 and 100. Another is the stanine transformation, where scores are transformed to a distribution with a mean of 5 and a standard deviation of 2. In this transformation the decimals are dropped, so an integer score between 1 and 9 is produced. The Army used this transformation because the results could fit in a single column of a computer punch card.
Another possible transformation is so important and widely used that it deserves an entire section to itself. It is the standard score or z-score transformation. The standard score transformation is a linear transformation such that the transformed mean and standard deviation are 0 and 1, respectively. The selection of these values was somewhat arbitrary, but not without some reason.
Transformation to z-scores could be accomplished using the procedure described in the earlier section to convert any distribution to a distribution with a given mean and standard deviation, in this case 0 and 1. This is demonstrated below with the example data.
Linear Transformation: a = -4.11, b = 0.244
     X     X' = a + bX
    12        -1.18
    15        -0.45
    15        -0.45
    20         0.77
    22         1.26
Mean:   X̄ = 16.8        X̄' = -0.01
SD:     s_{X} = 4.09     s_{X'} = 0.997
Note that the transformed mean and standard deviation are within rounding error of the desired figures.
Using a little algebra, the computational formulas to convert raw scores to z-scores may be simplified. When converting to standard scores (X̄' = 0 and s_{X'} = 1.0), the values of b and a can be found as follows:

b = \frac{s_{X'}}{s_{X}} = \frac{1}{s_{X}}
a = \bar{X}' - b\bar{X} = 0 - \frac{\bar{X}}{s_{X}} = -\frac{\bar{X}}{s_{X}}

These values can then be substituted into the linear transformation equation:

X' = a + bX = -\frac{\bar{X}}{s_{X}} + \frac{X}{s_{X}} = \frac{X - \bar{X}}{s_{X}}

The last result is a computationally simpler version of the standard score transformation. All this algebra was done to demonstrate that the standard score or z-score transformation is indeed a type of linear transformation. If a student is unable to follow the mathematics underlying the algebraic transformation, he or she will just have to "Believe!" In any case, the formula for converting to z-scores is:

z = \frac{X - \bar{X}}{s_{X}}

(Note that the "z" has replaced the "X'")
Application of this computational formula to the example data yields:
Z-score Transformation
     X        z
    12     -1.17
    15     -0.44
    15     -0.44
    20      0.78
    22      1.27
Mean:   X̄ = 16.8        X̄' = 0.0
SD:     s_{X} = 4.09     s_{X'} = .997
Note that the two procedures produce almost identical results, except that the computational formula is slightly more accurate. Because of the increased accuracy and ease of computation, it is the method of choice in this case.
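The computational formula translates directly into code. In this illustrative Python sketch (z_scores is a hypothetical helper), the standard deviation is not rounded to 4.09, so the z-scores agree with the table to two decimals (-1.17, -0.44, -0.44, 0.78, 1.27) while their standard deviation is 1 rather than .997.

```python
from statistics import mean, stdev

def z_scores(scores):
    """Standard scores z = (X - mean) / s_X, using the N - 1 standard deviation."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

z = z_scores([12, 15, 15, 20, 22])
print([round(v, 2) for v in z])
print(f"SD = {stdev(z):.3f}; mean is zero up to rounding: {abs(mean(z)) < 1e-12}")
```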
FINDING LINEAR TRANSFORMATIONS USING SPSS
All of the transformations described in this chapter could be done using the COMPUTE command on SPSS. The following presents a series of COMPUTE commands pasted into a SYNTAX file using SPSS to perform all the transformations previously described in this chapter.
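* Additive, multiplicative, and combined linear transformations.
COMPUTE AddTrans = 20 + x .
COMPUTE MulTrans = 3 * x .
COMPUTE LinTrans = 20 + ( 3 * x ) .
* IQ scale (mean 100, standard deviation 16).
COMPUTE IQTrans = 34.31 + ( 3.91 * x ) .
* z-scores, first with a and b, then with the computational formula.
COMPUTE Z1Trans = -4.11 + ( 0.244 * x ) .
COMPUTE Z2Trans = ( x - 16.8 ) / 4.09 .
* CDFNORM gives the proportion of the normal curve below each z-score.
COMPUTE Perrank = CDFNORM(Z2Trans) .
EXECUTE .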
The variables that resulted were printed using the CASE SUMMARIES command.
The means and standard deviations of this file were found using the DESCRIPTIVES command.
Transformations are performed to interpret and compare raw scores. Of the two types of transformations described in this text, percentile ranks are preferred to interpret scores to the lay public, because they are more easily understood. Because of the unfortunate property of destroying the interval property of the scale, the statistician uses percentile rank transformations with reluctance. Linear transformations are preferred because the interval property of the measurement system is not disturbed.
Using a linear transformation, a distribution with a given mean and standard deviation may be transformed into another distribution with a different mean and standard deviation. Several standards for the mean and standard deviation were discussed, but standard scores or z-scores are generally the preferred transformation. The z-score transformation is a linear transformation with a transformed mean of 0 and standard deviation of 1.0. Computational procedures were provided for this transformation.
Standard scores could be converted to percentile ranks by use of the standard normal curve tables. Computation of percentile ranks using this method required additional assumptions about the nature of the world and had the same unfortunate property as percentile ranks based on the sample.
Regression models are used to predict one variable from one or more other variables. Regression models provide the scientist with a powerful tool, allowing predictions about past, present, or future events to be made with information about past or present events. The scientist employs these models either because it is less expensive in terms of time and/or money to collect the information to make the predictions than to collect the information about the event itself, or, more likely, because the event to be predicted will occur in some future time. Before describing the details of the modeling process, however, some examples of the use of regression models will be presented.
EXAMPLE USES OF REGRESSION MODELS
A high school student discusses plans to attend college with a guidance counselor. The student has a 2.04 grade point average out of 4.00 maximum and mediocre to poor scores on the ACT. He asks about attending Harvard. The counselor tells him he would probably not do well at that institution, predicting he would have a grade point average of 0.64 at the end of four years at Harvard. The student inquires about the necessary grade point average to graduate and when told that it is 2.25, the student decides that maybe another institution might be more appropriate in case he becomes involved in some "heavy duty partying."
When asked about the large state university, the counselor predicts that he might succeed, but chances for success are not great, with a predicted grade point average of 1.23. A regional institution is then proposed, with a predicted grade point average of 1.54. Deciding that is still not high enough to graduate, the student decides to attend a local community college, graduates with an associate's degree, and makes a fortune selling real estate.
If the counselor was using a regression model to make the predictions, he or she would know that this particular student would not necessarily make a grade point average of 0.64 at Harvard, 1.23 at the state university, and 1.54 at the regional university. These values are just "best guesses." It may be that this particular student was completely bored in high school, didn't take the standardized tests seriously, would become challenged in college, and would succeed at Harvard. The selection committee at Harvard, however, when faced with a choice between a student with a predicted grade point of 3.24 and one with 0.64, would most likely make the rational decision and choose the more promising student.
A woman in the first trimester of pregnancy has a great deal of concern about the environmental factors surrounding her pregnancy and asks her doctor what impact they might have on her unborn child. The doctor makes a "point estimate," based on a regression model, that the child will have an IQ of 75. It is highly unlikely that her child will have an IQ of exactly 75, as there is always error in the regression procedure. Error may be incorporated into the information given the woman in the form of an "interval estimate." For example, it would make a great deal of difference if the doctor were to say that the child has a ninety-five percent chance of having an IQ between 70 and 80, in contrast to a ninety-five percent chance of an IQ between 50 and 100. The concept of error in prediction will become an important part of the discussion of regression models.
It is also worth pointing out that regression models do not make decisions for people. Regression models are a source of information about the world. In order to use them wisely, it is important to understand how they work.
Selection and Placement During the World Wars
Technology helped the United States and her allies to win the first and second world wars. One usually thinks of the atomic bomb, radar, bombsights, better designed aircraft, etc., when this statement is made. Less well known were the contributions of psychologists and associated scientists to the development of tests and prediction models used for selection and placement of men and women in the armed forces.
During these wars, the United States had thousands of men and women enlisting or being drafted into the military. These individuals differed in their ability to perform physical and intellectual tasks. The problem was one of both selection (who is drafted and who is rejected) and placement (of those selected, who will cook and who will fight). An army that takes its best and brightest men and women and places them in the front lines digging trenches is less likely to win the war than an army that places these men and women in positions of leadership.
It costs a great deal of money and time to train a person to fly an airplane. Every time one crashes, the air force loses a plane and the time and effort spent training the pilot, not to mention the life of the pilot. For this reason it was, and still is, vital that the best possible selection and prediction tools be used for personnel decisions.
A new plant to manufacture widgets is being located in a nearby community. The plant personnel officer advertises the employment opportunity and the next morning has 10,000 people waiting to apply for the 1,000 available jobs. It is important to select the 1,000 people who will make the best employees because training takes time and money and firing is difficult and bad for community relations. In order to provide information to help make the correct decisions, the personnel officer employs a regression model. None of what follows will make much sense if the procedure for constructing a regression model is not understood, so the procedure will now be discussed.
PROCEDURE FOR CONSTRUCTION OF A REGRESSION MODEL
In order to construct a regression model, both the information which is going to be used to make the prediction and the information which is to be predicted must be obtained from a sample of objects or individuals. The relationship between the two pieces of information is then modeled with a linear transformation. Then, in the future, only the first piece of information is necessary, and the regression model is used to transform it into a prediction. In other words, it is necessary to have information on both variables before the model can be constructed.
For example, the personnel officer of the widget manufacturing company might give all applicants a test and predict the number of widgets made per hour on the basis of the test score. In order to create a regression model, the personnel officer would first have to give the test to a sample of applicants and hire all of them. Later, when the number of widgets made per hour had stabilized, the personnel officer could create a prediction model to predict the widget production of future applicants. All future applicants would be given the test and hiring decisions would be based on test performance.
A notational scheme is now necessary to describe the procedure:
X_{i} is the variable used to predict, and is sometimes called the independent variable. In the case of the widget manufacturing example, it would be the test score.
Y_{i} is the observed value of the predicted variable, and is sometimes called the dependent variable. In the example, it would be the number of widgets produced per hour by that individual.
Y'_{i} is the predicted value of the dependent variable. In the example it would be the predicted number of widgets per hour by that individual.
The goal in the regression procedure is to create a model where the predicted and observed values of the variable to be predicted are as similar as possible. For example, in the widget manufacturing situation, it is desired that the predicted number of widgets made per hour be as similar to observed values as possible. The more similar these two values, the better the model. The next section presents a method of measuring the similarity of the predicted and observed values of the predicted variable.
THE LEAST-SQUARES CRITERION FOR GOODNESS-OF-FIT
In order to develop a measure of how well a model predicts the data, it is valuable to present an analogy of how to evaluate predictions. Suppose there were two interviewers, Mr. A and Ms. B, who separately interviewed each applicant for the widget manufacturing job for ten minutes. At the end of that time, each interviewer had to predict how many widgets that applicant would produce two months later. All of the applicants interviewed were hired, regardless of the predictions, and at the end of the two-month trial period, one interviewer, the better one, was to be retained and promoted; the other was to be fired. The purpose of the following is to develop a measure of goodness-of-fit, or how well each interviewer predicted.
The notational scheme for the table is as follows:
Y_{i} is the observed or actual number of widgets made per hour
Y'_{i} is the predicted number of widgets
Suppose the data for the five applicants were as follows:
Observed    Mr. A     Ms. B
  Y_{i}     Y'_{i}    Y'_{i}
   23         38        21
   18         34        15
   35         16        32
   10         10         8
   27         14        23
Obviously neither interviewer was impressed with the fourth applicant, for good reason. A casual comparison of the two columns of predictions with the observed values leads one to believe that interviewer B made the better predictions. A procedure is desired which will provide a measure, or single number, of how well each interviewer performed.
The first step is to find how far each interviewer's prediction missed the observed value for each applicant. This is done by finding the difference between the observed and predicted values for each applicant for each interviewer. These differences are called residuals. If the column of differences between the observed and predicted values is summed, then it would appear that Mr. A is better at prediction, because he had a smaller sum of deviations, 1, than Ms. B, with a sum of 14. This goes against common sense. In this case large positive deviations cancel out large negative deviations, leaving what appears as an almost perfect prediction for Mr. A, but that is obviously not the case.
Observed   Predicted Y'_{i}    Residual Y_{i} - Y'_{i}
  Y_{i}    Mr. A    Ms. B      Mr. A       Ms. B
   23       38       21         -15          2
   18       34       15         -16          3
   35       16       32          19          3
   10       10        8           0          2
   27       14       23          13          4
   Sum                            1         14
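The cancellation problem is easy to reproduce. This illustrative Python sketch (residuals is a hypothetical helper) computes Y_{i} - Y'_{i} for each interviewer and sums the results:

```python
observed = [23, 18, 35, 10, 27]   # Y_i, widgets per hour
mr_a = [38, 34, 16, 10, 14]       # Mr. A's predictions Y'_i
ms_b = [21, 15, 32, 8, 23]        # Ms. B's predictions Y'_i

def residuals(obs, pred):
    """Residuals Y_i - Y'_i, one per applicant."""
    return [y - yp for y, yp in zip(obs, pred)]

res_a = residuals(observed, mr_a)
res_b = residuals(observed, ms_b)
print(res_a, sum(res_a))   # large misses of opposite sign cancel
print(res_b, sum(res_b))
```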
In order to avoid the preceding problem, it would be possible to ignore the signs of the differences and then sum, that is, take the sum of the absolute values. This would work, but for mathematical reasons the sign is eliminated by squaring the differences. In the example, this procedure would yield:
Observed   Predicted Y'_{i}   Residual Y_{i} - Y'_{i}   Squared (Y_{i} - Y'_{i})^{2}
  Y_{i}    Mr. A   Ms. B      Mr. A     Ms. B            Mr. A      Ms. B
   23       38      21         -15        2               225         4
   18       34      15         -16        3               256         9
   35       16      32          19        3               361         9
   10       10       8           0        2                 0         4
   27       14      23          13        4               169        16
   Sum                            1       14              1011        42
Summing the squared differences yields the desired measure of goodness-of-fit. In this case the smaller the number, the closer the predicted values are to the observed values. This is expressed in the following mathematical equation:

\sum_{i=1}^{N} (Y_{i} - Y'_{i})^{2}
The prediction which minimizes this sum is said to meet the least-squares criterion. In the above example, Ms. B meets this criterion in a comparison between the two interviewers, with a sum of squared residuals of 42 against Mr. A's 1011, and would be promoted. Mr. A would receive a pink slip.
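The least-squares comparison of the two interviewers can be sketched the same way. In this illustrative Python example the helper name sum_squared_residuals is hypothetical; the interviewer with the smaller sum is the one retained.

```python
observed = [23, 18, 35, 10, 27]
predictions = {
    "Mr. A": [38, 34, 16, 10, 14],
    "Ms. B": [21, 15, 32, 8, 23],
}

def sum_squared_residuals(obs, pred):
    """Least-squares measure of fit: the sum of (Y_i - Y'_i) squared."""
    return sum((y - yp) ** 2 for y, yp in zip(obs, pred))

fit = {name: sum_squared_residuals(observed, p) for name, p in predictions.items()}
print(fit)
print("retain:", min(fit, key=fit.get))   # the smaller sum meets the criterion
```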
The situation using the regression model is analogous to that of the interviewers, except that instead of being made by interviewers, predictions are made by performing a linear transformation of the predictor variable, such as a test score. The prediction takes the form

Y'_{i} = a + bX_{i}
where a and b are parameters in the regression model.
In the above example, suppose that, rather than being interviewed, each applicant took a form-board test. A form-board is a board with holes cut out in various shapes: square, round, triangular, etc. The goal is to put the right pegs in the right holes as fast as possible. The saying "square peg in a round hole" came from this test, as the test has been around for a long time. The score for the test is the number of seconds it takes to put all the pegs in the right holes. The data were collected as follows:

Form-Board Test (seconds)   Widgets/hr
         X_{i}                 Y_{i}
          13                    23
          20                    18
          10                    35
          33                    10
          15                    27
Because the two parameters of the regression model, a and b, can take on any real value, there are an infinite number of possible models, analogous to having an infinite number of possible interviewers. The goal of regression is to select the parameters of the model so that the least-squares criterion is met, or, in other words, so that the sum of the squared deviations is minimized. The procedure discussed in the last chapter, that of transforming the scale of X to the scale of Y such that both have the same mean and standard deviation, will not work in this case, because that transformation is not chosen to minimize the errors of prediction.
A number of possible models will now be examined where:
X_{i} is the number of seconds to complete the form board task
Y_{i} is the number of widgets made per hour two months later
Y'_{i} is the predicted number of widgets
For the first model, let a = 10 and b = 1, attempting to predict the first score perfectly. In this case the regression model becomes

Y'_{i} = 10 + 1X_{i}
The first score (X_{1} = 13) would be transformed into a predicted score of Y'_{1} = 10 + (1*13) = 23. The second predicted score, where X_{2} = 20, would be Y'_{2} = 10 + (1*20) = 30. The same procedure is then applied to the last three scores, resulting in predictions of 20, 43, and 25, respectively.
Form-Board   Widgets/hr   Widgets/hr            Residual         Squared Residual
Observed     Observed     Predicted
  X_{i}        Y_{i}      Y'_{i} = a + bX_{i}   Y_{i} - Y'_{i}   (Y_{i} - Y'_{i})^{2}
   13           23             23                    0                  0
   20           18             30                  -12                144
   10           35             20                   15                225
   33           10             43                  -33               1089
   15           27             25                    2                  4
Σ(Y_{i} - Y'_{i})^{2} = 1462
It can be seen that the model does a good job of prediction for the first and last applicant, but the middle applicants are poorly predicted. Because it is desired that the model work for all applicants, some other values for the parameters must be tried.
The selection of the parameters for the second model is based on the observation that the longer it takes to put the form board together, the fewer the number of widgets made. When the tendency is for one variable to increase while the other decreases, the relationship between the variables is said to be inverse. The mathematician knows that in order to model an inverse relationship, a negative value of b must be used in the regression model. In this case the parameters a = 36 and b = -1 will be used.
X_{i}   Y_{i}   Y'_{i} = a + bX_{i}   Y_{i} - Y'_{i}   (Y_{i} - Y'_{i})^{2}
 13      23            23                  0                 0
 20      18            16                  2                 4
 10      35            26                  9                81
 33      10             3                  7                49
 15      27            21                  6                36
Σ(Y_{i} - Y'_{i})^{2} = 170
This model fits the data much better than did the first model. A fairly large deviation is noted for the third applicant, which might be reduced by increasing the value of the additive component of the transformation, a. Thus a model with a = 41 and b = -1 will now be tried.
X_{i}   Y_{i}   Y'_{i} = a + bX_{i}   Y_{i} - Y'_{i}   (Y_{i} - Y'_{i})^{2}
 13      23            28                 -5                25
 20      18            21                 -3                 9
 10      35            31                  4                16
 33      10             8                  2                 4
 15      27            26                  1                 1
Σ(Y_{i} - Y'_{i})^{2} = 55
This makes the predicted values closer to the observed values on the whole, as measured by the sum of squared deviations (residuals). Perhaps reducing the size of b would make the predictions better. Hence a model where a = 32 and b = -.5 will be tried.
X_{i}   Y_{i}   Y'_{i} = a + bX_{i}   Y_{i} - Y'_{i}   (Y_{i} - Y'_{i})^{2}
 13      23           25.5               -2.5              6.25
 20      18           22                 -4               16
 10      35           27                  8               64
 33      10           15.5               -5.5             30.25
 15      27           24.5                2.5              6.25
Σ(Y_{i} - Y'_{i})^{2} = 122.75
Since this attempt increased the sum of the squared deviations, it obviously was not a good idea.
The point is soon reached when the question "When do we stop?" must be asked. Using this procedure, the answer must necessarily be "never," because one can never be sure that slightly different values of the two parameters would not produce a better model, one which makes the sum of squared deviations smaller. The following table summarizes what is known about the problem thus far.
   a      b     Σ(Y_{i} - Y'_{i})^{2}
  10      1          1462
  36     -1           170
  41     -1            55
  32     -.5          122.75
With four attempts at selecting parameters for a model, it appears that a=41 and b=-1 give the best fit (smallest sum of squared deviations) found to this point. If the same search procedure were continued, perhaps the value of a could be adjusted when b=-2 and b=-1.5, and so forth. Unless the sum of squared deviations is equal to zero, which is seldom possible in the real world, we will never know if it is the best possible model. Rather than throwing their hands up in despair, applied statisticians approached the mathematician with the problem and asked if a mathematical solution could be found. This is the topic of the next section. Students who are willing simply to "believe" may skim it without any great loss of the ability to "do" a linear regression problem.
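The trial-and-error search can be mimicked by a computer, which simply tries many (a, b) pairs and keeps the best. In this sketch the grid limits and step sizes (a from 30 to 50 in steps of .5, b from -2 to 0 in steps of .05) are arbitrary assumptions. It reproduces the four trials above and illustrates the problem just described: the best grid point comes close to, but is not guaranteed to be, the best possible model.

```python
# Form-board times (X) and widgets per hour (Y) for the five applicants.
X = [13, 20, 10, 33, 15]
Y = [23, 18, 35, 10, 27]

def sse(a, b):
    """Sum of squared residuals for the model Y' = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

# The four models tried in the text.
trials = [(10, 1), (36, -1), (41, -1), (32, -0.5)]
for a, b in trials:
    print(f"a={a:>4}, b={b:>5}: SSE = {sse(a, b)}")

# Brute-force grid search. Integer loop counters avoid accumulating
# floating-point error in the grid values.
best = None
for i in range(41):
    a = 30 + 0.5 * i          # 30.0, 30.5, ..., 50.0
    for j in range(41):
        b = -2 + 0.05 * j     # -2.00, -1.95, ..., 0.00
        value = sse(a, b)
        if best is None or value < best[0]:
            best = (value, a, b)

print("Best on this grid: SSE = %.2f at a = %.2f, b = %.2f" % best)
```

A finer grid would usually lower the best value a little further, which is exactly why a search of this kind never tells us when to stop.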
SOLVING FOR PARAMETER VALUES WHICH SATISFY THE LEAST-SQUARES CRITERION
The problem is presented to the mathematician as follows: "The values of a and b in the linear model Y'_{i} = a + b X_{i} are to be found which minimize the algebraic expression Σ(Y_{i} - Y'_{i})^{2}."
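The mathematician begins by substituting the model for Y'_{i} in the expression:
Σ(Y_{i} - Y'_{i})^{2} = Σ(Y_{i} - (a + bX_{i}))^{2} = Σ(Y_{i} - a - bX_{i})^{2}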
Now comes the hard part that requires knowledge of calculus. At this point even the mathematically sophisticated student will be asked to "believe." What the mathematician does is take the first-order partial derivative of the last form of the preceding expression with respect to b, set it equal to zero, and solve for the value of b. This is the method that mathematicians use to solve for minimum and maximum values. Completing this task, the result becomes:
b = (NΣX_{i}Y_{i} - ΣX_{i}ΣY_{i}) / (NΣX_{i}^{2} - (ΣX_{i})^{2})
Using a similar procedure to find the value of a yields:
a = (ΣY_{i} - bΣX_{i}) / N
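For readers who would rather see the calculus than simply believe it, the following is a sketch of the standard derivation (not reproduced from the original text). Writing S(a, b) for the sum of squared deviations, both partial derivatives are set to zero, giving two equations that are then solved together for a and b:

```latex
\begin{aligned}
S(a,b) &= \sum_{i=1}^{N} (Y_i - a - bX_i)^2 \\
\frac{\partial S}{\partial a} &= -2\sum_{i=1}^{N} (Y_i - a - bX_i) = 0
  &&\Longrightarrow\; \sum Y_i = Na + b\sum X_i \\
\frac{\partial S}{\partial b} &= -2\sum_{i=1}^{N} X_i\,(Y_i - a - bX_i) = 0
  &&\Longrightarrow\; \sum X_iY_i = a\sum X_i + b\sum X_i^2 \\
a &= \frac{\sum Y_i - b\sum X_i}{N} \\
b &= \frac{N\sum X_iY_i - \sum X_i \sum Y_i}{N\sum X_i^2 - \left(\sum X_i\right)^2}
\end{aligned}
```

The expression for b follows by substituting the expression for a into the second equation and solving for b.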
The "optimal" values for a and b can be found by doing the appropriate summations, plugging them into the equations, and solving for the results. The appropriate summations are presented below:
        X_{i}   Y_{i}   X_{i}^{2}   X_{i}Y_{i}
        13      23      169         299
        20      18      400         360
        10      35      100         350
        33      10      1089        330
        15      27      225         405
SUM     91      113     1983        1744
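Substituting these sums, with N = 5, into the formula for b gives the slope of the regression model:
b = (5*1744 - 91*113) / (5*1983 - 91^{2}) = (8720 - 10283) / (9915 - 8281) = -1563 / 1634 = -.957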
Solving for the a parameter is somewhat easier (carrying b to four places, -.9565):
a = (113 - (-.9565)(91)) / 5 = (113 + 87.04) / 5 = 40.01
This procedure results in an "optimal" model. That is, no other values of a and b will yield a smaller sum of squared deviations. The mathematician is willing to bet the family farm on this result. A demonstration of this fact will be done for this problem shortly.
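The summations and the two formulas translate directly into code. This Python sketch (an illustration, not part of the original text) computes the slope first and then the intercept, carrying full precision rather than rounding b before computing a.

```python
X = [13, 20, 10, 33, 15]
Y = [23, 18, 35, 10, 27]
N = len(X)

# The summations from the table above.
sum_x = sum(X)                               # 91
sum_y = sum(Y)                               # 113
sum_x2 = sum(x * x for x in X)               # 1983
sum_xy = sum(x * y for x, y in zip(X, Y))    # 1744

# Least-squares slope, then intercept.
b = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / N

print(f"b = {b:.4f}")   # about -0.9565
print(f"a = {a:.4f}")   # about 40.0092
```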
In any case, both the number of pairs of numbers (five) and the integer nature of the numbers made this problem "easy." This "easy" problem resulted in considerable computational effort. Imagine what a "difficult" problem with hundreds of pairs of decimal numbers would be like. That is why a bivariate statistics mode is available on many calculators.
USING STATISTICAL CALCULATORS TO SOLVE FOR REGRESSION PARAMETERS
Most statistical calculators require a number of steps to solve regression problems. The specific keystrokes required for the steps vary for the different makes and models of calculators. Please consult the calculator manual for details.
Step One: Put the calculator in "bivariate statistics mode." This step is not necessary on some calculators.
Step Two: Clear the statistical registers.
Step Three: Enter the pairs of numbers. Some calculators display how many pairs have been entered so far.
Step Four: Find the values of various statistics, including:
· The mean and standard deviation of both X and Y
· The correlation coefficient (r)
· The parameter estimates of the regression model:
  · The slope (b)
  · The intercept (a)
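The results of these calculations for the example problem are:
· Mean of X = 18.2, standard deviation of X = 9.04 (N - 1 divisor)
· Mean of Y = 22.6, standard deviation of Y = 9.40 (N - 1 divisor)
· r = -.920
· b = -.957
· a = 40.01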
The discussion of the correlation coefficient is left for the next chapter. All that is important at the present time is the ability to calculate the value in the process of performing a regression analysis. The value of the correlation coefficient will be used in a later formula in this chapter.
DEMONSTRATION OF "OPTIMAL" PARAMETER ESTIMATES
Using either the algebraic expressions developed by the mathematician or the calculator results, the "optimal" regression model which results is:
Y'_{i} = 40.01 - .957X_{i}
Applying procedures identical to those used on earlier "nonoptimal" regression models, the residuals (deviations of observed and predicted values) are found, squared, and summed to find the sum of squared deviations.
X_{i}       Y_{i}       Y'_{i}=a+bX_{i}  (Y_{i}-Y'_{i})  (Y_{i}-Y'_{i})^{2}
13          23          27.57            -4.57           20.92
20          18          20.88            -2.88           8.28
10          35          30.44            4.56            20.76
33          10          8.44             1.56            2.42
15          27          25.66            1.34            1.79
Σ(Y_{i}-Y'_{i})^{2} = 54.18
Note that the sum of squared deviations (Σ(Y_{i}-Y'_{i})^{2} = 54.18) is smaller than the previous low of 55.0, but not by much. The mathematician is willing to guarantee that this is the smallest sum of squared deviations that can be obtained by using any possible values for a and b.
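The mathematician's guarantee can also be spot-checked numerically. The sketch below (an illustration added here, not the author's method) computes the unrounded least-squares estimates, then nudges a and b by small step sizes chosen arbitrarily for the illustration, in every combination, and confirms that none of the nudged models has a smaller sum of squared deviations. A spot check of nearby values is evidence, not a proof; the calculus supplies the proof.

```python
X = [13, 20, 10, 33, 15]
Y = [23, 18, 35, 10, 27]
N = len(X)

def sse(a, b):
    """Sum of squared residuals for the model Y' = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

# Unrounded least-squares estimates from the summation formulas.
sum_x, sum_y = sum(X), sum(Y)
sum_x2 = sum(x * x for x in X)
sum_xy = sum(x * y for x, y in zip(X, Y))
b_hat = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)
a_hat = (sum_y - b_hat * sum_x) / N
best = sse(a_hat, b_hat)
print(f"SSE at the least-squares estimates: {best:.2f}")

# Nudge a and b in every combination; none of the nudged models does better.
steps = [-0.1, -0.01, 0.0, 0.01, 0.1]
worse_or_equal = all(sse(a_hat + da, b_hat + db) >= best
                     for da in steps for db in steps)
print("No nearby (a, b) beats the least-squares fit:", worse_or_equal)
```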
The bottom line is that the equation Y' = 40.01 - .957X will be used to predict the number of widgets per hour that a potential employee will make, given the score that he or she has made on the form-board test. The prediction will not be perfect, but it will be the best available, given the data and the form of the model.
SCATTER PLOTS AND THE REGRESSION LINE
The preceding has been an algebraic presentation of the logic underlying the regression procedure. Since there is a one-to-one correspondence between algebra and geometry, and since some students have an easier time understanding a visual presentation of an algebraic procedure, a visual presentation will now be attempted. The data will be represented as points on a scatter plot, while the regression equation will be represented by a straight line, called the regression line.
A scatter plot or scattergram is a visual representation of the relationship between the X and Y variables. First, the X and Y axes are drawn with equally spaced markings to include all values of that variable that occur in the sample. In the example problem, X, the seconds to put the form board together, would have to range between 10 and 33, the lowest and highest values that occur in the sample. Similarly, the Y variable, the number of widgets made per hour, ranges from 10 to 35. If the axes do not start at zero, as in the present case where they both start at 10, a small space is left before the line markings to indicate this fact.
The paired or bivariate (two-variable, X,Y) data will be represented as vectors or points on this graph. The point is plotted by finding the intersection of the X and Y scores for that pair of values. For example, the first point would be located at the intersection of X=13 and Y=23. The first point and the remaining four points are presented on the following graph.
The regression line is drawn by plotting the X and Y' values. The next figure presents the five X and Y' values that were found on the regression table of observed and predicted values. Note that the first point would be plotted as (13, 27.57), the second point as (20, 20.88), etc.
Note that all the points fall on a straight line. If every possible Y' were plotted for every possible X, then a straight line would be formed. The equation Y' = a + bX defines a straight line in a two-dimensional space. The easiest way to draw the line is to plot the two extreme points, that is, the points corresponding to the smallest and largest X, and connect these points with a straightedge. Any two points would actually work, but the two extreme points give a line with the least drawing error. The a value is sometimes called the intercept and defines where the line crosses the Y-axis. This does not happen very often in actual drawings, because the axes do not begin at zero, that is, there is a break in the axis. The following illustrates how to draw the regression line.
Most often the scatterplot and regression line are combined as follows:
THE STANDARD ERROR OF ESTIMATE
The standard error of estimate is a measure of error in prediction. It is symbolized as s_{Y.X}, read as "s sub Y dot X." The notation is used to mean the standard deviation of Y given that the value of X is known. The standard error of estimate is defined by the formula
s_{Y.X} = √( Σ(Y_{i} - Y'_{i})^{2} / (N - 2) )
As such it may be thought of as the average deviation of the predicted from the observed values of Y, except that the denominator is not N but N - 2, the degrees of freedom for the regression procedure. One degree of freedom is lost for each of the parameters estimated, a and b. Note that the numerator is the same as in the least-squares criterion.
The standard error of estimate is a standard deviation type of measure. Note the similarity of the definitional formula of the standard deviation of Y,
s_{Y} = √( Σ(Y_{i} - Ȳ)^{2} / (N - 1) ), where Ȳ is the mean of Y,
to the definitional formula for the standard error of estimate.
Two differences appear. First, the standard error of estimate divides the sum of squared deviations by N - 2, rather than N - 1. Second, the standard error of estimate finds the sum of squared differences around a predicted value of Y, rather than the mean.
The similarity of the two measures may be resolved if the standard deviation of Y is conceptualized as the error around a predicted Y of Y'_{i} = a. When the least-squares criterion is applied to this model, the optimal value of a is the mean of Y. In this case only one degree of freedom is lost because only one parameter is estimated for the regression model.
The standard error of estimate may be calculated from the definitional formula given above. The computation is tedious, however, because the entire table of differences and squared differences must be calculated. Because the numerator has already been found for the example data, the calculation here is relatively easy: dividing that sum of squared deviations by N - 2 = 3 and taking the square root gives s_{Y.X} = 4.25.
The calculation of the standard error of estimate is simplified by the following formula, called the computational formula for the standard error of estimate. The computation is easier because the statistical calculator computed the correlation coefficient when finding a regression line. The computational formula for the standard error of estimate will always give the same result, within rounding error, as the definitional formula. The computational formula may look more complicated, but it does not require the computation of the entire table of differences between observed and predicted Y scores. The computational formula is as follows:
s_{Y.X} = √( s_{Y}^{2} (1 - r^{2}) (N - 1)/(N - 2) )
The computational formula for the standard error of estimate is most easily and accurately computed by temporarily storing the values for s_{Y}^{2} and r^{2} in the calculator's memory and recalling them when needed. Using this formula to calculate the standard error of estimate with the example data (s_{Y}^{2} = 88.30, r^{2} = .8466, N = 5) produces the following result:
s_{Y.X} = √( 88.30 * (1 - .8466) * 4/3 ) = √18.06 = 4.25
Note that the result is the same as the result from the application of the definitional formula, within rounding error.
The standard error of estimate is a measure of error in prediction. The larger its value, the less well the regression model fits the data, and the worse the prediction.
A conditional distribution is a distribution of a variable given a particular value of another variable. For example, a conditional distribution of number of widgets made exists for each possible value of number of seconds to put the form board together. Conceptually, suppose that an infinite number of applicants had made the same score of 18 on the form board test. If everyone were hired, not everyone would make the same number of widgets three months later. The distribution of scores which results would be called the conditional distribution of Y (widgets) given X (form board). The relationship between X and Y in this case is often symbolized by Y|X. The conditional distribution of Y given that X was 18 would be symbolized as Y|X=18.
It is possible to model the conditional distribution with the normal curve. In order to create a normal curve model, it is necessary to estimate the values of the parameters of the model, μ_{Y|X} and σ_{Y|X}. The best estimate of μ_{Y|X} is the predicted value of Y, Y', given X equals a particular value. This is found by entering the appropriate value of X in the regression equation, Y'=a+bX. In the example, the estimate of μ_{Y|X} for the conditional distribution of number of widgets made given X=18 would be Y' = 40.01 - .957*18 = 22.78. This value is also called a point estimate, because it is the best guess of Y when X is a given value.
The standard error of estimate is often used as an estimate of σ_{Y|X} for all the conditional distributions. This assumes that all conditional distributions have the same value for this parameter. One interpretation of the standard error of estimate, then, is an estimate of the value of σ_{Y|X} for all possible conditional distributions or values of X. The conditional distribution which results when X=18 is presented below.
It is somewhat difficult to visualize all possible conditional distributions in only two dimensions, although the following illustration attempts the relatively impossible. If a hill can be visualized with the middle being the regression line, the vision would be essentially correct.
The conditional distribution is a model of the distribution of points around the regression line for a given value of X. The conditional distribution is important in this text mainly for the role it plays in computing an interval estimate.
The error in prediction may be incorporated into the information given to the client by using interval estimates rather than point estimates. A point estimate is the predicted value of Y, Y'. While the point estimate gives the best possible prediction, as defined by the least squares criterion, the prediction is not perfect. The interval estimate presents two values, low and high, between which some percentage of the observed scores are likely to fall. For example, if a person applying for a position manufacturing widgets made a score of X=18 on the form board test, a point estimate of 22.78 would result from the application of the regression model and an interval estimate might be from 14.45 to 31.11. The interval computed is a 95 percent confidence interval. It could be said that 95 times out of 100 the number of widgets made per hour by an applicant making a score of 18 on the form board test would be between 14.45 and 31.11.
The model of the conditional distribution is critical to understanding the assumptions made when calculating an interval estimate. If the conditional distribution for a value of X is known, then finding an interval estimate is reduced to a problem that has already been solved in an earlier chapter. That is, what two scores on a normal distribution with parameters μ and σ cut off some middle percent of the distribution? While any percentage could be found, the standard value is a 95% confidence interval.
For example, the parameter estimates of the conditional distribution of X=18 are μ_{Y|X}=22.78 and σ_{Y|X}=4.25. The two scores which cut off the middle 95% of that distribution are 14.45 and 31.11. The use of the Normal Curve Area program to find the middle area is illustrated below.
In general, the two scores that cut off the middle of a normal distribution are μ ± zσ. In this case, subscripts indicating a conditional distribution may be employed: μ_{Y|X} ± zσ_{Y|X}.
Because μ_{Y|X} is estimated by Y', σ_{Y|X} by s_{Y.X}, and the value of z is 1.96 for a 95% confidence interval, the computational formula for the confidence interval becomes
Y' ± 1.96 s_{Y.X}
which is the computational form for computing the 95% confidence interval. For example, for X=18, Y'=22.78 and s_{Y.X}=4.25, so computation of the confidence interval becomes
22.78 ± 1.96*4.25 = 22.78 ± 8.33, that is, from 14.45 to 31.11.
Other sizes of confidence intervals could be computed by changing the value of z.
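The interval computation can be sketched in Python. The coefficients and s_{Y.X} below are the rounded values from the text; `interval_estimate` is a hypothetical helper, and z is taken from `statistics.NormalDist().inv_cdf` rather than a table, so the 95% value is 1.95996... rather than exactly 1.96. The formula follows the text's Y' ± z s_{Y.X} form.

```python
from statistics import NormalDist

a, b = 40.01, -0.957      # rounded regression coefficients from the text
s_yx = 4.25               # standard error of estimate from the text

def interval_estimate(x, confidence=0.95):
    """Point estimate Y' and the interval Y' +/- z * s_Y.X."""
    y_pred = a + b * x
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return y_pred, y_pred - z * s_yx, y_pred + z * s_yx

point, low, high = interval_estimate(18)
print(f"Y' = {point:.2f}, 95% interval: {low:.2f} to {high:.2f}")

# A different size of interval only changes z.
point, low, high = interval_estimate(18, confidence=0.90)
print(f"Y' = {point:.2f}, 90% interval: {low:.2f} to {high:.2f}")
```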
Interpretation of the confidence interval for a given score of X necessitates several assumptions. First, the conditional distribution for that X is a normal distribution. Second, μ_{Y|X} is correctly estimated by Y', that is, the relationship between X and Y can be adequately modeled by a straight line. Third, σ_{Y|X} is correctly estimated by s_{Y.X}, which means assuming that all conditional distributions have the same value of σ_{Y|X}.
REGRESSION ANALYSIS USING SPSS
The REGRESSION command is called in SPSS as follows:
Selecting the following options will command the program to do a simple linear regression and create two new variables in the data editor: one with the predicted values of Y and the other with the residuals.
The output from the preceding includes the correlation coefficient and standard error of estimate.
The regression coefficients are also given in the output.
The optional save command generates two new variables in the data file.
Regression models are powerful tools for predicting a score based on some other score. They involve a linear transformation of the predictor variable into the predicted variable. The parameters of the linear transformation are selected such that the least squares criterion is met, resulting in an "optimal" model. The model can then be used in the future to predict either exact scores, called point estimates, or intervals of scores, called interval estimates.
The Pearson Product-Moment Correlation Coefficient (r), or correlation coefficient for short, is a measure of the degree of linear relationship between two variables, usually labeled X and Y. While in regression the emphasis is on predicting one variable from the other, in correlation the emphasis is on the degree to which a linear model may describe the relationship between two variables. In regression the interest is directional: one variable is predicted and the other is the predictor. In correlation the interest is nondirectional: the relationship is the critical aspect.
The computation of the correlation coefficient is most easily accomplished with the aid of a statistical calculator. The value of r was found on a statistical calculator during the estimation of regression parameters in the last chapter. Although definitional formulas will be given later in this chapter, the reader is encouraged to review the procedure to obtain the correlation coefficient on the calculator at this time.
The correlation coefficient may take on any value between plus and minus one.
The sign of the correlation coefficient (+, -) defines the direction of the relationship, either positive or negative. A positive correlation coefficient means that as the value of one variable increases, the value of the other variable increases; as one decreases the other decreases. A negative correlation coefficient indicates that as one variable increases, the other decreases, and vice versa.
Taking the absolute value of the correlation coefficient measures the strength of the relationship. A correlation coefficient of r=.50 indicates a stronger degree of linear relationship than one of r=.40. Likewise a correlation coefficient of r=-.50 shows a greater degree of relationship than one of r=-.40. Thus a correlation coefficient of zero (r=0.0) indicates the absence of a linear relationship, and correlation coefficients of r=+1.0 and r=-1.0 indicate a perfect linear relationship.
UNDERSTANDING AND INTERPRETING THE CORRELATION COEFFICIENT
The correlation coefficient may be understood by various means, each of which will now be examined in turn.
The scatterplots presented below perhaps best illustrate how the correlation coefficient changes as the linear relationship between the two variables is altered. When r=0.0 the points scatter widely about the plot, the majority falling roughly in the shape of a circle. As the linear relationship increases, the circle becomes more and more elliptical in shape until the limiting case is reached (r=+1.00 or r=-1.00) and all the points fall on a straight line.
A number of scatterplots and their associated correlation coefficients are presented below in order that the student may better estimate the value of the correlation coefficient based on a scatterplot in the associated computer exercise.
(Figure: example scatterplots with r = 1.00, .54, .85, .94, .42, .33, .17, and .39.)
Slope of the Regression Line of z-scores
The correlation coefficient is the slope (b) of the regression line when both the X and Y variables have been converted to z-scores. The larger the size of the correlation coefficient, the steeper the slope. This is related to the difference between the intuitive regression line and the actual regression line discussed above.
This interpretation of the correlation coefficient is perhaps best illustrated with an example involving numbers. The raw score values of the X and Y variables are presented in the first two columns of the following table. The last two columns are the X and Y columns transformed using the z-score transformation.
That is, the mean is subtracted from each raw score in the X and Y columns and then the result is divided by the sample standard deviation. The table appears as follows:
        X       Y       z_{X}   z_{Y}
        12      33      -1.07   -0.61
        15      31      -0.70   -1.38
        19      35      -0.20   0.15
        25      37      0.55    0.92
        32      37      1.42    0.92
Mean    20.60   34.60   0.0     0.0
s       8.02    2.61    1.0     1.0
There are two points to be made with the above numbers: (1) the correlation coefficient is invariant under a linear transformation of either X and/or Y, and (2) the slope of the regression line when both X and Y have been transformed to z-scores is the correlation coefficient.
Computing the correlation coefficient first with the raw scores X and Y yields r=0.85. Next computing the correlation coefficient with z_{X} and z_{Y} yields the same value, r=0.85. Since the z-score transformation is a special case of a linear transformation (X' = a + bX), it may be proven that the correlation coefficient is invariant (doesn't change) under a linear transformation of either X and/or Y. The reader may verify this by computing the correlation coefficient using X and z_{Y} or Y and z_{X}. What this means essentially is that changing the scale of either the X or the Y variable will not change the size of the correlation coefficient, as long as the transformation conforms to the requirements of a linear transformation.
The fact that the correlation coefficient is the slope of the regression line when both X and Y have been converted to z-scores can be demonstrated by computing the regression parameters predicting z_{X} from z_{Y} or z_{Y} from z_{X}. In either case the intercept or additive component of the regression line (a) will be zero or very close, within rounding error. The slope (b) will be the same value as the correlation coefficient, again within rounding error. This relationship may be illustrated as follows:
The squared correlation coefficient (r^{2}) is the proportion of variance in Y that can be accounted for by knowing X. Conversely, it is the proportion of variance in X that can be accounted for by knowing Y.
One of the most important properties of variance is that it may be partitioned into separate additive parts. For example, consider shoe size. The theoretical distribution of shoe size may be presented as follows:
If the scores in this distribution were partitioned into two groups, one for males and one for females, the distributions could be represented as follows:
If one knows the sex of an individual, one knows something about that person's shoe size, because the shoe sizes of males are on the average somewhat larger than those of females. The variance within each distribution, male and female, is variance that cannot be predicted on the basis of sex, or error variance, because if one knows the sex of an individual, one does not know exactly what that person's shoe size will be.
Rather than having just two levels, the X variable will usually have many levels. The preceding argument may be extended to encompass this situation. It can be shown that the total variance is the sum of the variance that can be predicted and the error variance, or variance that cannot be predicted. This relationship is summarized below:
s^{2}_{TOTAL} = s^{2}_{PREDICTED} + s^{2}_{ERROR}
The correlation coefficient squared is equal to the ratio of predicted to total variance:
r^{2} = s^{2}_{PREDICTED} / s^{2}_{TOTAL}
This formula may be rewritten in terms of the error variance, rather than the predicted variance, as follows:
r^{2} = 1 - s^{2}_{ERROR} / s^{2}_{TOTAL}
The error variance, s^{2}_{ERROR}, is estimated by the standard error of estimate squared, s^{2}_{Y.X}, discussed in the previous chapter. The total variance (s^{2}_{TOTAL}) is simply the variance of Y, s^{2}_{Y}. The formula now becomes:
r^{2} = 1 - s^{2}_{Y.X} / s^{2}_{Y}
Solving for s_{Y.X}, and adding a correction factor (N - 1)/(N - 2), yields the computational formula for the standard error of estimate:
s_{Y.X} = √( s^{2}_{Y} (1 - r^{2}) (N - 1)/(N - 2) )
This captures the essential relationship between the correlation coefficient, the variance of Y, and the standard error of estimate. As the standard error of estimate becomes large relative to the total variance, the correlation coefficient becomes smaller. Thus the correlation coefficient is a function of both the standard error of estimate and the total variance of Y. The standard error of estimate is an absolute measure of the amount of error in prediction, while the correlation coefficient squared is a relative measure, relative to the total variance.
CALCULATION OF THE CORRELATION COEFFICIENT
The easiest method of computing a correlation coefficient is to use a statistical calculator or computer program. Barring that, the correlation coefficient may be computed using the following formula, where the z-scores are computed with the sample standard deviation:
r = Σz_{X}z_{Y} / (N - 1)
Computation using this formula is demonstrated below on some example data. Computation is rarely done in this manner; it is provided as an example of the application of the definitional formula, although this formula provides little insight into the meaning of the correlation coefficient.
X       Y       z_{X}   z_{Y}   z_{X}z_{Y}
12      33      -1.07   -0.61   0.65
15      31      -0.70   -1.38   0.97
19      35      -0.20   0.15    -0.03
25      37      0.55    0.92    0.51
32      37      1.42    0.92    1.31
SUM = 3.41
r = 3.41 / (5 - 1) = .85
A convenient way of summarizing a large number of correlation coefficients is to put them in a single table, called a correlation matrix. A correlation matrix is a table of all possible correlation coefficients between a set of variables. For example, suppose a questionnaire of the following form (Reed, 1983) produced a data matrix as follows.
AGE: What is your age? _____
KNOW: Number of correct answers out of 10 possible to a geography quiz, which consisted of correctly locating 10 states on a state map of the United States.
VISIT: How many states have you visited? _____
COMAIR: Have you ever flown on a commercial airliner? _____
SEX: 1 = Male, 2 = Female
Since there are five questions on the example questionnaire there are 5 * 5 = 25 different possible correlation coefficients to be computed. Each computed correlation is then placed in a table with variables as both rows and columns at the intersection of the row and column variable names. For example, one could calculate the correlation between AGE and KNOWLEDGE, AGE and STATEVIS, AGE and COMAIR, AGE and SEX, KNOWLEDGE and STATEVIS, etc., and place them in a table of the following form.
One would not need to calculate all possible correlation coefficients, however, because the correlation of any variable with itself is necessarily 1.00. Thus the diagonals of the matrix need not be computed. In addition, the correlation coefficient is nondirectional. That is, it doesn't make any difference whether the correlation is computed between AGE and KNOWLEDGE with AGE as X and KNOWLEDGE as Y or KNOWLEDGE as X and AGE as Y. For this reason the correlation matrix is symmetrical around the diagonal. In the example case then, rather than 25 correlation coefficients to compute, only 10 need be found: 25 (total) - 5 (diagonals) - 10 (redundant because of symmetry) = 10 (different unique correlation coefficients).
To calculate a correlation matrix using SPSS select CORRELATIONS and BIVARIATE as follows:
Select the variables that are to be included in the correlation matrix as follows. In this case all variables will be included, and optional means and standard deviations will be output.
The results of the preceding are as follows:
Interpretation of the data analysis might proceed as follows. The table of means and standard deviations indicates that the average Psychology 121 student who filled out this questionnaire was about 19 years old, could identify slightly more than six states out of ten, and had visited a little over 18 of the 50 states. The majority (67%) have flown on a commercial airplane and there were fewer females (43%) than males.
The analysis of the correlation matrix indicates that few of the observed relationships were very strong. The strongest relationship was between the number of states visited and whether or not the student had flown on a commercial airplane (r=.42) which indicates that if a student had flown he/she was more likely to have visited more states. This is because of the positive sign on the correlation coefficient and the coding of the commercial airplane question (0=NO, 1=YES). The positive correlation means that as X increases, so does Y: thus, students who responded that they had flown on a commercial airplane visited more states on the average than those who hadn't.
Age was positively correlated with number of states visited (r=.22) and flying on a commercial airplane (r=.19), with older students more likely both to have visited more states and to have flown, although the relationship was not very strong. The greater the number of states visited, the more states the student was likely to correctly identify on the map, although again the relationship was weak (r=.28). Note that one of the students who said he had visited 48 of the 50 states could identify only 5 of 10 on the map.
Finally, sex of the participant was slightly correlated with both age (r=.17), indicating that females were slightly older than males, and number of states visited (r=-.16), indicating that females visited fewer states than males. These conclusions are possible because of the sign of the correlation coefficient and the way the sex variable was coded: 1=male 2=female. When the correlation with sex is positive, females will have more of whatever is being measured on Y. The opposite is the case when the correlation is negative.
CAUTIONS ABOUT INTERPRETING CORRELATION COEFFICIENTS
Correct interpretation of a correlation coefficient requires the assumption that both variables, X and Y, meet the interval property requirements of their respective measurement systems. Calculators and computers will produce a correlation coefficient regardless of whether or not the numbers are "meaningful" in a measurement sense.
As discussed in the chapter on Measurement, the interval property is rarely, if ever, fully satisfied in real applications. There is some difference of opinion among statisticians about when it is appropriate to assume the interval property is met. My personal opinion is that as long as a larger number means that the object has more of something or another, then application of the correlation coefficient is useful, although the potentially greater deviations from the interval property must be interpreted with greater caution. When the data is clearly nominal categorical with more than two levels (1=Protestant, 2=Catholic, 3=Jewish, 4=Other), application of the correlation coefficient is clearly inappropriate.
An exception to the preceding rule occurs when the nominal categorical scale is dichotomous, or has two levels (1=Male, 2=Female). Correlation coefficients computed with data of this type on either the X and/or Y variable may be safely interpreted because the interval property is assumed to be met for these variables. Correlation coefficients computed using data of this type are sometimes given special, different names, but since they seem to add little to the understanding of the meaning of the correlation coefficient, they will not be presented.
An outlier is a score that falls outside the range of the rest of the scores on the scatterplot. For example, if age is a variable and the sample is a statistics class, an outlier would be a retired individual. Depending upon where the outlier falls, the correlation coefficient may be increased or decreased.
An outlier which falls near where the regression line would normally fall would necessarily increase the size of the correlation coefficient, as seen below.
[Figure: scatterplot with an outlier near the regression line, r = .457]
An outlier that falls some distance away from the original regression line would decrease the size of the correlation coefficient, as seen below:
[Figure: scatterplot with an outlier some distance from the regression line, r = .336]
The effect of the outliers in the above examples is somewhat muted because the sample size is fairly large (N=100). The smaller the sample size, the greater the effect of the outlier. Conversely, as the sample size grows, at some point the outlier will have little or no effect on the size of the correlation coefficient.
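Both outlier effects can be reproduced with a small made-up data set. With only eight points (far fewer than the N=100 of the text's examples), a single added score moves r a long way. The helper `pearson_r` and all values are illustrative assumptions; the sketch reports r with and without the outlier, as the text recommends.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Made-up data with a moderate positive relationship (N = 8)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 4, 3, 5, 4, 6, 5, 7]

r_base = pearson_r(x, y)
# Outlier far out on X but close to where the regression line would fall
r_near = pearson_r(x + [20], y + [13])
# Outlier well away from the regression line
r_far = pearson_r(x + [1], y + [10])

print(f"without outlier:            r = {r_base:.3f}")
print(f"with outlier near the line: r = {r_near:.3f}")
print(f"with outlier off the line:  r = {r_far:.3f}")
```

This prints r = .873 without the outlier, .973 with the outlier at (20, 13), and .141 with the outlier at (1, 10).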
When a researcher encounters an outlier, a decision must be made whether to include it in the data set. It may be that the respondent was deliberately malingering, giving wrong answers, or simply did not understand the question on the questionnaire. On the other hand, it may be that the outlier is real and simply different. The decision whether to include or not include an outlier remains with the researcher; he or she must justify deleting any data to the reader of a technical report, however. It is suggested that the correlation coefficient be computed and reported both with and without the outlier if there is any doubt about whether or not it is real data. In any case, the best way of spotting an outlier is by drawing the scatterplot.
No discussion of correlation would be complete without a discussion of causation. It is possible for two variables to be related (correlated), but not have one variable cause another.
For example, suppose there exists a high correlation between the number of popsicles sold and the number of drowning deaths. Does that mean that one should not eat popsicles before one swims? Not necessarily. Both of the above variables are related to a common variable, the heat of the day. The hotter the temperature, the more popsicles sold and also the more people swimming, thus the more drowning deaths. This is an example of correlation without causation.
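The popsicle example can be simulated. In this sketch, a hypothetical model rather than real data, daily temperature drives both popsicle sales and drownings, and neither variable affects the other. The sketch uses a first-order partial correlation, one standard way of holding a third variable constant statistically, to show that the popsicle-drowning correlation largely disappears once temperature is held constant. The helpers `pearson_r` and `partial_r` and all model coefficients are illustrative assumptions.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y with Z held constant."""
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable
DAYS = 1000
# Hypothetical model: temperature drives both variables; neither affects the other
temp = [rng.uniform(60, 100) for _ in range(DAYS)]
popsicles = [5 * t + rng.gauss(0, 20) for t in temp]
drownings = [0.1 * t + rng.gauss(0, 1) for t in temp]

r_pd = pearson_r(popsicles, drownings)
r_partial = partial_r(r_pd, pearson_r(popsicles, temp), pearson_r(drownings, temp))
print(f"popsicles vs. drownings:         r = {r_pd:.2f}")
print(f"same, temperature held constant: partial r = {r_partial:.2f}")
```

Under this model the raw correlation comes out around .7, while the partial correlation is close to zero.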
Much of the early evidence that cigarette smoking causes cancer was correlational. It may be that people who smoke are more nervous and nervous people are more susceptible to cancer. It may also be that smoking does indeed cause cancer. The cigarette companies made the former argument, while some doctors made the latter. In this case I believe the relationship is causal and therefore do not smoke.
Sociologists are very much concerned with the question of correlation and causation because much of their data is correlational. Sociologists have developed a branch of correlational analysis, called path analysis, precisely to determine causation from correlations (Blalock, 1971). Before a correlation may imply causation, certain requirements must be met. These requirements include: (1) the causal variable must temporally precede the variable it causes, and (2) certain relationships between the causal variable and other variables must be met.
If a high correlation were found between the age of the teacher and the students' grades, it would not necessarily mean that older teachers are more experienced, teach better, and give higher grades. Neither would it necessarily imply that older teachers are soft touches, don't care, and give higher grades. Some other explanation might also account for the results. The correlation means only that older teachers give higher grades and younger teachers give lower grades; it does not explain why this is the case.
A simple correlation may be interpreted in a number of different ways: as a measure of linear relationship, as the slope of the regression line of z-scores, and, when squared, as the proportion of variance accounted for by knowing one of the variables. All the above interpretations are correct and in a certain sense mean the same thing.
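The three interpretations can be checked numerically on a few invented pairs of scores. The sketch assumes z-scores computed with the population (divide-by-N) standard deviation, which makes the sum of squared z-scores equal N. Under that convention the mean cross-product of z-scores equals the least-squares slope of z_Y on z_X, and its square equals the proportion of Y variance accounted for by the raw-score regression.

```python
from statistics import mean, pstdev

# Hypothetical paired scores
x = [2, 4, 5, 7, 9, 10]
y = [1, 3, 4, 4, 7, 8]
n = len(x)


def z_scores(values):
    """Convert raw scores to z-scores using the population (N) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


zx, zy = z_scores(x), z_scores(y)

# Interpretation 1: r as the mean cross-product of z-scores
r = sum(a * b for a, b in zip(zx, zy)) / n

# Interpretation 2: least-squares slope when z_y is regressed on z_x
slope_z = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)

# Interpretation 3: proportion of Y variance accounted for by the raw-score regression
mx, my = mean(x), mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a0 = my - b * mx  # intercept
ss_total = sum((yi - my) ** 2 for yi in y)
ss_residual = sum((yi - (a0 + b * xi)) ** 2 for xi, yi in zip(x, y))
prop_accounted = 1 - ss_residual / ss_total

print(f"r = {r:.4f}, z-score slope = {slope_z:.4f}")
print(f"r squared = {r * r:.4f}, proportion of variance accounted for = {prop_accounted:.4f}")
```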
A number of qualities that might affect the size of the correlation coefficient were identified. They included missing parts of the distribution, outliers, and common variables. Finally, the relationship between correlation and causation was discussed.
Hypothesis tests are procedures for making rational decisions about the reality of effects.
Most decisions require that an individual select a single alternative from a number of possible alternatives. The decision is made without knowing whether or not it is correct; that is, it is based on incomplete information. For example, a person either takes or does not take an umbrella to school based upon both the weather report and observation of outside conditions. If it is not currently raining, this decision must be made with incomplete information.
A rational decision is characterized by the use of a procedure that incorporates the likelihood, or probability, of success into the decision-making process. The procedure must be stated in such a fashion that another individual, using the same information, would make the same decision.
One is reminded of a STAR TREK episode. Captain Kirk, for one reason or another, is stranded on a planet without his communicator and is unable to get back to the Enterprise. Spock has assumed command and is being attacked by Klingons (who else). Spock asks for and receives information about the location of the enemy, but is unable to act because he does not have complete information. Captain Kirk arrives at the last moment and saves the day because he can act on incomplete information.
This story goes against the concept of rational man. Spock, being the ultimate rational man, would not be immobilized by indecision. Instead, he would have selected the alternative which realized the greatest expected benefit given the information available. If complete information were required to make decisions, few decisions would be made by rational men and women. This is obviously not the case. The script writer misunderstood Spock and rational man.
When a change in one thing is associated with a change in another, we have an effect. The changes may be either quantitative or qualitative, with the hypothesis testing procedure selected based upon the type of change observed. For example, if changes in salt intake in a diet are associated with changes in activity level in children, we say an effect occurred. In another case, if the distribution of political party preference (Republicans, Democrats, or Independents) differs by sex (Male or Female), then an effect is present. Much of behavioral science is directed toward discovering and understanding effects.
The effects discussed in the remainder of this text appear as various statistics including: differences between means, contingency tables, and correlation coefficients.
All hypothesis tests conform to similar principles and proceed with the same sequence of events.
· A model of the world is created in which there are no effects, and the experiment is then repeated, in thought, an infinite number of times within that model.
· The results of the experiment are compared with the model of step one. If, given the model, the results are unlikely, then the model is rejected and the effects are accepted as real. If the results could be explained by the model, the model must be retained. In the latter case no decision can be made about the reality of effects.
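The two steps can be sketched with a hypothetical experiment: a coin that is supposed to be fair comes up heads 16 times in 20 flips. The model of no effects is a fair coin, and a computer stands in for the infinite repetitions by replaying the experiment 20,000 times. For comparison, the exact probability from the binomial distribution (a standard model not discussed in this chapter) is also computed.

```python
import math
import random

N_FLIPS, OBSERVED_HEADS = 20, 16  # hypothetical experiment: 16 heads in 20 flips
REPS = 20_000                     # finite stand-in for "an infinite number of times"

# Step 1: a model of the world with no effects (a fair coin), repeated many times
rng = random.Random(42)  # fixed seed so the run is repeatable
as_extreme = 0
for _ in range(REPS):
    heads = sum(rng.random() < 0.5 for _ in range(N_FLIPS))
    if heads >= OBSERVED_HEADS:
        as_extreme += 1
p_simulated = as_extreme / REPS

# The same probability computed exactly from the binomial model, for comparison
p_exact = sum(math.comb(N_FLIPS, k) for k in range(OBSERVED_HEADS, N_FLIPS + 1)) / 2 ** N_FLIPS

# Step 2: compare the observed result with the model
print(f"P(16 or more heads | fair coin): simulated {p_simulated:.4f}, exact {p_exact:.4f}")
```

The exact probability is 6196/1,048,576, or about .006, and the simulated value should land close to it. Whether a result this rare counts as unlikely enough to reject the model is the decision the procedure asks the researcher to make.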
Hypothesis testing is analogous to the geometrical concept of hypothesis negation, or proof by contradiction. That is, if one wishes to prove that A (the hypothesis) is true, one first assumes that it isn't true. If it is shown that this assumption is logically impossible, then the original hypothesis is proven. In the case of hypothesis testing, however, the hypothesis may never be proven; rather, it is decided that the model of no effects is unlikely enough that the opposite hypothesis, that of real effects, is accepted as true.
An analogous situation exists with respect to hypothesis testing in statistics. In hypothesis testing one wishes to show real effects of an experiment. By showing that the experimental results were unlikely, given that there were no effects, one may decide that the effects are, in fact, real. The hypothesis that there were no effects is called the NULL HYPOTHESIS. The symbol H_{0} is used to abbreviate the Null Hypothesis in statistics. Note that, unlike geometry, we cannot prove the effects are real; rather, we may decide the effects are real.
For example, suppose the following probability model (distribution) described the state of the world if there were no effects, that is, if the null hypothesis were true.
Event A might be considered fairly likely, given the above model was correct. As a result the model would be retained, along with the NULL HYPOTHESIS. Event B on the other hand is unlikely, given the model. Here the model would be rejected, along with the NULL HYPOTHESIS.
The SAMPLING DISTRIBUTION is a distribution of a sample statistic. It is used as a model of what would happen if
1.) the null hypothesis were true (there really were no effects), and
2.) the experiment were repeated an infinite number of times.
Because of its importance in hypothesis testing, the sampling distribution will be discussed in a separate chapter.
Probability is a theory of uncertainty. It is a necessary concept because the world according to the scientist is unknowable in its entirety. However, prediction and decisions are obviously possible. As such, probability theory is a rational means of dealing with an uncertain world.
Probabilities are numbers associated with events that range from zero to one (0-1). A probability of zero means that the event is impossible. For example, if I were to flip a coin, the probability of a leg is zero, due to the fact that a coin may have a head or tail, but not a leg. Given a probability of one, however, the event is certain. For example, if I flip a coin the probability of heads, tails, or an edge is one, because the coin must take one of these possibilities.
In real life, most events have probabilities between these two extremes. For instance, the probability of rain tonight is .40; tomorrow night the probability is .10. Thus it can be said that rain is more likely tonight than tomorrow.
The meaning of the term probability depends upon one's philosophical orientation. In the CLASSICAL approach, probabilities refer to the relative frequency of an event, given the experiment was repeated an infinite number of times. For example, the .40 probability of rain tonight means that if the exact conditions of this evening were repeated an infinite number of times, it would rain 40% of the time.
In the SUBJECTIVE approach, however, the term probability refers to a "degree of belief." That is, the individual assigning the number .40 to the probability of rain tonight believes that, on a scale from 0 to 1, the likelihood of rain is .40. This leads to a branch of statistics called "BAYESIAN STATISTICS." While many statisticians take this approach, it is not usually taught at the introductory level. At this point in time all the introductory student needs to know is that a person calling themselves a "Bayesian Statistician" is not ignorant of statistics. Most likely, he or she is simply involved in the theory of statistics.
No matter what theoretical position is taken, all probabilities must conform to certain rules. Some of the rules are concerned with how probabilities combine with one another to form new probabilities. For example, when events are independent, that is, one doesn't affect the other, the probabilities may be multiplied together to find the probability of the joint event. The probability of rain today AND a head when flipping a coin is the product of the two individual probabilities.
A deck of cards illustrates other principles of probability theory. In bridge, poker, rummy, etc., the probability of a heart can be found by dividing thirteen, the number of hearts, by fifty-two, the number of cards, assuming each card is equally likely to be drawn. The probability of a queen is four (the number of queens) divided by the number of cards. The probability of a queen OR a heart is sixteen divided by fifty-two. This figure is computed by adding the probability of a heart to the probability of a queen, and then subtracting the probability of a queen AND a heart, which equals 1/52 (otherwise the queen of hearts would be counted twice).
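The counting arguments above can be verified by enumerating a deck. This Python sketch assumes a standard 52-card deck with every card equally likely, uses exact fractions so no rounding creeps in, and also applies the multiplication rule for independent events to the .40 rain probability from the earlier example and a fair coin. The helper names `prob`, `is_heart`, and `is_queen` are illustrative.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck; every card assumed equally likely to be drawn
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))


def prob(event):
    """Probability of an event: favorable cards / all cards, as an exact fraction."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_heart(card):
    return card[1] == "hearts"


def is_queen(card):
    return card[0] == "Q"


p_heart = prob(is_heart)                                          # 13/52
p_queen = prob(is_queen)                                          # 4/52
p_queen_and_heart = prob(lambda c: is_heart(c) and is_queen(c))   # 1/52

# OR: count directly, then check against the addition rule
p_queen_or_heart = prob(lambda c: is_heart(c) or is_queen(c))
addition_rule = p_heart + p_queen - p_queen_and_heart

# AND for independent events: multiply (.40 chance of rain times a fair coin's .50)
p_rain_and_head = Fraction(40, 100) * Fraction(1, 2)

print(p_heart, p_queen, p_queen_or_heart, addition_rule, p_rain_and_head)
```

It prints `1/4 1/13 4/13 4/13 1/5`; `Fraction` reduces 16/52 to 4/13.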
An introductory mathematical probability and statistics course usually begins with the principles of probability and proceeds to the applications of these principles. One problem a student might encounter concerns unsorted socks in a sock drawer. Suppose one has twenty-five pairs of unsorted socks in a sock drawer. What is the probability of drawing out two socks at random and getting a pair? What is the probability of getting a match to one of the first two when drawing out a third sock? How many socks on the average would need to be drawn before one could expect to find a pair? This problem is rather difficult and will not be solved here, but is used to illustrate the type of problem found in mathematical statistics.
The sampling distribution is a distribution of a sample statistic. While the concept of a distribution of a set of numbers is intuitive for most students, the concept of a distribution of a set of statistics is not. Therefore distributions will be reviewed before the sampling distribution is discussed.
The sample distribution is the distribution resulting from the collection of actual data. A major characteristic of a sample is that it contains a finite (countable) number of scores, the number of scores represented by the letter N. For example, suppose that the following data were collected:
32 35 42 33 36 38 37
33 38 36 35 34 37 40
38 36 35 31 37 36 33
36 39 40 33 30 35 37
39 32 39 37 35 36 39
33 31 40 37 34 34 37
These numbers (N = 42) constitute a sample distribution. Using the procedures discussed in the chapter on frequency distributions, the following relative frequency polygon can be constructed to picture this data:
In addition to the frequency distribution, the sample distribution can be described with numbers, called statistics. Examples of statistics are the mean, median, mode, standard deviation, range, and correlation coefficient, among others. Statistics, and procedures for computing statistics, have been discussed in detail in an earlier chapter.
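The statistics named above can be computed for the sample listed earlier with Python's standard `statistics` module. Note that `statistics.stdev` divides by N - 1; if the formula from the earlier chapter divides by N, `statistics.pstdev` gives that version instead.

```python
import statistics
from collections import Counter

# The 42 scores from the sample above
scores = [
    32, 35, 42, 33, 36, 38, 37,
    33, 38, 36, 35, 34, 37, 40,
    38, 36, 35, 31, 37, 36, 33,
    36, 39, 40, 33, 30, 35, 37,
    39, 32, 39, 37, 35, 36, 39,
    33, 31, 40, 37, 34, 34, 37,
]

n = len(scores)
# Relative frequency of each score value (the heights of a relative frequency polygon)
rel_freq = {score: count / n for score, count in sorted(Counter(scores).items())}

mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)
sd_score = statistics.stdev(scores)  # sample SD, divides by N - 1
range_score = max(scores) - min(scores)

print(f"N = {n}")
print(f"mean = {mean_score:.2f}, median = {median_score}, mode = {mode_score}")
print(f"standard deviation = {sd_score:.2f}, range = {range_score}")
for score, p in rel_freq.items():
    print(f"{score}: {p:.3f}")
```

For these 42 scores the sketch gives a mean of about 35.83, a median of 36, a mode of 37, a standard deviation of about 2.78, and a range of 12.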
If a different sample was taken, different scores would result. The relative frequency polygon would be different, as would the statistics computed from the second sample. However, there would also be some consistency in that while the statistics would not be exactly the same, they would be similar. To achieve order in this chaos, statisticians have developed probability models.
PROBABILITY MODELS: POPULATION DISTRIBUTIONS
Probability models exist in a theoretical world where complete information is available. As such, they can never be known except in the mind of the mathematical statistician. If an infinite number of infinitely precise scores were taken, the resulting distribution would be a probability model of the population.
This probability model is described with pictures (graphs) which are analogous to the relative frequency polygon of the sample distribution. The two graphs below illustrate two types of probability models, the uniform distribution and the normal curve.
As discussed earlier in the chapter on the normal curve, probability distributions are described by mathematical equations that contain parameters. Parameters are variables that change the shape of the probability model. By setting these parameters equal to numbers, a particular member of that family of probability models results.
A critical aspect of statistics is the estimation of parameters with sample statistics. Sample statistics are used as estimators of the corresponding parameters in the population model. For example, the mean and standard deviation of the sample are used as estimates of the corresponding population parameters $\mu$ and $\sigma$. Mathematical statistics texts devote considerable effort to defining what is a good or poor parameter estimation procedure.
Note the "ING" on the end of SAMPLE in SAMPLING DISTRIBUTION. The term looks and sounds similar to SAMPLE DISTRIBUTION, but in reality the concept is much closer to a population model.
The sampling distribution is a distribution of a sample statistic. It is a model of a distribution of scores, like the population distribution, except that the scores are not raw scores, but statistics. It is a thought experiment; "what would the world be like if a person repeatedly took samples of size N from the population distribution and computed a particular statistic each time?" The resulting distribution of statistics is called the sampling distribution of that statistic.
For example, suppose that a sample of size sixteen (N=16) is taken from some population. The mean of the sixteen numbers is computed. Next a new sample of sixteen is taken, and the mean is again computed. If this process were repeated an infinite number of times, the distribution of the now infinite number of sample means would be called the sampling distribution of the mean.
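The thought experiment can be approximated by brute force. This sketch assumes a normal population with mean 100 and standard deviation 15 (values chosen only for illustration), draws 10,000 samples of size sixteen rather than an infinite number, and keeps the mean of each.

```python
import random
import statistics

rng = random.Random(16)  # fixed seed so the run is repeatable
MU, SIGMA = 100, 15      # hypothetical normal population
N = 16                   # size of each sample, as in the example above
REPLICATIONS = 10_000    # a finite stand-in for "an infinite number of times"

sample_means = []
for _ in range(REPLICATIONS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

# The collection of means approximates the sampling distribution of the mean
print(f"mean of the sample means: {statistics.fmean(sample_means):.2f}")
print(f"SD of the sample means:   {statistics.stdev(sample_means):.2f}")
print(f"SD of the population:     {SIGMA}")
```

The mean of the sample means comes out very close to 100, and their standard deviation is far smaller than the population's 15.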
Every statistic has a sampling distribution. For example, suppose that instead of the mean, medians were computed for each sample. The infinite number of medians would be called the sampling distribution of the median.
Just as the population models can be described with parameters, so can the sampling distribution. The expected value (analogous to the mean) of a sampling distribution will be represented here by the symbol $\mu$. The $\mu$ symbol is often written with a subscript to indicate which sampling distribution is being discussed. For example, the expected value of the sampling distribution of the mean is represented by the symbol $\mu_{\bar{X}}$, that of the median by $\mu_{M_{d}}$, etc. The value of $\mu_{\bar{X}}$ can be thought of as the mean of the distribution of means. In a similar manner, the value of $\mu_{M_{d}}$ is the mean of a distribution of medians. They are not really means, because it is not possible to find a mean when $N = \infty$, but they are the mathematical equivalent of a mean.
Using advanced mathematics, in a thought experiment, the theoretical statistician often discovers a relationship between the expected value of a statistic and the model parameters. For example, it can be proven that, for a symmetric population model such as the normal curve, the expected value of both the mean and the median, $\bar{X}$ and $M_{d}$, is equal to $\mu$. When the expected value of a statistic equals a population parameter, the statistic is called an unbiased estimator of that parameter. In this case, both the mean and the median would be unbiased estimators of the parameter $\mu$.
A sampling distribution may also be described with a parameter corresponding to a variance, symbolized by $\sigma^{2}$. The square root of this parameter is given a special name, the standard error. Each sampling distribution has a standard error. In order to keep them straight, each has a name tagged on the end of the word "standard error" and a subscript on the symbol. The standard deviation of the sampling distribution of the mean is called the standard error of the mean and is symbolized by $\sigma_{\bar{X}}$. Similarly, the standard deviation of the sampling distribution of the median is called the standard error of the median and is symbolized by $\sigma_{M_{d}}$.
In each case the standard error of a statistic describes the degree to which the computed statistics will differ from one another when calculated from samples of similar size selected from similar population models. The larger the standard error, the greater the difference between the computed statistics. Consistency is a valuable property in the estimation of a population parameter, so the statistic with the smallest standard error is preferred as the estimator of the corresponding population parameter, everything else being equal. Statisticians have proven that in most cases the standard error of the mean is smaller than the standard error of the median. Because of this property, the mean is the preferred estimator of $\mu$.
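A simulation can compare the two estimators directly. Assuming a normal population (mean 100, SD 15, values chosen only for illustration) and samples of sixteen, this sketch builds approximate sampling distributions of both the mean and the median from 10,000 samples and estimates each standard error as the standard deviation of the collected statistics.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is repeatable
MU, SIGMA = 100, 15        # hypothetical normal population
N, REPS = 16, 10_000       # sample size and number of simulated samples

means, medians = [], []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

se_mean = statistics.stdev(means)      # estimated standard error of the mean
se_median = statistics.stdev(medians)  # estimated standard error of the median
print(f"means:   center {statistics.fmean(means):.2f}, standard error {se_mean:.2f}")
print(f"medians: center {statistics.fmean(medians):.2f}, standard error {se_median:.2f}")
```

Both distributions center near 100, consistent with both statistics being unbiased for a symmetric population, but the median's standard error comes out noticeably larger than the mean's (which is close to 15/4 = 3.75).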
THE SAMPLING DISTRIBUTION OF THE MEAN
The sampling distribution of the mean is a distribution of sample means. This distribution may be described with the parameters $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}$.
These parameters are closely related to the parameters of the population distribution, with the relationship being described by the CENTRAL LIMIT THEOREM. The CENTRAL LIMIT THEOREM essentially states that the mean of the sampling distribution of the mean ($\mu_{\bar{X}}$) equals the mean of the population ($\mu$), that the standard error of the mean ($\sigma_{\bar{X}}$) equals the standard deviation of the population ($\sigma$) divided by the square root of N, and that as the sample size gets infinitely large ($N \to \infty$) the sampling distribution of the mean approaches a normal distribution. Strictly speaking, the first two relationships hold for random samples of any size; it is the normal shape that depends on a large N. These relationships may be summarized as follows:
$$\mu_{\bar{X}} = \mu \qquad\qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$$
with the shape of the sampling distribution of the mean approaching the normal curve as $N \to \infty$.
The astute student probably noticed, however, that the sample size would have to be infinitely large ($N \to \infty$) for the sampling distribution of the mean to be exactly normal. In theory, this is fact; in practice, an infinite sample size is impossible. Nevertheless, the Central Limit Theorem is very powerful: in most situations encountered by behavioral scientists, it works reasonably well with an N greater than 10 or 20. Thus, it is possible to closely approximate what the distribution of sample means looks like, even with relatively small sample sizes.
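The Central Limit Theorem can be watched at work with a population that looks nothing like the normal curve. This sketch assumes an exponential population with mean and standard deviation both equal to 10 (a strongly right-skewed shape chosen for illustration), draws 5,000 samples at each of several sample sizes, and compares the observed standard error with σ/√N. A skewness near zero indicates a roughly symmetric, normal-looking shape; the `skewness` helper is an illustrative moment-based version.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness: near 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


rng = random.Random(7)    # fixed seed so the run is repeatable
POP_MEAN = POP_SD = 10.0  # exponential population: mean = SD = 10, strongly right-skewed
REPS = 5000               # number of samples drawn for each sample size

results = {}
for n in (1, 4, 25):
    means = [statistics.fmean([rng.expovariate(1 / POP_MEAN) for _ in range(n)])
             for _ in range(REPS)]
    results[n] = {
        "mean": statistics.fmean(means),
        "se": statistics.stdev(means),
        "se_theory": POP_SD / n ** 0.5,
        "skew": skewness(means),
    }
    res = results[n]
    print(f"N = {n:2d}: mean {res['mean']:.2f}, SE {res['se']:.2f} "
          f"(sigma / sqrt(N) = {res['se_theory']:.2f}), skewness {res['skew']:.2f}")
```

The observed standard errors track 10/√N closely at every N, while the skewness of the sample means drops from about 2 at N = 1 toward zero as N grows, which is the approach to normality the theorem describes.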
The importance of the central limit theorem to statistical thinking cannot be overstated. Most of hypothesis testing and sampling theory is based on this theorem. In addition, it provides a justification for using the normal curve as a model for many naturally occurring phenomena. If a trait, such as intelligence, can be thought of as the sum of many relatively independent influences, in this case both genetic and environmental, then the trait would be expected to be normally distributed in a population.
MICROCOMPUTER SIMULATION OF SAMPLING DISTRIBUTIONS
The purpose of the microcomputer simulation exercise (named SIMSAM) is to demonstrate how a sampling distribution is created. To run properly, the program requires Internet Explorer 3.x. The opening screen will appear as follows:
Although it is possible to skip directly to the test mode, it is suggested that the student spend some time becoming familiar with the learning mode.
The learning mode screen will appear as follows:
Each simulation is run by selecting a distribution, a value for either the Range or Sigma, and a sample size, and then clicking on the Sample button. The values for either the Range or Sigma and the Sample Size are changed using the scrollbar. When the Sample button is pressed, the computer generates 100 samples of the selected sample size, computes the mean for each sample, and then draws the sampling distribution of the mean below the population model. The student should verify that the sampling distribution changes as a function of the type of population model, the variability of the population model, and the sample size. In addition, the student should verify that the shape of the sampling distribution of the mean approaches a normal distribution as the sample size increases, no matter what the population model looks like.
When the student is comfortable and understands the above screen, he or she should Exit and proceed to the performance mode. The performance mode screen will appear as follows:
On each trial, the student is presented with a population model and a sample size. The student must guess which of the four potential sampling distributions will be closest to the sampling distribution of the mean that is generated by the computer. After the student selects one of the four possibilities by clicking on the button next to the graph, the computer will generate 100 samples, compute the mean for each sample, draw the sampling distribution of the mean in the labeled box, and compare the observed sampling distribution of the mean with each of the four possibilities. Using a measure of "goodness of fit", the computer will select the distribution that is closest to the actual distribution. If that distribution is the one that the student selected, both the trial counter and the correct counter will be incremented by one; if not, only the trial counter will be incremented.
The number of points given in this assignment will be the number appearing in the "Maximum Correct" box with a maximum of eight. When satisfied with the score, print this screen to turn in for the assignment and exit the program.
To summarize: 1.) the sampling distribution is a theoretical distribution of a sample statistic. 2.) There is a different sampling distribution for each sample statistic. 3.) Each sampling distribution is characterized by parameters, two of which are $\mu$ and $\sigma$. The latter is called the standard error. 4.) The sampling distribution of the mean is a special case of the sampling distribution. 5.) The Central Limit Theorem relates the parameters of the sampling distribution of the mean to the population model and is very important in statistical thinking.
TESTING HYPOTHESES ABOUT SINGLE MEANS
THE HEADSTART EXPERIMENT
Suppose an educator had a theory which argued that a great deal of learning occurs before children enter grade school or kindergarten. This theory explained that socially disadvantaged children start school intellectually behind other children and are never able to catch up. In order to remedy this situation, he proposes a headstart program, which starts children in a school situation at ages three and four.
A politician reads this theory and feels that it might be true. However, before he is willing to invest the billions of dollars necessary to begin and maintain a headstart program, he demands that the scientist demonstrate that the program really does work. At this point the educator calls for the services of a researcher and statistician.
Because this is a fantasy, the following research design would probably never be used in practice. This design will be used to illustrate the procedure and the logic underlying the hypothesis test. At a later time, we will discuss a more appropriate design.
A random sample of 64 four-year-old children is taken from the population of all four-year-old children. The children in the sample are all enrolled in the headstart program for a year, at the end of which time they are given a standardized intelligence test. The mean I.Q. of the sample is found to be 103.27.
On the basis of this information, the educator wishes to begin a nationwide headstart program. He argues that the average I.Q. in the population is 100 ($\mu = 100$) and that 103.27 is greater than that. Therefore, the headstart program had an effect of about 103.27 - 100, or 3.27 I.Q. points. As a result, the billions of dollars necessary for the program would be well invested.
The statistician, being in this case the devil's advocate, is not ready to act so hastily. He wants to know whether chance could have caused the large mean. Perhaps headstart doesn't make a bit of difference, and the mean of 103.27 was obtained because the sixty-four students selected for the sample were slightly brighter than average. He argues that this possibility must be ruled out before any action is taken. If it cannot be ruled out completely, he argues that its likelihood must at least be small enough that the possible benefits of making a correct decision outweigh the risk of making a wrong one.
To determine if chance could have caused the difference, the hypothesis test proceeds as a thought experiment. First, the statistician assumes that there were no effects; in this case, the headstart program didn't work. He then creates a model of what the world would look like if the experiment were performed an infinite number of times under the assumption of no effects. The sampling distribution of the mean is used as this model. The reasoning goes something like this:
[Figures: POPULATION DISTRIBUTION ASSUMING NO EFFECTS; SAMPLING DISTRIBUTION ASSUMING NO EFFECTS AND N = 64]
He or she then compares the results of the actual experiment with those expected from the model, given there were no effects and the experiment was repeated an infinite number of times. He or she concludes that the model probably could explain the results.
Therefore, because chance could explain the results, the educator was premature in deciding that headstart had a real effect.
Suppose that the researcher changed the experiment. Instead of a sample of sixty-four children, the sample was increased to N=400 four-year-old children. Furthermore, this sample had the same mean ($\bar{X} = 103.27$) at the conclusion as had the previous study. The statistician must now change the model to reflect the larger sample size.
[Figures: POPULATION DISTRIBUTION ASSUMING NO EFFECTS; SAMPLING DISTRIBUTION ASSUMING NO EFFECTS AND N = 400]
The conclusion reached by the statistician states that it is highly unlikely the model could explain the results. The model of chance is rejected and the reality of effects accepted. Why? The mean that resulted from the study fell in the tail of the sampling distribution.
The different conclusions reached in these two experiments may seem contradictory to the student. A little reflection, however, reveals that the second experiment was based on a much larger sample size (400 vs. 64). As such, the researcher is rewarded for doing more careful work and taking a larger sample. The sampling distribution of the mean specifies the nature of the reward.
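The size of the reward can be made concrete. The text does not give the population standard deviation of I.Q., so this sketch assumes σ = 15, a common scaling for I.Q. tests. Under that assumption it computes the standard error of the mean for each sample size and expresses the 3.27-point difference in standard-error units.

```python
import math

MU = 100                # population mean I.Q. under "no effects" (from the text)
SIGMA = 15              # ASSUMPTION: population SD of I.Q.; the text does not state it
OBSERVED_MEAN = 103.27  # sample mean reported in both studies

z_by_n = {}
for n in (64, 400):
    se = SIGMA / math.sqrt(n)          # standard error of the mean for this N
    z = (OBSERVED_MEAN - MU) / se      # distance from 100 in standard-error units
    z_by_n[n] = z
    print(f"N = {n:3d}: standard error = {se:.3f}, z = {z:.2f}")

print(f"ratio of z values: {z_by_n[400] / z_by_n[64]:.2f}")
```

Under the σ = 15 assumption, the standard error is 1.875 for N = 64 and 0.75 for N = 400, so the same 3.27-point difference lies about 1.74 standard errors above 100 in the first study and 4.36 in the second. The ratio, √(400/64) = 2.5, comes entirely from the larger sample.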
At this point it should also be pointed out that we are discussing statistical significance: whether or not the results could have occurred by chance. The second question, that of practical significance, occurs only after an affirmative decision about the reality of the effects. The practical significance question is tackled by the politician, who must decide whether the effects are large enough to be worth the money to begin and maintain the program. Even though headstart works, the money may be better spent in programs for the health of the aged or more nuclear submarines. In short, this is a political and practical decision made by people and not statistical procedures.
1.) A significance test comparing a single mean to a population parameter ($\mu$) was discussed.
2.) A model of what the world looks like, given there were no effects and the experiment was repeated an infinite number of times, was created using the sampling distribution of the mean.
3.) The mean of the experiment was compared to the model to decide whether the effects were due to chance or whether another explanation was necessary (the effects were real). In the first case, the decision was made to retain the model. It could explain the results. In the second case, the decision was to reject the model and accept the reality of the effect.
4.) A final discussion concerned the difference between statistical significance and practical significance.
Before an experiment is performed, the question of experimental design must be addressed. Experimental design refers to the manner in which the experiment will be set up, specifically the way the treatments are administered to subjects. Treatments will be defined as quantitatively or qualitatively different levels of experience. For example, in an experiment on the effects of caffeine, the treatment levels might be exposure to different amounts of caffeine, from 0 to .0375 mg. In a very simple experiment there are two levels of treatment: none, called the control condition, and some, called the experimental condition.
The type of analysis or hypothesis test used is dependent upon the type of experimental design employed. The two basic types of experimental designs are crossed and nested.
In a crossed design each subject sees each level of the treatment conditions. In a very simple experiment, such as one that studies the effects of caffeine on alertness, each subject would be exposed to both a caffeine condition and a no-caffeine condition. For example, using the members of a statistics class as subjects, the experiment might be conducted as follows. On the first day of the experiment, the class is divided in half, with one half of the class getting coffee with caffeine and the other half getting coffee without caffeine. A measure of alertness is taken for each individual, such as the number of yawns during the class period. On the second day the conditions are reversed; that is, the individuals who received coffee with caffeine are now given coffee without it, and vice versa. The size of the effect is the difference in alertness between the days with and without caffeine.
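Because every subject is measured under both conditions, the effect can be computed one subject at a time and then averaged. A minimal Python sketch, assuming the yawn counts are stored in two lists in the same subject order; the function name `within_subject_effect` and the numbers in the usage note are hypothetical, not data from the text:

```python
from statistics import fmean

def within_subject_effect(with_treatment, without_treatment):
    """Effect size in a crossed (within-subjects) design.

    Each list holds one score per subject, in the same subject order, so
    position i in both lists belongs to the same person.
    """
    if len(with_treatment) != len(without_treatment):
        raise ValueError("each subject needs a score in both conditions")
    # One difference score per subject: the effect occurs within each person.
    differences = [w - wo for w, wo in zip(with_treatment, without_treatment)]
    # The overall effect is the average of the per-subject differences.
    return differences, fmean(differences)
```

With hypothetical yawn counts of `[2, 0, 1, 3]` on the caffeine day and `[5, 2, 4, 3]` on the no-caffeine day, the difference scores are `[-3, -2, -3, 0]` and the mean effect is -2.0 yawns.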
The distinguishing feature of crossed designs is that each individual has more than one score. The effect occurs within each subject; thus these designs are sometimes referred to as WITHIN SUBJECTS designs.
Crossed designs have two advantages. First, they generally require fewer subjects, because each subject is used more than once in the experiment. Second, they are more likely to result in a significant effect, given that the effects are real.
Crossed designs also have disadvantages. First, the experimenter must be concerned about carryover effects. For example, individuals not used to caffeine may still feel its effects on the second day, even though they did not receive the drug. Second, the first measurement taken may influence the second. For example, if the measurement of interest were the score on a statistics test, taking the test once may influence performance the second time it is taken. Third, the assumptions necessary when more than two treatment levels are employed in a crossed design may be restrictive.
In a nested design, each subject receives one, and only one, treatment condition. The critical difference in the simple experiment described above would be that the experiment is performed on a single day, with half the individuals receiving coffee with caffeine and half receiving coffee without caffeine. The size of the effect in this case is determined by comparing average alertness between the two groups.
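In the nested version the two groups contain different people, so the effect is a difference between group means rather than an average of per-subject differences. The sketch below assumes the class is split at random (the text says only that it is divided in half); both function names, `split_in_half` and `between_subject_effect`, are hypothetical:

```python
import random
from statistics import fmean

def split_in_half(subjects, seed=None):
    """Divide subjects into two groups; each subject lands in exactly one.

    Random assignment is an assumption. With an odd number of subjects the
    second group gets the extra person.
    """
    shuffled = list(subjects)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def between_subject_effect(group_with, group_without):
    """Nested-design effect: mean of one group minus mean of the other."""
    return fmean(group_with) - fmean(group_without)
```

Each subject contributes a single score to exactly one group, which is why the two groups may differ in size and why no per-subject difference can be formed.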
The major distinguishing feature of nested designs is that each subject has a single score. The effect, if any, occurs between groups of subjects, and thus the name BETWEEN SUBJECTS is given to these designs.
The relative advantages and disadvantages of nested designs are the opposite of those of crossed designs. First, carryover effects are not a problem, as individuals are measured only once. Second, the number of subjects needed to discover effects is greater than with crossed designs.
Some treatments are nested by their nature. The effect of sex, for example, is necessarily nested: one is either male or female, but not both. Current religious preference is another example. Treatment conditions that rely on a preexisting condition are sometimes called demographic or blocking factors.
As discussed earlier, a crossed design occurs when each subject sees each treatment level, that is, when there is more than one score per subject. The purpose of the analysis is to determine whether the effects of the treatment are real, that is, greater than would be expected by chance alone.
An experimenter is interested in the difference in finger-tapping speed between the right and left hands. She believes that if a difference is found, it will confirm a theory about hemispheric differences (left vs. right) in the brain.
A sample of thirteen subjects (N=13) is taken from a population of adults. Six subjects tap for fifteen seconds with their right-hand ring finger; the other seven tap with their left hand. After the number of taps has been recorded, the subjects tap again, but with the opposite hand. Thus each subject taps with both the right hand and the left hand; each subject appears in every level of the treatment condition.
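The hand order in this example is counterbalanced: six subjects start with the right hand and seven with the left. A minimal Python sketch of that assignment follows. It assumes the right-hand-first subjects are chosen at random, which the text does not specify, and the function `assign_hand_order` is hypothetical:

```python
import random

def assign_hand_order(subject_ids, n_right_first, seed=None):
    """Counterbalance which hand each subject taps with first.

    n_right_first subjects start with the right hand and the rest start with
    the left hand; every subject is then tested with the opposite hand.
    Choosing the right-first subjects at random is an assumption.
    """
    ids = list(subject_ids)
    rng = random.Random(seed)
    right_first = set(rng.sample(ids, n_right_first))
    return {s: ("right", "left") if s in right_first else ("left", "right")
            for s in ids}
```

Calling `assign_hand_order(range(1, 14), 6)` returns a dictionary mapping each of the 13 subject numbers to a (first hand, second hand) pair, with exactly six pairs starting on the right. Every subject still receives both hands, so each subject ends up with two scores.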
After the data are collected, they are usually arranged in a table like the following:
<div align=center>
Subject 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
Right Hand 
63 
68 
49 
51 
54 
32 
43 
48 
55 
50 
62 
38 
41 
Left Hand 
65 
63 
42 
31 
57 
33 
38 
37 