Introductory Statistics: Concepts, Models, and Applications
David W. Stockburger
The knowledge and understanding that the scientist has about the world is often represented in the form of models. The scientific method is basically one of creating, verifying, and modifying models of the world. The goal of the scientific method is to simplify and explain the complexity and confusion of the world. The applied scientist and technologist then use the models of science to predict and control the world.
This book is about a particular set of models, called statistics, which social and behavioral scientists have found extremely useful. In fact, most of what social scientists know about the world rests on the foundations of statistical models. It is important, therefore, that social science students understand both the reasoning behind the models, and their application in the world.
A model is a representation containing the essential structure of some object or event in the real world.
The representation may take two major forms:
1) Physical, as in a model airplane or architect's model of a building
2) Symbolic, as in a natural language, a computer program, or a set of mathematical equations.
In either form, certain characteristics are present by the nature of the definition of a model.
1. Models are necessarily incomplete.
Because it is a representation, no model includes every aspect of the real world. If it did, it would no longer be a model. In order to create a model, a scientist must first make some assumptions about the essential structure and relationships of objects and/or events in the real world. These assumptions are about what is necessary or important to explain the phenomena.
For example, a behavioral scientist might wish to model the time it takes a rat to run a maze. In creating the model the scientist might include such factors as how hungry the rat was, how often the rat had previously run the maze, and the activity level of the rat during the previous day. The model-builder would also have to decide how these factors interacted when constructing the model. The scientist does not assume that only factors included in the model affect the behavior. Other factors might be the time-of-day, the experimenter who ran the rat, and the intelligence of the rat. The scientist might assume that these are not part of the "essential structure" of the time it takes a rat to run a maze. All the factors that are not included in the model will contribute to error in the predictions of the model.
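The distinction between included factors and error can be sketched in a few lines of Python. The text does not say how the factors combine, so the linear form, the coefficient values, and the function names below are all hypothetical, chosen only for illustration.

```python
# Hypothetical symbolic model of maze-running time (in seconds).
# The linear form and every coefficient are illustrative assumptions.

def predicted_time(hours_hungry, previous_runs, activity_level):
    """Predict running time from the three factors the model includes."""
    baseline = 60.0  # assumed time when all three factors are zero
    return (baseline
            - 1.5 * hours_hungry      # hungrier rats are assumed to run faster
            - 2.0 * previous_runs     # practice is assumed to shorten the run
            - 0.5 * activity_level)   # assumed small effect of prior activity


def prediction_error(observed_time, hours_hungry, previous_runs, activity_level):
    """Observed minus predicted time; this absorbs every omitted factor."""
    return observed_time - predicted_time(hours_hungry, previous_runs, activity_level)


if __name__ == "__main__":
    # A made-up observation: the rat took 41 seconds.
    print(predicted_time(8, 3, 4))          # 40.0
    print(prediction_error(41.0, 8, 3, 4))  # 1.0
```

Time-of-day, the experimenter, and the intelligence of the rat appear nowhere in predicted_time, so whatever influence they have shows up only in the value returned by prediction_error.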
2. The model may be changed or manipulated with relative ease.
To be useful it must be easier to manipulate the model than the real world. The scientist or technician changes the model and observes the result, rather than doing a similar operation in the real world. He or she does this because working with the model is simpler and more convenient, and/or because the results of the real-world operation might be catastrophic.
A race car designer, for example, might build a small model of a new design and test the model in a wind tunnel. Depending upon the results, the designer can then modify the model and retest the design. This process is much easier than building a complete car for every new design. The usefulness of this technique, however, depends on whether the essential structure of the wind resistance of the design was captured by the wind tunnel model.
Changing symbolic models is generally much easier than changing physical models. All that is required is rewriting the model using different symbols. Determining the effects of such models is not always so easily accomplished. In fact, much of the discipline of mathematics is concerned with the effects of symbolic manipulation.
If the race car designer were able to capture the essential structure of the wind resistance of the design with a mathematical model or computer program, he or she would not have to build a physical model every time a new design was to be tested. All that would be required would be the substitution of different numbers or symbols into the mathematical model or computer program. As before, to be useful the model must capture the essential structure of the wind resistance.
The values that may be changed in a model to create different models are called parameters. In physical models, parameters are physical things. In the race car example, the designer might vary the length, degree of curvature, or weight distribution of the model. In symbolic models parameters are represented by symbols. For example, in mathematical models parameters are most often represented by variables. Changes in the numbers assigned to the variables change the model.
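As a rough sketch of parameters in a symbolic model, the Python fragment below builds two different models from one general form by assigning different numbers to its variables. The formula and the names make_drag_model, length, and curvature are hypothetical and are not a real model of wind resistance.

```python
# A symbolic model whose parameters are ordinary variables.
# The formula and the numbers are hypothetical, not a real drag model.

def make_drag_model(length, curvature):
    """Return one particular model: a function from speed to wind resistance."""
    def drag(speed):
        # Assumed form: resistance grows with the square of speed, scaled by shape.
        return (length / curvature) * speed ** 2
    return drag


design_a = make_drag_model(length=4.0, curvature=2.0)
design_b = make_drag_model(length=4.0, curvature=4.0)  # same form, new parameter value

if __name__ == "__main__":
    print(design_a(10.0))  # 200.0
    print(design_b(10.0))  # 100.0
```

Changing a single number produces a different model, which is far less work than carving a new shape for the wind tunnel.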
Of the two types of models, physical and symbolic, the latter is used much more often in science. Symbolic models are constructed using either a natural or formal language (Kain, 1972). Examples of natural languages include English, German, and Spanish. Examples of formal languages include mathematics, logic, and computer languages. Statistics as a model is constructed in a branch of the formal language of mathematics, algebra.
Natural and formal languages share a number of commonalities. First, they are both composed of a set of symbols, called the vocabulary of the language. English symbols take the form of words, such as those that appear on this page. Algebraic symbols include the following as a partial list: 1, -3.42, X, +, =, >.
The language consists of strings of symbols from the symbol set. Not all possible strings of symbols belong to the language. For instance, the following string of words is not recognized as a sentence, "Of is probability a model uncertainty," while the string of words "Probability is a model of uncertainty." is recognized almost immediately as being a sentence in the language. The set of rules to determine which strings of symbols form sentences and which do not is called the syntax of the language.
The syntax of natural languages is generally defined by common usage. That is, people who speak the language ordinarily agree on what is, and what is not, a sentence in the language. The rules of syntax are most often stated informally and imperfectly, for example, "noun phrase, verb, noun phrase."
The syntax of a formal language, on the other hand, may be stated with formal rules. Thus it is possible to determine whether or not a string of symbols forms a sentence in the language without resorting to users of the language. For example, the string "x + / y =" does not form a sentence in the language of algebra. It violates two rules in algebra: sentences cannot end in "=" and the symbols "+" and "/" cannot follow one another. The rules of syntax of algebra may be stated much more succinctly as will be seen in the next chapter.
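Because the rules are formal, a program can apply them without consulting any user of the language. The sketch below checks only the two rules named above and is not a full grammar of algebra; the function name violations and the assumption that symbols are separated by spaces are introduced here for illustration.

```python
# A toy syntax checker covering only the two rules named in the text.
# It is not a full grammar of algebra; the tokenization is an assumption.

NOT_ADJACENT = {"+", "/"}  # symbols that may not follow one another


def violations(sentence):
    """Return a list describing which of the two rules the string breaks."""
    tokens = sentence.split()  # assumes symbols are separated by spaces
    problems = []
    if tokens and tokens[-1] == "=":
        problems.append('sentence ends in "="')
    for left, right in zip(tokens, tokens[1:]):
        if left in NOT_ADJACENT and right in NOT_ADJACENT:
            problems.append(f'"{left}" is followed by "{right}"')
    return problems


if __name__ == "__main__":
    print(violations("x + / y ="))  # two problems reported
    print(violations("x + y = z"))  # []
```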
Both natural and formal languages are characterized by the ability to transform a sentence in the language into a different sentence without changing the meaning of the string. For example, the active voice sentence "The dog chased the cat," may be transformed to the sentence "The cat was chased by the dog," by using the passive voice transformation. This transformation does not change the meaning of the sentence. In an analogous manner, the sentence "ax + ay" in algebra may be transformed to the sentence "a(x+y)" without changing the meaning of the sentence. Much of what has been taught as algebra consists of learning appropriate transformations, and the order in which to apply them.
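One way to see that the two algebraic sentences carry the same meaning is to evaluate both for many values of the symbols. The Python sketch below does this with exact fractions; the function names before and after are hypothetical. Agreement on sample values only illustrates the transformation and does not replace the algebraic rule that justifies it.

```python
# Numerical spot check that "ax + ay" and "a(x+y)" have the same meaning.
# Agreement on sample values illustrates the transformation; it is not a proof.
from fractions import Fraction
from itertools import product


def before(a, x, y):
    return a * x + a * y


def after(a, x, y):
    return a * (x + y)


# Exact rational arithmetic avoids floating-point rounding in the comparison.
samples = [Fraction(n, 4) for n in range(-8, 9)]
all_equal = all(before(a, x, y) == after(a, x, y)
                for a, x, y in product(samples, repeat=3))

if __name__ == "__main__":
    print(all_equal)  # True
```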
The transformation process exists entirely within the realm of the language. The word proof will be reserved for this process. That is, it will be possible to prove that one algebraic sentence equals another. It will not be possible, however, to prove that a model is correct, because a model is never complete.
The scientific method is a procedure for the construction and verification of models. After a problem is formulated, the process consists of four stages.
1. Simplification and idealization.
As mentioned previously, a model contains the essential structure of objects or events. The first stage identifies the relevant features of the real world.
2. Representation and measurement.
The symbols in a formal language are given meaning as objects, events, or relationships in the real world. This is the process used in translating "word problems" to algebraic expressions in high school algebra. This process is called representation of the world. In statistics, the symbols of algebra (numbers) are given meaning in a process called measurement.
3. Manipulation and transformation.
Sentences in the language are transformed into other statements in the language. In this manner implications of the model are derived.
4. Verification.
Selected implications derived in the previous stage are compared with experiments or observations in the real world. Because of the idealization and simplification of the model-building process, no model can ever be in perfect agreement with the real world. In all cases, the important question is not whether the model is true, but whether the model was adequate for the purpose at hand. Model-building in science is a continuing process. New and more powerful models replace less powerful models, with "truth" being a closer approximation to the real world.
These four stages and their relationship to one another are illustrated below.
In general, the greater the number of simplifying assumptions made about the essential structure of the real world, the simpler the model. The goal of the scientist is to create simple models that have a great deal of explanatory power. Such models are called parsimonious models. In most cases, however, simple yet powerful models are not available to the social scientist. A trade-off occurs between the power of the model and the number of simplifying assumptions made about the world. A social or behavioral scientist must decide at what point the gain in the explanatory power of the model no longer warrants the additional complexity of the model.
The power of the mathematical model is derived from a number of sources. First, the language has been used extensively in the past and many models exist as examples. Some very general models exist which may describe a large number of real world situations. In statistics, for example, the normal curve and the general linear model often serve the social scientist in many different situations. Second, many transformations are available in the language of mathematics.
Third, mathematics permits thoughts which are not easily expressed in other languages. For example, "What if I could travel approaching the speed of light?" or "What if I could flip this coin an infinite number of times?" In statistics these "what if" questions often take the form of questions like "What would happen if I took an infinite number of infinitely precise measurements?" or "What would happen if I repeated this experiment an infinite number of times?"
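No one can flip a coin an infinite number of times, but a simulation can suggest what the mathematics is describing. The Python sketch below flips a simulated fair coin an increasing number of times; the function name proportion_heads and the fixed seed are assumptions made for this illustration.

```python
# Approximating the "infinite flips" thought experiment with finite simulations.
import random


def proportion_heads(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times; return the share of heads."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, proportion_heads(n))
```

As the number of flips grows, the proportion of heads tends to settle near one half, the value the mathematical model assigns to the infinite case.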
Finally, it is often possible to maximize or minimize the form of the model. Given that the essence of the real world has been captured by the model, what values of the parameters optimize (minimize or maximize) the model? For example, if the design of a race car can be accurately modeled using mathematics, what changes in design will result in the least possible wind resistance? Mathematical procedures are available which make these kinds of transformations possible.
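A small worked case may make the idea of optimizing a parameter concrete. Suppose, purely as an assumption, that wind resistance depends on a single design parameter c through a quadratic with a positive leading coefficient. Setting the derivative to zero then gives the minimizing value directly. The functions resistance and best_parameter and all of the numbers are hypothetical.

```python
# Minimizing a hypothetical wind-resistance model in closed form.
# Assumed model: resistance(c) = a*c**2 + b*c + k, with a > 0, where c is a
# design parameter such as degree of curvature. The numbers are made up.

def resistance(c, a=2.0, b=-8.0, k=30.0):
    return a * c ** 2 + b * c + k


def best_parameter(a=2.0, b=-8.0):
    """Setting the derivative 2*a*c + b to zero gives c = -b / (2*a)."""
    if a <= 0:
        raise ValueError("the quadratic has a minimum only when a > 0")
    return -b / (2 * a)


if __name__ == "__main__":
    c_star = best_parameter()
    print(c_star)              # 2.0
    print(resistance(c_star))  # 22.0
```

Within this assumed model no other value of c gives less resistance. Whether a real car behaves this way depends on how well the model captures the essential structure.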
Building a Better Boat - Example of Model-Building
Suppose for a minute that you had lots of money, time, and sailing experience. Your goal in life is to build and race a 12-meter yacht that would win the America's Cup competition. How would you go about doing it?
Twelve-meter racing yachts do not have to be identical to compete in the America's Cup race. There are certain restrictions on the length, weight, sail area, and other areas of boat design. Within these restrictions, there are variations that will change the handling and speed of the yacht through the water. The following two figures (Lethcer, Marshall, Oliver, and Salvesen, 1987) illustrate different approaches to keel design. The designer has the option of whether to install a wing on the keel. If a wing is chosen, the decision of where it will be placed must be made.
You could hire a designer, have him or her draw up the plans, build the yacht, train a crew to sail it, and then compete in yachting races. All this would be fine, except it is a very time-consuming and expensive process. What happens if you don't have a very good boat? Do you start the whole process over again?
The scientific method suggests a different approach. If a physical model were constructed and attached by a string, through a pulley system, to a set of weights, the time to drag the model from point A to point B could be measured. The hull shape could be changed using a knife and various weights. In this manner, many more different shapes could be attempted than if a whole new yacht had to be built to test every shape.
One of the problems with this physical model approach is that the designer never knows when to stop. That is, the designer never knows whether a slightly different shape might be faster than any of the shapes attempted up to that point. In any case the designer has to stop testing models and build the boat at some point in time.
Suppose the fastest hull shape was selected and the full-scale yacht was built. Suppose also that it didn't win. Does that make the model-building method wrong? Not necessarily. Perhaps the model did not represent enough of the essential structure of the real world to be useful. In examining the real world, it is noticed that racing yachts do not sail standing straight up in the water, but at some angle, depending upon the strength of the wind. In addition, the ocean has waves which necessarily change the dynamics of the movement of a hull through water.
If the physical model were pulled through a pool of wavy water at an angle, then the simulation would more closely mirror the real world. Assume that this is done, and the full-scale yacht is built. It still doesn't win. What next?
One possible solution is the use of symbolic or mathematical models in the design of the hull and keel. Lethcer et al. (1987) describe how various mathematical models were employed in the design of Stars and Stripes. The mathematical model uses parameters which allow one to change the shape of the simulated hull and keel by setting the values of the parameters to particular numbers. That is, a mathematical model of a hull and keel shape does not describe a particular shape, but a large number of possible shapes. When the parameters of the mathematical model of the hull shape are set to particular numbers, one of the possible hull shapes is specified. By sailing the simulated hull shape through simulated water, and measuring the simulated time it takes, the potential speed of a hull shape may be evaluated.
The advantage of creating a symbolic model over a physical model is that many more shapes may be assessed. By turning a computer on and letting it run all night, hundreds of shapes may be tested. It is sometimes possible to use mathematical techniques to find an optimal model, one that guarantees that within the modeling framework, no other hull shape will be faster. However, if the model does not include the possibility of a winged keel, a winged keel will never be discovered.
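The overnight search can be sketched as a loop over combinations of parameter values. In the Python fragment below, simulated_time is a made-up stand-in for a real hydrodynamic simulation, and the parameters length and beam, along with their ranges, are assumptions chosen only for illustration.

```python
# Brute-force evaluation of many simulated hull shapes.
# simulated_time is a made-up stand-in for a real hydrodynamic simulation.
from itertools import product


def simulated_time(length, beam):
    """Hypothetical course time (minutes) for a hull with these parameters."""
    return 100.0 + (length - 20.0) ** 2 + 4.0 * (beam - 3.5) ** 2


def fastest_shape(lengths, beams):
    """Try every combination of parameter values and keep the fastest."""
    return min(product(lengths, beams), key=lambda shape: simulated_time(*shape))


lengths = [18.0 + 0.25 * i for i in range(17)]  # 18.0 to 22.0
beams = [3.0 + 0.125 * i for i in range(9)]     # 3.0 to 4.0

if __name__ == "__main__":
    print(len(lengths) * len(beams))      # 153 shapes tested
    print(fastest_shape(lengths, beams))  # (20.0, 3.5)
```

The search is only as broad as its parameters. This sketch has no parameter for a keel wing, so no amount of computing time would lead it to propose one.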
Suppose that these techniques are employed, and the yacht is built, but it still does not win. It may be that not enough of the real world was represented in the symbolic model. Perhaps the simulated hull must travel at an angle to the water and sail through waves. Capturing these conditions makes the model more complex, but is necessary if the model is going to be useful.
In building Stars and Stripes, all the above modeling techniques were employed (Lethcer et al., 1987). After initial computer simulation, a one-third scale model was constructed to work out the details of the design. The result of the model-building design process is history.
In conclusion, the scientific method of model-building is a very powerful tool in knowing and dealing with the world. The main advantage of the process is that the model may be manipulated where it is often difficult or impossible to manipulate the real world. Because manipulation is the key to the process, symbolic models have advantages over physical models.