Introductory Statistics: Concepts, Models, and Applications

David W. Stockburger

It is necessary to enhance the language of algebra with an additional notational system in order to efficiently write some of the expressions which will be encountered in the next chapter on statistics. The notational scheme provides a means of representing both a large number of variables and the summation of an algebraic expression.

Suppose the following were scores made on the first homework assignment for five students in the class: 5, 7, 7, 6, and 8. These scores could be represented in the language of algebra by the symbols V, W, X, Y, and Z. This method of representing a set of scores becomes unwieldy as the number of scores grows and fails entirely when there are more than 26, so some other method of representation is necessary. The method of choice is called subscripted variables, written as X_{i}, where X is the variable name and i is the subscript. The subscript (i) is a "dummy" or counter variable: it may take on values from 1 to N, where N is the number of scores, to indicate which score is being described. In the case of the example scores, then, X_{1}=5, X_{2}=7, X_{3}=7, X_{4}=6, and X_{5}=8.
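As a rough illustration, the subscripted scores can be held in a Python list. Python lists are indexed from 0 rather than 1, so the helper `score` below (a hypothetical name introduced only for this sketch) shifts the subscript by one.

```python
# Homework scores from the example: X_1 through X_5
X = [5, 7, 7, 6, 8]
N = len(X)  # N is the number of scores


def score(scores, i):
    """Return the score with subscript i, where i runs from 1 to N."""
    if not 1 <= i <= len(scores):
        raise IndexError(f"subscript {i} is outside 1..{len(scores)}")
    return scores[i - 1]  # subscript 1 lives at list position 0


for i in range(1, N + 1):
    print(f"X_{i} = {score(X, i)}")
```

A second list, say `Y`, could hold the scores on another assignment in the same way.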

If one wished to represent the scores made on the second homework by these same students, the symbol Y_{i} could be used. The variable Y_{1} would be the score made by the first student, Y_{2} the second student, etc.

Very often in statistics an algebraic expression of the form X_{1}+X_{2}+X_{3}+...+X_{N} is used in a formula to compute a statistic. The three dots in the preceding expression indicate that terms have been left out of the sequence and must be filled in when the expression is interpreted. It is tedious to write an expression like this very often, so mathematicians have developed a shorthand notation to represent a sum of scores, called the summation notation.

The expression in front of the equals sign in what follows is summation notation; the expression after the equals sign gives its meaning in "longhand" notation.

\sum_{i=1}^{N} X_{i} = X_{1} + X_{2} + X_{3} + ... + X_{N}

The expression is read, "the sum of X sub i from i equals 1 to N." It means "add up all the numbers." In the example set of five numbers, where N=5, the summation could be written:

\sum_{i=1}^{5} X_{i} = X_{1} + X_{2} + X_{3} + X_{4} + X_{5} = 5 + 7 + 7 + 6 + 8 = 33

The "i=1" at the bottom of the summation notation tells where to begin the sequence of summation. If the expression were written with "i=3", the summation would start with the third number in the set. For example:

\sum_{i=3}^{N} X_{i} = X_{3} + X_{4} + ... + X_{N}

In the example set of numbers, this would give the following result:

\sum_{i=3}^{5} X_{i} = X_{3} + X_{4} + X_{5} = 7 + 6 + 8 = 21

The "N" in the upper part of the summation notation tells where to end the sequence of summation. If there were only three scores, then the summation and example would be:

\sum_{i=1}^{3} X_{i} = X_{1} + X_{2} + X_{3} = 5 + 7 + 7 = 19
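The role of the lower and upper limits can be sketched in Python. The function name `summation` and its `start` and `end` parameters are assumptions made for this illustration; they use 1-based, inclusive subscripts to match the notation in the text.

```python
X = [5, 7, 7, 6, 8]


def summation(scores, start=1, end=None):
    """Sum the scores with subscripts start..end (1-based, inclusive)."""
    if end is None:
        end = len(scores)  # default upper limit is N
    # Slice positions start-1 .. end-1, which are subscripts start .. end
    return sum(scores[start - 1:end])


print(summation(X))           # i = 1 to N  -> 33
print(summation(X, start=3))  # i = 3 to N  -> 21
print(summation(X, end=3))    # i = 1 to 3  -> 19
```

Leaving out both limits gives the full sum from 1 to N.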

Sometimes, when the summation notation must be written a number of times in an expression, as in a proof, a shorthand for the shorthand notation is employed. When the summation sign "Σ" is used without additional notation, "i=1" and "N" are assumed. For example:

\sum X = \sum_{i=1}^{N} X_{i}

The summation notation may be used not only with single variables, but also with algebraic expressions containing more than one variable. When these expressions are encountered, considerable attention must be paid to where the parentheses are located. If the parentheses are located after the summation sign, then the general rule is: **DO THE ALGEBRAIC OPERATION AND THEN SUM**. For example, suppose that X is the score on the first homework and Y is the score on the second, and that the gradebook is as follows:

| X | Y |
|---|---|
| 5 | 6 |
| 7 | 7 |
| 7 | 8 |
| 6 | 7 |
| 8 | 8 |

The sum of the product of the two variables could be written:

\sum_{i=1}^{N} (X_{i} * Y_{i}) = (X_{1} * Y_{1}) + (X_{2} * Y_{2}) + ... + (X_{N} * Y_{N})

The preceding sum may be most easily computed by creating a third column on the data table above:

| X | Y | X * Y |
|---|---|-------|
| 5 | 6 | 30 |
| 7 | 7 | 49 |
| 7 | 8 | 56 |
| 6 | 7 | 42 |
| 8 | 8 | 64 |
| Σ = 33 | Σ = 36 | Σ = 241 |

Note that a change in the position of the parentheses dramatically changes the results:

(\sum X) * (\sum Y) = 33 * 36 = 1188

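A similar distinction is made between \sum X^{2} and (\sum X)^{2}. In the former each score is squared and then summed, giving 25 + 49 + 49 + 36 + 64 = 223, while in the latter the scores are summed and the total is squared, giving 33^{2} or 1089.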
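The effect of the parentheses can be checked with a short Python sketch using the gradebook scores. The variable names are chosen here for illustration only.

```python
X = [5, 7, 7, 6, 8]  # first homework
Y = [6, 7, 8, 7, 8]  # second homework

# Multiply each pair of scores first, then sum the products
sum_of_products = sum(x * y for x, y in zip(X, Y))

# Sum each variable first, then multiply the two totals
product_of_sums = sum(X) * sum(Y)

# Square each score first, then sum the squares
sum_of_squares = sum(x ** 2 for x in X)

# Sum the scores first, then square the total
square_of_sum = sum(X) ** 2

print(sum_of_products, product_of_sums)  # 241 1188
print(sum_of_squares, square_of_sum)     # 223 1089
```

In each pair the same scores produce different results depending on whether the operation is done before or after summing.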

Three exceptions to the general rule provide the foundation for some simplification and statistical properties to be discussed later. The three exceptions are:

**1. When the expression being summed contains a "+" or "-" at the highest level, then the summation sign may be taken inside the parentheses.** The rule may be more concisely written:

\sum (X + Y) = \sum X + \sum Y

\sum (X - Y) = \sum X - \sum Y

Computing both sides from a table with example data yields:

| X | Y | X + Y | X - Y |
|---|---|-------|-------|
| 5 | 6 | 11 | -1 |
| 7 | 7 | 14 | 0 |
| 7 | 8 | 15 | -1 |
| 6 | 7 | 13 | -1 |
| 8 | 8 | 16 | 0 |
| Σ = 33 | Σ = 36 | Σ = 69 | Σ = -3 |

Note that the sum of the X+Y column is equal to the sum of X plus the sum of Y. Similar results hold for the X-Y column.
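The first exception can be verified numerically with the example data. This is a small sketch, and the variable names are illustrative.

```python
X = [5, 7, 7, 6, 8]
Y = [6, 7, 8, 7, 8]

# Left side: add (or subtract) within each row, then sum the column
sum_of_sums = sum(x + y for x, y in zip(X, Y))
sum_of_differences = sum(x - y for x, y in zip(X, Y))

# Right side: sum each variable separately, then add (or subtract)
sums_added = sum(X) + sum(Y)
sums_subtracted = sum(X) - sum(Y)

print(sum_of_sums, sums_added)                # 69 69
print(sum_of_differences, sums_subtracted)    # -3 -3
```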

**2. The sum of a constant times a variable is equal to the constant times the sum of the variable.**

A constant is a value, such as a fixed number, that does not change as the counter variable "i" takes on different values. If every score is multiplied by the same number and then summed, the result is equal to the sum of the original scores times the constant. Constants are usually identified in the statement of a problem, often represented by the letters "c" or "k". If c is a constant, then, as before, this exception to the rule may be written in algebraic form:

\sum_{i=1}^{N} (c * X_{i}) = c * \sum_{i=1}^{N} X_{i}

For example, suppose that the constant was equal to 5. Using the example data produces the result:

| X | c * X (c = 5) |
|---|---------------|
| 5 | 25 |
| 7 | 35 |
| 7 | 35 |
| 6 | 30 |
| 8 | 40 |
| Σ = 33 | Σ = 165 |

Note that c * 33 = 165, the same as the sum of the second column.
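The second exception can be checked the same way. The sketch below assumes the example value c = 5.

```python
X = [5, 7, 7, 6, 8]
c = 5  # the constant from the example

# Multiply every score by c, then sum
sum_of_scaled = sum(c * x for x in X)

# Sum the scores, then multiply the total by c
scaled_sum = c * sum(X)

print(sum_of_scaled, scaled_sum)  # 165 165
```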

**3. The sum of a constant is equal to N times the constant.**

If no subscripted (non-constant) variables appear to the right of a summation sign, then the constant appearing after the summation sign is multiplied by the number of scores. Writing this exception to the rule in algebraic notation:

\sum_{i=1}^{N} c = N * c

For example, if c = 8 and N = 5 then:

\sum_{i=1}^{5} 8 = 8 + 8 + 8 + 8 + 8 = 5 * 8 = 40
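A minimal Python sketch of the third exception, using the constant 8 and N = 5 from the example:

```python
c = 8  # the constant being summed
N = 5  # the number of scores

# Add the constant once for each value of the counter i = 1..N
summed = sum(c for _ in range(N))

# Shortcut given by the rule: N times the constant
shortcut = N * c

print(summed, shortcut)  # 40 40
```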

When algebraic expressions include summation notation, simplification can be performed if a few rules are remembered.

1. The expression to the right of the summation sign may be simplified using any of the algebraic rewriting rules.

2. The entire expression including the summation sign may be treated as a phrase in the language.

3. The summation sign is NOT a variable, and may not be treated as one (cancelled, for example).

4. The three exceptions to the general rule may be used whenever applicable.

Two examples follow with X and Y as variables and c, k, and N as constants: