Measures of location

It is often impractical to list all the data or draw a histogram, so it is useful to have a single number which best represents a data set. Frequently what is of interest is where the data lie, and a measure of location serves that purpose. There are several measures of location, which we shall illustrate with the data sets A={2, 9, 5, 3, 8}, B={1, 4, 7, 3, 9, 2}, and the weights of students from a previous lesson.

Minimum

The minimum is the smallest value in a data set. It is often useful to put data in rank order when studying it, in which case A would be represented as {2, 3, 5, 8, 9} and B as {1, 2, 3, 4, 7, 9}, and the rank order of the weights was given before. From these rank order listings, it is immediate that the minimum of A is 2, the minimum of B is 1, and the minimum of the weights is 105.

Maximum

The maximum is the largest datum in a data set. From the above rank order listings, it is immediate that the maximum of A is 9, the maximum of B is 9, and the maximum of the weights is 235.

Midrange

The midrange is the middle value in the sense that it is halfway between the maximum and minimum. It is computed as (maximum+minimum)/2. The midrange for data set A is (9+2)/2=5.5, the midrange for data set B is (9+1)/2=5, and the midrange for the weights is (235+105)/2=170. The midrange is easy to calculate, but because it is determined entirely by the two extreme values, it may not be representative of where most of the data lie.
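
The three measures so far can be checked with a few lines of Python. This is only a minimal sketch; the function name midrange is our own choice for illustration, while sorted, min, and max are built into the language.

```python
def midrange(data):
    """Return the value halfway between the smallest and largest datum."""
    return (min(data) + max(data)) / 2

A = [2, 9, 5, 3, 8]
B = [1, 4, 7, 3, 9, 2]

# Rank order, minimum, maximum, and midrange of each data set.
print(sorted(A), min(A), max(A), midrange(A))  # [2, 3, 5, 8, 9] 2 9 5.5
print(sorted(B), min(B), max(B), midrange(B))  # [1, 2, 3, 4, 7, 9] 1 9 5.0
```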

Median

The median is the middle value in the sense that half the data are above it, and half the data are below it. If there is an odd number of data points, the median is the middle value, e.g., 5 for data set A. If there is an even number of data points, the median is halfway between the two middle values, e.g., (3+4)/2=3.5 for data set B and (155+155)/2=155 for the weights. When finding the median, make sure the data are in rank order, and each value has been listed as often as it occurs. The median is perhaps the best indicator of where the data lie, being truly amid the data values. Some comments on the median by Stephen Jay Gould may be of interest.
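
The odd and even cases can be sketched in Python as follows. The function below is our own illustration of the rule just described (sort first, then take the middle value or the average of the two middle values), not a prescribed method.

```python
def median(data):
    """Return the middle value of the data once placed in rank order."""
    ordered = sorted(data)           # rank order, repeated values kept
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                   # odd count: the single middle value
        return ordered[mid]
    # even count: halfway between the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

A = [2, 9, 5, 3, 8]
B = [1, 4, 7, 3, 9, 2]

print(median(A))  # 5
print(median(B))  # 3.5
```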

Mean

The mean (represented by an overscored x, pronounced x-bar) is calculated by adding up all the data values and dividing by the number of data (usually denoted by n). This formula can be concisely represented using summation notation. For data set A the mean is (2+3+5+8+9)/5=5.4, for data set B the mean is (1+2+3+4+7+9)/6=4.33, and for the weights the mean is (105+110+112+113+120+125+125+130+...+235)/30=153.43. The mean reflects all the data, so it can be pulled toward extreme values, but it is widely used because it can be algebraically manipulated and works well with other statistics.
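
As a quick check of the arithmetic for data sets A and B, the calculation can be sketched in Python. The function name mean is our own; it simply divides the sum of the values by their count n.

```python
def mean(data):
    """Arithmetic mean: x-bar = (x_1 + x_2 + ... + x_n) / n."""
    return sum(data) / len(data)

A = [2, 9, 5, 3, 8]
B = [1, 4, 7, 3, 9, 2]

print(mean(A))            # 5.4
print(round(mean(B), 2))  # 4.33
```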

If a data set is symmetric, the mean is equal to the median, which is equal to the midrange.

Exercise: When is the mean versus the median a better indication of where data lies? Would you expect the mean or median age in a community to be larger? The mean or median income? The mean or median cost of a house?

Challenge: How would you calculate the average class size at UNI?

The mean is more correctly referred to as the arithmetic mean. It is worth noting that there are other notions of average which are better suited to specific problems. Two of these are the geometric mean and the harmonic mean.

Geometric Mean

The geometric mean is the nth root of the product of n numbers, or equivalently the antilogarithm of the arithmetic mean of the logarithms. Given the data 2, 5, 8, one can calculate the geometric mean of those numbers as:
(2×5×8)^(1/3)=4.31 or
e^((1/3)(ln(2) + ln(5) + ln(8))) = e^((1/3)(.69+1.61+2.08))=4.31
The geometric mean is the appropriate concept of the mean for average interest or inflation rates. If the inflation rates for three successive years are 3%, 12%, and 5%, then the cost of living will be multiplied by 1.03, 1.12, and 1.05 respectively, with a net result of multiplication by 1.03×1.12×1.05 = 1.21128 for the three-year period. The cube root of 1.21128 is 1.066, which is the geometric mean, i.e., the yearly multiplier which, if constant for three years, would produce the same increase. (The corresponding inflation rate is 6.6%, since the inflation rate refers to the increase.)

The geometric mean of positive numbers will always be less than or equal to their arithmetic mean.
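
Both ways of computing the geometric mean, along with the inflation example, can be sketched in Python. The function names are our own, the data are assumed to be positive, and math.prod assumes Python 3.8 or later.

```python
import math

def geometric_mean(data):
    """nth root of the product of n positive numbers."""
    return math.prod(data) ** (1 / len(data))

def geometric_mean_logs(data):
    """Antilogarithm of the arithmetic mean of the natural logarithms."""
    return math.exp(sum(math.log(x) for x in data) / len(data))

values = [2, 5, 8]
print(round(geometric_mean(values), 2))       # 4.31
print(round(geometric_mean_logs(values), 2))  # 4.31

# Yearly multipliers for inflation rates of 3%, 12%, and 5%.
factors = [1.03, 1.12, 1.05]
g = geometric_mean(factors)
print(round(g, 3))                            # 1.066

# The geometric mean does not exceed the arithmetic mean.
print(g <= sum(factors) / len(factors))       # True
```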

Harmonic mean

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. Thus the harmonic mean of 2, 5, and 8 is:
1/((1/3)((1/2)+(1/5)+(1/8)))=3.64
The canonical problem in which the harmonic mean is employed is the following: if a car drives 50 miles at 30 mph, 50 miles at 50 mph, and 50 miles at 60 mph, what is its average speed? The total distance is 150 miles and the total time is (50/30)+(50/50)+(50/60)=3.5 hours; hence the average speed is 150/3.5=42.86 mph.
This can be concisely calculated as:
1/((1/3)((1/30)+(1/50)+(1/60)))=42.86

The harmonic mean of positive numbers is always less than or equal to their geometric mean.
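
The harmonic mean and the driving problem can be sketched in Python. The function name harmonic_mean is our own, and the data are assumed to be positive; note that the reciprocal of (1/n) times a sum is the same as n divided by that sum.

```python
def harmonic_mean(data):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(data) / sum(1 / x for x in data)

values = [2, 5, 8]
print(round(harmonic_mean(values), 2))  # 3.64

# Three legs of 50 miles each, driven at 30, 50, and 60 mph.
speeds = [30, 50, 60]
leg_miles = 50
total_distance = leg_miles * len(speeds)          # 150 miles
total_time = sum(leg_miles / s for s in speeds)   # 3.5 hours
average_speed = total_distance / total_time

print(round(average_speed, 2))          # 42.86
print(round(harmonic_mean(speeds), 2))  # 42.86
```

The two results agree because each leg covers the same distance; with legs of unequal length the plain harmonic mean of the speeds would no longer give the average speed.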

Competencies: For the data set {2, 5, 9, 4, 6, 7, 6, 8, 8}, calculate the mean, median, midrange, maximum, minimum, geometric mean, and harmonic mean.

Reflection: For the above data set, which of the above statistics best describes where the data lie?

Challenge: When will the mean, median, and midrange be equal? When will the maximum, minimum, and median be equal?

May 2003

campbell@math.uni.edu