Other Descriptive Statistics
There are measures of location and spread other than those discussed previously. A few which
may be of interest are:
The mode is the data value which occurs most often. It is seldom of interest
with quantitative data, but is the only notion of average which is possible
for qualitative (categorical) data. If there are three red balloons, two green
balloons and five yellow balloons, the mode color is yellow. We could not
compute a mean, median, or midrange color.
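The balloon example can be checked with a short tally. This is only a sketch using Python's standard library; the list name balloons is made up for the illustration.

```python
from collections import Counter

# Balloon colors from the example: three red, two green, five yellow.
balloons = ["red"] * 3 + ["green"] * 2 + ["yellow"] * 5

# Counter tallies each color; most_common(1) returns the top (value, count) pair.
mode_color, mode_count = Counter(balloons).most_common(1)[0]
print(mode_color, mode_count)  # yellow 5
```

Nothing here requires the values to be numbers, which is why the mode still works for categorical data.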
There are at least two motivations for the weighted mean. One is concerned
with averaging average values, which is best explained with an example: if I have a class of 30 students for whom the mean score on a test is 75, and
another class of 50 students for whom the mean score is 80, then the sum of all
the scores is (30)(75)+(50)(80), hence the overall mean is
((30)(75)+(50)(80))/(30+50) = 78.125. This is obtained by "weighting" the means
by 30 and 50, respectively, and dividing by the sum of the weights. The result
is called the weighted mean of the means. The weights employed could also be
fractional. This is illustrated by the second motivation for weighted means.
Sometimes one feels that some data are more important or more accurate than other data.
For example, if one wanted to know the average temperature in July,
one might feel that temperatures from recent years are more important than
temperatures from less recent years. If the mean temperatures in July were
72 (1996), 69 (1995), 75 (1994), 73 (1993), and 68 (1992), one could weight them
by 1, .8, .6, .4, and .2 respectively and calculate a weighted mean:
(72x1 + 69x.8 + 75x.6 + 73x.4 + 68x.2)/(1+.8+.6+.4+.2) = 215/3 = 71.67
Determining the average size of classes at a university is another problem
where weighted means may be appropriate.
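Both weighted means worked above (the two classes and the July temperatures) follow the same recipe: multiply each value by its weight, add, and divide by the sum of the weights. A minimal sketch, in which the function name weighted_mean is an assumption of this example:

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Class means weighted by class size.
class_mean = weighted_mean([75, 80], [30, 50])

# July temperatures for 1996 back to 1992, recent years weighted more heavily.
july_mean = weighted_mean([72, 69, 75, 73, 68], [1, 0.8, 0.6, 0.4, 0.2])

print(round(class_mean, 3))  # 78.125
print(round(july_mean, 2))   # 71.67
```

The weights need not sum to 1, since the division by their total normalizes them.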
The geometric mean is the nth root of the product of n numbers, or equivalently
the antilogarithm of the arithmetic mean of the logarithms. If one has the
data 2, 5, 8, one can calculate the geometric mean of those numbers as:
(2x5x8)^(1/3) = 80^(1/3) = 4.31 or
e^((1/3)(ln(2) + ln(5) + ln(8))) = e^((1/3)(.69+1.61+2.08)) = 4.31
The geometric mean is the appropriate mean for averaging interest
or inflation rates. If the inflation rates for three successive years are
3%, 12%, and 5%, the geometric mean of the growth factors 1.03, 1.12, and 1.05,
which is 1.066, gives the annual inflation rate, 6.6%, which if constant for
three years would produce the same increase in prices.
The geometric mean of a set of positive numbers will always be
less than or equal to their arithmetic mean.
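The logarithm form of the geometric mean translates directly into code. The sketch below assumes all values are positive; the function name geometric_mean is chosen for this illustration (Python 3.8 and later also ship statistics.geometric_mean in the standard library).

```python
import math

def geometric_mean(values):
    """Antilogarithm of the arithmetic mean of the logarithms (positive values only)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))

gm_data = geometric_mean([2, 5, 8])
print(round(gm_data, 2))  # 4.31

# Inflation of 3%, 12%, and 5% expressed as yearly growth factors.
factors = [1.03, 1.12, 1.05]
gm_growth = geometric_mean(factors)
print(round(gm_growth, 3))  # 1.066

# Three years at the constant rate reproduce the actual three-year price increase.
print(math.isclose(gm_growth ** 3, math.prod(factors)))  # True
```

The last line is the reason the geometric mean suits growth rates: compounding the averaged factor gives back the same total growth.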
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. Thus the harmonic
mean of 2, 5, and 8 is:
1/((1/3)((1/2)+(1/5)+(1/8)))=3.64
The canonical problem in which the harmonic mean is employed is this: if a car
drives 50 miles at 30 mph, 50 miles at 50 mph, and 50 miles at 60 mph, what is
its average speed? The total distance is 150 miles and the total time is
(50/30)+(50/50)+(50/60)=3.5 hours; hence the average speed is 150/3.5=42.86 mph.
Because the three legs are of equal length, this can be concisely calculated as
the harmonic mean of the three speeds:
1/((1/3)((1/30)+(1/50)+(1/60)))=42.86
The harmonic mean is always less than or equal to the geometric mean.
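The harmonic mean and the ordering of the three means can be checked numerically. This is a sketch under the assumption that all values are positive; harmonic_mean and the variable names are made up for the example.

```python
import math

def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals (positive values only)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    return len(values) / sum(1 / v for v in values)

data = [2, 5, 8]
hm = harmonic_mean(data)
gm = math.exp(sum(math.log(v) for v in data) / len(data))
am = sum(data) / len(data)
print(f"{hm:.2f} <= {gm:.2f} <= {am:.2f}")  # 3.64 <= 4.31 <= 5.00

# Car problem: three legs of 50 miles each at different speeds.
speeds_mph = [30, 50, 60]
leg_miles = 50
total_hours = sum(leg_miles / s for s in speeds_mph)
direct_speed = leg_miles * len(speeds_mph) / total_hours
hm_speed = harmonic_mean(speeds_mph)
print(f"{direct_speed:.2f} {hm_speed:.2f}")  # 42.86 42.86
```

The shortcut matches the distance-over-time calculation only because each leg covers the same distance; with unequal legs, the total distance divided by the total time is the safe route.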
It is now appropriate to distinguish between the two types of quantitative
data, interval and ratio. Interval data is data which can be represented with
real numbers; examples are temperature, altitude, and weight. Ratio data is data which
is identified with positive real numbers; examples are height and weight (temperatures
and altitudes can be positive or negative, heights and weights cannot be
negative). Ratio data is interval data, but the converse does not hold. The
term ratio refers to the fact that you can characterize a stone as being twice
as heavy as another, but you cannot refer to a day as being twice as hot as
another.
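One way to see the difference is to change units and watch what happens to a ratio. In the sketch below the temperatures (80 and 40 degrees Fahrenheit) and weights (10 and 5 pounds) are hypothetical values picked for the illustration, as are all the names.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

LB_TO_KG = 0.45359237  # one pound in kilograms

# Hypothetical temperatures: the ratio depends on the scale chosen.
hot_f, cool_f = 80.0, 40.0
ratio_f = hot_f / cool_f
ratio_c = fahrenheit_to_celsius(hot_f) / fahrenheit_to_celsius(cool_f)
print(f"{ratio_f:.1f} {ratio_c:.1f}")  # 2.0 6.0

# Hypothetical weights: the ratio survives a change of units.
heavy_lb, light_lb = 10.0, 5.0
ratio_lb = heavy_lb / light_lb
ratio_kg = (heavy_lb * LB_TO_KG) / (light_lb * LB_TO_KG)
print(f"{ratio_lb:.1f} {ratio_kg:.1f}")  # 2.0 2.0
```

"Twice as hot" in Fahrenheit becomes "six times as hot" in Celsius, so the statement has no fixed meaning, while "twice as heavy" holds in any unit of weight.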
A motivation for the coefficient of variation is the question of whether
there is more variation in weight among men than among mice. Since the heaviest mouse
weighs less than the standard deviation of the weights of men, the standard
deviation of the weights of mice must be less. But the question can be
posed again in terms of relative variation, which is what the coefficient of
variation measures: the coefficient of variation is the ratio of the standard deviation
to the mean. For one of my classes, the mean weight was 152 pounds, with a
standard deviation of 31 pounds; the mean height was 69.3 inches, with a standard
deviation of 3.86 inches. Hence the coefficients of variation were 31/152=.20 and
3.86/69.3=.056 for weight and height respectively. Note that the
coefficient of variation is independent of the units in which the data were measured
(pounds or kilograms or stones; inches or feet or metres).
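The class figures and the claim about units can both be checked in a few lines. This is a sketch only; coefficient_of_variation is a name invented for the example, and it takes the summary statistics quoted above because the raw class data are not given.

```python
import math

def coefficient_of_variation(std_dev, mean):
    """Ratio of the standard deviation to the mean (mean must be nonzero)."""
    if mean == 0:
        raise ValueError("mean must be nonzero")
    return std_dev / mean

cv_weight = coefficient_of_variation(31, 152)     # weights in pounds
cv_height = coefficient_of_variation(3.86, 69.3)  # heights in inches
print(f"{cv_weight:.2f} {cv_height:.3f}")  # 0.20 0.056

# Converting pounds to kilograms rescales the mean and the standard
# deviation by the same factor, so the ratio is unchanged.
LB_TO_KG = 0.45359237
cv_weight_kg = coefficient_of_variation(31 * LB_TO_KG, 152 * LB_TO_KG)
print(math.isclose(cv_weight, cv_weight_kg))  # True
```

The unit independence relies on the conversion being a pure rescaling; it would not hold for a conversion that also shifts the zero point, such as Fahrenheit to Celsius.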