Interrelations of summary statistics
The median is the second quartile, hence it is natural to use the median and
the interquartile range together. We have also seen the three quartiles used
with the maximum and minimum in the five-number summary. The mean is used in
the definition of the standard deviation, hence the mean and standard deviation
are often used together. The rule of thumb known as the empirical rule, which
applies to roughly bell-shaped data, employs the mean and standard deviation:
about 68% (roughly 2/3) of the data lie within one standard deviation of the
mean, about 95% lie within two standard deviations of the mean, and about 99.7%
lie within three standard deviations of the mean. The midrange may be used with
the range, in which case the maximum and minimum can be calculated. There are
some basic properties of these statistics one should know.
- minimum <= Q1 <= Q2 <= Q3 <= maximum
Note that all the inequalities are weak (<= rather than <); for example, every
one of them would be an equality if all the data had the same value.
- minimum <= midrange <= maximum
- minimum <= mean <= maximum
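The percentages in the empirical rule are the areas under a normal (bell-shaped) curve within 1, 2, and 3 standard deviations of its mean. The Python sketch below, which assumes Python 3.8 or later for `statistics.NormalDist`, computes those areas. It also defines a hypothetical helper, `percent_within`, that counts how much of an actual data set falls within k sample standard deviations of its mean. Both function names are illustrative only.

```python
from statistics import NormalDist, mean, stdev

def empirical_rule_percentages():
    """Percent of a normal distribution within 1, 2, and 3 standard deviations of the mean."""
    z = NormalDist()  # standard normal curve: mean 0, standard deviation 1
    return {k: 100 * (z.cdf(k) - z.cdf(-k)) for k in (1, 2, 3)}

def percent_within(data, k):
    """Percent of the data values within k sample standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    inside = [x for x in data if abs(x - m) <= k * s]
    return 100 * len(inside) / len(data)

for k, pct in empirical_rule_percentages().items():
    print(f"within {k} standard deviation(s): {pct:.1f}%")
print(percent_within([1, 3, 4, 5, 9], 1))
```

For a small or skewed data set such as {1, 3, 4, 5, 9}, `percent_within` can differ noticeably from 68%, 95%, and 99.7%. Here it gives 60% within one standard deviation, which is why the rule is only a rule of thumb for bell-shaped data.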
A data set is symmetric if it is a mirror image about its middle. An example
of a symmetric data set is {1, 2, 4, 5, 6, 8, 9}. If a data set is symmetric,
its mean equals its median equals its midrange; for {1, 2, 4, 5, 6, 8, 9} all
three are 5. If there are more extreme individuals on one side of the middle
than the other, a data set is called skewed in that direction. For example, the
data set {1, 3, 4, 5, 9} is skewed to the right. The midrange depends only on
the maximum and minimum, the mean is calculated using all the data, and the
median is not affected by the values of the maximum and minimum. Consequently,
if a data set is skewed to the right, the midrange will generally be larger than
the mean, which in turn will generally be larger than the median. (Many
introductory texts define a data set as skewed to the right when its mean is
greater than its median. Although this is not always consistent with the
standard definition of skewness, you may use it as the definition of skewed to
the right.) For the data set {1, 3, 4, 5, 9} the midrange is 5, the mean is
4.4, and the median is 4. If a data set is skewed to the left, the inequalities
will be reversed.
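The comparison just described can be sketched in a few lines of Python. The sketch computes the midrange, mean, and median, and then applies the "mean greater than median" convention. The function names are hypothetical. A result of "neither" only means that the mean and median agree; it does not prove the data are symmetric.

```python
from statistics import mean, median

def centers(data):
    """Return (midrange, mean, median) for a list of numbers."""
    midrange = (min(data) + max(data)) / 2
    return midrange, mean(data), median(data)

def skew_by_mean_and_median(data):
    """Classify skewness with the introductory-text rule: compare mean to median."""
    _, m, med = centers(data)
    if m > med:
        return "skewed to the right"
    if m < med:
        return "skewed to the left"
    return "neither (mean equals median)"

print(centers([1, 3, 4, 5, 9]))                  # midrange, mean, median
print(skew_by_mean_and_median([1, 3, 4, 5, 9]))
print(centers([1, 2, 4, 5, 6, 8, 9]))            # the symmetric example
```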
Skewness is manifested in stem-and-leaf plots, histograms, and box-and-whisker plots. In a histogram, the data trail off slowly in the direction of skewness but end more abruptly in the other direction; this produces a tail in the direction of skewness. Stem-and-leaf plots are essentially histograms, so a similar tail can be seen in them. In a box-and-whisker plot, the whisker is generally longer in the direction of skewness. Many non-negative data sets are skewed to the right because they cannot have a tail extending into negative values; the weights of students are an example of data skewed to the right.
Members of a crew
provide an example of a distribution skewed to the left.
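The box-and-whisker observation can be checked numerically. This Python sketch makes two assumptions. First, quartiles are found by the median-of-halves rule, which leaves out the middle value when the count is odd. Second, the whiskers run all the way to the minimum and maximum. Other texts and software use different quartile rules and plot outliers separately, which can change the numbers. The function names are illustrative.

```python
from statistics import median

def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum) using the median-of-halves rule.

    Requires at least two values. With an odd count, the middle value is
    excluded from both halves (an assumption; quartile rules vary).
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def whisker_lengths(data):
    """(lower, upper) whisker lengths when whiskers extend to the min and max."""
    lo, q1, _, q3, hi = five_number_summary(data)
    return q1 - lo, hi - q3

print(five_number_summary([1, 3, 4, 5, 9]))
print(whisker_lengths([1, 3, 4, 5, 9]))  # the longer upper whisker points to the right
```

For {1, 3, 4, 5, 9} this gives Q1 = 2 and Q3 = 7. The lower whisker therefore has length 1 and the upper whisker length 2, which is consistent with skew to the right.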
It is often of interest whether the data distribution trails off in both directions from a single high point, in which case it is called unimodal, or has two high points with a valley in between, in which case it is called bimodal. One text required both high points to be the same height for bimodality, but that is not generally required. Distributions with more than two high points, called multimodal, can also occur. Bimodality often occurs when two distinct populations are combined, such as the heights of men and women.
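As a rough illustration of counting high points, the Python sketch below takes the bar heights of a histogram, listed left to right, and counts its local peaks. That input format is a hypothetical choice for the example. The method is deliberately naive: a single small bump counts as a peak, and the count depends on the bin width chosen for the histogram. Its answer should therefore be taken as a starting point rather than a verdict.

```python
def count_peaks(bar_heights):
    """Count the high points in a histogram given its bar heights, left to right.

    Neighboring bars of equal height are merged first, so a flat-topped
    peak counts once. Every strict local high point counts, however small.
    """
    merged = [h for i, h in enumerate(bar_heights) if i == 0 or h != bar_heights[i - 1]]
    peaks = 0
    for i, h in enumerate(merged):
        left = merged[i - 1] if i > 0 else float("-inf")
        right = merged[i + 1] if i + 1 < len(merged) else float("-inf")
        if h > left and h > right:
            peaks += 1
    return peaks

def describe_modality(bar_heights):
    """Label a histogram by its number of high points."""
    n = count_peaks(bar_heights)
    if n == 1:
        return "unimodal"
    if n == 2:
        return "bimodal"
    return "multimodal" if n > 2 else "no bars"

print(describe_modality([1, 3, 6, 3, 1]))     # one high point
print(describe_modality([2, 6, 3, 2, 5, 1]))  # two high points with a valley between
```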
When using percentiles or z-scores, remember that "average" means the median (the 50th percentile) when percentiles are used, but the mean (a z-score of 0) when z-scores are used.
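This Python sketch shows why "average" shifts meaning by computing z-scores relative to the mean. It assumes the sample standard deviation (`statistics.stdev`). Using the population version (`statistics.pstdev`) would change the scale of the z-scores, but the mean would still have a z-score of 0. The median sits at the 50th percentile, but in a skewed data set its z-score need not be 0.

```python
from statistics import mean, median, stdev

def z_score(x, data):
    """How many sample standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean(data)) / stdev(data)

data = [1, 3, 4, 5, 9]
print(z_score(mean(data), data))    # the mean always has z-score 0
print(z_score(median(data), data))  # the median generally does not; here it is negative
```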
- The maximum is equal to the midrange plus half the range.
- The minimum is equal to the midrange minus half the range.
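Both facts follow from the definitions midrange = (maximum + minimum)/2 and range = maximum - minimum. Adding half the range to the midrange gives (maximum + minimum)/2 + (maximum - minimum)/2 = maximum, and subtracting half the range gives the minimum. Here is a tiny Python sketch of the calculation, using a hypothetical function name:

```python
def extremes_from_midrange(midrange, data_range):
    """Recover (minimum, maximum) from the midrange and the range."""
    half_range = data_range / 2
    return midrange - half_range, midrange + half_range

# {1, 3, 4, 5, 9}: midrange (1 + 9) / 2 = 5, range 9 - 1 = 8
print(extremes_from_midrange(5, 8))
```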
It is important to recognize the different roles of measures of location (mean, median, minimum, etc.) and measures of spread (range, standard deviation, etc.). If a constant is added to all the data values, the location changes by that constant, but the spread does not change. If all the data values are multiplied by a positive constant, both the location and the spread are multiplied by that constant (for a negative constant, the spread is multiplied by its absolute value). Caveat: the variance is multiplied by the square of that constant, because the variance measures squared distances from the mean.
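The following Python sketch applies a shift and a positive rescaling to the same small data set and compares a few measures of location and spread. It uses the sample standard deviation and variance from the `statistics` module. The constants 10 and 3 are arbitrary choices for illustration.

```python
from statistics import mean, median, stdev, variance

def summaries(data):
    """A few measures of location (mean, median) and spread (range, stdev, variance)."""
    return {
        "mean": mean(data),
        "median": median(data),
        "range": max(data) - min(data),
        "stdev": stdev(data),
        "variance": variance(data),
    }

data = [1, 3, 4, 5, 9]
shifted = [x + 10 for x in data]  # add 10: location moves, spread does not
scaled = [3 * x for x in data]    # multiply by 3: location and spread x3, variance x9

for label, values in [("original", data), ("plus 10", shifted), ("times 3", scaled)]:
    print(label, summaries(values))
```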
Applets:
An interactive
histogram illustrates how the summary statistics are related to a
histogram.
Histogram
explorer provides another way to shape a histogram and look at the summary
statistics.
Competencies:
Is the data set {2, 5, 9, 4, 6, 7, 6, 8, 8} symmetric, skewed to the right, or
skewed to the left?