[If some values are taken multiple times, calculations may be simplified by noting that (2+1+3+5+2+3+3)/7 = (1+2×2+3×3+5)/(1+2+3+1).]
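The identity above can be checked with a short Python sketch. The variable names are illustrative only; exact fractions are used so that the two ways of computing the mean can be compared without rounding.

```python
from collections import Counter
from fractions import Fraction

data = [2, 1, 3, 5, 2, 3, 3]

# Mean computed directly from the raw list.
plain_mean = Fraction(sum(data), len(data))

# Mean computed from (value, frequency) pairs: each distinct value is
# multiplied by the number of times it occurs.
freq = Counter(data)
weighted_mean = Fraction(
    sum(value * count for value, count in freq.items()),
    sum(freq.values()),
)

print(plain_mean, weighted_mean)  # 19/7 19/7
```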
Information in histograms
When data is displayed in a histogram, the exact values of the data are lost.
It is reasonable to ask how much we know about the original data. For the
histogram which we drew from the weights of students,
10_|                           _______
   |                          |       |
   |                   _______|       |
   |           _______|       |       |
   |          |       |       |       |
 5_|          |       |       |       |
   |          |       |       |       |
   |   _______|       |       |       |
   |  |       |       |       |       |
   |  |       |       |       |       |_______ _______
 __|__|_______|_______|_______|_______|_______|_______|____
          |       |       |       |       |       |
         100     125     150     175     200     225
               weights of students in pounds
          Weights of Students in Statistics Course
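we shall ask four questions: what are the least (and greatest) possible
values of the mean, what are the least (and greatest) possible values of the
median, what is the "best" estimate for the mean, and what is the "best"
estimate for the median?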
The above histogram specifies that three data are in the range (87.5,112.5),
seven are in the range (112.5, 137.5), eight are in the range (137.5, 162.5),
ten are in the range (162.5, 187.5), one is in the range (187.5, 212.5), and
one is in the range (212.5, 237.5). If the data were as small as possible
consistent with these constraints (and assuming integer values), three data
would be equal to 88, seven data would be equal to 113, eight data would be
equal to 138, ten data would be equal to 163, one datum would be equal to 188,
and one datum would be equal to 213. These values yield the mean 139.67,
which is the least possible mean consistent with the above histogram. (The
greatest possible mean can be calculated in a similar manner.)
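The bounds on the mean can be reproduced with a short Python sketch. The names `counts`, `boundaries`, `smallest` and `largest` are illustrative, and the sketch assumes integer weights as in the text, so the extreme values in each class lie 0.5 inside the class boundaries.

```python
# Class frequencies read from the histogram, left to right.
counts = [3, 7, 8, 10, 1, 1]

# Class boundaries 87.5, 112.5, ..., 237.5 (each class has width 25).
boundaries = [87.5 + 25 * i for i in range(7)]

# Assumption: weights are integers, so the least value in a class is its
# lower boundary plus 0.5 and the greatest is its upper boundary minus 0.5.
smallest = [b + 0.5 for b in boundaries[:-1]]  # 88, 113, 138, 163, 188, 213
largest = [b - 0.5 for b in boundaries[1:]]    # 112, 137, 162, 187, 212, 237

n = sum(counts)
least_mean = sum(c * v for c, v in zip(counts, smallest)) / n
greatest_mean = sum(c * v for c, v in zip(counts, largest)) / n

print(round(least_mean, 2), round(greatest_mean, 2))  # 139.67 163.67
```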
There are 30 data represented in the above histogram, so the median is halfway
between the values of the 15th and 16th in rank order. Three of the data are
in the first class, seven of the data are in the second class, and eight of
the data are in the third class; in particular the 15th and 16th data must
both be in the third class. Therefore the 15th and 16th data are at least
138, and the median must be at least 138. (Similarly, the median is at most
162.)
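The same reasoning can be sketched in Python. The helper `class_of_rank` is a hypothetical name introduced here for illustration; the sketch assumes integer weights and an even number of data, so that the median is the average of the two middle ranks.

```python
counts = [3, 7, 8, 10, 1, 1]
smallest = [88, 113, 138, 163, 188, 213]  # least integer in each class
largest = [112, 137, 162, 187, 212, 237]  # greatest integer in each class


def class_of_rank(rank, counts):
    """Return the index of the class holding the datum of the given 1-based rank."""
    cumulative = 0
    for index, count in enumerate(counts):
        cumulative += count
        if rank <= cumulative:
            return index
    raise ValueError("rank exceeds the number of data")


n = sum(counts)  # 30, so the median averages ranks 15 and 16
i = class_of_rank(n // 2, counts)
j = class_of_rank(n // 2 + 1, counts)

# Both middle ranks fall in the third class (index 2), which fixes the bounds.
median_lower = (smallest[i] + smallest[j]) / 2
median_upper = (largest[i] + largest[j]) / 2

print(median_lower, median_upper)  # 138.0 162.0
```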
For the "best" estimate, we shall assume that the data are uniformly spread
within each class. (There are other ways to define "best", which will produce
different results.) For purposes of calculating the mean, this is equivalent
to putting all the individuals in each class at the class mark. Hence the "best"
estimate for the mean is obtained by assuming three individuals have weight 100,
seven individuals have weight 125, eight individuals have weight 150, ten
individuals have weight 175, one individual has weight 200, and one individual
has weight 225. This provides a mean of 151.67.
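A small Python sketch can illustrate why the two viewpoints give the same mean. It computes the mean once with everyone placed at the class mark, and once with the individuals of each class spaced evenly across the class interval. The even spacing is only one illustrative way to model "uniformly spread", and all names are assumptions made for this sketch.

```python
counts = [3, 7, 8, 10, 1, 1]
class_marks = [100, 125, 150, 175, 200, 225]
width = 25
n = sum(counts)

# Estimate 1: every individual in a class sits at the class mark.
mean_at_marks = sum(c * m for c, m in zip(counts, class_marks)) / n

# Estimate 2: the individuals of each class are spaced evenly across the
# class interval (symmetric about the class mark).
spread = []
for c, m in zip(counts, class_marks):
    left = m - width / 2
    step = width / c
    spread.extend(left + (k + 0.5) * step for k in range(c))
mean_spread = sum(spread) / n

print(round(mean_at_marks, 2), round(mean_spread, 2))  # 151.67 151.67
```

Because each class is filled symmetrically about its class mark, the class totals agree and so do the two means.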
What is the "best" estimate for the median of the data which produced this
histogram?
A histogram is constructed so that area is proportional to the number of
individuals. Seeking the value with half the individuals above it and half the
individuals below it is therefore the same as seeking the value where the area
under the histogram to the right of it equals the area under the histogram to
the left of it. Each of the rectangles in the histogram has a base of 25, and the
heights are 3, 7, 8, 10, 1, and 1. Thus the total area of the histogram is
3×25+7×25+8×25+10×25+1×25+1×25=750. We want the value with area 750/2=375 to
the right of it and 375 to the left of it. The area of the first rectangle is
75, which is less than 375; the area of the first two rectangles is 75+175=250,
which is still less than 375; the area of the first three rectangles is
75+175+200=450, which is greater than 375. Therefore we need an area of
375-(75+175)=125 from the third rectangle to reach the middle. Since the height
of the third rectangle is 8, we must use 125/8=15.625 of its base to get the
requisite area. The third rectangle begins at 137.5; upon adding 15.625 to that
we get 153.125, which is the "best" estimate for the median.
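The area argument can be written as a short Python sketch. The function name `interpolated_median` and its parameters are hypothetical, introduced only for illustration; the sketch assumes equal class widths and data spread uniformly within each class.

```python
def interpolated_median(counts, left_edge, width):
    """Find the value that splits the histogram's area into two equal halves.

    Assumes classes of equal width starting at left_edge, with the data
    spread uniformly within each class.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram has no data")
    half_area = total * width / 2
    area_so_far = 0.0
    for index, height in enumerate(counts):
        bar_area = height * width
        if height > 0 and area_so_far + bar_area >= half_area:
            # Area still needed from this rectangle, converted to a length
            # along its base by dividing by the rectangle's height.
            needed = half_area - area_so_far
            return left_edge + index * width + needed / height
        area_so_far += bar_area
    raise ValueError("half of the area was never reached")


counts = [3, 7, 8, 10, 1, 1]
print(interpolated_median(counts, left_edge=87.5, width=25))  # 153.125
```

For the histogram above the loop stops at the third rectangle with 125 units of area still needed, which matches the hand calculation.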
Competencies: Give upper and lower bounds, and the best estimates, for
the mean and median of the data represented in the following histogram:
10_|
   |
   |
   |           _______
   |          |       |_______
 5_|          |       |       |
   |          |       |       |_______
   |          |       |       |       |
   |   _______|       |       |       |
   |  |       |       |       |       |
 __|__|_______|_______|_______|_______|
          |       |       |       |
         125     150     175     200
     weights of students in pounds
 Weights of Students in Mathematics Course
Reflection: What can you say about the maximum, minimum, Q1, and Q3
weights?
Challenge: What can you say about the variance and standard deviation of
the weights?