1. The short answer is communication of information. Although some information may be lost by grouping data, the viewer can easily comprehend where the data lies. All the raw data, especially in tabular form, would be too much to comprehend.
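As a small illustration of that trade-off, here is a minimal Python sketch that groups raw values into classes of equal width and counts how many fall in each. The ages and the class width of 10 are hypothetical, chosen only for illustration. The resulting table shows at a glance where the data lies, while the individual values inside each class are lost.

```python
from collections import Counter

WIDTH = 10  # hypothetical class width

def group_into_classes(data, width):
    """Count how many values fall into each class [lower, lower + width)."""
    counts = Counter((x // width) * width for x in data)
    return dict(sorted(counts.items()))

# Hypothetical raw data (e.g., ages of 13 people).
ages = [23, 27, 31, 35, 35, 38, 41, 44, 44, 47, 52, 58, 61]

table = group_into_classes(ages, WIDTH)
for lower, count in table.items():
    print(f"{lower}-{lower + WIDTH - 1}: {count}")
```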
2. It would be appropriate if there were so many classes that you could not easily comprehend the data. Examples might be grouping models of cars by manufacturer (Ford, Chrysler, GM, etc.) or by style (sedan, station wagon, SUV, etc.). Another circumstance for grouping would be classifying the desserts people chose as fruit, cake, pie, ice cream, or pudding. As the car example shows, it is important that the grouping not lose the information you are interested in: grouping by manufacturer discards the style, and grouping by style discards the manufacturer.
3. The midrange is easiest to calculate if you have all the data, but it reflects only the two extreme individuals, hence does not indicate where most of the data lies; it may help if you want to display the data graphically (it gives the center of your axis). The median is the best indicator of the typical datum, giving the value at the middle position; it requires you to put the data in rank order, which may be more difficult than calculating the mean or midrange (but is easier if you do not have a calculator). The mean is the most common average, so it is often what people expect, and it works well with mathematical manipulations. The mean is appropriate if you want to be able to recover the total (e.g., the average weight of persons in an elevator or of trucks on a bridge), and it is easy to calculate if the total is already known. However, it is affected by all the data, including extreme values, hence may not represent the 'average' individual.
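To see these differences side by side, here is a short Python sketch using only the standard-library statistics module. The data set is hypothetical, with a single extreme value (40) added on purpose: the midrange is dragged far from the bulk of the data, the mean is pulled up somewhat, and the median stays at the typical value. Multiplying the mean by the number of values recovers the total.

```python
import statistics

# Hypothetical data with one extreme value (40).
data = [1, 2, 2, 3, 3, 3, 4, 40]

midrange = (min(data) + max(data)) / 2  # uses only the two extremes
median = statistics.median(data)        # middle position after sorting
mean = statistics.mean(data)            # uses every value

print(midrange, median, mean)           # 20.5 3.0 7.25

# The mean recovers the total: mean * n equals the sum of the data.
total = mean * len(data)
print(total)                            # 58.0
```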
4. Percentiles essentially provide a rank order, while z-scores give the distance from the mean in units of the standard deviation (a measure of typical distance from the mean). An advantage of percentiles is that the man on the street can understand them, whereas a z-score does not mean anything to most people. The 50th percentile is the median, hence percentiles give position relative to the median, while z-scores are based on distance from the mean. Percentiles are useful cutoffs if you want to fill quotas (e.g., admitting the top 10% of each high school class: if you know the class sizes, you know how many people are involved). z-scores are useful if you want to identify whether the best individual is truly outstanding, that is, much better than the rest. If the best person in a class had a z-score of 3, that person would be truly outstanding, while a z-score of 2 would suggest there are others of similar quality in the class.
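The following Python sketch contrasts the two measures for the best student in two hypothetical classes that differ only in their top score. Percentile rank is computed here as the percentage of scores strictly below a value, which is one of several conventions in use; the z-score uses the population standard deviation. Both conventions are assumptions for illustration.

```python
import statistics

def percentile_rank(scores, x):
    """Percentage of scores strictly below x (one of several conventions)."""
    return 100 * sum(s < x for s in scores) / len(scores)

def z_score(scores, x):
    """Distance of x from the mean in units of the population standard deviation."""
    return (x - statistics.mean(scores)) / statistics.pstdev(scores)

# Two hypothetical classes that differ only in their best score.
class_a = [60, 62, 65, 68, 70, 71, 73, 75, 78, 99]
class_b = [60, 62, 65, 68, 70, 71, 73, 75, 78, 80]

for scores in (class_a, class_b):
    best = max(scores)
    print(percentile_rank(scores, best), round(z_score(scores, best), 2))
```

Both top students sit at the same percentile rank (90), but only the best student in class A has a z-score above 2 (roughly 2.6 versus 1.6), which is the kind of distinction a percentile cannot make.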
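5. Consider the data set {1, 3, 3, 3, 3, 3, 3, 5}. The mean is 3 and the standard deviation (dividing by n) is 1, so a quarter of the data (the 1 and the 5) lies 2 standard deviations from the mean, exactly the largest fraction Tchebychev's theorem allows at that distance (1/k² = 1/4 for k = 2). This illustrates how Tchebychev's theorem is a worst-case scenario: in order for some data to be far from the mean, the other data must be near the mean (and not contribute to the average distance).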
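The arithmetic for this example can be checked with a few lines of Python. The sketch computes the mean and the population standard deviation (dividing by n, as in the text), then the fraction of the data lying at least k = 2 standard deviations from the mean, and compares it with the Tchebychev bound 1/k².

```python
import statistics

data = [1, 3, 3, 3, 3, 3, 3, 5]
mean = statistics.mean(data)
sd = statistics.pstdev(data)  # population standard deviation: divides by n

k = 2
# Fraction of the data at least k standard deviations from the mean.
far = sum(abs(x - mean) >= k * sd for x in data) / len(data)
bound = 1 / k**2              # Tchebychev's upper bound on that fraction

print(mean, sd, far, bound)
```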
6. An experiment is something which generates an outcome; the text further specifies that the outcome cannot be predicted with certainty. (I do not want to get into a philosophical discussion about whether this is an appropriate distinction.) If you live in a dorm, entering the bathroom and seeing whether a shower is free is an experiment. So is seeing who (perhaps nobody) sits down next to you in class or at lunch, looking out the window (and seeing rain or sun or snow or something else), or brushing your teeth and counting how many fall out.
7. The colors of the blouses worn by the girls on your left and on your right, assuming they do not live together. (In some cultures, e.g., Thailand, the color of clothing is coordinated with the day of the week, but this is not the practice in America.) This is a case where independent means the decisions are made independently. The colors of two cars in a crash, assuming people do not get into accidents because they are thinking about the color of the other car. The ages of partners in a bridge game, which should be independent if partners are randomly assigned but would not be independent if spouses were partners.
8. Each trial of a binomial experiment has only two possible outcomes, hence a poll would be a binomial experiment if there were only two possible answers, or if you grouped the responses into two categories (e.g., {strongly agree, agree, indifferent} vs. {disagree, strongly disagree}). You could also mention that a poll is generally a hypergeometric rather than a binomial experiment, because it samples without replacement.
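To see how much sampling without replacement matters, the Python sketch below compares the hypergeometric and binomial probabilities for a small poll. The population size (1000), the number who agree (600), and the sample size (10) are hypothetical; the formulas are the standard binomial and hypergeometric probability functions, written with math.comb (Python 3.8+).

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) when each of n independent trials succeeds with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeometric_pmf(k, n, K, N):
    """P(X = k) when n people are drawn without replacement from N, of whom K agree."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Hypothetical poll: 10 people sampled from a population of 1000, 600 of whom agree.
N, K, n = 1000, 600, 10
for k in range(n + 1):
    print(k, round(hypergeometric_pmf(k, n, K, N), 4), round(binomial_pmf(k, n, K / N), 4))
```

When the sample is small relative to the population, as here, the two columns agree closely, which is why the binomial model is usually an acceptable stand-in for a poll.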
9. The probability of being less than -3.9 is 0.0000 to four decimal places (and the probability of being less than 3.9 is 1.0000 to four decimal places). Since four decimal places is usually sufficient accuracy, the table beyond 3.9 is not of interest: we simply use 0 for the area below z-scores less than -3.9 and 1 for the area below z-scores greater than 3.9.
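A quick check with Python's standard-library statistics.NormalDist (Python 3.8+) confirms these four-decimal values. The z-score -3.5 is included only for contrast, to show that the rounded area is not yet zero there.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1
for z in (-3.9, -3.5, 3.9):
    # Rounded table value next to the unrounded cumulative probability.
    print(z, round(Z.cdf(z), 4), Z.cdf(z))
```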
10. The short answer, and perhaps the only answer, is that exact binomial calculations may be tedious and impractical. The error from using the approximation may well be smaller than the precision to which the answer is recorded.
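The Python sketch below compares an exact binomial probability with its normal approximation. The example (at most 55 heads in 100 tosses of a fair coin) is hypothetical, and the approximation uses the common continuity correction of 0.5, which is an assumption not mentioned in the text. The exact value needs a sum of 56 terms, while the approximation needs one lookup; here the two agree in the first three decimal places.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k): a sum of k + 1 binomial terms."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with a continuity correction of 0.5."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

# Hypothetical example: 100 tosses of a fair coin, at most 55 heads.
exact = binomial_cdf(55, 100, 0.5)
approx = normal_approx_cdf(55, 100, 0.5)
print(round(exact, 4), round(approx, 4))
```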
11. You reject the null hypothesis (the observed value is significant) when the observed value is far from the hypothesized value, so that the tail area (the P-value) is small. If an observation is significant at the 5% level, then it certainly lies in the larger 10% tail, so it is also significant at the 10% level. It may or may not be far enough from the hypothesized value to lie in the smaller 1% tail (that is, to be significant at the 1% level).
12. You reject the null hypothesis when the observed value is far from the hypothesized value; this can be interpreted as the hypothesized value lying outside the confidence interval centered at the observed value. In particular, if you are performing a two-tailed test of hypothesis at the 5% significance level, you will reject the null hypothesis if the hypothesized value does not lie in the 95% confidence interval.
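The correspondence can be checked numerically. The Python sketch below assumes the simplest setting, a z-test for a mean with known standard deviation, where the test and the interval use the same standard error and the match is exact (apart from values falling exactly on the boundary). For proportions the test and the interval usually use slightly different standard errors, so the match there is only approximate. The sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Return True if H0: mu = mu0 is rejected (two-tailed P-value below alpha)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def confidence_interval(xbar, sigma, n, level=0.95):
    """Confidence interval for mu, centered at the observed mean xbar."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z_crit * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical sample: mean 52 from n = 100 observations, known sigma = 10.
xbar, sigma, n = 52, 10, 100
low, high = confidence_interval(xbar, sigma, n)
for mu0 in (49, 50, 50.5, 54):
    outside = not (low <= mu0 <= high)
    print(mu0, two_tailed_z_test(xbar, mu0, sigma, n), outside)
```

For each hypothesized value, "reject" and "outside the 95% interval" come out the same.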
13. You reject the null hypothesis if the P-value is small or the z-score is large in absolute value (for a one-tailed test you need to pay attention to the sign of the z-score). P-values tell you directly how rare a value at least as extreme as the observed one would be if the null hypothesis were true; most people will not know what a z-score means without consulting a table (which is how the P-value was obtained). It merits mention that whether the test is one- or two-tailed is incorporated into the calculation of the P-value, while for z-scores it is incorporated into the critical value: the observed z-score is the same either way.
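The Python sketch below shows both routes for a hypothetical observed z-score of 2.1: the P-value changes with the number of tails, while for the z-score approach the observed value stays fixed and the critical value changes (about 1.645 for a one-tailed and 1.96 for a two-tailed test at the 5% level). It also checks the two-tailed result against the 10%, 5%, and 1% levels.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def p_values(z):
    """Upper one-tailed and two-tailed P-values for an observed z-score."""
    one_tailed = 1 - Z.cdf(z)
    two_tailed = 2 * (1 - Z.cdf(abs(z)))
    return one_tailed, two_tailed

def critical_z(alpha, tails):
    """Critical z-score: alpha goes in one tail, or is split between two."""
    return Z.inv_cdf(1 - alpha / tails)

z_obs = 2.1  # hypothetical observed z-score
one, two = p_values(z_obs)
print(round(one, 4), round(two, 4))
for alpha in (0.10, 0.05, 0.01):
    print(alpha, two < alpha, round(critical_z(alpha, 2), 3))
```

Here the result is significant at the 10% and 5% levels but not at the 1% level, consistent with the nesting of significance levels described in item 11.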
14. This is the justification for the normal approximation to the binomial distribution. The mean (np for the number of successes, or p for the proportion of successes) and standard deviation (sqrt(np(1 - p)), or sqrt(p(1 - p)/n) for the proportion) of the binomial distribution are easy to calculate, from which z-scores for use with the normal distribution are readily calculated.
15. Both tails of the normal distribution correspond to the right tail of the chi-square distribution: in either case the observations are far from the hypothesized value, and you reject the null hypothesis. (With one degree of freedom, the chi-square statistic is the square of a standard normal z-score, so z-scores of -2 and 2 both give a chi-square value of 4.) There is no notion of a one-tailed test for the chi-square distribution because there is no notion of greater than or less than in the multidimensional context. The left-hand tail of the chi-square distribution corresponds to being near zero in the normal distribution; perhaps the agreement with the null hypothesis is "too good".
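The one-degree-of-freedom case makes the correspondence concrete. The Python sketch below uses hypothetical poll counts (60 vs. 40, and the reverse, out of 100, testing a proportion of 0.5). The chi-square goodness-of-fit statistic equals the square of the normal-approximation z-score, so both signs of z give the same chi-square value. The Python standard library has no chi-square distribution, so the helper chi2_sf_df1 (a hypothetical name) relies on the identity P(chi-square > x) = P(|Z| > sqrt(x)), which holds only for 1 degree of freedom; for other degrees of freedom a library routine such as scipy.stats.chi2.sf would be needed.

```python
from math import sqrt
from statistics import NormalDist

def chi2_sf_df1(x):
    """Right-tail area of the chi-square distribution with 1 degree of freedom.

    Valid only for 1 df: P(chi2 > x) = P(|Z| > sqrt(x)) = 2 * (1 - Phi(sqrt(x))).
    """
    return 2 * (1 - NormalDist().cdf(sqrt(x)))

def chi2_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical poll of n = 100 people, testing H0: proportion = 0.5.
n, p0 = 100, 0.5
expected = [n * p0, n * (1 - p0)]
se = sqrt(n * p0 * (1 - p0))  # standard deviation of the count under H0

for yes in (60, 40):
    chi2 = chi2_stat([yes, n - yes], expected)
    z = (yes - n * p0) / se
    # z flips sign between the two cases; chi2 = z**2 and its right tail do not.
    print(yes, z, chi2, z**2, chi2_sf_df1(chi2))
```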
16. They both have the same sign, which comes from their common numerator SSxy. Their magnitudes differ because their denominators differ: since SSxx and SSyy measure the spread of the data in the x and y directions, the discrepancy between slope and correlation measures the discrepancy between the spreads. If one converts to standard deviations, it is immediate that the slope of the regression line is the correlation multiplied by the ratio of the standard deviation of the y coordinates to the standard deviation of the x coordinates.
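The relationship can be verified numerically. The Python sketch below computes SSxx, SSyy, and SSxy for a hypothetical data set, forms the slope b = SSxy/SSxx and the correlation r = SSxy/sqrt(SSxx SSyy), and checks that b equals r times the ratio of the sample standard deviations.

```python
from math import sqrt
import statistics

def sums_of_squares(xs, ys):
    """Return SSxx, SSyy, SSxy (sums of squared and cross-product deviations)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    ssxx = sum((x - mx) ** 2 for x in xs)
    ssyy = sum((y - my) ** 2 for y in ys)
    ssxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return ssxx, ssyy, ssxy

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

ssxx, ssyy, ssxy = sums_of_squares(xs, ys)
slope = ssxy / ssxx               # regression slope b
r = ssxy / sqrt(ssxx * ssyy)      # correlation coefficient
sx, sy = statistics.stdev(xs), statistics.stdev(ys)

print(slope, r, r * sy / sx)      # slope equals r * sy / sx
```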
17. Correlation merely measures association. There may be causality, and that is often what we are interested in, but a significant correlation merely suggests that we should look for causality; it does not prove causality.
18. When does more mean less? Price of an item versus number sold (a basic tenet of economics, though there are exceptions such as Giffen goods). Outside temperature versus heating oil consumed (in winter). Length of a race versus speed of the participants.