**Econ 57 Spring 1999 Final Examination (150 minutes)**

Answer all 20 questions, leaving tedious calculations undone.

**1.** Three hundred and seventy college students were asked, "What day would you have the most free time in the evening to watch Japanese animation?"

| Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
|---|---|---|---|---|---|---|
| 40 | 41 | 39 | 46 | 65 | 87 | 52 |

**2.** Ninety-nine college students were asked, "Disregarding obvious gender differences, which biological parent do you physically resemble most closely?" Fifty-one students said their mother and 48 said their father. Are these data statistically persuasive evidence against the null hypothesis that half of all students would say their mother?
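The arithmetic behind a test of this kind can be sketched in Python. This is only one possible approach: it assumes a normal approximation to the binomial with no continuity correction, and the helper name `two_sided_p` is illustrative rather than part of the exam.

```python
import math

n = 99          # students surveyed
mothers = 51    # students who said they resemble their mother
p0 = 0.5        # proportion under the null hypothesis


def two_sided_p(z):
    """Two-sided tail probability for a standard normal z value."""
    return math.erfc(abs(z) / math.sqrt(2))


# z statistic: (observed count - expected count) / standard deviation under the null
z = (mothers - n * p0) / math.sqrt(n * p0 * (1 - p0))
p_value = two_sided_p(z)

print(f"z = {z:.2f}, two-sided P value = {p_value:.2f}")
```

An exact binomial calculation would be an equally reasonable choice with a sample this small.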

**3.** A study of college seniors found that economics majors had above-average scores on a test of mathematical ability. Identify the most important reason that we should not conclude that the study of economics improves mathematical ability.

**4.** Without doing any calculations, explain why you believe that an ANOVA F test using these three data sets either would or would not have a P value less than 0.05:

| | Sample 1 | Sample 2 | Sample 3 |
|---|---|---|---|
| Sample size | 30 | 30 | 30 |
| Mean | 71.33 | 84.73 | 81.77 |
| Standard deviation | 13.66 | 6.93 | 10.84 |

**5.** A Pomona College student wanted to see if there is a relationship between the weather and the color of one's clothing. He collected data for 13 days in April on these two variables:

x: weather, ranging from x = 1 (complete cloud cover with precipitation) to x = 6 (clear, bright, sunny day)

y: brightness of clothing worn, ranging from y = 1 (black, navy blue) to y = 6 (white)

a. Explain why you either agree or disagree with this student's decision to let x be weather and y be clothing, rather than the other way around.

b. Do his regression results indicate that students tend to wear brighter or darker clothing on sunny days?

c. Is the relationship statistically significant at the 5 percent level? How can you tell?

d. What problems do you see with this study?

**6.** A researcher wanted to see if U.S. News & World Report rankings influence the number of applications received by colleges and universities. He obtained the unadjusted data shown below on applications to one small liberal arts college and made a scatter diagram with the number of applications on the vertical axis and the U.S. News ranking the previous year on the horizontal axis (with 1 being the best possible rating). If there is a relationship, would you expect it to be positive or negative? In order to take into account the nationwide growth of college applications, this researcher used data from The Consortium on Financing Higher Education (COFE) showing that during this period college applications had increased by an average of 4 percent per year. He consequently adjusted the number of applications to this college by dividing each value by 1.04:

Number of applications:

| Year | Unadjusted | Adjusted |
|---|---|---|
| 1987 | 3192 | 3069.2 |
| 1988 | 2945 | 2831.7 |
| 1989 | 3176 | 3053.8 |
| 1990 | 2869 | 2758.7 |
| 1991 | 2852 | 2742.3 |
| 1992 | 2883 | 2772.1 |
| 1993 | 3037 | 2920.2 |
| 1994 | 3293 | 3166.3 |
| 1995 | 3586 | 3448.1 |

Identify his mistake and explain how you would calculate the adjusted values correctly.
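A growth adjustment that compounds over time can be sketched as follows. This is a minimal illustration, assuming the 4 percent growth applies every year and that 1987 is the base year; the base year and the names `BASE_YEAR` and `adjust` are assumptions, not part of the original study.

```python
# Unadjusted applications, copied from the table in the question
unadjusted = {
    1987: 3192, 1988: 2945, 1989: 3176, 1990: 2869, 1991: 2852,
    1992: 2883, 1993: 3037, 1994: 3293, 1995: 3586,
}

GROWTH = 0.04      # average nationwide growth per year (from the question)
BASE_YEAR = 1987   # assumption: express every year in 1987-equivalent terms


def adjust(year, applications, base_year=BASE_YEAR, growth=GROWTH):
    """Deflate by compounded growth: divide by (1 + growth) ** years elapsed."""
    return applications / (1 + growth) ** (year - base_year)


adjusted = {year: round(adjust(year, apps), 1) for year, apps in unadjusted.items()}

for year, value in adjusted.items():
    print(year, unadjusted[year], value)
```

With this convention the base year is left unchanged and later years are deflated by progressively larger factors, unlike a single division by 1.04 applied to every year.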

**7.** The 794 students in Pomona College's freshman classes of 1987-1988 and 1988-1989 were separated into two groups: 552 who had attended public high schools and 242 who had attended private schools. Of those from public schools, 48 (8.7%) were Pomona Scholars their freshman year (an A- grade point average); of those from private schools, 13 (5.4%) were Pomona Scholars. Does the observed difference seem substantial? The author calculated the z value to be 1.64 and concluded that although "two classes of Pomona freshmen proved the null hypothesis to be true,...the sample size is minuscule in comparison to the population of college and university students." Did she prove the null hypothesis to be true? Are her results invalidated by the small sample size?

**8.** The equation y = a + bx + e, where y = unemployment rate and x = production, was estimated using monthly 1998 data, entering the months in a computer program in order: January, February, March, and so on. How would the estimates be affected if we enter the data backwards: December, November, October, and so on?

**9.** For which of the following tests do we need equal sample sizes?

a. difference-in-means

b. difference-in-proportions

c. ANOVA

d. chi-square

e. regression

**10.** A French hospital put 412 patients who had suffered one heart attack on a traditional Mediterranean diet (including olive oil, fruit, and bread) for two years; the control group consisted of 358 heart-attack patients who were given a recommended low-fat diet. Four of the patients on the Mediterranean diet and 17 of the patients on the low-fat diet suffered a second heart attack during the two years of the study. Is the observed difference substantial and statistically significant at the 1 percent level?
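A difference-in-proportions calculation for data like these can be sketched in Python. The sketch assumes the pooled-proportion form of the z statistic and a normal approximation; the function name `two_proportion_z` is illustrative.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 == p2, using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Second heart attacks: 4 of 412 (Mediterranean diet) vs. 17 of 358 (low-fat diet)
z = two_proportion_z(4, 412, 17, 358)
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability

print(f"rates: {4 / 412:.3f} vs {17 / 358:.3f}")
print(f"z = {z:.2f}, two-sided P value = {p_value:.4f}")
```

Because one group has only four events, the normal approximation is rough here, and an exact test could reasonably be preferred.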

**11.** A dean claims that the salaries of humanities professors Y are determined solely by the number of years of teaching experience X. To see if there is a statistically significant difference in the salaries of male and female humanities professors, the equation Y = a + b_{1}D + b_{2}X + b_{3}DX + e was estimated, using the variable D = 0 if male, 1 if female. Interpret each of the coefficients:

a:

b_{1}:

b_{2}:

b_{3}:

a. What advantages does a regression model have over a difference-in-means test that compares average male and female salaries?

b. Describe a specific situation in which a difference-in-means test could show discrimination against females while a regression equation does not.

**12.** Pepys asked Newton which of the following three events is most likely: (a) at least one 6 when six dice are rolled; (b) at least two 6s when twelve dice are rolled; or (c) at least three 6s when eighteen dice are rolled. Answer Pepys' question.
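The three probabilities can be computed directly from the binomial distribution. The sketch below assumes fair, independent dice; the helper name `prob_at_least` is illustrative.

```python
from math import comb


def prob_at_least(k, n, p=1 / 6):
    """P(at least k successes in n independent trials with success probability p)."""
    # Complement rule: subtract the probabilities of 0, 1, ..., k-1 successes
    return 1 - sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))


events = {"a": (1, 6), "b": (2, 12), "c": (3, 18)}
for label, (k, n) in events.items():
    print(f"({label}) at least {k} sixes in {n} dice: {prob_at_least(k, n):.4f}")
```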

**13.** Researchers counted the number of pages of bankruptcy-lawyer ads in the Yellow Pages of 7 cities that had a large per capita rate of personal bankruptcy filings and in 5 cities that had low rates. There was an average of 9.6 pages in the high-bankruptcy cities and only 3.8 pages in the low-bankruptcy cities, suggesting that bankruptcy-lawyer ads are responsible for the variations in the number of personal bankruptcy filings. Why should we interpret these results cautiously?

**14.** Fifteen percent of U.S. households live below the poverty line. In a third of all U.S. households, a woman is the sole income provider. In 60 percent of poor households, a woman is the sole income provider. Of those households in which a woman is the sole income provider, what fraction are poor?
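The conditional probability asked for follows from Bayes' rule, P(poor | woman) = P(woman | poor) P(poor) / P(woman). A minimal sketch, with variable names chosen here for illustration:

```python
p_poor = 0.15               # P(household is below the poverty line)
p_woman = 1 / 3             # P(a woman is the sole income provider)
p_woman_given_poor = 0.60   # P(woman is sole provider | household is poor)

# Bayes' rule: reverse the direction of the conditioning
p_poor_given_woman = p_woman_given_poor * p_poor / p_woman

print(f"P(poor | woman is sole provider) = {p_poor_given_woman:.2f}")
```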

**15.** A random sample of 59 Pomona College students were asked their grade point averages (GPAs) and whether they had been admitted to the college through the early decision program or as part of the regular admissions process. Here are the results:

| | Early (12) | Regular (47) |
|---|---|---|
| Mean | 3.57 | 3.37 |
| Standard deviation | 0.22 | 0.33 |

Do these data indicate a substantial and statistically persuasive difference in the GPAs of these two groups of students? Do not assume that the population variances are equal.
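When the population variances are not assumed equal, one common choice is Welch's t statistic with the Welch-Satterthwaite degrees of freedom. The sketch below works from the summary statistics in the table; choosing this particular procedure is an assumption, and the function name `welch` is illustrative.

```python
import math


def welch(m1, s1, n1, m2, s2, n2):
    """Return (t, df) for a difference in means with unequal variances."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


t, df = welch(3.57, 0.22, 12, 3.37, 0.33, 47)
print(f"difference = {3.57 - 3.37:.2f}, t = {t:.2f}, df = {df:.1f}")
```

The resulting t value would then be compared with a t table at roughly that many degrees of freedom.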

**16.** The manufacturer of plain M&M's claims that the overall percentages of the candies of different colors are as follows: 30 percent brown, 20 percent red, 20 percent yellow, 10 percent blue, 10 percent green, and 10 percent orange. A randomly selected 17.6 ounce bag of plain M&M's yielded the following data: 34 percent brown, 18 percent red, 21 percent yellow, 12 percent blue, 7 percent green, and 8 percent orange. The researchers used this chi-square statistic:

With 5 degrees of freedom P[χ² > 2.48] = 0.78. Their conclusion: "The probability of every color matching the null hypothesis was fairly high, and could not be rejected at the 5% level."

a. What is the correct null hypothesis?

b. Identify the error in their calculation of the chi-square statistic.

c. Explain logically why their chi-square statistic is wrong. Do not just say, "They should have used this formula." Explain why the correct formula is more sensible than the formula they used.
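The sketch below compares a chi-square statistic built from percentages with one built from counts. Plugging the percentages into the usual sum of (observed - expected)² / expected happens to reproduce the 2.48 quoted above, which suggests, though the source does not confirm, that this is what the researchers did. The number of candies in the bag is not given, so `HYPOTHETICAL_N` is purely an assumption for illustration.

```python
claimed = {"brown": 30, "red": 20, "yellow": 20, "blue": 10, "green": 10, "orange": 10}
observed = {"brown": 34, "red": 18, "yellow": 21, "blue": 12, "green": 7, "orange": 8}


def chi_square(obs, exp):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((obs[c] - exp[c]) ** 2 / exp[c] for c in exp)


# Statistic computed from percentages (ignores how many candies were counted)
stat_from_percents = chi_square(observed, claimed)


def stat_from_counts(n):
    """Same statistic computed from counts, for a bag holding n candies."""
    obs_counts = {c: n * pct / 100 for c, pct in observed.items()}
    exp_counts = {c: n * pct / 100 for c, pct in claimed.items()}
    return chi_square(obs_counts, exp_counts)


HYPOTHETICAL_N = 500   # assumption: the actual bag size is not reported

print(round(stat_from_percents, 2))   # 2.48
for n in (100, HYPOTHETICAL_N, 1000):
    print(n, round(stat_from_counts(n), 2))
```

The count-based statistic grows in proportion to the number of candies, while the percentage-based version stays the same whether 100 or 1,000 candies were counted.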

**17.** A study compared the estimates made by three different organizations of the number of people killed in Guatemala during the years 1980-1984:

A: 50,000 to 70,000

B: 30,000 to 50,000

C: 40,000 to 60,000

For each organization, the ranges were taken to be two observations that were used to calculate the mean and standard deviation. For Organization A, for example, the numbers 50,000 and 70,000 have a mean of 60,000 with a standard deviation of 14,142. Explain why this procedure is inappropriate.
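The figures quoted for Organization A can be reproduced by treating the two endpoints as a sample of size two, as this short sketch shows. It assumes the sample (n - 1) form of the standard deviation, which is what `statistics.stdev` computes.

```python
import math
import statistics

low, high = 50_000, 70_000   # Organization A's range, treated as two "observations"

mean = statistics.mean([low, high])
sd = statistics.stdev([low, high])   # sample standard deviation (divides by n - 1)

print(mean, round(sd))   # 60000 14142

# With only two values, the standard deviation is just the range divided by sqrt(2)
print(round((high - low) / math.sqrt(2)))
```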

**18.** If the age at death of a 20-year-old U.S. female is normally distributed with a mean of 76.4 years and a standard deviation of 8.1 years, what is the probability that she will live past 80?
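An upper-tail normal probability like this one can be evaluated with the complementary error function from Python's standard library. The helper name `normal_upper_tail` is illustrative.

```python
import math


def normal_upper_tail(x, mean, sd):
    """P(X > x) for X normally distributed with the given mean and standard deviation."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


z = (80 - 76.4) / 8.1
p = normal_upper_tail(80, 76.4, 8.1)
print(f"z = {z:.2f}, P(age at death > 80) = {p:.3f}")
```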

**19.** Explain the misleading features of this figure used by USA Today to show the eight increases in the cost of a first-class stamp between 1971 and 1991:

**20.** Suppose that you meet a 25-year-old man who is 6 feet, 2 inches tall and that he tells you that he has two brothers, one 23 years old and the other 27 years old. If you had to guess whether he is the shortest, tallest, or in-between of these three brothers, which would you select? Explain your reasoning.