1. Howard Wainer, a statistician with the Educational Testing
Service, recounted the following:
My phone rang just before Thanksgiving. On the other end was Katherine Tibbetts;
she is involved in program evaluation and planning for the Kamehameha Schools
in Honolulu. Ms. Tibbetts explained that the school was being criticized by
one of the trustees of the Bishop Estate (which funds and oversees the schools)
because the schools' first graders that finish at or beyond the 90th
percentile nationally in reading slip to the 70th percentile by 4th grade.
This was viewed as a failure in their education. Ms. Tibbetts asked if I knew
of any longitudinal research in reading with this age group that might shed
light on this problem and so aid them in solving it. I suggested that it might
be informative to examine the heights of the tallest first graders when they
reached fourth grade. She politely responded that I wasn't being helpful.
Explain his suggestion to Ms. Tibbetts.
2. For each of the following studies, identify the type of graph (histogram,
side-by-side boxplots, scatter diagram, or time series graph) that would be
MOST appropriate.
a. Have test scores in Econ 57 risen or fallen in the past 20 years?
b. Do colleges that accept a large percentage of their students in early-decision
programs have higher yields (percentage of accepted students that enroll)?
c. Can starting salaries be predicted from grade point averages?
d. Are final exam scores independent of homework scores?
e. Is there more dispersion in the starting salaries of economics majors or
English majors?
3. Jurors are not supposed to discuss a case with anyone until after
all of the evidence has been presented. They then retire to the jury room for
deliberations, which sometimes begin with a secret-ballot vote of the jurors.
Suppose that we have a case where each selection of a person for the jury has
a 0.9 probability of picking someone who, after hearing all of the evidence,
would initially vote guilty and a 0.1 probability of picking someone who would
initially vote not guilty. What is the probability that the initial vote will
be
a. unanimously guilty if there is a 12-person jury?
b. unanimously guilty if there is a 6-person jury?
c. at least three-quarters guilty if there is a 12-person jury?
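One way to check answers to this exercise numerically is the minimal Python sketch below. It assumes that the jurors' initial votes are independent, so that the number of guilty votes follows a binomial distribution; the helper name binom_at_least is illustrative only.

```python
from math import comb

def binom_at_least(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

p_guilty = 0.9  # probability that a selected juror initially votes guilty

unanimous_12 = binom_at_least(12, 12, p_guilty)      # all 12 vote guilty
unanimous_6 = binom_at_least(6, 6, p_guilty)         # all 6 vote guilty
three_quarters_12 = binom_at_least(12, 9, p_guilty)  # at least 9 of 12

print(round(unanimous_12, 4), round(unanimous_6, 4), round(three_quarters_12, 4))
```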
4. In horseshoes, each player pitches two shoes in each inning. If
the player pitches two ringers, this is called a double. A study of the 2000
World Horseshoe Pitching Championships looked at whether players were more likely
or less likely to throw a double after throwing a double in the previous inning.
For example, female pitchers threw 1641 doubles and 1691 nondoubles after a
nondouble inning, and threw 2142 doubles and 1666 nondoubles after a double
inning. Set up an appropriate statistical test. What is your null hypothesis?
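One possible setup, sketched in Python, is a two-sample test of proportions under the null hypothesis that the probability of a double is the same after a double inning as after a nondouble inning. The sketch assumes that innings can be treated as independent observations and uses a normal approximation; other tests (for example, a chi-square test on the 2x2 table) would also be reasonable.

```python
from math import sqrt, erfc

# Counts for female pitchers, taken from the exercise.
doubles_after_double, nondoubles_after_double = 2142, 1666
doubles_after_nondouble, nondoubles_after_nondouble = 1641, 1691

n1 = doubles_after_double + nondoubles_after_double        # innings after a double
n2 = doubles_after_nondouble + nondoubles_after_nondouble  # innings after a nondouble
p1 = doubles_after_double / n1
p2 = doubles_after_nondouble / n2

# Pooled proportion under the null hypothesis of equal probabilities.
pooled = (doubles_after_double + doubles_after_nondouble) / (n1 + n2)
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Two-sided p value from the standard normal distribution.
p_two_sided = erfc(abs(z) / sqrt(2))

print(round(p1, 4), round(p2, 4), round(z, 2), p_two_sided)
```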
5. Explain why these data are not convincing evidence that the way
to seize more illegal drugs is to conduct fewer searches: "In 2000, [the
U.S. Customs Service] conducted 61 percent fewer searches than in 1999, but
seizures of cocaine, heroin and Ecstasy all increased. From 1998 to 2000, hit
rates for whites and blacks increased by about 125 percent, from less than 7
percent to 15.8 percent, while hit rates for Latinos increased more than fourfold,
from 2.8 percent to 13.1 percent."
6. It has been argued that people may be able to postpone their deaths
until after the celebration of an important event. Peter Lee looked at Jewish
members of Sinai Memorial Chapel who died within 4 weeks of Passover between
January 1, 1987 and December 31, 1995. Set up the appropriate statistical test.
weeks before (-) or after (+) Passover | -4  | -3  | -2 | -1 | 1   | 2  | 3  | 4
number of deaths                       | 119 | 102 | 90 | 99 | 100 | 97 | 78 | 88
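As a hedged illustration of one possible setup, the Python sketch below pools the deaths into "before" and "after" Passover and compares the split with a 50/50 null hypothesis using a normal approximation to the binomial. Pooling the weeks this way, and testing against the one-sided alternative that more deaths occur after Passover, are assumptions of the sketch; a chi-square test across all eight weeks is another option.

```python
from math import sqrt, erfc

# Deaths by week relative to Passover, from the table in the exercise.
deaths = {-4: 119, -3: 102, -2: 90, -1: 99, 1: 100, 2: 97, 3: 78, 4: 88}

before = sum(count for week, count in deaths.items() if week < 0)
after = sum(count for week, count in deaths.items() if week > 0)
n = before + after

# Under the null hypothesis, each death is equally likely to fall before or after.
z = (after - n / 2) / sqrt(n / 4)

# One-sided p value for the alternative that deaths are postponed until after Passover.
p_one_sided = 0.5 * erfc(z / sqrt(2))

print(before, after, round(z, 2), round(p_one_sided, 3))
```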
7. In October 2001, Reuters Health reported that
In a study of more than 2,100 secondary school students, researchers found
that boys who used computers to do homework, surf the Internet and communicate
with others were more socially and physically active than boys who did not
use computers at all.
On the other hand, boys who used computers to play games tended to exercise
less, engage in fewer recreational activities, and have less social support
than their peers. The total number of hours the boys spent on the computer
had little bearing on their lifestyle.
The findings, published in the October issue of the Journal of Adolescent
Health, suggest that parents should monitor how their sons use their computers, and
not just how much time they spend in front of the screen, lead author Dr.
Samuel M.Y. Ho, of the University of Hong Kong, told Reuters Health.
What problem do you see with this conclusion?
8. A carnival game has four boxes, into which the contestant tosses
four balls.
Each box is deep enough to hold all four balls and the contestant is allowed
to toss each ball until it lands in a box. The contestant wins the prize if
each box has one ball. Assuming that balls are equally likely to land in any
box (this is a game of chance, not skill), what is the probability of winning
the game?
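An answer to this exercise can be checked by brute force. The Python sketch below enumerates every equally likely way that four distinguishable balls can land in four boxes and counts the outcomes in which each box holds exactly one ball; treating the tosses as independent is an assumption taken from the "game of chance" wording.

```python
from fractions import Fraction
from itertools import product

boxes = range(4)

# Each outcome records which box each of the four balls lands in.
outcomes = list(product(boxes, repeat=4))

# The contestant wins when all four balls land in different boxes.
wins = sum(1 for outcome in outcomes if len(set(outcome)) == 4)

p_win = Fraction(wins, len(outcomes))
print(wins, len(outcomes), p_win, float(p_win))
```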
9. Consider a guesser on a multiple-choice test, where each question
has 5 possible answers. The test might have 5, 20, or 50 questions. Will an
increase in the number of questions increase or decrease the probability that
this guesser will get
a. exactly 20% of the questions right?
b. between 10% and 30% of the questions right?
c. more than 50% of the questions correct?
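The direction of the effect in parts a and c can be explored with a short Python sketch, assuming that each guess is independent and correct with probability 1/5. Part b is left out because its answer depends on whether the endpoints 10% and 30% are counted; the names used here are illustrative.

```python
from math import comb

def binom_pmf(n, k, p):
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_correct = 1 / 5  # five possible answers, pure guessing

results = {}
for n in (5, 20, 50):
    # Part a: exactly 20% right means exactly n/5 correct answers.
    exactly_20_percent = binom_pmf(n, n // 5, p_correct)
    # Part c: more than 50% right means at least n//2 + 1 correct answers.
    more_than_half = sum(binom_pmf(n, k, p_correct) for k in range(n // 2 + 1, n + 1))
    results[n] = (exactly_20_percent, more_than_half)
    print(n, exactly_20_percent, more_than_half)
```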
10. Consider a multiple-choice question that has n possible answers.
A person who does not answer the question gets a score of 0. A person who answers
the question gets +1 point if the answer is correct and x points if the
answer is incorrect. For what value of x is the expected value of a guesser's
score equal to 0?
11. A survey of starting salaries obtained the data below. Display these
data in a histogram.
starting salary ($1000s) | number of people
30-40                    | 50
40-50                    | 10
50-70                    | 40
12. Use the data in the preceding exercise to show how to calculate
a 95% confidence interval for the population mean. (Assume that all of the observations
in each interval are at the interval midpoint; for example, that the 50 observations
in the 30-40 interval are each equal to 35.) Is this confidence interval invalid
if the histogram in the preceding exercise is not approximately normally distributed?
Why or why not?
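The arithmetic for this exercise might be sketched in Python as follows, placing every observation at its interval midpoint as the exercise instructs. The multiplier 1.96 is an assumption of the sketch (a large-sample normal value); a t value with 99 degrees of freedom would be slightly larger.

```python
from math import sqrt

midpoints = [35, 45, 60]  # midpoints of the 30-40, 40-50, and 50-70 intervals
counts = [50, 10, 40]     # number of people in each interval

n = sum(counts)
mean = sum(m * c for m, c in zip(midpoints, counts)) / n

# Sample variance with n - 1 in the denominator.
variance = sum(c * (m - mean) ** 2 for m, c in zip(midpoints, counts)) / (n - 1)
sd = sqrt(variance)
se = sd / sqrt(n)

z = 1.96  # assumed multiplier for a 95% interval
lower, upper = mean - z * se, mean + z * se

print(mean, round(sd, 3), round(lower, 2), round(upper, 2))
```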
13. A recent study examined the effect of baseball payrolls on a team's
record:
Y = 280  + 100 X1  - 20 X2  - 7 X3
    [14]   [4.5]     [3.2]    [0.4]
where Y = number of games won, average value 81; X1 = team batting average,
average value .286; X2 = team earned run average, average value 3.24; X3 = team
payroll, average value 64.5 million; and the t values are in brackets. The variable
X1 is a measure of the team's offensive abilities; the variable X2 is
a measure of a team's defense. The data consisted of one observation for
each team during the 2000 baseball season.
a. Is the coefficient of X1 statistically significant? How can you tell?
b. Interpret the coefficient of X3.
c. Explain why this equation does not tell us whether teams with higher payrolls
win more games.
14. The pointspread set by bookmakers is a prediction of the margin
of victory in a game. For example, in its first game of the 2000 National Football
League season, Minnesota was a 4-point favorite to beat Chicago. Bettors who
pick Minnesota win their bets if Minnesota wins by more than 4 points; those
who pick Chicago win their bets if Chicago wins or if Minnesota wins by fewer
than 4 points. Because Minnesota won 30-27, those who picked Chicago won their
bets.
In a recent paper, Marcus Lee found that over the years 1993-2000 a strategy
of betting against teams that beat their pointspreads the previous week would
have won 902 wagers and lost 831 wagers. Test the null hypothesis that this
strategy is equivalent to flipping a coin to pick winners.
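A minimal Python sketch of this test is given below, assuming that the wagers are independent and that, under the null hypothesis, each is won with probability 0.5. It computes an exact binomial tail probability along with the z value from the normal approximation; the helper name binom_tail_fair is illustrative.

```python
from math import comb, sqrt

def binom_tail_fair(n, k):
    """Return P(X >= k) for X ~ Binomial(n, 0.5), using exact integer sums."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n

wins, losses = 902, 831
n = wins + losses

# Exact one-sided p value: probability of at least 902 wins in 1733 fair wagers.
p_one_sided = binom_tail_fair(n, wins)

# The fair-coin binomial is symmetric, so the two-sided p value doubles the tail.
p_two_sided = min(1.0, 2 * p_one_sided)

# Normal approximation without a continuity correction.
z = (wins - n / 2) / sqrt(n / 4)

print(n, round(p_one_sided, 4), round(p_two_sided, 4), round(z, 2))
```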
15. Marcus Lee also looked at wagers on the total line, the total number
of points scored in a game, and found that a strategy of betting that teams that
go over or under the total line will do the opposite the next week would have
won 883 wagers and lost 860 wagers. Below are two ways that we might calculate
a single p value using the data from both types of wagers. Set up the formulas
for calculating these two p values.
a. The probability of winning at least 902 of 1733 wagers and also winning
at least 883 of 1743 wagers.
b. The probability of winning at least 1785 of 3476 wagers.
16. Which of the two p values in the preceding exercise do you suppose
is smaller? Explain your reasoning.
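The two calculations in the preceding exercise can be compared numerically with the Python sketch below. It assumes a fair-coin null hypothesis for every wager and, for calculation a, that the pointspread and total-line results are independent of each other, which may not hold exactly because both kinds of wagers involve the same games.

```python
from math import comb

def binom_tail_fair(n, k):
    """Return P(X >= k) for X ~ Binomial(n, 0.5), using exact integer sums."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n

# a. At least 902 of 1733 wagers AND at least 883 of 1743 wagers
#    (product of the two tails, assuming independence).
p_a = binom_tail_fair(1733, 902) * binom_tail_fair(1743, 883)

# b. At least 1785 of the 3476 wagers combined.
p_b = binom_tail_fair(3476, 1785)

print(round(p_a, 4), round(p_b, 4))
```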
17. Before a magician flips a coin, you believe that there is a 1/3
chance it has heads on both sides, a 1/3 chance that it has tails on both sides,
and a 1/3 chance that it has a head on one side and a tail on the other. The
coin lands heads. What are your revised probabilities that this coin has heads
on both sides, has tails on both sides, and has a head on one side and a tail
on the other?
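The revision of probabilities described here is an application of Bayes' rule, which might be sketched in Python as follows. The sketch assumes that a coin with one head and one tail lands heads with probability 1/2; the dictionary labels are illustrative.

```python
from fractions import Fraction

# Prior beliefs about the coin before the flip.
prior = {
    "two heads": Fraction(1, 3),
    "two tails": Fraction(1, 3),
    "head and tail": Fraction(1, 3),
}

# Probability that each kind of coin lands heads.
likelihood_heads = {
    "two heads": Fraction(1),
    "two tails": Fraction(0),
    "head and tail": Fraction(1, 2),
}

# Bayes' rule: posterior is proportional to prior times likelihood.
joint = {coin: prior[coin] * likelihood_heads[coin] for coin in prior}
evidence = sum(joint.values())
posterior = {coin: joint[coin] / evidence for coin in joint}

for coin, prob in posterior.items():
    print(coin, prob)
```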
18. A comparison of the 1997, 1998, and 1999 batting averages of the
10 major league baseball players with the highest batting averages in 1998 concluded
that, "The F value is 2.74 and the p value is 0.0808. Since the F value
is smaller than the 3.39 value at the end of the textbook, and the p value is
larger than 0.05, these three samples are not statistically significant at the
5% level. Consequently the data reject the null hypothesis that all population
means are equal." Identify the error in this conclusion.
19. In a recent study of stocks that were very popular in the 1970s,
Jeff Fesenmaier collected data on the December 1972 price-earnings ratios (P/E)
and the percentage annual rate of return (R) on each stock from 1973 through
2000. Below are some of his data. What statistical test would you use to show
that stocks with high P/E ratios tended to have low annual returns? (You do
not have to actually do the test, just identify it clearly.) Be sure to identify
the null hypothesis.
stock              | 1972 P/E | R
Polaroid           | 90.7     | -6.07
McDonald's         | 85.7     | 11.85
Walt Disney        | 81.6     | 10.58
Avon Products      | 65.4     | 6.30
Kresge (now Kmart) | 54.3     | 1.20
Simplicity Pattern | 53.1     | -2.01
Xerox              | 48.8     | -1.99
Eastman Kodak      | 48.2     | 2.68
Coca-Cola          | 47.6     | 14.64
IBM                | 37.4     | 8.65
Procter & Gamble   | 32.0     | 12.27
PepsiCo            | 29.3     | 16.17
General Electric   | 26.1     | 16.85
Philip Morris      | 25.9     | 18.00
Gillette           | 25.9     | 14.89
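Although the exercise does not require carrying out a test, one candidate is a test of the null hypothesis that P/E and R are uncorrelated (equivalently, that the slope in a regression of R on P/E is zero). The Python sketch below computes the sample correlation and the corresponding t statistic with n - 2 degrees of freedom for the 15 stocks listed; it is an illustration under that assumption, not the only valid choice.

```python
from math import sqrt

# (1972 P/E, annual return R) pairs from the table in the exercise.
data = [
    (90.7, -6.07), (85.7, 11.85), (81.6, 10.58), (65.4, 6.30), (54.3, 1.20),
    (53.1, -2.01), (48.8, -1.99), (48.2, 2.68), (47.6, 14.64), (37.4, 8.65),
    (32.0, 12.27), (29.3, 16.17), (26.1, 16.85), (25.9, 18.00), (25.9, 14.89),
]

n = len(data)
mean_pe = sum(pe for pe, _ in data) / n
mean_r = sum(r for _, r in data) / n

# Sums of squares and cross products around the means.
sxy = sum((pe - mean_pe) * (r - mean_r) for pe, r in data)
sxx = sum((pe - mean_pe) ** 2 for pe, _ in data)
syy = sum((r - mean_r) ** 2 for _, r in data)

r_corr = sxy / sqrt(sxx * syy)

# t statistic for the null hypothesis of zero correlation, n - 2 degrees of freedom.
t_stat = r_corr * sqrt(n - 2) / sqrt(1 - r_corr**2)

print(n, round(r_corr, 3), round(t_stat, 2))
```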
20. A statistics textbook gives this example:
Suppose a downtown department store questions forty-nine downtown shoppers
concerning their age....The sample mean and standard deviation are found to
be 40.1 and 8.6, respectively. The store could then estimate μ, the mean age
of all downtown shoppers, via a 95% confidence interval as follows:
Thus the department store should gear its sales to the segment of consumers
with average age between 37.7 and 42.5.
Explain why you either agree or disagree with this interpretation: "95 percent
of downtown shoppers are between the ages of 37.7 and 42.5."
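The textbook's interval formula does not appear in the text above, but the reported endpoints are consistent with the usual large-sample interval for a mean. The Python sketch below reproduces them, assuming a multiplier of 1.96; that multiplier is an assumption of the sketch rather than something stated in the example.

```python
from math import sqrt

n = 49              # number of shoppers questioned
sample_mean = 40.1  # sample mean age
sample_sd = 8.6     # sample standard deviation of age

z = 1.96  # assumed multiplier for a 95% confidence interval
se = sample_sd / sqrt(n)  # standard error of the sample mean

lower = sample_mean - z * se
upper = sample_mean + z * se

print(round(lower, 1), round(upper, 1))
```

Note that the calculation uses the standard error of the mean, not the standard deviation of individual ages.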