## Learning Statistics by Doing Statistics

A Chinese proverb says: "I hear, I forget. I see, I remember. I do, I understand." If so, maybe a good way to learn statistics is by doing statistics. To help students develop their statistical reasoning, a traditional introductory statistics course was modified to incorporate a semester-long sequence of projects, with written and oral reports of the results. Student test scores improved dramatically and students were overwhelmingly positive in their assessment of this new approach. This experience is described in the paper "Learning Statistics by Doing Statistics."
Hundreds of possible projects are listed at the end of the chapters of this textbook: Introduction to Statistical Reasoning, WCB/McGraw-Hill, 1998. Here are some additional ideas. If you have any good projects, please e-mail them to me and I'll post them here (with your name).
Explaining Data
1. Use computer software to simulate 1,000 spins of a roulette wheel that has 18 red slots, 18 black slots, and 1 green slot. Record the fraction of the spins that are red after 10, 100, and 1,000 spins. Repeat this experiment 100 times and then use three box plots to summarize your results.
2. Use pre-race odds and post-race payoff data from the sports section of a newspaper to compare the average returns from betting on favorites and long shots in horse races.
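For the roulette project, the simulation might be set up along the following lines. This is only a sketch: it uses Python's standard library, the function names and the fixed seed are assumptions made for illustration, and it prints a five-number summary for each checkpoint rather than drawing the box plots themselves.

```python
import random
import statistics

RED_SLOTS, TOTAL_SLOTS = 18, 37   # 18 red + 18 black + 1 green
CHECKPOINTS = (10, 100, 1000)     # record the red fraction after this many spins

def red_fractions(rng):
    """Simulate 1,000 spins and return the fraction red at each checkpoint."""
    reds = 0
    fractions = {}
    for spin in range(1, max(CHECKPOINTS) + 1):
        if rng.randrange(TOTAL_SLOTS) < RED_SLOTS:  # slots 0..17 count as red
            reds += 1
        if spin in CHECKPOINTS:
            fractions[spin] = reds / spin
    return fractions

def run_experiment(repetitions=100, seed=0):
    """Repeat the 1,000-spin experiment and collect the fractions by checkpoint."""
    rng = random.Random(seed)  # seed chosen arbitrarily so runs are repeatable
    results = {n: [] for n in CHECKPOINTS}
    for _ in range(repetitions):
        for n, fraction in red_fractions(rng).items():
            results[n].append(fraction)
    return results

def five_number_summary(values):
    """Minimum, quartiles, and maximum: the ingredients of a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

results = run_experiment()
for n in CHECKPOINTS:
    print(n, [round(v, 3) for v in five_number_summary(results[n])])
```

The three summaries should narrow around 18/37 as the number of spins grows, which is the pattern the box plots are meant to reveal.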
Hypothesis Tests
1. Ask 50 female students these four questions: Among female students at this college, is your height above average or below average? Is your weight above average or below average? Is your intelligence above average or below average? Is your physical attractiveness above average or below average? Ask 50 male students these same questions (in comparison to male students). Try to design a survey procedure that will ensure candid answers. For each gender and each question, test the null hypothesis that p = 0.5. (Alternatively, this can be done as a female/male chi-square test.)
2. Ask 100 randomly selected students if, aside from the obvious gender differences, they most resemble their biological mother or father. (Evolutionary psychologists suggest that people are more likely to say that a child resembles the father—apparently so that the father will be more likely to protect and care for the child.) For each gender, test the null hypothesis that p = 0.5. (Alternatively, this can be done as a female/male chi-square test.)
3. Calculate the percentage change in the Dow Jones Industrial Average from the close on Thursday the 12th to the close on Friday the 13th for every Friday the 13th beginning in 1980. Is the average percentage change substantial? Determine the two-sided p value for a test of the null hypothesis that the mean percentage change is 0.
4. Ask 30 people to tell you when 30 seconds has elapsed, perhaps offering a prize if they are within 1 second. Think of a way to conduct this experiment that avoids this potential problem: if you time the 30 seconds by looking at your watch, the subject may be able to draw inferences from your facial expressions. Do your data indicate that people are more likely to underestimate or overestimate the passage of time?
5. People experiencing an earthquake often grossly overestimate how long the quake lasts; for example, reporting that a 6-second quake lasted 30 seconds. Show a random sample of people some memorable event, such as a snippet of loud music or you dancing, and then ask them how long this event lasted. Do your data indicate that people are more likely to underestimate or overestimate how long the event lasted?
6. Use a newspaper or national news magazine to collect pictures of the winner of the last presidential election, some printed a month before the election and an equal number printed a month after the election. All the pictures should be full face shots of approximately equal sizes. Do not otherwise screen the photos for being attractive or unattractive. Ask a random sample of students to pick the photo that they consider to be the most flattering and the photo that they consider to be the least flattering. Are their choices equally likely to be from the pre-election and post-election categories?
7. Use a newspaper or national news magazine to collect pictures of Richard Nixon, some printed a month before his defeat in the 1962 election for governor of California and an equal number printed a month after the election. All the pictures should be full face shots of approximately equal sizes. Do not otherwise screen the photos for being attractive or unattractive. Ask a random sample of students to pick the photo that they consider to be the most flattering and the photo that they consider to be the least flattering. Are their choices equally likely to be from the pre-election and post-election categories?
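Several of these projects come down to testing the null hypothesis that p = 0.5. One way to do that, sketched here with the standard library only, is an exact two-sided binomial test that adds up the probabilities of every outcome no more likely than the one observed. The function name and the counts in the usage line are invented for illustration; they are not survey results.

```python
from math import comb

def binomial_two_sided_p(successes, n, p0=0.5):
    """Exact two-sided p-value for H0: p = p0.

    Sums the binomial probabilities of all outcomes that are no more
    likely under H0 than the observed number of successes.
    """
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small relative tolerance so that ties are not lost to rounding error.
    total = sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical example: 32 of 50 respondents answer "above average".
print(binomial_two_sided_p(32, 50))
```

With larger samples, the normal approximation to the binomial covered in most introductory texts gives nearly the same answer.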
Comparing Two Samples
1. Compare the prices of men's and women's T-shirts.
2. Compare the prices of men's and women's shaving cream.
3. Select a grocery store chain and compare the prices at stores in two different areas of town.
4. Redo Exercise 8.2 in Introduction to Statistical Reasoning, using data for a longer time period than 1929–1989.
5. Estimate and compare the average words per sentence in the New York Times and in a local newspaper.
6. Ask a random sample of students or professors to grade two short essays. One might be better organized, but have more grammatical mistakes. For half of the sample, the two names on the essays are female (or, even better, the language in the essays reveals the authors to be female). For the other half, the names or language are male. Compare the grades given to the females with those given to the males. (This can also be done as an ANOVA or chi-square test.)
7. Use secret ballots to survey student preferences for the next president of the United States. On half of the ballots, list the likely Republican candidate first; on the other half, list the likely Democratic candidate first.
8. Go to a local cemetery and compare the number and size of male and female tombstones.
9. Post a sign on the main entrance to a campus building requesting the use of a less convenient entrance; for example, "Please use the door on the north side of building." From an inconspicuous location, observe how many people ignore the sign and use the main entrance and how many people do not use the main entrance. Compare the behavior of students and professors or males and females. Try to pick a building and time when traffic is light, so that large numbers do not try to enter simultaneously.
10. Choose two categories A and B based on either race or gender. Select 10 photographs of Category A people and 10 photographs of Category B people. These should not be pictures of celebrities or anyone else that your subjects will recognize. Show each subject 5 of the A pictures and 5 of the B pictures mixed together. Then show each subject all 20 pictures and ask them to select the 10 pictures that they had been shown previously. Compare the accuracy of subjects who are in Category A with the accuracy of subjects who are in Category B.
11. What fraction of New York Times obituaries are for New Yorkers? Compare this with the fraction of obituaries in another major newspaper that are for local citizens; for example, Los Angeles Times obituaries for Californians.
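For the projects that compare two sample means (prices, words per sentence, essay grades), one reasonable choice is a difference-in-means t statistic that does not assume equal variances (Welch's version). The sketch below computes the statistic and its approximate degrees of freedom with the standard library; the function name and the price lists are made up for illustration. The p-value can then be read from a t table or from statistical software.

```python
import statistics
from math import sqrt

def welch_t(sample_a, sample_b):
    """Return (t, df) for a two-sample t test that allows unequal variances."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error contributed by each sample: s^2 / n
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a**2 / (len(sample_a) - 1) + se2_b**2 / (len(sample_b) - 1)
    )
    return t, df

# Invented T-shirt prices in dollars, not real data.
mens = [12.0, 15.5, 14.0, 18.0, 13.5, 16.0]
womens = [14.0, 19.0, 17.5, 21.0, 16.0, 18.5]
t, df = welch_t(mens, womens)
print(round(t, 3), round(df, 1))
```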
Using ANOVA to Compare Several Means
1. Estimate and compare the average words per sentence in People, Time, and New Republic.
2. Compare the length of New York Times obituaries with the occupations of the deceased.
3. Compare the length of the descriptions in Who's Who in America with the person's occupation.
4. Test whether the length of the descriptions in American Men of Science depends on the person's scientific field.
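The F statistic behind these ANOVA projects can be computed directly from the between-group and within-group sums of squares. The following is a minimal sketch using the standard library; the function name and the word counts are invented for illustration. The resulting F value would be compared with an F table for the two degrees of freedom returned.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Invented words-per-sentence samples for three magazines, not real data.
samples = [
    [14, 17, 12, 16, 15],
    [21, 19, 24, 20, 22],
    [26, 23, 28, 25, 27],
]
print(one_way_anova_f(samples))
```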
Chi-square Tests for Categorical Data
1. Administer the following four tests to at least 50 subjects, and then apply a chi-square test to the six possible pairs of tests: a with b, a with c, and so on:
a. Ask the subject to stand with his or her back to you. Then ask the subject to jump around in a single motion to face you. Record whether the person jumps clockwise (pushing off with a dominant left foot) or counterclockwise (pushing off with a dominant right foot).
b. Ask the subject to look at an object 10 feet away through a tube made with the hands held a foot in front of his or her face. Close or cover first one eye and then the other and record whether the subject can still see the object through the tube when the left eye is open (left-eye dominance) or when the right eye is open (right-eye dominance).
c. Ask the subject to put his or her hands together behind the head, with the fingers interlaced. Record whether the thumb on the bottom (the dominant thumb) is from the left or right hand.
d. Ask the subject whether he or she is left-handed or right-handed.
2. The nine positions on a baseball team can be divided into four categories: pitcher, catcher, the four infielders, and the three outfielders. Collect all the data you can on major league baseball managers and test the null hypothesis that, among those managers who played baseball, the probabilities of having played in these four categories are 1/9, 1/9, 4/9, and 3/9, respectively.
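For the baseball-manager project, the chi-square goodness-of-fit statistic compares the observed counts with the counts expected under the null probabilities 1/9, 1/9, 4/9, and 3/9. A minimal sketch follows; the function name and the manager counts are invented for illustration and are not real data. With four categories there are 3 degrees of freedom, so the 5 percent critical value is 7.815.

```python
NULL_PROBS = {
    "pitcher": 1 / 9,
    "catcher": 1 / 9,
    "infielder": 4 / 9,
    "outfielder": 3 / 9,
}

def chi_square_gof(observed, null_probs):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    n = sum(observed.values())
    statistic = 0.0
    for category, prob in null_probs.items():
        expected = n * prob  # expected count if the null hypothesis is true
        statistic += (observed[category] - expected) ** 2 / expected
    return statistic, len(null_probs) - 1

# Invented counts of managers by former playing position, not real data.
managers = {"pitcher": 4, "catcher": 22, "infielder": 45, "outfielder": 19}
statistic, df = chi_square_gof(managers, NULL_PROBS)
print(round(statistic, 2), df)
```

The same sum of (observed - expected)^2 / expected applies to the pairwise tests in project 1, except that the expected counts there come from the row and column totals of each two-by-two table.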
Simple Regression
1. Go to a local grocery store and collect these data for at least 75 breakfast cereals: cereal name; grams of sugar per serving; and the price per ounce (or gram). If the store you select does not have at least 75 breakfast cereals, then collect data from another store too. Use these data to estimate the simple regression model with price as the dependent variable and sugar as the explanatory variable.
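Once the cereal data are collected, the least-squares estimates for the simple regression of price on sugar can be computed as sketched below. The function name and the five data points are invented for illustration; an actual project would use the 75 or more cereals described above.

```python
def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y explained by the fitted line
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_residual / ss_total

# Invented data: grams of sugar per serving and price per ounce in dollars.
sugar = [1, 3, 6, 9, 12]
price = [0.31, 0.28, 0.27, 0.24, 0.22]
intercept, slope, r_squared = simple_regression(sugar, price)
print(round(intercept, 4), round(slope, 4), round(r_squared, 3))
```

A negative slope would suggest that sweeter cereals tend to cost less per ounce in the sample; whether the relationship is statistically persuasive is a separate question answered by a t test on the slope.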