By Thomas R. Schori, Ph.D., and Michael L. Garee, Principals, Millennium Marketing Research, 808 E. Ironwood, Normal, IL 61761-5239.
In the wake of "Deep Blue," IBM's supercomputer, beating renowned world chess champion Garry Kasparov, renewed cries of doom were heard about how computers soon will overtake us humans in the "brain power" department. Don't you believe it. The computer is, was and always will be nothing more than a tool to help us humans get the most out of our own brains. It will only do what we tell it to do, but, ah, there's the rub. . . . And that's particularly the case when it comes to using computer-generated statistics. Let's consider an example.
Soon after becoming a "newly minted" Ph.D., one of our senior consulting partners, Tom Schori, was under contract to the U.S. Army to document the software he had developed for a new statistical procedure, "Versatile MANOVA: A repeated measures multivariate analysis of variance."
While working with the Army, Schori was asked to visit one of its research laboratories, which was struggling with a problem in a helicopter night vision goggle study. Helicopters, being very unstable flight platforms, are difficult to fly unless the pilot has a visual reference point, which of course is normally lacking at night. Therefore, the lab scientists were investigating the versatility and usefulness of three different types of night vision goggles.
To evaluate the various types of goggles, the scientists conducted a study in which they looked at pilot performance as a function of goggle type. Then, they exactly replicated the study two times. So far, so good. The only problem was that each experiment identified a different set of goggles as producing the most effective pilot performance! Clearly, something was awry, but what?
The scientists said that their helicopters were designed to simultaneously collect data on about 100 flight performance variables, each of which was sampled at a rate of something like 100 times per second. Clearly, a lot of data. They then said that they'd conducted the identical experiment three times and had analyzed it each time using a stepwise multiple discriminant analysis. Each time, significant differences (p < .05) were detected as a function of goggle type. That is, statistically significant differences were detected in flight performance as a function of which type of goggles the pilots were wearing.
The reasons this ambiguity occurred proved to be rather simple. First, the scientists had used the wrong type of discriminant analysis: stepwise multiple discriminant analysis instead of the garden variety multiple discriminant analysis. Second, even though the scientists knew that it wasn't wise to consider all 100 flight performance variables simultaneously, they opted not to use their own "brain power" to reduce the number of variables to a meaningful, relevant number. Instead, they simply opted to have the "smarter" computer eliminate the superfluous variables for them. And, finally, they exacerbated the whole situation by not relying enough on their own innate intelligence to quickly deduce the nature of the problem regarding the ambiguous test results.
Any stepwise procedure throws out the variables for which statistically significant differences (p < .05) aren't evident. To the naïve researcher (of whom there appear to be many more than probably should be tolerated!), weeding out the insignificant variables might appear to make sense. But the result is that one risks amplifying "noise" within the data. And that's precisely what the Army study had done. Here's how.
When p < .05 is used as a criterion of statistical significance, it means that the researcher is willing to accept that, when no real difference exists, a difference will still be proclaimed statistically significant 5% of the time purely by chance. Consequently, when 100 dependent variables are considered, we would expect to find significant differences as a function of experimental condition (type of goggles) on about five of them merely by chance.
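The arithmetic behind that expectation can be sketched in a few lines of Python. This is only an illustration: it assumes the 100 tests are independent of one another, which the flight variables in the Army study may well not have been, and the variable names are ours.

```python
# Illustrative arithmetic only. Assumes the 100 significance tests are
# independent, which is an assumption and not something the study states.
alpha = 0.05        # significance criterion from the text
n_variables = 100   # number of dependent variables from the text

# Expected number of variables flagged as "significant" by chance alone
expected_false_positives = alpha * n_variables

# Probability that at least one variable is flagged by chance alone
prob_at_least_one = 1 - (1 - alpha) ** n_variables

print(f"Expected chance findings: {expected_false_positives:.1f}")   # 5.0
print(f"Probability of at least one: {prob_at_least_one:.3f}")        # 0.994
```

Under that independence assumption, finding at least one "significant" variable among 100 is close to a sure thing even when the goggles make no difference at all.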
Now, enter the first culprit in our story, stepwise discriminant analysis. This procedure includes only those variables on which significant differences are apparent. Let's suppose that the truth was that no one set of night vision goggles made for any better flight performance than did any other set. Even so, in each analysis we'd expect about five variables to be entered for which any differences due to goggle type were due to chance. As a result, it should come as no surprise that the stepwise procedure detected significant differences in each analysis, and that in each analysis a different type of goggle was identified as producing the best flight performance. Using stepwise discriminant analysis, these researchers set themselves up to achieve ambiguous results.
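A small simulation makes the point concrete. The sketch below is not the Army's analysis and is not a true stepwise discriminant analysis; it is a simplified stand-in that screens each variable with a one-way ANOVA F test and keeps only those passing p < .05. The number of pilots per goggle type, the "higher score is better" convention, the random seed, and the function names are all assumptions made for illustration. The data are pure noise, so goggle type has no real effect.

```python
import random
from statistics import mean

N_GOGGLES = 3       # goggle types, from the article
N_VARIABLES = 100   # flight performance variables, from the article
N_PILOTS = 10       # pilots per goggle type: an assumption for illustration
F_CRIT = 3.354      # approximate critical F for p < .05 with (2, 27) df


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of equal-sized groups."""
    k = len(groups)
    n = len(groups[0])
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    ss_between = sum(n * (m - grand) ** 2 for m in group_means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (k * n - k))


def run_experiment(rng):
    """Simulate one experiment in which goggle type has NO real effect."""
    selected = []             # variables that pass the p < .05 screen
    wins = [0] * N_GOGGLES    # times each goggle type "looks best"
    for var in range(N_VARIABLES):
        # Every group is drawn from the same distribution: pure noise.
        groups = [[rng.gauss(0, 1) for _ in range(N_PILOTS)]
                  for _ in range(N_GOGGLES)]
        if f_statistic(groups) > F_CRIT:
            selected.append(var)
            means = [mean(g) for g in groups]
            # Assumes a higher score means better performance.
            wins[means.index(max(means))] += 1
    return selected, wins


rng = random.Random(1997)  # arbitrary seed so reruns are repeatable
results = [run_experiment(rng) for _ in range(3)]
for i, (selected, wins) in enumerate(results, start=1):
    print(f"Experiment {i}: {len(selected)} variables pass; "
          f"wins by goggle type = {wins}")
```

Because the retained variables are chance findings, which ones pass and which goggle type "wins" on them will generally differ from one simulated experiment to the next, much as the three Army analyses disagreed.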
Schori suggested that the scientists decide which of the 100 dependent variables they believed to be most pertinent to effective flight performance, and then analyze the data from each experiment using ordinary discriminant analysis rather than stepwise. They ultimately selected 12 key measures to consider. When they re-ran their analyses, using ordinary discriminant analysis, lo and behold, the same set of goggles was associated with the best flight performance in each analysis.
The moral of this story? Computers are smart, but we're a heck of a lot smarter. Computers can spew out statistics 100 times (or more!) faster than we mere humans, but the computer doesn't "know" if they are the right statistics for the situation. That's what we humans are supposed to use our brains for. Computers never get tired, but we humans do. Sometimes we get lazy, too, and that can be costly!