On January 28, 1986, with the world watching, NASA launched the space shuttle Challenger from the Kennedy Space Center in Florida in freezing temperatures. The Challenger had 6 critical parts known as O-rings, which prevent burning rocket fuel from leaking out of booster joints. There had been seven incidents of distressed O-rings in the 23 flights leading up to the launch date.
Below is a plot of the flights with incidents of O-ring thermal distress as a function of temperature.
NASA reported that there was nothing irregular in the distribution of O-ring distress over the spectrum of joint temperatures between 53 and 75 degrees Fahrenheit. No previous liftoff temperature was under 53 degrees F. Although tremendous extrapolation must be done from the given data to assess the risk at 31 degrees F (the temperature on the day of launch), even a layman is able "to foresee the unacceptably high risk created by launching at 31 degrees F."
NASA went ahead with the launch, based on its review of these failures. But on the day of the launch, the O-rings failed to seal properly in the unusually cold conditions, and the shuttle broke apart 73 seconds into its flight, killing all seven astronauts aboard.
Can you think of any data they missed? Would you have thought to ask for the missing data? If you had, you might have seen that there was an underlying trend predictive of part failure at lower temperatures.
The number of incidents is count data and calls for a special type of modeling called binomial regression. Because this is count data and not continuous data, linear regression is not appropriate. The type of data dictates the analysis method. If you used linear regression to draw a straight line through the data, you’d predict a negative number of failure incidents at hot temperatures, which makes no sense.
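To make the problem with a straight line concrete, here is a minimal sketch of an ordinary least-squares fit. The temperatures and incident counts are made up for illustration and are not the real flight data; the helper name `fit_line` is also my own.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical temperatures (deg F) and incident counts, NOT the real flight data.
temps = [50, 60, 70, 80]
incidents = [3, 1, 0, 0]

intercept, slope = fit_line(temps, incidents)

# Extrapolate the straight line to a hot day.
predicted_at_85 = intercept + slope * 85
print(predicted_at_85)  # a negative number of incidents, which is impossible
```

The line slopes downward as expected, but nothing stops it from dropping below zero once the temperature is high enough.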
In this case, since we know each shuttle has exactly 6 O-rings, and each O-ring either fails or doesn't, a binomial logistic regression model is appropriate to use. You would model the probability of failure for an individual O-ring given the temperature.
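As a rough sketch of what that model looks like, the snippet below writes the per-O-ring failure probability as a logistic function of temperature and the binomial log-likelihood of the observed counts. The function names and the parameter values in the example call are my own placeholders, not fitted results.

```python
import math

N_ORINGS = 6  # each shuttle has exactly 6 O-rings


def failure_probability(temp_f, intercept, slope):
    """Probability that a single O-ring fails at temperature temp_f (deg F)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * temp_f)))


def expected_failures(temp_f, intercept, slope, n=N_ORINGS):
    """Expected number of failed O-rings out of n at temperature temp_f."""
    return n * failure_probability(temp_f, intercept, slope)


def log_likelihood(temps, failures, intercept, slope, n=N_ORINGS):
    """Binomial log-likelihood of the observed failure counts per flight."""
    total = 0.0
    for t, k in zip(temps, failures):
        p = failure_probability(t, intercept, slope)
        total += (math.log(math.comb(n, k))
                  + k * math.log(p)
                  + (n - k) * math.log(1.0 - p))
    return total


# Placeholder coefficients, chosen only to show the shape of the curve.
print(expected_failures(31, 5.0, -0.1), expected_failures(75, 5.0, -0.1))
```

Unlike a straight line, the predicted probability always stays between 0 and 1, so the expected number of failures stays between 0 and 6. Fitting the model means choosing the intercept and slope that maximize `log_likelihood`.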
I fitted a binomial logistic regression model to the data. It predicts that 5 out of 6 O-rings could fail in freezing temperatures.
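For readers who want to try a fit like this themselves, here is a minimal sketch using Newton-Raphson in plain Python. It is not necessarily how I fitted the model, the function name `fit_binomial_logistic` is my own, and the data in the example are hypothetical stand-ins for the real flight records, so the numbers it prints will not match the result above.

```python
import math


def fit_binomial_logistic(temps, failures, n=6, iters=25):
    """Fit logit(p) = intercept + slope * temp by Newton-Raphson."""
    mean_t = sum(temps) / len(temps)
    xs = [t - mean_t for t in temps]  # centre temperatures for stability
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k in zip(xs, failures):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            r = k - n * p            # observed minus expected failures
            w = n * p * (1.0 - p)    # binomial variance weight
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    # Convert the intercept back to the original temperature scale.
    return a - b * mean_t, b


def expected_failures(temp_f, intercept, slope, n=6):
    """Expected number of failed O-rings out of n at temperature temp_f."""
    return n / (1.0 + math.exp(-(intercept + slope * temp_f)))


# Hypothetical data, NOT the real flight records: temperature (deg F)
# and number of distressed O-rings (out of 6) per flight.
temps = [50, 55, 60, 65, 70, 75, 80]
failures = [3, 2, 2, 1, 0, 1, 0]

intercept, slope = fit_binomial_logistic(temps, failures)
print(expected_failures(31, intercept, slope))
```

To reproduce the actual analysis, replace `temps` and `failures` with the real values from the dataset. Note that predicting at 31 degrees F extrapolates well beyond any observed temperature, so the estimate carries a lot of uncertainty.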
Statistical science could have provided valuable input to the launch decision process. Would you have wanted to see the above chart the night before the launch? The Challenger story offers a chilling example of a common phenomenon: we often look at data that appears to encode the information we need while discarding data we assume wouldn’t be relevant. I don’t want to speculate that, if only they had looked at the full dataset, they would have made the right decision. There is really no way to know this. Other factors were surely at play. Rather, I just want to point out that there are often stories to uncover when we argue further with the data.
Most businesses don’t argue with their data. Instead, they have a culture of acceptance. The effect is a slow burn in which data projects continue to fail without anyone asking the important questions along the way.
The data is available for download from the University of California, Irvine, Machine Learning Repository.