There was a Russian trapeze act that was famous for its high-flying antics. One day their stage manager noticed that at the outer point of their swings they were beyond the edge of the safety net. Since the act was his prime draw, he was concerned about what might happen if one of them fell. His solution: he cut the net into pieces and tied a section of it to each person so that there would always be a safety net below them.
Absurd though this is, we often do the same thing in writing code: we do things intended to make us safe that ultimately kill us. At one company I'm familiar with, there was a very simple home-grown broadcast message delivery system that very occasionally failed for unknown reasons. When it did, the system would hiccup a bit and people would lose messages for a minute or so before it came back to life. The end-user impact was noticeable, but there was no permanent harm.
Since the company felt that it would never be possible to handle all the "edge cases" in the current code, it seemed to make sense to switch to a third-party messaging system that included a powerful redundancy mechanism. When the next release of the software came out, customer complaints multiplied. Although the third-party software was reliable, there were numerous bugs in the integration, and as a result the system would crash and data would be lost. The problems persisted for over a year and a half before the system stabilized. Analysis showed that the number of failures in the first month alone exceeded the projected number of failures in the old system had it been left running (unfixed) for another five years. In addition, the time and energy spent making the switch prevented other enhancements and bug fixes from getting done.
This kind of thing happens in the hardware world too. The system is up for three years and the tape-based backup is OK. Then one day someone decides they need to install something safer and more sophisticated, like GDPS (Geographically Dispersed Parallel Sysplex) or a VERITAS-based system, and when they go live disaster strikes, bringing the system down for an extended period.
The place it happens the most, though, is in human systems. A few years back I encountered a situation where a company was having a problem with job performance. The COO decided that what was needed was an extremely detailed tracking system that would ensure that people were doing the requisite work. The results were predictable: productivity got much worse after the best people left.
The danger in all of these stories is that they are about situations where the companies involved would have been better off if they had done nothing, but the moral isn't intended to be "fear change". The idea is to be careful about making changes and to be sure that the changes are in line with the scope of the problem. Most important, though, is to be prepared to back out of a decision if an idea isn't working, and to listen carefully to the people doing the work. In the messaging system example, the engineers knew the new system wasn't working, but once a decision has been made, people can become very reluctant to abandon it (regardless of the facts).
As a final counterpoint, consider this: the good thing about never changing is that it produces certainty; the bad thing is that it is the certainty of failure (but that's another story).









