When I was reading Richard Feynman's observations on the Challenger shuttle disaster (Appendix F of the Rogers Commission report, https://science.ksc.nasa.gov/shuttle/missions/51-l/docs/rogers-commission/Appendix-F.txt), I began wondering about the value of consensus when we are challenged to solve a wicked and novel problem. The shuttle was built using data, and there was consensus on how and what to build, or at least enough consensus to create agreements on which processes to follow.
I am quoting the conclusions of the observations here:
If a reasonable launch schedule is to be maintained, engineering often cannot be done fast enough to keep up with the expectations of originally conservative certification criteria designed to guarantee a very safe vehicle. In these situations, subtly, and often with apparently logical arguments, the criteria are altered so that flights may still be certified in time. They therefore fly in a relatively unsafe condition, with a chance of failure of the order of a percent (it is difficult to be more accurate). Official management, on the other hand, claims to believe the probability of failure is a thousand times less. One reason for this may be an attempt to assure the government of NASA perfection and success in order to ensure the supply of funds. The other may be that they sincerely believed it to be true, demonstrating an almost incredible lack of communication between themselves and their working engineers. In any event this has had very unfortunate consequences, the most serious of which is to encourage ordinary citizens to fly in such a dangerous machine, as if it had attained the safety of an ordinary airliner. The astronauts, like test pilots, should know their risks, and we honor them for their courage. Who can doubt that McAuliffe was equally a person of great courage, who was closer to an awareness of the true risk than NASA management would have us believe? Let us make recommendations to ensure that NASA officials deal in a world of reality in understanding technological weaknesses and imperfections well enough to be actively trying to eliminate them. They must live in reality in comparing the costs and utility of the Shuttle to other methods of entering space. And they must be realistic in making contracts, in estimating costs, and the difficulty of the projects. Only realistic flight schedules should be proposed, schedules that have a reasonable chance of being met. If in this way the government would not support them, then so be it. NASA owes it to the citizens from whom it asks support to be frank, honest, and informative, so that these citizens can make the wisest decisions for the use of their limited resources. For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.
This reminds me of software engineering. We create agreements and processes, we follow them, and we feel secure about our work once the checklist is complete. If the checklist is completed, then the system must be reliable, robust, maintainable and safe, or so the reasoning goes.
I disagree that consensus, checklists and data (I will name these “the agreements toolkit”) are enough for building complex, reliable and fault-tolerant systems. Such systems involve wicked problems that require black box thinking. The agreements toolkit is certainly valuable: for example, it enables teams to iterate quickly and to solve most routine problems efficiently. But when the agreements toolkit is all we have and trust for solving wicked problems, it leads to consensus optimisation.
What if instead we chose to optimise for decisions? Based on these observations, and as an experiment, I will start holding post-mortems (which I will call pre-mortems) not only when things go wrong but also after successful projects and releases. Behind every success there is luck, and there are things that were never optimised. With pre-mortems I want to find out what role luck played in the project’s success and which issues remained unsolved.
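To make the experiment more concrete, here is a minimal sketch of how such a review could be recorded in a structured way. The questions and field names are my own assumptions for illustration, not an established template.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SuccessReview:
    """A retrospective for a project that went well, not only one that failed.

    Field names are hypothetical, chosen to force the two questions above:
    where were we lucky, and what did the success let us ignore?
    """
    project: str
    luck_factors: List[str] = field(default_factory=list)     # things that went right by chance
    unsolved_issues: List[str] = field(default_factory=list)  # weaknesses the success papered over
    followups: List[str] = field(default_factory=list)        # actions so we rely on luck less next time

# Illustrative usage with made-up entries.
review = SuccessReview(
    project="v2.3 release",
    luck_factors=["traffic spike landed on an off-peak weekend"],
    unsolved_issues=["retry behaviour under load was never tested"],
    followups=["schedule a load test for the retry path"],
)

for issue in review.unsolved_issues:
    print(f"Unsolved despite success: {issue}")
```

The structure itself matters less than the habit: writing the luck down keeps a successful release from being read as proof that the system is safe.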