I started musing about the complexity of modern computer systems after talking with a colleague about recent computer crashes. Those not familiar with modern computer systems expect them to work better than they do and seem to take delight in handing out blame where none lies. Having worked for decades in computers both large and small I am astounded by a couple of things: that complex computer systems actually work pretty well, and that they don’t crash more.
Humans seem to naturally enjoy reducing things to black and white, or two or three choices. Listen to the evolutionary biologists and they’d claim all sorts of things to prove this, such as living in caves and having only a few foods to eat or whatnot. I don’t entirely believe them either – they sound too much like looking for the hypothesis in the evidence, rather than the other way around.
In my professional life I have been through perhaps 3 or 4 major incidents where massive numbers of people could have been impacted and each time a very focused and dedicated technical team have averted disaster. These things are not easy and take real hard work. I’d rather rely on people who have experienced real complex issues than those armchair generals that never saw a battle.
Case in point: you’d perhaps expect that major companies know exactly what systems they have installed, how many applications run on them and where they are? You’d be wrong, as time and time again I hear stories or know by personal experience that they do not know entirely where things are. This varies from the incidental (we didn’t know that we were running that much software) through to the monumental: a major corporation thinking it had about 8,000 systems and finding that they had … 14,000. That is quite a difference from what they expected.
Putting aside our human tendency to over-simplify, what happens as systems get even more complex? Can we get computers to manage other computers, can we adopt methods and architectures which are much more emergent rather than the procedural and directive that we currently use? What happens when the system becomes more than the sum of its parts and can’t be governed except by itself?