What software crisis? June 3, 2018

You may question the assertion that we have a software crisis and that it's been ongoing for 50 years. After all, universities turn out computer science graduates by the thousand and there are more programming languages and frameworks than ever before. So here's a longish discussion about why I feel things are not as they should be.

There are a number of rather questionable assumptions made in software engineering that lead to a tendency to make technical choices before considering what the overall aim of the project really is. Strategic decisions are made by engineers without intimate knowledge of the problem domain and without adequate input from users. Rules are laid down and slavishly followed even when it becomes obvious they aren't working. And little consideration is given to future-proofing, for the day when the original team is long gone but the project needs updating and maintaining.

I take issue in particular with the dogmatic insistence that only by applying ever more complex software frameworks and practices can we guarantee a satisfactory outcome. Someone (not Albert Einstein, though he's frequently credited with it) defined insanity as "repeating the same mistakes and expecting different results", yet that is exactly what we do every time. And the first law of holes - famously quoted by Denis Healey, the former British Chancellor - says "When you're in a hole, stop digging".

Complexity and denial

The problem is how we deal with complexity. It's true that in some cases complexity is more apparent than real, but we rarely seek to question if this might be the case with our own projects. Instead, we apply the same standard-sized hammer to all problems, ignoring the possibility that parts or all of a project might work better with a different approach.

In my last employment, a great deal of care was expended on laying down rules and formats for the various processes involved. The project - a large website - had started life in Java, with strong object orientation, domain separation, modularity, subclassing and Spring injection, yet after less than 5 years the code was becoming unmaintainable. Refactoring was always a huge exercise that broke as much as it fixed. Most of the code simply shovelled data from one domain to another, and the prevailing mantra was that in order to understand the system all you had to do was "read the code", a phrase that always fills me with the greatest of misgivings as it's generally used as an excuse not to write documentation. The problem with Spring as the underlying framework is that although you can read a single file it's often nigh-on impossible to see where it fits into the system. Not even the debugger can take you through the dozens of proxied software layers between one module and another, so only the very best engineers can really understand how the system works. Isn't this what frameworks are supposed to avoid?

When - inevitably - you lose the people with that knowledge it becomes hard to replace them. Newcomers do their best but they lack an accurate mental picture, and without good documentation to provide insight into the minds of the original developers they have little chance of gaining it. If they are not at least the intellectual equals of their predecessors they may never do so. Yet the pressure is on to deliver results, so results they deliver, often using a subtly or fundamentally different approach to the original, making it even harder for others to figure out what's happening and further adding to the sclerotic nature of the code.

This particular project, undergoing endless, rapid and substantial feature growth, would probably have ground to a halt before much longer, but during the couple of years I was there a persuasive case was made for new components to be written using Node.js as it was more flexible and potentially cheaper to deploy in a large cloud hosting environment. It was quickly apparent that leaving half the system in Java while all the new parts were JavaScript didn't make effective use of the engineers involved, so a decision was made to retrain the Java people and migrate the whole thing to Node.

Out of the frying pan...

The technical rules changed overnight. Everything now had to be done with pure functions, and the domain structure was defined by example in the first modules created, leaving the programmers little to do except churn out code that could be slotted into the structure. However, where complexity really exists you can't get rid of it just by reorganising things. It's like playing Whack-a-mole; if you push it down in one place it pops up in another. Instead of the hierarchies of Java we now had composition, the downside of which is repetition. Because of the continuing lack of an effective documentation policy this repetition tends to be done in slightly different ways each time, so although modules are similar the details tend to trip up the unwary. So complexity increases inexorably, step by step.

Another problem is in the way data is passed between modules. In Java this is always done with beans, each having a well-defined contract. Much of the coding work goes into creating beans, filling them with data from other beans using a host of mapper classes, then passing them to another part of the system where much the same happens again, resulting in a lot of bulk that's doing very little real work. On top of that is a correspondingly huge pile of unit test classes. It's hellishly clunky but it works - mostly.
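The shape of this bean-and-mapper shuffle can be sketched as follows - in JavaScript for brevity, though the original was Java, and with all class and property names invented for illustration:

```javascript
// Hypothetical sketch of the bean-and-mapper pattern described above.
// Each "bean" states its contract explicitly; a mapper class exists
// purely to shovel data from one bean shape to another.

// Source-domain bean with an explicit, well-defined contract
class OrderBean {
  constructor(id, customerId, total) {
    this.id = id;
    this.customerId = customerId;
    this.total = total;
  }
}

// Target-domain bean, again with its own explicit contract
class OrderDto {
  constructor(orderId, amount) {
    this.orderId = orderId;
    this.amount = amount;
  }
}

// The mapper: real logic is trivial, but every field crossing
// a domain boundary needs a line like this somewhere
class OrderMapper {
  static toDto(bean) {
    return new OrderDto(bean.id, bean.total);
  }
}

const dto = OrderMapper.toDto(new OrderBean(42, 7, 99.5));
```

The contract is at least visible - you can see exactly what an `OrderDto` contains - but multiply this by every pair of domains and you get the bulk described above.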

In the new company culture, classes are regarded as subversive left-overs from the world of object orientation. Data is passed using composition, with objects comprising collections of attributes with no explicit contract ever expressed. The order and nature of these attributes varies from one file to another in a completely arbitrary and undocumented manner, so the need to read the code becomes inescapable. This can be quite hard when objects are deconstructed, spread and reconstructed with new attributes at each step of the way. By the time you've worked your way through the flow you've quite often forgotten why you started and what you were looking for.
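The attribute drift described above can be sketched like this (a hypothetical example; the function and attribute names are invented):

```javascript
// Hypothetical sketch: each step spreads the object and adds or renames
// attributes, so no single place ever states the object's full shape.

const fromRequest = (body) => ({ userId: body.id, items: body.items });

// Adds an attribute via spread; the shape quietly grows
const withPricing = (order) => ({ ...order, total: order.items.length * 10 });

// A later step renames an attribute; the implicit "contract" silently changes
const forAudit = ({ userId, ...rest }) => ({ ...rest, user: userId });

const order = forAudit(withPricing(fromRequest({ id: 7, items: ['a', 'b'] })));
// Only by reading every step do you learn the final shape:
// { items: [...], total: 20, user: 7 } - and userId is gone.
```

Nothing here is wrong in isolation; the trouble is that the shape of `order` exists only in the reader's head, reconstructed by tracing the whole chain.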

I left the company a few months after the start of this exercise but I heard later the pressure was unrelenting, which doesn't surprise me. Such an environment does not allow the time to step back and review progress, is unlikely to encourage the production of good documentation and causes reliance to an unhealthy extent on the skills of a few exceptional individuals to hold it together.

The odd thing about all this is that the job being done is really quite simple. The user clicks a button on a web page; the system fetches the relevant data and writes bits of it to database tables, schedules actions and returns a response. The UI does most of the presentational work so the server is just handling a few JSON structures, combining and recombining them according to business rules. This is all done on a massively parallel scale, of course, but it's the job of the cloud service to meet capacity demands. Why is the server code so hard to develop and maintain? In spite of new frameworks, technologies and programming techniques coming on-stream every few months, the same things keep happening in project after project, year after year for at least the last 50 years. The way things are usually done results in a dangerously small number of people who really know and understand what's going on. We take a huge risk allowing massive systems to be so dependent on such a small pool of expertise.

The power of language

Here I depart from mainstream thinking and make a fundamental assertion: It's all about language.

The human brain is wired to process language, and the printed word is a very efficient input mechanism. When we read a novel, a newspaper or a paper on a subject about which we know something, we process the incoming information very rapidly and retain quite a lot of it because it slots into what we already know. There's already a place in the brain to park the new information.

Computer software is different, as are mathematical formulae. True, there are some whose brains are wired to take in formulae as readily as the rest of us absorb John Grisham, but they are a minority in a world that demands more and more people to have computer expertise. For the majority, processing a page of JavaScript is no easier than handling Government regulations written in a foreign language. However, we can learn foreign languages and eventually have the doubtful pleasure of experiencing the full beauty of those regulations, but the same doesn't apply to Java, JavaScript and the rest.

This is because computer "languages" are not really languages. Not in the same sense as English, French, German or Italian, at least. Imagine that in order to communicate in English you had to use just 100 words (or fewer) and build your sentences by combining those words in different ways, mixing upper and lower case and larding them with a profusion of symbols to reduce wordiness. Can you imagine how hard it would be to construct - or read - the Gettysburg Address or War and Peace?

But that's what programmers are expected to do.

The vision and the reality

Projects are conceived in people's minds, and before they can be implemented they have to be conveyed to a development team, which means writing them down. (Well, in some cases they aren't, but then we're asking for some pretty spectacular disasters.) The project requirements are written in English (I'll use the word to refer generically to a human language), initially in a narrative form that describes in the most general of terms what is wanted.

Next, the requirements are defined more formally, but still in English. Use cases (stories) and other defining documents get written. These are then passed to the programming team, who convert them to computer code.

You may never have thought about it, but there's a huge gap in there. I was first aware of it nearly 30 years ago when automating a section of a factory production line. The factory engineers had produced a clear description of what the line should do but were completely baffled by the computer code needed to implement their requirements. One of them asked this memorable question:

"If computers are so smart, why can't they understand what we want?"

This question struck home at the time and has stayed with me ever since. In that time computers have become vastly more powerful than those around when the question was asked, which makes it even more pertinent today. Why can't they understand what we want? Why do we have to translate our needs into a low-level form we find hard to understand, just so the computer can?

The answer lies in language. Computer "languages", not really being languages at all, have no way to express things at a level human beings feel comfortable with.

Actually, that's not quite true. If you use relational databases you'll know all about SQL. Here's a real language; real in that every word in its syntax relates in some way to databases. And to nothing else; it doesn't try to be a Swiss Army Knife in the way Java or JavaScript do. The generic term for this kind of language is Domain Specific Language, or DSL for short. And there aren't enough of them.

Tell it like it is

The gap I mentioned is between the use cases and the code we end up with, and it can often be filled by a DSL. For most domains it's possible to devise a "ubiquitous language" that can be understood both by domain experts and by computers. Like SQL, such a language is dedicated to its own domain (and has little use outside it), is strongly typed and typically rich with real-world objects such as Users, Products or Records, each of which corresponds to a Java bean or a JSON object in a conventional implementation. With such a DSL it's possible to convert the user specifications to an unambiguous description of the entities and processes involved in the domain and run this directly.
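As a toy illustration of the idea - the script syntax, the keywords and the tiny interpreter below are all invented for this sketch, not taken from any real DSL - a domain script and its runner might look like this:

```javascript
// Toy DSL sketch: the script is a set of English-like lines a domain
// expert can read and verify, executed by a small interpreter built
// from one handler per keyword. All names here are invented.

const script = `
create user Alice
create product Widget
assign Widget to Alice
`;

const state = { users: [], products: [], assignments: [] };

// The "keyword handlers" the article mentions: each follows the same
// standard pattern, taking the remaining words of the line as arguments
const handlers = {
  create: ([kind, name]) => state[kind + 's'].push(name),
  assign: ([product, , user]) => state.assignments.push({ product, user }),
};

// The heavily-used core: split the script into lines, dispatch each
// line to the handler named by its first word
for (const line of script.split('\n').map((l) => l.trim()).filter(Boolean)) {
  const [keyword, ...args] = line.split(/\s+/);
  handlers[keyword](args);
}
```

The point of the sketch is the division of labour: the script at the top is the part a domain expert owns and can check by reading, while the programmers' responsibility shrinks to the handlers and the dispatch loop.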

There are several advantages to this approach. Firstly there's speed of development, since a great part of the usual development stage is avoided. Secondly there's reliability. DSLs comprise a core that's heavily used - and after a while completely bug-free - plus a set of keyword handlers that follow standard principles in their implementation. In addition, domain experts as well as programmers will have exposure to the code, and as Eric Raymond's "Linus's Law" has it, "given enough eyeballs, all bugs are shallow". And finally there's long-term reliability, assured by making it far easier to recruit engineers who can understand the code well enough to take over the project.

Against this, the DSL has to be built and equipped with sufficient features to do its job properly. This is the job of a specialist programming team, but the techniques are well established and not hard to learn. A fairly comprehensive DSL can be constructed by one programmer in a few months. If enhancements are needed later it's relatively easy for new people to take over because they don't have to be domain experts as well as programmers.

Breaking the cycle of failure

So what I'm saying is that if you can express the domain requirements in some form of unambiguous English script, don't waste time and resources translating this into "standard" computer code but instead build a DSL to run the script directly. Some direct gains are

- the business logic, as expressed by the scripts, is 'owned' by domain experts at all stages of the programme.
- all code is verifiable by anyone having a good understanding of the business requirements.
- programmers are mainly responsible for maintaining the language itself - a small part of the total codebase.
- maintenance and bug fixing become easier and more reliable.
- the long-term integrity of the system is not dependent on maintaining a pool of key engineers.

Of course, the success or otherwise of a project is not due solely to code quality, but we have to start somewhere and fix what can be fixed. Continuing to repeat the mistakes of the past 50 years is not a sensible option.

In the articles that follow I'll go into more detail on some of the topics and issues I've highlighted above.
