Simple systems cause downtime

Aug 01, 2022

Yes, it is common to over-engineer software. Software always ends up doing more than it should.

You won’t solve these problems by refactoring to Erlang.
You won’t solve them with Kubernetes, cloud architecture, or micro-services either. These are all technical solutions to software problems, but software doing more than it needs to is not a technical problem.

Technicians sidestep these non-technical problems by advocating for simple systems. Plenty of loud voices do, usually under one of these banners:

  • KISS
  • YAGNI
  • Don’t optimize prematurely

These are valuable principles. However, I see people starting to under-engineer their solutions in their name. This post is a counterpoint to the well-beaten path of simple software.

Under-engineering 🔧🙉

Simplicity in software means different things to different people.

I want to start a blog. What’s the simplest solution?

  • Apache, FTP, and some static HTML?
  • Ghost running in a Docker container?
  • GitHub Pages and Hugo?
  • The basic WordPress tier?

The first problem with simple systems is that everyone has a different idea of what simple means. I want to talk about a particular flavor of simple software that I call under-engineering.

Under-engineering is the folk wisdom that the naive approach is the best.

Why spend months on an autoscaling serverless ecosystem with IaC pipelines and instrumentation? A cronjob on a VM does the job.

The idea of under-engineering hides behind the KISS and YAGNI principles.

KISS and YAGNI and their relatives are important ideas. Yet, some people misuse these principles in two ugly ways:

  • To fight the people who anticipate non-technical problems.
  • As a defense against new techniques, tools, or even new versions of old techniques and tools.

Don’t be that guy

A tangent on keeping up to date:

Software is moving. If you do not move along with it on your terms, someone will make you move along on theirs. Friendly reminder that .NET Core 3.1 is almost end of life, and Java 16 already is.

Requirements 😭

But nibblebit, you can solve non-technical problems with clear, concise, and complete requirements. If the developer and the user agree on clear requirements, that’s that! It’s then the developer’s job to translate them into a technical solution.

To that, I would say you are correct! And I hope you enjoy your first real software project. 🤪

Software projects are exploratory 🗺🧭🗻

The reality of professional programming is that clients don’t know their requirements. For users with money, a software project is more often than not an exploratory exercise. If it weren’t, they would not be speaking to a developer; they would be buying an existing solution.

Requirements will change over time, and under-engineering will lock your project down. Changing course will mean refactoring and technical debt, and the further down the road you go, the more painful each turn becomes.

You can’t know what they need 🔮

If you are a programmer, your primary problem domain is software. If you feel more qualified to make requirements than the end-user, you are making programming tools.

You cannot create requirements alone.

When an auctioneer, barber, or accountant comes to you with a problem, there will be caveats specific to their business. The domain expert will be as blind to these caveats as you are, and both of you will only discover them at an inconvenient time, usually too late to account for them. Under-engineering will cause the system to break down: you did not account for the unknown edge cases, and the system did not handle them with grace.

There’s always a corpse 💀

You will always deal with legacy. It’s 2022; out in the wild, everything has a preexisting software condition, so every problem you face comes with a legacy system attached. That legacy system will be old and unsupported. If you’re lucky, there will be no manuals. If not, the documentation contains legends, gossip, and blatant lies.

No one will let you plug in the new whiz-bang stuff with a big-bang release. You will always need to dry-run your system in production: measure the load and transfer traffic from the old system to the new one, bit by bit.

Under-engineer this migration, and your deployment will stall. You will be running between the show floor and the drawing room, and things will break in production.
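The bit-by-bit migration above can be sketched as a small router. This is a minimal Python sketch, not a prescribed tool: `MigrationRouter`, `new_share`, and `shift()` are hypothetical names, and both systems are assumed to be plain functions from a request to a response. Requests that are not yet routed to the new system are still shadowed to it, so disagreements show up before any user sees them.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Handler = Callable[[str], str]


@dataclass
class MigrationRouter:
    """Hypothetical router that moves live traffic from a legacy system to a new one."""

    legacy: Handler
    new: Handler
    new_share: float = 0.0  # fraction of live requests (0.0 to 1.0) served by the new system
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    mismatches: List[Tuple[str, str, str]] = field(default_factory=list)
    new_failures: int = 0

    def handle(self, request: str) -> str:
        if self.rng.random() < self.new_share:
            try:
                return self.new(request)
            except Exception:
                # The new system broke on live traffic: count it and fall back to legacy.
                self.new_failures += 1
                return self.legacy(request)
        answer = self.legacy(request)
        # Dry run in production: shadow the request to the new system and record
        # any disagreement, without letting it affect what the user receives.
        try:
            shadow = self.new(request)
            if shadow != answer:
                self.mismatches.append((request, answer, shadow))
        except Exception:
            self.new_failures += 1
        return answer

    def shift(self, step: float = 0.1) -> None:
        """Move a little more live traffic to the new system, capped at 100%."""
        # round() keeps repeated 0.1 steps from drifting to 0.9999999999999999
        self.new_share = min(1.0, round(self.new_share + step, 6))
```

A rollout loop would call `shift()` only while `mismatches` and `new_failures` stay under whatever threshold the team and the client agree on.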

Money and time ⌛💰

There’s the budget problem. People will not pay you to set up a two-year research project. You won’t have time to collect samples of an under-documented UBL dialect, and you can’t analyze every possible edge case before you start coding. You will have to handle those contingencies on the go. You will not get two months of post-production: no time to polish, reach full unit-test coverage, or set up integration tests. Users will want to run the first thing hot out of the shop and will be very upset when it breaks. They will be wrong, and then they will hire someone else. You have to account for all of that during development.

Under-engineering will speed up the process but lock you out of quality control. It will leave you with exactly one shot at solving all possible problems.

Car Toaster 🍞🚘

But nibblebit, sometimes users will tell you they want a toaster in their car. What they mean is: I don’t have time to make breakfast in the morning.

Cool, but you can’t get hired for a job and send the client on a holistic journey of self-discovery. You’re not a lifestyle guru; you’re a programmer. You can’t tell the user what they need; you must show them.

Under-engineering forces you to keep explaining your clever solution instead of iterating. Iterating from a bad place to a good place is better than shortcutting to a preferred outcome.

What do? 🤔

So what, nibbles, we should strive to make each system component as complex as possible? No. Over-engineering is a problem, and a bigger one than under-engineering. This post is not an attack on KISS and YAGNI. It is a ward against irresponsible development.

But nibblebit, something something… move fast and break things!

I’m not sure where I heard it, but the comeback is that you end up moving slowly with broken things.

Rather than making simple systems, create resilient systems 🔧➡ 🧱

Restricting a system is not a foolproof way to keep it from breaking. Yes, fewer moving parts mean fewer parts that can break. But when a minimal system does break, it dies.

When software breaks, someone needs to know the when, the how, and the why. It needs to fail with grace. It needs to report to all stakeholders in different ways.

It needs to be open for instrumentation and auto-healing. It needs to be deployable on any number of devices and runtimes. These variables will change over time. Taking the most minimal approach will leave you blind and immobile.

Predict problems. Look at your project at every level, from the system as a whole down to a single line of code. The system should handle any edge case with grace and tell you how it failed and how it recovered. That will not be a simple process, but the result should be very hard to break.
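One way to make failures graceful and reportable is to wrap risky work in retries, a fallback, and a report of the when, the how, and the why. This is a minimal Python sketch, assuming a task is a zero-argument function and each stakeholder is represented by a notifier callback; `run_resilient`, `FailureReport`, and the notifiers are hypothetical, and a retry loop is only one simple form of auto-healing.

```python
import logging
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass
class FailureReport:
    when: str        # UTC timestamp of when the report was produced
    how: str         # exception type, e.g. "TimeoutError"
    why: str         # exception message
    attempts: int    # how many attempts were made
    recovered: bool  # True if a later attempt succeeded


Notifier = Callable[[FailureReport], None]


def _notify(notifiers: List[Notifier], exc: Exception, attempts: int, recovered: bool) -> None:
    report = FailureReport(
        when=datetime.now(timezone.utc).isoformat(),
        how=type(exc).__name__,
        why=str(exc),
        attempts=attempts,
        recovered=recovered,
    )
    for notify in notifiers:
        try:
            notify(report)
        except Exception:
            # A broken notifier must not take the task down with it.
            logging.exception("notifier failed")


def run_resilient(task: Callable[[], T], fallback: T, notifiers: List[Notifier],
                  retries: int = 3, delay: float = 0.0) -> T:
    """Run task with retries; report failures; return fallback if every attempt fails."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            result = task()
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                time.sleep(delay * attempt)  # linear back-off before the next attempt
            continue
        if last_error is not None:
            # Recovered after at least one failure: stakeholders still hear about it.
            _notify(notifiers, last_error, attempt, recovered=True)
        return result
    # Every attempt failed: degrade to the fallback instead of crashing.
    _notify(notifiers, last_error, retries, recovered=False)
    return fallback
```

In practice the notifier list might hold one callback that pages the on-call developer and another that updates a status page for users; both are left abstract here.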

Rather than optimizing for simplicity, make data available 🔧➡ 🛰📡🔬🔭🔦

Focus less on jailing your code. Nobody wants to touch it anyway once it’s done. Instead, focus more on making the data your code produces accessible. All results and side effects of your code should be available to anyone. This information should be unambiguous. Your future self will be grateful.
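Making results and side effects available can start as small as emitting one structured record per operation. This is a minimal Python sketch, assuming JSON Lines as the format; `Event`, `publish`, and the field names are hypothetical, and the sink can be anything that accepts a string, such as `file.write` or `list.append`.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, List


@dataclass
class Event:
    """One unambiguous record of something the code did."""

    operation: str                                         # what ran, e.g. "invoice.create"
    result: Any                                            # what it returned (must be JSON-serializable)
    side_effects: List[str] = field(default_factory=list)  # what else it touched
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def publish(event: Event, sink: Callable[[str], Any]) -> None:
    """Write the event to the sink as a single newline-terminated JSON line."""
    sink(json.dumps(asdict(event), sort_keys=True) + "\n")
```

Because every record carries the same keys, a later reader (a human, a log shipper, or a script) doesn't have to guess what a line means.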

Rather than trying to clarify requirements, set technical limits 💡 ➡ 👮‍♀️

You can’t predict what a product should and shouldn’t do in the future, so don’t try. Instead, start setting technical limits:

  • max concurrent users
  • max database size
  • max retention period
  • max 100ms response time
  • uptime guarantee

These are often only tangential to business logic, but they give you room to breathe. They keep you from optimizing prematurely and guard against over-engineering.
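Technical limits are easiest to hold onto when they live in code and get checked. This is a minimal Python sketch, assuming the limits from the list above; the `Limits` class and every number in it are hypothetical placeholders, and the 100ms figure is treated here as an upper bound on p99 response time.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class Limits:
    """Hypothetical technical limits; the numbers are placeholders to agree on with the client."""

    max_concurrent_users: int = 500
    max_database_bytes: int = 50 * 1024 ** 3      # 50 GiB
    max_retention: timedelta = timedelta(days=365)
    max_response_ms: float = 100.0                # treated here as an upper bound on p99
    uptime_target: float = 0.995                  # fraction of the period the service is up

    def downtime_budget(self, period: timedelta) -> timedelta:
        """Downtime the uptime target allows over a given period."""
        return period * (1 - self.uptime_target)

    def violations(self, concurrent_users: int, database_bytes: int,
                   oldest_record_age: timedelta, p99_response_ms: float,
                   measured_uptime: float) -> List[str]:
        """Return a human-readable line for every limit the measurements exceed."""
        found = []
        if concurrent_users > self.max_concurrent_users:
            found.append(f"concurrent users: {concurrent_users} > {self.max_concurrent_users}")
        if database_bytes > self.max_database_bytes:
            found.append(f"database size: {database_bytes} B > {self.max_database_bytes} B")
        if oldest_record_age > self.max_retention:
            found.append(f"retention: {oldest_record_age} > {self.max_retention}")
        if p99_response_ms > self.max_response_ms:
            found.append(f"p99 response: {p99_response_ms} ms > {self.max_response_ms} ms")
        if measured_uptime < self.uptime_target:
            found.append(f"uptime: {measured_uptime:.4f} < {self.uptime_target}")
        return found
```

With the placeholder 99.5% uptime target, a 30-day month leaves roughly 216 minutes of downtime budget, which is a concrete number to put in front of the client.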

Software is hard 🗿

The fact is that solutions to already-solved problems are not that valuable. Each new software project is, by definition, a novel problem. You do not know what will come up in the next milestone session, so prepare for anything. Yes, in the end, it comes down to balancing caution and pragmatism.

There are books and movements dedicated to being a pragmatic programmer. I don’t see many people standing up against under-engineering.

We’re not making green-field social networks backed by VCs and literal biblical angels anymore. That time has passed. We can’t afford the luxury of breaking things. We can’t afford to keep things simple.


Yasen Dinkov
