Big Batch Releases

Big-batch releases, coordinated and controlled by a central intelligence, fail, and fail frequently. Several aspects of this are fascinating, because of the interplay of hard mathematical reality with human frailty. Let’s take a swing.

It’s Sunday muse-day, comfort food for the geekery-inclined. Enjoy, take respite, but don’t forget we want to change a lot more than just code.

Black Lives Matter. Stay safe, stay strong, stay angry, stay kind.

We can characterize releases by the number of stories in them. That’s a crude measure, but a viable one. Some organizations have extensive delays between deployments, and N, that number of stories, gets quite large.

Most companies start successfully with some kind of command & control approach — the idea of a central organizing intelligence that drives and controls workflow — in most areas. One of the first places this starts to break is here, in batch-size-N release policy.

I find several aspects of this interesting: 1) the irresistible math that makes this true, 2) the length of time we spend ignoring it, 3) the tweaks we add in an effort to resolve it, and 4) the way we can actually resolve it. I want to schmooze this out.

First, the math.

I don’t want to dig too deep into theory, but I want to give you enough theory to get the picture. Please lean in to me: I am going to cut a lot of corners to keep this from being a computer science treatise.

As with a lot of problems, we can use N as a rough indicator of the number of inputs. In our case, N is the sum of the number of stories, the number of dependencies, the number of affected modules, and the number of "merge collisions".

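Most problems get harder to solve as N grows. We can express how much harder they get with an expression of some kind that takes N. If there's a known algorithm whose running time is bounded by a polynomial in N, like N^2 or N^3, the problem is in P. NP is the class of problems whose candidate solutions can at least be checked in polynomial time, even when finding one is hard. The hardest problems in NP are called NP complete: no one knows a polynomial-time algorithm for any of them, and the best known general methods take time that grows faster than any polynomial, often like 2^N.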

Now, sidestepping nearly all rigor, what it means in meatspace to say that a problem is NP complete is to say that even very small increases in N can create staggeringly large increases in difficulty.
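To put rough numbers on "staggeringly large," here's a small Python sketch comparing N^2, a typical polynomial, with 2^N, the kind of growth common for exhaustive search. These are stand-in cost curves chosen for illustration, not measurements of any real release.

```python
def growth_table(ns):
    """Return (n, n^2, 2^n) rows: a polynomial cost curve next to an exponential one."""
    return [(n, n ** 2, 2 ** n) for n in ns]


for n, poly, expo in growth_table([5, 10, 20, 30, 40]):
    # Thousands separators make the exponential column easier to read.
    print(f"N={n:>2}  N^2={poly:>5}  2^N={expo:,}")
```

Going from N=20 to N=40 quadruples N^2, but it multiplies 2^N by roughly a million.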

And the insight I’m getting at, my intuition, is that the problem of reliably shipping batch-size-N releases is NP complete. It is a kind of packing or exact cover problem.

What does that mean to us?

Humans & machines solve particular instances of NP complete problems with small N’s all the time. The point isn’t that every instance is insoluble. Rather, it’s that tiny increases in N rapidly render such problems "not reliably solvable before the heat death of the universe".

That word "reliably" is important. You see, when you’re solving an NP complete problem, you do so by producing candidate solutions and trying them out. If the problem has a solution, it’s possible you’ll find it on the first try.
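Here's a toy sketch of that generate-and-test loop, applied to exact cover, the kind of problem mentioned above: choose some of the given subsets so that every element of the universe is covered exactly once. The function name and the smallest-first search order are assumptions for illustration only. Real solvers, such as Knuth's Algorithm X, prune far more cleverly, but they are still searching.

```python
from itertools import combinations


def exact_cover_by_guessing(universe, subsets):
    """Generate-and-test: try combinations of subsets, smallest combinations first.

    Returns (solution, tries). `solution` is a tuple of indices into `subsets`
    that covers every element of `universe` exactly once, or None if no
    combination works. `tries` counts how many candidates were checked.
    """
    tries = 0
    indices = range(len(subsets))
    for size in range(1, len(subsets) + 1):
        for combo in combinations(indices, size):
            tries += 1
            covered = [x for i in combo for x in subsets[i]]
            # Checking a candidate is cheap: the union must equal the universe,
            # and the total count must match, so nothing is covered twice.
            if len(covered) == len(universe) and set(covered) == universe:
                return combo, tries
    return None, tries


universe = {1, 2, 3, 4, 5, 6}
lucky = [{1, 2, 3}, {4, 5, 6}, {1, 4}, {2, 5}, {3, 6}]
unlucky = [{1, 4}, {2, 5}, {3, 6}, {1, 2, 3}, {4, 5, 6}]
print(exact_cover_by_guessing(universe, lucky))    # ((0, 1), 6)
print(exact_cover_by_guessing(universe, unlucky))  # ((3, 4), 15)
print(exact_cover_by_guessing({1, 2, 3}, [{1, 2}, {2, 3}, {1, 3}]))  # (None, 7)
```

The first two inputs describe the same problem. Only the order of the subsets differs, and that alone moves the answer from the 6th candidate to the 15th. When there's no solution at all, the loop has to check every one of the 2^k - 1 non-empty combinations of k subsets.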

Which leads me to the second fascinating aspect: teams can go a very long time experiencing the NP completeness of this problem but not seeing it.

Here’s how that works.

What happens is that a previously successful centralized C&C approach starts experiencing release woes. As I’ve foreshadowed, these are caused by tiny increases in the N. But they manifest in a pattern of release failures and successes.

We succeed, for a while, at this game, maybe 1 time in 10, cuz, remember, all of this is stochastic. We fail, 9 times in 10. But each time we fail, it’s typically just one or two little problems. We say, "We almost made it!"
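A back-of-the-envelope model shows why "1 in 10" and "just one or two little problems" go together. It assumes, purely for illustration, that each story in a release independently ships cleanly with some probability p, here 0.9. The numbers are hypothetical, not data from any team.

```python
def release_odds(p_story_ok, n_stories):
    """Toy model (an assumption, not a measurement): each story independently
    ships cleanly with probability p_story_ok.

    Returns (probability the whole release is clean, expected number of problems).
    """
    p_clean = p_story_ok ** n_stories              # every story must be clean
    expected_problems = n_stories * (1 - p_story_ok)
    return p_clean, expected_problems


for n in (1, 5, 10, 22):
    clean, problems = release_odds(0.9, n)
    print(f"N={n:>2}: clean release {clean:.0%}, expected problems {problems:.1f}")
```

With N=22, this gives roughly a 10% chance of a clean release and about 2.2 expected problems per attempt: "we almost made it," nine times out of ten. The model ignores interactions between stories entirely, so if anything it understates the combinatorics described here.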

The couple of little release issues aren’t hard to solve, but we spend a few hours running around like crazy, rushing to tie up the newfound loose ends, rebuild our app, and get it re-deployed. Or, on bad days, we roll back, with tears of frustration.

A little corner of all this: those one or two problems we get each round? They are different little problems every time. Forgot this link. Didn’t turn on the feature flag. Didn’t know that feature relied on the field we removed. Little shit, different shit, every time.

We don’t see a pattern in the issues, because it’s not a pattern, it’s a meta-pattern: the issues are coming up because the combinatorics are throwing out too many details for our consideration. Our N is starting to make our problem not reliably solvable.

On to the third fascinating aspect, the tweaks. It starts with this: from time to time, things go well.

These occasional wins thoroughly muddy the water, because they’re really just combinatoric luck. But they lead us to believe it’s possible to be lucky every time.

To be sure, one can take steps to improve one’s luck. In games, that’s called cheating. In computer science, it’s called "damn, that’s very, very clever." But you can’t eliminate luck. It’s built into the nature of how we solve NP complete problems.

So when we fail, we do two things: 1) fault our lack of effort and will, 2) add more rules and gates and checks. "We just didn’t try hard enough." "We need to add this to the list of things we check for."

We upgrade the seriousness of the release activity, making sure everyone knows how important it is to really try, to put in the hours, to think meticulously, to not make dumb mistakes. We become, I’m not joking here, very somber, frowning and nodding thoughtfully.

We have a handful of individuals and we task them with coordinating all this, and staying on the job, each time, until the release is complete. We reward them, but we’re also requiring them to work long hours under high stress, demanding they be heroic.

And we upgrade the process. There are new sign-offs. There are checklists. There are mandatory passes through generic automated code quality systems. There is a formal rolling schedule, with individuals signed up for each task.

Curiously, although these tweaks are designed to help us control our N-sized batch, they tend to have the actual effect of making the N larger, because it now incorporates these extra process elements into the problem. We can’t win for losing, to quote my Mom.

Raising the seriousness and the stakes also raises the tension. We get more blame-centric. We get more aware of what’s called "long pole" politics, where no one wants to be the long pole in the tent. We get angrier.

And now the fourth fascinating aspect: the actual resolution to this problem of painful releases is to do releases much more often, which makes no sense at all, until you consider that it has the effect of ruthlessly shrinking our N.

All of this is happening because our N has grown large enough to break our algorithm. We’ve reached a level of the problem where it is too big to solve in the amount of time we have to solve it.

We’ve been implicitly or explicitly assuming that 1 batch of size 10 takes exactly the same effort as 10 batches of size 1, or even less. But NP completeness is saying that’s just not true. The combinatorics at batch-size-10 are killing us.
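Here's a rough sketch of that comparison under two hypothetical cost models: counting the pairs of stories that might interact inside a batch, and counting the combinations of stories a search might have to consider. Neither is a real effort estimate. They only show how batch size changes the shape of the problem.

```python
from math import comb


def pairwise_interactions(n):
    """Hypothetical cost: number of story pairs that could interact in one batch."""
    return comb(n, 2)  # comb(1, 2) is 0: a lone story has no partner


def candidate_combinations(n):
    """Hypothetical cost: non-empty subsets of stories a search might consider."""
    return 2 ** n - 1


def total_cost(batch_sizes, cost):
    """Sum a per-batch cost model over a sequence of batches."""
    return sum(cost(n) for n in batch_sizes)


one_big = [10]
ten_small = [1] * 10
for name, cost in (("pairs", pairwise_interactions), ("subsets", candidate_combinations)):
    print(name, total_cost(one_big, cost), "vs", total_cost(ten_small, cost))
```

Under either model, one batch of 10 costs far more than ten batches of 1: 45 possible interacting pairs versus none, and 1,023 candidate combinations versus 10.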

The key techniques are 1) pull & swarm, 2) test-driven development, 3) trunk-based development, 4) path-focused story and code design, and 5) stable and trusted automation. We’ve only room to handwave each of these for now.

Pull and swarm: we pull one story, put all of our people on it, and take it all the way to "shipped", as a group, centered around that one piece of work. We use ensemble-programming, or pairs, or even just sitting together. One story at a time. All the way out the door.
Test-driven development: testing code before it is written is faster and cheaper than testing it after it’s written. But the thing is, it’s also more effective. This is our basic first-line safety net.
Trunk-based development: we do not use complex source branching strategies, which drive up our N. Instead, we pull from head, we push to head, we test at head, we ship from head.
Path-focused story and code design: our designs, of stories and code, have to focus not just on what they will eventually do, but the path by which they can be independently released while still being small enough to fit. This will involve radical change in style.
Stable and trusted automation: Every part of this process that can be automated wants to be. That automation needs to be rock-solid, more solid even than the app itself. It must be maintained and extended to our highest degree of excellence.
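One common way to let unfinished work live at head and still ship safely is a feature flag, which also came up earlier as a source of release-day surprises. This is only a sketch of that idea, not the author's prescription for path-focused design. The flag name NEW_TOTAL and the discount rule are hypothetical.

```python
import os


def new_total_enabled(env=os.environ):
    """Hypothetical flag: the new code path stays dark unless NEW_TOTAL is "on"."""
    return env.get("NEW_TOTAL", "off") == "on"


def legacy_total(prices):
    return sum(prices)


def new_total(prices):
    # First small slice of a hypothetical discount story: 10% off totals over 100.
    total = sum(prices)
    return total - total // 10 if total > 100 else total


def order_total(prices, env=os.environ):
    """Both paths ship from head; the flag decides which one runs."""
    if new_total_enabled(env):
        return new_total(prices)
    return legacy_total(prices)


print(order_total([150, 50], env={}))                   # 200, flag off
print(order_total([150, 50], env={"NEW_TOTAL": "on"}))  # 180, flag on
```

Because the new path is dark by default, the first slice can ship from head right away, and turning the flag on becomes its own small, separately testable step.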

Now. This is a lot. Every one of these five techniques requires learning and practice, and not a single one of them can be turned on by an act of fiat. They, themselves, need to be well-pathed.

Some of these, I and others have written about extensively, proposing various ways to take small steps toward them. Others, like path-focused design, less so in the past, but more so in the future. We’ll get to it. 🙂

It’s a lot, but it’s a bullet we have to bite. As N grows, the problems get so dramatically more difficult that all of our progress will draw to a halt. This isn’t a failure of will, or an insufficiency of rules, it’s straightforward combinatoric math.

A central intelligence can carry big-batch releases, an NP complete problem, to a certain N, but it can’t carry it beyond that point. It will fail, ever more frequently. Seeing this and doing something about it is a critical challenge for growing organizations.

Do you love the GeePaw Podcast?

If so, consider a monthly donation to help keep the content flowing. You can also subscribe for free to get weekly posts sent straight to your inbox. And to get more involved in the conversation, jump into the Camerata and start talking to other like-minded Change-Harvesters today.