Upstream Uptime #4: Content-Level Versioning and Diagnostics

Part 4 of 4 in the series Upstream Uptime

Half of the point of upstream-centric architectures is simultaneous change, and that means the content needs versioning & diagnostics, not just our transport.

The biggest single difference between a modern upstream-centric architecture and our database apps: the database app doesn’t change without our control. We choose how/when we upgrade.

In effect, the database, even though it’s an upstream, feels like and can be treated like just another part of the language syntax. Whether our code works, or doesn’t, and how it doesn’t, it’s all relatively easy to figure out. Further, once it does work, it stays working.

This isn’t true in an upstream-centric world: A huge part of the value of that world is that people other than us can be changing any service any time they like. To support that, we need different practices than the ones we used when we were programming against databases.

I’ve seen orgs try to do this kind of parallel development using a lot of different control-oriented techniques: synchronization points, one-step-ahead staging, "architecture boards", super-configurable feature-toggles, and so on.

Sadly, these approaches tend to just fall back into all the problems the monolith caused us. They tend to work extremely well in PowerPoint org charts, and not at all down on the ground.

We’ve already discussed localability as a changed practice. This is making it so that every dev can run her own version of the service on her own box as she works. That’s powerful stuff, the biggest bang for your buck. But it’s not enough.

Because we control the change in that database world, versioning "doesn’t matter much". We’re the ones who turn on the next version, and if it’s turned on, we know globally what version is in play.

Because we control the change in that database world, diagnostics "don’t matter much". Regardless of their content or ease-of-use, they basically just tell us we’re doing it wrong. Once we figure out how, then we do it right. We control change, so once-right means always-right.

The specific advice I offer: give versioning and diagnostic info at the content level, not the transport level. This is like saying that, instead of putting that info on the envelope, put it at the top of the letter.

Take JSON for a minute. Most of you are familiar with throwing around long JSON trees or arrays. We have lots of libraries for doing that kind of thing, and it’s quite rare to have to roll it yourself. What this means in JSON is that we have additional fields in every response that include a version and build, and additional fields that include response-code and message.
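Here is a minimal sketch of what such a response body might look like, using Python's standard `json` module. The field names (`version`, `build`, `responseCode`, `message`, `payload`), the helper `make_response`, and the sample values are all assumptions made for illustration; pick whatever names suit your team.

```python
import json


def make_response(payload, version, build, code="OK", message=""):
    """Put version and diagnostic fields in the body, next to the payload."""
    return {
        "version": version,    # the content version the sender was aiming for
        "build": build,        # build number of the deployed upstream
        "responseCode": code,  # what happened, in machine-readable form
        "message": message,    # what happened, in human-readable form
        "payload": payload,    # the actual content (hypothetical customer record)
    }


body = json.dumps(make_response({"customerId": 42, "name": "Ada"}, version=3, build=1187))

# Any client that can parse the body can read all four fields.
parsed = json.loads(body)
print(parsed["version"], parsed["build"], parsed["responseCode"])
```

Nothing here touches headers or status lines. The four fields travel wherever the body travels, including into log files and saved test fixtures.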

(On the request side, of course, there are no diagnostics needed. We just send the version.)

Why? Not, "why send it", we’ve gone over that, but "why in the body of the letter?"

The short answer: because that’s the easiest place we can guarantee every client can get at it.

For versioning, you may find this advice pretty radical. For many years, the prevailing practice has been to do versioning out at the transport layer. In HTTP, this is done via a header (or sometimes via a path variable, …/api/1/doSomething).

This is a pain, and it’s always been a pain, for one main reason: it means the code we write to deal with content has to be aware of the code we write to deal with transport.

Now, that pain just doesn’t matter to us much when we have full control over change. We only change things once in a blue moon, it’s a focused effort, and we just pay the price when we want the change. Hardly the end of the world, once every year or two.

But it costs the same every time you do it, whether you do it once in a blue moon or every couple days. In a modern upstream-centric architecture, every couple days is the norm. And that tax becomes way too high to pay for everyday business.

So every response wants to include all four things: version, build-number, response-code, and message. Let’s make sure we know why we want all four. Version is the most obvious. Of course we want to know this content is in version 3 not version 5. In the past, I’ve tried to do this via algorithm. Look for certain fields and, basically, guess. But the sender knows what version it was aiming for, why throw that info away?
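To make the "don't guess" point concrete, here is a small sketch of a downstream that picks its parser from the declared version instead of sniffing fields. The customer shapes for versions 3 and 5, the parser functions, and the `version`/`payload` field names are invented for the example.

```python
def parse_v3(payload):
    # Hypothetical v3 shape: a single "name" field.
    return {"full_name": payload["name"]}


def parse_v5(payload):
    # Hypothetical v5 shape: the name split into two fields.
    return {"full_name": f'{payload["firstName"]} {payload["lastName"]}'}


PARSERS = {3: parse_v3, 5: parse_v5}


def parse_customer(response):
    """Dispatch on the version the sender declared; never guess from the fields."""
    version = response["version"]
    parser = PARSERS.get(version)
    if parser is None:
        raise ValueError(f"unsupported content version: {version}")
    return parser(response["payload"])


print(parse_customer({"version": 3, "payload": {"name": "Ada Lovelace"}}))
```

An unknown version fails loudly with the version number in the error, which beats a parser quietly guessing wrong.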

Build-number is trickier. Even otherwise bright people will wonder why we bother. First, a definition: build-number is a guaranteed monotonically-increasing integer that is automatically incremented on every successful deploy.

You’ll say, wait, why isn’t that the version number? Because, and I know this will startle you, people are doofuses. Yes, that’s right. People. Doofuses. People are doofuses, and as a result, we often change de facto versions without changing de jure versions.

I’m on your upstream team. Just like you, I work for a living. Just like you, I don’t always have the bandwidth to think of all the things. Just like you, I think I’m changing code in one way and I’m really changing it in another way altogether. If there’s an automatic build number, you, my downstream, can tell when a change happened that affects you. You can tell me. You can code your way around it. You can compare the same message from two different builds. You can do all kinds of things, but not if you can’t see it.
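One way a downstream might use that build number, sketched under assumptions: remember the last build seen for each upstream and version, and speak up when it moves. The class `BuildWatcher`, the upstream name `pricing`, and the field names are hypothetical.

```python
class BuildWatcher:
    """Remember the last build seen per upstream and version; report changes."""

    def __init__(self):
        self.last_seen = {}  # (upstream name, version) -> build number

    def observe(self, upstream, response):
        key = (upstream, response["version"])
        build = response["build"]
        previous = self.last_seen.get(key)
        self.last_seen[key] = build
        if previous is not None and build != previous:
            # Same de jure version, different build: worth a log line.
            return f"{upstream} v{response['version']}: build {previous} -> {build}"
        return None


watcher = BuildWatcher()
watcher.observe("pricing", {"version": 3, "build": 1187})
print(watcher.observe("pricing", {"version": 3, "build": 1190}))
```

When a test that passed yesterday fails today, a note like this tells you whether the upstream was redeployed in between, even if nobody bumped the version.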

Next we have response-code. This is another obvious one. If something went wrong, or if it went right but in a variant manner, the response-code is perfect for telling us that.

And finally the message. Why a message? Because we can put all kinds of human-readable detail in that message that we can’t put into a response code.

For anyone who’s worked in these environments, I can drive the value of that message home in just one noun phrase: "500 Internal Server Error". There is no more useless error code in all of recorded human history. It means "Something went wrong."

Notice, again, this is all about making. When the upstream is finally stable and wonderful and we’re lounging around in the tree-lined park of the city on the hill, ain’t gonna be no 500’s. The problem is, it’s not finished. Nor should it be, remember, we’re still making it.

Far and away the most common 500 in a developing upstream: a possible case either hasn’t been finished or isn’t even known to the implementing team yet. So what is that case? If only we had some useful natural-language text to give us a hint. If only... Hey, wait, I got an idea.

If the message gives us a hint, it’s possible even those of us on the downstream side can figure it out. And the upstream side can probably figure it out easily. Compare & contrast with filling out a ticket and adding it to the upstream’s queue.
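As a sketch of the difference, here is a made-up upstream handler that answers an unfinished case with a response-code and a message that says which case it was. The shipping-quote scenario, the function `quote_shipping`, the code strings, and the field names are all assumptions for illustration.

```python
def quote_shipping(request, version=3, build=1187):
    """Hypothetical upstream handler: every answer says what happened and why."""
    base = {"version": version, "build": build}
    region = request.get("region")

    if region == "domestic":
        return {**base, "responseCode": "OK", "message": "", "payload": {"cost": 5.00}}

    if region == "international":
        # A known case that isn't finished yet: say so, in words.
        return {
            **base,
            "responseCode": "NOT_IMPLEMENTED",
            "message": "international quotes are not implemented yet; "
                       "only region='domestic' works in this build",
            "payload": None,
        }

    # A case the upstream team never thought of: name the offending value.
    return {
        **base,
        "responseCode": "BAD_REQUEST",
        "message": f"unknown region {region!r}; expected 'domestic' or 'international'",
        "payload": None,
    }


print(quote_shipping({"region": "lunar"})["message"])
```

A downstream dev reading that message can usually fix her request, or tell the upstream team exactly which case is missing, without opening a ticket and waiting.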

So, we want a version, a build-number, a response-code, and a message. We also want it all at the content level. Let’s take one more minute to understand the content level part. We want it in the letter not the envelope for the same reason we put things in real letters instead of, OR IN ADDITION TO, on the envelope: because we want to throw envelopes away.

I’m hoping you’re developing in a hexagonal (ports & adapters) style. If you’re not, look that up. But even if you’re not, you’ll be able to get this and take advantage of it.

Layers. We want our app to have a core and a bunch of concentric circles around it. We want to have each layer be as independent and self-similar as possible. We don’t want to cross layers with any degree of complexity.

(We want this because we have bodies and those bodies have profound non-negotiable limits on how many things they can think about at one time. Layers reduce the mental bandwidth we need, same as any other conceptual chunking.)

The transport layer, the part of your app (you most likely didn’t write) that sends requests and gets responses, is deeply involved with all sorts of incredibly intricate manipulations. We want to use it and leave it. We don’t want to write it or debug it or be in it.

The upshot: add those four fields to the letter, where anyone can see them without having the envelope. When we can do that, we gain layering, we gain hexagonality, we gain testability, we gain grokkability.
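Here is a rough sketch of what that buys us in testing. The core function below sees only the parsed letter, so a test can feed it a canned string with no HTTP client, headers, or status codes anywhere in sight. The function `interpret`, the exception `UpstreamProblem`, and the field names are hypothetical.

```python
import json


class UpstreamProblem(Exception):
    """Raised by the core when the letter itself reports trouble."""


def interpret(content):
    """Core logic: reads the letter, never the envelope."""
    if content["responseCode"] != "OK":
        raise UpstreamProblem(
            f'v{content["version"]} build {content["build"]}: '
            f'{content["responseCode"]}: {content["message"]}'
        )
    return content["payload"]


# A canned letter, as it might be saved from a log or written by hand for a test.
canned = '{"version": 3, "build": 1187, "responseCode": "OK", "message": "", "payload": {"cost": 5.0}}'
print(interpret(json.loads(canned)))
```

Because version, build, response-code, and message all ride in the body, the error the core raises carries everything needed for a useful bug report, and the transport adapter stays a thin thing we use and leave.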

I said some time back, in my agility we don’t restrict change, we embrace it. Microservice architectures, rolling a downstream that depends on many upstreams, are a perfect place for us to learn to embrace change.

It’s a surprisingly cool late Sunday morning here.

I hope you also have a surprisingly cool day!
