Upstream Uptime #3: Making Local-Runnable Services The Norm

I recently wrote about upstream-centric architectures and how we have to alter our making when we adopt them. A key alteration: change the definition of "deploy" to include "local-runnable".

In that long list of problems I encountered in a real upstream-centric app I worked with a few years ago, a great many of the first-round woes come from one simple fact: "The upstream I’m coding against lives somewhere else."

Network outages, rebuild blockages, VPN passwords, version instability, and dataset- and app-sharing all create situations where a geek wants to work on her code but can’t, for reasons that are entirely outside her own, her team’s, and even her manager’s control.

And every one of these happens because the upstream is "out there" rather than "right here".

The specific recommendation: when a team deploys a new build, it deploys it in part as a locally-runnable app that any in-house developer can readily fire up and get running on their own box.

Whether new upstream builds come automatically straight from HEAD (the best approach) or are feature-branched, externally QA’d, and signed off on by big people and blah-blah-blah, when the button is pushed, we get an app that runs.

There are a host of potential objections to this idea, and plenty of possible variant responses. Most of them come down to two things: 1) That’s too hard. 2) That’s not servicing customers.

"That’s not servicing customers." This is just one more case of valuing made over making. Here’s the answer: We service customers by changing code, so if we can’t change code, we’re not servicing customers anyway, and remote upstreams routinely prevent us from changing code.

"That’s too hard." Here we have to spread out a little, because what’s too hard about it varies, so the specific response has to vary, too. But the generic answer is just this: "It’s too hard because we haven’t ever tried to make it easy."

We should consider some of the major variations on what makes local-runnable upstreams hard, and look at specific approaches to resolving them. BUT. Before we even go there, we need to be clear about one thing.

If it’s too hard for us to make local-runnable services, we have to make that easy before we start expecting downstream teams to be productive. Making it easy has to become the #1 priority, ahead of any plan that counts on downstreams working in parallel with us.

How much do you pay a geek? How many geeks are on your downstream team? How many hours a day are you willing to pay them for blind debugging of a remote app they don’t know, don’t own, and don’t control? How many hours a day are you willing to pay them to sit there and wait?

(Sorry. I get upset about this kinda stuff sometimes.)

Okay, variation #1 in the it’s-too-hard theme: it’s too hard because the prod dataset is 87 petabytes.

Answer: it’s not the size of the prod dataset that matters, it’s the schema and its enforcement and the variant cases.

Every variant case I’ve solved, I’ve solved for all its instances. If that’s not true, then your problem isn’t computable at all. (If the problem is "give this dataset integrity", that’s a whole-team problem that must be tackled at the root with a specific project to do so.) Downstreams don’t need the production dataset. They need a dataset that is sufficiently rich and interesting to contain every problem they’re going to have to solve. That’s all.
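To make that concrete, here is a minimal sketch in Python, assuming a hypothetical "order" record whose interesting variant cases are its status, its currency, and whether a discount applies. None of these names come from a real schema. The point it illustrates: a downstream's dataset needs one small record per combination of variant cases, not a copy of prod.

```python
import itertools
import json

# Hypothetical variant dimensions for an "order" record. In real life these
# come from the upstream team's knowledge of the schema's variant cases.
VARIANTS = {
    "status": ["pending", "shipped", "cancelled"],
    "currency": ["USD", "EUR"],
    "has_discount": [True, False],
}


def seed_orders(variants=VARIANTS):
    """Return one small record per combination of variant values."""
    keys = list(variants)
    combos = itertools.product(*(variants[key] for key in keys))
    records = []
    for number, combo in enumerate(combos, start=1):
        record = {"id": f"order-{number}"}
        record.update(zip(keys, combo))
        records.append(record)
    return records


if __name__ == "__main__":
    orders = seed_orders()
    print(len(orders))  # 3 * 2 * 2 = 12
    print(json.dumps(orders[0]))
```

Twelve records cover every combination here. A real schema has more dimensions, and some combinations will be impossible and can be pruned, but the seed data grows with the number of variant cases, not with the size of prod.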

Variation #2: That’s too hard because the app can only be deployed by one angry genius using Emacs to manually edit 19 files in 11 locations with 476 variables.

Well, in the immortal words of every plumber I’ve ever encountered, "well there’s yer problem right there". We are the ones who built the app so it could only be deployed by hand, by someone we pay $300K a year to wear headphones and growl at everyone who comes within ten feet. What if we didn’t build it that way?

In the most common real-life situation, it’s done that way because we’re integrating a bunch of COTS frameworks and tools, each with its own separate configuration values that live all over hell’s half-acre. There’s no getting around the COTS. Except. Wait. What if we wrote an app that does that?

They have those kinds of apps. They’re called installers. They themselves can often be written using COTS tools. (They’ll fall back to "not servicing the customer" here. You fall back to "not changing code" here.)
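A minimal sketch of the installer idea, in Python with only the standard library: all the scattered values live in one place, and each config file is rendered from a template. The keys, file paths, and template contents are all hypothetical stand-ins for whatever your COTS pieces actually read.

```python
import string
import tempfile
from pathlib import Path

# One source of truth for values that used to be hand-edited in many places.
# Every key, path, and template below is hypothetical.
SETTINGS = {
    "db_host": "localhost",
    "db_port": "5432",
    "queue_url": "amqp://localhost:5672",
}

# Each config file a COTS tool reads, rendered from the same SETTINGS.
TEMPLATES = {
    "app/db.properties": "host=$db_host\nport=$db_port\n",
    "worker/queue.env": "QUEUE_URL=$queue_url\n",
    "web/config.ini": "[database]\nhost = $db_host\nport = $db_port\n",
}


def install(root, settings=SETTINGS, templates=TEMPLATES):
    """Render every config file under root and return the paths written."""
    root = Path(root)
    written = []
    for relative_path, text in templates.items():
        target = root / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        # substitute() raises KeyError if a template names a missing setting.
        target.write_text(string.Template(text).substitute(settings))
        written.append(target)
    return written


if __name__ == "__main__":
    # Install into a throwaway directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        for path in install(tmp):
            print("wrote", path)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a template that names a value nobody supplied fails immediately with a `KeyError`, instead of quietly shipping a broken config to somebody's box.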

Variation #3: It’s too hard because upstream X only runs on the well-known platform Arugula, and all our devs only work in Windows.

Okay, then you’re going to have to ship your local-runnable as a virtual machine of some kind. Note: I am not saying this is trivial. It’s not. But you have to balance its cost against the cost of having whole teams not able to do their job because you don’t do it.

Variation #3A: No, you don’t understand, O/S Arugula cannot possibly run on a dev box of any available dev flavor, because it only runs on mainframes that are bigger than our whole building, or it only runs with a dongle we can’t afford to buy.

And now we come down to the hardest case. It’s the hardest case, because it means you’re going to have to be serious about wanting upstream-centric apps. You’re going to have to get your upstream team to write a fake. In the other immortal words of that same plumber, "This is gonna run ya."

BUT!! Don’t freak out quite yet. We have to determine the meaning of the word fake.

The panic — I feel it, too, I’m not gonna make fun of you here — is that the upstream in question is a monster of combinatoric complexity that does dozens of things, with reads AND writes on machines in boxcar warehouses all over the world. Simulating all that would kill us.

Soooooo, what if we simulated only part of it? What if we only simulated the part of it the downstream cares about? What if we gave up or let go or ran screaming away from anything like a real simulation?

The key is to understand the downstream’s needs. If the upstream’s monstrous complexity is in full use by the downstream, why on earth are we going upstream-centric? Why not just stick with the monolith we know and love? No, the downstream’s only using part of it.

What can we throw away?

  1. the full dataset.
  2. our own upstream: remote reads and writes.
  3. generality for all possible downstreams: one downstream = one fake.
  4. tons and tons of validation.
  5. noise fields the downstream doesn’t use.
  6. endpoints the downstream doesn’t use.
  7. operations the downstream doesn’t use.
  8. fields whose values are opaque to the downstream.
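Here is what a fake that throws most of that away might look like, as a sketch using only Python's standard-library `http.server`. It assumes, hypothetically, that our one downstream only ever reads orders by id and only looks at three fields. There is no remote dataset, no validation, and every other endpoint is a 404.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

# Hypothetical: the one downstream we serve only reads orders by id and only
# looks at "id", "status", and "total". Everything else has been thrown away.
ORDERS = {
    "order-1": {"id": "order-1", "status": "shipped", "total": "19.99"},
    "order-2": {"id": "order-2", "status": "cancelled", "total": "0.00"},
}


class FakeUpstream(BaseHTTPRequestHandler):
    def do_GET(self):
        prefix = "/orders/"
        order = None
        if self.path.startswith(prefix):
            order = ORDERS.get(self.path[len(prefix):])
        if order is None:
            # Unknown ids and every endpoint the downstream doesn't use: 404.
            self.send_error(404)
            return
        body = json.dumps(order).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the developer's console quiet


def start_fake(port=0):
    """Serve the fake on localhost in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), FakeUpstream)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_fake()
    port = server.server_address[1]
    with urlopen(f"http://127.0.0.1:{port}/orders/order-1") as response:
        print(json.load(response)["status"])  # shipped
    server.shutdown()
    server.server_close()
```

The whole thing runs on the developer's own box with no network, no VPN, and no shared dataset, which is exactly the list of first-round woes from the top of this piece.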

The list goes on and on, varying by your actual domain.

And here’s the thing. Once we’ve taken this step, of shipping that local-runnable fake, we can add to that fake in ways that are insanely useful to downstream teams. Things the app-in-prod would never allow. One example will suffice…

I’m betting you can guess what it does. Do we want it in prod? UNDER NO CONCEIVABLE CIRCUMSTANCES. But put that in your fake, and your downstream teams will paint russian-orthodox-icons of you.
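As a separate, hypothetical illustration of fake-only features, here is an in-process Python sketch with two of them: a reset back to known seed data, and a way to make the next few calls fail so downstream error handling can actually be exercised. All names here are invented for illustration.

```python
import copy

# Hypothetical seed data the fake always returns to on reset().
SEED = {"order-1": {"id": "order-1", "status": "shipped"}}


class UpstreamUnavailable(Exception):
    """Stands in for whatever error a real upstream outage produces."""


class FakeOrders:
    def __init__(self, seed=SEED):
        self._seed = seed
        self.reset()

    # --- the part that mimics the real upstream ---
    def get_order(self, order_id):
        if self._failures_left > 0:
            self._failures_left -= 1
            raise UpstreamUnavailable("simulated outage")
        return self._orders.get(order_id)

    def set_status(self, order_id, status):
        # No validation: raises KeyError for unknown ids, and that's fine in a fake.
        self._orders[order_id]["status"] = status

    # --- fake-only controls; these must never exist in prod ---
    def reset(self):
        """Throw away every write and return to a fresh copy of the seed data."""
        self._orders = copy.deepcopy(self._seed)
        self._failures_left = 0

    def fail_next(self, count=1):
        """Make the next `count` calls fail so downstream error paths get exercised."""
        self._failures_left = count


if __name__ == "__main__":
    fake = FakeOrders()
    fake.set_status("order-1", "cancelled")
    fake.reset()
    print(fake.get_order("order-1")["status"])  # shipped
    fake.fail_next()
    try:
        fake.get_order("order-1")
    except UpstreamUnavailable as err:
        print("downstream sees:", err)
```

Both controls are trivial to write in a fake and unthinkable in prod, which is the whole appeal.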

Foreshadowing: I have built once, and want to build again as open source, a single app that makes all this ridiculously easy for downstream and upstream alike. More on this later: I was very excited about it the first time, and I’ll be even more excited to share it.

So, wrapping up, the message I’m aiming at is this: we can and should make every upstream we write local-runnable for our downstream teams. This is the single biggest step we can take in making parallel development possible in our service-centric architectures.

It’s cheaper than it looks, folks, and it sidesteps a very large number of problems that prevent changing code, which is the central operation of professional software development.

Have a lovely rest-of-Sunday!
