TDD Pro-Tip: Make The Tacit Explicit

Refactoring Pro-Tip:

When I make tacit relationships explicit I nearly always improve my code. Once I move past syntactical refactorings into semantics, this is often my first set of moves.

So what does this mean, "tacit" vs "explicit", in code? Maybe the easiest way to come at it is with a very dumb example. The simplest tacit relationship I can think of is modeling a one-to-one relationship using two lists. The first list is composed of tokens. The second list is composed of offsets. We represent the line we just parsed by counting indices and appending to these lists.

When we find the end of a token, we put the token’s value at the end of the first list, and we put its offset at the end of the second list. Since both lists start out empty and grow in lockstep, tokens[x] has its offset at offsets[x].
The relationship between these two values, token and offset, is "tacit", semantically. As a reader of that code, the semantics are only locked in if you’ve already read and grokked 100% of the usages of those twin lists.
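
Here is a minimal sketch of those twin lists in Python. The function name `parse_line` and the split-on-spaces rule are my own assumptions for illustration, not code from a real parser. Notice that nothing in it says tokens[x] and offsets[x] belong together; you have to infer that from the matching appends.

```python
def parse_line(line):
    """Hypothetical tokenizer: split on spaces, tracking where each token starts.

    The pairing of tokens[i] with offsets[i] is tacit: it holds only
    because every append to one list is matched by an append to the other.
    """
    tokens = []
    offsets = []
    start = None  # index where the current token began, or None between tokens
    for index, char in enumerate(line):
        if char != " " and start is None:
            start = index
        elif char == " " and start is not None:
            tokens.append(line[start:index])
            offsets.append(start)
            start = None
    if start is not None:  # a token that runs to the end of the line
        tokens.append(line[start:])
        offsets.append(start)
    return tokens, offsets


tokens, offsets = parse_line("let x = 42")
```

Any caller who sorts, filters, or slices one of those lists without the other silently breaks the relationship, and nothing complains.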

A simple alternative design changes this: make one list, and put a composed type — a single object with multiple fields — into that list. Now the relationship is explicit. It can be grokked more readily, used more readily, and even enforced more readily.
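
A sketch of that alternative, again in Python and again with assumed names (`Token`, `parse_line`, and a split-on-spaces rule are mine, purely for illustration): one list, holding one composed type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical composed type: a value and its offset travel together."""
    value: str
    offset: int


def parse_line(line):
    """Hypothetical tokenizer: split on spaces, returning a single list of Tokens."""
    tokens = []
    start = None  # index where the current token began, or None between tokens
    # The appended trailing space flushes a token that runs to the end of the line.
    for index, char in enumerate(line + " "):
        if char != " " and start is None:
            start = index
        elif char == " " and start is not None:
            tokens.append(Token(value=line[start:index], offset=start))
            start = None
    return tokens


parsed = parse_line("let x = 42")
```

Now a reader who sees `parsed[1].offset` doesn't need to go hunting for a second list, and there's no way to append a value without its offset.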

Before we go any further, we need to consider two important aspects of the simple tacit vs simple explicit example: 1) At bottom, everything is explicit; it’s just unreachable at top or middle. 2) There’s more than one way to skin the tacit cat.

First, at bottom. You’ll notice that I referred to tokens[x] and offsets[x]. That tacit relationship I was talking about? It’s actually embedded in the [x] down there close to the metal. See, the metal can’t run using tacit relationships. That’s why humans are people and machines are hardware. That [x] is as explicit as anything else the metal has in its world. At bottom, it’s all explicit or it can’t run on a Turing machine.

So "tacit" and "explicit" are words we use to describe the experience of the human, not the experience of the metal. Degree-of-tacitness is about the experience of a human encountering the code. As such, it will always be — at best — intersubjective, not objective.

(That’s okay: Turing machines don’t write software for money in the early 21st century, humans do, and so inter-subjective non-binary notions like tacit and explicit are, for the moment and I personally suspect forever, extremely important to wrap our heads around.)

Second, even with this pathetically dumb example, we see that there is more than one way to skin the tacit cat. What if we kept our two lists, but we keep them privately? We make a little wall using our programming language. Inside, we keep two lists. Outside, every exposure of the operation of changing the lists is made in such a fashion that the one-to-one relationship is preserved, stable, and perfect. We test the hell outta that wall. Ship it.
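
A rough sketch of that little wall in Python. The class name `TokenTable` and its methods are hypothetical, and I'm assuming callers only index it with plain integers. The two lists are still in there, but the only way to change them keeps them in step.

```python
class TokenTable:
    """Hypothetical wrapper: two private lists, one public way to change them."""

    def __init__(self):
        self._values = []
        self._offsets = []

    def add(self, value, offset):
        # The only mutation point, so the two lists can never drift out of step.
        self._values.append(value)
        self._offsets.append(offset)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        # Assumes an integer index; hands back the pair, never one half alone.
        return self._values[index], self._offsets[index]


table = TokenTable()
table.add("let", 0)
table.add("x", 4)
```

From outside, `table[1]` gives you the value and the offset together. Whether the inside is two lists, one list, or something else entirely is nobody's business but the class's.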

We do this, by the way, all the time, without even intending to: does your programming language have a primitive or pseudo-primitive Dictionary or Map class? I bet that class’s API doesn’t give you the slightest indication about how the metal actually stores its data.

This is the fundamental concept of encapsulation. "On the internet, nobody knows you’re a dog." The API is the internet, and the dog is the two separate internal lists, and the caller is none the wiser or, for that matter, the worse off.

So what other kinds of tacit relationships are there, ones that are thicker or hairier than just the dumb example? Here’s a messy one I wrote myself just recently. Don’t ever let anyone tell you they’re a serious geek if they can’t give you good mistake stories.

(Brief pause while I anonymize this.)

A Season is a collection of Events. Each Event has a start date and an end date. That means that a Season has a period that is effectively the earliest start date to the latest end date of its Events. Did I write it so that a Season always takes a list of Events? No, dear reader, I did not. I wrote it so that the Season takes a Period. I gave it a field, period, to hold that, and I inited from the constructor, and bob’s yer uncle.

It’s even worse than that: I also gave Season a constructor that takes a list of Events and figures out the period and initializes the field.

The technical expression for this: "dumber’n’sackfullahammers".

How does such a thing happen? Code changes over time, and those changes map to changes in understanding required by changes in features. My sin wasn’t in making the mistake, it was actually in adding the second constructor and living with both of them for too long. Once it developed that a Season was never shorter or longer than its Events, a brighter monkey would have taken the time to make that relationship explicit.

The first constructor lets me make, effectively, illegal or cheating Seasons. The fact that, in production, in the business domain, such a thing was never actually done, well, that’s neither here nor there.
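
One way the explicit version might look, sketched in Python. This is not the anonymized production code: the field names, the frozen dataclasses, and the rule that a Season needs at least one Event are all assumptions I'm making for the sketch. The point is that there is no way to hand a Season a Period; the period can only come from the Events.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Period:
    start: date
    end: date


@dataclass(frozen=True)
class Event:
    start: date
    end: date


class Season:
    """Hypothetical Season whose period is derived from its Events, never supplied."""

    def __init__(self, events):
        self._events = tuple(events)
        if not self._events:
            # Assumption for this sketch: an empty Season has no meaningful period.
            raise ValueError("a Season needs at least one Event")

    @property
    def period(self):
        # Earliest start to latest end: the relationship is now stated in one place.
        return Period(
            start=min(event.start for event in self._events),
            end=max(event.end for event in self._events),
        )


season = Season([
    Event(date(2024, 3, 1), date(2024, 3, 3)),
    Event(date(2024, 1, 10), date(2024, 1, 12)),
])
```

With only this constructor, a Season whose period disagrees with its Events simply can't be built, in production or in a test.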

(I did it a lot in writing tests against higher-level Season functions. Re-rolling those tests to supply Events was an enormous pain, which was btw a strong signal of another bad smell we’ll talk about some other time.)

So, to sum up these still relatively simple cases. The first swings I take at refactoring beyond mere syntax are almost always looking for tacit relationships and logic, and finding ways to make them more explicit to the observer.

The metal doesn’t care. It’s all explicit to the metal. But the humans care, and when I leave my code’s assumptions unobvious or unenforced, I do the nearest human — me — a terrible disservice.
