Microtest TDD is Gray-Box


Microtest TDD is a gray-box testing approach, neither entirely inside nor outside the code it tests. Let’s talk that over.

Before we dive in:

Black lives matter. Stay safe, stay strong, stay kind, stay angry. Let’s not just embrace change, let’s initiate it.

Any step you take helps, provided only that it’s doable, it’s not definitely backwards, and it’s not the last step.

In the ’70s, an important movement developed in testing theory, called "black box" testing. The idea is straightforward: We can prove our software works by treating it as a closed opaque container with only some knobs and meters on the front of it, a "black box".

The tests we write don’t know what’s inside the box; they only know the specification of how the surface controls are supposed to work. If the controls don’t work that way, it’s a fail; if they do, it’s a pass.

This idea has great merit under certain circumstances. It also has a kind of intellectual simplicity and purity that is profoundly pleasing to both the theorist and the newcomer. "It’s just that simple."

It also received considerable backing from the sociotechnical circumstances surrounding it. It allows a division of labor, for instance: there are testers and there are developers, and never the twain shall meet. And at the time, it was also by and large viable. We could do it.

It really only has two weaknesses.

  1. It lends itself too readily to what I’ll call "legalistic" behavior, and
  2. It has an unavoidable non-linear cost factor correlated to combinatorics.

Notice that both of those weaknesses loom far larger today than they did when the idea was born. Modern software development differs from that of forty years ago in two ways that make these flaws much worse.

First, modern development is dramatically more collaborative than it was in 1980. When I started, it was routine to see major software packages written entirely by a single individual. That is almost never the case today.

Second, modern development is dramatically more combinatoric than in 1980. My first word-processor/spreadsheet/database app ran in 32KB of memory. I doubt there’s a single app on your phone, let alone your computer, that is that small. And that size translates directly into combinatorics.

Consider the legalism flaw. The black-box model involves two players with two artifacts between them. A tester and a developer are divided by a box and a formal spec. In theory, that is all it takes to do this kind of testing.

The spec is a kind of contract. It contains lots of little formal details. To serve its purpose it must be intricate and complete, and it must spell everything out. And the players must agree about the meaning and the importance of every one of those details.

If they don’t agree, remember, that’s the difference between pass and fail, which is a very high-stakes game. If you think this is a recipe for progress, I advise you to spend a little time with the tax code in your nation, state, province, county, or town.

A real-world example: the spec was the old report, which drew the column headers rotated 90 degrees. The team came to the customer after two weeks and told him they still weren’t done.

Surprising, cuz he’d seen a data-correct version on the second day of development. The problem: their PDFWriter library couldn’t rotate the text. I cannot readily or politely express the customer’s indifference to the rotation of the header text. But it was in the spec.

Legalism like this slows us down in lots of different ways. That time it hit implementation, but it’s also an ever-present tax we pay in planning.

Modern software development has so many people in it that the business of drawing up, managing, changing, and negotiating formal contracts explodes.

What about combinatorics? Here a simple example will suffice. Suppose the outcome of a procedure depends on five independent boolean flags, A, B, C, D, and E. Each processed element has an arbitrary mix of those five flags.

To test every combination takes 2^5 = 32 tests. Doable. But of course, every flag we add doubles that number. If each value is partitioned into more than the two a boolean has, it gets even thicker. And it grows fast, much faster than a straight line.
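A minimal Python sketch of what "test every combination" means in practice: it generates every True/False assignment of the five flags with `itertools.product`, then shows the count doubling as flags are added. The helper `all_flag_combinations` and the extra flag names (`X0`, `X1`, ...) are hypothetical, chosen only for illustration.

```python
from itertools import product


def all_flag_combinations(flag_names):
    """Yield one dict per combination of True/False values for the named flags."""
    for values in product((False, True), repeat=len(flag_names)):
        yield dict(zip(flag_names, values))


flags = ("A", "B", "C", "D", "E")
cases = list(all_flag_combinations(flags))
print(len(cases))  # 32 = 2**5

# Every extra flag doubles the number of cases.
for extra in range(1, 4):
    more = flags + tuple(f"X{i}" for i in range(extra))
    print(len(more), "flags:", len(list(all_flag_combinations(more))))
```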

A simple webpage with ten fields, averaging three partitions each, takes about 60,000 tests (3^10 = 59,049) to cover completely.
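The same arithmetic as a short sketch, assuming every field has exactly three partitions (treating the average as exact): the exhaustive case count is simply the product of the partition counts. `exhaustive_case_count` is a hypothetical helper name.

```python
from math import prod


def exhaustive_case_count(partitions_per_input):
    """Cases needed to cover every combination, given each input's partition count."""
    return prod(partitions_per_input)


print(exhaustive_case_count([2] * 5))   # 32: five boolean flags
print(exhaustive_case_count([3] * 10))  # 59049: ten fields, three partitions each
print(exhaustive_case_count([3] * 11))  # 177147: one more field triples it
```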

To quote one of the great sages from my youth: "Ruh-roh."

The size of modern applications isn’t just a number of bytes; it’s combinatorics. As with so many of these curves, we see that when the problem gets bigger, the test counts get bigger-er. They explode, and rapidly, more rapidly than we can possibly handle.

As we wrestle with this, we begin to ease into the gray-box concept. Remember our five independent variables? What if variable A’s value, when true, dominates the others? That is, if A is true, we always return the answer 0. That, obviously, is in the spec.

Does that change the number of tests we have to write to get perfect coverage? It does not. It seems like it would: it seems like you’d now need only 2^4 + 1 = 17 tests. But it doesn’t. Why not? Because you’re not allowed to know what means the box uses to produce its result.

If my internal mechanism were borked, it could return 0 for a true A in your one extra test, but then return 42 for all the other cases where A is true. You would pass the app, but the app would be broken. Dagnabbit.

Now think for a minute how a competent geek would code this. She’d write: if A is true, return 0; then go on to the B, C, D, E logic, right? Of course she would. Geeks are insane; they’re not stupid.

You can guess the geek will solve it this way, and it’s a good bet, but here’s the thing: YOU CAN NOT POSSIBLY KNOW THIS WITHOUT LOOKING INSIDE THE BOX.
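To make the haggle concrete, here is a hypothetical Python sketch. The spec (A true gives 0, otherwise count the true flags among B..E) and both implementations are invented for illustration. The `geek` version strips A first; the `borked` version returns 42 for most A-true inputs. Both pass the 17-case suite; only the full 32-case suite tells them apart, and only reading the code tells you which one you actually have.

```python
from itertools import product

FLAGS = ("A", "B", "C", "D", "E")


def spec(f):
    """Hypothetical spec: A=True always gives 0; otherwise count true flags among B..E."""
    return 0 if f["A"] else sum(f[n] for n in FLAGS[1:])


def geek(f):
    """Strips the A case first, then runs the B..E logic."""
    if f["A"]:
        return 0
    return sum(f[n] for n in FLAGS[1:])


def borked(f):
    """Gets A=True right only when B..E are all False; returns 42 otherwise."""
    if f["A"]:
        return 42 if any(f[n] for n in FLAGS[1:]) else 0
    return sum(f[n] for n in FLAGS[1:])


def all_cases():
    """All 2**5 = 32 combinations."""
    for values in product((False, True), repeat=5):
        yield dict(zip(FLAGS, values))


def seventeen_cases():
    """One A=True case plus the 16 A=False combinations of B..E."""
    yield dict(zip(FLAGS, (True, False, False, False, False)))
    for values in product((False, True), repeat=4):
        yield dict(zip(FLAGS, (False,) + values))


def failures(impl, cases):
    """Count the cases where an implementation disagrees with the spec."""
    return sum(1 for f in cases if impl(f) != spec(f))


for impl in (geek, borked):
    print(impl.__name__, failures(impl, seventeen_cases()), failures(impl, all_cases()))
```

Running it prints `geek 0 0` and `borked 0 15`: neither fails the 17 cases, while the borked version fails 15 of the full 32.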

Apocryphal story, usually told of GB Shaw: He asks the woman sitting next to him if she’ll sleep with him for a million pounds. She says yes.

He says, "How about for five pounds?" She says, "What kind of person do you think I am?" He says, "Madam, we’ve already established that, now we’re just haggling."

If you were willing to ship at 17 tests, we’ve already established what we are, and now we’re just haggling.

Gray-box tests, the kind a microtesting TDD’er writes, are tests that look like they’re black-box but don’t fully cover the combinatorics, because they cheat: they look inside the box to establish what needs to be tested.

Gray-box tests are inherently risky. By definition, they do not cover the full range of combinatorics. Repeat that to yourself: gray-box testing is risky. It has to be. The problem is there’s no other way. Black-box testing explodes so quickly that it is not remotely viable.

So, from the beginning, if we adopt gray-box testing, we’re taking a risk, and we’re relying on our judgment. And we’re doing this because we’re in this for the money. That’s two of the five underlying premises of TDD, and we’re about to mention another.

Watch my video on this: Five Underplayed Premises Of TDD.

If there’s a "strip the A first" part of the code, we’re safe with our 17 tests. If there’s not, we’re not. We can look at the code and see. That is the beginning of the gray box.

But we can do even more than that, now that we’ve bitten the bullet of risk and judgment and money. We can test exactly that the A case is split off. If we write our code to explicitly strip that case we can test it, at the micro level, to confirm that that happens.

To do this, we lean on another of the five underplayed premises: the steering premise. Tests and testability are first-class citizens in design. If you want to know that A-stripping happens, build the code in one of several ways that make it impossible for it not to happen, and test it.
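One of those "several ways," sketched in Python under stated assumptions: the B..E logic lives in its own function that never receives A, and `compute` takes that logic as an injectable parameter, so a microtest can confirm with a `unittest.mock.Mock` stand-in that the A case is split off before the rest is ever consulted. The names `compute`, `count_flags`, and `rest`, and the counting rule itself, are hypothetical.

```python
import unittest
from unittest import mock


def count_flags(b, c, d, e):
    """Hypothetical B..E logic; it never receives A, so it cannot depend on it."""
    return int(b) + int(c) + int(d) + int(e)


def compute(a, b, c, d, e, rest=count_flags):
    """Strip the dominant A case, then hand B..E to `rest`."""
    if a:
        return 0
    return rest(b, c, d, e)


class StripsATest(unittest.TestCase):
    def test_a_true_returns_zero_without_consulting_rest(self):
        rest = mock.Mock(return_value=42)
        self.assertEqual(compute(True, True, False, True, False, rest=rest), 0)
        rest.assert_not_called()  # the structural, gray-box check

    def test_a_false_hands_b_through_e_to_rest(self):
        rest = mock.Mock(return_value=42)
        self.assertEqual(compute(False, True, False, True, False, rest=rest), 42)
        rest.assert_called_once_with(True, False, True, False)

    def test_count_flags_counts_true_values(self):
        self.assertEqual(count_flags(False, False, False, False), 0)
        self.assertEqual(count_flags(True, False, True, True), 3)
```

Run it with `python -m unittest` against the file that holds it. The first test pins down the structure, not just an output, which is what lets 17 cases stand in for 32.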

Remember our conversation about implicit understanding vs explicit code? Once we get past the simple case of booleans, we enter complex flows of intricate state and values, with degrees of interdependence ranging anywhere from none (0) to total (1).

There, concepts like state machines and strategies and null objects and the rest of the GoF patterns represent sophisticated answers to creating gray-box testability at an affordable cost.
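A hedged sketch of how a strategy plus a null-object-style rule can carry the same idea: `choose_rule` is the only code that looks at A, and each rule is small enough to microtest on its own. The class and function names, and the counting rule, are hypothetical and not drawn from any particular codebase.

```python
from typing import Protocol


class Rule(Protocol):
    def evaluate(self, b: bool, c: bool, d: bool, e: bool) -> int: ...


class AlwaysZero:
    """Null-object-style rule: ignores its inputs and always answers 0."""

    def evaluate(self, b: bool, c: bool, d: bool, e: bool) -> int:
        return 0


class CountFlags:
    """Hypothetical rule for the A=False case: count the true flags."""

    def evaluate(self, b: bool, c: bool, d: bool, e: bool) -> int:
        return int(b) + int(c) + int(d) + int(e)


def choose_rule(a: bool) -> Rule:
    """The only place that looks at A: it picks a strategy and steps aside."""
    return AlwaysZero() if a else CountFlags()


def compute(a: bool, b: bool, c: bool, d: bool, e: bool) -> int:
    return choose_rule(a).evaluate(b, c, d, e)
```

Because no rule ever sees A, "A dominates" becomes a property of `choose_rule` alone, which two microtests can pin down; `CountFlags` gets its own 16 cases, independent of A.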

It gets quite challenging, and to be perfectly frank, pretty entertaining, too. The modern synthesis of software development, including ideas like TDD, microtesting, CI, refactoring, and CD, is a rich, still-developing, judgment-centric set of answers to these very complicated problems.

So. Microtest TDD relies on a gray-box testing style. The tests read like they’re black-box, but they’re written using white-box info, things only someone inside the code could know. We do this risky, judgment-centric thing to ship more value faster.



1 thought on “Microtest TDD is Gray-Box”

  1. Great article. A couple of things to add: Use your available tools. A code coverage tool can help you figure out when you missed a branch that you didn’t happen to catch while TDD’ing. Second, find a good unit test framework that lets you do property-based testing. These frameworks generate inputs based on the properties you specify. This moves the tests from a static spec, where one test covers a specific logical branch, to a dynamic one, where a set of random variables should cover that branch. By being dynamic, you increase the chance of finding that one set of variables that catches the corner condition you didn’t think of, without running into the combinatoric explosion GeePaw mentioned above.
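The commenter’s property-based idea can be sketched without a framework. This Python stand-in, written only for illustration, generates random flag tuples and checks the property "whenever A is true, the result is 0" against a hypothetical correct implementation and a hypothetical broken one. Real property-based tools such as Hypothesis (Python) or QuickCheck (Haskell) add smarter generation and shrink failing inputs to minimal counterexamples; this sketch does neither.

```python
import random


def compute(a, b, c, d, e):
    """Hypothetical implementation under test: strips A first, then counts B..E."""
    if a:
        return 0
    return int(b) + int(c) + int(d) + int(e)


def borked(a, b, c, d, e):
    """Hypothetical broken variant: right for A=True only when B..E are all False."""
    if a:
        return 42 if (b or c or d or e) else 0
    return int(b) + int(c) + int(d) + int(e)


def random_flags(rng):
    """Input generator: five independent random booleans."""
    return tuple(rng.random() < 0.5 for _ in range(5))


def find_counterexample(prop, gen, runs=200, seed=1):
    """Check `prop` against `runs` generated inputs (seeded for repeatability);
    return the first input that violates it, or None if none does."""
    rng = random.Random(seed)
    for _ in range(runs):
        case = gen(rng)
        if not prop(*case):
            return case
    return None


def a_dominates(impl):
    """Property: whenever A is true, the result is 0."""
    return lambda a, b, c, d, e: (not a) or impl(a, b, c, d, e) == 0


print(find_counterexample(a_dominates(compute), random_flags))  # None
print(find_counterexample(a_dominates(borked), random_flags))   # an input with A=True
```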
