My standards for TDD microtests are the same standards I have for shipping code, and I follow them in microtests for the same reason I follow them in shipping code: they make me faster.
This geekery muse is comfort food, for me and maybe for you, but I want to keep stressing: I fully endorse and support my sibs out there on the street protesting this violent police uprising.
Stay safe, stay strong, stay kind, stay angry. Black lives matter.
I go out in the field, and I inspect a lot of tests, and I have to say, lousy factoring, bad naming, massive duplication, and low signal-to-noise are pretty much the order of the day out there. This often holds true even when the shipping code is comparatively well made.
I suspect a lot of this comes from the "chore" outlook that’s so prevalent around tests. That’s heightened, of course, by the tendency of orgs to try to get to TDD by fiat. I am sympathetic to all persons everywhere who don’t perform well when they’re voluntold to do chores.
(Mmmmm. How ’bout we don’t get me started on the damage this trade suffers from lousy pedagogy, coercive management, over-simple morality, and idols of the schema? We’ll just whistle right past that graveyard of a GeePaw rant.)
So let’s talk about things I see in tests that don’t meet my standards, and what I do about them. Before we do that, I should remind you of the microtest-centric TDD style I practice.
I am an OO programmer. On my good days, I make very small classes, each with a single responsibility and a handful of APIs to accomplish it. I create these for the most part by writing "microtests", one at a time, and building out the class to make them pass.
I work in statically-typed languages for the most part, and I use very few new-fangled tools. In Kotlin/JavaFX, for instance, my only test dependencies are JUnit and AssertJ. (Down with Hamcrest!!) I don’t use auto-mockers outside of legacy rescue.
The tests I write are source-dependent, not binary-dependent. And they are gray-box tests, reading like black-box but written using full knowledge of the code they are testing.
Finally, I use a graphical tool to run and monitor my tests, and I use a grown-up IDE, not a bunch of shells and a text editor. I do not write large-scale tests regularly, but I do sometimes find myself dragged, kicking & screaming, to add a few of them.
Okay, with all that as long preface, let’s look at some crap I don’t like in test code, and what I do when I see it.
- Low-signal test-naming slows me down, so I want my tests’ names to be both compact and meaningful in context. The two most common violations are a noisy repetitive naming rule and tests that describe their purpose in primitives.
Noisy repetition. Spoze the Underpants class has various conditions that are used to decide how to process them, i.e. put them away in the drawer or throw them out. Spoze there is a standard for the elastic.
testUnderpantsElasticLacksSufficientTensionToKeep is the sort of name I’m talking about. a) These are all tests. b) They are all tests of the Underpants class. c) This ain’t English class, and real-life full sentences are full of duplication and noise, on purpose.
Primitives hiding intent. This is when we describe the primitive result of an operation we’re testing without actually saying what it means to have that primitive result.
underpantsElasticFalse is what I’m talking about here. What the hell is a false result from checking that the elastic is okay? Does it mean to keep’em? To toss’em? This is the usual primitive obsession smell, only this time applied in a name.
elasticTooTight()? A name like that identifies the part we’re testing and the meaning, and it’s still quite compact.
(I just made that up, and of course, you’re pretending to be my team, so you might have even better names in mind, which is cool.)
The real point here: naming is really important for productivity, so I take the time to be mindful when I name tests.
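To make the before-and-after concrete, here’s a sketch in plain Kotlin. The Underpants contract and the tension numbers are invented for illustration, and I’m using bare check() calls instead of JUnit so the sketch stands alone:

```kotlin
// Invented contract for the sketch: elastic with too much tension fails
// the keep-or-toss check.
class Underpants(val elasticTension: Int) {
    fun keep(): Boolean = elasticTension <= 5
}

// Noisy, repetitive, primitive-obsessed: repeats "test" and "Underpants",
// and describes a boolean instead of a meaning.
fun testUnderpantsElasticHasTooMuchTensionToKeepReturnsFalse() {
    check(!Underpants(elasticTension = 9).keep())
}

// Compact and intention-revealing: names the part and what the result means.
fun elasticTooTight() {
    check(!Underpants(elasticTension = 9).keep())
}
```

Both tests exercise exactly the same behavior; only the signal-to-noise of the name changes.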
- Repeated test prologs slow me down, because I have to a) inspect each one to know it’s the same as the last, and b) change each one as the code changes.
There are a couple of ways to handle this. First, if the prolog is universal, or nearly so, to every test, I just put it in the setup (@BeforeEach in JUnit, the constructor in some environments).
Second, if it’s really only the prolog for some of the tests, I give it a name and extract it as a support method, just as I would in shipping code. Wayyyyyyy too much copy/paste goes on in test files, and this is a prime example of it.
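A minimal sketch of that extraction, in plain Kotlin with an invented Drawer example (check() stands in for a real assertion library):

```kotlin
// A trivial stand-in class, invented for the sketch.
class Drawer {
    val contents = mutableListOf<String>()
}

// The shared prolog, extracted and named, instead of pasted into
// every test that needs it.
fun drawerWithSocksAndBelt(): Drawer =
    Drawer().apply { contents += listOf("socks", "belt") }

fun addingUnderpantsGrowsContents() {
    val drawer = drawerWithSocksAndBelt()   // named prolog, not copy/paste
    drawer.contents += "underpants"
    check(drawer.contents.size == 3)
}

fun emptyingClearsContents() {
    val drawer = drawerWithSocksAndBelt()
    drawer.contents.clear()
    check(drawer.contents.isEmpty())
}
```

When the prolog changes, it changes in one place, and each test's body shows only what makes that test different.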
- Noisy constructors hide key signals. This is when you have an object that takes a lot of construction, maybe some subparts, maybe a bunch of arguments, whatever, and, from test to test, only one tiny part of that construction changes.
This is of course very similar to the repeated test prologs, and resolved in the same way. But I call it out separately: a remarkable number of people don’t see object construction as "code", so they don’t optimize its expression in the text.
An advanced case of this happened to me a couple of years back. I had a complicated tree thing, stored in a database, with IDs all over the place, implicit linking, and different types at each level.
There were hundreds of tests needing to work with these object-orchestras, and every test contained a bunch of constructor calls, each with a bunch of arguments controlling the linking, and so on. There was, essentially, no way to wrap that construction process in a single simple function.
The answer, in Kotlin, was to create a whole builder just for making these objects and expressing their construction in an easily readable form. Kotlin’s trailing-lambda and extension-function notation makes tree-shaped builders easy.
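Here’s a toy version of that builder idea. The Node/tree/child names are invented stand-ins for the real domain, and the real builder was far larger, but the shape is the same: the nesting of the trailing lambdas in the test mirrors the shape of the tree being built.

```kotlin
// Invented stand-in for the real multi-type, ID-linked tree objects.
class Node(val id: Int, val label: String) {
    val children = mutableListOf<Node>()
}

class TreeBuilder(private val node: Node) {
    // Trailing-lambda-with-receiver lets callers nest child() calls
    // so the test text looks like the tree it constructs.
    fun child(id: Int, label: String, block: TreeBuilder.() -> Unit = {}) {
        val kid = Node(id, label)
        node.children += kid
        TreeBuilder(kid).block()
    }
}

fun tree(id: Int, label: String, block: TreeBuilder.() -> Unit = {}): Node {
    val root = Node(id, label)
    TreeBuilder(root).block()
    return root
}

// A test's construction step reads like the tree, with no visible
// linking plumbing:
val sample = tree(1, "root") {
    child(2, "left") {
        child(4, "leaf")
    }
    child(3, "right")
}
```

Every test gets readable construction for free, and when the linking rules change, only the builder changes.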
- Primitive-obsessed assertions slow me down because they force deep reading. A common case: manually asserting partial field equality between two objects (while ignoring the other fields).
Again, dead easy to fix: write yourself an assertion that says what you mean, add it as a helper to the test class, and if it eventually turns out to be generically useful, add it to your testkit as an independent thing any test can use.
If you work in science, you’ll be working with doubles or floats, and having to deal with tolerances and rounding. This is a perfect case for a test-class specific assertion: in these situations, the tolerance is normally identical for all the tests.
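A sketch of that tolerance case, with invented names and an invented tolerance value, in plain Kotlin:

```kotlin
import kotlin.math.abs

// One tolerance for the whole test class, fixed in one place.
const val TOLERANCE = 1e-6

// A test-class-specific assertion that says what we mean ("close
// enough") instead of spelling out abs/subtract/compare at every call.
fun assertCloseEnough(expected: Double, actual: Double) {
    check(abs(expected - actual) <= TOLERANCE) {
        "expected $expected but was $actual (tolerance $TOLERANCE)"
    }
}

fun sumSurvivesRounding() {
    val result = 0.1 + 0.2          // classic floating-point rounding case
    assertCloseEnough(0.3, result)  // passes, where == on doubles would fail
}
```

The call site now reads as intent, and changing the tolerance later is a one-line edit.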
- Assertions that assert things I have already tested for slow me down, because they add noise and text, but don’t actually catch any new or different problems. I delete them, using my white-box knowledge of how the code works.
An example: a test against a filter that takes a list of ID’d objects and gives back the matches. My assertions field-check the returned objects, field by field, but the code doesn’t clone its returns, and even if it did, I tested the cloning already. Just check the IDs.
If I have tested add() and I now am testing remove(), I don’t have an intermediate assertion that the add() worked. I know the add works, I already used tests to drive its behavior.
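The filter example might look like this sketch, where Item and matching() are invented names:

```kotlin
// Invented stand-ins for the ID'd objects and the filter under test.
data class Item(val id: Int, val name: String)

fun matching(items: List<Item>, prefix: String): List<Item> =
    items.filter { it.name.startsWith(prefix) }

fun filterKeepsOnlyMatchingIds() {
    val items = listOf(Item(1, "apple"), Item(2, "banana"), Item(3, "apricot"))
    val result = matching(items, "ap")
    // The filter returns the same objects it was given (no cloning),
    // so field-by-field checks add nothing new: asserting on the IDs
    // alone pins down the behavior this test is actually driving.
    check(result.map { it.id } == listOf(1, 3))
}
```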
- Assertions that assume features of return values that are not actually part of the API’s contract slow me down, because when I change the implementing code, they go green-to-red when I haven’t broken anything. False positives are the bane of a programmer’s existence.
The standard example: the assertion assumes a list will be returned in an order that the API’s contract does not require. You feed in [a,b,c] and assert the result is [a,c], but in fact [c,a] would be legal, too.
AssertJ has nice assertions to address this, but if your tool does not, add them, as this situation is extremely common. We want our tests to break when our code breaks, and not to break when it doesn’t, and we want to know that in our bones. Over-specifying the contract violates this.
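AssertJ’s containsExactlyInAnyOrder is the one I reach for. If your tool lacks an equivalent, a plain-Kotlin version might look like this sketch (names invented):

```kotlin
// Passes when the two lists hold the same elements with the same
// multiplicities, regardless of order. Counting occurrences (rather
// than sorting) also handles duplicates and unsortable types.
fun <T> assertSameElements(expected: List<T>, actual: List<T>) {
    check(expected.groupingBy { it }.eachCount() ==
          actual.groupingBy { it }.eachCount()) {
        "expected elements $expected (any order) but was $actual"
    }
}

fun filterFindsBothMatches() {
    val result = listOf("c", "a")    // the implementation happened to reverse
    // Passes for [a,c] or [c,a]; still fails for [a], or for [a,a,c].
    assertSameElements(listOf("a", "c"), result)
}
```

Now a harmless reordering inside the implementation can’t flip this test red.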
Okay, we’re not getting any younger here. I am confident there are other items that belong on this list, but the real point isn’t the specific cases at all, it’s the strong statement of attitude we started with.
Test code is first-class code, with all the same standards I use for shipping code. In both cases, I hold those standards not for someone else, not for patriotism or decency or art, but because holding them lets me ship more value faster.
Supporting The PawCast
If you love the GeePaw Podcast, consider a monthly donation to help keep the content flowing. Support GeePaw Here.