Metrics and Three Traps

Hey folks, let’s talk about three traps we fall into with metrics.

(Before we begin, let me remind you, this is just comfort food: Stay safe. Stay strong. Stay kind. Stay angry. Black lives matter.)

In twenty years of coaching software development teams, I’ve seen maybe a hundred or more orgs try to figure out how to measure their process, and the majority of them have fallen into some mix of three traps: 1) the more trap, 2) the objectivity trap, 3) the rollup trap.

Years ago I heard the non-standard saying that "the road to hell is lined with great parking spaces". I see these traps that way: they seem to suck us in more or less thoughtlessly, and once in them, we seem to fight noisily about decorating them, but never consider leaving them.

The "more" trap is the trap we fall into where we think, if some X makes it good, it stands to reason that more X will make it even better.

The classic non-trade counter-example for the more trap is just salt in our foods. Some salt in our potatoes makes them taste better; more salt in our potatoes makes them taste awful.

The commonplace in-trade counter-example is how many orgs use boards, be they physical or virtual. One sees teams using their boards to plan (and track) individual hour estimates for a chunk of work.

Hey, if some data about what stories are in flight and who’s on them and where they’re at is good, then more data must be even better!

This is not, as a general rule, true of metrics.

If your car was made in the last ten years, it has hundreds or even thousands of sensors operating in it at one time. Imagine the consequences of showing all that detail on your dashboard.

We can challenge the "more" trap on several grounds, including the cost of collection, the distraction of detail, and the ways both subtle and crude in which measuring a thing shapes it.

Two keys to getting out of the "more" trap: 1) more thoughtfully evaluate the costs and benefits, 2) always use a metric experimentally and for a while before you declare it valuable.

The "objectivity" trap is the trap we fall into where we think process data is only valuable if it isn’t based on the opinions of the humans experiencing the process.

This one’s endemic in the geek trades. We routinely go to extraordinary lengths to assess our process "objectively" without ever bothering to ask the people using it how their week went, even though, week over week, their opinions can tell us way more than their numbers.

An app idea I’ve long mulled on: build a tag cloud by letting process users select from a varied set of words to describe the period since the last time we asked. I gift that idea to the world. If you write it or want to, call me for a detailed spec.
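To make that idea a little more concrete, here is a minimal sketch of the counting behind such a tag cloud. It is not the detailed spec, just one guess at the core of it. The word list, the function names, and the font-size range are all hypothetical, and a real version would need a way to actually ask people.

```python
from collections import Counter

# Hypothetical word list; a real one would be picked with the team.
WORD_CHOICES = {"calm", "rushed", "blocked", "focused", "confused", "proud", "tired"}

def tally_words(responses):
    """Count how many people picked each allowed word."""
    counts = Counter()
    for picked in responses:
        # Count a word once per person and ignore anything not on the list.
        counts.update(set(picked) & WORD_CHOICES)
    return counts

def cloud_sizes(counts, min_size=10, max_size=40):
    """Scale counts to font sizes, with the most-picked word at max_size."""
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: min_size + (max_size - min_size) * n // top
        for word, n in counts.items()
    }

# Three made-up responses for one period.
responses = [
    ["rushed", "blocked"],
    ["rushed", "proud"],
    ["focused", "rushed", "tired"],
]
counts = tally_words(responses)
sizes = cloud_sizes(counts)
print(sizes["rushed"], sizes["blocked"])  # 40 20
```

The only number here is how many people chose a word. The words themselves stay words.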

Humans don’t speak in numbers but in words, and there is a reason for that. They speak those words as a "view from somewhere" because the "view from everywhere" is by and large a persuasive rhetorical construct, not a real detectable thing.

(P.S. Don’t at me. I know what you learned in the fourth grade about science and objectivity. We all learned a lot of silly things in the fourth grade.)

Two keys to getting out of the "objectivity" trap: 1) ask yourself how it feels when the process numbers are bad, then ask yourself if we could find that out without the process numbers, 2) always use a metric experimentally and for a while before you declare it valuable.

The "rollup" trap is the trap we fall into when we think that the process data between two different teams is telling us the same thing about both teams.

The operative premise in this trap is that our metric captures a purified "essence", a context-free Platonic form. Anyone who’s ever worked on two different teams knows this is pernicious nonsense.

Geekery is fundamentally a collaborative enterprise around problem-solving. As such, it is always, permanently, and happily contextualized by the style of interaction between the solvers, not to mention the type of problem and the starting conditions.

You cannot compare the velocity of a feature team, a bug team, an ops team, or a brushfire-SWAT team, and that’s just at the level of problem. In fact, not only is each team-type significantly different, but so is each actual instance of such a team-type.
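A tiny illustration of why the sum misleads, assuming velocity is counted in points that each team defines for itself. The team names and numbers are invented for the example.

```python
def rollup(velocities):
    """Add up per-team velocities, the way a roll-up dashboard would."""
    return sum(velocities.values())

# Hypothetical points-per-sprint figures.
before = {"feature_team": 30, "ops_team": 12}

# The ops team switches to finer-grained points, so every estimate doubles.
# The work they actually deliver is unchanged.
after = {"feature_team": 30, "ops_team": 24}

print(rollup(before))  # 42
print(rollup(after))   # 54
```

The roll-up went up by twelve and nothing about the work changed. The number was only ever meaningful inside the team that produced it.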

Software coaches wouldn’t exist if not for the reality that every actual instance of a team is different from every other actual instance of a team in important ways we have no numbers for. Further, whoever tells you otherwise is either hopelessly incapable or purposefully lying.

Two keys to getting out of the "rollup" trap: 1) resist every effort to compare or combine inter-team results in a roll-up, 2) always use a metric experimentally and for a while before you declare it valuable.

These three traps are so inviting, and I get that. I get how cool it would be if any or all of them represented real possibilities of measurement.

I get how much you wish it were true.

Wishing does not make it so.

(My Grandma had a particularly earthy and profane way to say wishing don’t make it so. I dangled that bait before my followers once before, but no, nobody actually asked me to tell them, and now it’s too late, the moment has passed. I’m not mad, I’m just a little disappointed.)

You’ll have noticed that the second key to getting out of all of these traps is the same: use a metric experimentally and for a while before you declare it valuable.

Metrics seem especially susceptible to the kind of reasoning we call idols of the schema: they seem awesome on paper, regardless of whether they are beneficial.

My closing advice: be many times more suspicious of metrics than you are suspicious that your process isn’t working.

The GeePaw Podcast

If you love the GeePaw Podcast, show your support with a monthly donation to help keep the content flowing. Support GeePaw Here. You can also show your support by sending in voice messages to be included in the podcasts. These can be questions, comments, etc. Submit Voice Message Here.
