On Disappearing Code

Or, The Space Between Commits

Version: 66149d0

I’m sure there’s a word for what I’m referring to, but I don’t know it. By disappearing code, I mean the kind you write as scaffolding. It exists in the space between commits and never makes it into your repo. Usually, you write it while investigating something. As soon as possible, you remove it so no one ever knows you wrote such slop.

Data scientists write a lot of disappearing code.[1]

As the adage attests, on data analysis projects, you spend most of your time trying to make dirty data clean. That’s usually schlep work. You write a little bit of code; some data break it; you try to identify a pattern inherent in the errant data; you update your code; that usually breaks something that previously worked; you update it again; you drink some coffee; then, you do this some more.

When I’m trying to fix a tiny bug. pic.twitter.com/nml6ZS5quW

— Mike Bostock (@mbostock) November 3, 2015

Days later, you have clean data, along with a not-so-sneaking suspicion that, if you applied your data-cleaning tool to new data, it would break.

And, yes, it probably would break.

Partly, such fragility is inherent to the task. Dirty data violate expectations. It’s not reasonable to assume you’ve covered every case. However, there is a more insidious problem. Coding is mostly a process, and we need to acquire good habits because anything non-trivial doesn’t fit in our heads. The code that remains after grooming has form. You can read it; you can review it; and, you can put pieces of it in your head. But the other kind of code, the disappearing code that often monopolizes a project’s time, resists analysis. You can’t point to it because it’s ephemeral. This combination of cognitive limitations and the absence of inspectable artifacts conspires to blind us.

Of course, disappearing code need not remain hidden forever. Eventually, some intrepid coder reaches the magical point of sufficient frustration, after which they feel compelled to find a better solution. Often, after much decanting, deliberation, and doubt, this better solution evolves into a framework. And, when it comes to ameliorating disappearing-code problems, such frameworks work best when they are opinionated.

Why? The authors of the framework, the ones for whom the itch finally became unbearable, have paid their tuition in bitsweat.[2] They’ve learned the hard way what works, what tends to break, and what tends to lead to maddening Heisenbugs. A good framework crystallizes this experience. Subsequent users don’t have to re-explore the entire terrain. The pioneering authors have already done that, and the framework is the map that guides those users quickly and safely from A to Z, avoiding the badlands.

Oregon Trail screenshot that says, ‘Congratulations! You have made it to Oregon! Let’s see how many points you have received.’

I wish I had a bulletproof prescription for noticing when I’m in a disappearing-code context, but I don’t. I do know, however, that when your coding auto-pilot is on, you’re less likely to see it. So turning the auto-pilot off generally helps. That’s hard, though, especially with deadlines. Auto-pilot is the companion that lets you slog through mind-numbing tedium.

Instead, my preferred practice is to take five minutes at the end of the day to write down what I did. That is, I try to articulate both my motivations and my process. It’s often remarkable how much the mere act of trying to name something clarifies it. The label makes it less ephemeral and restores your ability to manipulate it mentally, which you had lost. This practice doesn’t pay off immediately. But by the end of the week, reviewing all your log entries sometimes reveals opportunities. For me, at least a few times, the payoff has been huge. I’m not sure I would have stumbled upon some solutions by other means. In any case, I’m convinced the articulate-after-the-fact method reduces the distance between what I have now and something better.

Notes

  1. I’m not a fan of “Data Scientist” as a title. It’s a misnomer — we’re mostly Data Janitors who moonlight as semi-competent statisticians. But, it’s the recognizable term. And, I’ll be damn sure to use it on my résumé.

  2. Nope, I’m not talking about Jeremy Daer. It’s just such a wonderfully intuitive compound word, conveying something coders immediately grasp. Well played, Jeremy.

  3. This post was originally published on Medium.