My Best Bug


I shipped a word processor that formatted the hard drive every 1024 saves.

Must have been ’84 or ’85. I was a bright 25-year-old with about five years in the game. I was one of two programmers who wrote & maintained a suite of apps kinda like Office: spreadsheet, wp, database, plotter, such like. We customized everything for three or four vertical markets.

So I wrote most of the wp. This was in Forth, on a variety of OS/CPU combinations. Young’uns don’t necessarily know this, but in those days there’d be a new microcomputer with a custom OS and one of several CPUs every month or so.

Forth used block-based disk i/o. Each block was 1kb in length, and to save anything larger than 1kb, you ended up with a file master block that would contain offsets to the blocks that had the data, basically a pair of lists, allocated and free.

To initialize that file master, I’d fill it with zeros. And away we go!

Long story short, I changed the app to allow us to have files that were twice as long as we’d had previously, and the 256 entries in the file master became 1024.

Each entry was — and here we go — a word you see, of 16 bits. So, while growing that size to 512 would still fit in my 1k master block, growing it to 1024 would require two of those blocks.
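Here's a rough sketch of that arithmetic, in Python rather than the original Forth. The names are made up for illustration; only the numbers (1k blocks, 16-bit entries, 256/512/1024 entries) come from the story.

```python
BLOCK_BYTES = 1024  # one Forth block, per the story
ENTRY_BYTES = 2     # each file master entry is a 16-bit word


def blocks_needed(entries: int) -> int:
    """Whole 1k blocks required to hold `entries` 16-bit entries."""
    total_bytes = entries * ENTRY_BYTES
    return -(-total_bytes // BLOCK_BYTES)  # ceiling division


for entries in (256, 512, 1024):
    print(entries, "entries =", entries * ENTRY_BYTES, "bytes =",
          blocks_needed(entries), "block(s)")
```

So 256 and 512 entries both fit in a single block, and 1024 entries is the first size that spills into a second one.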

So I changed the fill number, for those zeros, without changing the buffer size. In unprotected memory situations, when you fill 2048 zeros over 1024 bytes, the extra bytes overwrite, well, whatever random crap is in memory beyond that buffer size.

In my case, "whatever random crap" was the o/s’s master disk block. Yes. When I went to initialize the new file, which happened every 1024 saves of a 1-block file, I overwrote the drive’s master block, eliminating all of its files.
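If you've never worked without memory protection, here's a toy simulation of the overrun. It's Python, not Forth, and everything in it is an assumption for illustration: "memory" is just one flat bytearray, the two regions sit back to back the way they turned out to on that machine, and the 0xAA pattern stands in for real disk-master data.

```python
BLOCK_BYTES = 1024

FILE_MASTER_AT = 0            # hypothetical address of my 1k buffer
DISK_MASTER_AT = BLOCK_BYTES  # hypothetical address of the OS's copy
DISK_PATTERN = b"\xAA" * BLOCK_BYTES  # stand-in for real master-block data


def make_memory() -> bytearray:
    """Flat, unprotected memory: my buffer, then the OS's master disk block."""
    mem = bytearray(2 * BLOCK_BYTES)
    mem[DISK_MASTER_AT:DISK_MASTER_AT + BLOCK_BYTES] = DISK_PATTERN
    return mem


def unchecked_fill(mem: bytearray, start: int, count: int, value: int = 0) -> None:
    """Write `count` bytes from `start`, with no idea where the buffer ends."""
    for i in range(count):
        mem[start + i] = value


def disk_master_intact(mem: bytearray) -> bool:
    return mem[DISK_MASTER_AT:DISK_MASTER_AT + BLOCK_BYTES] == DISK_PATTERN


before = make_memory()
unchecked_fill(before, FILE_MASTER_AT, 1024)  # the old fill size
print(disk_master_intact(before))             # True

after = make_memory()
unchecked_fill(after, FILE_MASTER_AT, 2048)   # new fill size, same old buffer
print(disk_master_intact(after))              # False
```

Nothing complains in the second case. The fill just keeps going, and the neighbor is gone.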

So, yeah, sure enough. We shipped that version of the code for, idunno, 8 or 9 weeks. And you can imagine the phone calls I was handling from the customers. We wound up telling everyone to always back up (to VCR tapes, btw) every single day.

I spent whole weeks doing nothing but making apologies by phone and running my app in the debugger over and over and over again. Nothing. I got nothing.

One woman called though, and she said she’d made her — this was a trucking company, and this woman’s use of profanity will serve me as a personal model for all the rest of my days — had made her backup, fired up the wp, and immediately wiped her drive.

She’d reloaded the backup and it again immediately wiped her drive. She was seriously pissed.

Do you know it took me almost two hours to realize what I had? I was telling my s.o. the story at lunch, and then I was like, wait. Wait. Immediately? Every time.

I called her up and promised her everything, everything, if she would just pack that tape up and fedex it to me. I told her we’d pay the shipping and waive their monthly charges. (My boss never hesitated, he was like, "Oh hell yeah we’ll waive the charges.")

I finally had the bug replicated. She wasn’t lying. The Forth image that had been backed up had the critical point: 1023 saves at the time the backup was done. Load the backup, do a save, get a wipe, guaranteed.

I may be exaggerating, but I’m pretty sure I found the problem the same morning the tape got delivered. It was that simple to see in the debugger.

I was looking at the 2k block about to be filled, and the last 1k was, idunno, garbage. That seemed odd, so I investigated it. Of course, it wasn’t garbage, it was the 1024 bytes of the drive’s master disk block, stashed next to my file master block by the operating system.

What I had was "pre" core dump. I didn’t have the core dump, but I had a guaranteed route to the defect. If I hadn’t gotten lucky, I’d’ve never found it. It had simply never occurred to me that I was writing to system memory instead of my own.

So, yeah. Your master geek shipped a word processor that, for eight or nine weeks, every 1024th save, would format the customer’s hard drive.

Cuz. You know. I got skillz.

There are prolly lots of morals to the story, but the two I’ll highlight for any juniors:

  1. Don’t use numeric literals for anything but 0, 1, and -1.

  2. Don’t be so hard on yourself. All your old seniors have stories like this, and some of us have several of them.
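For the first moral, here's one way it might look, sketched in Python rather than Forth. All the names are hypothetical. The point is only that the buffer size and the fill size both come from the same named values, so changing one can't silently leave the other behind, and the fill refuses to run past the buffer.

```python
ENTRY_BYTES = 2        # 16-bit entries
BLOCK_BYTES = 1024     # one block
MASTER_ENTRIES = 1024  # the one number that changed in the story

# Everything else is derived, never retyped.
MASTER_BYTES = MASTER_ENTRIES * ENTRY_BYTES
MASTER_BLOCKS = -(-MASTER_BYTES // BLOCK_BYTES)  # ceiling division


def new_file_master() -> bytearray:
    """Allocate a zeroed file master sized from the shared constants."""
    return bytearray(MASTER_BLOCKS * BLOCK_BYTES)


def zero_fill(buf: bytearray, count: int) -> None:
    """Zero the first `count` bytes, refusing to run past the buffer."""
    if count > len(buf):
        raise ValueError(f"fill of {count} bytes exceeds {len(buf)}-byte buffer")
    buf[:count] = bytes(count)


master = new_file_master()
zero_fill(master, MASTER_BYTES)  # sizes agree by construction

try:
    zero_fill(bytearray(BLOCK_BYTES), MASTER_BYTES)  # the story's mismatch
except ValueError as err:
    print("caught:", err)
```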

🙂

