Something Rotten In The Core
October 24th, 2017
There's a key thought of UNIX philosophy which centers around the idea of linking programs together. You know, piping the output from grep into sed and then into sort, that kind of thing. It kinda works well, I guess. For text at least.
But one of the reasons it can work OK is because you, as the end-user writing this little script or command, have full knowledge of the pieces you're building it from. You understand grep, you understand sed, and if any of those pieces suddenly stop working then you can pick the piped command-line apart again and see why. It's a system made up of pieces that you have control over, and most importantly: all the pieces are exposed to you.
This idea spread throughout UNIX, but in ways it should never have. I'm referring here to the misfortune that is debugging.
You see, on UNIX there's GDB. That's the debugger. That's the only debugger. It's very old and has had a lot of work put into it, and as a result it usually works pretty well, at least in terms of functionality. But on every other metric you measure software by, it kinda sucks.
Despite every computer made in the past 40 years having a graphical display, GDB lives in a parallel universe where the framebuffer was never invented and we all still use teletype printers. Teletypes work OK enough for getting output, but for interactive programs they just fall apart.
And so people who didn't want to deal with the pain of GDB invented the "GDB Wrapper" -- a separate piece of software that would show a nice user interface, but internally would call GDB to do the work.
We're not talking about calling out to a library here. We're talking about actually launching an instance of GDB, passing it commands, and parsing the results it prints out. And this is where we get led down a dangerous path.
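To make that concrete, here's a minimal sketch of what such a wrapper ends up doing. It's not GDB and it's not any real wrapper's code: the child process is just a second Python interpreter standing in for the tool, and the "value = ..." output format, FAKE_TOOL, and read_value are all made up for illustration.

```python
import re
import subprocess
import sys

# Stand-in for the wrapped tool: a child Python process that prints one
# line of human-oriented text. The "value = ..." format is hypothetical.
FAKE_TOOL = [sys.executable, "-c", "print('value = 42')"]

# The wrapper's "API" is really just a guess at the output format.
VALUE_LINE = re.compile(r"^value = (-?\d+)$")


def read_value(command):
    """Run the tool and scrape an integer out of whatever it printed."""
    result = subprocess.run(command, capture_output=True, text=True)
    for line in result.stdout.splitlines():
        match = VALUE_LINE.match(line)
        if match:
            return int(match.group(1))
    # The format changed, or the tool failed. From here we can't tell which.
    return None


print(read_value(FAKE_TOOL))  # 42

# The tool's authors tweak the wording, and the wrapper silently goes blind.
changed = [sys.executable, "-c", "print('value: 42')"]
print(read_value(changed))  # None
```

Nobody sat down and designed that regular expression as an interface. It's an API by accident, and it breaks the moment the text changes.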
APIs are hard to begin with. Good API design is very much an art, and it takes a lot of experience to come up with good ones. And the reason so many APIs are bad isn't because someone designed a bad API -- it's that they didn't even realize they were designing an API to begin with.
So much of our software world now is filled with wrappers -- programs that don't actually do the thing themselves, but 'outsource' their work to other programs. It's a stack of layers, and it's not a nice clean stack. I remember something Jeff Roberts once said to me -- the layers grind against each other, and you can feel each one chipping bits away as they collide.
I had the utter delight a few months back of trying to debug something using Qt Creator, except I couldn't. One morning it just suddenly refused to start debugging programs. It wouldn't say why of course, it would just sit there doing nothing.
Nothing had changed, at least so I thought. So why the failure? It turned out, after some experimentation, to be because Microsoft's symbol servers were down. That's right, a remote failure on someone else's part meant I couldn't debug locally.
Now of course errors happen in life, and are to be expected. But I didn't get an error. I didn't get a warning. I didn't get anything, except an unresponsive UI. And the reason for this, I think, is precisely because the wrapper wasn't fully aware of all the facts.
Jeff Goldblum said it best in that famous scene from Jurassic Park:
The problem with the scientific power you've used is it didn't require any discipline to attain it. You read what others had done and you took the next step. You didn't earn the knowledge yourselves, so you don't take the responsibility for it. You stood on the shoulders of geniuses to accomplish something as fast as you could, and before you knew what you had, you patented it, packaged it, slapped it on a plastic lunch box, and now you want to sell it.
We've seen it hundreds of times in all kinds of software. Functions that return bool instead of an error code. Where did the precise error vanish to? Poof, it's gone! What used to be a useful error message became false, and if you're lucky you'll get a generic "Unexpected error" appearing on screen. And that's if your program is using a library. If it's calling out to a command-line worker, the most likely case is that the error won't get checked at all and will just get printed out into a log file you'll never find, never to be seen again.
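Here's a small sketch of how that happens. The function names and the file path are invented for the example; the point is only to show the same failure seen through a bool and through the original error.

```python
import errno
import os
import tempfile


def load_config_bool(path):
    """The lossy style: every possible failure collapses into False."""
    try:
        with open(path) as f:
            f.read()
        return True
    except OSError:
        return False


def load_config(path):
    """The honest style: let the OSError through with its details intact."""
    with open(path) as f:
        return f.read()


# A path that should not exist (hypothetical name, chosen for the example).
missing = os.path.join(
    tempfile.gettempdir(), "no-such-dir-for-this-example", "settings.ini"
)

print(load_config_bool(missing))  # False, and that's all you'll ever know

try:
    load_config(missing)
except OSError as exc:
    # The error code and the offending path both survive.
    print(errno.errorcode[exc.errno], exc.filename)
```

The second version isn't clever. It just declines to throw away what the layer below already knew.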
Or networking software that just sits there spinning a cursor when something went wrong. So much user-facing network software is built on top of other programs, like ssh or rsync, and when those things fail they just don't know what to do. And so much of the problem is precisely because they're not using them as libraries, they're using them as command-line utilities. They're using these things that have ill-defined interfaces to begin with, and because it's all based on outputting text, the programmers think they can just look at examples of the output and figure out an API from that.
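If you really must shell out, the least a wrapper can do is notice when the worker fails or hangs, and pass along what it said. Here's a rough sketch, assuming nothing about ssh or rsync themselves: the two child commands are Python one-liners standing in for a failing and a hung worker, and WorkerError and run_worker are names I've made up.

```python
import subprocess
import sys


class WorkerError(Exception):
    """Carries what the child process actually reported."""


def run_worker(command, timeout_seconds):
    """Run a command-line worker, refusing to wait forever or hide failures."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        raise WorkerError(f"no answer after {timeout_seconds}s") from None
    if result.returncode != 0:
        # Pass the worker's own words upward instead of dropping them.
        raise WorkerError(
            f"exit status {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


# Hypothetical stand-ins: one worker that fails loudly, one that hangs.
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('host unreachable\\n'); sys.exit(3)"]
hanging = [sys.executable, "-c", "import time; time.sleep(30)"]

for command, limit in ((failing, 10), (hanging, 1)):
    try:
        run_worker(command, limit)
    except WorkerError as exc:
        print(exc)
```

Even then you're still left guessing what exit status 3 means, which is rather the point. A library would have handed you a documented error; a command line hands you a number and some prose.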
There's a quote from the great Douglas Adams which I'm sure I've used many times before, but it's just so incredibly apt for most of today's software:
The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at and repair.
You've all seen those cheap Chinese toys that look like a PlayStation, but inside it's just a 6502 and 50 NES games. It's all fake, it's an illusion. It's a nice plastic finish on the outside, but if you were to open it up there's nothing in there. It's a rotten core, wrapped in layers of opaque complexity.
We're making systems that are fragile, because they're just glued on rather than bolted together. We're wrapping complex things up in wrappers that don't take the same responsibilities as the things they rely on. Like Homer's pecking bird in the Simpsons, they work just fine when everything is as expected, but the moment the situation changes even slightly, everything breaks.
Written by Richard Mitton,
software engineer and travelling wizard.
Follow me on twitter: http://twitter.com/grumpygiant