May 28, 2006

Acme Corp. Do something.

Take any big company ad slogan that has an instruction in it - to think, to invent, to do anything in particular:
  • HP. Invent.
  • IBM. Think.
  • Apple. Think different.
  • Samsung. Imagine.
What the heck does thinking (even thinking differently) or being able to invent have to do with the commodities those companies produce? What does it mean at all?

Do they really want to say, "Buy our stuff and then do whatever nice things we are telling you to do"? That would be ridiculous. Do I need permission to think? Do I have to buy something from IBM to comply?

"Buy our stuff and then you will start doing it no matter if you want it or not" ? Buying a Mac unlikely will make you think differently. Acquiring different GUI habits may be, but not different way of thinking, otherwise nobody would ever be able to interact with Apple owner from the moment of the purchase on.

"Buy our stuff and then doing those things would be easier" ? Nah, I doubt if inventing with HP is easier than with anything else.

Oh, I get it, I'm probably supposed to associate HP with inventions as such? IBM with thinking? Whatever company with whatever pleasant activity? Now, THAT would be really stupid - to think that there is an association between buying some piece of equipment and engaging in some pleasant activity. Otherwise, where are "IBM. Think, do you really need it?", "Samsung. Imagine something not made of semiconductors", or even "Don't believe everything you hear in commercials" or "Trash your TV and go see the sunset"? That last one - wouldn't it be a pleasure to do?

Anyhow, TV commercials, magazine ads and equipment cartons would be the last things I would use for motivation, thank you very much. Except for blogging.

May 24, 2006

So, how does it feel to write infallible code?

I think many people feel uncomfortable about the code they write or maintain and the systems they support or administer. The stuff is just bad, let's face it. But people (myself included) tend to find a comfortable spot in which they are most emotionally stable and least disturbed by the reality they have to live in. This comfort, however, is often based on ignoring the truth. Let's see how it may work in one particular case.

I had this idea while thinking about moving one very small but very common piece of functionality into a network service. We need each system to determine whether a given day is a workday or a holiday. As there is no stable rule for this, each system has its own volatile piece of data, in whatever format, which it consults, and every administrator of every system has to remember to keep it in good shape. This repeats in tens of variations, burdens the developers and is a constant administrative pain. Sounds like a good idea to factor it out into a network service?

Paradoxically, it does not - precisely from the point of view of those troubled systems' developers and administrators. But why not? It seems odd. Before the refactoring they have

def workday(date):
    # consult this system's own local calendar data
    return consult_own_database(date) == "W"
...
if workday(today): do_stuff()
...

whereas after refactoring they have

def workday(date):
    # consult the shared network calendar service instead
    return consult_magic_service(date) == "W"
...
if workday(today): do_stuff()
...

(mind you, this is pretty real-life-looking code). It looks like nothing changes at all, except that now they have one less problem. To simplify, let's assume such a change wouldn't require any work from them, would be done in a snap and would cause no trouble in itself. Still they refuse. What is the likely reason they give for rejecting the change?

- It's less reliable. A call to a local database (or filesystem) is less likely to fail than some network service.
Reliability - that I understand. And it's also reasonable to assume that the probability of a local database failure is lower than that of a remote network service. But I argue that it's not the different degree of reliability that matters here. After all, the two chunks of code above are identical - why would one be less reliable than the other?

The first (existing) piece of code does not indicate a less probable failure. In fact it indicates a zero probability of failure. The call may fail, but the code does nothing about it. The developer consciously or unconsciously knows that there is a possibility of failure, but as long as it's perceived to be below a certain threshold, she's still comfortable ignoring it. Now, with the subtlest change, comes an increased failure probability which is too uncomfortable to ignore.

And what are the options? Rewrite the code so that it accepts failures? Impossible, too difficult, can't be done, no way, period. Thus we don't need the change, it would break a system which works fine. Ironic, isn't it?
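Just to make the point concrete, here is roughly what "accepting the failure" could look like - a minimal sketch, with the local cache and the broad except made up for illustration, not taken from any of our actual systems:

def workday(date):
    # try the shared calendar service first; if the call fails for any
    # reason, fall back to a (hypothetical) locally cached copy instead of crashing
    try:
        return consult_magic_service(date) == "W"
    except Exception:  # whatever the service call raises when it is unreachable
        return consult_local_cache(date) == "W"

Hardly impossible - but it does require admitting that the call can fail in the first place.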

And so the thing remains unchanged. Whether it's this switch to a network service or anything else, the system has become too fragile - touching it is dangerously uncomfortable. The point of this argument, therefore, is that developer comfort based on false assumptions, such as the improbability of failures, forbids any major change to the software.

May 06, 2006

The pattern to rescue other patterns: The Golden Hammer

Patterns look like they can solve all of your real-life problems. To me, this notion is awfully wrong and can do more harm than good. Yes, patterns help you organize your knowledge and help developers find common ground more easily, and they do an awesome job there. But they are nothing more than mnemonic constructs.

I sometimes hear sentences like

- We use pattern X here, because it's the right thing to do in this kind of project.

Not because the solution has the appropriate structure and pattern X somehow emerges from it, but because such and such problems require such and such patterns. What the heck? I thought problems require solutions, not patterns. To me, such talk is a bad symptom - it means that people can't see the forest for the trees. Or, which is also possible, that I can't understand what they are saying - but let's forget about that insane assumption at once :)

Now, for pattern lovers - the pattern to the rescue - the Golden Hammer. It says that if you are good at hammer work, all your problems appear to you as nails. Likewise, if you are good at patterns, you find the patterns you already know everywhere. The key point here is that you do not attempt to apply thought and find other kinds of patterns, but blindly say - hey, this obviously asks for a Bridge, and this is undoubtedly a Mediator here.

It appears that the most beautiful and influential things come with reflection - a tendency to analyze oneself. The Golden Hammer is the most reflective of patterns, as it implies you actually think about the process of applying patterns (itself included). And if you are more conscious in applying patterns, you benefit more, don't you think?

May 03, 2006

Expecting software errors, not fixing one and pretending none are left

There is a problem with your software: something strange is going on, some error appeared that had never appeared before or should never have appeared in the first place, the system produced insane output or just died?

It's unwise to think of anything you are experiencing as "unbelievable". You see it, you have hard evidence - it exists. Why - you have no idea, but it exists. Finding the origin of such a problem requires a totally different approach to debugging. Since you have no idea why it happened, you likely could never have predicted it, tested for it, or even thought about it.

Stop and think about it. You could never have thought your software would fail in that particular way. Taken to the extreme, this would mean that you never thought your software could fail at all.

But this is exactly how novice programmers think. As they become more experienced, the amazing world of software failures opens up before them, and they start applying the typical code/debug cycle, error handling and/or try/catch patchwork, which still limits them to the errors they are experiencing at that particular moment.

It seems logical that if the programmer's experience is taken to the other extreme, she would expect failures everywhere, simply out of vigilance. Even if a failure of some kind has never happened before - it's there, waiting to happen.

The point of this argument is that the key to writing stable software is being failure-proactive and expecting everything to fail. Just point a finger at any piece of your system and ask yourself: "what happens if this fails (for whatever reason)?" The results can be far-reaching.

I therefore argue that common debugging is far less important a process than it's usually thought to be. Of course, finding an error and fixing it is important and must be done, but if you had expected it, your whole system is by now prepared for any error anywhere, and this one error merely puts that preparedness to the test.

Moreover, I also argue that trying to break your own software is at least as valuable a practice as the aforementioned debugging.

Yes, it's a well-known maxim that all testing is about breaking things, not about watching them work. But there is a catch. A tester can only test against problems that he can think of, much like the developer. Moreover, when developers and testers work side by side (if they aren't the same person to begin with), they tend to share the same narrow field of view, which greatly reduces testing efficiency.

That's why I like stress testing. Not because it shows me impressive transactions per second, but because it tends to reveal another layer of problems, usually never seen otherwise. Stress testing is very useful in general, but it shines whenever a failure leads to a cascade of other failures. Whoa! I couldn't have thought of THAT!
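A crude sketch of what I mean - nothing tied to any particular system, and the function names and numbers are made up - just a loop that hammers whatever you want to break with concurrent calls and collects the failures:

import concurrent.futures

def stress(call, n=1000, workers=50):
    # fire many concurrent calls at the target and collect every exception,
    # including the ones that only show up under load
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(call) for _ in range(n)]
    failures = [f.exception() for f in futures if f.exception() is not None]
    print("%d failures out of %d calls" % (len(failures), n))
    return failures

# e.g. stress(lambda: workday(today))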

Now, how else can you intentionally break your system, other than by giving it a stress load?

Failure injection comes to mind, but it contradicts the normal development cycle (there is simply no place for it), and it requires the aforementioned failure-ready system structure from day one. I also tend to remove the injected failures from production systems - after all, being a failure-vigilant software developer won't save you from black cats and broken mirrors.
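For illustration, here is a minimal sketch of what I mean by failure injection - a wrapper that makes a small fraction of calls blow up; the wrapper and its probability knob are hypothetical, not from any real system:

import random

def inject_failures(func, probability=0.01):
    # wrap a callable so that a small fraction of calls raises instead of
    # returning, forcing the callers to cope with failure during testing
    def wrapper(*args, **kwargs):
        if random.random() < probability:
            raise RuntimeError("injected failure in %s" % func.__name__)
        return func(*args, **kwargs)
    return wrapper

# e.g. workday = inject_failures(workday, probability=0.05)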

Unplugging cables from a working system is also fun and can reveal yet more unexpected problems, although, again, it's probably not applicable to a production system. For another joke of this kind, see "The China Syndrome Test" in "Blueprints for High Availability" by Evan Marcus & Hal Stern.

In conclusion, this discussion again brings me to the PITOMNIK principle: as the system grows, its small errors grow with it and eventually lead to inevitable failure. You cannot ignore the presence of the errors, nor can you fix any significant fraction of them, so you'd better be ready.