February 26, 2007

What did you do today ?

This one kept bugging me for a long time now.

It appears that many people hate the work they do. I can see how one can be forced to take a job he doesn't like, fate, bad luck, blah blah, but still, I totally don't understand this - how could you live if you cannot be proud of the work that you do ?

What do you say when you come home every day ? And I don't mean - to your family, but even to yourself, what do you say to yourself - I did what today ?

How comfortable it is when you can't point a finger and say "I made this" ?

When you are making crap, and everybody knows it, when people curse and spit when they encounter something that you've made, how do you feel ?

I do believe that most people still feel bad when then do their jobs wrong. Although they can find excuses and even blame somebody else, there has to be something, because if such behaviour was perfectly ok for them, they wouldn't have been looking for excuses and blaming others, and the more agression means more discomfort.

Which means - a lot of people hate their jobs, do them miserably and are at more or less permanent stress about that. Isn't that terrible ?

February 20, 2007

Never underestimate the power of randomness

I've just returned from a deep testing and debugging session and all I can say is again - wow ! never underestimate the power of randomness !

The system I was testing is a complex of network services build on top of Pythomnic platform with multiple Python processes scattered across multiple servers and intertwined together in redundant and fault-tolerant fashion. When it's live, it's going to be the billing hub service for the bank where I work. It has to deal with all sorts of payments to all sorts of providers. And so my job is to build a system to which modules for specific providers will be plugged later. It is also transferring money, so it'd better be reliable.

Someday I'm going to describe the design of that system as a case-study for Pythomnic and publish it on its web site. That will be, but for now, here is my recipe for the best testing:

Stress + failure injection + randomness

Stress: don't spare the system you are testing. The users will not. Give it as high load as it can handle and then some. It's no problem if it breaks now, but the amount of problems (not always bugs) that are revealed under unbearable load is surprising.

Failure injection: don't expect the problems to happen just because you are testing. Make them happen. Break stuff. Insert something like:

if random() < 0.01:
raise Exception("failure before provider request")

if random() < 0.001:
sleep(3600)
raise Exception("provider request hangs")

result = provider_request()

if random() < 0.01:
raise Exception("failure after provider request")


Insert it all over the place. Well, there is no point inserting failures between each two statements, it quickly gets cumbersome, but you should decorate each "external" call with such injected failure frame. It may be a database request, a specific API call etc.

Randomness: that's my favourite part, in testing, you can't beat randomness. You would never make up such a combination of failures that random() would. Make sure your random switches cover all the major code paths and let it running for a while. If it succeeds you can be pretty certain the system is working. To be sure, such random testing may not catch all of the special border cases in each of the modules, but for load testing - it's invaluable.