June 13, 2010

Re: Cryptography

This post is a response to a recent discussion on a "Russian Software Developer Network" forum. The thread was called "Cryptography".

Oh, the drama ! And the level of professionalism was unrivaled. It was there that I found a new addition to my personal hall of fame:

epileptic curves

Seriously though, it somehow happens that cryptography becomes the easiest part of security. Easiest to know about, easiest to talk about, easiest to show off with.

Why ? I'd say it is because it is closely related to mathematics, and mathematics brings a safe-harbour feeling to those who want certainty in the shaky world of security. Besides, many of those who talk passionately about cryptography (including myself) have a mathematical background.

Surprise, the security-related feature of cryptography is not based on hard mathematics. See, the feature that we seek most in cryptography is called "strength". We want it for encryption, for hashes, for digital signatures, for everything. It is strength which causes holy wars on forums. But what is it ?

In cryptography, strength is the ability to withstand currently known attacks.

See the problem ?

The words "currently known" reduce all hopes for certainty to dust. You cannot "prove" strength in the mathematical sense. Anything counts as strong as long as it hasn't been demonstrably broken.

There is not much sense in comparing strength either. As seen on the Internet:

My kung-fu is stronger than yours by 2^17

But it only makes sense if you compare identical or very similar algorithms - then you are essentially comparing their lifetimes. As we assume they both haven't been broken yet, the larger the power, the more time on average it takes the attacker to break it using some kind of brute-force attack.

Put simply, all cryptographic strength is based on one big assumption - that the good guys know better than the bad guys.

We believe something is strong because no one has published a way to break it. Even though such a way may exist, and may be widely used against us, we still consider it strong until the contrary appears on paper.

The biggest paradox here is that we are even sure that there is a way to break it, it is just that no one (meaning the good guys) has found it yet. And we hope no one (meaning the bad guys) will while we are using it.

We believe that the respectable scientists work hard trying to break every known algorithm and we are somehow sure they break them first. And publish. Not for money, not for fame, just for the sake of it. What were the names of the people who published attacks against MD5 ?

And the bad guys are in a much better position. They need to attack just one algorithm, or even just one key. They have enormous resources and motivation to do it. They might have influenced the design of the algorithm to put a backdoor in it in the first place. And they don't need to publish their results, they can silently exploit them for years.

Well, the good guys seem to be winning so far. Or do they ? You never know. This is called security.

May 27, 2010

Python 3 frameworks anyone ?

First, I'm happy to announce that I have just released the next version of Pythomnic3k, a Python 3 framework to develop SOA middleware.

But I'd also like to share with you the big question of this Python 3 framework.

I worked on its predecessor Pythomnic (similar, but written in Python 2) in 2005-2007, using it for integrating various systems in a bank. It worked, but like any software developed in an ad-hoc manner it became messy over time. Not to mention the fact that as I learned Python, the old code looked uglier every day.

And so, in early 2008, along with the first Python 3 betas being released, I decided that Pythomnic needed a complete redesign and rewrite exclusively in brand new Python ! Pythomnic3k has been in development ever since. It has a nice and clean upfront design, based on 3 years of experience with Pythomnic, it's written much better, and it has extensive self-tests. Which is to say, it is a quality piece of software. I spent the next 1.5 years polishing it, until release 1.0 was finally published in 2009. Release 1.1, which I believe I've already announced, came out after some 8 more months of refinement.

All this time I kept using it for what, for the same integration tasks - connecting point A to point B, transforming messages, supporting various protocols. In the company I work for, it is used for delivering bank transfers and billing payments, sending SMS notifications (contains full implementation of SMPP 3.4 among other things), providing cryptographic network services of various sorts, and just about anything. In short, it serves as a middleware glue, and if I'm allowed to judge, it fits the bill.

Now, the big question is - was it really beneficial to switch to Python 3 when starting a new development ? Take a look at the list of Python 3 packages. The language has been around for 2 years, and there are like what, 50 of them ? Out of which many are one-module utilities ? Give me a break.

Python 3 looked promising, although it was not immediately apparent which new features were the killer ones. Frankly, I'm still not sure. I love the syntax improvements and the correct str/bytes, but what else ? Am I missing the wave or is it not there yet ?

Anyway, the Pythomnic3k architecture has very few dependencies. It is a pretty much self-contained framework, which means that it doesn't suffer from the lack of anything in particular in Python 3 libraries, but I would still love to see more Python 3 libraries around to plug into the framework.

April 02, 2010

You come to software market ...

... and you want your software cheap, fast and of high quality. You, my friend, want to be fooled, and you will be fooled, because nature cannot be.

March 12, 2010

How are they going to shut the Internet down ?

Well, I never would have thought that my first post after such a long period of silence would be like this, but this is what bothers me.

Given the current political situation in Russia, in which power belongs to totally corrupt organized crime, the Internet remains the only medium where anyone can speak out. You may still be prosecuted for doing so, but this is the only place where one can at least publish an unpopular opinion.

For example, check out the Internet shit storm (available mostly in Russian) on the topic of the outrageous theft of as much as $50 billion of budget money under the state-approved "make drinking water clean" program.

And so my question is - how are they going to shut the Internet down and how soon ?

June 19, 2009

Faulty character decoding as the last line of anti-spam defense

I receive spam every day. Filtering is in place and everything, but occasionally some garbage gets through. And then I may look through it, briefly, less than a second perhaps before I hit "Delete", but the eye is fast enough to read and understand more than I'd want to. You might say such a kamikaze message has still succeeded.

Much of the spam I receive is in Russian. As a side note, Russian characters have multiple encodings - WIN1251, KOI8-R, CP866, ISO-8859-5 and the universal UTF-8 come to mind. This means that the mail client has to properly understand the encoding and decode the message so that it can be displayed correctly.
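To illustrate what goes wrong when the client guesses the encoding incorrectly, here is a minimal sketch using only the standard Python codecs. The sample word and the choice of KOI8-R as the sender's encoding are my own assumptions for the sake of the example, nothing more:

```python
# A sample Russian word ("hello"); any Cyrillic text would do.
text = "Привет"

# Assume the sender encoded the message as KOI8-R.
raw = text.encode("koi8_r")

# Decode the very same bytes the way different mail clients might guess.
# errors="replace" keeps the loop going where the bytes are not valid
# in the guessed encoding (as happens with UTF-8 here).
for codec in ("koi8_r", "cp1251", "cp866", "iso8859_5", "utf_8"):
    print(codec, "->", raw.decode(codec, errors="replace"))
```

Only the first line of the output is readable. The rest is the kind of garbage the wrong guess produces.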

I use Thunderbird, and it is just awful at decoding Russian messages. I don't have any idea why that is, but I have to manually specify the encoding for every last message, because they always appear garbled.


But then, the bug becomes an unexpected feature - the spam messages look just as undecipherable as legitimate ones, and even though I look at them, nothing is imprinted in my mind, and I just hit "Delete".

March 30, 2009

Software architecture

is what you explain to somebody else so that he understands the matter.

February 05, 2009

This is Python: context managers and their use

Python allows the developer to override the behavior of pretty much everything. For example, as I explained before, the ability to override the "dot" operator makes all sorts of magic possible.

The topic of this post is similar magic enablers - "context managers", defined in PEP-343. I will also demonstrate one idiosyncratic context manager example.

To begin with, it is important to note that Python reasonably suggests that when a developer modifies the behavior (i.e. the semantics) of something, it is still done somewhat in line with the original syntax. The syntax therefore implies a certain direction in which a particular behavior could be shifted.

For instance, it would be rather awkward if you overrode the dot operator on some class in such a way that it throws an exception upon attribute access:
class Awkward:
    def __getattr__(self, n):
        raise Exception(n)

Awkward().foo # throws Exception("foo")
It is a possible but very unusual way of interpreting the meaning of a "dot", which is originally a lookup of an instance attribute.

Having this in mind we proceed to the context managers. They originate from the typical resource-accessing syntactical pattern:
r = allocate_resource(...)
try:
    r.use()
finally:
    r.deallocate()
Such code is encountered so often that it was indeed a good idea to wrap it into a simpler syntactical primitive. A context manager in Python is an object whose responsibility is to deallocate the resource when execution leaves its scope (or context). The developer should only be concerned with allocating a resource and using it:
with allocated_resource(...) as r:
    r.use(...)
In simple terms, the above translates to:
ctx_mgr = ResourceAllocator(...)
r = ctx_mgr.__enter__()
try:
    r.use()
finally:
    ctx_mgr.__exit__(None, None, None) # or the exception details, if r.use() raised
I note a few obvious things first:
  1. A context manager is any instance that supports the __enter__ and __exit__ methods (aka the context manager protocol).
  2. A specific ResourceAllocator must be defined for a particular kind of resource. The syntactical simplification does not come for free.
  3. Context managers are one-time objects, which are created and disposed of as wrappers around the resource instances they protect.
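The points above can be sketched with a minimal pair of classes. This is only an illustration: Resource and its methods are hypothetical names made up for this example, and ResourceAllocator here is just one possible shape for such a wrapper.

```python
class Resource:
    """A hypothetical resource, used only for illustration."""

    def __init__(self):
        self.allocated = True

    def use(self):
        assert self.allocated, "resource has already been deallocated"
        return "used"

    def deallocate(self):
        self.allocated = False


class ResourceAllocator:
    """A context manager wrapping a single Resource instance."""

    def __enter__(self):
        # Allocate the resource and hand it to the "as" clause.
        self._resource = Resource()
        return self._resource

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs whether or not the body raised an exception.
        self._resource.deallocate()
        return False  # do not suppress exceptions from the body


with ResourceAllocator() as r:
    result = r.use()
```

After the with block exits, r still refers to the resource, but it has been deallocated.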
What is less obvious is that a class can be a context manager for its own instances. There need not be a separate class for that. For example, instances of threading.Lock are their own context managers. They provide the necessary methods and can be used like this:
import threading
lock = threading.Lock()
with lock:
    pass # do something while the lock is acquired
which is identical to
lock = threading.Lock()
lock.acquire()
try:
    pass # do something while the lock is acquired
finally:
    lock.release()
Finally, I proceed to an example of my own.

See, I tend to write a lot of self-tests and I love Python for forcing me to. And some of the tests require that you check for a failure. Long ago I used to write code like this:
try:
    test_specific_failure_condition()
except SpecificError as e:
    assert str(e) == "error message"
else:
    assert False, "should have thrown SpecificError"
which made my test code very noisy. I have even posted a suggestion that a syntactical primitive be introduced to the language just for that. It was rejected (duh !).

And then I wrote a simple "expected" context manager which does exactly the same thing for me every day now:
with expected(SpecificError("error message")):
    test_specific_failure_condition()
See how much noise has been eliminated ? How much clearer the test code becomes ? It is not a particularly "resource-protecting" kind of thing, but still in line with the original syntax, just like I said above.

The "expected" context manager source code is available here, please feel free to use it if you like.

To be continued...

December 28, 2008

No sense in sensors

I like buttons. I like handles. I like dials. I like doorknobs. I like doors for that matter. I like physical controls whose shape and feel suggest their usage and whose usage provides physical feedback. If it clicks, budges and moves, then it's good. When it is in the expected position and its usage is apparent from its form, then it's good.

Sensor controls make no sense to me. I hate smearing fingers against a black glossy surface, with unclear outcome. Did it work ? Did I activate the right icon ? I hate it when controls are not really controls, but images on the glass. I hate it when controls change their places, look and functions depending on what I am doing.

Even my stove is black and glossy, with no buttons but tiny engraved white icons. Makes it easy to clean for sure, but using it feels nothing like pressing a button. Oh well, at least the icons are always in the same positions.

iPhone, yes I tried it. Could have spoken through a cigarette case instead. Doesn't feel like a phone at all. Large flat nothing.

Now, why are sensor controls so popular these days then ?

The way I see it, sensor controls are a cheap alternative to good interface design. See, if they knew what this thing would be used for, they could have spent time and money on design and given it a nice interface, specifically for its function.

But there is a problem - they don't know what the thing will be used for. Instead they plan to use it for something no one could imagine at the moment. And they don't want to cast it in stone. They want to leave their options open, so that the interface can be changed later through software update.

From the manufacturer's point of view, the sensor panel is the ideal instrument to implement any interface they may need in the future. It is a way to secure investments, rather than to make the thing more pleasant to use. And the rest is nothing but fashion, done through professional advertising, product placement and the bandwagon effect.

Ideally, people need to be placed in a world with indifferent black walls, with the content dynamically downloadable from the BigCorp site. Virtual reality, that's what it is. Opaque screens instead of windows, so that you can choose a "view". Dumb sensor panels with fake buttons. Smaller packages with more useless contents. Things that you have no control over.

And I hate it. I like real things.

December 08, 2008

Pray for the repose of the best remote banking solution in Russia

Alas, the bank I've been working for for the last five years got the short end of a merger. What that means for IT needs no explaining. Everything we made is slowly dying out.

While it's still boiling, perhaps it's time to look back, and think about what's been done.

The good:

1. Still, our remote banking solution has been rated the best in Russia for the last two years in a row, and the forum full of client complaints about its upcoming shutdown is also a good indication of that. And I am honored to have been part of the team that made it.

2. Pythomnic, the side open source project that I have been developing for years during this employment. Starting early this year I luckily had the time to completely rethink, redesign and rewrite it from scratch in the new Python 3.0. It is a framework for integration in an enterprise network using distributed network services. SOA, EAI, you name it. Essentially, this is what I have been doing for the last five years in the Internetbank project. We have even managed to write a few production services with the new framework and port a few from its previous version. If you don't mind me saying, it is a high quality piece of software, well (re)designed and (re)written. This is the project I will be working on for years to come.

The bad:

1. The recession is getting worse. Not too good having to look for a job at times like these. Care to take a look at my slightly outdated resume ?

2. I still can't force myself to release software the quality of which I consider low. What it means is that I tend to work thoughtfully and thoroughly, but yes, slowly. I could argue for and against such an approach myself, but not in this post. Anyway, such habits don't play well with modern freelancing. Who needs quality today ?

Therefore, pray for the repose of the wonderful Internetbank project and, if you like, pay attention to the Pythomnic3k framework - I hope it is worth your attention.

October 16, 2008

Why do e-mails have a subject ?

Real mail doesn't need a subject, nor headers of any kind really. Could you imagine

From: Leo Tolstoy
To: Anton Chekhov
Subject: Re[2]: War and peace
Date: 16.03.1899

My dear Anton,
...

?

What is the point for e-mail to have headers anyway ? Some of them are transport level technical details. For example, the To and From fields serve about the same function as the physical letter envelope with handwritten addresses on it. But subject, what is in subject ?

It always takes me considerably more time to come up with a sound subject, and it still almost always says nothing about the contents of the letter. What's the point ?

Is it the presumed e-mail volume, so that the user can just look over a long list of subjects without actually opening the messages ? Or is it the limited space on 1970s terminal screens ? Or is it just a technical artefact for the sake of e-mail indexing, storing and referencing ?

Anyhow, right now, neither subject, nor From, nor To fields mean anything.

If a given e-mail is indeed a mail message sent to me, then I care about neither To (which is implicit - me), nor From (which is expected to be politely included in the body), nor subject (which, like I said, is meaningless even when written by a well-meaning sender). I simply open the message and read it entirely.

If, on the other hand, the e-mail is spam, I care about From, To, or subject even less. I just trash it (in fact, my e-mail filter does it for me).

So, either way, I care only about the contents, not about From, To or Subject. The key problem is really in separating letters from noise. But From, To and Subject don't help with that either. What's the point in having them ?