December 28, 2008

No sense in sensors

I like buttons. I like handles. I like dials. I like doorknobs. I like doors for that matter. I like physical controls whose shape and feel suggest their usage and whose usage provides physical feedback. If it clicks, budges and moves, then it's good. When it is in the expected position and its usage is apparent from its form, then it's good.

Sensor controls make no sense to me. I hate smearing fingers against a black glossy surface, with unclear outcome. Did it work ? Did I activate the right icon ? I hate it when controls are not really controls, but images on the glass. I hate it when controls change their places, look and functions depending on what I am doing.

Even my stove is black and glossy, with no buttons but tiny engraved white icons. Makes it easy to clean for sure, but using it feels nowhere like pressing a button. Oh well, at least the icons are always in same positions.

iPhone, yes I tried it. Could have spoken through a cigarette case instead. Doesn't feel like phone at all. Large flat nothing.

Now, why are sensor controls so popular these days then ?

The way I see it, sensor controls are a cheap alternative to good interface design. See, if they knew what this thing would be used for, they could have spent time and money on design and given it a nice interface, specifically for its function.

But there is a problem - they don't know what the thing will be used for. Instead they plan to use it for something no one could imagine at the moment. And they don't want to cast it in stone. They want to leave their options open, so that the interface can be changed later through software update.

From the manufacturer's point of view, the sensor panel is the ideal instrument to implement any interface they may need in the future. It is a way to secure investments, rather than to make the thing more pleasant to use. And the rest is nothing but fashion, done through professional advertising, product placement and the bandwagon effect.

Ideally, people need to be placed in a world with indifferent black walls, with the content dynamically downloadable from the BigCorp site. Virtual reality, that's what it is. Opaque screens instead of windows, so that you can choose a "view". Dumb sensor panels with fake buttons. Smaller packages with more useless contents. Things that you have no control over.

And I hate it. I like real things.

December 08, 2008

Pray to rest of the best remote banking solution in Russia

Alas, the bank I've been working for for the last five years came out as the weaker side in a merger. What that means for IT does not need explaining. Everything we made is slowly dying out.

While it's still boiling, perhaps it's time to look back, and think about what's been done.

The good:

1. Still, our remote banking solution has been rated the best in Russia for the last two years in a row, and the forum full of client complaints about its upcoming shutdown is also a good indication of that. And I am honored to belong to the team that made it.

2. The side open source project that I have been developing for years during this employment, Pythomnic, I have luckily had time starting early this year to completely rethink, redesign and rewrite from scratch in new Python 3.0. It is a framework for integration in enterprise network using distributed network services. SOA, EAI, you name it. Essentially, this is what I have been doing for the last five years in the Internetbank project. We have even managed to write a few production services with the new framework and port a few from its previous version. If you don't mind me saying, it is a high quality piece of software, well (re)designed and (re)written. This project I will be working on for the years to come.

The bad:

1. The recession is getting worse. Not too good having to look for a job at times like that. Mind taking a look at my slightly outdated resume ?

2. I still can't force myself to release software the quality of which I consider low. What it means is that I tend to work thoughtfully and thoroughly, but yes, slowly. I could have argued for and against such approach myself, but not in this post. Anyway, such habits don't play well with modern freelancing. Who needs quality today ?

Therefore, pray to rest of the wonderful Internetbank project and if you like pay attention to the Pythomnic3k framework - I hope it is worth your attention.

October 16, 2008

Why do e-mails have subject ?

Real mail doesn't need subject, nor headers of any kind really. Could you imagine

From: Leo Tolstoy
To: Anton Chekhov
Subject: Re[2]: War and peace
Date: 16.03.1899

My dear Anton,
...

?

What is the point of e-mail having headers anyway ? Some of them are transport level technical details. For example, the To and From fields serve about the same function as the physical letter envelope with handwritten addresses on it. But subject, what is in subject ?

It always takes me considerably more time to come up with a sound subject, and it still almost always says nothing about the contents of the letter. What's the point ?

Is it the presumed e-mail volume, so that the user could just look over a long list of subjects without actually opening the messages ? Or is it the limited space on 1970s terminal screens ? Or is it just a technical artefact for the sake of e-mail indexing, storing and referencing ?

Anyhow, right now, neither subject, nor From, nor To fields mean anything.

If a given e-mail is indeed a mail message sent to me, then I care about neither To (which is implicit - me), nor From (which is expected to be politely included in the body), nor subject (which, like I said, is meaningless even when written by a well-meaning sender). I simply open the message and read it entirely.

If, on the other hand, the e-mail is a spam, I care about From, To, or subject even less. I just trash it (in fact, my e-mail filter does it for me).

Then, either way, I care only about the contents, not about From, To or Subject. The key problem is really in separating letters from noise. But then From, To and Subject don't help it either. What's the point in having it ?

August 26, 2008

Google, DNS and finding stuff on the Internet

What if you've encountered the Internet for the first time ? World-wide-web for that matter. Someone opens a browser for you and says

- This is Internet, it has everything. Just type in an address of a site you want to visit.

Er, excuse me ? An address of a site I want to visit ??? WTF is that supposed to mean ? Anyone remember the address of the Pyramids ? I wouldn't mind visiting that particular site.

But really, what is a site address ? It is merely a reflection of a technical detail of the physical network organization. It just so happens that for the sake of unambiguous data delivery each computer on the Internet needs its own unique address. Now, the techies that invented it in the 1970s just chose such an address to be an integer number. Had it been up to them, or had the count of connected computers not exploded, numbers could have been used just as well:

- Connect me to server 12345 !
- You got it.
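
To make the "address is just a number" point concrete, here is a minimal sketch using Python's standard ipaddress module. It assumes IPv4, and 192.0.2.1 is merely a documentation address picked for illustration, not anything from the story above.

```python
import ipaddress

# An IPv4 address is a 32-bit integer; the dotted form is only a rendering of it.
server = ipaddress.ip_address(12345)
print(server)        # 0.0.48.57
print(int(server))   # 12345

# The reverse direction: dotted text back to the underlying number.
number = int(ipaddress.ip_address("192.0.2.1"))
print(number)        # 3221225985
```

So "server 12345" and "0.0.48.57" are the same address written two ways.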

But people are notoriously bad at remembering numbers, and so there emerged a service similar to the yellow pages where each address could be given a name, and conveniently looked up later. Then it went like this:

- The new server is at great.new.site.com

and the user never bothered to translate "great.new.site.com" into 12345. The responsible domain name system (DNS), the ubiquitous service for looking up pieces of information by name, is quite fascinating. It is perhaps the biggest distributed database in the world, and its capabilities have been largely underutilized over the years. Maybe this is why it is still up and running.

Presence of the DNS became as important as physical network connectivity. If there is no DNS, the Internet might as well be down. If you care to notice, it is exactly DNS where mainstream operating systems have their like only built-in redundancy. You are actually encouraged to configure multiple DNS servers at once, just in case one dies.

Well, DNS being a nice thing, it still has its own idiosyncrasies. There is really no reason for the site names to be organized in a dot-separated hierarchical fashion. In other words, in

www.yahoo.com

there is no need for either "www" or "com". Yahoo is the name, but the rest is irrelevant. The whole "dot separated" thing and "com" are just technical nuisances which made the development of DNS technically feasible, so that the database could be distributed more effectively. And "www" is nothing but a habit, a meme introduced to the culture. The sounds of "double u, double u, double u" and perhaps the visual rhythm of the letters www immediately prepare anyone familiar with the Internet for the fact that a site address is being transmitted. Synchronization bits if you like.

So, what matters is the "yahoo" part, right ? The name. But the name of what and what's in a name ?

First, I'll go about the "name of what" part. World wide web is de-facto a hypertext, a billion of files intertwined with mutual links. Accordingly, what you type in is but an entry to the web. Once inside, you neither type nor care to remember any more names nor addresses, you just keep following the links. Have you ever stared at a blank browser page trying to invent another name which to type in just to see what comes up ? That's the idea. Any name could be tried as entry gateway, but picking them at random is extremely ineffective. Whenever one has multiple entry points to the web, he has to write them down, which is a starting point for a personal bookmark catalogue, doubtfully a popular sport any more. Instead it happens that everyone has like ten favorite entry points to the web, the ones that are fashionable, familiar, have catchy names or refer to the person's location or interests. Ok, so each user has his own favorite entry points to the web and they are the only ones that need names.

What's in a name then ? Oh, it is then totally irrelevant what exactly the name is. www.google.com, www.wikipedia.org, www.reddit.com, www.e1.ru, www.kazna.ru whatever is meaningless but catchy or meaningful but easy to remember in connection to some relevant topic.

Google is a catchy name and it presents the most rich and the most poor entry page at the same time. See, it might look like it helps, when you type www.google.com and the simplest possible page pops up and says: hi there, just type in what you need. But it is the same question we have started from - just type in what you need ! The only difference is that before we had to type the name of a single site, presumably known beforehand. Now we have to try keywords until we find something.

One point here is that the DNS names of the sites are largely irrelevant. A name of a site used to be the single keyword available for finding it, but no more. Now you are far more likely to find a site through a right query to google.

Another point is that google and the likes perform the same function DNS was supposed to - relieving the user from remembering addresses and looking up relevant sites. The truly distributed DNS, mapping site names to addresses, became part of the physical network (on the right OSI layer if you care), and got replaced by centralized mammoth server farms that map keywords to pages.

Finally, this switch gave enormous power to a proficient user, but for the average user it is still a blank stare at

- This is Internet, it has everything. Just type in what you want to find.

Er, excuse me ?

August 22, 2008

This is Python, calling a spade a spade

Python is a high level programming language, but what does this term mean ? What does it mean for a language to be high level or low level ? Can you compare height levels of different languages ?

The meaning of the term is nebulous and there is no single or final definition. Here is one approach - the more effectively the language allows you to handle things, the higher level it is. And by things I don't mean just objects as in class instances. Things, you know, everything, even if I occasionally call them objects.

Enter the notion of first-class objects. Put simply, something is called first-class object in a programming language if it can be treated just like an instance of primitive type, such as int. For example, when you declare a variable (which is a valuable feature already, to be able to declare a variable of that kind)
int i;
you then can do all sorts of things with it, such as passing it as a parameter:
foo(i);
return it as a result of function:
return i;
and do other things, depending on the language. The point is that first-class objects can be handled more effectively and provide additional flexibility. Thus, the more objects in a language are first-class, the higher level that language is.

In Python pretty much everything is first-class. I won't be digging into the language reference to find whether or not it is formally true, but in practice it is just like that. It is partly because Python is a dynamically typed language with reference semantics for variables - as soon as something exists, you should be able to get a reference to it, and then, once you have a reference, you pass it around as a primitive, not caring about the nature of the object it points to. The language itself does not care what kind of an object is being referenced by the variable you pass. It is only when it comes to real work, such as access to the object's methods, that it may turn out to be incompatible with the operation you throw at it. Such just-in-time type compatibility is a very old idea and is called "protocol compatibility" in Python (more commonly known as "duck typing").
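
A minimal sketch of that just-in-time compatibility check. The shout function and the Collector class are hypothetical names made up for the illustration; only io.StringIO comes from the standard library.

```python
import io

def shout(stream):
    # No type check here: anything with a write() method will do.
    stream.write("HELLO\n")

class Collector:
    """Hypothetical class, unrelated to files, that happens to have write()."""
    def __init__(self):
        self.chunks = []
    def write(self, text):
        self.chunks.append(text)

buffer = io.StringIO()
collector = Collector()
shout(buffer)       # works, StringIO has write()
shout(collector)    # works too, for the same reason

try:
    shout(42)       # int has no write(); this fails only at the point of use
except AttributeError as e:
    error = str(e)
    print(error)
```

Nothing about shout says which types it accepts; the incompatibility of 42 is discovered only when write is actually looked up.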

Why is it good ? Because I can call a spade a spade. If I need to pass a class as a parameter, what the heck, I can do it:
def create(c, i):
    return c(i)

create(int, 0)
See ? Generic programming right there.

Or, why wouldn't I be able to pass in a method ?
def apply(f, x):
    return f(x)

def mul_by_2(x):
    return x * 2

print(apply(mul_by_2, 1)) # prints 2
Uhm, was it functional programming ?

One other curious and extremely useful first-class thing, which you wouldn't find in many other languages, is the call arguments. Remember, I have said that before, there are no declarations in Python. Compatibility of a called function with the actually supplied arguments is checked just-in-time, just as anything else:
def foo(a):
    ...
foo(1, 2) # this throws at runtime
But nothing stops you from writing a function which accepts any arguments:
def apply(f, *args):
    return [f(arg) for arg in args]

apply(mul_by_2, 1)
apply(mul_by_2, 1, 2)
...
And the point is - inside the apply function args is a variable that references a tuple of the actually passed arguments:
def apply(*args):
    print(args)

apply(1, 2, 3) # prints (1, 2, 3)
There may be just a little stretch in calling args a first-class object for "arguments to the call", but practically it is just that. Imagine the flexibility of things you can do with it.

Anyway, in conclusion I will demonstrate another situation where calling a spade a spade is good. A state machine. An object with a state, and a set of state transition rules. What would it typically be ?
class C:

    def __init__(self):
        self._state = "A"

    def _switch(self, to):
        self._state = to

    def _state_A(self):
        print("A->B")
        self._switch("B")

    def _state_B(self):
        print("STOP")
        self._switch(None)

    def simulate(self):
        while self._state is not None:
            if self._state == "A":
                self._state_A()
            elif self._state == "B":
                self._state_B()

C().simulate() # prints A->B STOP
This is a quickly thrown together sample, so please don't be too picky. The problem with it, which I will try to eliminate, is this - you have two ways to represent the same thing - the state. What is the reason for aliasing _state_A by "A" and _state_B by "B" ? Oh, the last letter matches, I see... And what's the point in having the state-by-state switch in simulate ? Why don't we just call a spade a spade ?
class C:

    def __init__(self):
        self._state = self._state_A

    def _switch(self, to):
        self._state = to

    def _state_A(self):
        print("A->B")
        self._switch(self._state_B)

    def _state_B(self):
        print("STOP")
        self._switch(None)

    def simulate(self):
        while self._state is not None:
            self._state()

C().simulate()
In this second example, I don't have any arbitrary aliases for state, instead I use for a state its own handler. A method which handles a state is a state here. It simplifies things just a bit - the switch is gone, and it is overall more clean and consistent to my taste.

Well, that's about what I had to say.

Python being a high level language... Other factors, such as wide variety of built-in container types and huge standard library also help Python to be higher level than many other languages, but it's another story.

To be continued...

August 20, 2008

XML is like plankton in the information ocean

as huge amounts of it float around to be consumed by everyone.

August 17, 2008

Bosons, my ass

Higgs boson, they say, is the reason for wasting gazillions of euros on a high-tech circular tunnel.

So how come we still use portable energy sources that date back to 1800 and are capable of holding only some 3000 mAh of charge ? How come we can't purposely transfer a significant amount of energy wirelessly, through the air, without having to wear a radiation-proof costume ? Speaking of which, why is radiation protection still 10m of lead ? Kind of limits space travel you know.

Higgs boson, when you discover it, you know what to do with it.

August 05, 2008

This is Python, dot operator and the magic "self"

Although syntactically similar to "regular" imperative programming languages which support OOP and everything, Python offers extra semantical freedom short of being magic.

Suppose you have a reference to some object in some variable
x
As soon as it contains a reference to an object (and it always does), you can access that object through the variable, by applying all sorts of operators to it:
x += 1
x["foo"] = "bar"
x(1, 2)
x.foo("bar")
and so on. Whether or not each of those accesses will succeed depends on the target object, but the worst thing that could happen if you mistreat an object is a runtime exception, for example:
x = 1
x()
results in
TypeError: 'int' object is not callable
(Note on samples: they are in Python3k, with Python2x theory is the same, but some of the samples may need to be slightly modified.)

Let's keep on looking. Since Python is an OOP-capable language (whatever on Earth that means), it supports classes and methods:
class C:
    def foo(self, x):
        print(x)
and allows overriding reaction to some of the operators, for example the following pieces of code have similar meaning:
class C:                           class C
    def __call__(self):            {
        pass              -vs-     public:
                                       void operator()(void) {}
                                   };
and it might seem that there is no difference except for Python way of having a fancy double underscore method for anything advanced, but in fact Python offers more.

Python allows overriding of "dot" operator. For example, the following class (despite being a little unclean) appears to support just any method you throw at it:
class C:
    def __getattr__(self, name):
        def any_method(*args, **kwargs):
            print(name, args, kwargs)
        return any_method
    def i_exist(self):
        print("i would not budge")
c = C()
c.ping()
c.add(1, 2)
c.lookup([1, 2], key = 1)
c.i_exist()
prints out
ping () {}
add (1, 2) {}
lookup ([1, 2],) {'key': 1}
i would not budge
The magic method is apparently __getattr__. It is invoked when you apply the dot operator to a class instance that does not have an attribute with that name by itself; note how the i_exist method stepped up despite __getattr__ being overridden.
x.foo
^---- __getattr__ is invoked when the dot is crossed
So what does it mean ? It means that you can override anything, including the dot operator, something not possible in static-typed compiled languages, and this feature makes it really simple to hide all sorts of advanced behavior behind a simple method access. For example, consider XMLRPC client in Python:
from xmlrpc.client import ServerProxy
p = ServerProxy("http://1.2.3.4:5678")
p.AddNumbers(1, 2, 3)
and see how straightforward the access to a network service with procedural interface is. ServerProxy class simply intercepts the method access and turns it into a network call. This is done transparently at runtime with no need to recompile any stub or anything - you can access any target service method without any preparation. Compare this to an XMLRPC client library of your choice.
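
The interception itself can be sketched without any network at all. RecordingProxy below is a hypothetical stand-in, not how ServerProxy is actually implemented: instead of sending anything, it just records what a real proxy would have to serialize and transmit.

```python
class RecordingProxy:
    """Hypothetical stand-in for a network proxy: records calls instead of sending them."""

    def __init__(self, url):
        self._url = url
        self.sent = []

    def __getattr__(self, name):
        # Reached only for attributes not found normally, i.e. any "remote" method name.
        def remote_call(*args):
            request = (self._url, name, args)
            self.sent.append(request)   # a real proxy would serialize and transmit here
            return request
        return remote_call

p = RecordingProxy("http://1.2.3.4:5678")
p.AddNumbers(1, 2, 3)
p.Ping()
print(p.sent)
```

Neither AddNumbers nor Ping is defined anywhere; the method name simply arrives in __getattr__ as a string and becomes part of the request.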

Now take a look at the following fictional line:
foo.bar["biz"]("baz").keep.on("going")
Can you see now that every delimiter (except for the literal string quote) can be intercepted and have its behavior modified ? Given this, I can (and almost universally do) apply aesthetic thinking - how would I like my code to look ? One of the Python principles is to have code (pleasantly) readable. In each case, for each relation between program modules (whatever that means) I can have it
like["this"] -OR-
like("this") -OR-
like_this -OR-
like + "this" -OR-
like.this
and so on. Depending on the situation I can pick up whatever option that makes the code more clear. And guess what ? Overriding the dot is sometimes useful.

Anyhow, this is only half of the story.

The other half is told from the other side of the dot. See, __getattr__ notifies an instance that one of its methods is about to be accessed and allows for it to override. But Python also allows for the accessed member to be notified whenever it is being accessed as a member of some other instance. Sounds weird ? Take a look at this:
class Member:
    def __get__(self, instance, owner):
        print("I'm a member of {0}".format(instance))
        return self

class C:
    x = Member()

c = C()
c.x
prints out
I'm a member of <__main__.C object at ...>
See ? The Member instance being a member of some other class is notified whenever it is accessed. Where can it be useful you may ask ? Oh, it is the key to the magic "self" in Python.

Consider the following most simple piece of code:
class C:
    def foo(self):
        print(self)
Have you ever thought what "self" is ? I mean - it obviously is an argument containing a reference to the instance being called, but where did it come from ? It doesn't even have to be called "self", it is just a convention, the following will work just as well:
class C:
    def foo(magic):
        print(magic)
And so it turns out that somehow at the moment of the invocation the first argument of every method points to the containing instance. How is it done ?

What happens when you do
c = C()
c.foo()
anyhow ? At first sight, access to c.foo should return a reference to a method - something related to C and irrelevant to c. But it appears that the following two accesses to foo
c1 = C()
c1.foo
c2 = C()
c2.foo
fetch different things - c1.foo returns a method with its first argument set to c1 and c2.foo - to c2. How could that happen ? The key here is that you access a method (which is a member of a class) through a class instance. The class itself contains its methods in a half-cooked "unbound" state, they don't have any "self":
class C:
    def foo(self):
        pass
print(C.foo)
print(C().foo)
prints out
<function foo at ...>
<bound method C.foo of <__main__.C object at ...>>
See ? When fetched directly from a class, a method is nothing but a regular function, it is not "bound" to anything. You can even call it, but you will have to provide its first argument "self" by yourself as you see fit:
class C:
    def foo(self):
        print(self)
C.foo("123")
prints out
123
But as soon as you instantiate and fetch the same method through an instance, the magic __get__ method comes into play and allows the returned reference to be "bound" to the actual instance. Something like this:
class Method:
    def __init__(self, target):
        self._target = target
    def __get__(self, instance, owner):
        self._self = instance # <<<< binding ahoy !
        return self
    def __call__(self, *args, **kwargs):
        return self._target(self._self, *args, **kwargs)

class C:
    foo = Method(lambda self, *args, **kwargs:
                 print(self, args, kwargs))
c = C()
print(c)
c.foo(1, 2, foo = "bar")
prints out
<__main__.C object at 0x00ADA0D0>
<__main__.C object at 0x00ADA0D0> (1, 2) {'foo': 'bar'}

And so I could demonstrate a reimplementation of a major language feature in a few lines. Maybe not obviously useful most of the time, such experience certainly makes you understand the language better.
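
One caveat about the Method sample above: it stores the instance on the Method object itself, which is shared by all instances of C, so a reference fetched through one instance gets silently rebound by a later access through another. A slightly more careful sketch, assuming the standard types.MethodType is an acceptable shortcut, returns a fresh bound object on every access instead:

```python
import types

class Method:
    def __init__(self, target):
        self._target = target

    def __get__(self, instance, owner):
        if instance is None:
            return self._target   # accessed through the class: a plain function
        # A fresh bound object per access; nothing is stored on the shared descriptor.
        return types.MethodType(self._target, instance)

class C:
    foo = Method(lambda self, *args, **kwargs: (self, args, kwargs))

c1, c2 = C(), C()
m1 = c1.foo    # bound to c1
m2 = c2.foo    # bound to c2, m1 stays bound to c1
print(m1(1, 2, foo = "bar")[0] is c1)
print(m2()[0] is c2)
```

This is still only an illustration of the idea, not a description of the interpreter's own internals.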

One more thing, have I told you Python was cool ? :)

To be continued...

July 15, 2008

This is Python, variable name lookup

As I already noted, there are no declarations in Python. In general, there is no way to tell in advance what an arbitrary piece of code means, whether it is semantically correct and whether it can be successfully executed. All you have is a syntactically correct code fragment, but the meaning of any symbol is undetermined until the code is finally executed. For example,
foo = bar
is syntactically correct, but you cannot tell whether variable bar is defined at that point or what kind of an object it refers to. What behavior do you have in the above simplest assignment ? It is that if variable bar is defined, a new local variable foo will reference the same object as bar. Something like this:
current_namespace["foo"] = reference_by_name("bar")
This may be a trivial example, except for the behavior of the fictional reference_by_name function. Where does the language look up for a variable ? Like in the other languages that support procedural programming, Python procedures are natural namespace compartments. For example:
def foo(a): # begins foo's local namespace
    b = 1   # modifies foo's namespace
print(a) # fails because a is invisible here
print(b) # same
Each procedure's individual namespace is in Python terms called "local namespace". Namespaces of nested procedures nest along with their frames, therefore a name inside of inner procedure may refer to the variable defined in an outer:
def foo():
    b = 1
    def bar():
        print(b) # prints 1
    bar()
On the other hand, presence or absence of a name in a namespace is determined dynamically, at the moment of access, unlike static lexical scoping, which welcomes all sorts of awkward ambiguities like
def foo():
    b = 1
    del b         # would have deleted b from foo's namespace,
    def bar():    # but could not be done, because this nested
        print(b)  # reference to b would hang (ouch !)
    bar()
and
def foo():                          def foo():
    def bar():                          def bar():
        print(b) # prints 1   -VS-          print(b) # fails because b is
    b = 1                               bar()        # only almost there
    bar()                               b = 1
Unless you want to maintain such ugly code, you should minimize using foreign variables in nested scopes, resorting to argument passing instead. Procedure arguments automatically become part of its local namespace and all locally accessed variables thus explicitly become local:
def foo():
    b = 2
    def bar(b): # explicitly local, no possible ambiguity
        print(b) # prints 1
    bar(1)
Nevertheless, it is convenient to visualize the name resolution as scanning chain of nested scopes upwards:
module.py:
5) is b here ?
def foo():
    4) is b here ?
    def bar():
        3) is b here ?
        def biz():
            2) is b here ?
            def baz():
                1) is b here ?
                a = b
Note that in step 5 the containing module becomes an implicit embracing namespace which is the last chance to find the name. In Python this module namespace is called "global namespace". Finally, in addition to local and global namespaces, there is a "built-in namespace" which contains the language primitives that are not explicitly defined anywhere.

Therefore, even the simplest access to a variable is a lookup in three namespaces - local, global and built-in in that order.
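
The lookup order can be watched in action with a small sketch. It assumes nothing beyond the standard exec function; the dictionaries plain and shadowed play the role of two different global namespaces, and the replacement len functions are of course made up for the demonstration.

```python
source = "result = len('abc')"

# 1. Nothing named len in the global namespace: the built-in one is found.
plain = {}
exec(source, plain)
print(plain["result"])       # 3

# 2. A global named len wins over the built-in.
shadowed = {"len": lambda x: "global len"}
exec(source, shadowed)
print(shadowed["result"])    # global len

# 3. A local name wins over both.
def local_wins():
    len = lambda x: "local len"
    return len("abc")

print(local_wins())          # local len
```

The same three-letter name resolves to three different objects depending on which namespace answers first.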

To be continued...

June 30, 2008

Block-drawing characters in Firefox. WTF ?

Unicode defines a family of characters shaped like boxes of increasing height. Presumably useful for drawing diagrams in text. Something like this
x
xx
xxx
only fancier. The exact 8 characters in discussion have code points 0x2581-0x2588, and range from 1/8th to 8/8ths, i.e. full block. Here is a sample:
▁▂▃▄▅▆▇█
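
For the record, the same row can be generated from the code points, and the typical use is a one-line text chart. A minimal sketch; the sparkline function is a made-up helper and assumes a non-empty list of positive numbers.

```python
# Code points 0x2581..0x2588 are the eighth-blocks, from 1/8 up to the full block.
BLOCKS = "".join(chr(cp) for cp in range(0x2581, 0x2589))

def sparkline(values):
    """Map each value to one of the eight blocks, scaled to the largest value."""
    top = max(values)
    return "".join(BLOCKS[max(0, round(v / top * 8) - 1)] for v in values)

print(BLOCKS)
print(sparkline([1, 2, 3, 4, 5, 6, 7, 8]))
print(sparkline([8, 4]))
```

A chart like this only reads well when all the blocks sit on a common baseline.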
Now, correct me if I'm wrong, but those characters are only useful as soon as they are aligned with each other. You can't draw a diagram if one box is slightly offset - it turns out ugly. And so, can anyone tell why Firefox 2 renders the 4/8ths (half-block, code point 0x2584) and the 8/8ths (full block, code point 0x2588) shifted down a little ? Here, have a look:
This glitch makes it practically useless. WTF ?

June 22, 2008

Different ways to understand things in software engineering

The proficiency of a software developer is determined not only by which technologies he used or for how long, but more importantly by how exactly he understood and interpreted the principles behind them. Because the basic principles of software engineering are so numerous and often not specified formally, the view of the actual developer means everything.

In the course of work, a developer adapts his understanding to the problems he is working at, this is somewhat similar to how shapes of key and lock match. For this reason two people may be using the same technology for the same amount of years but be totally unable to understand each other to a point of engaging religious wars over the simplest points.

Now I understand why whenever I have a chance to interview a job applicant, I ask rather unspecific questions even of philosophical kind - to see not what he knows, but how he actually understands it and whether his understanding matches mine. Because if it doesn't we'd probably have hard times working together.

The difficult part here is trying to keep your knowledge deep and broad at the same time, because both the details and the perspective are required to understand.

June 17, 2008

The set of good programmers is still very small, a great joke by David Parnas

A reviewer explained his rejection of my best-known paper on the subject by writing, "Obviously Parnas does not know what he is talking about because nobody does it that way". Only a decade later, however, a textbook stated, "Parnas only wrote down what all good programmers did anyway". A logician would conclude that the set of good programmers was empty; that set is still very small.

-- David L. Parnas

This is Python, everything is executable

A dynamically typed language is by definition one where variables don't have a type, but the actual values do. This is by all means true in Python where the following code works fine
x = 10
x = "ten"
print(x) # prints ten
but limiting dynamism to untyped variables only would be missing the point.

Like with many "scripting" languages a Python program is started by passing the name of its main module to the "interpreter", such as
c:> python main.py
The transition of Python source code to an actually executed program begins with loading and parsing the module file. This step succeeds as soon as the module does not contain any syntax errors fatal for the parser. Successful parsing only guarantees that the module is not totally broken - a weak guarantee, only useful for checking for unbalanced parentheses and such.

What happens next is magic - the parsed module file is executed as though it was just a chunk of a source code. Wait a minute ! It is a chunk of a source code ! Anyway, execution of every module at its first import is the major part of Python program run. This process is identical no matter if the module being loaded is the program's main module or some other module explicitly imported by demand.
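
A small experiment makes the "executed at first import" part visible. This is only a sketch: the module name demo_exec_once and its contents are made up, and it is written to a temporary directory so that nothing real gets touched.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny throwaway module (hypothetical name) whose body leaves a trace.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "demo_exec_once.py"), "w") as f:
    f.write("print('executing module body')\n"
            "token = object()\n"
            "value = 6 * 7\n")

sys.path.insert(0, module_dir)
importlib.invalidate_caches()

first = importlib.import_module("demo_exec_once")    # the body executes here
token = first.token
second = importlib.import_module("demo_exec_once")   # found in sys.modules, body not re-run
same = second.token is token

reloaded = importlib.reload(first)                   # explicit reload executes the body again
changed = reloaded.token is not token
print(first.value, same, changed)
```

The message is printed once by the first import and once more by the reload, but not by the second import.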

I came to Python from C++, and it took me a long time to change the perspective. The change is this - in Python you should look at everything as though it is an executable statement, because it really is. To illustrate this principle, consider definitions vs. declarations.

In statically typed languages, declarations exist for the sake of separate compilation - for the compiler to be able to tell whether one part of code is compatible with another without having to dig through the entire program. In Python, which is a dynamic language, there is no separate compilation stage where such checks could happen, therefore declarations are useless, and what's left only looks like definitions.

For example, where in C++ you have two files (if you do it properly)
// foo.h
class Foo
{
private:
    int x;
public:
    int GetX(void) const;
};

// foo.cpp
#include "foo.h"

int Foo::GetX(void) const
{
    return x;
}
the .h file is a declaration - your promise to the compiler that you will provide the matching implementation - and the .cpp file is that promise fulfilled. In Python there is no separate compiler to make promises to, so you don't have to feel obliged. The equivalent code in Python would be
class Foo:
    def get_x(self):
        return self.x
What you see in this Python code is neither a declaration nor a definition. It is a piece of executable code, which, when executed, introduces a new class to the containing module's namespace. Rewritten to its actual effect in pseudocode it would look like this:
class Foo:                 temp1 = new class()

    def get_x(self):       temp2 = new method()
        return self.x      temp2.__code__ = return self.x
                           temp1["get_x"] = temp2

                           module["Foo"] = temp1
What you just saw was an illustration that a Python class definition is an executable statement, just like anything else and it executes once when the module is first imported. For example, it is possible to do something like C++'s conditional compilation:
import sys

class Foo:
    if sys.platform == "win32":
        def do_it(self): # windows way
            ...
    else:
        def do_it(self): # unix way
            ...
The effect of the above code is that when the module is imported, the compiled version of class Foo will contain method do_it matching the current environment. It is not the same as the straightforward approach, where the check would have been performed upon each call to do_it:
import sys

class Foo:
    def do_it(self):
        if sys.platform == "win32": # windows way
            ...
        else: # unix way
            ...
In a similar vein, your class definition could fail to execute:
class Foo:
    1 / 0 # this throws at import time
and the module will fail to import, throwing an exception to the caller.
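The pseudocode expansion above can be approximated in real Python. This is only a sketch, assuming Python 3: the built-in `type()` called with three arguments builds a class from a name, a tuple of base classes and a namespace dictionary, which is roughly what executing a `class` statement amounts to. The names `trace`, `FooByHand` and `Broken` are invented for the example.

```python
trace = []  # records what ran, in order

class Foo:
    trace.append("class body executing")  # an ordinary statement in the class body

    def get_x(self):
        return self.x

trace.append("class statement finished")


# Roughly the same class, assembled by hand from a function and a dictionary.
def get_x(self):
    return self.x

FooByHand = type("FooByHand", (), {"get_x": get_x})

a, b = Foo(), FooByHand()
a.x, b.x = 1, 2
print(a.get_x(), b.get_x())  # 1 2

# A class body that raises never produces a class; the name stays unbound.
try:
    class Broken:
        1 / 0
except ZeroDivisionError:
    trace.append("class body raised")

print(trace)
```

The trace shows the class body running as plain statements at the moment the `class` statement is reached, before anything is instantiated.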

Now it should not surprise you the least that when one module imports the other it is again not a declaration. When module foo does
import bar
the described process repeats for module bar, unless it has already been imported, in which case the import statement does nothing (from the discussed point of view). Similarly, you can import modules as you need them at runtime:
if need_time:
    import time
    print(time.time())
Python therefore does not have any declarative semantics, only executional - ask yourself - what does it do when executed ?

To be continued...

June 16, 2008

This is Python, language installation and program structure

Installing Python is easy. If you use Windows, you have no choice at all - run setup.exe and you are done. Under Unix, Python can be preinstalled or you can install it manually. I always prefer to install from source on a clean machine, but if you have it preinstalled, you should be fine too.

A Python installation is fully self-contained, and can be migrated to a different machine by copying all the files (or just the necessary ones) from c:\pythonXX or /usr/local/whatever/ to the destination. Multiple versions of Python coexist peacefully in different directories (although you should copy them around manually, because the installation process registers stuff in the Windows registry and does other such things of global effect).

A Python installation essentially contains the language parser+compiler and a huge and poorly structured standard library. The compiler itself along with a minimum set of libraries lives in pythonXX.dll or pythonXX.so.1, and the executable python.exe or bin/python is nothing but the simplest driver program of the (read line, execute, repeat) sort. The standard library lives in c:\pythonXX\lib + DLLs or /usr/local/lib/pythonXX/ and is just a heap of assorted utilities.

Python can be and is easily embedded into another application. It is a DLL, remember ? You take the DLL, zip the standard library and there you have it in two files - an embedded Python. In your application you create an instance of a compiler at runtime and start feeding it with stuff, that's all. Python can also be embedded into a diskless machine, it works just fine in a very restricted environment (such as the high security FreeBSD CD that I have here).

A Python program consists of a set of separate modules; each module is a separate .py file containing some source code. The program is therefore available to the language in source form, but Python nevertheless is not a pure interpreter. As each module is about to be used at runtime, it is loaded, parsed and compiled to an intermediate byte code for a virtual machine. The compiled byte code is saved alongside the original source file in an identically named .pyc file for future reuse. The outcome is the same as with Java or C# or any other language that translates source into byte code; the difference is that in Python there is no separate compilation stage as such - the translation is performed at runtime and is in fact an important part of program execution.
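The runtime translation described above can be observed directly with the built-ins `compile()` and `exec()` and the standard `dis` module. A small sketch, assuming Python 3; the source string and the `<example>` file name are arbitrary:

```python
import dis

source = "answer = 6 * 7\n"

# Source text -> code object holding byte code, all at runtime.
code = compile(source, "<example>", "exec")
print(type(code).__name__)     # code
print(len(code.co_code) > 0)   # True: the raw byte code is in there

# Executing the code object is what "running the module" comes down to.
namespace = {}
exec(code, namespace)
print(namespace["answer"])     # 42

# Human-readable listing of the byte code; the exact output varies by Python version.
dis.dis(code)
```

Note that current Python 3 versions keep cached byte code in a `__pycache__` directory next to the source rather than in a `.pyc` file directly beside it.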

To be continued...

June 11, 2008

This is Python, intro

Writing in Python for a few years now, I still get a kick out of it. A wonderful and very powerful language, if used in the right way (aren't they all like that ?). No matter if the code base is in megabytes, I still occasionally sit back in silence, admiring the beauty of a little code fragment or the way an idea is expressed in code.

If there is one single snippet of Python code to introduce its power of simplicity, it would be the swapping of two variables. When I found it long ago in the Python Cookbook, it hit me like thunder. Never was my understanding of Python the same as before.

Here goes. To swap two variables in Python you write

a, b = b, a

This utilizes a Python feature called automatic tuple packing/unpacking. What's actually going on is more like

a, b <<< unpacked <<< (b, a) <<< packed <<< b, a

where (b, a) is the Python notation for an immutable sequence called a tuple.
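Spelled out as two separate steps, the swap looks like this (a small sketch; the intermediate name `packed` is only there for illustration):

```python
a, b = 1, 2

packed = (b, a)   # packing: the right-hand side becomes the tuple (2, 1)
a, b = packed     # unpacking: the tuple is split across the names on the left
print(a, b)       # 2 1

# The one-line form does both steps at once, with no named temporary.
a, b = b, a
print(a, b)       # 1 2
```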

To be continued...

June 06, 2008

Slow pace of software development

If you need it fast, make sure it *looks* good, because it won't be. Pay more attention to high quality advertisement than to high quality development, but note that once you start selling promises, you are unlikely to deliver a product.

May 22, 2008

Delphi is a very useful tool (with case study)

I've been working with Borland Delphi since 1996, when 2.0 was shiny new. Before that time I had been more or less formally taught Pascal, and Delphi seemed a very welcome tool. And it still does, after 12 years. Strange thing is, whenever I mention using Delphi now, eyes are often rolled and "but it's not .NET" is heard. Know what ? I don't care about .NET. Pragmatically speaking, what matters is the product.

The very reason for this post is that the most recent GUI application that I've written was officially released a few days ago, and it is in Delphi. What's special about this particular application is that it comes really close to the ever sought perfect "one button" program. Ease of use was the major goal and what our group managed to produce you can see for yourself here:

Internetbank installer (in Russian!)

To give you an idea of what it is for: it is a program to initiate access to the remote banking web site of the bank Severnaya kazna, which I happen to be working for right now. A client clicks the button, the browser pops up and the client is brought to http://www.internetbank.ru/ from where he can manage his cards, accounts, transfers, payments etc.


Looks simple, except that before you can enter, you need to have started a cryptographically protected HTTP proxy server and provided it with a key, which is used for putting legally valid digital signatures under all your payments.

And that encryption key needs to have been generated before. And the key expires and needs to be extended remotely. And CA certificates (the bank runs its own certification authority) also expire and need to be replaced remotely. And the user may have multiple keys to choose from.

And there might be problems with Internet access, and the user needs clear and concise yet technical enough diagnostics.

And it should run without installation from a removable drive with very restricted rights. And it should not require any runtime installation, such as .NET, and it must run on Windows 98.

And it must be remotely updatable.

And it must be simple enough to be usable by the most average Joe.

The result is the program works and really has just one button, which is also big and pretty. There is no confusion as to what to do next, because most of the time the user is simply not given any choice at all, clicking the only button starts the only process (of entering the protected web-site).

In a rare case when a user has to initiate some other process, such as generating a new key, the reasons for that are clearly explained right there on the program's face. Hints and careful label wording help at least those users who can and actually are reading.

Looking back I realize that the only reason why such a simple solution appeared is three years of refining the procedure. During the last three years our customers had to use different software, clumsy and inconvenient (despite also being written by me). But in three years we learned all the moves and most of the problems. And then a strategic decision was made and we trashed the old software and rolled out the new one. And I like what we did.

Anyhow, returning to Delphi.

I'm using it for what it does best - pretty GUI with access to OS services or 3rd party libraries. Skin support (as well as good taste) makes programs really good looking. Being pretty is a major advantage for end-user software. Positive emotional reaction would actually help the user to deal with it.

Aside from that, Delphi programs don't require any *cough* runtime platform but run on Windows 98. And Delphi produces reasonably lightweight executables. And you don't need administrative rights to run them. And I like it.

Great stuff.

May 18, 2008

Just in: the benefits of compiling into platform-independent byte code


After the existence of Pascal became known (in 1974), several people asked us for assistance in implementing Pascal on various other machines [...] Thereupon we decided to provide a compiler version that would generate code for a machine of our own design. This code later became known as P-code. [...] Had we possessed the wisdom to foresee the dimension of this movement, we would have put more effort and care into designing and documenting P-code.

-- Niklaus Wirth, 1985

May 13, 2008

Just in: the only way to reliability is through simplicity and that can't be bought

Almost anything in software can be implemented, sold, and even used given enough determination. There is nothing a mere scientist can say that will stand against the flood of a hundred million dollars. But there is one quality that cannot be purchased in this way - and that is reliability. The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.

-- C.A.R. Hoare, 1981

On self-written programs

There is a Russian word "самописный" [səmopisni], literally meaning - "written by oneself". The word has a strong disparaging tone in it and is typically used to describe the low quality software written by one's colleagues (or even oneself), as opposed to the high quality truly industrial commercial software written by people somewhere else, even better if purchased for a huge chunk of money.

The mere existence of such a word, and the attitude of programmers who use it in a derogatory way, is totally beyond me. Programmers who don't appreciate and value the work they do - how good is the software they really write ? Moreover, how can they evaluate the quality of somebody else's work if they despise their own ? Besides, the absurdity of the situation is that all programs are really of this sort, as someone has ultimately written them.

A concluding quote from a nobleman:

Clearly IBM and MIT must be possessed of some secret of successful software design and implementation whose nature I could not even begin to guess at. It was only later that they realized they could not either.

-- C.A.R. Hoare, 1981

May 11, 2008

Just in: choose your language carefully

I have regarded it as the highest goal of programming language design to enable good ideas to be elegantly expressed.

-- C.A.R. Hoare, 1981

Language designers also have an obligation to provide languages that encourage good style, since we all know that style is strongly influenced by the language in which it is expressed.

-- Donald Knuth, 1974

I have the feeling that one of the most important aspects of any computing tool is its influence on the thinking habits of those who try to use it.

-- E.W.Dijkstra, 1972

Just in: permanently low quality of software

I feel that all too often we have been satisfied with such a low level of quality that we have done ourselves harm in the process. We seem not to be able to use the machine, which we all believe is a very powerful tool for manipulating and transforming information, to do our own tasks in this very field.

-- R. W. Hamming, 1969

May 06, 2008

Just in: debugging is unnecessary

If you want more effective programmers, you will discover that they should not waste their time debugging - they should not introduce the bugs to start with.

-- E.W.Dijkstra, 1972

May 05, 2008

Just in: reusability as a central problem

Perhaps the central problem we face in all of computer science is how we are to get to the situation where we build on top of the work of others rather than redoing so much of it in a trivially different way.

-- R. W. Hamming, 1969

May 03, 2008

Just in: no adequate programming teaching

To parody our current methods of teaching programming, we give beginners a grammar and a dictionary and tell them that they are now great writers. [...] As a result, few programmers write in flowing poetry; most write in halting prose.

-- R. W. Hamming, 1969

May 02, 2008

Just in: domain-specific languages

We have, in fact, two languages, one inside the other; an outer language that is concerned with the flow of control, and an inner language which operates on data. There might be a case for having a standard outer language - or a small number to choose from - and a number of inner languages which could be, as it were, plugged in. If necessary, in order to meet special circumstances, a new inner language could be constructed; when plugged in, it would benefit from the power provided by the outer language in the matter of organizing the flow of control.

-- Maurice V. Wilkes, 1967

May 01, 2008

Just in: standardization

I am sorry when I hear well-meaning people suggest that the time has come to standardize on one or two languages. We need temporary standards, it is true, to guide us on our way, but we must not expect to reach stability for some time yet.

-- Maurice V. Wilkes, 1967

April 22, 2008

Method signature type checking decorator for Python 3000

I have just published a Python 3000 decorator for method signature type checking using function annotations. Here:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/572161

It is much cleaner than the similar decorator I have previously written for Python 2.x. The Python 3000 function annotations make it better for the following reasons:

1. The signature-related piece of syntax is right there where it belongs - next to the parameter. Where I used to write
@takes(int, str)
@returns(bool)
def foo(i, s):
    ...
I now write
@typecheck
def foo(i: int, s: str) -> bool:
    ...
2. I don't have to add checking to all the parameters simply because there was no way to skip one. Where it was
class Foo(object):
    @takes("Foo", str)
    def foo(self, s):
        ...
it is now
class Foo:
    @typecheck
    def foo(self, s: str):
        ...
3. It plays nicely with the default values. This one has no equivalent in 2.x version, but it is nice to have:
@typecheck
def foo(x: int = 10): # 10 is also checked
    ...

@typecheck
def foo(*, k: optional(str) = None):
    ...
Other than that, it is just a nice usable piece of code, extensible too. Here are a few more examples:
@typecheck
def foo(x: with_attr("write", "flush")):
    ...

@typecheck
def foo(*, k: by_regex("^[0-9]+$")):
    ...

@typecheck
def swap_tuple(x: (int, float)) -> (float, int):
    ...

@typecheck
def swap_list(x: [int, float]) -> [float, int]:
    ...

@typecheck
def extract(k: list_of(by_regex("^[a-z]+$")),
            d: dict_of(str, int)) -> list_of(int):
    ...
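To give a feel for how a decorator can act on annotations at all, here is a deliberately small sketch. It is not the code of the published recipe: `simple_typecheck` is a hypothetical name, it only understands annotations that are plain classes, it skips `*args`/`**kwargs`, and unlike the recipe it does not check default values.

```python
import functools
import inspect


def simple_typecheck(func):
    """Check call arguments and the result against plain-class annotations."""
    signature = inspect.signature(func)
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            param = signature.parameters[name]
            expected = param.annotation
            # Skip unannotated parameters, *args/**kwargs and non-class annotations.
            if expected is inspect.Parameter.empty or param.kind in variadic:
                continue
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError("%s: argument %r must be %s"
                                % (func.__name__, name, expected.__name__))
        result = func(*args, **kwargs)
        expected = signature.return_annotation
        if (expected is not inspect.Signature.empty
                and isinstance(expected, type)
                and not isinstance(result, expected)):
            raise TypeError("%s: result must be %s"
                            % (func.__name__, expected.__name__))
        return result

    return wrapper


@simple_typecheck
def repeat(s: str, times: int) -> str:
    return s * times


print(repeat("ab", 2))  # abab
try:
    repeat("ab", "2")
except TypeError as e:
    print(e)  # repeat: argument 'times' must be int
```

Unannotated parameters such as `self` are simply skipped, which is the property point 2 above relies on.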

April 21, 2008

Approaching Python 3000: no more automatic tuple parameter unpacking

Just keep in mind that
lambda (x, y): x + y
is no longer possible in Python 3000. You are supposed to write
lambda x_y: x_y[0] + x_y[1]
instead. This also applies to functions, not just to lambdas, and the idea belongs to PEP 3113 which removes automatic tuple unpacking in function parameters. Ugly and inconvenient if you ask me, but there apparently was somebody who kept shooting himself in the foot.

I think that automatic unpacking was rather useful (if used sparingly), especially when you had to do something like
map(lambda (i, (a, b)): i * (a + b),
    enumerate(zip(aa, bb)))
which is now what ?
map(lambda i_a_b: i_a_b[0] * (i_a_b[1][0] + i_a_b[1][1]),
    enumerate(zip(aa, bb)))
Eeew...
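For what it's worth, nested unpacking is still allowed in the `for` clause of a comprehension or generator expression, so this particular case can be written without indexing. A sketch assuming Python 3, with sample values for `aa` and `bb` invented for the example:

```python
aa = [1, 2, 3]
bb = [10, 20, 30]

# The Python 3 lambda has to index into its single argument.
indexed = list(map(lambda i_a_b: i_a_b[0] * (i_a_b[1][0] + i_a_b[1][1]),
                   enumerate(zip(aa, bb))))

# A comprehension can still unpack the nested tuple in its for clause.
unpacked = [i * (a + b) for i, (a, b) in enumerate(zip(aa, bb))]

print(indexed)   # [0, 22, 66]
print(unpacked)  # [0, 22, 66]
```

This does not bring back unpacking in lambda parameters; it only sidesteps the lambda where a comprehension can do the job.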

March 30, 2008

Approaching Python 3000: function annotations

Python 3000 introduces function annotations, regulated by PEP-3107. This one is of particular interest to me as I've previously written one of the method signature checking decorators which this PEP is supposed to replace.

PEP-3107 has two major points:

1. Annotations can be anything. Any Python expression can be attached to a function argument or result value. For example, it is possible to write
def max(a: int, b: int) -> int:
def max(a: "first", b: "second") -> "maximum":
def foo(x: { ("no", "reason"): lambda x: x**2 }):
2. Annotations have no semantics and are not enforced, they are purely syntactical. For example, it's ok to write
def max(a: int, b: int) -> int:
    ...
max("foo", "bar") # nothing happens
The interpretation of annotations is left to 3rd party libraries. The language thus offers unprecedented semantic freedom to developers, but let's see what the implications are.
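A quick way to see both points at once, assuming Python 3 (the function is called `biggest` here only to avoid shadowing the built-in `max`): the annotations are stored in the function's `__annotations__` dictionary, and nothing in the language consults them when the function is called.

```python
def biggest(a: int, b: int) -> int:
    return a if a > b else b


# The annotations are just data attached to the function object.
print(biggest.__annotations__)
# {'a': <class 'int'>, 'b': <class 'int'>, 'return': <class 'int'>}

# Nothing enforces them: strings go through despite the int annotations.
print(biggest("foo", "bar"))  # foo
```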

One problem is that you will have to choose your one true set of annotations. The specific interpretation of function annotations depends upon an external module, library or application, and you acknowledge this dependence explicitly by either modifying your code or having it processed by an external application.

For example, you could have chosen to use function annotations for method signature type checking, using a typecheck decorator (resembling my type checking decorators)
@typecheck
def foo(a: int) -> int:
but once you have chosen the @typecheck implementation you have to stick with it and treat all your function annotations as type checks. Stacking multiple annotations is technically possible, but practically it is not, because the standard does not specify how multiple annotations should be multiplexed. Consider that you have
@typecheck
def foo(i: int):
in place and want to add a docstring kind of annotation to foo's first argument:
def foo(i: "comment"):
Should it be
def foo(i: (int, "comment")):
or
def foo(i: {"typecheck": int, "docstring": "comment"}):
?

No matter which way you choose, both typecheck and docstring must be prepared to extract their annotations from the actually encountered multiplexed construct. This means that two independent implementations must understand the same multiplexing format. Since such multiplexing is not standardized, it is impossible.

One seemingly reasonable way of such multiplexing could have been an iterable with instances of classes descended from some base class (Annotation?) for example
def foo(i: (typecheck(int), docstring("comment"))):
This example may be correct, but it perfectly illustrates how having multiple semantically different annotations seriously hampers the visual quality and readability of the code.

Which opens a final question of whether the gains of the only particular annotations that you choose outweigh the loss of syntactical brevity.

March 10, 2008

Approaching Python 3000: string formatting

One of the changes in Python 3000 applies to string formatting. PEP-3101 is the regulating document for the change. It basically says that the regular % operator is too limited and what we need is a powerful domain-specific language for string formatting.

To be frank, I never felt limited with what % had to offer, but there is no point in criticizing what's about to become standard. Let's see what's new:

1. What used to be a binary operator is now a str's method:
"{0}, {1}".format("A", 10) == "A, 10"
"{n} = {v}".format(n = "N", v = "V") == "N = V"
2. Formatting and alignment are applied in the same manner as before:
"{0:03d}".format(10) == "010"
"<{S:>5s}>".format(S = "foo") == "<  foo>"
"<{S:<5s}>".format(S = "foo") == "<foo  >"
3. Format can insert not just parameter values, but also their items and/or attributes:
d = dict(foo = 1, bar = "a")
"{0[foo]}, {0[bar]}".format(d) == "1, a"
"{0.__class__.__name__}".format(d) == "dict"
4. Recursive substitution is allowed to a degree, for example this works:
d = dict(value = 10, format = "03d")
"{0[value]:{0[format]}}".format(d) == "010"
but this doesn't:
d = dict(data = {"a": "A", "b": "B"}, key = "a")
"{0[data][{0[key]}]}".format(d)
5. Classes can control their own formatting:
class Foo():
    def __format__(self, format):
        from re import match
        assert match("[0-9]+s", format)
        return "x" * int(format[:-1])
foo = Foo()

"{0:3s}".format(foo) == "xxx"
"{0:10s}".format(foo) == "xxxxxxxxxx"
I agree, the new way of string formatting (1 and 2) is cleaner and more straightforward. Substituting items and attributes (3) could be useful sometimes. Custom formatting (5) is Pythonic all right, but hardly practically useful, except when building a framework, a class framework perhaps.

What I don't buy is the attempt to make a statement from what is supposed to be an expression (4). If it is inconsistent, difficult to read and not apparently useful, it should not be there.

March 04, 2008

A programming language made of smileys

Just imagine the possibilities of a language whereby programs are constructed from smileys:


  • Easy to write, all language objects can be arranged on a toolbar. No more typos !
  • Easy to read. You get an immediate emotional response by simply looking at the code.
  • Appealing to a programmer of any nationality. No internationalization required !
  • Easy to extend by adding smileys with hammers, flowers or database connectors.
  • Fun to work with !

February 03, 2008

The great effects of little imperfections

Observing a chain of bubbles rising through a column of water I found it simply amazing how the same little fluctuations in water



that actually made the bubbles appear in the first place also make sure that they go straight up and don't deviate as they rise.

January 30, 2008

Re: Software Engineering Programs Are Not Computer Science Programs

Written by David Lorge Parnas, the article under that title was published in "IEEE Software", Nov/Dec 1999, and essentially says that computer science and software engineering need to be separated in the same way as theoretical physics is separated from its related engineering fields. For the sake of both. As far as education goes at least.

He also advocates the mandatory accreditation of software engineering programs and points out the problems to be encountered. Among the problems mentioned by the author are the lack of knowledge of how to teach and the lack of experienced staff.

Frankly, the article was a little tedious to me, biased by the magazine specifics perhaps. But just like the other works of this great man that I had a chance to read, it is truthful and inspirational. Although in the case of this article, the inspiration has driven me in a slightly unexpected direction.

And so I would like to criticize the article on the grounds that the analogy between physics/regular engineering and computer science/software engineering does not hold.

First, there is a historical difference. Between physics and its engineering fields, it all began with practice and experiment. The extreme case would be construction - people have been prototyping since probably 50 thousand years ago. Two thousand years ago selected craftsmen had already mastered wood, stone and iron construction. There was neither science nor engineering at that moment, all they had was observation and experience.

I am not an expert in the history of science, but it seems plausible that the same pattern repeated most of the time - experiments came first and the theory followed. To be sure, physics as a science is far ahead now, setting up experiments that only a few understand, but at least at early stages practical considerations prevailed.

Exactly the opposite is true for computer science. What began as pure mathematical theory in the 1930s couldn't even be supported by experiment until a decade later, when some sort of electrical apparatus was constructed. In fact, being a branch of mathematics, computer science didn't have to be supported by experiments in the first place.

The "mathematical" engineering therefore was not something that anyone practically required. All they needed was to speed up the calculations, and I seriously doubt that anyone could see the consequences. As the story has it, at one time IBM predicted the world computer market to be in the tens of installations. If it wasn't for semiconductors, software engineering wouldn't even be here today, but computer science would.

Over time, software engineering became an awkward crossover between mathematics and psychology, where people try to project mathematical abstractions onto the real world. Remember how Knuth said: "I have only proved it correct, not tried it." The matter dealt with in software engineering is thus something that should work because it is theoretically perfect but doesn't work because we are not practically perfect.

Second, there is an economic difference. Software is intangible, software production does not respect political borders, it can easily be and is routinely outsourced. What would you say if a team of construction workers could fly from India with its own tools and materials to raise a house overnight ? Plus they would charge less and still get the job done with satisfactory quality. And they would not need to be certified. Similarly, if a doctor or a lawyer could consult over the Internet from a different country, and his services were just as good, wouldn't that nullify certification efforts ?

Besides, you can't strictly control a telecommutable industry: you try to lock it down by regulations and it goes underground. And the last thing we need is a black software market in addition to a pirated software market. Moreover, such regulatory inhibition of software engineering would hamper scientific progress and thus have exactly the opposite effect to the one desired.

You may try to enforce mandatory certification of products instead, but this brings in a totally different perspective and requires a definitive procedure of software quality assessment - something at least improbable at this moment.

Third, there is a natural difference. There are no laws of nature in software engineering.

Try hard as you may, you cannot build a house which levitates above the ground. That is because physics provides its engineers with absolute laws - such as the law of conservation of energy, the laws of thermodynamics or Newton's laws. They all may be a reflection of some deeper principles, but in practice it is sufficient for an engineer to know that you are limited in energy and can't fight gravity. And this is not because a scientist said so, but because you simply can't.

Not the case with software. The world in which software lives is only restricted by hardware architecture. Von Neumann is literally the god of software and computer scientists are his prophets. But what absolute laws do they give to their poor engineers ?

None.

The hardware has its restrictions, that's true, but it is all in capacity. It is physics that limits the hardware, the computer science does not impose any restrictions above that. It is as though it was possible to build a house with the only restriction in mind - that its size should not exceed that of a planet. You can even start building from the roof, and it doesn't have to touch the ground when it's done. It's all imaginary.

The lack of unbreakable laws leaves all the arguments about how software has to be built open-ended to a degree. But then, how could you certify an industry in which there is still no consensus on how to do the simplest thing ?

To conclude, I believe that computer science and software engineering are indeed different, so different in fact, they can be treated as totally unrelated. But the relationship between them is not the same as with physics and engineering, and it would be wrong to approach it with established (educational) practice.

January 27, 2008

There are unnecessary details

but there are no unimportant details.

January 19, 2008

All we have is less good programmers

The more I learn about programming, the more I want to ask: "why haven't I been taught this before ?". I mean - I graduated from a university, majored in "computer mathematics" or so it said, but I really got nothing useful from a professional point of view. A few theoretical courses, such as graph theory, is all. As a matter of fact, everything I know about programming I've learned from books and hard work.

The sad truth is that each generation takes a fresh start. I recall how confident I was in having known everything. Maybe it happens all the time, but programming is special because there is still no notion of software quality. It is surprisingly difficult to convince a beginning programmer that his program is bad. Because you simply have no reliable basis for judgement except for your own expert opinion, but then what would you know ?

But then, there is no knowledge transfer and the entire industry is doomed to go around in circles, reinventing the same things every ten years or so, under different names.

I do realize that the software industry didn't get any better in the past decades, it even might have gotten worse by all accounts. The only reason why we could have possibly gotten more good programmers is because there simply appeared more just any programmers, because anyone could perform as one. Therefore it is statistically possible that the upper percentile also got more numerous.

It would also seem a valid guess that it becomes more and more difficult to find good programmers. Because the good ones tend to stick with a company, or a project, or a team, and bad ones may be changing places more often. This makes the problem of creating a strong team more difficult, and your typical team would in general be of lesser quality. As I believe that a strong team is the best thing that could happen to a programming project, this observation leaves even less hope in the future.


January 16, 2008

You write the program ...

... and the program writes you.