August 26, 2008

Google, DNS and finding stuff on the Internet

What if you encountered the Internet for the first time? The World Wide Web, for that matter. Someone opens a browser for you and says:

- This is the Internet, it has everything. Just type in the address of the site you want to visit.

Er, excuse me? The address of a site I want to visit??? WTF is that supposed to mean? Does anyone remember the address of the Pyramids? I wouldn't mind visiting that particular site.

But really, what is a site address? It is merely a reflection of a technical detail of the physical network organization. It just so happens that, for the sake of unambiguous data delivery, each computer on the Internet needs its own unique address. The techies who invented it in the 1970s simply chose that address to be an integer (a 32-bit one, in IPv4). Had it been up to them, or had the number of connected computers not exploded, plain numbers could have served just as well:

- Connect me to server 12345 !
- You got it.
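A quick illustration of the "address is just an integer" idea, using Python's standard ipaddress module (IPv4 only; 127.0.0.1 is an arbitrary sample, and 12345 is simply the number from the dialog above):

```python
import ipaddress

# Dotted-quad notation is just a human-friendly spelling of one 32-bit integer.
addr = ipaddress.IPv4Address("127.0.0.1")
print(int(addr))  # 2130706433

# ...and any integer in range is a valid IPv4 address, including "server 12345".
print(ipaddress.IPv4Address(12345))  # 0.0.48.57
```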

But people are notoriously bad at remembering numbers, and so there emerged a service similar to the yellow pages, where each address could be given a name and conveniently looked up later. Then it went like this:

- The new server is at great.new.site.com

and the user never bothered to translate "great.new.site.com" into 12345. The Domain Name System (DNS) responsible for this, the ubiquitous service for looking up pieces of information by name, is quite fascinating. It is perhaps the biggest distributed database in the world, and its capabilities have been largely underutilized over the years. Maybe this is why it is still up and running.
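In practice, the name-to-number lookup is a single call away. A minimal Python sketch, assuming the machine has a working system resolver; the helper name resolve is ours, not a library function:

```python
import socket

def resolve(name: str) -> list[str]:
    """Ask the system resolver (and, through it, DNS) for the addresses of a name."""
    infos = socket.getaddrinfo(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    # The same address repeats once per socket type, so deduplicate before sorting.
    return sorted({info[4][0] for info in infos})

# resolve("www.example.com") needs network access, so try it interactively.
print(resolve("127.0.0.1"))  # a numeric name resolves to itself: ['127.0.0.1']
```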

The presence of DNS became as important as physical network connectivity. Without DNS, the Internet might as well be down. If you care to notice, DNS is exactly where mainstream operating systems have about their only built-in redundancy: you are actually encouraged to configure multiple DNS servers at once, just in case one dies.
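That redundancy can be modeled, roughly, as "try each configured server in order until one answers". This is a toy sketch, not how any particular OS resolver is implemented (real ones add timeouts, retries and rotation); dead_server and backup_server are hypothetical stand-ins:

```python
from typing import Callable, Sequence

Resolver = Callable[[str], str]

def resolve_with_fallback(name: str, resolvers: Sequence[Resolver]) -> str:
    """Try each configured resolver in order; the first answer wins."""
    errors = []
    for resolver in resolvers:
        try:
            return resolver(name)
        except OSError as exc:  # server down, timeout, etc. (TimeoutError is an OSError)
            errors.append(exc)
    raise OSError(f"all {len(resolvers)} resolvers failed for {name!r}: {errors}")

# Hypothetical stand-ins for a dead primary server and a healthy secondary one.
def dead_server(name: str) -> str:
    raise TimeoutError("no response")

def backup_server(name: str) -> str:
    return {"great.new.site.com": "0.0.48.57"}[name]

print(resolve_with_fallback("great.new.site.com", [dead_server, backup_server]))  # 0.0.48.57
```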

Well, nice as DNS is, it still has its own idiosyncrasies. From the user's point of view, there is really no reason for site names to be organized in a dot-separated hierarchical fashion. In other words, in

www.yahoo.com

there is no need for either "www" or "com". Yahoo is the name; the rest is irrelevant. The whole "dot-separated" thing and "com" are just technical nuisances that made the development of DNS feasible, so that the database could be distributed more effectively. And "www" is nothing but a habit, a meme introduced into the culture. The sounds of "double u, double u, double u", and perhaps the visual rhythm of the letters www, immediately signal to anyone familiar with the Internet that a site address is being transmitted. Synchronization bits, if you like.
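The dots are what let the database be distributed: each suffix is a zone that can be delegated to its own servers, and a resolver walks from the root down. A small sketch that only lists that chain of zones for a name (no network access; delegation_chain is a hypothetical helper):

```python
def delegation_chain(name: str) -> list[str]:
    """List the zones a resolver passes through, from the root down to the name."""
    labels = name.rstrip(".").split(".")  # "www.yahoo.com" -> ["www", "yahoo", "com"]
    chain = ["."]                         # the root zone
    # Build ever longer suffixes: "com.", "yahoo.com.", "www.yahoo.com."
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain

print(delegation_chain("www.yahoo.com"))
# ['.', 'com.', 'yahoo.com.', 'www.yahoo.com.']
```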

So, what matters is the "yahoo" part, right? The name. But the name of what, and what's in a name?

First, the "name of what" part. The World Wide Web is de facto a hypertext, a billion files intertwined with mutual links. Accordingly, what you type in is merely an entry point into the web. Once inside, you neither type nor care to remember any more names or addresses; you just keep following the links. Have you ever stared at a blank browser page trying to invent another name to type in, just to see what comes up? That's the idea. Any name could be tried as an entry gateway, but picking them at random is extremely ineffective. Whoever has many entry points to the web has to write them down, which is the start of a personal bookmark catalogue, hardly a popular sport any more. Instead, it happens that everyone has about ten favorite entry points to the web: the ones that are fashionable, familiar, have catchy names, or relate to the person's location or interests. OK, so each user has his own favorite entry points to the web, and they are the only ones that need names.

What's in a name, then? Oh, then it is totally irrelevant what exactly the name is. www.google.com, www.wikipedia.org, www.reddit.com, www.e1.ru, www.kazna.ru, whatever: a name that is meaningless but catchy, or meaningful and easy to remember in connection with some relevant topic, will do.

Google is a catchy name, and it presents the richest and the poorest entry page at the same time. See, it might look helpful when you type www.google.com and the simplest possible page pops up saying: hi there, just type in what you need. But that is the same question we started from: just type in what you need! The only difference is that before, we had to type the name of a single site, presumably known beforehand. Now we have to try keywords until we find something.

One point here is that the DNS names of sites are largely irrelevant. The name of a site used to be the only keyword available for finding it, but no more. Now you are far more likely to find a site through the right query to Google.

Another point is that Google and the like perform the same function DNS was supposed to: relieving the user of remembering addresses and looking up relevant sites. The truly distributed DNS, mapping site names to addresses, became part of the physical network (on the right OSI layer, if you care), and got replaced by centralized mammoth server farms that map keywords to pages.
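For contrast, "mapping keywords to pages" at its most basic is an inverted index. A toy sketch with made-up pages (build_index and search are hypothetical helpers); real search engines add ranking, stemming and a great deal more:

```python
from collections import defaultdict

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return dict(index)

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

pages = {  # hypothetical pages
    "http://example.org/giza": "the pyramids of giza in egypt",
    "http://example.org/cairo": "cairo is the capital of egypt",
    "http://example.org/python": "python is a programming language",
}
index = build_index(pages)
print(sorted(search(index, "Egypt pyramids")))  # ['http://example.org/giza']
```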

Finally, this switch gave enormous power to the proficient user, but for the average user it is still a blank stare at

- This is the Internet, it has everything. Just type in what you want to find.

Er, excuse me?

August 22, 2008

This is Python, calling a spade a spade

Python is a high-level programming language, but what does this term mean? What does it mean for a language to be high-level or low-level? Can you compare the levels of different languages?

The meaning of the term is nebulous, and there is no single or final definition. Here is one approach: the more effectively a language allows you to handle things, the higher-level it is. And by things I don't mean just objects as in class instances. Things, you know, everything, even if I occasionally call them objects.

Enter the notion of first-class objects. Put simply, something is called a first-class object in a programming language if it can be treated just like an instance of a primitive type, such as int. For example, when you declare a variable (being able to declare a variable of that kind is a valuable feature already)
int i;
you can then do all sorts of things with it, such as passing it as a parameter:
foo(i);
return it as the result of a function:
return i;
and do other things, depending on the language. The point is that first-class objects can be handled more effectively and provide additional flexibility. Thus, the more things in a language are first-class, the higher-level that language is.

In Python, pretty much everything is first-class. I won't dig into the language reference to find out whether that is formally true, but in practice it is just like that. This is partly because Python is a dynamically typed language with reference semantics for variables: as soon as something exists, you should be able to get a reference to it, and once you have a reference, you pass it around like a primitive, not caring about the nature of the object it points to. The language itself does not care what kind of object is referenced by the variable you pass. Only when it comes to real work, such as accessing the object's methods, may it turn out to be incompatible with the operation you throw at it. Such just-in-time type compatibility is a very old idea, and in Python it is usually called "duck typing".
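Duck typing in a nutshell: a function that only needs something with a readline() method. A minimal sketch; first_line is a hypothetical helper, and Scroll a hypothetical class that has nothing to do with files:

```python
import io

def first_line(stream):
    """Works with anything that has a readline() method; no type is declared."""
    return stream.readline().rstrip("\n")

class Scroll:
    """Hypothetical class, unrelated to files, that happens to have readline()."""
    def readline(self):
        return "Dear diary\n"

print(first_line(io.StringIO("hello\nworld\n")))  # hello
print(first_line(Scroll()))                        # Dear diary
# first_line(42) fails only when called: AttributeError, since int has no readline
```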

Why is it good? Because I can call a spade a spade. If I need to pass a class as a parameter, what the heck, I can do it:
def create(c, i):
    return c(i)

create(int, 0)
See? Generic programming right there.

Or, why wouldn't I be able to pass in a function?
def apply(f, x):
    return f(x)

def mul_by_2(x):
    return x * 2

print(apply(mul_by_2, 1))  # prints 2
Um, was that functional programming?

Another curious and extremely useful first-class thing, which you won't find in many other languages, is the call arguments. Remember, as I said before, there are no declarations in Python. Compatibility of a called function with the actually supplied arguments is checked just in time, like everything else:
def foo(a):
    ...
foo(1, 2)  # this raises TypeError at runtime
But nothing stops you from writing a function which accepts any number of arguments:
def apply(f, *args):
    return [f(arg) for arg in args]

apply(mul_by_2, 1)     # returns [2]
apply(mul_by_2, 1, 2)  # returns [2, 4]
...
And the point is that inside the apply function, args is a variable that references a tuple of the actually passed arguments:
def apply(*args):
    print(args)

apply(1, 2, 3)  # prints (1, 2, 3)
It may be a bit of a stretch to call args a first-class "arguments of the call" object, but practically that is just what it is. Imagine the flexibility this gives you.
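One example of that flexibility: since args (a tuple) and kwargs (a dict) are ordinary values, a call can be captured now and performed later. A sketch with a hypothetical defer helper:

```python
def defer(f, *args, **kwargs):
    """Capture a call now and perform it later.

    args (a tuple) and kwargs (a dict) are ordinary objects, so they can be
    stored, inspected, or forwarded to another function with * and **.
    """
    def run():
        return f(*args, **kwargs)
    return run

def mul_by_2(x):
    return x * 2

jobs = [defer(mul_by_2, n) for n in (1, 2, 3)]
jobs.append(defer(sorted, [3, 1, 2], reverse=True))
print([job() for job in jobs])  # [2, 4, 6, [3, 2, 1]]
```

The standard library's functools.partial covers the same ground.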

Anyway, in conclusion I will demonstrate another situation where calling a spade a spade pays off: a state machine, that is, an object with a state and a set of state transition rules. What would it typically look like?
class C:

    def __init__(self):
        self._state = "A"

    def _switch(self, to):
        self._state = to

    def _state_A(self):
        print("A->B")
        self._switch("B")

    def _state_B(self):
        print("STOP")
        self._switch(None)

    def simulate(self):
        while self._state is not None:
            if self._state == "A":
                self._state_A()
            elif self._state == "B":
                self._state_B()

C().simulate()  # prints A->B, then STOP
This is a hastily thrown-together sample, so please don't be too picky. The problem with it, which I will try to eliminate, is this: you have two ways to represent the same thing, the state. What is the reason for aliasing _state_A as "A" and _state_B as "B"? Oh, the last letter matches, I see... And what's the point of the state-by-state switch in simulate? Why don't we just call a spade a spade?
class C:

    def __init__(self):
        self._state = self._state_A

    def _switch(self, to):
        self._state = to

    def _state_A(self):
        print("A->B")
        self._switch(self._state_B)

    def _state_B(self):
        print("STOP")
        self._switch(None)

    def simulate(self):
        while self._state is not None:
            self._state()

C().simulate()
In this second example, I don't have any arbitrary aliases for states; instead, each state is represented by its own handler. The method that handles a state is the state here. It simplifies things just a bit: the switch is gone, and overall it is cleaner and more consistent to my taste.
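Taking the idea one step further, a matter of taste rather than a requirement: each handler could simply return the next state, which removes _switch entirely. A sketch; the Machine name and the trace list (added only so the run can be inspected) are ours:

```python
class Machine:
    """Each state handler returns the next state, or None to stop."""

    def __init__(self):
        self.trace = []              # records what happened, for inspection
        self._state = self._state_A

    def _state_A(self):
        self.trace.append("A->B")
        return self._state_B

    def _state_B(self):
        self.trace.append("STOP")
        return None

    def simulate(self):
        while self._state is not None:
            self._state = self._state()  # run the handler, move to whatever it returns
        return self.trace

print(Machine().simulate())  # ['A->B', 'STOP']
```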

Well, that's about what I had to say.

As for Python being a high-level language... Other factors, such as the wide variety of built-in container types and the huge standard library, also help make Python higher-level than many other languages, but that's another story.

To be continued...

August 20, 2008

XML is like plankton in the information ocean

as huge amounts of it float around to be consumed by everyone.

August 17, 2008

Bosons, my ass

The Higgs boson, they say, is the reason for wasting gazillions of euros on a high-tech circular tunnel.

So how come we still use portable energy sources that date back to 1800 and can hold only some 3000 mAh of charge? How come we can't purposely transfer a significant amount of energy wirelessly, through the air, without having to wear a radiation-proof suit? Speaking of which, why is radiation protection still 10 m of lead? Kind of limits space travel, you know.

Higgs boson, when you discover it, you know what to do with it.

August 05, 2008

This is Python, dot operator and the magic "self"

Although syntactically similar to "regular" imperative programming languages which support OOP and everything, Python offers extra semantic freedom that is just short of magic.

Suppose you have a reference to some object in some variable x. Since a variable always contains a reference to an object, you can access that object through the variable by applying all sorts of operators to it:
x += 1
x["foo"] = "bar"
x(1, 2)
x.foo("bar")
and so on. Whether or not each of those accesses will succeed depends on the target object, but the worst thing that could happen if you mistreat an object is a runtime exception, for example:
x = 1
x()
results in
TypeError: 'int' object is not callable
(A note on the samples: they are written for Python 3k; with Python 2.x the theory is the same, but some of the samples may need slight modifications.)

Let's keep on looking. Since Python is an OOP-capable language (whatever on Earth that means), it supports classes and methods:
class C:
    def foo(self, x):
        print(x)
and it allows overriding the reaction to some of the operators; for example, the following pieces of code have similar meaning:
class C:
    def __call__(self):
        pass

-vs-

class C {
public:
    void operator()(void) {}
};
and it might seem that there is no difference, except for Python's way of having a fancy double-underscore method for anything advanced. In fact, Python offers more.

Python allows overriding the "dot" operator. For example, the following class (despite being a little unclean) appears to support any method you throw at it:
class C:
    def __getattr__(self, name):
        def any_method(*args, **kwargs):
            print(name, args, kwargs)
        return any_method
    def i_exist(self):
        print("i would not budge")

c = C()
c.ping()
c.add(1, 2)
c.lookup([1, 2], key = 1)
c.i_exist()
prints out
ping () {}
add (1, 2) {}
lookup ([1, 2],) {'key': 1}
i would not budge
The magic method is, obviously, __getattr__. It is invoked when you apply the dot operator to a class instance and the instance has no attribute of that name by itself (regular lookup always goes first). Note how the i_exist method stepped up despite __getattr__ being overridden.
x.foo
 ^---- __getattr__ is invoked when the dot is crossed
So what does it mean? It means that you can override almost anything, including the dot operator, something not possible in most statically typed compiled languages, and this feature makes it really simple to hide all sorts of advanced behavior behind a simple method access. For example, consider the XML-RPC client in Python:
from xmlrpc.client import ServerProxy
p = ServerProxy("http://1.2.3.4:5678")
p.AddNumbers(1, 2, 3)
and see how straightforward access to a network service with a procedural interface is. The ServerProxy class simply intercepts the method access and turns it into a network call. This happens transparently at runtime, with no need to compile any stubs: you can call any method of the target service without any preparation. Compare this to an XML-RPC client library of your choice.
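To show there is no magic beyond __getattr__, here is a toy proxy in the same spirit. It is not how ServerProxy is implemented internally; RecordingProxy is a hypothetical class that uses the real xmlrpc.client.dumps function to build the request body, but stores it instead of sending it anywhere:

```python
import xmlrpc.client

class RecordingProxy:
    """Builds the XML-RPC request for any method name, but only records it."""

    def __init__(self):
        self.sent = []

    def __getattr__(self, name):
        if name.startswith("_"):  # leave special/private lookups alone
            raise AttributeError(name)
        def remote_call(*args):
            request = xmlrpc.client.dumps(args, methodname=name)
            self.sent.append(request)  # a real proxy would POST this to the server
        return remote_call

p = RecordingProxy()
p.AddNumbers(1, 2, 3)
print(p.sent[0])
```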

Now take a look at the following fictional line:
foo.bar["biz"]("baz").keep.on("going")
Can you see now that every delimiter (except for the string literal quotes) can be intercepted and have its behavior modified? Given this, I can (and almost always do) apply aesthetic thinking: how would I like my code to look? One of Python's principles is that code should be (pleasantly) readable. In each case, for each relation between program modules (whatever that means), I can have it
like["this"] -OR-
like("this") -OR-
like_this -OR-
like + "this" -OR-
like.this
and so on. Depending on the situation, I can pick whichever option makes the code clearer. And guess what? Overriding the dot is sometimes useful.
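As a sketch of that freedom, here is a hypothetical Like class where the dot, brackets, call and + spellings all mean the same thing (append a segment). Whether that is a good idea in real code is exactly the aesthetic call discussed above:

```python
class Like:
    """Hypothetical toy: every spelling appends one segment to a path-like value."""

    def __init__(self, parts=()):
        self.parts = tuple(parts)

    def _extend(self, segment):
        return Like(self.parts + (segment,))

    # like.this: only called for names not found normally, so "parts" stays a real attribute
    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        return self._extend(name)

    def __getitem__(self, segment):  # like["this"]
        return self._extend(segment)

    def __call__(self, segment):     # like("this")
        return self._extend(segment)

    def __add__(self, segment):      # like + "this"
        return self._extend(segment)

    def __repr__(self):
        return "Like(" + "/".join(self.parts) + ")"

like = Like()
print(like.this, like["this"], like("this"), like + "this")  # Like(this) Like(this) Like(this) Like(this)
print(like.a["b"]("c") + "d")                                # Like(a/b/c/d)
```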

Anyhow, this is only half of the story.

The other half is told from the other side of the dot. See, __getattr__ notifies an instance that one of its attributes is being accessed and lets it supply one of its own. But Python also allows the accessed member itself to be notified whenever it is accessed as a member of some other instance. Sounds weird? Take a look at this:
class Member:
    def __get__(self, instance, owner):
        print("I'm a member of {0}".format(instance))
        return self

class C:
    x = Member()

c = C()
c.x
prints out
I'm a member of <__main__.C object at ...>
See? The Member instance, being a member of some other class, is notified whenever it is accessed. (Objects with such a __get__ method are called descriptors in Python.) Where can this be useful, you may ask? Oh, it is the key to the magic "self" in Python.

Consider the following most simple piece of code:
class C:
    def foo(self):
        print(self)
Have you ever thought about what "self" is? I mean, it obviously is an argument containing a reference to the instance being called, but where does it come from? It doesn't even have to be called "self"; that is just a convention, and the following works just as well:
class C:
    def foo(magic):
        print(magic)
So it turns out that somehow, at the moment of invocation, the first argument of every method points to the instance it was called through. How is this done?

What happens when you do
c = C()
c.foo()
anyhow? At first sight, accessing c.foo should return a reference to a method, something related to C and unrelated to c. But it turns out that the following two accesses to foo
c1 = C()
c1.foo
c2 = C()
c2.foo
fetch different things: c1.foo returns a method with its first argument set to c1, and c2.foo one with it set to c2. How could that happen? The key here is that you access a method (which is a member of a class) through a class instance. The class itself contains its methods in a half-cooked "unbound" state; they don't have any "self":
class C:
    def foo(self):
        pass
print(C.foo)
print(C().foo)
prints out
<function C.foo at ...>
<bound method C.foo of <__main__.C object at ...>>
See? When fetched directly from a class, a method is nothing but a regular function; it is not "bound" to anything. You can even call it, but you have to provide its first argument, "self", yourself, as you see fit:
class C:
    def foo(self):
        print(self)
C.foo("123")
prints out
123
But as soon as you create an instance and fetch the same method through it, the magic __get__ method comes into play and allows the returned reference to be "bound" to the actual instance. Something like this:
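class Method:
    def __init__(self, target):
        self._target = target
    def __get__(self, instance, owner):
        if instance is None:  # fetched through the class: stay unbound
            return self._target
        # a fresh object per access, so different instances don't overwrite each other
        return BoundMethod(self._target, instance)  # <<<< binding ahoy!

class BoundMethod:
    def __init__(self, target, instance):
        self._target = target
        self._self = instance
    def __call__(self, *args, **kwargs):
        return self._target(self._self, *args, **kwargs)

class C:
    foo = Method(lambda self, *args, **kwargs:
                 print(self, args, kwargs))

c = C()
print(c)
c.foo(1, 2, foo = "bar")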
prints out
<__main__.C object at 0x00ADA0D0>
<__main__.C object at 0x00ADA0D0> (1, 2) {'foo': 'bar'}

And so I have reimplemented a major language feature in a few lines. It may not be useful most of the time, but such an exercise certainly makes you understand the language better.
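The real mechanism can be poked at directly. Plain functions have a __get__ method themselves, so fetching the function from the class dictionary and calling __get__ by hand reproduces what c.foo does (a sketch; the printed address will differ):

```python
class C:
    def foo(self):
        return self

c = C()
raw = C.__dict__["foo"]    # the plain function stored in the class
bound = raw.__get__(c, C)  # what c.foo does behind the scenes
print(bound)               # <bound method C.foo of <__main__.C object at ...>>
print(bound() is c)        # True
print(c.foo == bound)      # True
```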

One more thing: have I told you Python is cool? :)

To be continued...