August 05, 2008

This is Python, dot operator and the magic "self"

Although syntactically similar to "regular" imperative programming languages that support OOP and everything, Python offers extra semantic freedom that stops just short of being magic.

Suppose you have a reference to some object in some variable x. Since the variable contains a reference to an object (and it always does), you can access that object through the variable by applying all sorts of operators to it:
x += 1
x["foo"] = "bar"
x(1, 2)
x.foo("bar")
and so on. Whether or not each of those accesses will succeed depends on the target object, but the worst thing that could happen if you mistreat an object is a runtime exception, for example:
x = 1
x()
results in
TypeError: 'int' object is not callable
(A note on the samples: they are written for Python 3. With Python 2.x the theory is the same, but some of the samples may need to be slightly modified.)

Let's keep on looking. Since Python is an OOP-capable language (whatever on Earth that means), it supports classes and methods:
class C:
    def foo(self, x):
        print(x)
and allows you to override the reaction to some of the operators. For example, the following pieces of Python and C++ code have similar meaning:
class C:                      class C
    def __call__(self):       {
        pass          -vs-    public:
                                  void operator()(void) {}
                              };
and it might seem that there is no difference, except for the Python way of having a fancy double-underscore method for anything advanced, but in fact Python offers more.
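To make the link between operators and double-underscore methods concrete, here is a minimal sketch. The Tracer class and its log attribute are made up for illustration; it simply records which special method each of the operators shown earlier ends up calling.

```python
class Tracer:
    """Hypothetical class that logs which special method each operator triggers."""

    def __init__(self):
        self.log = []

    def __iadd__(self, other):           # x += 1
        self.log.append(("__iadd__", other))
        return self                      # += rebinds x to whatever is returned

    def __setitem__(self, key, value):   # x["foo"] = "bar"
        self.log.append(("__setitem__", key, value))

    def __call__(self, *args):           # x(1, 2)
        self.log.append(("__call__", args))


x = Tracer()
x += 1
x["foo"] = "bar"
x(1, 2)
print(x.log)
# [('__iadd__', 1), ('__setitem__', 'foo', 'bar'), ('__call__', (1, 2))]
```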

Python allows overriding the "dot" operator. For example, the following class (despite being a little unclean) appears to support just about any method you throw at it:
class C:
    def __getattr__(self, name):
        def any_method(*args, **kwargs):
            print(name, args, kwargs)
        return any_method
    def i_exist(self):
        print("i would not budge")
c = C()
c.ping()
c.add(1, 2)
c.lookup([1, 2], key = 1)
c.i_exist()
prints out
ping () {}
add (1, 2) {}
lookup ([1, 2],) {'key': 1}
i would not budge
The magic method is apparently __getattr__. It is invoked when you apply the dot operator to a class instance and normal lookup does not find an attribute with that name. Note how the i_exist method stepped up despite __getattr__ being overridden.
x.foo
 ^---- __getattr__ is invoked when the dot is crossed
So what does it mean? It means that you can override almost anything, including the dot operator, something not generally possible in statically typed compiled languages, and this feature makes it really simple to hide all sorts of advanced behavior behind a simple method access. For example, consider an XMLRPC client in Python:
from xmlrpc.client import ServerProxy
p = ServerProxy("http://1.2.3.4:5678")
p.AddNumbers(1, 2, 3)
and see how straightforward access to a network service with a procedural interface is. The ServerProxy class simply intercepts the method access and turns it into a network call. This is done transparently at runtime with no need to recompile any stub or anything - you can access any target service method without any preparation. Compare this to an XMLRPC client library of your choice.
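The interception idea can be tried without any network at all. The following is only a sketch and not how ServerProxy is actually implemented: LocalProxy and fake_server are hypothetical names, and the "service" is a plain function that pretends to know a single method.

```python
class LocalProxy:
    """Hypothetical stand-in for a remote proxy: forwards every unknown
    method name to a single dispatch function instead of a network."""

    def __init__(self, dispatch):
        self._dispatch = dispatch

    def __getattr__(self, name):
        # Called only for names that normal lookup did not find.
        def remote_method(*args):
            return self._dispatch(name, args)
        return remote_method


def fake_server(method, params):
    # Assumption: a pretend service that knows a single method.
    if method == "AddNumbers":
        return sum(params)
    raise NotImplementedError(method)


p = LocalProxy(fake_server)
print(p.AddNumbers(1, 2, 3))  # 6
```

The proxy class never mentions AddNumbers. The name travels as data to whatever sits behind the proxy.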

Now take a look at the following fictional line:
foo.bar["biz"]("baz").keep.on("going")
Can you see now that every delimiter (except for the literal string quote) can be intercepted and have its behavior modified? Given this, I can (and almost universally do) apply aesthetic thinking - how would I like my code to look? One of the Python principles is to have code (pleasantly) readable. In each case, for each relation between program modules (whatever that means) I can have it
like["this"] -OR-
like("this") -OR-
like_this -OR-
like + "this" -OR-
like.this
and so on. Depending on the situation I can pick whichever option makes the code clearer. And guess what? Overriding the dot is sometimes useful.
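As a rough illustration of the fictional line above, here is a sketch in which the dot, the square brackets and the parentheses are all intercepted. The Chain class is hypothetical. Each interception returns a new Chain that remembers how it was reached.

```python
class Chain:
    """Hypothetical recorder: every dot, bracket and call returns a new
    Chain that remembers how it was reached."""

    def __init__(self, path="foo"):
        self._path = path

    def __getattr__(self, name):        # the dot
        return Chain("{0}.{1}".format(self._path, name))

    def __getitem__(self, key):         # the square brackets
        return Chain("{0}[{1!r}]".format(self._path, key))

    def __call__(self, *args):          # the parentheses
        shown = ", ".join(repr(a) for a in args)
        return Chain("{0}({1})".format(self._path, shown))

    def __repr__(self):
        return self._path


foo = Chain()
print(foo.bar["biz"]("baz").keep.on("going"))
# foo.bar['biz']('baz').keep.on('going')
```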

Anyhow, this is only half of the story.

The other half is told from the other side of the dot. See, __getattr__ notifies an instance that one of its missing attributes is about to be accessed and allows the instance to supply it. But Python also allows the accessed member to be notified whenever it is being accessed as a member of some other instance. Sounds weird? Take a look at this:
class Member:
    def __get__(self, instance, owner):
        print("I'm a member of {0}".format(instance))
        return self

class C:
    x = Member()

c = C()
c.x
prints out
I'm a member of <__main__.C object at ...>
See? The Member instance, being a member of some other class, is notified whenever it is accessed. Where can that be useful, you may ask? Oh, it is the key to the magic "self" in Python.
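The two arguments of __get__ deserve a closer look. In this small sketch (the variable names are mine) the member just returns what it was given, so you can see that instance is the object on the left of the dot, or None when the attribute is fetched from the class itself, while owner is the class.

```python
class Member:
    def __get__(self, instance, owner):
        # instance is None when the attribute is fetched from the class itself
        return (instance, owner)


class C:
    x = Member()


c = C()
via_instance = c.x
via_class = C.x
print(via_instance[0] is c, via_instance[1] is C)   # True True
print(via_class[0] is None, via_class[1] is C)      # True True
```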

Consider the following most simple piece of code:
class C:
    def foo(self):
        print(self)
Have you ever wondered what "self" is? I mean - it obviously is an argument containing a reference to the instance being called, but where did it come from? It doesn't even have to be called "self", that is just a convention, and the following will work just as well:
class C:
    def foo(magic):
        print(magic)
And so it turns out that somehow at the moment of the invocation the first argument of every method points to the containing instance. How is it done ?

What happens when you do
c = C()
c.foo()
anyhow ? At first sight, access to c.foo should return a reference to a method - something related to C and irrelevant to c. But it appears that the following two accesses to foo
c1 = C()
c1.foo
c2 = C()
c2.foo
fetch different things - c1.foo returns a method with its first argument set to c1 and c2.foo - to c2. How could that happen ? The key here is that you access a method (which is a member of a class) through a class instance. The class itself contains its methods in a half-cooked "unbound" state, they don't have any "self":
class C:
    def foo(self):
        pass
print(C.foo)
print(C().foo)
prints out
<function C.foo at ...>
<bound method C.foo of <__main__.C object at ...>>
See ? When fetched directly from a class, a method is nothing but a regular function, it is not "bound" to anything. You can even call it, but you will have to provide its first argument "self" by yourself as you see fit:
class C:
    def foo(self):
        print(self)
C.foo("123")
prints out
123
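You can also look at it from the instance side. A small check, assuming Python 3: the thing fetched through an instance is a thin wrapper that carries both the instance and the very same plain function found in the class.

```python
class C:
    def foo(self):
        return self


c = C()
bound = c.foo
print(bound.__self__ is c)       # True: the instance it is bound to
print(bound.__func__ is C.foo)   # True: the plain function stored in the class
print(bound() is c)              # True: "self" was filled in for us
```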
But as soon as you instantiate and fetch the same method through an instance, the magic __get__ method comes into play and allows the returned reference to be "bound" to the actual instance. Something like this:
class Method:
    def __init__(self, target):
        self._target = target
    def __get__(self, instance, owner):
        self._self = instance # <<<< binding ahoy !
        return self
    def __call__(self, *args, **kwargs):
        return self._target(self._self, *args, **kwargs)

class C:
    foo = Method(lambda self, *args, **kwargs:
                 print(self, args, kwargs))
c = C()
print(c)
c.foo(1, 2, foo = "bar")
prints out
<__main__.C object at 0x00ADA0D0>
<__main__.C object at 0x00ADA0D0> (1, 2) {'foo': 'bar'}
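One caveat about the sketch above: it stores the instance on the Method object itself, and that object is shared by all instances of C, so a method fetched from one instance would be silently rebound by the next access through another. A possible variant, and only a sketch of the idea rather than what the interpreter literally does, returns a fresh callable on every access. I use functools.partial here as a stand-in for a real bound method object.

```python
import functools


class Method:
    def __init__(self, target):
        self._target = target

    def __get__(self, instance, owner):
        if instance is None:
            return self._target          # fetched from the class: plain function
        # A fresh callable per access, so nothing is stored on the descriptor.
        return functools.partial(self._target, instance)


class C:
    foo = Method(lambda self, *args, **kwargs: (self, args, kwargs))


c1, c2 = C(), C()
f1 = c1.foo          # bound to c1
f2 = c2.foo          # bound to c2; f1 is unaffected
print(f1(1)[0] is c1, f2(2)[0] is c2)    # True True
```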

And so I could demonstrate a reimplementation of a major language feature in a few lines. It may not be obviously useful most of the time, but such an experience certainly makes you understand the language better.

One more thing, have I told you Python was cool ? :)

To be continued...

6 comments:

Alan Franzoni said...

the descriptor protocol example is not correct (unless you're using py3k which I haven't tried yet - then you should state this!). All classes should be new-style (otherwise getters will work, but setters won't), and there's no format() method on strings up to python2.5

Dmitry Dvoinikov said...

I do indeed use Python3k, which can be seen from

print(stuff)
"{0:s}".format(s)

which are a sure telltale.

You are right, with Python 2.x some of the samples may need to be slightly modified. The idea stays the same though. Will have the post modified.

Alan Franzoni said...

I suspected that, but I couldn't find any reference to it in the article :-) I was just fearing that you might have been mixing different versions.

zxq9 said...

This was insightful for me. I'm fairly new to Python and haven't yet found many explorations of the way __whatever__ type methods can be toyed with.

This was instructive and has given me some fun things to explore on my own since that have really expanded and clarified my thinking about how things happen in Python (both 2.7 and 3k, btw).

This was much better (and more fun!) than, say, reading a book about Python fundamentals that can't escape the Java-esque idioms of most (bad) programming book authors.

Anonymous said...

FYI, the following code from the above example did not work. Could you explain why?
class C:
def foo(self):
print(self)

if __name__=="__main__":
C.foo("123")

/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/trambone/PycharmProjects/DataPatterns/Other/DotOperator/test.py
Traceback (most recent call last):
File "/Users/trambone/PycharmProjects/DataPatterns/Other/DotOperator/test.py", line 6, in
C.foo("123")
TypeError: unbound method foo() must be called with C instance as first argument (got str instance instead)

Process finished with exit code 1

Anonymous said...

hmm, the formating on the code post was lost. It should be indented as the code in the blog...