May 01, 2016

Python function guards

TL;DR: Rationale and thoughts behind the implementation of Python 3 function guards.

I really love Python, but unfortunately I don't get to use it in my current day job. So I practice it in my spare time, making something generally useful and hopefully suitable for improving my Python application framework.

1. The idea

I already had a method signature checking decorator written years ago, and it turned out enormously useful. Along the same lines, I started thinking about whether it would be possible to implement declarative function guards that select one version out of many to be executed, depending on the actual call arguments. In pseudo-Python, I would like to write something like this:

def foo(a, b) when a > b:
  ...

def foo(a, b) when a < b:
  ...

foo(2, 1) # executes the first foo

2. Proof of concept

At first sight it looks impossible, because the second function kind of shadows the first one:

def foo(a, b):
  print("first")

def foo(a, b):
  print("second")

foo(2, 1) # second

but this is not exactly so. Technically, the above piece of code looks something like this:

new_function = def (a, b): print("first")
local_namespace['foo'] = new_function

new_function = def (a, b): print("second")
local_namespace['foo'] = new_function

and so the problem is not that the function itself is overwritten, but that its identically named entry in the current namespace is. If you manage to save the reference in between, nothing stops you from calling it:

def foo(a, b):
  print("first")

old_foo = foo

def foo(a, b):
  if a > b:
    old_foo(a, b)
  elif a < b:
    print("second") 

foo(2, 1) # first
foo(1, 2) # second

so there you have it; what's left is to automate the process and it's done.
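As a rough illustration of what "automating the process" could mean, here is a minimal sketch. The `when` helper, its callable predicate and the `_previous` dict are all hypothetical names made up for this example; they are not the syntax or the storage the real decorator ends up using, and they only show that the previous version can be captured at decoration time.

```python
_previous = {}  # hypothetical storage: qualified name -> latest version seen


def when(predicate):
    """Hypothetical helper: run func if predicate(*args) holds,
    otherwise fall back to the previously declared version."""
    def decorator(func):
        # Remember whatever was declared under this name before us.
        previous = _previous.get(func.__qualname__)

        def dispatcher(*args):
            if predicate(*args):
                return func(*args)
            if previous is None:
                raise LookupError("no version of %s matches" % func.__qualname__)
            return previous(*args)

        _previous[func.__qualname__] = dispatcher
        return dispatcher
    return decorator


@when(lambda a, b: a > b)
def foo(a, b):
    return "first"


@when(lambda a, b: a < b)
def foo(a, b):
    return "second"


print(foo(2, 1))  # first
print(foo(1, 2))  # second
```

Each new version simply wraps the one before it, which is exactly the manual old_foo trick done by a decorator.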

3. Syntax

There is no question as to how the guard should be attached to the guarded function - it would be done by means of a decorator:

@guard
def foo(): # hey, I'm now being guarded !
  ...

@guard
def foo(): # and so am I
  ...

but the question remains where the guarding expression should appear. I see six ways of doing it:

A) as a parameter to the decorator itself:

@guard("a > b")
def foo(a, b):
  ...

B) as a default value for some predefined parameter:

@guard
def foo(a, b, _when = "a > b"):
  ...

C) as an annotation to some predefined parameter:

@guard
def foo(a, b, _when: "a > b"):
  ...

D) as an annotation to return value:

@guard
def foo(a, b) -> "a > b":
  ...

E) as a docstring

@guard
def foo(a, b):
  "a > b"
  ...

F) as a comment

@guard
def foo(a, b): # when a > b
  ...

Now I will dismiss them one by one until the winner is determined.

Method F (as a comment) is the first to go: implementing it would require serious parsing and access to the source code, and it would be semantically misleading, since comments are treated as something insignificant that can be omitted or ignored. The rest of the methods at least depend on runtime information only and work on compiled modules.

Method A (as a parameter to the decorator) looks attractive, but is dismissed because it moves the decision from the function to the wrapper. The function alone can't carry a guard expression, and therefore it would not be possible to separate declaration from guarding:

def foo(a, b): # I want to be guarded
  ...
# but it is this guard here that knows how
foo = guard("a > b")(foo)

The rest of the methods are more or less equivalent and the choice is a matter of personal taste. Nevertheless, I discard method E (docstring) because there is just one docstring per function and it has other uses. Besides, to me it looks like it describes the insides of the function, not the outsides.

So the final choice is between having the guarding expression as an annotation and as a default value. The real difference is this: a parameter with a default value can always be put last, but a parameter with an annotation alone cannot:

def foo(a, b = 0, _when: "a > b"): # syntax error
  ...

This and the fact that the aforementioned typecheck decorator already makes use of annotations tip the decision towards the default value:

@guard
def foo(a, b, _when = "a > b"):
  ...

@guard
@typecheck 
def foo(a: int, b: int, _when = "a > b") -> int:
  ...

The choice of a name for the parameter containing the guard expression is arbitrary, but it has to be simple, clear and not conflicting at the same time. "_when" looks like a reasonable choice.
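With the guard stored as a default value, a decorator can read it back at decoration time without calling the function. The sketch below uses the standard inspect module; `guard_expression` is a hypothetical helper name for this example, not part of the actual implementation.

```python
import inspect


def guard_expression(func):
    """Return the default value of the _when parameter, or None if absent."""
    param = inspect.signature(func).parameters.get("_when")
    if param is None or param.default is inspect.Parameter.empty:
        return None
    return param.default


def foo(a, b, _when="a > b"):
    return a - b


def bar(a, b):
    return a + b


def baz(a: int, b: int, _when="a > b") -> int:
    return a - b


print(guard_expression(foo))  # a > b
print(guard_expression(bar))  # None
# Annotations stay free for other uses, such as type checking.
print(baz.__annotations__ == {"a": int, "b": int, "return": int})  # True
```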

4. Semantics

With a few exceptions, the semantics of a guarded function is straightforward:

@guard 
def foo(a, b, _when = "a > b"):
  ...

@guard 
def foo(a, b, _when = "a < b"):
  ...

foo(2, 1) # executes the first version
foo(1, 2) # executes the second version
foo(1, 1) # throws

Except when there really is a question which version to invoke:

@guard 
def foo(a, b, _when = "a > 0"):
  ...

@guard 
def foo(a, b, _when = "b > 0"):
  ...

foo(2, 1) # now what ?

and if there is a default version, which is the one without the guarding expression:

@guard
def foo(a, b): # default
  ...

@guard
def foo(a, b, _when = "a > b"):
  ...

foo(2, 1) # uh ?

and the way it seems logical to me is this: the expressions are evaluated from top to bottom one by one until a match is found, except for the default version, which is always considered last.

Therefore here is how it should work:

@guard
def foo(a, b):
  print("default")

@guard 
def foo(a, b, _when = "a > 0"):
  print("a > 0") 

@guard 
def foo(a, b, _when = "a > 0 and b > 0"):
  print("never gets to execute")
 
@guard 
def foo(a, b, _when = "b > 0"):
  print("b > 0")

foo(1, 1)   # a > 0
foo(1, -1)  # a > 0
foo(-1, 1)  # b > 0
foo(-1, -1) # default
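The ordering rule can be sketched in a few lines. This is a simplified stand-in, not the actual implementation: `sketch_guard`, `_versions` and `_guard_of` are hypothetical names, the registry is a plain dict, and evaluating the string with eval() against the bound arguments is my assumption about how such an expression could be checked.

```python
import inspect

_versions = {}  # hypothetical registry: qualified name -> list of versions


def _guard_of(func):
    """Return the _when expression of func, or None for a default version."""
    param = inspect.signature(func).parameters.get("_when")
    if param is not None and isinstance(param.default, str):
        return param.default
    return None


def sketch_guard(func):
    """Simplified stand-in for @guard."""
    versions = _versions.setdefault(func.__qualname__, [])
    versions.append(func)

    def dispatcher(*args, **kwargs):
        # Stable sort: guarded versions keep their declaration order,
        # versions without a guard (the defaults) move to the end.
        for candidate in sorted(versions, key=lambda f: _guard_of(f) is None):
            expr = _guard_of(candidate)
            bound = inspect.signature(candidate).bind(*args, **kwargs)
            bound.apply_defaults()
            names = {k: v for k, v in bound.arguments.items() if k != "_when"}
            # Assumption: the guard string is evaluated with eval().
            if expr is None or eval(expr, {}, names):
                return candidate(*args, **kwargs)
        raise LookupError("no version of %s matches" % func.__qualname__)

    return dispatcher


@sketch_guard
def foo(a, b):
    return "default"


@sketch_guard
def foo(a, b, _when="a > 0"):
    return "a > 0"


@sketch_guard
def foo(a, b, _when="a > 0 and b > 0"):
    return "never gets to execute"


@sketch_guard
def foo(a, b, _when="b > 0"):
    return "b > 0"


results = [foo(1, 1), foo(1, -1), foo(-1, 1), foo(-1, -1)]
print(results)  # ['a > 0', 'a > 0', 'b > 0', 'default']
```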

5. Function compatibility

So far we have only seen the case of identical function versions being guarded. But what about functions that have the same name but different signatures ?

@guard
def foo(a):
  ...

@guard
def foo(a, b):
  ...

Should we even consider having these guarded as versions of one function ? In my opinion - no, because it creates an impression of a different concept - function overloading, which is not supported by Python in the first place. Besides, it would be impossible to map the arguments across the versions.

Another question is the behavior of default arguments:

@guard
def foo(a = 1, _when = "a > 0"):
  ...

@guard
def foo(a = -1, _when = "a < 0"):
  ...

Guarding these as one could work, but would be confusing as to which value the argument has upon which call. So this case I also reject.

What about the simplest case of different names for the same positional arguments ?

@guard
def foo(a, b):
  ...

@guard
def foo(b, a):
  ...

Technically, those have positionally identical signatures and could be guarded as one, but this is likely to be another source of confusion, possibly arising from a mistake, a typo or a bad copy/paste.

Therefore the way I implement it is this: all the guarded functions with the same name need to have identical signatures, down to parameter names, order and default values, except for the _when meta-parameter and annotations. The annotations are excused so that the guard decorator stays compatible with the typecheck decorator. So the following is about as far as two compatible versions can diverge:

@guard
@typecheck
def foo(a: int, _when = "isinstance(a, int)", *args, b, **kwargs):
  ...

@guard
@typecheck
def foo(a: str, *args, b, _when = "isinstance(a, str)", **kwargs):
  ...

Note how the _when parameter can be positional as well as keyword. This way it can always be put at the end of the parameter list in the declaration.
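One way such a compatibility check could be written is to compare inspect signatures after dropping _when and stripping annotations. This is only a sketch under that assumption; `comparable_signature` and `compatible` are hypothetical helper names.

```python
import inspect


def comparable_signature(func):
    """Signature of func with _when removed and all annotations stripped."""
    sig = inspect.signature(func)
    params = [
        p.replace(annotation=inspect.Parameter.empty)
        for p in sig.parameters.values()
        if p.name != "_when"
    ]
    return sig.replace(parameters=params,
                       return_annotation=inspect.Signature.empty)


def compatible(f, g):
    """True if f and g differ only in _when and annotations."""
    return comparable_signature(f) == comparable_signature(g)


def v1(a: int, _when="isinstance(a, int)", *args, b, **kwargs): ...
def v2(a: str, *args, b, _when="isinstance(a, str)", **kwargs): ...

def swapped1(a, b): ...
def swapped2(b, a): ...

def default1(a=1, _when="a > 0"): ...
def default2(a=-1, _when="a < 0"): ...

print(compatible(v1, v2))              # True
print(compatible(swapped1, swapped2))  # False, parameter names differ
print(compatible(default1, default2))  # False, default values differ
```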

6. Function naming

So far we have used simple functions, presumably declared at module level. But how about this:

@guard
def foo():
  ...

def bar():
  @guard 
  def foo():
    ...

class C:
  @guard 
  def foo(self):
    ...

those three are obviously not versions of the same function, but they are called foo() so how do we tell them apart ?

In Python 3.3 and later the answer is this: f.__qualname__ contains the qualified name of the function, a kind of "path" to it:

foo
bar.<locals>.foo
C.foo

respectively. It doesn't matter much what exactly is in __qualname__, only that the values are different, which is just what we need. Prior to Python 3.3 there is no __qualname__ and we need to fall back to a hacky implementation of qualname.
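The three names are easy to check on Python 3.3 or later. The snippet below is just a verification aid; the printed values assume it is run at module level, since nesting it inside another function would add a prefix to each name.

```python
def foo():
    pass


def bar():
    def foo():
        pass
    return foo  # hand the inner function out so it can be inspected


class C:
    def foo(self):
        pass


names = [foo.__qualname__, bar().__qualname__, C.foo.__qualname__]
print(names)  # ['foo', 'bar.<locals>.foo', 'C.foo'] at module level
print(len(set(names)))  # 3
```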

7. Special cases

Lambdas are unnamed functions. Their __qualname__ has <lambda> in it but no name of their own. They would be impossible to guard:

foo = lambda: ...
foo = guard(foo)

bar = lambda: ...
bar = guard(bar)

because from the guard's point of view they are not "foo" and "bar", but the same "<lambda>".
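A quick check shows the collision: the variable a lambda is assigned to leaves no trace in the function object.

```python
foo = lambda: 1
bar = lambda: 2

# Both lambdas carry the same name, regardless of the variable they are bound to.
print(foo.__name__, bar.__name__)  # <lambda> <lambda>
print(foo.__qualname__ == bar.__qualname__)  # True
```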

An interesting glitch allows guarding classmethods and staticmethods. See, classmethod and staticmethod are not regular decorator functions but objects, and therefore cannot simply be stacked with the guard decorator:

class C:
  @guard # this won't work
  @classmethod
  def foo(cls):
    ...

because classmethod can't be seen through to the original function foo. But it gets interesting when you swap the decorators around:

class C:
  @classmethod
  @guard
  def foo(cls, _when = "..."):
    ...
  @classmethod
  @guard
  def foo(cls, _when = "..."):
    ...

the way it works now is that the guard decorator attaches to the original function foo before it is wrapped with classmethod. Therefore the guarded chain of versions contains only the original functions, not classmethods. But an actual call goes through the classmethod decorator before it gets to guard: classmethod does its argument binding magic, and whichever foo is matched by guard gets its first argument bound to the class, as expected.
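The binding order can be observed with any plain function wrapper in place of guard. In this sketch `recording` is a hypothetical stand-in decorator that only logs the positional arguments it receives.

```python
import types

calls = []


def recording(func):
    """Stand-in for a guard-like decorator: a plain function wrapper."""
    def wrapper(*args, **kwargs):
        calls.append(args)
        return func(*args, **kwargs)
    return wrapper


class C:
    @classmethod
    @recording
    def foo(cls):
        return cls.__name__


print(C.foo())            # C
print(calls[-1] == (C,))  # True, the wrapper already sees the bound class


def plain(cls):
    return cls

# The object produced by classmethod is not a plain function,
# which is why a function-oriented decorator cannot sit on top of it.
print(isinstance(classmethod(plain), types.FunctionType))  # False
```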

8. The register

Here is one final question: when a guarded function is encountered:

@guard
def foo(...):
  ...

where should the decorator look for previously declared versions of foo() ? There must exist some global state that maps function names to their previous implementations.

The most obvious solution is to attach a state dict to the guard decorator itself. The dict would then map (module_name, function_name) tuples to lists of previous function versions. This approach certainly works but has a downside, especially considering I'm going to use it with the Pythomnic3k framework, where modules are reloaded automatically whenever the source files containing them change. Having a separate global structure holding references to expired modules would be bad, but having a chain of function versions span different identically named modules from the past would be a disaster.

A better solution is to make the register slightly less global and attach the state dict to the module in which a function is encountered. This dict would map just function names to the lists of versions. Then all the information about the module's guarded functions disappears with the module with no additional effort.
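A per-module register could look roughly like this. The sketch relies on the fact that a function's __globals__ is the namespace dict of the module that defines it; the `versions_for` helper and the `__guarded_versions__` key are hypothetical names chosen for this example.

```python
def versions_for(func, registry_name="__guarded_versions__"):
    """Return the list of versions for func, stored in func's own module."""
    # func.__globals__ is the defining module's namespace, so the registry
    # lives inside that module and shares its lifetime.
    registry = func.__globals__.setdefault(registry_name, {})
    return registry.setdefault(func.__qualname__, [])


def foo(a, b):
    return "default"

versions_for(foo).append(foo)


def foo(a, b, _when="a > b"):
    return "guarded"

versions_for(foo).append(foo)

chain = versions_for(foo)
print(len(chain))                           # 2
print([f(2, 1) for f in chain])             # ['default', 'guarded']
print("__guarded_versions__" in globals())  # True
```

No structure outside the module holds a reference to these functions, so nothing has to be cleaned up separately.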

9. Conclusion

The implementation works.

I'm integrating it with the Pythomnic3k framework so that all public method functions are instrumented with it automatically, although it is tricky, because when all you have is the source text of just a

def foo(...):
  ...
def foo(...):
  ...

and you need to turn it into

@guard
@typecheck
def foo(...):
  ...
@guard
@typecheck
def foo(...):
  ...

it requires modification of the parsed syntax tree. I will have to write a follow-up post on that.
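For a rough idea of what such a transformation involves, here is a sketch using the standard ast module. It is not the framework's actual mechanism: the `AddDecorators` class is hypothetical, it decorates every function definition it finds, and the guard and typecheck names are replaced by dummy markers so the example runs on its own.

```python
import ast


class AddDecorators(ast.NodeTransformer):
    """Prepend the given decorator names to every function definition."""

    def __init__(self, names):
        self.names = names

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # also handle nested functions and methods
        extra = [ast.Name(id=name, ctx=ast.Load()) for name in self.names]
        node.decorator_list = extra + node.decorator_list
        return node


source = "def foo(a, b):\n    return a + b\n"
tree = AddDecorators(["guard", "typecheck"]).visit(ast.parse(source))
ast.fix_missing_locations(tree)  # new nodes need line/column information
code = compile(tree, "<instrumented>", "exec")

applied = []


def make_marker(label):
    """Dummy decorator that only records that it was applied."""
    def marker(func):
        applied.append(label)
        return func
    return marker


namespace = {"guard": make_marker("guard"),
             "typecheck": make_marker("typecheck")}
exec(code, namespace)

print(applied)                   # ['typecheck', 'guard']
print(namespace["foo"](1, 2))    # 3
```

The recorded order confirms that typecheck wraps the function first and guard wraps the result, the same as writing @guard above @typecheck by hand.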

That's all and thanks for reading.