Things That Require Further Thinking: March 2008

March 30, 2008

Approaching Python 3000: function annotations

Python 3000 introduces function annotations, as specified by PEP-3107. This one is of particular interest to me, as I have previously written one of the method signature checking decorators that this PEP is supposed to replace.

PEP-3107 makes two major points:

1. Annotations can be anything. Any Python expression can be attached to a function argument or return value. For example, it is possible to write

def max(a: int, b: int) -> int:
def max(a: "first", b: "second") -> "maximum":
def foo(x: { ("no", "reason"): lambda x: x**2 }):

2. Annotations have no semantics and are not enforced; they are purely syntactic. For example, it is fine to write

def max(a: int, b: int) -> int:
    ...
max("foo", "bar")  # no error: the annotations are never checked
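Both points are easy to check in Python 3. In this small sketch (the names `maximum`, `labelled` and `foo` are only illustrative), every annotation, whether a type, a string or a dict holding a lambda, simply ends up in the function's `__annotations__` dictionary, with the return annotation stored under the key `"return"`, and a call that contradicts the `int` annotations runs without complaint.

```python
def maximum(a: int, b: int) -> int:
    return a if a > b else b

def labelled(a: "first", b: "second") -> "maximum":
    return a if a > b else b

def foo(x: {("no", "reason"): lambda x: x**2}):
    return x

# Annotations are stored verbatim, keyed by parameter name and "return"
assert maximum.__annotations__ == {"a": int, "b": int, "return": int}
assert labelled.__annotations__ == {"a": "first", "b": "second", "return": "maximum"}

# Even a dict holding a lambda is a legal annotation, and it stays usable
square = foo.__annotations__["x"][("no", "reason")]
assert square(3) == 9

# Nothing is enforced: two strings are accepted despite the int annotations
assert maximum("foo", "bar") == "foo"
```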

The interpretation of annotations is left to third-party libraries. The language thus offers developers unprecedented semantic freedom, but let's see what the implications are.

One problem is that you have to choose your one true kind of annotation. The specific interpretation of function annotations depends on an external module, library or application, and you acknowledge this dependence explicitly, either by modifying your code or by having it processed by an external application.

For example, you could choose to use function annotations for method signature type checking with a typecheck decorator (resembling my own type checking decorators):

@typecheck
def foo(a: int) -> int:

but once you have chosen the @typecheck implementation, you have to stick with it and treat all your function annotations as type checks. Stacking multiple kinds of annotations is technically possible but impractical, because the standard does not specify how multiple annotations should be multiplexed. Suppose you have

@typecheck
def foo(i: int):

in place and want to add a docstring-like annotation to foo's first argument:

def foo(i: "comment"):

Should it be

def foo(i: (int, "comment")):

or

def foo(i: {"typecheck": int, "docstring": "comment"}):

?

Whichever way you choose, both typecheck and docstring must be prepared to extract their own annotations from whatever multiplexed construct they actually encounter. This means that two independent implementations must understand the same multiplexing format, and since no such format is standardized, in practice this is impossible.
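To make the dependence concrete, here is a minimal sketch assuming the dictionary form from the second option. `typecheck` and `docstring` are hypothetical helpers written for this illustration, not library functions; the point is that both must hard-code the same convention (a dict keyed by tool name) to coexist on one annotation.

```python
import functools
import inspect

# Hypothetical convention shared by both tools: every annotation is a dict
# keyed by the name of the tool that owns the entry.

def typecheck(func):
    """Check arguments against the "typecheck" entries of the annotations."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            ann = sig.parameters[name].annotation
            # Annotations that don't follow the convention are skipped
            if isinstance(ann, dict) and "typecheck" in ann:
                expected = ann["typecheck"]
                if not isinstance(value, expected):
                    raise TypeError(f"{name} must be {expected.__name__}")
        return func(*args, **kwargs)

    return wrapper


def docstring(func):
    """Collect the "docstring" entries of the annotations."""
    return {
        name: ann["docstring"]
        for name, ann in func.__annotations__.items()
        if isinstance(ann, dict) and "docstring" in ann
    }


@typecheck
def foo(i: {"typecheck": int, "docstring": "comment"}):
    return i * 2


print(foo(21))         # 42
print(docstring(foo))  # {'i': 'comment'}
```

Calling `foo("x")` raises `TypeError`. Note that if foo's annotation were rewritten in the tuple form to suit some other tool, this sketch of typecheck would not complain; it would silently stop checking.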

One seemingly reasonable way to multiplex could have been an iterable of instances of classes derived from some common base class (Annotation?), for example

def foo(i: (typecheck(int), docstring("comment"))):

This may well work, but it perfectly illustrates how multiple semantically different annotations seriously hamper the visual quality and readability of the code.
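For concreteness, the base-class convention might be sketched like this. `Annotation`, the `typecheck`/`docstring` subclasses (lower-case only to mirror the example above) and the `annotations_of` helper are all hypothetical; nothing like them is defined by PEP-3107. Each tool only looks for instances of its own subclass, yet all of them still have to agree on the shared base class and the container.

```python
class Annotation:
    """Hypothetical common base class for every kind of annotation."""

    def __init__(self, value):
        self.value = value


# Each tool defines its own subclass of the shared base
class typecheck(Annotation):
    """Expected type of an argument."""


class docstring(Annotation):
    """Human-readable comment on an argument."""


def annotations_of(func, kind):
    """Return {argument: value} for the annotations that are instances of kind.

    An annotation may be a single Annotation or a tuple/list of them;
    anything else is ignored, so tools can share the same slot.
    """
    found = {}
    for name, ann in func.__annotations__.items():
        items = ann if isinstance(ann, (tuple, list)) else (ann,)
        for item in items:
            if isinstance(item, kind):
                found[name] = item.value
    return found


def foo(i: (typecheck(int), docstring("comment"))):
    return i


print(annotations_of(foo, typecheck))  # {'i': <class 'int'>}
print(annotations_of(foo, docstring))  # {'i': 'comment'}
```

The container is restricted to tuples and lists rather than any iterable, since a plain string annotation is also iterable and would otherwise be split into characters.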

This raises a final question: do the gains from the one particular kind of annotation you choose outweigh the loss of syntactic brevity?

March 10, 2008

Approaching Python 3000: string formatting

One of the changes in Python 3000 applies to string formatting. PEP-3101 is the governing document for the change. It basically says that the regular % operator is too limited and that what we need is a powerful domain-specific language for string formatting.

To be frank, I never felt limited by what % had to offer, but there is no point in criticizing what is about to become the standard. Let's see what's new:

1. What used to be a binary operator is now a method of str:

"{0}, {1}".format("A", 10) == "A, 10"
"{n} = {v}".format(n = "N", v = "V") == "N = V"

2. Formatting and alignment specifiers are applied in the same manner as before:

"{0:03d}".format(10) == "010"
"<{S:>5s}>".format(S = "foo") == "<  foo>"
"<{S:<5s}>".format(S = "foo") == "<foo  >"

3. format() can insert not just argument values, but also their items and/or attributes:

d = dict(foo = 1, bar = "a")
"{0[foo]}, {0[bar]}".format(d) == "1, a"
"{0.__class__.__name__}".format(d) == "dict"

4. Recursive substitution is allowed to a degree; for example, this works:

d = dict(value = 10, format = "03d")
"{0[value]:{0[format]}}".format(d) == "010"

but this doesn't (the inner field is not substituted, and an exception is raised instead):

d = dict(data = {"a": "A", "b": "B"}, key = "a")
"{0[data][{0[key]}]}".format(d)

5. Classes can control their own formatting:

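class Foo:
    def __format__(self, spec):
        # spec is the text after the colon, e.g. "3s" in "{0:3s}"
        from re import fullmatch
        # accept only specs of the form <digits>s
        if not fullmatch("[0-9]+s", spec):
            raise ValueError("unsupported format spec: " + spec)
        # render as that many "x" characters
        return "x" * int(spec[:-1])

foo = Foo()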

"{0:3s}".format(foo) == "xxx"
"{0:10s}".format(foo) == "xxxxxxxxxx"

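Since the comparison with % keeps coming up, here is how a few of the examples above line up with the old operator. This is only a side-by-side sketch; every equality below holds in Python 3, where both styles remain available.

```python
# 1. Positional and named substitution
assert "%s, %d" % ("A", 10) == "{0}, {1}".format("A", 10) == "A, 10"
assert "%(n)s = %(v)s" % {"n": "N", "v": "V"} == "{n} = {v}".format(n="N", v="V")

# 2. Zero padding and alignment use very similar specifiers
assert "%03d" % 10 == "{0:03d}".format(10) == "010"
assert "<%5s>" % "foo" == "<{0:>5s}>".format("foo") == "<  foo>"
assert "<%-5s>" % "foo" == "<{0:<5s}>".format("foo") == "<foo  >"

# 3. Item access: % needs a mapping argument and has no attribute syntax
d = dict(foo=1, bar="a")
assert "%(foo)s, %(bar)s" % d == "{0[foo]}, {0[bar]}".format(d) == "1, a"
```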
I agree that the new way of string formatting (1 and 2) is cleaner and more straightforward. Substituting items and attributes (3) could be useful at times. Custom formatting (5) is Pythonic all right, but of little practical use except when building a framework, perhaps a class framework.

What I don't buy is the attempt to make a statement out of what is supposed to be an expression (4). If it is inconsistent, difficult to read and of no apparent use, it should not be there.
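For the lookup that (4) cannot express, there are two plain ways around it, sketched below with the same d as in the failing example: do the lookup in Python before formatting, or format twice, using doubled braces ("{{" and "}}" produce literal braces) so that the first pass only fills in the key and leaves a template for the second.

```python
d = dict(data={"a": "A", "b": "B"}, key="a")

# A replacement field inside a field name is not substituted, so this fails
try:
    "{0[data][{0[key]}]}".format(d)
except (KeyError, ValueError) as exc:
    print("nested lookup failed:", type(exc).__name__)

# Option 1: resolve the key in Python first
print("{0}".format(d["data"][d["key"]]))  # A

# Option 2: two passes; the first only substitutes {0[key]}
# and yields the template "{0[data][a]}"
template = "{{0[data][{0[key]}]}}".format(d)
print(template)            # {0[data][a]}
print(template.format(d))  # A
```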

March 04, 2008

A programming language made of smileys

Just imagine the possibilities of a language whereby programs are constructed from smileys:

Easy to write: all language objects can be arranged on a toolbar. No more typos!
Easy to read. You get an immediate emotional response by simply looking at the code.
Appealing to programmers of any nationality. No internationalization required!
Easy to extend by adding smileys with hammers, flowers or database connectors.
Fun to work with!
