September 16, 2016
On death march
September 02, 2016
On improving movie-going experience
August 23, 2016
We have lost
In the programs that I wrote I used to know every line, every character even. Then they got bigger, but I still knew every file. Then they got bigger yet but at least I knew all the dependencies not only by name but by virtue.
Now a static single page web site in React comes with over 100000 files, some of which, as you know, are left pads and hot pockets and there are all new contests. Such complexity is beyond anyone's reach. Even the authors find it overwhelming.
We have lost the understanding of what's happening.
Today's software development is not engineering as such. Given the quality of the outcome it is not even a professional activity. Software development has become a recreational game for the young. The target audience for the programmer became not the users but other programmers. Stackoverflow and Github saw to that. This is now the most active social network in the world.
To impress one's peers it's no longer necessary to build some quality software. Bah ! You can't even tell what quality means. But to ride a bicycle without hands - that's something ! And if you could do it blindfolded ! And backwards !
And so we see thousands of exercises in making things backwards. Without understanding the purpose or the reason, people pick up a new tool, play with it for a month, then move on to a new stimulus. Worse yet if it leaves behind another backwards-backwards-backwards-backwards tool. This adds another layer of useless and not understood complexity and provides positive feedback to the loop.
I remember well one day in 2005, when something out of the ordinary happened. At the time I was working under the supervision of a great software engineer. He was always talking about "architecture", you know. Back then I didn't understand it at all, despite having already worked as a programmer for some 8-9 years. I thought it was all managerial talk. And then I was sitting in a conference room alone, thinking how to organize a UI for some application, and it dawned on me. I knew what architecture was. Not burdened with details, my mind went to the next level, it was almost like I could fly. That I could not forget or unlearn, and I'm happy that this knowledge is with me, because it would not have happened today.
Today I would just be dabbling in a Sea Of Complexity, pleasing my mind with details. Maybe I would have been happy about that, who knows.
July 15, 2016
On cookie consent
And I'm sick and tired of its effects.
It is an outstanding example of what happens when people who don't have the faintest idea come to control technology, in this case the Internet. For each web site to prompt the user about cookies is a terrible idea.
2. The users don't take it seriously. Even when the warning is straightforward (we need cookies to do something you may not like) it is a matter of a single click to close the annoying window.
3. It does not improve privacy. At all. From a privacy standpoint, cookies are not the villains but the most innocent messengers that are being shot.
4. It makes the Internet more stressful. As if we did not have enough banners, one-time offers, subscription popups, landing pages, paywalls and so on, now we have these noisy popups.
5. Technically, cookie consent is a catch-22 situation - to know whether to accept a cookie from a site you need to own a cookie from it. Therefore if you refuse, the site will ask again. Moreover, even if you accept, each browser on each device manages its own cookies, and only a limited number of them. So the questions will continue ad nauseam.
May 01, 2016
Python function guards
I really love Python, but unfortunately don't get to use it in my current day job. So now I have to practice it in my spare time, making something generally useful and hopefully suitable for improving my Python application framework.
1. The idea
I already had a method signature checking decorator written years ago, and it turned out enormously useful, so along the same lines I started thinking about whether it would be possible to implement declarative function guards that select one version out of many to be executed depending on the actual call arguments. In pseudo-Python, I would like to write something like this:
def foo(a, b) when a > b:
    ...

def foo(a, b) when a < b:
    ...

foo(2, 1) # executes the first foo
2. Proof of concept
At first sight it looks impossible, because the second function kind of shadows the first one:
def foo(a, b):
    print("first")

def foo(a, b):
    print("second")

foo(2, 1) # second
but this is not exactly so. Technically, the above piece of code looks something like this:
new_function = def (a, b): print("first")
local_namespace['foo'] = new_function
new_function = def (a, b): print("second")
local_namespace['foo'] = new_function
and so the problem is not the function itself which is overwritten, but its identically named reference entry in current namespace. If you manage to save the reference in between, nothing stops you from calling it:
def foo(a, b):
    print("first")

old_foo = foo

def foo(a, b):
    if a > b:
        old_foo(a, b)
    elif a < b:
        print("second")

foo(2, 1) # first
foo(1, 2) # second
so there you have it, what's left is to automate the process and it's done.
3. Syntax
There is no question as to how the guard should be attached to the guarded function - it would be done by means of a decorator:
@guard
def foo(): # hey, I'm now being guarded !
    ...

@guard
def foo(): # and so am I
    ...
but the question remains where the guarding expression should appear. I see six ways of doing it:
A) as a parameter to the decorator itself:
@guard("a > b")
def foo(a, b):
    ...
B) as a default value for some predefined parameter:
@guard
def foo(a, b, _when = "a > b"):
    ...
C) as an annotation to some predefined parameter:
@guard
def foo(a, b, _when: "a > b"):
    ...
D) as an annotation to return value:
@guard
def foo(a, b) -> "a > b":
    ...
E) as a docstring:
@guard
def foo(a, b):
    "a > b"
    ...
F) as a comment:
@guard
def foo(a, b): # when a > b
    ...
Now I will dismiss them one by one until the winner is determined.
Method F (as a comment) is the first to go because implementing it would require serious parsing, access to source code and be semantically misleading as the comments are treated as something insignificant which can be omitted or ignored. The rest of the methods at least depend on the runtime information only and work on compiled modules.
Method A (as a parameter to the decorator) looks attractive, but is dismissed because it moves the decision from the function to the wrapper. So the function alone can't have a guard expression and therefore it would not be possible to separate declaration from guarding:
def foo(a, b): # I want to be guarded
    ...

# but it is this guard here that knows how
foo = guard("a > b")(foo)
The rest of the methods are more or less equivalent and the choice is to personal taste. Nevertheless, I discard method E (docstring) because there is just one docstring per function and it has other uses. Besides, to me it looks like it describes the insides of the function, not the outsides.
So the final choice is between having the guarding expression as annotation and as default value. The real difference is this: a parameter with a default value can always be put last, but a parameter with annotation alone can not:
def foo(a, b = 0, _when: "a > b"): # syntax error
    ...
This and the fact that the aforementioned typecheck decorator already makes use of annotations tip the decision towards default value:
@guard
def foo(a, b, _when = "a > b"):
    ...

@guard
@typecheck
def foo(a: int, b: int, _when = "a > b") -> int:
    ...
The choice of a name for the parameter containing the guard expression is arbitrary, but it has to be simple, clear and not conflicting at the same time. "_when" looks like a reasonable choice.
4. Semantics
With a few exceptions, the semantics of a guarded function is straightforward:
@guard
def foo(a, b, _when = "a > b"):
    ...

@guard
def foo(a, b, _when = "a < b"):
    ...

foo(2, 1) # executes the first version
foo(1, 2) # executes the second version
foo(1, 1) # throws
Except when there really is a question which version to invoke:
@guard
def foo(a, b, _when = "a > 0"):
    ...

@guard
def foo(a, b, _when = "b > 0"):
    ...

foo(2, 1) # now what ?
and if there is a default version, which is the one without the guarding expression:
@guard
def foo(a, b): # default
    ...

@guard
def foo(a, b, _when = "a > b"):
    ...

foo(2, 1) # uh ?
and the way it seems logical to me is this: the expressions are evaluated from top to bottom one by one until the match is found, except for the default version, which is always considered last.
Therefore here is how it should work:
@guard
def foo(a, b):
    print("default")

@guard
def foo(a, b, _when = "a > 0"):
    print("a > 0")

@guard
def foo(a, b, _when = "a > 0 and b > 0"):
    print("never gets to execute")

@guard
def foo(a, b, _when = "b > 0"):
    print("b > 0")

foo(1, 1) # a > 0
foo(1, -1) # a > 0
foo(-1, 1) # b > 0
foo(-1, -1) # default
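The evaluation order described above can be sketched roughly as follows. This is only a minimal illustration, not the actual implementation: the module-level `_register` dict, the `dispatcher` closure and the use of `eval` on the `_when` string are all assumptions made for the example, and the versions return strings instead of printing so the result is easy to check.

```python
import functools
import inspect

# Hypothetical register: (module, qualified name) -> [(function, guard expression)]
_register = {}

def guard(func):
    """Add func to the chain of versions sharing its name and return a dispatcher."""
    when = inspect.signature(func).parameters.get("_when")
    expr = when.default if when is not None else None  # None marks the default version
    chain = _register.setdefault((func.__module__, func.__qualname__), [])
    chain.append((func, expr))

    @functools.wraps(func)
    def dispatcher(*args, **kwargs):
        # Guarded versions in declaration order, the default version considered last.
        ordered = [v for v in chain if v[1] is not None] + [v for v in chain if v[1] is None]
        for version, condition in ordered:
            bound = inspect.signature(version).bind(*args, **kwargs)
            bound.apply_defaults()
            # eval() is acceptable here only because the guard strings are trusted source code.
            if condition is None or eval(condition, {}, dict(bound.arguments)):
                return version(*args, **kwargs)
        raise LookupError("no version of %s matches" % func.__qualname__)

    return dispatcher

@guard
def foo(a, b):
    return "default"

@guard
def foo(a, b, _when="a > 0"):
    return "a > 0"

@guard
def foo(a, b, _when="b > 0"):
    return "b > 0"
```

With these three versions, `foo(1, 1)` picks the "a > 0" version, `foo(-1, 1)` picks "b > 0", and `foo(-1, -1)` falls through to the default.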
5. Function compatibility
So far we have only seen the case of identical function versions being guarded. But what about functions that have the same name but different signatures ?
@guard
def foo(a):
    ...

@guard
def foo(a, b):
    ...
Should we even consider having these guarded as versions of one function ? In my opinion - no, because it creates an impression of a different concept - function overloading, which is not supported by Python in the first place. Besides, it would be impossible to map the arguments across the versions.
Another question is the behavior of default arguments:
@guard
def foo(a = 1, _when = "a > 0"):
    ...

@guard
def foo(a = -1, _when = "a < 0"):
    ...
Guarding these as one could work, but would be confusing as to which value the argument has upon which call. So this case I also reject.
What about the simplest case of different names for the same positional arguments ?
@guard
def foo(a, b):
    ...

@guard
def foo(b, a):
    ...
Technically, those have identical signatures and can be guarded as one, but this is likely to be another source of confusion, possibly stemming from a mistake, typo or a bad copy/paste.
Therefore the way I implement it is this: all the guarded functions with the same name need to have identical signatures, down to parameter names, order and default values, except for the _when meta-parameter and annotations. The annotations are excused so that guard decorator could be compatible with typecheck decorator. So the following is about as far as two compatible versions can diverge:
@guard
@typecheck
def foo(a: int, _when = "isinstance(a, int)", *args, b, **kwargs):
    ...

@guard
@typecheck
def foo(a: str, *args, b, _when = "isinstance(a, str)", **kwargs):
    ...
Note how the _when parameter can be positional as well as keyword. This way it can always be put at the end of the parameter list in the declaration.
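The compatibility rule could be checked along these lines. This is a sketch under the assumption that comparing parameter names, kinds and default values is enough; the helper names `comparable_signature` and `compatible` are made up for the illustration.

```python
import inspect

def comparable_signature(func, meta="_when"):
    """Return (name, kind, default) triples, ignoring annotations and the meta-parameter."""
    params = inspect.signature(func).parameters.values()
    return [(p.name, p.kind, p.default) for p in params if p.name != meta]

def compatible(f, g):
    """True if f and g may be guarded as versions of one function."""
    return comparable_signature(f) == comparable_signature(g)

# The two versions from the example above, without the decorators.
def v1(a: int, _when="isinstance(a, int)", *args, b, **kwargs): ...
def v2(a: str, *args, b, _when="isinstance(a, str)", **kwargs): ...

# Swapped parameter names are rejected.
def s1(a, b): ...
def s2(b, a): ...
```

Here `compatible(v1, v2)` holds even though `_when` is positional in one version and keyword-only in the other, while `compatible(s1, s2)` does not.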
6. Function naming
Before we used simple functions, presumably declared at module level. But how about this:
@guard
def foo():
    ...

def bar():
    @guard
    def foo():
        ...

class C:
    @guard
    def foo(self):
        ...
those three are obviously not versions of the same function, but they are called foo() so how do we tell them apart ?
In Python 3.3 and later the answer is this: f.__qualname__ contains a fully qualified name of the function, kind of "a path" to it:
foo
bar.<locals>.foo
C.foo
respectively. It doesn't matter much what exactly is in the __qualname__, only that they are different, which is just what we need. Prior to Python 3.3 there is no __qualname__ and we need to fall back to a hacky implementation of qualname.
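A quick way to see the three names, with the decorator left out since only the naming matters here (a small illustration, assuming Python 3.3 or later):

```python
def foo(): ...

def bar():
    def foo(): ...
    return foo

class C:
    def foo(self): ...

# Collect the qualified names of the three different foo functions.
names = [f.__qualname__ for f in (foo, bar(), C.foo)]
print(names)
```

The three entries differ, which is all the guard needs in order to keep the chains of versions apart.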
7. Special cases
Lambdas are unnamed functions. Their __qualname__ has <lambda> in it but no own name. They would be impossible to guard:
foo = lambda: ...
foo = guard(foo)
bar = lambda: ...
bar = guard(bar)
because from the guard's point of view they are not "foo" and "bar", but the same "<lambda>".
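The collision is easy to confirm without any guard machinery at all (a tiny illustration):

```python
foo = lambda: 1
bar = lambda: 2

# Both lambdas carry the same qualified name, whatever variable they are bound to.
same = foo.__qualname__ == bar.__qualname__
print(foo.__qualname__, bar.__qualname__, same)
```

Any register keyed by qualified name would therefore treat the two as versions of one function.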
An interesting glitch allows guarding classmethods and staticmethods. See, classmethod/staticmethod are not regular decorator functions but objects and therefore cannot be stacked with the guard decorator:
class C:
    @guard # this won't work
    @classmethod
    def foo(cls):
        ...
because classmethod can't be seen through to the original function foo. But it gets interesting when you swap the decorators around:
class C:
    @classmethod
    @guard
    def foo(cls, _when = "..."):
        ...

    @classmethod
    @guard
    def foo(cls, _when = "..."):
        ...
The way it works now is that the guard decorator attaches to the original function foo before it is wrapped with classmethod. Therefore the guarded chain of versions contains only the original functions, not classmethods. But when it comes to the actual call, it goes through the classmethod decorator before it gets to guard; classmethod does its argument binding magic, and whichever foo is matched by guard to be executed gets its first argument bound to the class as expected.
8. The register
Here is one final question: when a guarded function is encountered:
@guard
def foo(...):
    ...
where should the decorator look for previously declared versions of foo() ? There must exist some global state that maps function names to their previous implementations.
The most obvious solution is to attach a state dict to the guard decorator itself. The dict would then map (module_name, function_name) tuples to lists of previous function versions. This approach certainly works but has a downside, especially considering I'm going to use it with the Pythomnic3k framework. The reason is that in Pythomnic3k modules are reloaded automatically whenever source files containing them change. Having a separate global structure holding references to expired modules would be bad, but having a chain of function versions cross different identically named modules from the past would be a disaster.
There is a better solution: make the register slightly less global and attach the state dict to the module in which a function is encountered. This dict would map just function names to the lists of versions. Then all the information about the module's guarded functions disappears with the module with no additional effort.
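A per-module register might look something like this. It is a sketch only: the attribute name `__guarded__`, the helper `versions_for` and the throwaway module `demo_module` are invented for the example and say nothing about how Pythomnic3k actually stores things.

```python
import sys
import types

def versions_for(func, attr="__guarded__"):
    """Return the list of versions of func, kept on the module that defines it."""
    module = sys.modules[func.__module__]
    register = vars(module).setdefault(attr, {})  # function name -> list of versions
    return register.setdefault(func.__qualname__, [])

# Demo with a throwaway module standing in for a reloadable one.
mod = types.ModuleType("demo_module")
sys.modules[mod.__name__] = mod
exec("def foo(): return 'first'", vars(mod))

versions_for(mod.foo).append(mod.foo)
print(mod.__guarded__)
```

Because the dict lives in the module's own namespace, dropping or reloading the module discards the chains along with it.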
9. Conclusion
The implementation works.
I'm integrating it with Pythomnic3k framework so that all public method functions are instrumented with it automatically, although it is tricky, because when you have a text of just a
def foo(...):
    ...

def foo(...):
    ...
and you need to turn it into
@guard
@typecheck
def foo(...):
    ...

@guard
@typecheck
def foo(...):
    ...
it requires modification of the parsed syntax tree. I will have to write a follow-up post on that.
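Until that follow-up exists, here is a rough idea of what such a syntax tree modification could look like with the standard ast module. It is a simplified sketch, not the framework's code: the class `AddDecorators` and the function `instrument` are hypothetical, and it decorates every function definition rather than only public method functions.

```python
import ast

class AddDecorators(ast.NodeTransformer):
    """Prepend the given decorator names to every function definition."""

    def __init__(self, names):
        self.names = names

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # also handle nested functions
        present = {d.id for d in node.decorator_list if isinstance(d, ast.Name)}
        extra = [ast.Name(id=n, ctx=ast.Load()) for n in self.names if n not in present]
        node.decorator_list = extra + node.decorator_list
        return node

def instrument(source, names=("guard", "typecheck")):
    """Parse source and return a tree in which functions carry the decorators."""
    tree = AddDecorators(names).visit(ast.parse(source))
    ast.fix_missing_locations(tree)  # new nodes need line numbers before compile()
    return tree

tree = instrument("def foo(a):\n    return a\n")
print([d.id for d in tree.body[0].decorator_list])
```

The resulting tree can be passed to compile() and executed in a namespace that provides guard and typecheck.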
That's all and thanks for reading.
April 18, 2016
Rent-a-battery ?
November 27, 2015
On Emoji
Speaking of smileys, I may have the emotional range of a teaspoon, but I can't tell what most of those emoji faces mean. Each time I pull up the emoji palette in an application, I'm always stuck at which to pick, despite the seemingly wide choice. They don't convey any emotion I can possibly want to express. Don't get me wrong, they may be perfectly suited to express a notion of a pile of shit but this is not what I need from an emotion. And even with faces, wtf ?
For the purpose of illustration I've picked a few, but you can imagine the rest. Here, see for yourself, and mind that it is an international standard, no less.
Please, PLEASE, use something like Kolobok, or hire a designer and at least make your smileys look like Skype's.
Монти Механик
July 07, 2015
Helium shoelaces
April 09, 2015
Rethinking the cache
In one of our products I had once written a specific application mini-ORM for caching objects persisted in a database. It was a Python 3 application in Pythomnic3k framework and a companion PostgreSQL database.
The database structure was really simple - one table per class named simply like "users" or "certificates" and a handful of stored procedures named like "users__load" or "certificates__save". The tables contained the actual data and a few special fields, like "checksum", which made it possible to detect concurrent update conflicts optimistically.
So each time there was an ORM access in the code
user = pmnc.db_cache.User(user_id = 123)
user.passport_number = "654321"
user.save()
what happened behind the scenes was that the ORM implementation in db_cache.py executed
SELECT *, checksum FROM users__load(123)
then created an instance of class User based on returned data, one property per database column, cached the instance and returned it to the caller. After the passport_number property was modified the call to save executed
SELECT new_checksum FROM users__save(123, ..., '654321', ..., checksum)
to flush changes to the database. Should another request for user 123 arrive in the meantime
user2 = pmnc.db_cache.User(user_id = 123)
it would have returned the cached instance and not gone to the database.
Like any ORM, this one did not answer all the questions. It did not allow ad-hoc queries which did not map directly to the object paradigm and it was impractical to create a separate method for every request. Therefore, ad-hoc queries started finding their way directly to the application code like
pmnc.transaction.db.execute("""
    SELECT u1.last_name, u2.last_name
    FROM users u1 INNER JOIN users u2 ON u1.passport_number = u2.passport_number
    WHERE u1.suspended = {suspended}
""", suspended = True)
So now there were requests that bypassed the ORM cache and went straight to the database. This was still normal as long as they were read-only. But later there were more of them and they were getting more analytical and heavy (think history of changes of all objects belonging to a particular user), therefore a question of caching appeared again.
It was then that I implemented and released a universal read caching mechanism for Pythomnic3k (version 1.4), so that it was possible to enable cache on any resource, database being the most probable candidate of course. I wanted to put it to production and make all the requests including ORM's go through the resource cache. The first thing that surfaced immediately was that read caching implementation was pretty much useless as it was, because while caching reads it did not take into account the existence of concurrent writes.
Actually, I knew this before the release, but simply did not have enough understanding of how such concurrent cache activity should behave. But I left a couple of callable hooks that made it possible to customize the cache behavior on application level. So I hooked into the cache and made reads and writes coexist. Because the cache couldn't tell whether such and such SQL request had side effects, the fact had to be declared by the application with each call. In simple terms a read like this
pmnc.transaction.cached_db.execute(
    " SELECT *, checksum FROM users__load({user_id}) ",
    user_id = 123,
    pool__cache_read_keys = { "users/123" })
would later have its cached result invalidated by a conflicting write
pmnc.transaction.cached_db.execute(
    " SELECT new_checksum FROM users__save({user_id}) ",
    user_id = 123,
    pool__cache_write_keys = { "users/123" })
This way it went to production and worked fine for about half a year. Until one fine after-deployment morning it didn't.
There was another thing not taken into account, a race condition between reads and writes. For example, if these two conflicting requests executed concurrently:
pmnc.transaction.cached_db.execute( " SELECT WHERE user_id = 123 ")
pmnc.transaction.cached_db.execute( " UPDATE WHERE user_id = 123 ")
there was a chance that the read would start before the write but end after the write. In this case the write wouldn't invalidate the result of the read simply because there was none at the moment, but the result would still arrive and be cached containing already invalid data. The problem was resolved by patching in an industry-standard sleep() in the right place, and it indeed remedied the situation. But now I started to rethink the entire thing. Clearly, caching semantics needed to be improved.
So I went and made a lot of changes to the cache code, using new experience, focusing on concurrent reads and writes behaviour. In particular, the above race condition was fixed by registering affected cache keys before any request, read or write, is sent to the database. This way if a write arrives when a conflicting read is in progress or the other way around, both are allowed to proceed but read result is not cached when read returns. The result is still returned to the caller and it may or may not be invalid but now it is the responsibility of the database, not the cache, so we did not break the semantics.
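The idea of registering keys before the request is sent can be sketched like this. It is a toy model and not the Pythomnic3k code: the class `ReadCache`, its four begin/end methods and the ticket scheme are all assumptions made for the illustration.

```python
import threading

class ReadCache:
    """Toy cache that refuses to store a read which overlapped a conflicting write."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # query -> (result, read keys)
        self._reads = {}    # ticket -> [keys, tainted flag] for reads in progress
        self._writes = {}   # ticket -> keys for writes in progress
        self._next = 0

    def get(self, query):
        with self._lock:
            entry = self._entries.get(query)
            return entry[0] if entry else None

    def begin_read(self, keys):
        # Called BEFORE the read is sent to the database.
        with self._lock:
            self._next += 1
            keys = set(keys)
            tainted = any(keys & w for w in self._writes.values())
            self._reads[self._next] = [keys, tainted]
            return self._next

    def end_read(self, ticket, query, result):
        with self._lock:
            keys, tainted = self._reads.pop(ticket)
            if not tainted:  # cache only if no conflicting write overlapped
                self._entries[query] = (result, keys)
            return result    # the caller gets the result either way

    def begin_write(self, keys):
        # Called BEFORE the write is sent to the database.
        with self._lock:
            self._next += 1
            keys = set(keys)
            self._writes[self._next] = keys
            for read in self._reads.values():
                if read[0] & keys:
                    read[1] = True  # the read in progress must not be cached
            stale = [q for q, (_, k) in self._entries.items() if k & keys]
            for q in stale:
                del self._entries[q]
            return self._next

    def end_write(self, ticket):
        with self._lock:
            self._writes.pop(ticket)
```

A read that starts before a conflicting write and ends after it still returns its result to the caller, but nothing is stored for it in the cache.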
Now, as I was overhauling the cache anyway, I also wanted to examine the evidence.
First, I picked up what log files with enough debug information we had from production installations of our product and read through the registered SQL queries. Predictably enough, they fell into two categories:
- Ad-hoc queries. Fewer but slower.
- ORM queries. A lot more numerous but also much faster.
So I thought it would be nice to improve the cache by accounting not only for the weight of the cached result, but also for its "usefulness", which would be weight multiplied by hit count. And so I added such an eviction policy, only it was called "useless", as in "evict useless entries first".
Some time later I thought that since ORM produces a lot of identical parametrized queries:
SELECT * FROM users__load(?)
it would be reasonable to assume that each entry cached from any such request has the same average usefulness. For example, if we have 1000 entries whose cache hits saved 1 second each, and there are 10 new entries that have just been entered and have not had a chance yet, the newcomers should not be evicted right away. It would be better to let them stay, hoping that each will turn out as helpful as the 1000 before them.
Therefore I added an optional "cache group" parameter for a query, the simplest kind of which is the literal SQL string itself. As entries produced by the same SQL string are entered into the cache, they are assigned to the same cache group and have their usefulness accounted for together. Even though the new entry may not have had any hits yet, it is under the umbrella of the high-ranked group with high average saving.
Eviction now had to work differently. At first I thought that I would simply evict the low-ranked groups first, leaving the high-ranked ones intact if possible. But experiments indicated that one winning group simply took over the entire cache over time. So I had to implement eviction using a weighted average amongst groups, where a high-ranked group has a better multiplier than a low-ranked one. This means that a value from the former group could still be evicted if it has low weight, and likewise a high weight value from the latter group could stay.
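To make the group multiplier idea concrete, here is one possible scoring scheme. The exact formula used in Pythomnic3k is not given in this post, so the formula below (entry weight times one plus the group's average usefulness) is purely an assumption, as are the names `Entry` and `eviction_order`.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Entry:
    group: str     # cache group, e.g. the literal SQL string
    weight: float  # cost of producing the result, e.g. seconds
    hits: int = 0

def eviction_order(entries):
    """Return cache keys sorted from the first to evict to the last."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of weight * hits, entry count]
    for e in entries.values():
        totals[e.group][0] += e.weight * e.hits
        totals[e.group][1] += 1

    def score(key):
        e = entries[key]
        useful, count = totals[e.group]
        # Assumed multiplier: one plus the group's average usefulness per entry.
        return e.weight * (1.0 + useful / count)

    return sorted(entries, key=score)

entries = {
    "old": Entry("load", 1.0, hits=10),
    "tiny": Entry("load", 0.1),
    "heavy": Entry("adhoc", 2.0),
}
order = eviction_order(entries)
print(order)
```

With these numbers the low-weight newcomer in the well-performing group goes first, and the heavy entry from the group without hits outlives it, which is the behaviour described above.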
Fiction mode off. I have just committed all this code to Pythomnic3k SVN repository, and it will be in the next version, so anyone who is interested may check it out, the cache should now be usable. Although making it work right in an application may not be obvious, I will later include a sample specifically with cached access to a database.