May 20, 2011

Transactional memory - nothing but trouble

This post is a follow-up to a review I recently wrote of a book called "Transactional memory". Here I focus not on the book and its qualities, but on the subject it discusses - transactional memory. Still, when I say "they" and "their" I mean the authors and, indirectly, the researchers they represent.

Why at all ?

To begin with, what are the problems they say transactional memory solves ?
  • The arrival of multi-core chips forces programs to go parallel.
  • Writing efficient parallel programs is hard.
    My first point of criticism is right there - there is no evidence that parallel transactions make a good generic programming model at all. Even if everything else were perfect, why would TM programs be efficient ?

    But there are transactions in databases !

    They quote databases as a proven example of a successful transactional system and kind of go from there. Ok, I buy that, but there are huge differences -
    • Databases operate on very high level objects - tables, records and the like.
    • Databases deal with well-defined objects with rather limited behavior. You can only do a handful of operations on them. You cannot invent new relational operators.
    • Because the operations are high level, databases can to a degree "understand" what you are doing and control the semantics.
    • With memory bytes you can do anything. There is no way to tell what you are doing in that transaction. It's very low level.
    Also, regarding their success, database transactions don't make the task of writing an efficient and/or correct program any simpler. The guarantee of an ACID system is not that it's efficient or easy to use, but that it tolerates various screw-ups and maintains data integrity. You normally say something like "it's in a transaction, so nothing bad can happen to the data". You don't say "it's in a transaction, therefore it runs fast" or "it's in a transaction, therefore it is correct".

    Anything good at all ?

    The only thing that could possibly be good about transactional memory is that, like any other enforcing formalism, it makes it possible to reason about the correctness and behavior of a program. Proof of correctness, anyone ?

    That'd be great, but proof of what ? Again, since it works at a very low level, analysis of a TM program will only reveal that there is no conflict between accesses to such-and-such memory addresses. Uh-huh, thanks.

    Transactional memory is shared

    For a technology whose goal is supporting multi-core computers (and we are talking about hundreds of cores, right ?), TM is surprisingly insistent on the shared memory model.

    Although theoretically there is nothing wrong with 100 logical cores accessing the same memory, we live in the real world, and in the real world hardware rules. And hardware dictates that cores should be as independent as possible. Even in an SMP architecture there are per-CPU caches, and you really should keep data accesses affine to a single CPU, otherwise cache coherence traffic will kill your performance. In a NUMA architecture you don't have the privilege of a uniform view of shared memory. And in clusters you don't have shared memory at all.
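
    To make the coherence point concrete, here is a minimal C sketch of false sharing (compile with -pthread). Each thread updates only its own counter, yet because both counters land in one cache line, the line ping-pongs between the cores; the 64-byte line size in the comment is an assumption about typical hardware, not a given.

        #include <pthread.h>
        #include <stdio.h>

        enum { ITERS = 100000000 };

        /* Both counters share one cache line.  Each thread only touches
           "its own" counter, yet every write invalidates the line in the
           other core's cache.  Padding each counter out to its own line
           (64 bytes assumed) makes the same code several times faster. */
        struct { long a; long b; } counters;

        static void *bump_a(void *unused) {
            for (long i = 0; i < ITERS; i++) counters.a++;
            return NULL;
        }

        static void *bump_b(void *unused) {
            for (long i = 0; i < ITERS; i++) counters.b++;
            return NULL;
        }

        int main(void) {
            pthread_t t1, t2;
            pthread_create(&t1, NULL, bump_a, NULL);
            pthread_create(&t2, NULL, bump_b, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            printf("%ld %ld\n", counters.a, counters.b);
            return 0;
        }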

    Now, why stick to an architecture that already scales poorly ?

    Transactional memory doesn't play well with anything

    There is hardly anything in existing hardware or software that is not alienated by TM. Instead of getting support from your commodity hardware and momentum from existing software development practices, you have to fight them.

    Hardware:
    • TM doesn't work well with caches. Shared memory doesn't in general.
    • TM doesn't work well with CPU optimizations such as memory access reordering.
    • TM doesn't work well with non-shared bus architectures. Goodbye NUMA, clusters, transputers.

    Software:
    • TM doesn't work with locks and waits. Most current parallel programs use locks and waits.
    • TM doesn't like the global heap and especially garbage collection.
    • TM doesn't like exceptions, because it is unclear what to do in case of an exception.
    • TM doesn't like I/O, because I/O is irrevocable. Since you cannot un-print something from your output, you have to invent workarounds like "transactions that are guaranteed to commit".
    Even worse, TM conflicts with itself. Consider having a transaction inside a transaction. This can happen easily if you begin a transaction and then call a library which begins a transaction of its own. And it gets ugly fast. What if the inner transaction committed but the outer one has to roll back ? What if there are multiple inner transactions ?
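
    Here is how the ugliness looks in code - a sketch only, where tm_begin/tm_commit are hypothetical names, not the API of any real TM system:

        typedef struct { long balance; } account;

        void tm_begin(void);    /* hypothetical TM primitives */
        void tm_commit(void);
        void withdraw(account *a, long amount);
        void deposit(account *a, long amount);

        void transfer(account *from, account *to, long amount) {
            tm_begin();                 /* outer transaction */
            withdraw(from, amount);     /* library call - see below */
            deposit(to, amount);
            tm_commit();
        }

        /* The library is transactional in its own right: */
        void withdraw(account *a, long amount) {
            tm_begin();                 /* inner transaction */
            a->balance -= amount;
            tm_commit();                /* "committed" - yet the outer
                                           transaction may still abort.
                                           Was this commit final or not ? */
        }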

    Is it any surprise that existing databases don't support arbitrarily nested transactions ? Are you saying there should be a single memory transaction per execution context, as in a database ?

    So what can you do in a transaction ?

    Because of the hardware implications mentioned above, there is not much you can do in a memory transaction.

    1. The amount of memory that you can access in a single transaction is limited.

    For many TM systems it is limited by the hardware cache size, and that is not much at all. Caches consist of lines, and by touching a line you effectively poison it, so if you access 2 bytes in separate lines you use up 2 whole cache lines. Also, concurrent transactions have to share the cache. What good is a 4M cache if it is only 128K lines (at 32 bytes each) and there are 128 parallel transactions ? That's a mere 1K lines - 32K of data - per transaction, even with a perfectly even split.

    2. The amount of time that you can spend in a transaction is limited.

    For many systems the length of the transaction must be less than the OS scheduling quantum. Put another way, you cannot have a context switch inside a transaction. And how much is that ? 1/1000th of a second ?

    There are TM systems that allow dirty transaction data to spill from the cache to main memory, and I presume there are more complex ones that allow context switching and arbitrary execution times, but their performance remains to be seen.

    So, what can you do in a transaction with such draconian restrictions ?

    The answer is simple - memory transactions are only usable in low level libraries that implement higher level objects such as data containers. For example, you can have a multiple-producer multiple-consumer queue, a tree that allows highly parallel access, or some equally wonderful thing.
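
    For illustration, here is roughly what such a container could look like, written against a language-level TM extension in the style of GCC's -fgnu-tm (__transaction_atomic). The queue itself is a sketch, not production code:

        typedef struct node { struct node *next; void *data; } node;
        typedef struct { node *head; node *tail; } queue;

        void enqueue(queue *q, node *n) {
            n->next = NULL;
            __transaction_atomic {      /* the whole update is one
                                           all-or-nothing step */
                if (q->tail) q->tail->next = n;
                else         q->head = n;
                q->tail = n;
            }
        }

        node *dequeue(queue *q) {       /* returns NULL if empty */
            node *n;
            __transaction_atomic {
                n = q->head;
                if (n) {
                    q->head = n->next;
                    if (!q->head) q->tail = NULL;
                }
            }
            return n;
        }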

    But is it worth it ? I agree, it takes an expert to write a tree using locks or some magic non-blocking technique, but it only has to be done once. Are you saying that TM is good because now anyone could write a queue in an hour and it would be correct and efficient ?

    And you cannot use memory transactions in your applications. It is either impossible or grossly inefficient.

    And the answer is ?

    I do believe message passing is a better way to go. The technique goes by many names, but the idea is the same - there are separate processes with separate data, and they interact only in a well-defined manner.
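
    A minimal sketch in C, using nothing fancier than fork() and a pipe - two processes, no shared data, one well-defined channel:

        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>
        #include <sys/wait.h>

        int main(void) {
            int fd[2];
            if (pipe(fd) != 0) return 1;

            if (fork() == 0) {          /* child: the "worker" */
                close(fd[0]);
                const char *msg = "result:42";
                write(fd[1], msg, strlen(msg) + 1);
                close(fd[1]);
                _exit(0);
            }

            close(fd[1]);               /* parent: the "consumer" */
            char buf[64];
            ssize_t n = read(fd[0], buf, sizeof buf);
            if (n > 0) printf("got message: %s\n", buf);
            close(fd[0]);
            wait(NULL);
            return 0;
        }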

    It has no ACID properties, but heck, neither does the real world. Like I always say - there are no transactions in the real world.

    May 17, 2011

    Notes on implementing TLS. #2: Version differences.

    The first question when starting with TLS is this: which version to implement ?

    I started with version 1.2, the latest and presumably the best, initially using its specification as my reference. The other two versions I just skimmed, and since there were no apparent incompatibilities, I proceeded with 1.2. Later on, when it was already working, I backported the differences.

    1. What are the differences ? Are they significant ?

    The real question is not knowing what exactly those differences are, but deciding whether it is feasible to support them all in the same codebase with minimal effort. Luckily, it is.

    There are two major formal differences between the versions.

    1.0 vs. 1.1+

    IV handling. When a block cipher is used for traffic encryption, the key remains the same but the IV varies with each subsequent record. Version 1.0 uses the last ciphertext block of the previous record as the next IV, while versions 1.1+ generate a fresh random IV for every record and transmit it in the clear at the front of that record.
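
    In code the difference is a small branch per record. This is only a sketch of my reading of RFC 2246 vs. RFC 4346 - the session/record types and helper functions are illustrative, not from any real codebase:

        /* Pick the CBC IV for an outgoing record. */
        void encrypt_record(session *s, record *r) {
            unsigned char iv[MAX_BLOCK];
            size_t bs = s->cipher_block_size;

            if (s->version == TLS_1_0) {
                /* implicit IV: the last ciphertext block we sent */
                memcpy(iv, s->last_cipher_block, bs);
            } else {
                /* 1.1+: fresh random IV, sent in the clear at the
                   front of this record's payload */
                random_bytes(iv, bs);
                out_append(r, iv, bs);
            }
            cbc_encrypt(s->write_key, iv, r->plaintext, r->plaintext_len, r);
            remember_last_cipher_block(s, r);   /* for the 1.0 case */
        }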

    1.1- vs. 1.2

    Ciphersuite support. Version 1.2 allows using any cryptographic algorithm of your choice anywhere in the protocol, restricting only the packet parsing rules. With versions 1.1- you could replace some of the cryptographic primitives, but others were hardcoded - most notably the PRF and the handshake hashes, which were fixed to an MD5/SHA-1 combination.
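
    In code this again reduces to a version switch; the function and field names below are illustrative:

        /* Pre-1.2 the PRF is fixed; in 1.2 it comes from the suite. */
        void tls_prf(session *s, unsigned char *out, size_t len) {
            if (s->version <= TLS_1_1)
                prf_md5_sha1(s, out, len);  /* fixed: P_MD5 xor P_SHA1 */
            else
                prf_hash(s, s->suite->prf_hash, out, len);
                                            /* negotiated: SHA-256, GOST, ... */
        }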

    There are minor differences in packet formats, which are easily handled with a few if's here and there. On the other hand, these raise questions regarding interoperability between clients and servers of different versions.

    2. How to test different versions ?

    As far as testing goes, my main concern is correctness. In the initial phase, when my understanding of the protocol was still weak, it was invaluable to test against existing, presumably correct implementations. It allowed me to quickly try out the options that were unclear from the spec. Later on, as the product stabilized, I turned to loopback testing, leaving other implementations for regression testing.

    The second testing concern is compatibility. Although our product is proprietary and compatibility is not a main goal, it is a sign of a good implementation and I pursue it. There are corner cases and behaviour quirks that you find out about only by examining existing implementations, and the more of them you try out, the better. Every successful connection to somebody else's product makes you more confident in yours.

    Protocol versions pose a serious problem for testing. TLS 1.0 is the oldest and most widely supported version; you can use pretty much any 3rd party TLS implementation to test against yours. On the other hand, not all of them support even 1.1, not to mention 1.2, and this hinders testing significantly. Most of the time I used OpenSSL, which supports version 1.0, and GnuTLS, which supports all versions.
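
    For the record, this is the kind of commands I mean - ports and file names are illustrative:

        # their servers against my client, with the version pinned:
        openssl s_server -accept 4433 -cert cert.pem -key key.pem -tls1
        gnutls-serv --port 4434 --x509certfile cert.pem --x509keyfile key.pem

        # their clients against my server:
        openssl s_client -connect localhost:4433 -tls1
        gnutls-cli --port 4434 --priority "NORMAL:+VERS-TLS1.2" localhost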

    3. GOST support.

    Our top priority is supporting the GOST family of cryptoalgorithms. Russian regulations mandate that GOST be used everywhere, and nothing else. It is unclear whether GOST is even allowed to co-exist with another cryptoalgorithm within the same system, so to be on the safe side we have to replace all of the TLS cryptographic primitives with their GOST versions. This is why TLS 1.2 is the only viable choice.

    There are existing products that officially declare TLS 1.0 (and even SSL) with GOST support, and I have no idea how that's possible. My guess is that they broke the protocol by illegally inserting GOST where it was not supposed to be, and to hell with compatibility.

    May 05, 2011

    My other blog

    I also blog in Russian now, here.