Things That Require Further Thinking: 2012

December 12, 2012

How to make PF redirect traffic to localhost

At the company I work for, I also act as the network administrator: router, firewall, office VPNs, network segments, switches, cabling, that sort of thing.

The difficult part is that we have two separate internet connections through different providers, so the perimeter router has two external IP addresses and three routing tables.

And we obviously want to:

* Spread outgoing traffic evenly across the providers.
* Pass each provider's private networks through the provider that owns them.
* Whenever either provider is down, pass the traffic through the other.
* Redirect some incoming traffic to selected servers in the DMZ.
* Redirect some other incoming traffic to the router itself.

I set it all up once, four years ago, using FreeBSD, ipfw and natd, and it's been working ever since. The problem is maintenance. Having changed over time, the ipfw ruleset got huge and unreadable, natd is troublesome to reconfigure, and the switches to userland consume CPU.

Now as the company is growing, we have new offices, new services, new hardware, and so I thought that rather than trying to shoehorn the changes in, I'd do it all over again using something other than ipfw/natd. For example, pf.

After reading the man pages and a couple of books it all seemed clear. Some things I did indeed work around easily. But not this one.

Say I want to redirect incoming internet traffic from either provider to a service running on the router itself and listening on localhost. This should be something like

nat on ext -> (ext)
rdr on ext from any to (ext) tag FOO -> localhost
pass in on ext tagged FOO

Right ? Well, if ext is your only internet connection, yes. Otherwise you have ext1 and ext2, each with its own default gateway, and it bites you. See, if ext1 is 1.1.1.1 and ext2 is 2.2.2.2, and a SYN packet arrives over ext1

ext1: 1.2.3.4 -> 1.1.1.1

it goes through the rdr rule first and becomes

ext1: 1.2.3.4 -> 127.0.0.1 tagged FOO

before it is seen by the pass rule. The routing information has been lost. Now even if the service responds with SYN/ACK, where should it be sent to ? Ideally we would need something like

rdr on ext1 from any to (ext1) rtable 1 tag FOO -> localhost
rdr on ext2 from any to (ext2) rtable 2 tag FOO -> localhost

to attach routing to a state as early as possible. Unfortunately, using rtable with rdr is not supported in FreeBSD 8. You can use rtable with pass, but by then it is too late.

My first reaction was to tag each provider's incoming traffic differently and play from there:

rdr on ext1 from any to (ext1) tag EXT1 -> localhost
rdr on ext2 from any to (ext2) tag EXT2 -> localhost
pass in on ext1 tagged EXT1
pass in on ext2 tagged EXT2
pass in on dmz route-to (ext1 gw1) tagged EXT1
pass in on dmz route-to (ext2 gw2) tagged EXT2

The trick here is that we intercept the service's response packets at re-entry; they are automatically attached to an already tagged state, and therefore we can determine where to route them. And it works, but not with loopback (note that I've used dmz for the interface name). Specifically for the loopback interface, the response packets avoid filtering, the

pass in on lo0 route-to (ext1 gw1) tagged EXT1

rule never fires and the matching packets go directly to routing. It doesn't work.

After some obligatory hair pulling, I've come up with the following working ruleset:

set skip on lo0
nat on ext1 -> (ext1)
nat on ext2 -> (ext2)
rdr on ext1 from any to (ext1) tag FOO -> localhost
rdr on ext2 from any to (ext2) tag FOO -> localhost
pass in on ext1 reply-to (ext1 gw1) tagged FOO
pass in on ext2 reply-to (ext2 gw2) tagged FOO

And an ever-important addition to this:

route add default -iface lo0

Now, I have no idea why localhost needs to be specifically routed to lo0, but you can google "route-to lo0" for a bunch of posts similar to mine. Without it the redirected packets somehow can't reach localhost. But as we already have a reply-to clause on a pass rule, we cannot also include route-to (which is yet another obstacle). There appears to be no other way but to set a default route to lo0. It even makes sense, because on this machine I treat localhost as publicly accessible.

As a bonus, this ruleset also works for redirected connections that reach out to the DMZ, which is enough to keep me happy for the moment.

September 11, 2012

On interrupting application code

So, I've released a new version of the Pythomnic3k framework recently. As usual, the changes that get incorporated are essentially answers to questions from real-life usage. There are a few in the current release too.

Now I'm thinking about what to do next. Among the things that still make me uncomfortable is the real-time guarantee, or, specifically in Pythomnic3k, a guarantee that every request will return by its deadline. Just anything, a failure perhaps, as long as it doesn't hang.

The problem is, sometimes a developer writes something that may not look like an infinite loop, but behaves like one. A situation when execution hits a

while True:
    pass

is a disaster in any architecture. It never returns, it has to be executed, there goes a CPU. Even the perfect scheduler cannot decide that this code effectively does nothing and just prevent it from being scheduled.

In Python this is worse, because it is effectively single-threaded for CPU-bound code. And even if that were not so, the next request hits the same spot and before you know it your load average is over 26. Therefore a single mistake like this could bring the entire service down.

There has to be a way of interrupting the application code.

And in Python there is. A thread can inject an exception into another thread's execution path, and as soon as the victim gets scheduled next, that exception is thrown. This is very easy:

PyThreadState_SetAsyncExc(other_thread, Error)

As long as the application has a watchdog (and in Pythomnic3k there is one), it could interrupt worker threads using such injection. I tried it and it worked.
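From pure Python the injection can be sketched through ctypes. This is a minimal illustration, not Pythomnic3k's actual watchdog: the Interrupted exception, the interrupt_thread helper and the busy worker are all made up for the example, and the only real API involved is CPython's PyThreadState_SetAsyncExc, which takes a thread id and an exception class.

```python
import ctypes
import threading
import time


class Interrupted(Exception):
    """Hypothetical exception type injected by the watchdog."""


def interrupt_thread(thread, exc_type=Interrupted):
    """Ask CPython to raise exc_type in the given thread.

    PyThreadState_SetAsyncExc returns the number of thread states
    it modified, which is 1 when the thread id was found.
    """
    affected = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exc_type))
    return affected == 1


result = {}


def worker():
    try:
        while True:          # the runaway loop from above
            pass
    except Interrupted:
        result["interrupted"] = True


t = threading.Thread(target=worker, daemon=True)
t.start()
time.sleep(0.1)              # let the worker get stuck in its loop
delivered = interrupt_thread(t)
t.join(timeout=5.0)
```

The daemon flag and the join timeout are there only so that the sketch itself cannot hang if the injection were to fail.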

Now there appear other problems.

First is the problem I could easily turn a blind eye to. Code which is executing an OS call cannot be interrupted this way. This is easy to see, because while it does so it is outside the Python scheduler's reach. Therefore

time.sleep(86400)

will still return tomorrow.
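The effect is easy to reproduce on a smaller scale. In the sketch below (the names are hypothetical, and the sleep is shortened to half a second) an exception is injected into a sleeping thread shortly after it starts, yet it is raised only once the sleep has run its course.

```python
import ctypes
import threading
import time


class Interrupted(Exception):
    """Hypothetical exception type injected from the main thread."""


timeline = {}


def worker():
    timeline["start"] = time.monotonic()
    try:
        time.sleep(0.5)      # inside an OS call, out of the interpreter's reach
        while True:          # back in Python code, where the exception lands
            pass
    except Interrupted:
        timeline["caught"] = time.monotonic()


t = threading.Thread(target=worker, daemon=True)
t.start()
time.sleep(0.1)              # the worker is asleep by now
ctypes.pythonapi.PyThreadState_SetAsyncExc(
    ctypes.c_ulong(t.ident), ctypes.py_object(Interrupted))
t.join(timeout=5.0)

# Time between the worker starting and the exception actually being raised.
delay = timeline["caught"] - timeline["start"]
```

Even though the exception was requested after a tenth of a second, delay comes out no shorter than the sleep itself.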

Second is the bigger problem of unpredictability. You don't know when or where the deadline, and hence the exception, hits you. This effectively means that no code can be considered exception-safe now.

As a framework author, I could protect sensitive fragments of code by essentially "disabling interrupts" for the moments when interruption would not be convenient. So I write something like this:

current_thread.can_interrupt = True
application_code()
current_thread.can_interrupt = False
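Put together with a watchdog, the flag could be honoured roughly as follows. This is only a sketch under assumptions: Worker, watchdog_interrupt and DeadlineExceeded are hypothetical names, not Pythomnic3k code, and the check of the flag and the injection are not atomic, so a real implementation would have to close that gap too.

```python
import ctypes
import threading
import time


class DeadlineExceeded(Exception):
    """Hypothetical exception injected when a request overruns its deadline."""


class Worker(threading.Thread):
    """Hypothetical framework worker that marks where it may be interrupted."""

    def __init__(self, application_code):
        super().__init__(daemon=True)
        self.application_code = application_code
        self.can_interrupt = False       # framework code: hands off
        self.outcome = None

    def run(self):
        try:
            self.can_interrupt = True    # entering application code
            self.application_code()
            self.can_interrupt = False   # back in framework code
            self.outcome = "completed"
        except DeadlineExceeded:
            self.can_interrupt = False
            self.outcome = "interrupted"


def watchdog_interrupt(worker):
    """Inject DeadlineExceeded only if the worker currently allows it."""
    if not worker.can_interrupt:
        return False
    ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(worker.ident), ctypes.py_object(DeadlineExceeded))
    return True


def runaway():
    while True:
        pass


idle = Worker(runaway)                   # never started, flag stays False
refused = watchdog_interrupt(idle)

busy = Worker(runaway)
busy.start()
time.sleep(0.1)                          # let it enter application code
accepted = watchdog_interrupt(busy)
busy.join(timeout=5.0)
```

The watchdog leaves the idle worker alone and interrupts only the one that has declared itself interruptible.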

That protects the framework, but not the application code. The same developer who wrote the infinite loop in the first place could very reasonably write something like this:

lock.acquire()
try:
    ...
finally:
    lock.release()

Now consider what happens if the exception is injected after acquire but before try. The lock is then held, but finally never runs, so the lock is never released. Although an opening try statement actually does something non-trivial, it is never considered to be a possible source of exceptions. Paranoid as I am, I read try as a no-op. If try starts failing, all bets are off.

Similarly, it is common to put cleanup code in finally and make that code exception-safe. For example:

d["foo"] = "I exist therefore I am"
try:
    print(d["foo"])
finally:
    del d["foo"] # this could not throw
    something very important

Now any code, however safe it looks, could throw. One could write very defensive code wrapping every line in a try-finally block, but then again, it is still possible that in

finally:
    try:
        del d["foo"]
    finally:
        something very important

an exception is thrown just after the second finally statement and something very important is still not executed.

As it turns out, we have a "programming in the presence of asynchronous signals" situation here. As soon as some external mechanism could interrupt execution, you have to always account for that. This was typical when programming interrupt handlers in assembler, where much of your code was cli and sti instructions. All hell rained down if you ever forgot one.

Granted, such programming is possible and could even be considered stylish and feel elitist. But it is an entirely different style from application programming in a high-level dynamic language. Even if facilities to disable interrupts were provided by the framework, using them would require much experience and care on the developer's part, much more than could be expected. And it would be a lot of trouble to use correctly.

Therefore I don't think I will equip Pythomnic3k with such a deadline-enforcing mechanism. A developer who wishes to make his code deadline-friendly can always do it the same way it is done now, by explicitly checking for request expiration at well-defined points. Something like

while not pmnc.request.expired:
    read_data(timeout = pmnc.request.remain)
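To show the cooperative pattern outside the framework, here is a self-contained sketch. The Request class merely imitates the expired and remain attributes of pmnc.request, and read_data is a stand-in that sleeps; both are assumptions made for the example and say nothing about how Pythomnic3k implements them.

```python
import time


class Request:
    """Hypothetical stand-in for pmnc.request: tracks a deadline."""

    def __init__(self, timeout):
        self._deadline = time.monotonic() + timeout

    @property
    def remain(self):
        # Seconds left before the deadline, never negative.
        return max(self._deadline - time.monotonic(), 0.0)

    @property
    def expired(self):
        return self.remain <= 0.0


def read_data(timeout):
    """Hypothetical blocking read that gives up after `timeout` seconds."""
    time.sleep(min(timeout, 0.05))
    return None


def handle(request):
    """Poll for data, checking for expiration at one well-defined point."""
    polls = 0
    while not request.expired:
        read_data(timeout=request.remain)
        polls += 1
    return polls


started = time.monotonic()
polls = handle(Request(timeout=0.3))
elapsed = time.monotonic() - started
```

The loop leaves on its own once the deadline passes, and every blocking call is bounded by the time that remains, so nothing has to be interrupted from outside.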

And if someone makes a mistake and the service hangs... Well, you have to be ready for that too.

May 28, 2012

Oracle Application Integration Architecture (AIA) Foundation Pack 11gR1: Essentials

This is a review of the book Oracle Application Integration Architecture (AIA) Foundation Pack 11gR1: Essentials by Hariharan V. Ganesarethinam, written at the request of Packt Publishing.

The book is not for me to begin with:

This book assumes that you have a fundamental knowledge of Oracle SOA suite and its components.

I don't. Integration is part of my job, that's true, but I've never used Oracle SOA suite. Therefore my only hope was that the book was "essential" enough to be worth reading.

Well, it was, but unfortunately the experience was not pleasant.

As far as published texts go, I'm a grammar nazi. If it is on paper, it should at the very least be syntactically correct. You do have editors and proofreaders in Packt, don't you ?

This book doesn't pass the grammar check, thereare wordsglued together, wrong prepositions used into it, and sequence of tenses has done wrong.

To me this greatly undermines the book's authority. If the authors didn't even bother to check the grammar, what are the chances that it contains correct information and in such form that I understand it right ?

And that would not be an easy challenge.

The author uses words such as "securities" (plural of security), "compatibilities", "upgradation", "inbuilt", "self-intelligent", "real-time" (supposedly instead of "real-life") and "product portfolio".

The writing style is less than perfect, but pointing that out would be nitpicking, because there are whole sentences that make no sense, for example:

review the test results, and correct the implementation if any.

we need to know the name of the operation where we are going to test and type.

And here is an example of an outright contradiction:

* EBF can only invoke or be invoked by another EBF or EBS. It never communicates with ABCS directly.
* EBF can be invoked by requester ABCS.

Even when the sentence structure is correct, one can encounter something like this:

AIA recommends extending these business processes at logical entry points. However, it does not recommend any four point extension locations.

WTF is "four point location" ?

The text is dense with acronyms. It would be safe to say that on average every sentence in this book contains about two. This makes it very hard to read. Besides, the purpose often seems to be not to make things clear to the reader, but to include every single one, as though an omission would render the writing incorrect. Therefore trains of acronyms are repeated again and again.

In the summary sections specifically, the same statements are repeated something like five times in different variations.

Enough about the style, what about the content ?

The book essentially contains 3 types of information.

1. Architecture and components of Oracle AIA.

As we presumably have "a fundamental knowledge of Oracle SOA suite", there is no big picture, overview of purpose, problem, process, tools or methods. Instead, it jumps right to AIA components.

And it explains them well enough I guess. Upon reading this book I have an essential if vague understanding of what AIA consists of.

There also are chapters about my favourite topics: error handling, security and versioning, but they are shallow. What they say is that there is an XML file in which you can configure everything. It is right there in a deep dark directory.

2. Screenshots.

Those are of the useless sort, where a shot of a maximized window takes half a page, the screen fonts are too small to be read, and you only need a single tiny button anyway, which is circled. And the text caption says something like "fill in the required information in all the fields and click save".

3. Guidelines.

Those are pieces of advice from field experience, but given that the book does not talk about practical issues anyway, the advice rings hollow. There is no structure, no reason why they are here. Out of the blue comes "and never do this !" What ? Why ?

A concluding quote from page 5:

Now that you are the proud owner of a Packt book, ...

Please, let me decide whether or not I'm proud, don't declare it before I've even read the book.

The book lives up to its promise, but just barely, therefore 3 out of 5.

March 03, 2012

A fantastic solution for falsifying elections in Russia

This has to do with the presidential elections in Russia, which are due tomorrow.

Just as with the parliamentary elections last December, Russians are getting ready for a grossly falsified farce. Those expectations came true last December and gave those who were watching tons of fun, such as a total of 146% of votes broadcast on TV, incriminating cellphone YouTube clips officially declared to have been filmed by criminals in underground bunkers, and hundreds of what courts decided to be "technical counting mistakes", all in favour of the winning party.

Now the saga continues. This time the question is whether Putin becomes a dark lord for the next 12 years or something else happens.

Those of you who are interested in elections in a dictatorship can check out how the army and police can be ordered to vote for such and such, how state employees (a synonym for "the poorest") can be threatened into doing the same, how psychiatric clinics vote, how all the TV channels only ever show the same candidate, and how protest rallies against unfair elections are mirrored the next day by pro-Putin ones, with thousands of random people brought in by bus from all around, each paid $30 just to stand there, all the fun paid for with budget money.

But the panacea emerged last month - web cameras !

A camera had to be installed at each voting site, pointing at the ballot box into which all the ballots are dropped. This is officially said to be the end of election corruption, as "anyone can watch for himself that everything is in order with Russian elections".

The largest Russian communications company Rostelecom (of which Putin is presumably a stakeholder) has been assigned to install 300000 cameras, the total project cost being around half a billion dollars drained from the state budget.

I thought I would pass on posting about how ineffective a solution this is, but as I'm at it, I'll say a few words.

Even in perfect conditions, the camera only shows the moment of dropping in the filled ballot. Even in theory it thwarts only two threats - one known as "throw in", when a single person throws in a stack of pre-filled ballots, and the other known as "carousel", when a group of people moves from one site to another, voting at each using false IDs (the faces can theoretically be cross-matched).

But really only the short-sighted can believe that cameras are an answer to anything.

First, it proves nothing. Under Russian law video footage is not evidence. Furthermore, even if something is caught on tape and is passed on to a court, it is the same court as the last time, and it declares the footage to be a fake made by terrorists to discredit the state.

Second, no one is ever going to watch it. It is 300000 days of hi-res video, or about 5000 terabytes of data. Even if such an archive is made accessible to anyone over the Internet (and never before has such a wonder of a quality official web site appeared on the Russian Internet), it takes 800 man-years to watch, so ten thousand people watching around the clock can get through it in about a month, eating up 15 gigabits per second of traffic.
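For the curious, the arithmetic behind these figures can be checked in a few lines. The sketch assumes that each of the 300000 cameras records one full 24-hour day, that a terabyte is the decimal 10^12 bytes, and that the viewers watch around the clock; none of that is stated officially, it is simply what makes the numbers agree.

```python
CAMERAS = 300_000                  # one camera per ballot box, as announced
ARCHIVE_BYTES = 5000 * 10**12      # "about 5000 terabytes", decimal units assumed
VIEWERS = 10_000                   # volunteers watching around the clock
SECONDS_PER_DAY = 86_400

# Each camera is assumed to record a single 24-hour election day.
camera_days = CAMERAS * 1

# Watching it all alone, non-stop, expressed in years.
man_years = camera_days / 365

# Calendar days needed if the viewers split the footage evenly.
days_with_viewers = camera_days / VIEWERS

# Implied bitrate of one stream, and the total when every viewer streams at once.
stream_bits_per_second = ARCHIVE_BYTES / CAMERAS * 8 / SECONDS_PER_DAY
total_gbit_per_second = stream_bits_per_second * VIEWERS / 10**9

print(f"{man_years:.0f} man-years, {days_with_viewers:.0f} days, "
      f"{stream_bits_per_second / 10**6:.2f} Mbit/s per stream, "
      f"{total_gbit_per_second:.1f} Gbit/s in total")
```

Under those assumptions the 15 gigabit figure comes out as a rate per second: roughly 1.5 megabits per second per stream, multiplied by ten thousand simultaneous viewers.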

And if you were still unsure, the real stuff happens outside the camera views.

See, elections are nothing but their announced results. Therefore in the simplest case, a few days after the elections the central election committee simply announces that Putin has won. How can anyone disprove it ? For the results to be discarded and re-elections initiated, there must be sufficient evidence, and a court must decide. And you know about Russian courts already.

The other thing about the elections is that the process of counting is an upstream aggregation of numbers. There are many places where those numbers can be manipulated.

Also, the paper trail is lost - the ballots are long since trashed and the only remaining evidence is going to be the protocols signed by committees at each site after hand counting the ballots.

This is a rather trivial exercise in applied security, so like I said I felt no obligation to write this post. But then the Internet delivered something so striking that sheer amazement forced my hand.

I've seen a lot of practical security issues, but this takes the cake. It is simple, ingenious and extremely effective. There is a way to falsify all the counting protocols at once with no one ever noticing. And it has to do with web cameras. Interested ?

See, the web cameras are electronic devices and the members of the committees at each site all over Russia must be instructed how to use them. There is a printed manual, on which each member must be briefed, signing its last page to confirm it. Everything fine so far ?

Now, behold !

The page on which they sign for the briefing is identical to the page on which they sign the counting protocol. And, coincidentally, this page for signatures is actually a blank A3 sheet folded and clipped to a regular A4 instruction book. Therefore immediately after the briefing we have a blank A3 sheet with nothing on it except for the signatures of the counting committee in the exact same place where they would be if that sheet were a counting protocol. You take the signed page, print anything you want over it and there you have it - the official vote counting protocol signed by the election committee members in person.

This is fantastic.

The original report is here.
