Monday, October 20, 2008

If it's not asynchronous yet, make it

I am at war. It is a war against APIs, query strikes, and Unicode bombs. The enemy is Berkeley XMLDB, and I'm tired of losing.

My battle at Bunker Hill has been using this embedded database in a server environment. XMLDB and its corresponding Python bindings are neither optimized nor designed for server environments. Because it's embedded, it has some very negative server-side effects:
  • if the db segfaults, the server segfaults
  • since the Python bindings are a SWIG wrapper, the C libs segfault easily
  • if the db hangs, the server hangs
  • no persistent connections are a performance hit
  • multiple dbs are a management headache
  • multi-threaded processes are not well supported out of the box, and servers are multi-threaded
  • multiple processes are not supported at all, and can cause mega corruption of the database
  • if anything goes wrong with anything, recovery has to be single process single thread, meaning that all connections have to be terminated to bring one db back online
All these things add up to down time, down time, down time.  Down time means I'm losing. It stresses you out to the land of disappearing productivity and forces mistakes.  And then there is the ever present fear that one day a recovery will corrupt it all. XMLDB, I'm coming to fix you. You better be ready.

Now I need a game plan.  First things first, tackle SWIG and any nasty exceptions that come out of it.  Minimize encoding errors, separate interfaces, and wrap, wrap, wrap. Protection is the name of the game, and to ease the blow of attacks I'm gonna need some good armor. With every call into SWIG there will be a try/except/finally standing on the defense.
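
The armor could look something like the sketch below. It is only an illustration of the try/except/finally idea: `XmlDbError`, `guarded`, and `fake_query` are made-up names, and `fake_query` stands in for a real binding call rather than using the actual XMLDB API. Note that a wrapper like this only catches Python exceptions; it cannot stop a segfault inside the C libs.

```python
import functools
import logging

log = logging.getLogger("xmldb.guard")


class XmlDbError(Exception):
    """The one exception type that callers of the wrapped layer have to handle."""


def guarded(func):
    """Wrap a binding call in try/except/finally and normalise its failures."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except XmlDbError:
            # Already normalised by an inner wrapper: pass it through untouched.
            raise
        except Exception as exc:
            # Anything else the binding throws becomes a single, known error type.
            raise XmlDbError("%s failed: %r" % (func.__name__, exc)) from exc
        finally:
            # Runs on success and failure; real cleanup (cursors, handles) goes here.
            log.debug("finished call to %s", func.__name__)
    return wrapper


@guarded
def fake_query(text):
    # Hypothetical stand-in for a binding call that is picky about encodings.
    if not isinstance(text, str):
        raise TypeError("query must be text")
    return text.encode("utf-8")
```

Every caller then only needs one `except XmlDbError`, and the original exception is still reachable through `__cause__` when it is time to debug.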

Java weenies seem to have a web-friendly XMLDB interface that's filled with interfaces of wrappers of SOAP and jsr***. It is a great idea, but it just won't work here. Adding Java to a Python architecture is just going to make environment variables clash, and maintenance will bring my men down.  To counter, I've started playing with Twisted, as its reactor runs in a single thread and it manages its own threads, which should aid recovery. This should help separate the server from the db, a strategy that should have been painfully obvious from the start. It should also maintain an XMLDB version of "persistent connections". Finally, a separate server means that if XMLDB goes down, it doesn't drag Plone into the mud with it. More men on the field means more threads could die, but it also means more threads to fight.
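
To make the "separate the server from the db" idea concrete, here is a rough sketch that funnels every database call through one dedicated worker thread. It deliberately uses the standard library's `concurrent.futures` as a stand-in so it runs without Twisted installed; it is not Twisted's API, and `DbGateway` and `which_thread` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class DbGateway:
    """Funnels every database call through one dedicated worker thread."""

    def __init__(self):
        # One worker only: calls are serialised and never run on the caller's thread.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def call(self, func, *args, timeout=5.0):
        # Hand the work to the worker and wait a bounded time for the answer.
        # If the call hangs, the caller gets concurrent.futures.TimeoutError
        # instead of hanging along with it.
        future = self._executor.submit(func, *args)
        return future.result(timeout=timeout)

    def close(self):
        # Wait for queued work to finish, then stop the worker.
        self._executor.shutdown(wait=True)


def which_thread():
    # Stand-in for a real db call: reports which thread it ran on.
    return threading.current_thread().name
```

A hung call still occupies the single worker, so this only protects the callers. Protecting against segfaults needs the db in a separate process, which is the real point of a separate server.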

Living in a world of synchronous calls has been a dangerous game. Ideally I would like to update things on the spot, but any time I depend on something being up, it is down. Storing data in a processing queue (such as a fancy XML feeds folder) is not premature optimization, just a smart way to never underestimate the power of the "retry" from transactions of web requests.  That's one thing Java heads got right, and I hope it will be the nuclear bomb in my currently weak arsenal.
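
A processing queue with retry can be tiny. The sketch below is an in-memory illustration only, with hypothetical names (`RetryQueue`, `handler`, `max_attempts`); a real one would persist items to disk (that feeds folder, say) so they survive a restart.

```python
from collections import deque


class RetryQueue:
    """Holds work items and retries the ones whose handler raised."""

    def __init__(self, handler, max_attempts=3):
        self.handler = handler            # callable that pushes one item to the db
        self.max_attempts = max_attempts  # give up on an item after this many tries
        self.pending = deque()            # (item, attempts_so_far) pairs
        self.failed = []                  # items that exhausted their attempts

    def submit(self, item):
        # The web request only enqueues; it never waits on the db being up.
        self.pending.append((item, 0))

    def drain(self):
        """Make one pass over the queue; return the number of items delivered."""
        delivered = 0
        for _ in range(len(self.pending)):
            item, attempts = self.pending.popleft()
            try:
                self.handler(item)
                delivered += 1
            except Exception:
                attempts += 1
                if attempts < self.max_attempts:
                    # Put it back at the end for the next pass.
                    self.pending.append((item, attempts))
                else:
                    self.failed.append(item)
        return delivered
```

Calling `drain()` from a timer or a background worker means a db outage delays updates instead of losing them, and anything in `failed` is a to-do list for a human.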

Onwards and upwards I guess: XMLDB, prepare to behave.