Monday, October 20, 2008

If it's not asynchronous yet, make it

I am at war. It is a war against APIs, query strikes, and Unicode bombs. The enemy is Berkeley XMLDB, and I'm tired of losing.

My battle at Bunker Hill has been using this embedded database in a server environment. XMLDB and its corresponding Python bindings are not optimized OR designed for server environments. Because it's embedded, it has some very negative server side effects:
  • if the db segfaults, the server segfaults
  • since the Python bindings are a SWIG wrapper, the C libs segfault easily
  • if the db hangs, the server hangs
  • no persistent connections are a performance hit
  • multiple dbs are a management headache
  • multi-threaded processes are not well supported out of the box, and servers are multi-threaded
  • multiple processes are not supported at all, and can cause mega corruption of the database
  • if anything goes wrong with anything, recovery has to be single process single thread, meaning that all connections have to be terminated to bring one db back online
All these things add up to downtime, downtime, downtime. Downtime means I'm losing. It stresses you out to the land of disappearing productivity and forces mistakes. And then there is the ever-present fear that one day a recovery will corrupt it all. XMLDB, I'm coming to fix you. You better be ready.

Now I need a game plan. First things first: tackle SWIG and any nasty exceptions that come out of it. Minimize encoding errors, separate interfaces, and wrap, wrap, wrap. Protection is the name of the game, and to ease the blow of attacks I'm gonna need some good armor. With every call to SWIG there will be a try/except/finally there on the defense.
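Here is a rough idea of what that armor could look like: a decorator that puts the same try/except/finally around every call into the bindings. It is only a sketch. The names guarded, DatabaseError and fake_query are made up for illustration, and fake_query stands in for a real SWIG call. Keep in mind that try/except only catches Python-level exceptions. A true segfault in the C library still kills the process, so this is armor against the small arms fire only.

```python
import functools
import logging

log = logging.getLogger("xmldb.guard")  # hypothetical logger name


class DatabaseError(Exception):
    """The one exception type the rest of the server has to know about."""


def guarded(cleanup=None):
    """Wrap a binding call in try/except/finally (illustrative helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except UnicodeError as exc:
                # Encoding problems get their own message so they are easy to spot.
                raise DatabaseError("encoding problem: %s" % exc) from exc
            except Exception as exc:
                # Anything else the wrapper throws becomes a DatabaseError.
                log.exception("call to %s failed", func.__name__)
                raise DatabaseError(str(exc)) from exc
            finally:
                # Runs on success and on failure, e.g. to release a handle.
                if cleanup is not None:
                    cleanup()
        return wrapper
    return decorator


released = []


@guarded(cleanup=lambda: released.append("handle released"))
def fake_query(text):
    """Stand-in for a call into the bindings; fails on empty input."""
    if not text:
        raise RuntimeError("wrapper blew up")
    return text.upper()
```

Calling fake_query("") raises DatabaseError instead of whatever the wrapper happened to throw, and the cleanup still runs, so callers only ever have one kind of failure to defend against.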

Java weenies seem to have a web-friendly XMLDB interface that's filled with interfaces of wrappers of SOAP and jsr***. It's a great idea, but it just won't work here. Adding Java to a Python architecture is just going to make environment variables clash, and maintenance will bring my men down. To counter, I've started playing with Twisted, as it's guaranteed single-threaded and can manage its own threads to aid recovery. This should help separate the server from the db, a strategy that should have been painfully obvious from the start. It should also maintain an XMLDB version of "persistent connections". Finally, a separate server means that if XMLDB goes down, it doesn't drag Plone into the mud with it. More men on the field means more threads could die, but it also means more threads to fight.
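To show why the separation matters, here is a small sketch of the isolation idea using only the standard library. It is not Twisted and not the setup described above: it simply runs each query in a child process via subprocess, so a crash or hang stays in the child. WORKER_SOURCE and query_in_child are hypothetical names, and the worker fakes a database by echoing the query back, with the magic query "crash" simulating a segfault in the C library.

```python
import subprocess
import sys

# Hypothetical worker: a real one would open the database and run the query.
WORKER_SOURCE = """
import os, sys
query = sys.argv[1]
if query == "crash":
    os.abort()  # simulate the C library taking the process down
print("result for " + query)
"""


def query_in_child(query, timeout=10):
    """Run one query in a child process and return (ok, output)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", WORKER_SOURCE, query],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child when the timeout expires.
        return False, "worker hung and was killed"
    if proc.returncode != 0:
        # The child died, but the calling server is still standing.
        return False, "worker died with exit code %d" % proc.returncode
    return True, proc.stdout.strip()
```

Starting a process per query is far too slow for production. The point is only that the parent survives query_in_child("crash") and can report the failure, which is the property a separate long-running db server would give.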

Living in a world of synchronous calls has been a dangerous game. Ideally I would like to update things on the spot, but any time I depend on something being up, it is down. Storing data in a processing queue (such as a fancy xml feeds folder) is not premature optimization, just a smart way to never underestimate the power of the "retry" from transactions of web requests. That's one thing Java heads got right and I hope it will be the nuclear bomb in my currently weak arsenal.
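A minimal in-memory sketch of that queue-and-retry idea follows. The RetryQueue class and its max_attempts parameter are assumptions for illustration. A real version would persist the items somewhere (the feeds folder, for instance) so they survive a restart.

```python
from collections import deque


class RetryQueue:
    """Hold updates until the backend accepts them (illustrative sketch)."""

    def __init__(self, handler, max_attempts=5):
        self.handler = handler          # callable that writes one item to the db
        self.max_attempts = max_attempts
        self.pending = deque()          # (item, attempts so far)
        self.failed = []                # items that ran out of attempts

    def submit(self, item):
        """Accept an update right away, whether or not the db is up."""
        self.pending.append((item, 0))

    def drain(self):
        """Try every pending item once; return how many were delivered."""
        delivered = 0
        for _ in range(len(self.pending)):
            item, attempts = self.pending.popleft()
            try:
                self.handler(item)
            except Exception:
                attempts += 1
                if attempts >= self.max_attempts:
                    self.failed.append(item)
                else:
                    # Put it back at the end of the line for the next pass.
                    self.pending.append((item, attempts))
            else:
                delivered += 1
        return delivered
```

The web request only calls submit and returns. Something else (a timer, a cron job, a Twisted looping call) calls drain periodically, so a db that is down costs a delay rather than a failed request.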

Onwards and upwards I guess: XMLDB, prepare to behave.

4 comments:

John Brennan said...

why XMLDB? are you using XML for transport? Why not JSON+CouchDB or some other doc store?

I'm interested to hear your thoughts on Twisted. From their doc it seems like they have a lot going on.. but they are also managing/abstracting a lot...

eleddy said...

@john_brennan I still have to write a blog post on why I hate json :) top three reasons, #1 being that that's what mcmorgan chose when he started things and often that determines a lot. Plus we are working with xml only so a native xmldb fits. #2 is that this is a huge app that has to be compatible through APIs other than REST, including SOAP and COM. #3 is scalability.

I have never used couchdb but it looks interesting, however not appropriate for what we are working with. I am still working with twisted but your initial thoughts were mine too. I'm sure I'll write something when it's all over :)

John Brennan said...

@eleddy:
hate json.. do tell why?
i get that XML makes more sense in certain cases like when having to talk in SOAP and to legacy systems, but because JSON is essentially serialized JS objects, it's dead simple (less work, more readable) for client side scripting. Now that more apps are being powered by javascript almost completely there is even a better push to KISS (keep it simple stupid).

But.. I'd love to hear why you hate json... ;)

Anonymous said...

@john_brennan I apologize, hate is a strong word. I dislike it for all the hype and the fact that it's wrong for most applications but people use it anyway. json works for small things, small amounts of data. but when you are storing a mass amount of data that has to be shared among apps, some of which are NOT web based, json is awful. it also has terrible unicode support (something I have to deal with a lot) and is very finicky (my users like to use all kinds of apostrophes, not just ' and "). not that xml is always the right choice either of course :)