Monday, October 20, 2008

If it's not asynchronous yet, make it

I am at war. It is a war against APIs, query strikes, and Unicode bombs. The enemy is Berkeley XMLDB, and I'm tired of losing.

My battle at Bunker Hill has been using this embedded database in a server environment. XMLDB and its Python bindings are not optimized OR designed for server environments. Because it's embedded, it has some very negative server-side effects:
  • if the db segfaults, the server segfaults
  • since the Python bindings are a SWIG wrapper around the C libraries, they segfault easily
  • if the db hangs, the server hangs
  • the lack of persistent connections is a performance hit
  • multiple dbs are a management headache
  • multithreaded processes are not well supported out of the box, and servers are multithreaded
  • multiple processes are not supported at all and can cause mega corruption of the database
  • if anything goes wrong with anything, recovery has to be single-process and single-threaded, meaning that all connections have to be terminated to bring one db back online
All these things add up to downtime, downtime, downtime. Downtime means I'm losing. It stresses you out to the land of disappearing productivity and forces mistakes. And then there is the ever-present fear that one day a recovery will corrupt it all. XMLDB, I'm coming to fix you. You better be ready.

Now I need a game plan. First things first: tackle SWIG and any nasty exceptions that come out of it. Minimize encoding errors, separate interfaces, and wrap, wrap, wrap. Protection is the name of the game, and to ease the blow of attacks I'm gonna need some good armor. Every call into SWIG will have a try/except/finally on defense.

Java weenies seem to have a web-friendly XMLDB interface that's full of wrappers around SOAP and JSR***, which is a great idea but just won't work here. Adding Java to a Python architecture is just going to make environment variables clash, and maintenance will bring my men down. To counter, I've started playing with Twisted, since its event loop runs in a single thread and it can manage its own worker threads to aid recovery. This should help separate the server from the db, a strategy that should have been painfully obvious from the start. It should also maintain an XMLDB version of "persistent connections". Finally, a separate server means that if XMLDB goes down, it doesn't drag Plone into the mud with it. More men on the field means more threads could die, but it also means more threads to fight.
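Twisted handles this with its own process and thread machinery. As a stdlib-only stand-in (a swapped-in technique, not Twisted's API), the sketch below runs a risky snippet in a child Python process, so a segfault or a hang kills the child instead of the server. `run_isolated` is a hypothetical helper, and spawning one process per call is for illustration only; a real setup would keep a long-lived worker around, which is the "persistent connection" idea.

```python
import subprocess
import sys


def run_isolated(code, timeout=5.0):
    """Run a snippet of Python in a child process and return (ok, output).

    A segfault or hang in the child costs one child process,
    not the server that launched it.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The child hung; subprocess.run kills it before re-raising.
        return False, "timed out"
    if proc.returncode != 0:
        # On POSIX a negative return code means the child was killed by a
        # signal, e.g. -11 for SIGSEGV.
        return False, f"child exited with code {proc.returncode}"
    return True, proc.stdout.strip()
```

For example, `run_isolated("import ctypes; ctypes.string_at(0)")` reads from a NULL pointer and typically segfaults the child, while the parent just gets back `(False, ...)` and keeps serving.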

Living in a world of synchronous calls has been a dangerous game. Ideally I would update things on the spot, but any time I depend on something being up, it is down. Storing data in a processing queue (such as a fancy XML feeds folder) is not premature optimization, just a smart way to never underestimate the power of the "retry" for transactions that come in from web requests. That's one thing the Java heads got right, and I hope it will be the nuclear bomb in my currently weak arsenal.
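Here is a minimal sketch of the retry idea, using an in-memory `deque` as a stand-in for the feeds folder: each entry carries its attempt count, failures go to the back of the line, and items that keep failing are set aside instead of retried forever. `drain` and its `handler` argument are hypothetical names, and `max_attempts=3` is an arbitrary default.

```python
from collections import deque


def drain(queue, handler, max_attempts=3):
    """Work through queued items, putting failures back on the queue.

    `queue` holds (item, attempts_so_far) pairs. Items that fail
    `max_attempts` times are set aside instead of being retried forever.
    """
    done, failed = [], []
    while queue:
        item, attempts = queue.popleft()
        try:
            handler(item)
        except Exception:
            attempts += 1
            if attempts < max_attempts:
                queue.append((item, attempts))  # retry later, behind the others
            else:
                failed.append(item)
        else:
            done.append(item)
    return done, failed
```

For example, `drain(deque([("feed-1.xml", 0)]), push_to_xmldb)` would try a hypothetical `push_to_xmldb` up to three times. A real queue would also persist to disk (like the feeds folder) and wait between retries; this one retries on the very next pass.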

Onwards and upwards I guess: XMLDB, prepare to behave.

Friday, June 13, 2008

My Abstraction Optimization

Here at Anus Health, you kinda get used to doing things multiple, unnecessary times: redoing the work of minions, retrying bad data calls, or explaining complicated migration procedures to pointy-haired bosses. The worst is writing code - write, rewrite with a better data model, reformat, add a tiny feature, clean - it never ends. With over 15,000 patient records filling over 13GB of data, today is the day I need to start thinking about performance and rewriting better, more scalable code...

Abstract Out, Then Optimize In
I tend to build software according to the adage "if it doesn't make sense yet, just build another layer of abstraction". You know what? It works. Put time up front to make really nice abstractions and even the dumbest of colleagues are knockin' out working code like monkeys eat lice. But there is a price for abstraction in the end, and that price tends to be performance. Every layer you add is spinning the disk better than Punjabi MC and eating more RAM than your mom. The classic example of this is my good friend Plone. It's amazing for quickly building out an app, but I'll be damned if, after 6 years of development experience, intense caching policies, and 16GB of RAM, I still can't get it running much faster than a hobo on a train track.

In my last post, I talked a lot of shit about not worrying about scalability. If you're lucky, you get to the point where you actually have to start worrying, because good caching and hardware are only gonna take you so far. The interesting thing is that after you spend all that time abstracting away from the nitty gritty, you now have to optimize back down through the layers to find efficiencies: build things up and then tear them down. This is exactly what Twitter is doing. Now that things are huge, start replacing those yummy developer abstractions with good old nasty C. Instead of using an ORM, write your reports directly against MySQL, then take it a notch further and read directly from disk. Optimization high five - whipish!

The thing that uber sucks about optimizing an abstracted system is that these optimizations are not easy to write. Current interfaces are entangled in kludges deeper than a whale's vagina. The little pieces of backwards compatibility from your 1034th version of the photo album picture are just barely holding up as it is. What is a developer to do? Have the Java weenies been right all along?!? Stay tuned...

Next time on Sincerely, Management:
  • "My Responsibility Response" - Things get wacky at Anus Health when Liz learns that Plone can't do everything, especially reports. Guest starring "IT guy".
  • "My Feed Lust" - After a losing battle with IP latency and unreliable 3rd party API responses, Liz shows instantaneous response the door. Not suitable for children under 25.
  • "My New XML Fetish" - Liz sends JSON & GET packing after they discover her steamy affair with XML, starting a dangerous journey away from friendly 3rd party developer API land.

Monday, May 19, 2008

Opportunity Knocking: SWF ISO QA Cowboy

Note: This post reflects only my desire to work with someone who doesn't suck, not the views of my employer. I have to say, though, that I have considerable pull, so get at me if you want to rock with yours truly every day to make some kick-ass software that gets used by real people at a well-funded start-up.

Here I am, lonely, and looking for a mate... my one and only work mate! I know that you are out there, and you can roll with the punches like the best of them. I know that you are unfazed by "emergencies" and have corrupted your own database a time or two only to breathe and recover gracefully (at least the second time). I know you aren't afraid of restarting a live service if timed right and that you truly understand that logs are there for more than just wasting disk cycles. You know when to yell at me for my coding sins and when to give me slack for a 95% pass day.

Save me from myself and my kludgey digressions. Are you out there?

MISSION:

Help us get our test on by setting up testing protocols, enforcing test suites, and doing anything else that makes for a sturdier code base and happier customers. We have regression/unit testing in place but are looking for someone to make sure the suites get run often, as well as wrangle together the rest of the testing picture, including pre- and post-deploy testing, bugs, new features, and performance.

A strong head and leadership qualities are a must: we are growing fast, so don't be surprised if you are expected to voice opinions and yell about the right way to do things now and then. You will be given any tools you need to maintain system integrity and enforce test policy. This is the perfect position for someone who is ready to step it up a notch and help lead a small company to success-ville.

For those that are interested and able, this position can easily and quickly turn into a developer position.

SYSTEM:

We work primarily with Python and the Plone framework but dabble in many arenas such as COM, XML-RPC, XMLDB, Adobe LiveCycle Workflow and Designer, etc., depending on what needs to be glued at the time. You will learn how all these pieces fit together into one beautiful system and will be expected to care for each piece as if you wrote it yourself.

REQUIREMENTS:
  • BS in Computer Science, or a portfolio proving equivalent proficiency
  • 1-2 years experience testing or building web applications, preferably with dynamic languages and kudos if that language is Python
  • Ability and excitement to learn new languages/technologies in a web based environment
  • Recent grads will be considered if internship experience is close - we will gladly feed and grow you into the position
  • Must know when to ask questions and when to ask Google
  • Self-sufficient and flexible
  • Excellent troubleshooting, analytical, documentation and communication skills
  • Must be dependable, have a positive attitude, and be a team player
Knowledge of any of the following is awesome:
  • Experience working with and/or developing in Plone
  • Familiarity with Test Driven Development and Agile Methodologies
  • History working with XML databases
  • Experience with Adobe PDF and/or LiveCycle Designer
COMPENSATION:

Competitive salary based on experience with all the startup stock yummies.

Monday, May 12, 2008

I Poop on Designing for Scalability

Dear John -

Is it OK that I answer your question with a blog post? It seems very web 2.0, and my whole goal in life is to be more web 2.0 than any other web 2.0 weenie out there. Plus, maybe there is a reader out there, and maybe they have a better response.

Unfortunately, I've done a bunch of scaling in my short time on this planet (especially with the notoriously performance-agnostic Plone), and I have to side with the camp that says you should rarely spend time scaling your architecture in the beginning. Here are some yummy reasons why you should wait to pop your scaling cherry:
  • your web app may never take off and you have wasted time scaling when you could have spent time writing a killer feature
  • if it does take off, pay someone else to worry about it. If you read anything about scaling Twitter, read about how they "fired" Blaine Cook and hired a bunch of scalability experts instead. Zing!
  • caching goes a loooong way. I like Squid myself - it's a sexy beast if a little hard to configure - and I hear Varnish is pretty top notch too. Hell, httpd has a nice accelerator built in if you need a quick fix. Let's take a moment and remember WHY caching works so well: it serves up content that an app server like RoR or Django could give two craps about, such as JavaScript, CSS, and images. Have you ever looked at how much time your browser spends loading this stuff? It's a lot, and cache servers know when to expire, refresh, reload HTTP headers, etc. Your cache server may be your users' browsers' best friend.
  • you can and should throw more hardware at the problem first. It's cheap enough these days, and it will buy you the time you need to develop a real scaling solution
  • Jared Spool gave a great talk at SXSW about actual performance versus perceived performance. The performance of your pages (i.e. load time) almost always comes second to perceived performance: the time it takes your user to complete a task. For example, amazon.com takes an unthinkable time to load pages on average, but it is commonly perceived as the fastest site out there because its 1-Click functionality gets things done.
  • IP latency is a factor. If you are integrating with any external site (who isn't these days?), chances are you will see more lag from waiting on its responses than from any kludgey code you can write
  • it will rarely clog where you think it will clog so take that pipe dream to the dump and let the bums sleep in it
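Caching static files mostly comes down to sending the right HTTP headers so the cache (and the browser) knows what it may keep. A minimal sketch of that decision, assuming static assets can be recognized by file extension; `cache_headers`, the extension list, and the one-week `max-age` are arbitrary example choices, not Squid or Varnish settings.

```python
import os

STATIC_MAX_AGE = 7 * 24 * 3600  # one week, an arbitrary example value
STATIC_EXTENSIONS = {".js", ".css", ".png", ".gif", ".jpg", ".jpeg", ".ico"}


def cache_headers(path):
    """Pick a Cache-Control header for a request path."""
    ext = os.path.splitext(path)[1].lower()
    if ext in STATIC_EXTENSIONS:
        # JavaScript, CSS, and images: let caches keep them for a long time.
        return {"Cache-Control": f"public, max-age={STATIC_MAX_AGE}"}
    # Dynamic pages: caches must revalidate with the app server before reuse.
    return {"Cache-Control": "no-cache"}
```

With headers like these, a cache in front of the app server answers repeat requests for `site.css` itself, and the app server only sees the dynamic pages.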
That being said, here are some things you can think about now if you are worried:
  • don't write stupid code. if you notice yourself writing 4 nested loops, think - hey, that could be nasty later on.
  • write good db queries, and for the love of god don't use your code to filter the results. In the same vein, don't access database variables more than needed - make a higher-level variable and reference that forever. Watch out for this in ORMs - they can be very inefficient for scaling even though they make code writing fast. But read bullet 1 above first.
  • know your language/tools. If you are using RoR or something else that is known to be slow, anticipate it being a problem later and plan on rewriting key parts in a language like C. Oh, your language doesn't have C wrappers? That could be a problem ...
  • think about concurrency while you are writing KNOWN expensive operations. Think: does this action rely on another action before going to the next step? For example, we have a process that needs to create an appointment in Exchange and then attach 2 files to the resulting appointment. Instead of writing a sequential piece of code like so: create appointment (2s) > attach file 1 (60s) > attach file 2 (120s) > report results (2s), totaling 184s, think of doing a threaded version: create appt (2s) > report results (2s) > kick off concurrent attach threads (120s), totaling 124s. Web peeps don't think about concurrency enough. Be different.
  • put a round robin queue in front of your servers that diverts traffic from dead parents, so you can hot patch stuff live if your code requires a restart for changes to take effect (i.e. compiled code). This has saved me a hundred times over and eliminates downtime while still allowing you to respond to heinous bugs.
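To make the database bullet concrete, here is a small sketch with Python's built-in `sqlite3` and a hypothetical `patients` table: the first function drags every row into Python and filters there, the second lets the database do the filtering.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with a hypothetical patients table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)")
    conn.execute("CREATE INDEX idx_patients_active ON patients (active)")
    conn.executemany(
        "INSERT INTO patients (name, active) VALUES (?, ?)",
        [("Ann", 1), ("Bob", 0), ("Cy", 1)],
    )
    return conn


def active_names_in_code(conn):
    # Anti-pattern: fetch every row, then throw most of them away in Python.
    rows = conn.execute("SELECT name, active FROM patients ORDER BY id")
    return [name for name, active in rows if active]


def active_names_in_sql(conn):
    # Better: the WHERE clause runs inside the database, which can use an index.
    rows = conn.execute("SELECT name FROM patients WHERE active = ? ORDER BY id", (1,))
    return [name for (name,) in rows]
```

Both return the same names for three rows; the difference shows up when the table holds 15,000 patients and only a handful match.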
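The appointment example can be sketched with `concurrent.futures`, using `time.sleep` scaled down 100x to stand in for the Exchange calls. `create_appointment`, `attach_file`, and `report` are hypothetical placeholders, not a real Exchange API. The threaded version follows the ordering in the bullet: report first, then run both attachments at once, so it takes 2 + 2 + max(60, 120) = 124 units instead of 184.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SCALE = 0.01  # one "second" from the example = 10 ms, so the demo runs quickly


def create_appointment():
    time.sleep(2 * SCALE)
    return "appt-1"


def attach_file(appt, name, seconds):
    time.sleep(seconds * SCALE)
    return f"{name} -> {appt}"


def report(appt):
    time.sleep(2 * SCALE)
    return f"created {appt}"


def sequential():
    appt = create_appointment()
    attach_file(appt, "file1", 60)
    attach_file(appt, "file2", 120)
    return report(appt)  # 2 + 60 + 120 + 2 = 184 units


def threaded():
    appt = create_appointment()
    message = report(appt)  # the user hears back after 2 + 2 = 4 units
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(attach_file, appt, "file1", 60),
            pool.submit(attach_file, appt, "file2", 120),
        ]
        attached = [f.result() for f in futures]
    return message, attached  # 2 + 2 + max(60, 120) = 124 units
```

In a real web request you would return the report to the user and let the attachment threads finish in the background; `threaded` waits for them here only so the results can be checked.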
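The round robin trick is usually the proxy's job, but the logic is small enough to sketch: hand out backends in turn, skip any marked down, and hot patch one backend at a time by marking it down, restarting it, and marking it up again. `RoundRobin` and the backend names are hypothetical.

```python
import itertools


class RoundRobin:
    """Hand out backends in turn, skipping any marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.down.add(backend)

    def mark_up(self, backend):
        self.down.discard(backend)

    def next_backend(self):
        # Try at most one full lap; if every backend is down, fail loudly.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend not in self.down:
                return backend
        raise RuntimeError("no live backends")
```

A live patch then looks like `rr.mark_down("app2")`, restart app2 with the new code, `rr.mark_up("app2")`, and repeat for the next backend; traffic keeps flowing to the others the whole time.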
Last but not least, don't forget Knuth's famous words of wisdom: "Premature optimization is the root of all evil!!!"

This data streams crap seems like it was written by an architecture astronaut. Yeah right! Who really writes code like that? (hint: no one).

Hth,

Liz

>>>
...
So, although I truly want to know how you're doing, I also have a question for ya. It stems from thinking long and hard how to go about building the architecture for my latest startup. I've read interesting posts such as Two data streams for a happy website (http://gojko.net/2008/03/03/two-data-streams-for-a-happy-website/) and Scalability (http://romeda.org/blog/2008/05/scalability.html) from Twitter's Blaine Cook, but it's hard to figure out where to go from here.

Basically... how much time do I spend planning how the architecture can scale before I know the metrics? I've read that I shouldn't worry about it until I'm there, while others say to keep 2 separate data streams (one that requires users to be logged in and one that doesn't). I expect to have to tweak the solution if we do reach limits, but the more I know before I start writing code the better.

Have you (or friends you know) hit this problem? What advice could you provide?

Thanks in advance!

Best,
John
>>>

Thursday, April 24, 2008

Press or say 3 if you're being robbed at gunpoint

Dear 911 Emergency System Manager Dude,

There is a man at the intersection of the 163 North and the 8 East, lying on the ground, with blood coming out of his head. There are 5 cars involved, maybe 6. It's 9:30 am, and there is a lot of traffic headed directly at the accident. Even though I know it's probably already been reported, I pick up my phone to report the crash and make sure bloody guy gets the ambulance he needs. This is the phone call I expected to make:

Me: 9-1-1
911: This is 911 please explain your emergency
Me: There was an accident on the 163 and the 8 and there is a man lying on the ground with blood everywhere that probably needs an ambulance
911: We'll make sure to send someone out there
Me: Thanks!

However, the emergency reporting system had "upgraded" since the last time I used it. Below is the phone call that actually happened. WARNING: extreme ellipses may follow.

Me: 9-1-1
busy signal
Me: ...huh...
Me: 9-1-1
busy signal
Me: 9-1-1
phone rings
Phone: Welcome to the California Highway Patrol
Me: Hi I'd like to report...
911: ... If you would like to report an emergency, please press or say 1 now
Me listens to the menu
911:
Me: ...
911: If you would like to report an emergency, please press or say 1 now
Me: ... huh?
911 hangs up

Terrible emergency system design just happened, and it may be killing people. From the moment I got the first busy signal (is it 1982 again?), this shiny new 911 reporting system/workflow/call-taker/whatever made it hard for me to complete a critical task and gave me motivation to write about the lessons learned from this short dialog.

A key concept in designing for obviousness is that all choices should be as easy as possible. I had already decided that this was an emergency by dialing 911, and yet they forced me to rethink my choice verbally or by touch tone. You know what, 911, you're right, this isn't an emergency - I was robbed at gunpoint and killed by the time I figured out what the hell was going on with your system. Note to everyone who makes things: reducing redundant choices is the key to accessibility for end users and a mark of an efficient system.

Deciding to use an automated system for a critical, human-centric task such as 911 is a huge mistake. People (read: I) hate automated telephone systems, and even though they have come a long way since their inception, they are notorious for long waits, incorrect routing, and frustrated end users. Perhaps they actually want to discourage people from calling 911. In my case, congratulations 911, you have succeeded.

After deciding to go with the automated answering system, someone decided that "talk to an operator" would not be an option in the main menu, let alone the default choice. Systems that ignore the usage patterns of their users will suffer, if not die, before the second server is even plugged in. Even if the new system is better in every way, it gets no traction if people can't use it based on some piece of previous knowledge. Every automated system has a skip-to-operator option - where is yours, 911? I can't find Ctrl + Alt + Del on my phone...

So whoever made the decision at 911 headquarters to switch from an easy-to-use, simple emergency system to an automated, confusing pile of crap: congratulations on your un-success! You're fired.

Now can I have my old 911 back?

Sincerely,
Management

Friday, April 11, 2008

No Frustration Setup Policy

While many designers spend countless hours thinking about the design and usability of a product's functionality, it seems that many neglect the experience of new-user setup and assembly. Here is my tale...

I recently purchased and camped with the T3 Quarter Dome tent by REI and it was the best new setup experience I have had with a product in many, many years. So good, in fact, that I decided to dedicate this post to it. I'm sure anyone who has camped has felt the frustration of arriving late to a site and trying to assemble a tent in the dark for the first time, fumbling with the user manual to find out which posts attach to which side of the tent, why something seems to be unnecessarily tight or loose, etc... Arg! It seems that the designers of the T3 have also felt this pain (dare I say listened to their users?) and provided the following anti-frustration features that are simple and just knocked my socks off:
  • Connected support "rods": support pieces are tied together with bungee cords in the middle, meaning that it almost instantaneously assembles in the right way as soon as you pull it out of the bag, and also means that you never lose one
  • Color-coded assembly: the top of the tent has lines in two colors, orange and silver, striping across it that correspond to the orange and silver colored rods. Since the rods almost assemble themselves, you can see immediately how they lay across the top of the tent. Color-coded tags show that the orange rod starts and ends at one stripe and the silver rod starts and ends at another. Dare I say idiot proof? I dare!
  • Two doors: the rain fly (as well as the tent) has a door on both sides, so you don't have to pick a side, which reduces the number of times you say "oh wait, it goes the other way"
I could go on far too long about the little things that make this tent *fun* to assemble but I think I can go right to the point: the best user manual is the product itself. Smart use of color to lead the user and simple tricks to prevent "oh wait.." moments can take a user experience from "meh" to "wow" even if the final product is the same.

As a counterexample, let me describe the most frustrating setup I've had in many years, which just so happened to occur the week before the best one. I purchased some Speedplay clip-in pedals for my road bike and attempted to mount them on my cleats. Here is a sampling of why it took two days to assemble correctly:
  • Bad print: the paper was neon orange and everything was written in 8pt font, with bold and underlines exploding on the page to the point of complete ineffectiveness
  • Incomplete instructions: additional instructions and warnings were littered in the box like an afterthought, the most important of which accidentally drifted under the table only to be discovered after the problem was solved. I know it's cheaper, but just reprint them!
  • Color: the only color coding turned out to be on things that were irrelevant to assembly. The most annoying parts were the screws, which differed by only 1mm and were almost impossible to tell apart, and the shims for different shoes, which were all the same gray with the label nicely etched in (you guessed it) gray
  • Language: assembling these shoes I said "what does that MEAN?!?" more times than I care to share. This instruction sheet definitely needed a definition list
After 2 frustrating days of wondering whether it was my poor instruction reading, new-pedal break-in, or the pedals just not living up to expectations, I finally got things working right. With both products, once things were put together correctly, the end products were awesome and I am happy with my purchase decisions. However, if someone asks me about my pedals, I'll recommend them with a disclaimer: "...but it was a bear to assemble, so watch out". You probably don't need to ask me what I think about the tent, because if you know me, you already know.

So from here on out, a new No Frustration policy for getting started with a product made by LizCo is active. Any questions can be directed to your closest level of management, and there is a reward for snitching on any product caught violating the new policy.

Now if I could just produce something...

Sincerely,
Management

Friday, April 4, 2008

Blog Policy

Dear Loyal Followers,

I know you have been dying to know - what the hell is up with Liz's blog? I can answer that in exactly 3 lines:

1. I had a Plone blog hosted here but now it's not
2. I never blogged that much in the 1st place
3. TextDrive got bought by Joyent and moved from Linux to Unix servers (yay), asking me to migrate my blog database (boo), and for the love of money I am just too lazy to migrate the old blog, so I'm using a hosted site (Blogger) that publishes to my awesome URL (eleddy.com) to continue to provide you with quality not-blogging. Run-on sentences are a new feature that you can come to expect from me this time around...

For your convenience, you will not have to update any URLs or RSS feeds that you never subscribed to in the first place. However, to pay for this feature I had to cut some corners, so humor is no longer free.

To those looking for the old Plone blog, since most of the content is on PloneWars.com I will not be republishing here unless someone asks politely. New content on Plone will continue to emerge but old content will remain hidden in the Google cache.

I hope you will find these changes amenable to your situation - please direct any requests and/or complaints to human resources aka me.

Sincerely,
Management

3 Click Bug Reporting Policy

Dear Companies that make web software,

As a developer, I know how important it is to get bug reports from your users out in "the field", especially for all you beta web n.0 sites. So, when I come across bugs, I will do my best to let you know what happened in a clear and concise way. However, if I can't find your bug reporting link in 3 clicks, it's your loss.

For example, I was on YouTube today looking for a preview of the "Short Circuit Remake" and when I searched, the title bar says there are 4 results but only 2 were displayed. Hmmm.... that's odd... I should report that. 4 clicks, 2 Help Center searches, and no email address later I gave up. Sorry GooTube, you're outta here!

Why 3 clicks you say? In reality it should be 1: scroll to the bottom of the page and click "Report a Bug". But I'm feeling generous today on the day I make this new policy so congratulations, you have me for 3 clicks.

Sincerely,

Management


UPDATE: southwest.com admits they have no email address to contact them with - you have to mail in your bug reports, along with any other requests, or talk to one of their service representatives. Yeah right!