too much configuration, or: you shouldn’t have to tweak parameters

Whenever I encounter a paper like this – Towards automatic optimization of MapReduce programs – and there are a lot of them, I find myself sighing inwardly. (Heck, we even had a student of ours who ended up tweaking a bunch of these knobs.)

This seems to be a common refrain in Java programs, but Hadoop especially – rather than either choosing a sensible constant, or adapting a value at runtime, let’s foist all of the work onto the user. But the way it’s phrased is clever – we’re not avoiding the decision, we’re just making it so the user can configure things however they want. I’ve done this a lot myself – it’s just so easy to add a flag to your command line or to your config file and pride yourself on a job well (not) done.

The key issue here is that, as a user, I don’t know what to put in for these values, I don’t know what’s important to change, and so I’m the absolute worst person to be responsible for these things.

Seriously, why are you giving the user these parameters to tweak?

  • io.sort.record.percent
  • io.sort.factor
  • mapreduce.job.split.metainfo.maxsize

What inevitably happens is we don’t know what any of these things actually mean when it comes to making things faster, so we end up searching the internet for the magic numbers to plug in, rerunning our jobs a whole bunch and wasting a crap-load of time.

This is not a desirable user experience. I mean, here’s the interface a car exposes to me:

There’s a “go faster” pedal and a “go slower” pedal. These correspond to all sorts of complicated, dynamic magic inside of the engine compartment, but I don’t need to know about them – the system handles it for me. Moreover, it can adjust parameters at runtime, in response to the behavior of the car – unlike most of our lazy computer programs!
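To make "adjust parameters at runtime" a little more concrete, here's a rough sketch of the idea in Python. It is not how Hadoop does anything; `SelfTuningBatcher` and everything inside it are hypothetical names I made up for illustration. Instead of asking the user for a batch size, it measures throughput after each batch and hill-climbs: keep growing (or shrinking) while things get faster, and turn around when they get slower.

```python
import time
from typing import Callable, List


class SelfTuningBatcher:
    """Chooses its own batch size from measured throughput.

    Hypothetical illustration only; not part of Hadoop or any real library.
    """

    def __init__(self, process: Callable[[List[int]], None],
                 initial: int = 64, minimum: int = 1, maximum: int = 1 << 20,
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self.process = process
        self.batch_size = initial
        self.minimum = minimum
        self.maximum = maximum
        self.clock = clock
        self._direction = 2.0   # factor applied to batch_size after each batch
        self._last_rate = 0.0   # items/second measured for the previous batch

    def run(self, items: List[int]) -> None:
        pos = 0
        while pos < len(items):
            batch = items[pos:pos + self.batch_size]
            start = self.clock()
            self.process(batch)
            elapsed = max(self.clock() - start, 1e-9)  # avoid division by zero
            self._adapt(len(batch) / elapsed)
            pos += len(batch)

    def _adapt(self, rate: float) -> None:
        # Hill climbing: keep moving the same way while throughput improves,
        # reverse direction as soon as it gets worse.
        if rate < self._last_rate:
            self._direction = 1.0 / self._direction
        self._last_rate = rate
        new_size = int(self.batch_size * self._direction)
        self.batch_size = max(self.minimum, min(self.maximum, new_size))
```

You'd use it as `SelfTuningBatcher(my_process_fn).run(items)`, where `my_process_fn` is whatever handles one batch. The controller is deliberately crude (it oscillates around the sweet spot rather than settling on it, and noisy timings will confuse it), but the point stands: the program has the measurements, the user doesn't, so the program is the one that should be picking the number.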

If only our programs could be more like cars (though hopefully with better gas mileage).
