Thursday, November 27, 2008

Improving MapReduce Performance in Heterogeneous Environments

This paper introduces a new scheduling algorithm to optimize MapReduce (specifically the Hadoop implementation) performance in heterogeneous conditions: Longest Approximate Time to End (LATE). The motivation here lies in the fact that the MapReduce task scheduler assumes linear progress and node homogeneity to make re-execution decisions in case of a straggler, a node that is performing poorly for various reasons (e.g., faulty hardware). Since this "speculative execution" is crucial to MapReduce's high performance, the authors set out to optimize the scheduler to finish the fastest tasks first without wasting unnecessary resources.

In short, task scheduling does not simply identify slow tasks but those that will hurt the average response time the most, as with any scheduling algorithm. Scheduling priority should be given to failed tasks, followed by non-running tasks and finally tasks for speculative execution. Moreover, speculative tasks should be scheduled on fast nodes. The ideas presented in this paper are all very intuitive and not too complicated - honestly, I'm surprised that MapReduce implementations didn't already use at least this level of sophistication in task scheduling. Overall, I definitely enjoyed this paper as I have plenty of Hadoop experience; this is certainly a keeper for the syllabus.

No comments: