Thursday, November 27, 2008

Improving MapReduce Performance in Heterogeneous Environments

This paper introduces a new scheduling algorithm, Longest Approximate Time to End (LATE), to improve MapReduce (specifically the Hadoop implementation) performance in heterogeneous environments. The motivation lies in the fact that the default MapReduce task scheduler assumes linear task progress and node homogeneity when deciding whether to re-execute a straggler, a node that is performing poorly for various reasons (e.g., faulty hardware). Since this "speculative execution" is crucial to MapReduce's high performance, the authors set out to make the scheduler speculate only on the tasks that will actually delay the job's completion, without wasting resources on unnecessary re-execution.
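
A minimal sketch of the LATE heuristic as I understand it (the task fields, the slow-task cutoff, and the fixed 25% fraction are stand-ins for Hadoop's internal bookkeeping and the paper's tunable thresholds):

```python
import time

def estimated_time_left(task, now):
    """LATE's core estimate: assume roughly linear progress and extrapolate."""
    elapsed = now - task.start_time
    progress_rate = task.progress_score / elapsed if elapsed > 0 else 0.0
    if progress_rate == 0.0:
        return float("inf")
    return (1.0 - task.progress_score) / progress_rate

def pick_speculative_task(running_tasks, slow_task_fraction=0.25):
    """Speculate on the task expected to finish furthest in the future,
    drawn only from the slowest fraction of currently running tasks."""
    now = time.time()
    by_rate = sorted(running_tasks,
                     key=lambda t: t.progress_score / max(now - t.start_time, 1e-9))
    cutoff = max(1, int(len(by_rate) * slow_task_fraction))
    return max(by_rate[:cutoff],
               key=lambda t: estimated_time_left(t, now), default=None)
```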

In short, the task scheduler should not simply identify slow tasks but, as with any scheduling algorithm, the ones that will hurt response time the most. Scheduling priority should be given to failed tasks, followed by non-running tasks, and finally tasks chosen for speculative execution. Moreover, speculative tasks should be scheduled on fast nodes. The ideas presented in this paper are all very intuitive and not too complicated - honestly, I'm surprised that MapReduce implementations didn't already use at least this level of sophistication in task scheduling. Overall, I definitely enjoyed this paper, as I have plenty of Hadoop experience; this is certainly a keeper for the syllabus.

A Policy-aware Switching Layer for Data Centers

This paper addresses the need for more control over middlebox traversal in today's data centers. Data centers use a variety of middleboxes to improve the performance and security of services, but existing networks provide only suboptimal means to coerce traffic through a specific sequence of middleboxes. For this reason, the authors propose a policy-aware switching layer (PLayer) for data centers that is flexible, efficient, and resilient to adverse network conditions. Essentially, the solution is to wrap another layer of headers around each Ethernet frame so that policy-aware switches (pswitches) can track its progress through the policy (the intended series of middleboxes).
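
A toy sketch of the forwarding decision a pswitch makes, as I understand it (the policy table, its keys, and the middlebox names are purely illustrative, not PLayer's actual policy format):

```python
# Hypothetical policy table: traffic class (here, just the destination port) ->
# the ordered middlebox sequence it must traverse. Real policies match on
# richer frame fields; this is only for illustration.
POLICIES = {
    80: ["firewall", "load_balancer"],   # web traffic
    25: ["firewall"],                    # mail traffic
}

def next_hop(dst_port, previous_hop=None):
    """Given the frame's traffic class and the middlebox it just came from,
    pick the next middlebox in the policy, or the final server once the
    sequence is exhausted."""
    sequence = POLICIES.get(dst_port)
    if not sequence:
        return "server"                      # no policy: deliver directly
    if previous_hop not in sequence:
        return sequence[0]                   # frame is just entering the chain
    idx = sequence.index(previous_hop)
    return sequence[idx + 1] if idx + 1 < len(sequence) else "server"

# e.g., a web frame arriving from the firewall goes to the load balancer next
assert next_hop(80, "firewall") == "load_balancer"
```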

In many ways, the ideas here are similar to those of the earlier middlebox paper, which advocated a smoother integration of middleboxes into the Internet backbone by adding new identifiers. However, the solution proposed here appears to integrate rather seamlessly and with less overhead. The applications are also very obviously relevant (unlike the other middlebox paper, which didn't have a very convincing motivation in my opinion) and the concepts are generally explained clearly. I didn't completely understand the use of MAC addresses for pswitch routing, though.

Scalable Application Layer Multicast

This paper presents an overlay-style protocol for scalable, application-layer multicast delivery. The primary focus is on developing a hierarchical node organization that minimizes control overhead. Even though the paper focuses on low-bandwidth applications, the authors claim that applications with different requirements can be handled by varying the overlay paths and metrics. The protocol employs a tree structure to avoid loops and data duplication, and the mechanisms by which nodes join it are explained carefully. However, this tree model has several clusters at each layer, and that may present problems - nodes at the top are overloaded compared to those at the bottom, which effectively lose bandwidth in one direction.
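
A back-of-the-envelope sketch of why the hierarchy stays shallow but loads the top unevenly (the cluster-size bound of k to 3k-1 is what I recall from the paper; the overhead formula below is my own rough approximation, not the authors'):

```python
import math

def hierarchy_depth(num_members, k=3):
    """With clusters of size between k and 3k-1, each layer shrinks membership
    by roughly a factor of k, so there are on the order of log_k(N) layers."""
    return max(1, math.ceil(math.log(num_members, k)))

def top_node_control_overhead(num_members, k=3):
    """A node near the top belongs to a cluster on every layer it appears in,
    so its control traffic grows with k * depth, while a leaf only talks to
    its single bottom-layer cluster."""
    return (3 * k - 1) * hierarchy_depth(num_members, k)

# e.g., 10,000 members with k = 3: ~9 layers, ~72 control peers for the topmost node
print(hierarchy_depth(10_000), top_node_control_overhead(10_000))
```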

The cluster-based model may get complicated in a real network, especially merging and growing the tree. To an extent, this paper presents general approaches towards application-layer multicast rather than a specific protocol that can be readily implemented. It's more interesting than the previous one but once again, it may not really belong in the syllabus. The previous paper provides a better basic exposure to the requirements for multicast delivery.

A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing

This paper describes SRM, a framework designed to support scalable, reliable multicast delivery. The authors make the important point that several integral TCP parameters lose meaning in a multicast environment; for example, the round-trip time used for retransmission timers and the delay-bandwidth product are difficult to pinpoint when receivers differ by several orders of magnitude. For this reason, receiver-based loss recovery is preferred to the traditional sender-based, fate-sharing approach. The SRM framework builds upon application-level framing (ALF) as well as the Light-Weight Sessions (LWS) model that was designed for large-scale conferencing applications. SRM includes some interesting features, such as localized retransmissions that avoid spamming the whole multicast group. However, I found some of the detailed discussion of the simulations somewhat tedious. This was an interesting paper but not necessarily essential to the syllabus.
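
A small sketch of the loss-recovery idea I found most interesting: repair requests are delayed by a random, distance-scaled timer so that a nearby receiver usually asks first and everyone else suppresses their duplicate request (the C1/C2 defaults here are placeholders, not the paper's recommended settings):

```python
import random

def schedule_repair_request(distance_to_source, c1=2.0, c2=2.0):
    """When a receiver detects a loss, it waits a random time, scaled by its
    estimated distance to the data's source, before multicasting a repair
    request. Nearby receivers tend to fire first; the rest hear that request
    and cancel their own, avoiding an implosion of duplicate NACKs."""
    return random.uniform(c1 * distance_to_source,
                          (c1 + c2) * distance_to_source)

def on_request_heard(pending_requests, missing_data_id):
    """Suppression: if someone else already asked for the same data, drop our
    own pending request instead of adding to the storm."""
    pending_requests.pop(missing_data_id, None)
```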

Monday, November 17, 2008

A Delay-Tolerant Network Architecture for Challenged Internets

This paper presents the DTN architecture, designed for networks with exceptionally poor or unpredictable performance characteristics. The existing TCP/IP service model makes basic assumptions about the existence of end-to-end paths between any two nodes, as well as upper bounds on round-trip times and packet drop probabilities. The growing importance of networks that violate these assumptions - for example, sensor networks or military ad-hoc networks - is the motivation behind the DTN architecture. Rather than adopting band-aid solutions such as link-repair approaches or proxies, DTN uses a store-and-forward messaging scheme based on a partially connected network graph model.
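
A toy sketch of the store-and-forward idea (this omits routing, custody acknowledgements, and message expiry, and the class and method names are mine, not the architecture's terminology):

```python
from collections import deque

class DtnNode:
    """Messages (bundles) wait in persistent storage until a contact - a
    possibly brief, scheduled, or opportunistic link to some next hop -
    becomes available, rather than assuming an end-to-end path exists at
    send time."""

    def __init__(self, name):
        self.name = name
        self.storage = deque()          # stands in for persistent storage

    def accept_bundle(self, bundle):
        self.storage.append(bundle)     # take responsibility for the message

    def contact_up(self, next_hop):
        """A link to next_hop is available: forward everything we are holding."""
        while self.storage:
            next_hop.accept_bundle(self.storage.popleft())
```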

Unlike the majority of papers covered in this course (apart from the really early ones), this one challenged some fundamental assumptions of the Internet. Given that deep-space and military communication, to name a couple of examples, will remain essential for years to come, the motivation behind a DTN-like architecture is very convincing. It's compatible with the current TCP/IP stack, appears to be a practical solution at first glance, and definitely merits further study. This one definitely belongs in the syllabus as a bit of a wild-card paper.

An Architecture for Internet Data Transfer

This paper introduces an extensible data-oriented transfer (DOT) service that decouples application-specific content negotiation from generic data transfers. One motivation behind the DOT service is to optimize bandwidth usage by minimizing control overhead (e.g., from HTTP or SMTP messages). Moreover, this modular architecture not only eliminates the need to re-implement data transfer for each new service but also enables more focused efforts to improve it. The authors acknowledge that since the DOT service is essentially designed for bulk transfers, services that demand low round-trip latency (e.g., CDNs or Google web services) or those dominated by small transfers are an inherently poor fit. The solution offered here is to use traditional delivery mechanisms in such cases rather than compromise the advantages of DOT.
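
A minimal sketch of the decoupling as I understand it (the in-memory object store and function names are stand-ins for DOT's transfer service and its plugins; using a content hash as the object identifier is my assumption for illustration):

```python
import hashlib

# Stand-in for the transfer service's object store; a real transfer service
# would move chunks over the network, possibly from multiple sources or caches.
OBJECT_STORE = {}

def put_object(data: bytes) -> str:
    """The sender hands bulk data to the transfer service and gets back an
    object identifier to place in the small application-level message
    (e.g., the SMTP or HTTP exchange) instead of the data itself."""
    oid = hashlib.sha1(data).hexdigest()
    OBJECT_STORE[oid] = data
    return oid

def get_object(oid: str) -> bytes:
    """The receiver resolves the identifier through the transfer service,
    which is free to choose the transfer mechanism and reuse cached copies."""
    return OBJECT_STORE[oid]

# e.g., an email attachment: the mail protocol carries only the oid
oid = put_object(b"big attachment ...")
assert get_object(oid) == b"big attachment ..."
```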

The concept proposed in this paper is already used in sectors of the Internet today; for example, real-time media can use a TCP-based control plane and RTP over UDP for bulk packet delivery. Even though several complications arise with the large-scale deployment of a DOT-like service (many of them addressed in the paper), it nevertheless seems worthy of further exploration. The idea is not too far-fetched and is somewhat feasible, even if not completely practical for the Internet.

Wednesday, November 12, 2008

X-Trace: A Pervasive Network Tracing Framework

This paper introduces X-Trace, a unified network tracing framework that spans several protocol layers (as opposed to tcpdump, for example) and applications. The concept behind X-Trace debugging is essentially to carry X-Trace metadata in packets. Metadata received at one layer of a node is propagated to the next node at that layer (and down to lower layers), and related packets (such as a DNS lookup and its response) all carry metadata so that they are effectively linked together into traces. Finally, a reporting infrastructure (with a website, etc.) is presented to ensure visibility of network trace data to all relevant parties.
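
A toy sketch of how I picture the metadata propagation (class, field, and method names are mine, not X-Trace's actual API):

```python
import random

def new_op_id():
    return random.getrandbits(32)

class TraceMetadata:
    """Carried in-band with each message: a task identifier shared by everything
    in one trace, plus the id of the operation that caused this one, so an
    offline tool can rebuild the task tree from the collected reports."""

    def __init__(self, task_id, parent_op_id=None):
        self.task_id = task_id
        self.op_id = new_op_id()
        self.parent_op_id = parent_op_id

    def propagate(self):
        """Used both when handing a request to the next hop at the same layer
        and when pushing it down to a lower layer (e.g., HTTP -> TCP)."""
        return TraceMetadata(self.task_id, self.op_id)

    def report(self, where):
        # In the real system this would go to the out-of-band reporting
        # infrastructure; printing stands in for that here.
        print(f"task={self.task_id} op={self.op_id} "
              f"parent={self.parent_op_id} at {where}")
```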

This is a neat idea that could potentially be quite useful for today's large-scale distributed systems. That said, I'm not certain whether the packet overhead due to X-Trace is justified by its benefits. Some of the applications that are mentioned (tracing messages, ISP troubleshooting, etc.) make a lot of sense and this is definitely a concept worthy of additional investigation. The paper goes well with Paxson's study on packet dynamics and definitely belongs in the syllabus.