SIGCOMM 2008

AUGUST 17-22
SEATTLE, WA, USA

Routing
Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises
Changhoon Kim (Princeton University); Matthew Caesar (Princeton University); Jennifer Rexford (Princeton University)

IP networks today require massive effort to configure and manage. Ethernet is vastly simpler to manage, but does not scale beyond small local area networks. This paper describes an alternative network architecture called SEATTLE that achieves the best of both worlds: the scalability of IP combined with the simplicity of Ethernet. SEATTLE provides plug-and-play functionality via flat addressing, while ensuring scalability and efficiency through shortest-path routing and hash-based resolution of host information. In contrast to previous work on identity-based routing, SEATTLE ensures path predictability and stability, and simplifies network management. We performed a simulation study driven by real-world traffic traces and network topologies, and used Emulab to evaluate a prototype of our design based on the Click and XORP open-source routing platforms. Our experiments show that SEATTLE efficiently handles network failures and host mobility, while reducing control overhead and state requirements by roughly two orders of magnitude compared with Ethernet bridging.
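
To make the hash-based resolution concrete, here is a minimal consistent-hashing sketch in Python. The ring construction, SHA-1 choice, and switch/MAC names are illustrative assumptions, not SEATTLE's actual implementation.

```python
import hashlib
from bisect import bisect_right

def _h(value: str) -> int:
    """Map a string to a point on a 160-bit hash ring."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

class ConsistentHashDirectory:
    """Toy directory: a host's location is stored at the switch whose hash
    is the successor of hash(host MAC) on the ring, so every switch can
    compute the resolver without flooding or a central server."""

    def __init__(self, switch_ids):
        self.ring = sorted((_h(s), s) for s in switch_ids)

    def resolver_for(self, host_mac: str) -> str:
        points = [p for p, _ in self.ring]
        idx = bisect_right(points, _h(host_mac)) % len(self.ring)
        return self.ring[idx][1]

# The ingress switch publishes (MAC -> its own ID) at the resolver switch;
# any other switch recomputes the same hash to look the host up.
directory = ConsistentHashDirectory(["sw1", "sw2", "sw3", "sw4"])
print(directory.resolver_for("00:1a:2b:3c:4d:5e"))
```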

XL: An Efficient Network Routing Algorithm
Kirill Levchenko (UC San Diego); Geoffrey M. Voelker (UC San Diego); Ramamohan Paturi (UC San Diego); Stefan Savage (UC San Diego)

In this paper, we present a new link-state routing algorithm called Approximate Link state (XL) aimed at increasing routing efficiency by suppressing updates from parts of the network. We prove that three simple criteria for update propagation are sufficient to guarantee soundness, completeness and bounded optimality for any such algorithm. We show, via simulation, that XL significantly outperforms standard link-state and distance vector algorithms – in some cases reducing overhead by more than an order of magnitude – while having negligible impact on path length. Finally, we argue that existing link-state protocols, such as OSPF, can incorporate XL routing in a backwards compatible and incrementally deployable fashion.
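As a rough illustration of update suppression, the sketch below forwards a link-state update only when hiding it could mislead neighbors. The predicate is a hypothetical paraphrase for intuition, not XL's actual three criteria.

```python
def should_propagate(old_cost, new_cost, on_spt, best_alt_cost, stretch=1.2):
    """Hypothetical suppression predicate (a paraphrase, not XL's exact
    criteria): forward an update only when withholding it could make a
    neighbor's route unsound or stretch it beyond the configured bound."""
    if on_spt and new_cost > old_cost:
        return True   # a link we currently route over got worse: must report
    if new_cost < old_cost and new_cost * stretch < best_alt_cost:
        return True   # improvement large enough that neighbors should switch
    return False      # otherwise suppress and absorb the change locally

# A small cost jitter on an unused link is suppressed:
print(should_propagate(10, 11, on_spt=False, best_alt_cost=10))  # False
```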

Path Splicing
Murtaza Motiwala (Georgia Tech); Megan Elmore (Georgia Tech); Nick Feamster (Georgia Tech); Santosh Vempala (Georgia Tech)

We present path splicing, a new routing primitive that allows network paths to be constructed by combining multiple routing trees ("slices") to each destination over a single network topology. Path splicing allows traffic to switch trees at any hop en route to the destination. End systems can change the path on which traffic is forwarded by changing a small number of additional bits in the packet header. We evaluate path splicing for intradomain routing using slices generated from perturbed link weights and find that splicing achieves reliability that approaches the best possible using a small number of slices, for only a small increase in latency and no adverse effects on traffic in the network. In the case of interdomain routing, where splicing derives multiple trees from edges in alternate backup routes, path splicing achieves near-optimal reliability and can provide significant benefits even when only a fraction of ASes deploy it. We also describe several other applications of path splicing, as well as various possible deployment paths.
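A toy rendering of spliced forwarding: each slice is a separate routing tree toward the destination, and splicing bits carried in the packet header pick the slice at each hop. The topology and next-hop tables below are invented for illustration.

```python
class SplicedForwarder:
    """Toy spliced forwarding: 'slices' maps slice index -> {router: next_hop}
    toward one destination; the packet header carries a list of slice
    indices, consumed one per hop, letting end systems redirect traffic
    mid-path without any control-plane involvement."""

    def __init__(self, slices):
        self.slices = slices

    def route(self, src, dst, splicing_bits):
        path, node = [src], src
        for s in splicing_bits:          # one slice choice per hop
            if node == dst:
                break
            node = self.slices[s][node]
            path.append(node)
        return path

# Two slices built from perturbed link weights (hypothetical next hops):
slices = {0: {"A": "B", "B": "D"}, 1: {"A": "C", "C": "D", "B": "C"}}
print(SplicedForwarder(slices).route("A", "D", [0, 1, 1]))  # A->B->C->D
```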

Shedding Light on the Glue Logic of Internet Routing Architecture
Franck Le (CMU); Geoffrey G. Xie (Naval Postgraduate School); Dan Pei (AT&T Labs - Research); Jia Wang (AT&T Labs - Research); Hui Zhang (CMU)

Recent studies reveal that the routing structures of operational networks are much more complex than a simple BGP/IGP hierarchy, highlighted by the presence of many distinct instances of routing protocols. However, the glue (how routing protocol instances interact and exchange routes among themselves) is still little understood or studied. For example, although Route Redistribution (RR), the implementation of the glue in router software, has been used in the Internet for more than a decade, it was only recently shown that RR is extremely vulnerable to anomalies similar to the permanent route oscillations in BGP. This paper takes an important step toward understanding how RR is used and how fundamental a role RR plays in practice. We developed a complete model and associated tools for characterizing interconnections between routing instances based on analysis of router configuration data. We analyzed and characterized the RR usage in more than 1600 operational networks. The findings are: (i) RR is indeed widely used; (ii) operators use RR to achieve important design objectives not realizable with existing routing protocols alone; (iii) RR configurations can be very diverse and complex. These empirical discoveries not only confirm that the RR glue constitutes a critical component of the current Internet routing architecture, but also emphasize the urgent need for more research to improve its safety and flexibility to support important design objectives.

Data Center Networking
A Policy-aware Switching Layer for Data Centers
Dilip A. Joseph (UC Berkeley); Arsalan Tavakoli (UC Berkeley); Ion Stoica (UC Berkeley)

Data centers deploy a variety of middleboxes (e.g., firewalls, load balancers and SSL offloaders) to protect, manage and improve the performance of applications and services they run. Since existing networks provide limited support for middleboxes, administrators typically overload path selection mechanisms to coerce traffic through the desired sequences of middleboxes placed on the network path. These ad-hoc practices result in a data center network that is hard to configure and maintain, wastes middlebox resources, and cannot guarantee middlebox traversal under network churn.

To address these issues, we propose the policy-aware switching layer or PLayer, a new layer-2 for data centers consisting of inter-connected policy-aware switches or pswitches. Unmodified middleboxes are placed off the network path by plugging them into pswitches. Based on policies specified by administrators, pswitches explicitly forward different types of traffic through different sequences of middleboxes. Experiments using our prototype software pswitches suggest that the PLayer is flexible, uses middleboxes efficiently, and guarantees correct middlebox traversal under churn.
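A sketch of the pswitch forwarding decision, with invented policy and middlebox names: classify the frame, then send it to the next middlebox in its policy's sequence that it has not yet traversed.

```python
# Hypothetical policy table: (frame classifier) -> required middlebox sequence.
POLICIES = [
    (lambda f: f["dst_port"] == 80,  ["firewall", "load_balancer"]),
    (lambda f: True,                 ["firewall"]),          # default policy
]

def next_hop(frame, traversed):
    """Toy pswitch logic: forward the frame to the first middlebox in its
    policy's sequence that it has not yet traversed, else to the server."""
    for matches, sequence in POLICIES:
        if matches(frame):
            for mbox in sequence:
                if mbox not in traversed:
                    return mbox
            return "server"
    return "server"

print(next_hop({"dst_port": 80}, traversed=set()))           # firewall
print(next_hop({"dst_port": 80}, traversed={"firewall"}))    # load_balancer
```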

A Scalable, Commodity Data Center Network Architecture
Mohammad Al-Fares (UC San Diego); Alexander Loukissas (UC San Diego); Amin Vahdat (UC San Diego)

Today's data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Non-uniform bandwidth among data center nodes complicates application design and limits overall system performance.

In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.
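The interconnect is a fat tree built from identical k-port switches; the standard k-ary fat-tree arithmetic below shows how 48-port commodity switches already reach the tens of thousands of hosts the paper targets.

```python
def fat_tree_size(k: int):
    """Sizing for a k-ary fat tree built from identical k-port switches
    (standard fat-tree parameterization): k pods, each with k/2 edge and
    k/2 aggregation switches, plus (k/2)^2 core switches."""
    assert k % 2 == 0, "port count must be even"
    return {
        "pods": k,
        "edge_switches": k * (k // 2),
        "aggregation_switches": k * (k // 2),
        "core_switches": (k // 2) ** 2,
        "hosts_at_full_bisection": (k ** 3) // 4,
    }

# 48-port commodity switches support 27,648 hosts at full bisection bandwidth:
print(fat_tree_size(48)["hosts_at_full_bisection"])  # 27648
```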

DCell: A Scalable and Fault-Tolerant Network Structure for Data Centers
Chuanxiong Guo (Microsoft Research Asia); Haitao Wu (Microsoft Research Asia); Kun Tan (Microsoft Research Asia); Lei Shi (Tsinghua University); Yongguang Zhang (Microsoft Research Asia); Songwu Lu (UCLA)

A fundamental challenge in data center networking is how to efficiently interconnect an exponentially increasing number of servers. This paper presents DCell, a novel network structure that has many desirable features for data center networking. DCell is a recursively defined structure, in which a high-level DCell is constructed from many low-level DCells and DCells at the same level are fully connected with one another. DCell scales doubly exponentially as the node degree increases. DCell is fault tolerant since it does not have a single point of failure, and its distributed fault-tolerant routing protocol performs near shortest-path routing even in the presence of severe link or node failures. DCell also provides higher network capacity than the traditional tree-based structure for various types of services. Furthermore, DCell can be incrementally expanded and a partial DCell provides the same appealing features. Results from theoretical analysis, simulations, and experiments show that DCell is a viable interconnection structure for data centers.
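The recursive construction implies a simple server count: a level-k DCell is built from t_{k-1}+1 fully connected level-(k-1) DCells, giving t_k = t_{k-1}(t_{k-1}+1). A few lines of Python make the doubly exponential growth visible.

```python
def dcell_servers(n: int, k: int) -> int:
    """Servers in a level-k DCell built from n-server DCell_0s: a DCell_k
    consists of t_{k-1} + 1 fully connected DCell_{k-1}s, so
    t_k = t_{k-1} * (t_{k-1} + 1), doubly exponential in k."""
    t = n
    for _ in range(k):
        t = t * (t + 1)
    return t

print(dcell_servers(4, 2))   # 420
print(dcell_servers(6, 3))   # 3263442 servers from 6-server DCell_0s
```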

Management
What's Going On? Learning Communication Rules In Edge Networks
Srikanth Kandula (MIT); Ranveer Chandra (Microsoft Research); Dina Katabi (MIT)

Existing traffic analysis tools focus on traffic volume. They identify the heavy-hitters, the flows that exchange high volumes of data, yet fail to identify the structure implicit in network traffic: do certain flows happen before, after, or along with each other repeatedly over time? Since most traffic is generated by applications (web browsing, email, p2p), network traffic tends to be governed by a set of underlying rules. Malicious traffic such as network-wide scans for vulnerable hosts (mySQLbot) also presents distinct patterns.

We present eXpose, a technique to learn the underlying rules that govern communication over a network. From packet timing information, eXpose learns rules for network communication that may be spread across multiple hosts, protocols or applications. Our key contribution is a novel statistical rule mining technique to extract significant communication patterns in a packet trace without explicitly being told what to look for. Going beyond rules involving flow pairs, eXpose introduces templates to systematically abstract away parts of flows thereby capturing rules that are otherwise unidentifiable. Deployments within our lab and within a large enterprise show that eXpose discovers rules that help with network monitoring, diagnosis, and intrusion detection with few false positives.
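For a flavor of timing-based rule mining, here is a much-simplified sketch: bucket flow activity into time windows and report flow pairs that frequently co-occur. eXpose itself scores rules with a statistical significance measure and generalizes them with templates; plain conditional probability and the parameter names below are used only for illustration.

```python
from collections import defaultdict

def mine_rules(events, window=1.0, min_conf=0.8, min_support=5):
    """Toy co-occurrence miner: events is a list of (timestamp, flow_id).
    Report pairs (A -> B) where B is active in most windows where A is."""
    buckets = defaultdict(set)
    for t, flow in events:
        buckets[int(t / window)].add(flow)
    count, joint = defaultdict(int), defaultdict(int)
    for flows in buckets.values():
        for a in flows:
            count[a] += 1
            for b in flows:
                if a != b:
                    joint[(a, b)] += 1
    return [(a, b, joint[(a, b)] / count[a])
            for (a, b) in joint
            if count[a] >= min_support and joint[(a, b)] / count[a] >= min_conf]

# e.g., DNS lookups repeatedly followed by HTTP connections in the same window
# would surface as a high-confidence (dns_flow -> http_flow) rule.
```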

Answering What-If Deployment and Configuration Questions with WISE
Mukarram Tariq (Georgia Tech); Amgad Zeitoun (Google); Vytautas Valancius (Georgia Tech); Nick Feamster (Georgia Tech); Mostafa Ammar (Georgia Tech)

Designers of content distribution networks often need to determine how changes to infrastructure deployment and configuration affect service response times when they deploy a new data center, change ISP peering, or change the mapping of clients to servers. Today, the designers use coarse, back-of-the-envelope calculations, or costly field deployments; they need better ways to evaluate the effects of such hypothetical "what-if" questions before the actual deployments. This paper presents What-If Scenario Evaluator (WISE), a tool that predicts the effects of possible configuration and deployment changes in content distribution networks. WISE makes three contributions: (1) an algorithm that uses traces from existing deployments to learn causality among factors that affect service response-time distributions; (2) an algorithm that uses the learned causal structure to estimate a dataset that is representative of the hypothetical scenario that a designer may wish to evaluate, and uses these datasets to predict future response-time distributions; (3) a scenario specification language that allows a network designer to easily express hypothetical deployment scenarios without being cognizant of the dependencies between variables that affect service response times. Our evaluation, both in a controlled setting and in a real-world field deployment at a large, global CDN, shows that WISE can quickly and accurately predict service response-time distributions for many practical What-If scenarios.

Shadow Configuration as a Network Management Primitive
Richard Alimi (Yale University); Ye Wang (Yale University); Y. Richard Yang (Yale University)

Configurations for today's IP networks are becoming increasingly complex. As a result, configuration management is becoming a major cost factor for network providers and configuration errors are becoming a major cause of network disruptions. In this paper, we present and evaluate the novel idea of shadow configurations. Shadow configurations allow configuration evaluation before deployment and thus can reduce potential network disruptions. We demonstrate, using a real implementation, that shadow configurations can be implemented with low overhead.

Network exception handlers: Host-network control in enterprise networks
Thomas Karagiannis (Microsoft Research); Richard Mortier (Vipadia Ltd); Antony Rowstron (Microsoft Research)

Enterprise network architecture and management have followed the Internet's design principles despite different requirements and characteristics: enterprise hosts are administered by a single authority, which intrinsically assigns different values to traffic from different business applications.

We advocate a new approach where hosts are no longer relegated to the network's periphery, but actively participate in network-related decisions. To enable host participation, network information, such as dynamic network topology and per-link characteristics and costs, is exposed to the hosts, and network administrators specify conditions on the propagated network information that trigger actions to be performed while a condition holds. The combination of a condition and its actions embodies the concept of the network exception handler, defined analogously to a program exception handler. Conceptually, network exception handlers execute on hosts with actions parameterized by network and host state.

Network exception handlers allow hosts to participate in network management, traffic engineering and other operational decisions by explicitly controlling host traffic under predefined conditions. This flexibility improves overall performance by allowing efficient use of network resources. We outline several sample network exception handlers, present an architecture to support them, and evaluate them using data collected from our own enterprise network.
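A minimal sketch of the primitive, with invented condition and action names: a handler pairs a predicate over propagated network state with an action executed on the host while the predicate holds.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class NetworkExceptionHandler:
    """A condition over propagated network state plus an action to run on
    the host while the condition holds (field names are illustrative)."""
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]

    def evaluate(self, network_state: dict, host_state: dict):
        if self.condition(network_state):
            self.action({**network_state, **host_state})

# Hypothetical example: throttle a backup app while a WAN link is congested.
handler = NetworkExceptionHandler(
    condition=lambda net: net.get("wan_link_utilization", 0) > 0.9,
    action=lambda ctx: print(f"throttling {ctx['app']} to 128 kbps"),
)
handler.evaluate({"wan_link_utilization": 0.95}, {"app": "backup"})
```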

Wireless I
A Case for Adapting Channel Width in Wireless Networks
Ranveer Chandra (Microsoft Research); Ratul Mahajan (Microsoft Research); Thomas Moscibroda (Microsoft Research); Ramya Raghavendra (UC Santa Barbara); Victor Bahl (Microsoft Research)

We study a fundamental yet under-explored facet in wireless communication – the width of the spectrum over which transmitters spread their signals, or the channel width. Through detailed measurements in controlled and live environments, and using only commodity 802.11 hardware, we first quantify the impact of channel width on throughput, range, and power consumption. Taken together, our findings make a strong case for wireless systems that adapt channel width. Such adaptation brings unique benefits. For instance, when the throughput required is low, moving to a narrower channel increases range and reduces power consumption; in fixed-width systems, these two quantities are always in conflict.

We then present a channel width adaptation algorithm, called SampleWidth, for the base case of two communicating nodes. This algorithm is based on a simple search process that builds on top of existing techniques for adapting modulation. Per specified policy, it can maximize throughput or minimize power consumption. Evaluation using a prototype implementation shows that SampleWidth correctly identifies the optimal width under a range of scenarios. In our experiments with mobility, it increases throughput by more than 60% compared to the best fixed-width configuration.
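The search can be pictured as a local hill climb over the discrete set of supported widths. The sketch below is a guess at its shape (probe adjacent widths, move while throughput improves), not the published algorithm, and the width set and measurements are invented.

```python
def sample_width(measure_throughput, widths=(5, 10, 20, 40)):
    """Toy width-adaptation search: start in the middle of the supported
    widths (MHz) and move to a neighboring width while doing so improves
    measured throughput. The real algorithm also covers power minimization
    and reuses the modulation-adaptation machinery."""
    i = len(widths) // 2
    best = measure_throughput(widths[i])
    improved = True
    while improved:
        improved = False
        for j in (i - 1, i + 1):                   # try adjacent widths only
            if 0 <= j < len(widths):
                t = measure_throughput(widths[j])
                if t > best:
                    i, best, improved = j, t, True
    return widths[i]

# measure_throughput would sample the link at a width for a short interval:
print(sample_width(lambda w: {5: 3, 10: 6, 20: 9, 40: 7}[w]))  # -> 20
```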

Learning to Share: Narrowband-Friendly Wideband Wireless Networks
Hariharan Rahul (MIT); Nate Kushman (MIT); Dina Katabi (MIT); Charles Sodini (MIT); Farinaz Edalat (MIT)

Wideband technologies in the unlicensed spectrum can satisfy the ever-increasing demands for wireless bandwidth created by emerging rich media applications. The key challenge for such systems, however, is to allow narrowband technologies that share these bands (say, 802.11 a/b/g/n, Zigbee) to achieve their normal performance, without compromising the throughput or range of the wideband network.

This paper presents SWIFT, the first system where high throughput wideband nodes are shown in a working deployment to coexist with unknown narrowband devices, while forming a network of their own. Prior work avoids narrowband devices by operating below the noise level and limiting itself to a single contiguous unused band. While this achieves coexistence, it sacrifices the throughput and operating distance of the wideband device. In contrast, SWIFT creates high-throughput wireless links by weaving together noncontiguous unused frequency bands that change as narrowband devices enter or leave the environment. This design allows SWIFT to achieve coexistence, while operating at normal power, and thereby obtaining higher throughput and greater operating range. We implement SWIFT on a wideband hardware platform, and evaluate it in the presence of 802.11 devices. In comparison to a baseline that coexists with narrowband devices by operating below their noise level, SWIFT is equally narrowband-friendly but achieves 2-9x higher throughput and 10x greater range.
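The static part of band selection is easy to picture: transmit only on OFDM bins whose measured power sits near the noise floor. The threshold, margin, and bin layout below are invented; SWIFT additionally adapts the chosen bins as narrowband devices come and go and weaves them into one link.

```python
def usable_bins(power_per_bin, noise_floor_db, margin_db=5.0):
    """Toy band selection: keep only OFDM bins whose measured power (dBm)
    stays near the noise floor, i.e., bins no narrowband device occupies."""
    return [i for i, p in enumerate(power_per_bin)
            if p < noise_floor_db + margin_db]

# An 802.11 device occupying bins 3-6 leaves the rest of the band usable:
spectrum = [-95, -94, -96, -60, -58, -61, -59, -95, -93, -96]
print(usable_bins(spectrum, noise_floor_db=-95))  # [0, 1, 2, 7, 8, 9]
```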

ZigZag Decoding: Combating Hidden Terminals in Wireless Networks
Shyamnath Gollakota (MIT); Dina Katabi (MIT)

This paper presents ZigZag, an 802.11 receiver design that combats hidden terminals. ZigZag's core contribution is a new form of interference cancellation that exploits asynchrony across successive collisions. Specifically, 802.11 retransmissions, in the case of hidden terminals, cause successive collisions. These collisions have different interference-free stretches at their start, which ZigZag exploits to bootstrap its decoding.

ZigZag makes no changes to the 802.11 MAC and introduces no overhead when there are no collisions. But, when senders collide, ZigZag attains the same throughput as if the colliding packets were a priori scheduled in separate time slots. We build a prototype of ZigZag in GNU Radio. In a testbed of 14 USRP nodes, ZigZag reduces the average packet loss rate at hidden terminals from 72.6% to about 0.7%.
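The decoding idea can be demonstrated symbolically: given two collisions of the same two packets with different offsets, each interference-free sample pins down one unknown, whose subtraction exposes the next, zigzagging between the collisions. The integer-sample model below is a toy stand-in for the real signal processing.

```python
def zigzag_decode(c1, c2, d1, d2, n):
    """Peel two collisions c1, c2 of packets a and b (length n), where b
    lags by d1 and d2 samples respectively: whenever a collision sample has
    exactly one unknown contribution, solve it, then reuse the answer in
    the other collision."""
    a, b = [None] * n, [None] * n

    def peel(c, d):
        changed = False
        for i, s in enumerate(c):
            ta = a[i] if i < n else 0                 # a's contribution
            tb = b[i - d] if 0 <= i - d < n else 0    # b's contribution
            if ta is None and tb is not None:
                a[i] = s - tb; changed = True
            elif tb is None and ta is not None:
                b[i - d] = s - ta; changed = True
        return changed

    while peel(c1, d1) | peel(c2, d2):   # '|' so both collisions are peeled
        pass
    return a, b

# Packets a, b colliding with offsets 1 and 2 are fully recovered:
c1 = [1, 7, 9, 11, 8]        # c1[i] = a[i] + b[i-1]
c2 = [1, 2, 8, 10, 7, 8]     # c2[i] = a[i] + b[i-2]
print(zigzag_decode(c1, c2, 1, 2, 4))  # ([1, 2, 3, 4], [5, 6, 7, 8])
```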

Security I
Spamming Botnets: Signatures and Characteristics
Yinglian Xie (Microsoft Research Silicon Valley); Fang Yu (Microsoft Research Silicon Valley); Kannan Achan (Microsoft Research Silicon Valley); Rina Panigrahy (Microsoft Research Silicon Valley); Geoff Hulten (Microsoft Corporation); Ivan Osipkov (Microsoft Corporation)

In this paper, we focus on characterizing spamming botnets by leveraging both spam payload and spam server traffic properties. Towards this goal, we developed a spam signature generation framework called AutoRE to detect botnet-based spam emails and botnet membership. AutoRE does not require pre-classified training data or white lists. Moreover, it outputs high quality regular expression signatures that can detect botnet spam with a low false positive rate. Using a three-month sample of emails from Hotmail, AutoRE successfully identified 7,721 botnet-based spam campaigns together with 340,050 unique botnet host IP addresses.

Our in-depth analysis of the identified botnets revealed several interesting findings regarding the degree of email obfuscation, properties of botnet IP addresses, sending patterns, and their correlation with network scanning traffic. We believe these observations are useful information in the design of botnet detection schemes.

Enriching Network Security Analysis with Time Travel
Gregor Maier (TU Berlin / Deutsche Telekom Laboratories); Robin Sommer (ICSI / LBNL); Holger Dreger (Siemens AG); Anja Feldmann (TU Berlin / Deutsche Telekom Laboratories); Vern Paxson (ICSI / UC Berkeley); Fabian Schneider (TU Berlin / Deutsche Telekom Laboratories)

In many situations it can be enormously helpful to archive the raw contents of a network traffic stream to disk, to enable later inspection of activity that becomes interesting only in retrospect. We present a Time Machine (TM) for network traffic that provides such a capability. The TM leverages the heavy-tailed nature of network flows to capture nearly all of the likely-interesting traffic while storing only a small fraction of the total volume. An initial proof-of-principle prototype established the forensic value of such an approach, contributing to the investigation of numerous attacks at a site with thousands of users. Based on these experiences, a rearchitected implementation of the system provides flexible, high-performance traffic stream capture, indexing and retrieval, including an interface between the TM and a real-time network intrusion detection system (NIDS). The NIDS controls the TM by dynamically adjusting recording parameters, instructing it to permanently store suspicious activity for offline forensics, and fetching traffic from the past for retrospective analysis. We present a detailed performance evaluation of both stand-alone and joint setups, and report on experiences with running the system live in high-volume environments.
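The heavy-tail observation translates into a very small mechanism: record only the first N bytes of each connection. The sketch below shows that cutoff logic with an invented buffer class and default; the real system adds indexing, retrieval, eviction, and dynamic control by the NIDS.

```python
from collections import defaultdict

class TimeMachineBuffer:
    """Toy cutoff recorder: keep only the first `cutoff` bytes of each
    connection. Because flow sizes are heavy-tailed, a small cutoff captures
    the start (and most of the forensic value) of nearly every flow while
    discarding the bulk of the volume. Cutoff value here is illustrative."""

    def __init__(self, cutoff=15_000):
        self.cutoff = cutoff
        self.seen = defaultdict(int)   # bytes recorded per connection 5-tuple
        self.stored = []

    def packet(self, conn_id, payload: bytes):
        if self.seen[conn_id] < self.cutoff:
            take = payload[: self.cutoff - self.seen[conn_id]]
            self.seen[conn_id] += len(take)
            self.stored.append((conn_id, take))
```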

To Filter or to Authorize: Network-Layer DoS Defense Against Multimillion-node Botnets
Xin Liu (UC Irvine); Xiaowei Yang (UC Irvine); Yanbin Lu (UC Irvine)

This paper presents the design and implementation of a filter-based DoS defense system (StopIt) and a comparison study on the effectiveness of filters and capabilities. Central to the StopIt design is a novel closed-control, open-service architecture: any receiver can use StopIt to block the undesired traffic it receives, yet the design is robust to various strategic attacks from millions of bots, including filter exhaustion attacks and bandwidth flooding attacks that aim to disrupt the timely installation of filters. Our evaluation shows that StopIt can block the attack traffic from a few million attackers within tens of minutes with bounded router memory. We compare StopIt with existing filter-based and capability-based DoS defense systems under simulated DoS attacks of various types and scales. Our results show that StopIt outperforms existing filter-based systems, and can prevent legitimate communications from being disrupted by various DoS flooding attacks. It also outperforms capability-based systems in most attack scenarios, but a capability-based system is more effective in a type of attack in which the attack traffic does not reach the victim but instead congests a link the victim shares. These results suggest that both filters and capabilities are highly effective DoS defense mechanisms, but neither is more effective than the other in all types of DoS attacks.

Router Primitives
Deflating the Big Bang: Fast and Scalable Deep Packet Inspection with Extended Finite Automata
Randy Smith (UW Madison); Cristian Estan (UW Madison); Somesh Jha (UW Madison); Shijin Kong (UW Madison)

Deep packet inspection is playing an increasingly important role in the design of novel network services. Regular expressions are the language of choice for writing signatures, but standard DFA or NFA representations are unsuitable for high-speed environments, requiring too much memory, too much time, or too much per-flow state. DFAs are fast and can be readily combined, but doing so often leads to state-space explosion. NFAs, while small, require large per-flow state and are slow.

We propose a solution that simultaneously addresses all these problems. We start with a first-principles characterization of state-space explosion and give conditions that eliminate it when satisfied. We show how auxiliary variables can be used to transform automata so that they satisfy these conditions, which we codify in a formal model that augments DFAs with auxiliary variables and simple instructions for manipulating them. Building on this model, we present techniques, inspired by principles used in compiler optimization, that systematically reduce runtime and per-flow state. In our experiments, signature sets from Snort and Cisco Systems achieve state-space reductions of over four orders of magnitude, per-flow state reductions of up to a factor of eight, and runtimes that approach those of DFAs.
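The flavor of the auxiliary-variable idea: patterns like .*a.*b can each be tracked with one scratch bit instead of multiplying DFA states across signatures. The two-signature matcher below is a deliberately tiny illustration, not the paper's formal model or instruction set.

```python
def match_with_flags(data: str):
    """Toy automaton with auxiliary bits: track '.*a.*b' and '.*c.*d' with
    two flag bits instead of a combined DFA whose state count grows
    multiplicatively as more such signatures are added."""
    seen_a = seen_c = False
    matches = set()
    for ch in data:
        if ch == "a":
            seen_a = True
        elif ch == "c":
            seen_c = True
        elif ch == "b" and seen_a:
            matches.add(".*a.*b")
        elif ch == "d" and seen_c:
            matches.add(".*c.*d")
    return matches

print(match_with_flags("xxaxxcxxdxxb"))  # both signatures fire
```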

Packet Caches on Routers: The Implications of Universal Redundant Traffic Elimination
Ashok Anand (UW Madison); Archit Gupta (UW Madison); Aditya Akella (UW Madison); Srinivasan Seshan (CMU); Scott Shenker (UC Berkeley)

Many past systems have explored how to eliminate redundant transfers from network links and improve network efficiency. Several of these systems operate at the application layer, while the more recent systems operate on individual packets. A common aspect of these systems is that they apply to localized settings, e.g., at stub network access links. In this paper, we explore the benefits of deploying packet-level redundant content elimination as a universal primitive on all Internet routers. Such a universal deployment would immediately reduce link loads everywhere. However, we argue that far more significant network-wide benefits can be derived by redesigning network routing protocols to leverage the universal deployment. We develop "redundancy-aware" intra- and inter-domain routing algorithms and show that they enable better traffic engineering, reduce link usage costs, and enhance ISPs' responsiveness to traffic variations. In particular, employing redundancy elimination approaches across redundancy-aware routes can lower intra- and inter-domain link loads by 10-50%. We also address key challenges that may hinder implementation of redundancy elimination on fast routers. Our current software router implementation can run at OC48 speeds.
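A toy version of the packet-level primitive: fingerprint regions of each payload and replace regions already in the cache with short references. Real implementations select regions with Rabin fingerprints over sliding windows and keep upstream and downstream caches synchronized; fixed chunks and SHA-1 keep this sketch short.

```python
import hashlib

class RedundancyEliminator:
    """Toy packet-level redundancy elimination with fixed 64-byte chunks
    (illustrative; real systems use content-based region selection)."""

    def __init__(self, chunk=64):
        self.chunk = chunk
        self.cache = {}                    # fingerprint -> payload chunk

    def encode(self, payload: bytes):
        out = []
        for i in range(0, len(payload), self.chunk):
            region = payload[i : i + self.chunk]
            fp = hashlib.sha1(region).digest()[:8]
            if fp in self.cache:
                out.append(("ref", fp))    # downstream cache expands this
            else:
                self.cache[fp] = region
                out.append(("raw", region))
        return out
```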

Virtual Routers on the Move: Live Router Migration as a Network-Management Primitive
Yi Wang (Princeton University); Eric Keller (Princeton University); Brian Biskeborn (Princeton University); Jacobus van der Merwe (AT&T Labs - Research); Jennifer Rexford (Princeton University)

The complexity of network management is widely recognized as one of the biggest challenges facing the Internet today. Point solutions for individual problems further increase system complexity while not addressing the underlying causes. In this paper, we argue that many network-management problems stem from the same root cause: the need to maintain consistency between the physical and logical configuration of the routers. Hence, we propose VROOM (Virtual ROuters On the Move), a new network-management primitive that avoids unnecessary changes to the logical topology by allowing (virtual) routers to freely move from one physical node to another. In addition to simplifying existing network-management tasks like planned maintenance and service deployment, VROOM can also help tackle emerging challenges such as reducing energy consumption. We present the design, implementation, and evaluation of novel migration techniques for virtual routers with either hardware or software data planes. Our evaluation shows that VROOM is transparent to routing protocols and results in no performance impact on the data traffic when a hardware-based data plane is used.

Incentives
BitTorrent is an Auction: Analyzing and Improving BitTorrent's Incentives
Dave Levin (University of Maryland); Katrina LaCurts (University of Maryland); Neil Spring (University of Maryland); Bobby Bhattacharjee (University of Maryland)

Incentives play a crucial role in BitTorrent, motivating users to upload to others to achieve fast download times for all peers. Though long believed to be robust to strategic manipulation, recent work has empirically shown that BitTorrent does not provide its users incentive to follow the protocol. We propose an auction-based model to study and improve upon BitTorrent's incentives. The insight behind our model is that BitTorrent uses, not tit-for-tat as widely believed, but an auction to decide which peers to serve. Our model not only captures known, performance-improving strategies but also shapes our thinking toward new, effective strategies. For example, our analysis demonstrates, counter-intuitively, that BitTorrent peers have incentive to intelligently under-report what pieces of the file they have to their neighbors. We implement and evaluate a modification to BitTorrent in which peers reward one another with proportional shares of bandwidth. Within our game-theoretic model, we prove that a proportional-share client is strategy-proof. With experiments on PlanetLab, a local cluster, and live downloads, we show that a proportional-share unchoker yields faster downloads against BitTorrent and BitTyrant clients, and that under-reporting pieces yields prolonged neighbor interest.
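The proportional-share idea is compact enough to sketch: rather than unchoking a fixed top-k of peers, divide upload capacity in proportion to what each neighbor contributed in the previous round. The numbers and peer names below are invented.

```python
def proportional_share(upload_capacity_kbps, received_kbps):
    """Toy proportional-share allocation: split a peer's upload capacity
    among neighbors in proportion to the bandwidth each contributed in the
    previous round."""
    total = sum(received_kbps.values())
    if total == 0:                         # bootstrap: share equally
        n = len(received_kbps)
        return {p: upload_capacity_kbps / n for p in received_kbps}
    return {p: upload_capacity_kbps * r / total
            for p, r in received_kbps.items()}

print(proportional_share(1000, {"p1": 300, "p2": 100, "p3": 0}))
# {'p1': 750.0, 'p2': 250.0, 'p3': 0.0}
```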

RD Network Services: Differentiation through Performance Incentives
Maxim Podlesny (Washington University in St. Louis); Sergey Gorinsky (Washington University in St. Louis)

With the Internet offering a single best-effort service, there have been numerous proposals of diversified network services that align better with the divergent needs of different distributed applications. The failure of these innovative architectures to gain wide deployment is primarily due to economic and legacy issues, rather than technical shortcomings. We propose a new paradigm for network service differentiation where design principles account explicitly for the multiplicity of Internet service providers and users as well as their economic interests in environments with partly deployed new services. Our key idea is to base the service differentiation on performance itself, rather than price. The proposed RD (Rate-Delay) services enable a user to choose between a higher transmission rate and a lower queuing delay at a congested network link. An RD router supports the two services by maintaining two queues per output link and achieves the intended rate-delay differentiation through simple link scheduling and dynamic buffer sizing. After analytically deriving specific rules for RD router operation, we conduct extensive simulations that confirm the effectiveness of the RD services geared for incremental deployment in the Internet.
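One way to picture an RD router's output link, with invented weights and buffer sizes rather than the paper's analytically derived rules: a large-buffer R queue served more often, and a small-buffer D queue whose short backlog keeps queuing delay low.

```python
from collections import deque

class RDLink:
    """Toy RD output link (weights and buffer sizes are illustrative)."""

    def __init__(self, r_weight=3, d_buffer=5):
        self.r, self.d = deque(), deque()
        self.r_weight, self.d_buffer = r_weight, d_buffer
        self.turn = 0

    def enqueue(self, pkt, service):
        if service == "D":
            if len(self.d) >= self.d_buffer:
                return False          # drop: the small buffer bounds D delay
            self.d.append(pkt)
        else:
            self.r.append(pkt)
        return True

    def dequeue(self):
        # Serve R in r_weight of every r_weight+1 slots (higher rate for R).
        self.turn = (self.turn + 1) % (self.r_weight + 1)
        first, second = (self.r, self.d) if self.turn else (self.d, self.r)
        for q in (first, second):
            if q:
                return q.popleft()
        return None
```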

Rationality and Traffic Attraction: Incentives for Honest Path Announcements in BGP
Sharon Goldberg (Princeton University); Shai Halevi (IBM T. J. Watson Research); Aaron D. Jaggard (Rutgers University); Vijay Ramachandran (Colgate University); Rebecca N. Wright (Rutgers University)

We study situations in which autonomous systems (ASes) may have incentives to send BGP announcements differing from the AS-level paths that packets traverse in the data plane. Prior work on this issue assumed that ASes seek only to obtain the best possible outgoing path for their traffic. In reality, other factors can influence a rational AS's behavior. Here we consider a more natural model, in which an AS is also interested in attracting incoming traffic (e.g., because other ASes pay it to carry their traffic). We ask what combinations of BGP enhancements and restrictions on routing policies can ensure that ASes have no incentive to lie about their data-plane paths. We find that protocols like S-BGP alone are insufficient, but that S-BGP does suffice if coupled with additional (quite unrealistic) restrictions on routing policies. Our game-theoretic analysis illustrates the high cost of ensuring that the ASes honestly announce data-plane paths in their BGP path announcements.

Measurement
Unconstrained Endpoint Profiling (Googling the Internet)
Ionut Trestian (Northwestern University); Supranamaya Ranjan (Narus Inc.); Aleksandar Kuzmanovic (Northwestern University); Antonio Nucci (Narus Inc.)

Understanding Internet access trends at a global scale, i.e., what do people do on the Internet, is a challenging problem that is typically addressed by analyzing network traces. However, obtaining such traces presents its own set of challenges owing to either privacy concerns or to other operational difficulties. The key hypothesis of our work here is that most of the information needed to profile the Internet endpoints is already available around us on the web. In this paper, we introduce a novel approach for profiling and classifying endpoints. We implement and deploy a Google-based profiling tool, which accurately characterizes endpoint behavior by collecting and strategically combining information freely available on the web. Our unconstrained endpoint profiling approach shows remarkable advances in the following scenarios: (i) Even when no packet traces are available, it can accurately predict application and protocol usage trends at arbitrary networks; (ii) When network traces are available, it dramatically outperforms state-of-the-art classification tools; (iii) When sampled flow-level traces are available, it retains high classification capabilities when other schemes literally fall apart. Using this approach, we perform unconstrained endpoint profiling at a global scale: for clients in four different world regions (Asia, South and North America and Europe). We provide the first-of-its-kind endpoint analysis which reveals fascinating similarities and differences among these regions.

Network Discovery from Passive Measurements
Brian Eriksson (UW Madison); Paul Barford (UW Madison); Robert Nowak (UW Madison)

Understanding the Internet's structure through empirical measurements is important in the development of new topology generators, new protocols, traffic engineering, and troubleshooting, among other things. While prior studies of Internet topology have been based on active (traceroute-like) measurements, passive measurements of packet traffic offer the possibility of a greatly expanded perspective of Internet structure with much lower impact and management overhead. In this paper we describe a methodology for inferring network structure from passive measurements of IP packet traffic. We describe algorithms that enable 1) traffic sources that share network paths to be clustered accurately without relying on IP address or autonomous system information, 2) topological structure to be inferred accurately with only a small number of active measurements, 3) missing information to be recovered, which is a serious challenge in the use of passive packet measurements. We demonstrate our techniques using a series of simulated topologies and empirical data sets. Our experiments show that the clusters established by our method closely correspond to sources that actually share paths. We also show the trade-offs between selectively applied active probes and the accuracy of the inferred topology between sources. Finally, we characterize the degree to which missing information can be recovered from passive measurements, which further enhances the accuracy of the inferred topologies.

DisCarte: A Disjunctive Internet Cartographer
Rob Sherwood (University of Maryland); Adam Bender (University of Maryland); Neil Spring (University of Maryland)

Internet topology discovery consists of inferring the inter-router connectivity ("links") and the mapping from IP addresses to routers ("alias resolution"). Current topology discovery techniques use TTL-limited "traceroute" probes to discover links and use direct router probing to resolve aliases. The often-ignored record route (RR) IP option provides a source of disparate topology data that could augment existing techniques, but it is difficult to properly align with traceroute-based topologies because router RR implementations are under-standardized. Correctly aligned RR and traceroute topologies have fewer false links, include anonymous and hidden routers, and discover aliases for routers that do not respond to direct probing. More accurate and feature-rich topologies benefit overlay construction and network diagnostics, modeling, and measurement.

We present DisCarte, a system for aligning and cross-validating RR and traceroute topology data using observed engineering practices. DisCarte uses disjunctive logic programming (DLP), a logical inference and constraint solving technique, to intelligently merge RR and traceroute data. We demonstrate that the resultant topology is more accurate and complete than previous techniques by validating its internal consistency and by comparing to publicly-available topologies. We classify irregularities in router implementations and introduce a divide-and-conquer technique used to scale DLP to Internet-sized systems.

SatelliteLab: Adding Heterogeneity to Planetary-Scale Testbeds
Marcel Dischinger (MPI-SWS); Andreas Haeberlen (MPI-SWS); Ivan Beschastnikh (University of Washington); Krishna Gummadi (MPI-SWS); Stefan Saroiu (University of Toronto)

Planetary-scale network testbeds like PlanetLab and RON have become indispensable for evaluating prototypes of distributed systems under realistic Internet conditions. However, current testbeds lack the heterogeneity that characterizes the commercial Internet. For example, most testbed nodes are connected to well-provisioned research networks, whereas most Internet nodes are in edge networks.

In this paper, we present the design, implementation, and evaluation of SatelliteLab, a testbed that includes nodes from a diverse set of Internet edge networks. SatelliteLab has a two-tier architecture, in which well-provisioned nodes called planets form the core, and lightweight nodes called satellites connect to the planets from the periphery. The application code of an experiment runs on the planets, whereas the satellites only forward network traffic. Thus, the traffic is subjected to the network conditions of the satellites, which greatly improves the testbed's network heterogeneity. The separation of code execution and traffic forwarding enables satellites to remain lightweight, which lowers the barrier to entry for Internet edge nodes.

Our prototype of SatelliteLab uses PlanetLab nodes as planets and a set of 32 volunteered satellites with diverse network characteristics. These satellites consist of desktops, laptops, and handhelds connected to the Internet via cable, DSL, ISDN, Wi-Fi, Bluetooth, and cellular links. We evaluate SatelliteLab's design, and we demonstrate the benefits of evaluating applications on SatelliteLab.

Security II
iSPY: Detecting IP Prefix Hijacking on My Own
Zheng Zhang (Purdue University); Ying Zhang (University of Michigan); Y. Charlie Hu (Purdue University); Z. Morley Mao (University of Michigan); Randy Bush (IIJ)

IP prefix hijacking remains a major threat to the security of the Internet routing system due to a lack of authoritative prefix ownership information. Despite many efforts in designing IP prefix hijack detection schemes, no existing design can satisfy all the critical requirements of a truly effective system: real-time, accurate, light-weight, easily and incrementally deployable, as well as robust in victim notification. In this paper, we present a novel approach that fulfills all these goals by monitoring network reachability from key external transit networks to one's own network through lightweight prefix-owner-based active probing. Using the prefix-owner's view of reachability, our detection system, iSPY, can differentiate between IP prefix hijacking and network failures based on the observation that hijacking is likely to result in topologically more diverse polluted networks and unreachability. Through detailed simulations of Internet routing, 25-day deployment in 88 ASes (108 prefixes), and experiments with hijacking events of our own prefix from multiple locations, we demonstrate that iSPY is accurate with false negative ratio below 0.45% and false positive ratio below 0.17%. Furthermore, iSPY is truly real-time; it can detect hijacking events within a few minutes.

Accountable Internet Protocol (AIP)
David G. Andersen (CMU); Hari Balakrishnan (MIT); Nick Feamster (Georgia Tech); Teemu Koponen (ICSI / HIIT); Daekyeong Moon (UC Berkeley); Scott Shenker (UC Berkeley)

This paper presents AIP (Accountable Internet Protocol), a network architecture that provides accountability as a first-order property. AIP uses a hierarchy of self-certifying addresses, in which each component is derived from the public key of the corresponding entity. We discuss how AIP enables simple solutions to source spoofing, denial-of-service, route hijacking, and route forgery. We also discuss how AIP's design meets the challenges of scaling, key management, and traffic engineering.
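The self-certifying structure is easy to sketch: each address component is a hash of the corresponding entity's public key, so ownership can be verified by a signature challenge rather than by an external authority. The key bytes, hash choice, and truncation below are placeholders.

```python
import hashlib

def self_certifying_address(public_key_bytes: bytes) -> str:
    """An AIP-style self-certifying identifier: the address component is a
    hash of the owner's public key, so a host proves ownership by signing a
    challenge with the matching private key (details simplified here)."""
    return hashlib.sha1(public_key_bytes).hexdigest()[:20]

# An AIP address pairs an accountability domain with an endpoint identifier,
# each derived from the corresponding entity's key:
ad = self_certifying_address(b"domain-public-key")
eid = self_certifying_address(b"host-public-key")
print(f"{ad}:{eid}")
```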

P2P
P4P: Provider Portal for Applications
Haiyong Xie (Yale University); Y. Richard Yang (Yale University); Arvind Krishnamurthy (University of Washington); Yanbin Liu (IBM T. J. Watson Research); Abraham Silberschatz (Yale University)

As peer-to-peer (P2P) emerges as a major paradigm for scalable network application design, it also exposes significant new challenges in achieving efficient and fair utilization of Internet network resources. Being largely network-oblivious, many P2P applications may lead to inefficient network resource usage and/or low application performance. In this paper, we propose a simple architecture called P4P to allow for more effective cooperative traffic control between applications and network providers. We conducted extensive simulations and real-life experiments on the Internet to demonstrate the feasibility and effectiveness of P4P. Our experiments demonstrated that P4P either improves or maintains the same level of application performance of native P2P applications, while, at the same time, it substantially reduces network provider cost compared with either native or latency-based localized P2P applications.

Taming the Torrent: A Practical Approach to Reducing Cross-ISP Traffic in P2P Systems
David R. Choffnes (Northwestern University); Fabián E. Bustamante (Northwestern University)

Peer-to-peer (P2P) systems, which provide a variety of popular services, such as file sharing, video streaming and voice-over-IP, contribute a significant portion of today's Internet traffic. By building overlay networks that are oblivious to the underlying Internet topology and routing, these systems have become one of the greatest traffic-engineering challenges for Internet Service Providers (ISPs) and the source of costly data traffic flows. In an attempt to reduce these operational costs, ISPs have tried to shape, block or otherwise limit P2P traffic, much to the chagrin of their subscribers, who consistently find ways to eschew these controls or simply switch providers.

In this paper, we present the design, deployment and evaluation of an approach to reducing this costly cross-ISP traffic without sacrificing system performance. Our approach recycles network views gathered at low cost from content distribution networks to drive biased neighbor selection without any path monitoring or probing. Using results collected from a deployment in BitTorrent with over 120,000 users in nearly 3,000 networks, we show that our lightweight approach significantly reduces cross-ISP traffic and, over 33% of the time, it selects peers along paths that are within a single autonomous system (AS). Further, we find that our system locates peers along paths that have two orders of magnitude lower latency and 30% lower loss rates than those picked at random, and that these high-quality paths can lead to significant improvements in transfer rates. In challenged settings where peers are overloaded in terms of available bandwidth, our approach provides 31% average download-rate improvement; in environments with large available bandwidth, it increases download rates by 207% on average (and improves median rates by 883%).
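The reuse of CDN views can be sketched as a similarity test: two hosts that a CDN redirects to similar replica servers are likely close in the network, so neighbor selection is biased toward such peers. Cosine similarity over hypothetical redirection-ratio maps stands in for the deployed system's bookkeeping.

```python
import math

def cdn_similarity(my_ratios, peer_ratios):
    """Cosine similarity between two hosts' CDN redirection-ratio maps
    (replica server -> fraction of redirections); high similarity suggests
    network proximity. Map contents here are invented for illustration."""
    keys = set(my_ratios) | set(peer_ratios)
    dot = sum(my_ratios.get(k, 0.0) * peer_ratios.get(k, 0.0) for k in keys)
    na = math.sqrt(sum(v * v for v in my_ratios.values()))
    nb = math.sqrt(sum(v * v for v in peer_ratios.values()))
    return dot / (na * nb) if na and nb else 0.0

# Two hosts mostly redirected to the same replica look like good neighbors:
print(cdn_similarity({"replica1": 0.8, "replica2": 0.2},
                     {"replica1": 0.7, "replica3": 0.3}))
```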

Challenges, Design and Analysis of a Large-scale P2P VoD System
Yan Huang (Shanghai Synacast Media Tech); Tom Z. J. Fu (Chinese University of Hong Kong); Dah Ming Chiu (Chinese University of Hong Kong); John C. S. Lui (Chinese University of Hong Kong); Cheng Huang (Shanghai Synacast Media Tech)

P2P file downloading and streaming have already become very popular Internet applications. These systems dramatically reduce the server loading, and provide a platform for scalable content distribution, as long as there is interest in the content. P2P-based video-on-demand (P2P-VoD) is a new challenge for the P2P technology. Unlike live streaming, P2P-VoD has less synchrony among the users sharing video content, so it is much more difficult to alleviate the server load while maintaining streaming performance. To compensate, every peer contributes a small amount of storage, and new mechanisms for coordinating content replication, content discovery, and peer scheduling are carefully designed. In this paper, we describe and discuss the challenges and the architectural design issues of a large-scale P2P-VoD system based on the experiences of a real system deployed by PPLive. The system is also designed and instrumented with monitoring capability to measure both system and component-specific performance metrics (for design improvements) as well as user satisfaction. After analyzing a large amount of collected data, we present a number of results on user behavior and various system performance metrics, including user satisfaction, and discuss what we observe based on the system design. The study of a real-life system provides valuable insights for the future development of P2P-VoD technology.

Donnybrook: Enabling Large-Scale, High-Speed, Peer-to-Peer Games
Ashwin Bharambe (CMU); John R. Douceur (Microsoft Research); Jacob R. Lorch (Microsoft Research); Thomas Moscibroda (Microsoft Research); Jeffrey Pang (CMU); Srinivasan Seshan (CMU); Xinyu Zhuang (CMU)

Without well-provisioned dedicated servers, modern fast-paced action games limit the number of players who can interact simultaneously to 16-32. This is because interacting players must frequently exchange state updates, and high player counts would exceed the bandwidth available to participating machines. In this paper, we describe Donnybrook, a system that enables epic-scale battles without dedicated server resources, even in a fast-paced game with tight latency bounds. It achieves this scalability through two novel components. First, it reduces bandwidth demand by estimating what players are paying attention to, thereby enabling it to reduce the frequency of sending less important state updates. Second, it overcomes resource and interest heterogeneity by disseminating updates via a multicast system designed for the special requirements of games: that they have multiple sources, are latency-sensitive, and have frequent group membership changes. We present user study results using a prototype implementation based on Quake III that show our approach provides a desirable user experience. We also present simulation results that demonstrate Donnybrook's efficacy in enabling battles of up to 900 players.
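The attention-driven bandwidth reduction can be caricatured in a few lines: rank other players by an estimated attention score and send frequent updates only to a small interest set, with periodic summaries to everyone else. The scores, set size, and labels below are invented.

```python
def schedule_updates(attention_scores, interest_size=5):
    """Toy interest-set scheduling: frequent updates go to the players this
    player is (estimated to be) paying most attention to; the rest receive
    only infrequent guidance updates."""
    ranked = sorted(attention_scores, key=attention_scores.get, reverse=True)
    interest = set(ranked[:interest_size])
    return {p: ("frequent" if p in interest else "infrequent")
            for p in attention_scores}

scores = {f"player{i}": 1.0 / (i + 1) for i in range(10)}
print(schedule_updates(scores))  # players 0-4 frequent, 5-9 infrequent
```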

Wireless II
Symbol-Level Network Coding for Wireless Mesh Networks
Sachin Katti (MIT); Dina Katabi (MIT); Hari Balakrishnan (MIT); Muriel Medard (MIT)

This paper describes MIXIT, a system that improves the throughput of wireless mesh networks. MIXIT exploits a basic property of mesh networks: even when no node receives a packet correctly, any given bit is likely to be received by some node correctly. Instead of insisting on forwarding only correct packets, MIXIT routers use physical layer hints to make their best guess about which bits in a corrupted packet are likely to be correct and forward them to the destination. Even though this approach inevitably lets erroneous bits through, we find that it can achieve high throughput without compromising end-to-end reliability.

The core component of MIXIT is a novel network code that operates on small groups of bits, called symbols. It allows the nodes to opportunistically route groups of bits to their destination with low overhead. MIXIT's network code also incorporates an end-to-end error correction component that the destination uses to correct any errors that might seep through. We have implemented MIXIT on a software radio platform running the Zigbee radio protocol. Our experiments on a 25-node indoor testbed show that MIXIT has a throughput gain of 2.8x over MORE, a state-of-the-art opportunistic routing scheme, and about 3.9x over traditional routing using the ETX metric.

Predictable Performance Optimization for Wireless Networks
Yi Li (UT Austin); Lili Qiu (UT Austin); Yin Zhang (UT Austin); Ratul Mahajan (Microsoft Research); Eric Rozner (UT Austin)

We present a novel approach to optimize the performance of IEEE 802.11-based multi-hop wireless networks. A unique feature of our approach is that it enables an accurate prediction of the resulting throughput of individual flows. At its heart lies a simple yet realistic model of the network that captures interference, traffic, and MAC-induced dependencies. Unless properly accounted for, these dependencies lead to unpredictable behaviors. For instance, we show that even a simple network of two links with one flow is vulnerable to severe performance degradation. We design algorithms that build on this model to optimize the network for fairness and throughput. Given traffic demands as input, these algorithms compute rates at which individual flows must send to meet the objective. Evaluation using a multi-hop wireless testbed as well as simulations shows that our approach is very effective. When optimizing for fairness, our methods result in close to perfect fairness. When optimizing for throughput, they lead to 100-200% improvement for UDP traffic and 10-50% for TCP traffic.

Interactive WiFi Connectivity for Moving Vehicles
Aruna Balasubramanian (University of Massachusetts); Ratul Mahajan (Microsoft Research); Arun Venkataramani (University of Massachusetts); Brian Neil Levine (University of Massachusetts); John Zahorjan (University of Washington)

We ask if the ubiquity of WiFi can be leveraged to provide cheap connectivity from moving vehicles for common applications such as Web browsing and VoIP. Driven by this question, we conduct a study of connection quality available to vehicular WiFi clients based on measurements from testbeds in two different cities. We find that current WiFi handoff methods, in which clients communicate with one basestation at a time, lead to frequent disruptions in connectivity. We also find that clients can overcome many disruptions by communicating with multiple basestations simultaneously. These findings lead us to develop ViFi, a protocol that opportunistically exploits basestation diversity to minimize disruptions and support interactive applications for mobile clients. ViFi uses a decentralized and lightweight probabilistic algorithm for coordination between participating basestations. Our evaluation using a two-month long deployment and trace-driven simulations shows that its link-layer performance comes close to an ideal diversity-based protocol. Using two applications, VoIP and short TCP transfers, we show that the link layer performance improvement translates to better application performance. In our deployment, ViFi doubles the number of successful short TCP transfers and doubles the length of disruption-free VoIP sessions compared to an existing WiFi-style handoff protocol.
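
A guess at the shape of the decentralized coordination, not ViFi's actual estimator: each auxiliary basestation that overhears an unacknowledged packet relays it with a probability scaled so the expected number of relays across basestations stays near one, favoring those best connected to the destination.

```python
def relay_probabilities(delivery_prob_to_dst):
    """Toy relaying coordination: scale each overhearing basestation's relay
    probability by its delivery probability to the destination so that the
    expected number of relays is about one (basestation names and values
    are invented)."""
    total = sum(delivery_prob_to_dst.values())
    if total == 0:
        return {bs: 0.0 for bs in delivery_prob_to_dst}
    return {bs: min(1.0, p / total) for bs, p in delivery_prob_to_dst.items()}

# The better-connected basestation relays more often:
print(relay_probabilities({"bs1": 0.9, "bs2": 0.3}))  # {'bs1': 0.75, 'bs2': 0.25}
```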