SIGCOMM 2004 Conference Papers

Papers Index

A First-Principles Approach to Understanding the Internet's Router-level Topology
Vivaldi: A Decentralized Network Coordinate System
Routing Design in Operational Networks: A Look from the Inside
Locating Internet Bottlenecks: Algorithms, Measurements, and Implications
An Algebraic Approach to Practical and Scalable Overlay Network Monitoring
CapProbe: A Simple and Accurate Capacity Estimation Technique
Optimizing Cost and Performance for Multihoming
A Comparison of Overlay Routing and Multihoming Route Control
The Feasibility of Supporting Large-Scale Live Streaming Applications with Dynamic Application End-Points
Link-level Measurements from an 802.11b Mesh Network
Comparison of Routing Metrics for Static Multi-Hop Wireless Networks
Routing in a Delay Tolerant Network
Turning the Postal System into a Generic Digital Communication Mechanism
A System for Authenticated Policy-Compliant Routing
SPV: Secure Path Vector Routing for Securing BGP
Shield: Vulnerability-Driven Network Filters for Preventing Known Vulnerability Exploits
Locating Internet Routing Instabilities
Diagnosing Network-Wide Traffic Anomalies
Network Sensitivity to Hot-Potato Disruptions
Building a better NetFlow
Work-Conserving Distributed Schedulers for Terabit Routers
Exact GPS Simulation with Logarithmic Complexity, and its Application to an Optimally Fair Scheduler
Sizing Router Buffers
A Wavelet-Based Approach to Detect Shared Congestion
Delayed Stability and Performance of Distributed Congestion Control
Impact of Configuration Errors on DNS Robustness
The Design and Implementation of a Next Generation Name Service for the Internet
A Layered Naming Architecture for the Internet
Mercury: Supporting Scalable Multi-Attribute Range Queries
Modeling and Performance Analysis of BitTorrent-Like Peer-to-Peer Networks
A Scalable Distributed Information Management System

A First-Principles Approach to Understanding the Internet's Router-level Topology

Lun Li (CalTech), David Alderson (CalTech), Walter Willinger (AT&T Labs--Research), John Doyle (CalTech)

Abstract:

A detailed understanding of the many facets of the Internet's topological structure is critical for evaluating the performance of networking protocols, for assessing the effectiveness of proposed techniques to protect the network from nefarious intrusions and attacks, or for developing improved designs for resource provisioning. Previous studies of topology have focused on interpreting measurements or on phenomenological descriptions and evaluation of graph-theoretic properties of topology generators. We propose a complementary approach of combining a more subtle use of statistics and graph theory with a first-principles theory of router-level topology that reflects practical constraints and tradeoffs. While there is an inevitable tradeoff between model complexity and fidelity, a challenge is to distill from the seemingly endless list of potentially relevant technological and economic issues the features that are most essential to a solid understanding of the intrinsic fundamentals of network topology. We claim that very simple models that incorporate hard technological constraints on router and link bandwidth and connectivity, together with abstract models of user demand and network performance, can successfully address this challenge and further resolve much of the confusion and controversy that has surrounded topology generation and evaluation.


This paper is available in Adobe PDF format.	TOP

Vivaldi: A Decentralized Network Coordinate System

Frank Dabek (MIT), Russ Cox (MIT), Frans Kaashoek (MIT), Robert Morris (MIT)

Abstract:

Large-scale Internet applications can benefit from an ability to predict round-trip times to other hosts without having to contact them first. Explicit measurements are often unattractive because the cost of measurement can outweigh the benefits of exploiting proximity information. Vivaldi is a simple, light-weight algorithm that assigns synthetic coordinates to hosts such that the distance between the coordinates of two hosts accurately predicts the communication latency
between the hosts.

Vivaldi is fully distributed, requiring no fixed network infrastructure and no distinguished hosts. It is also efficient: a new host can compute good coordinates for itself after collecting latency information from only a few other hosts. Because it requires little communication, Vivaldi can piggy-back on the communication patterns of the application using it and scale to a large number of hosts.

An evaluation of Vivaldi using a simulated network whose latencies are based on measurements among 1740 Internet hosts shows that a 2-dimensional Euclidean model with height vectors embeds these hosts with low error (the median relative error in round-trip time prediction is 11 percent).


This paper is available in Adobe PDF format.	TOP

Routing Design in Operational Networks: A Look from the Inside

David Maltz (CMU), Geoff Xie (CMU), Jibin Zhan (CMU), Hui Zhang (CMU), Gisli Hjalmtysson (AT&T Labs--Research), Albert Greenberg (AT&T Labs--Research)

Abstract:

In any IP network, routing protocols provide the intelligence that takes a collection of physical links and transforms them into a network that enables packets to travel from one host to another. Though routing design is arguably the single most important design task for large IP networks, there has been very little systematic investigation into how routing protocols are actually used in production networks to implement the goals of network architects. We have developed a methodology for reverse engineering a coherent global view of a network's routing design from the static analysis of dumps of the local configuration state of each router. Starting with a set of 8,035 configuration files, we have applied this method to 31 production networks. In this paper we
present a detailed examination of how routing protocols are used in operational networks. In particular, the results show the conventional model of interior and exterior gateway protocols is insufficient to describe the diverse set of mechanisms used by architects. We provide examples of the more unusual designs and examine their trade-offs. We discuss the strengths and weaknesses of our methodology, and argue that it opens paths towards new understandings of network behavior and design.


This paper is available in Adobe PDF format.	TOP

Locating Internet Bottlenecks: Algorithms, Measurements and Implications

Ningning Hu (CMU), Li Erran Li (Bell Laboratories), Zhuoqing Morley Mao (U. Michigan), Peter Steenkiste (CMU), Jia Wang (AT&T Labs--Research)

Abstract:

The ability to locate network bottlenecks along end-to-end paths on the Internet is of great interest to both network operators and researchers. For example, knowing where bottleneck links are, network operators can apply traffic engineering either at the interdomain or intradomain level to improve routing. Existing tools either fail to identify the location of bottlenecks, or generate a large amount of probing packets. In addition, they often require access to both end points. In this paper we present Pathneck, a tool that allows end users to efficiently and accurately locate the bottleneck link on an Internet path. Pathneck is based on a novel probing technique called Recursive Packet Train (RPT) and does not require access to the destination. We evaluate Pathneck using wide area Internet experiments and trace-driven emulation. In
addition, we present the results of an extensive study on bottlenecks in the Internet using carefully selected, geographically diverse probing sources and destinations. We found that Pathneck can successfully detect bottlenecks for almost 80% of the Internet paths we probed. We also report our success in using the bottleneck location and bandwidth bounds provided by Pathneck to infer bottlenecks and to avoid bottlenecks in multihoming and overlay routing.


This paper is available in Adobe PDF format.	TOP

An Algebraic Approach to Practical and Scalable Overlay Network Monitoring

Yan Chen (Northwestern University), David Bindel (UC Berkeley), Hanhee Song (UC Berkeley) , Randy H. Katz (UC Berkeley)

Abstract:

Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network
with n end hosts, existing systems either require O(n^2) measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Our earlier extended
abstract briefly proposes an algebraic approach that selectively monitors k linearly independent paths that can fully describe all the O(n^2) paths. The loss rates and latency of these k paths can be used to estimate the loss rates and latency of all other paths. Our scheme only assumes knowledge of the underlying IP topology, with links dynamically varying between lossy and normal.

In this paper, we improve, implement and extensively evaluate such a monitoring system. We further make the following contributions: i) scalability analysis indicating that for reasonably large n (e.g., 100), the growth of k is bounded as O(n log n), ii) efficient adaptation algorithms for topology changes, such as the addition or removal of end hosts and routing changes, iii) measurement load balancing schemes, and iv) topology measurement error handling. Both simulation and Internet experiments demonstrate we obtain highly accurate path loss rate
estimation while adapting to topology changes within seconds and handling topology errors.


This paper is available in Adobe PDF format.	TOP

CapProbe: A Simple and Accurate Capacity Estimation Technique

Rohit Kapoor (Qualcomm), Ling-Jyh Chen (UCLA), Li Lao (UCLA), Mario Gerla (UCLA), M. Y. Sanadidi (UCLA)

Abstract:

We present a new capacity estimation technique, called CapProbe. CapProbe combines delay as well as dispersion measurements of packet pairs to filter out samples distorted by cross-traffic. CapProbe algorithms include convergence tests and convergence speed-up techniques by varying probing parameters. Our study of CapProbe includes a probability analysis to determine the time it takes CapProbe to converge on the average. Through simulations and
measurements, we found CapProbe to be quick and accurate across a wide range of traffic scenarios. We also compared CapProbe with two previous well-known techniques, pathchar and pathrate. We found CapProbe to be much more accurate than pathchar and similar in accuracy to pathrate, while providing faster estimation than both. Another advantage of CapProbe is its lower computation cost,since no statistical post processing of probing data is required.


This paper is available in Adobe PDF format.	TOP

Optimizing Cost and Performance for Multihoming

David K. Goldenberg (Yale), Lili Qiu (Microsoft Research), Haiyong Xie (Yale), Yang Richard Yang (Yale), Yin Zhang (AT&T Labs-Research)

Abstract:

Multihoming is often used by large enterprises and stub ISPs to connect to the Internet. In this paper, we design a series of novel smart routing algorithms to optimize cost and performance for multihomed users. We evaluate our algorithms through both analysis and extensive simulations based on realistic charging models, traffic demands, performance data, and network topologies. Our results suggest that these algorithms are very effective in minimizing cost and at the same time improving performance. We further examine the equilibrium performance of smart routing in a global setting and show that a smart routing user can improve its performance without adversely affecting other users.


This paper is available in Adobe PDF format.	TOP

A Comparison of Overlay Routing and Multihoming Route Control

Aditya Akella (CMU), Jeffrey Pang (CMU), Anees Shaikh (IBM Research), Bruce Maggs (CMU), Srinivasan Seshan (CMU)

Abstract:

The limitations of BGP routing in the Internet are often blamed for poor end-to-end performance and prolonged connectivity interruptions. Recent work advocates using overlays to effectively bypass BGP's path selection in order to improve performance and fault tolerance. In this paper, we explore the possibility that intelligent control of BGP routes, coupled with ISP multihoming, can provide competitive end-to-end performance and reliability. Using extensive measurements
of paths between nodes in a large content distribution network, we compare the relative benefits of overlay routing and multihoming route control in terms of round-trip latency, TCP connection throughput, and path availability. We observe that the performance achieved by route control together with multihoming to three ISPs (3-multihoming), is within 5-15% of overlay routing employed in conjunction 3-multihoming, in terms of both end-to-end RTT and throughput. We also show that while multihoming cannot offer the nearly perfect resilience of overlays, it can eliminate almost all failures experienced by a singly-homed end-network. Our results demonstrate that, by leveraging the capability of multihoming route control, it is not necessary to
circumvent BGP routing to extract good wide-area performance and availability from the existing routing system.
Click here for an errata for the paper (August 2004).


This paper is available in Adobe PDF format.	TOP

The Feasibility of Supporting Large-Scale Live Streaming Applications with Dynamic Application End-Points

Kunwadee Sripanidkulchai (CMU), Aditya Ganjam (CMU), Bruce Maggs (CMU), Hui Zhang (CMU)

Abstract:

While application end-point architectures have proven to be viable solutions for large-scale distributed applications such as distributed computing and file-sharing, there is little known about its feasibility for more bandwidth-demanding applications such as live streaming. Heterogeneity in bandwidth resources and dynamic group membership, inherent properties of application end-points, may adversely affect the construction of a usable and efficient overlay. At large scales, the problems become even more challenging. Even more challenging is attempting to do this at very large scales.

In this paper, we study one of the most prominent architectural issues in overlay multicast: the feasibility of supporting large-scale groups using an application end-point architecture. We look at three key requirements for feasibility: (i) are there enough resources to construct an overlay, (ii) can a stable and connected overlay be maintained in the presence of group dynamics, and (iii) can an efficient overlay be constructed? Using traces from a large content delivery network, we characterize the behavior of users watching live audio and video streams. We show that in many common real-world scenarios, all three requirements are satisfied. In addition, we evaluate the performance of several design alternatives and show that simple algorithms have the potential to meet these requirements in practice. Overall, our results argue for the feasibility of supporting large-scale live streaming using an application end-point architecture.


This paper is available in Adobe PDF format.	TOP

Link-level Measurements from an 802.11b Mesh Network

Daniel Aguayo (MIT), John Bicket (MIT), Sanjit Biswas (MIT), Glenn Judd (CMU), Robert Morris (MIT)

Abstract:

This paper analyzes the causes of packet loss in a 38-node urban multi-hop 802.11b network. The patterns and causes of loss are important in the design of routing and errorcorrection protocols, as well as in network planning. The paper makes the following observations. The distribution of inter-node loss rates is relatively uniform over the whole range of loss rates; there is no clear threshold separating “in range” and “out of range.” Most links have relatively stable loss rates from one second to the next, though a small minority have very bursty losses at that time scale. Signal-to-noise ratio and distance have little predictive value for loss rate. The large number of links with intermediate loss rates is probably due to multi-path fading rather than
attenuation or interference. The phenomena discussed here are all well-known. The contributions of this paper are an understanding of their relative importance, of how they interact, and of the implications for MAC and routing protocol design.


This paper is available in Adobe PDF format.	TOP

Comparison of Routing Metrics for Static Multi-Hop Wireless Networks

Richard Draves (Microsoft Research), Jitendra Padhye (Microsoft Research), Brian Zill (Microsoft Research)

Abstract:

Routing protocols for wireless ad hoc networks have traditionally focused on finding paths with minimum hop count. However, such paths can include slow or lossy links, leading to poor throughput. A routing algorithm can select better paths by explicitly taking the quality of the wireless links into account. In this paper, we conduct a detailed, empirical evaluation of the performance of three link-quality metrics---ETX, per-hop RTT, and per-hop packet pair---and compare them against minimum hop count. We study these metrics using a DSR-based routing protocol running over a wireless testbed. We find that the ETX metric has the best performance when all nodes are stationary. We also find that the per-hop RTT and per-hop packet-pair metrics perform poorly due to self-interference. Interestingly, the hop-count metric outperforms all
of the link-quality metrics in a scenario where the sender is mobile.


This paper is available in Adobe PDF format.	TOP

Routing in a Delay Tolerant Network

Sushant Jain (U. Washington), Kevin Fall (Intel Research), Rabin Patra (UC Berkeley)

Abstract:

We formulate the delay-tolerant networking routing problem, where messages are to be moved end-to-end across a connectivity graph that is time-varying but whose dynamics may be known in advance. The problem has the added constraints of finite buffers at each node and the general property that no contemporaneous end-to-end path may ever exist. This situation limits the applicability of traditional routing approaches that tend to treat outages as failures and seek to find an existing end-to-end path.

We propose a framework for evaluating routing algorithms in such environments. We then develop several algorithms and use simulations to compare their performance with respect to
the amount of knowledge they require about network topology. We find that, as expected, the algorithms using the least knowledge tend to perform poorly. We also find that with limited additional knowledge, far less than complete global knowledge, efficient algorithms can be constructed for routing in such environments. To the best of our knowledge this is the first such investigation of routing issues in DTNs.


This paper is available in Adobe PDF format.	TOP

Turning the Postal System into a Generic Digital Communication Mechanism

Randolph Y. Wang (Princeton), Sumeet Sobti (Princeton), Nitin Garg (Princeton), Elisha Ziskind (Princeton), Junwen Lai (Princeton), Arvind Krishnamurthy (Yale)

Abstract:

The phenomenon that rural residents and people with low incomes lag behind in Internet access is known as the ``digital divide.'' This problem is particularly acute in developing countries, where most of the world's population lives. Bridging this digital divide, especially by attempting to increase the accessibility of broadband connectivity,can be challenging. The improvement of wide-area connectivity is constrained by factors such as how quickly we can dig ditches to bury
fibers in the ground; and the cost of furnishing "last-mile" wiring can be prohibitively high.

In this paper, we explore the use of digital storage media transported by the postal system as a general digital communication mechanism. While some companies have used the postal system to deliver software and movies, none of them has turned the postal system into a truly generic digital communication medium supporting a wide variety of applications. We call such a generic
system a Postmanet. Compared to traditional wide-area connectivity options, the Postmanet has several important advantages, including wide global reach, great bandwidth potential and low cost.

Manually preparing mobile storage devices for shipment may appear deceptively simple, but with many applications, communicating parties and messages, manual management becomes infeasible, and systems support at several levels becomes necessary. We explore the simultaneous exploitation of the Internet and the Postmanet, so we can combine their latency and bandwidth advantages to enable sophisticated bandwidth-intensive applications.


This paper is available in Adobe PDF format.	TOP

A System for Authenticated Policy-Compliant Routing

Barath Raghavan (UCSD), Alex C. Snoeren (UCSD)

Abstract:

Internet end users and ISPs alike have little control over how packets are routed outside of their own AS, restricting their ability to achieve levels of performance, reliability, and utility that might
otherwise be attained. While researchers have proposed a number of source-routing techniques to combat this limitation, there has thus far been no way for independent ASes to ensure that such traffic does not circumvent local traffic policies, nor to accurately determine the
correct party to charge for forwarding the traffic.

We present Platypus, an authenticated source routing system built around the concept of network capabilities. Network capabilities allow for accountable, fine-grained path selection by
cryptographically attesting to policy compliance at each hop along a source route. Capabilities can be composed to construct routes through multiple ASes and can be delegated to third parties. Platypus caters to the needs of both end users and ISPs: users gain the ability to pool their resources and select routes other than the default, while ISPs maintain control over where, when, and whose packets traverse their networks. We describe how Platypus can be used to address several well-known issues in wide-area routing at both the edge and the core, and evaluate its performance, security, and interactions with existing protocols. Our results show that incremental deployment of Platypus can achieve immediate gains.


This paper is available in Adobe PDF format.	TOP

SPV: Secure Path Vector Routing for Securing BGP

Yih-Chun Hu (UC Berkeley), Adrian Perrig (CMU), Marvin Sirbu (CMU)

Abstract:

As our economy and critical infrastructure increasingly relies on the Internet, the insecurity of the underlying border gateway routing protocol (BGP) stands out as the Achilles heel. Recent misconfigurations and attacks have demonstrated the brittleness of BGP. Securing BGP has become a priority.

In this paper, we focus on a viable deployment path to secure BGP. We analyze security requirements, and consider tradeoffs of mechanisms that achieve the requirements. In particular, we study how to secure BGP update messages against attacks. We design an efficient cryptographic mechanism that relies only on symmetric cryptographic primitives to guard an ASPATH from alteration, and propose the Secure Path Vector (SPV) protocol. In contrast to the previously proposed S-BGP protocol, SPV is around 22 times faster. With the current effort to secure BGP, we anticipate that SPV will contribute several alternative mechanisms to secure BGP, especially for the case of incremental deployments.


This paper is available in Adobe PDF format.	TOP

Shield: Vulnerability-Driven Network Filters for Preventing Known Vulnerability Exploits

Helen J. Wang (Microsoft Research), Chuanxiong Guo (Microsoft Research), Daniel R. Simon (Microsoft Research), Alf Zugenmaier (Microsoft Research)

Abstract:

Software patching has not been effective as a first-line defense against large-scale worm attacks, even when patches have long been available for their corresponding vulnerabilities. Generally, people have been reluctant to patch their systems immediately, because patches are perceived to be unreliable and disruptive to apply. To address this problem, we propose a first-line worm defense in the network stack, using shields --- vulnerability-specific, exploit-generic network filters installed in end systems once a vulnerability is discovered, but before a patch is applied. These filters examine the incoming or outgoing traffic of vulnerable applications, and drop traffic that exploits vulnerabilities. Shields are less disruptive to install and uninstall, easier to test for bad side effects, and hence more reliable than traditional software patches. Further, shields are resilient to polymorphic or metamorphic variations of exploits.

In this paper, we show that this concept is feasible by describing a prototype Shield framework implementation that filters traffic above the transport layer. We have designed a safe and restrictive language to describe vulnerabilities as partial state machines of the vulnerable application. The expressiveness of the language has been verified by encoding the signatures of several known vulnerabilites.

In our Shield design, we model vulnerability signatures as a combination of partial protocol state machines and vulnerability-parsing instructions for specific payloads. For generality, we abstract out the generic elements of application-level protocols. For flexibility, we express vulnerability signatures and their countermeasures in a safe and restrictive policy language, interpreted by the Shield framework at runtime. We also minimize Shield's maintenance of protocol state (for scalability), and apply defensive design to ensure robustness. We have implemented a Shield prototype and experimented with 10 known vulnerabilities, including the ones behind the (in)famous MSBlast and Slammer worms. Our evaluation provides evidence of Shield's low false positive rate and small impact on application throughput. An examination of a sample set of known vulnerabilities suggests that Shield could be used to prevent exploitation of a substantial fraction of the most dangerous ones.


This paper is available in Adobe PDF format.	TOP

Locating Internet Routing Instabilities

Anja Feldmann (TU Muenchen), Olaf Maennel (TU Muenchen), Z. Morley Mao (U. Michigan), Arthur Berger (MIT/Akamai), Bruce Maggs (CMU/Akamai)

Abstract:

This paper presents a methodology for identifying the autonomous system (or systems) responsible when a routing change is observed and propagated by BGP. The origin of such a routing instability is deduced by examining and correlating BGP updates for many prefixes
gathered at many observation points. Although interpreting BGP updates can be perplexing, we find that we can pinpoint the origin to either a single AS or a session between two ASes in most cases. We verify our methodology in two phases. First, we perform simulations on an AS topology derived from actual BGP updates using routing policies that are compatible with inferred peering/customer/provider relationships. In these simulations, in which network and router behavior are ``ideal'', we inject inter-AS link failures and demonstrate that our methodology can effectively identify most origins of instability. We then develop several heuristics to cope with the
limitations of the actual BGP update propagation process and monitoring infrastructure, and apply our methodology and evaluation techniques to actual BGP updates gathered at hundreds of observation points. This approach of relying on data from BGP simulations as well as from measurements enables us to evaluate the inference quality achieved by our approach under ideal situations and how it is correlated with the actual quality and the number of observation points.


This paper is available in Adobe PDF format.	TOP

Diagnosing Network-Wide Traffic Anomalies

Anukool Lakhina (Boston University), Mark Crovella (Boston University), Christophe Diot (Intel Research)

Abstract:

Anomalies are unusual and significant changes in a network's traffic levels, which can often span multiple links. Diagnosing anomalies is critical for both network operators and end users. It is a difficult problem because one must extract and interpret anomalous patterns from large amounts of high-dimensional, noisy data.

In this paper we propose a general method to diagnose anomalies. This method is based on a separation of the high-dimensional space occupied by a set of network traffic measurements into disjoint subspaces corresponding to normal and anomalous network conditions. We show that this separation can be performed effectively using Principal Component Analysis.

Using only simple traffic measurements from links, we study volume anomalies and show that the method can: (1) accurately detect when a volume anomaly is occurring; (2) correctly identify the underlying origin-destination (OD) flow which is the source of the anomaly; and (3) accurately estimate the amount of traffic involved in the anomalous OD flow.

We evaluate the method's ability to diagnose (\emph{i.e.,} detect, identify, and quantify) both existing and synthetically injected volume anomalies in real traffic from two backbone networks. Our method consistently diagnoses the largest volume anomalies, and does so with a very low false alarm rate.


This paper is available in Adobe PDF format.	TOP

Network Sensitivity to Hot-Potato Disruptions

Renata Teixeira (UCSD), Aman Shaikh (AT&T Labs--Research), Tim Griffin (Intel Research), Geoffrey Voelker (UCSD)

Abstract:

Hot-potato routing is a mechanism employed when there are multiple (equally good) interdomain routes available for a given destination. In this scenario, the Border Gateway Protocol (BGP) selects the interdomain route associated with the closest egress point based upon intradomain path costs. Consequently, intradomain routing changes can impact interdomain routing and cause abrupt swings of external routes, which we call hot-potato disruptions. Recent work has shown that hot-potato disruptions can have a substantial impact on large ISP backbones and thereby jeopardize the network robustness. As a result, there is a need for guidelines and tools to assist in the design of networks that minimize hot-potato disruptions. However, developing these tools is challenging due to the complex and subtle nature of the interactions between exterior and interior routing. In this paper, we address these challenges using an analytic model of hot-potato routing that incorporates metrics to evaluate network sensitivity to hot-potato disruptions. We then present a methodology for computing these metrics using measurements of real ISP networks. We demonstrate the utility of our model by analyzing the sensitivity of AT&T's backbone network.


This paper is available in Adobe PDF format.	TOP

Building a Better NetFlow

Cristian Estan (UCSD), Ken Keys (CAIDA), David Moore (CAIDA/UCSD), George Varghese (UCSD)

Abstract:

Network operators need to determine the composition of the traffic mix on links when looking for dominant applications, users, or estimating traffic matrices. Cisco’s NetFlow has evolved into a solution that satisfies this need by reporting flow records that summarize a sample of the traffic traversing the link. But sampled NetFlow has shortcomings that hinder the collection and analysis of traffic data. First, during flooding attacks router memory and network bandwidth consumed by flow records can increase beyond what is available; second, selecting the right static sampling rate is difficult because no single rate gives the right tradeoff of memory use versus accuracy for all traffic mixes; third, the heuristics routers use to decide when a flow is reported are a poor match to most applications that work with time bins; finally, it is impossible to estimate without bias the number of active flows for aggregates with non-TCP traffic.

In this paper we propose Adaptive NetFlow, deployable through an update to router software, which addresses many shortcomings of NetFlow by dynamically adapting the sampling rate to achieve robustness without sacrificing accuracy. To enable counting of non-TCP flows, we propose an optional Flow Counting Extension that requires augmenting existing hardware at routers. Both our proposed solutions readily provide descriptions of the traffic of progressively
smaller sizes. Transmitting these at progressively higher levels of reliability allows graceful degradation of the accuracy of traffic reports in response to network congestion on the reporting path.


This paper is available in Adobe PDF format.	TOP

Work-Convserving Distributed Schedulers for Terabit Routers

Prashanth Pappu (Washington University), Jonathan Turner (Washington University), Ken Wong (Washington University)

Abstract:

Buffered multistage interconnection networks offer one of the most scalable and cost-effective approaches to building high capacity routers. Unfortunately, the performance of such systems has been difficult to predict in the presence of the extreme traffic conditions that can arise in the Internet. Recent work introduced distributed scheduling, to regulate the flow of traffic in such systems. This work demonstrated, using simulation and experimental measurements, that distributed scheduling can deliver robust performance for extreme traffic. Here, we show that distributed schedulers can be provably work-conserving for speedups of 2 or more. Two of the three schedulers we describe were inspired by previously published crossbar schedulers. The third has no direct counterpart in crossbar scheduling. In our analysis, we show that distributed schedulers based on blocking flows in small-depth acyclic flow graphs can be work-conserving, just as certain crossbar schedulers based on maximal bipartite matchings have been shown to be work-conserving. We also study the performance of practical variants of these schedulers when the speedup is less than 2, using simulation.


This paper is available in Adobe PDF format.	TOP

Exact GPS Simulation with Logarithmic Complexity, and its Applications to an Optimally Fair Scheduler

Paolo Valente (U. Pisa)

Abstract:

Generalized Processor Sharing (GPS) is a fluid scheduling policy providing perfect fairness. The minimum deviation (lead/lag) with respect to the GPS service achievable by a packet scheduler is one packet size. To the best of our knowledge, the only packet scheduler guaranteeing such minimum deviation is Worst-case Fair Weighted Fair Queueing (WF2Q), that requires on-line GPS simulation. Existing algorithms to perform GPS simulation have O(N) complexity per packet transmission (Nbeing the number of competing flows). Hence WF2Q has been charged for O(N) complexity too. Schedulers with lower complexity have been devised, but at the price of at least O(N) deviation from the GPS service, which has been shown to be detrimental for real-time adaptive applications and feedback based applications. Furthermore, it has been proven that the lower bound complexity to guarantee O(1) deviation is Omega(log N), yet a scheduler achieving such result has remained elusive so far.

In this paper we present an algorithm that performs exact GPS simulation with O(log N) worst-case complexity and small constants. As such it improves the complexity of all the packet schedulers based on GPS simulation. In particular, using our algorithm within WF2Q, we achieve the minimum deviation from the GPS service with O(log N) complexity, thus matching the aforementioned lower bound. Furthermore, we assess the effectiveness of the proposed solution by simulating real-world scenarios.


This paper is available in Adobe PDF format.	TOP

Sizing Router Buffers

Guido Appenzeller (Stanford), Isaac Keslassy (Stanford), Nick McKeown (Stanford)

Abstract:

All Internet routers contain buffers to hold packets during times of congestion. Today, the size of the buffers is determined by the dynamics of TCP's congestion control algorithm. In particular,
the goal is to make sure that when a link is congested, it is busy 100% of the time; which is equivalent to making sure its buffer never goes empty. A widely used rule-of-thumb states that each link needs a buffer of size B = RTT X C, where RTT is the average round-trip time of a flow passing across the link, and C is the data rate of the link. For example, a 10Gb/s router linecard needs approximately 250ms X 10Gb/s = 2.5Gbits of buffers; and the amount of buffering grows linearly with the line-rate. Such large buffers are challenging for router manufacturers, who must use large, slow, off-chip DRAMs. And queueing delays can be long, have high variance, and may destabilize the congestion control algorithms.

In this paper we argue that the rule-of-thumb B = RTT X C is now outdated and incorrect for backbone routers. This is because of the large number of flows (TCP connections) multiplexed together on a single backbone link. Using theory, simulation and experiments on a network of real routers, we show that a link with n flows requires no more than B =(RTT X C) / sqrt{n}, for long-lived or short-lived TCP flows. The consequences on router design are enormous: A 2.5Gb/s link carrying 10,000 flows could reduce its buffers by 99% with negligible difference in throughput; and a 10Gb/s link carrying 50,000 flows requires only 10Mbits of buffering, which can easily be implemented using fast, on-chip SRAM.


This paper is available in Adobe PDF format.	TOP

A Wavelet-Based Approach to Detect Shared Congestion

Min Sik Kim (UT Austin), Taekhyun Kim (UT Austin), YongJune Shin (UT Austin), Simon S. Lam (UT Austin), Edward J. Powers (UT Austin)

Abstract:

Per-flow congestion control helps endpoints fairly and efficiently share network resources. Better utilization of network resources can be achieved, however, if congestion management algorithms can determine when two different flows share a congested link. Such knowledge can be used to implement cooperative congestion control or improve the overlay topology of a P2P system. Previous techniques to detect shared congestion either assume a common source or destination node, drop-tail queueing, or a single point of congestion. We propose in this paper a novel technique, applicable to any pair of paths on the Internet, without such limitations. Our technique employs a signal processing method, wavelet denoising, to separate queueing delay caused by network congestion from various other delay variations. Our wavelet-based technique is evaluated through both simulations and Internet experiments. We show that, when detecting shared congestion of paths with a common endpoint, our technique provides faster convergence and higher accuracy while using fewer packets than previous techniques, and that it also accurately determines when there is no shared congestion. Furthermore, we show that our technique is robust and accurate for paths without a common endpoint or synchronized clocks; more specifically, it can tolerate a synchronization offset of up to one second between two packet flows.


This paper is available in Adobe PDF format.	TOP

Delayed Stability and Performance of Distributed Congestion Control

Yueping Zhang (Texas A&M), Seong-Ryong Kang (Texas A&M), Dmitri Loguinov (Texas A&M)

Abstract:

Recent research efforts to design better Internet transport protocols combined with scalable Active Queue Management (AQM) have led to significant advances in congestion control. One of the hottest topics in this area is the design of discrete congestion control algorithms that are asymptotically stable under heterogeneous feedback delay and whose control equations do not explicitly depend on the RTTs of end-flows. In this paper, we show that max-min fair congestion control methods with a stable symmetric Jacobian remain stable under arbitrary feedback delay (including heterogeneous directional delays) and that the stability condition of such methods does not involve any of the delays. To demonstrate the practicality of the obtained result, we
change the original controller in Kelly's work [14] to become robust under random feedback delay and fixed constants of the control equation. We call the resulting framework Max-min Kelly Control (MKC) and show that it offers smooth sending rate, exponential convergence to efficiency, and fast convergence to fairness, all of which make it appealing for future high-speed networks.


This paper is available in Adobe PDF format.	TOP

Impact of Configuration Errors on DNS Robustness

Vasileios Pappas (UCLA), Zhiguo Xu (UCLA), Songwu Lu (UCLA), Daniel Massey (Colorado State), Andreas Terzis (Johns Hopkins), Lixia Zhang (UCLA)

Abstract:

During the past twenty years the Domain Name System (DNS) has sustained phenomenal growth while maintaining satisfactory performance. However, the original design focused mainly on system robustness against physical failures, and neglected the impact of operational errors such as misconfigurations. Our recent measurement effort revealed three specific types of misconfigurations in DNS today: lame delegation, diminished server redundancy, and cyclic zone dependency. Zones with configuration errors suffer from reduced availability and increased query delays up to an order of magnitude. Furthermore, while the original DNS design assumed that
redundant DNS servers fail independently, our measurements show that operational choices made at individual zones can severely affect the availability of other zones. We found that, left unchecked, DNS configuration errors are widespread, with lame delegation affecting 15% of the DNS zones, diminished server redundancy being even more prevalent, andcyclic dependency appearing in 2% of the zones. We also noted that the degrees of misconfiguration vary from zone to zone, with most popular zones having the lowest percentage of errors. Our results indicate that DNS, as well as any other truly robust large-scale system, must include systematic checking mechanisms to cope with operational errors.


This paper is available in Adobe PDF format.	TOP

The Design and Implementation of a Next Generation Name Service for the Internet

Venugopalan Ramasubramanian (Cornell), Emin Gun Sirer (Cornell)

Abstract:

Name services are critical for mapping logical resource names to physical resources in large-scale distributed systems. The Domain Name System (DNS) used on the Internet, however, is slow, vulnerable to denial of service attacks, and does not support fast updates. These problems stem fundamentally from the structure of the legacy DNS.

This paper describes the design and implementation of the Cooperative Domain Name System (CoDoNS), a novel name service, which provides high lookup performance through proactive caching, resilience to denial of service attacks through automatic load-balancing, and fast propagation of updates. CoDoNS derives its scalability, decentralization, self-organization, and failure resilience from peer-to-peer overlays, while it achieves high performance using the Beehive replication framework. Cryptographic delegation, instead of host-based physical delegation, limits potential malfeasance by namespace operators and creates a competitive market for namespace management. Backwards compatibility with existing protocols and wire formats enables CoDoNS to serve as a backup for legacy DNS, as well as a complete replacement. Performance measurements from a real-life deployment of the system in PlanetLab shows that CoDoNS provides fast lookups, automatically reconfigures around faults without manual involvement and thwarts distributed denial of service attacks by promptly redistributing load across nodes.


This paper is available in Adobe PDF format.	TOP

A Layered Naming Architecture for the Internet

Hari Balakrishnan (MIT), Karthik Lakshminarayanan (UC Berkeley), Sylvia Ratnasamy (Intel Research), Scott Shenker (ICIR & UC Berkeley), Ion Stoica (UC Berkeley), Michael Walfish (MIT)

Abstract:

Currently the Internet has only one level of name resolution, DNS, which converts user-level domain names into IP addresses. In this paper we borrow liberally from the literature to argue that there should be three levels of name resolution: from user-level descriptors to service
identifiers; from service identifiers to endpoint identifiers; and from endpoint identifiers to IP addresses. These additional levels of naming and resolution (1) allow services and data to be first class Internet objects (in that they can be directly and persistently named), (2) seamlessly
accommodate mobility and multi-homing and (3) integrate middleboxes (such as NATs and firewalls) into the Internet architecture. We further argue that flat names are a natural choice for the service and endpoint identifiers. Hence, this architecture requires scalable resolution of flat names, a capability that distributed hash tables (DHTs) can provide.


This paper is available in Adobe PDF format.	TOP

Mercury: Supporting Scalable Multi-Attribute Range Queries

Ashwin R. Bharambe (CMU), Mukesh Agrawal (CMU), Srinivasan Seshan (CMU)

Abstract:

This paper presents the design of Mercury, a scalable protocol for supporting multi-attribute range-based searches. Mercury differs from previous range-based query systems in that it supports multiple attributes as well as performs explicit load balancing. To guarantee efficient routing and load balancing, Mercury uses novel light-weight sampling mechanisms for uniformly sampling random nodes in a highly dynamic overlay network. Our evaluation shows that Mercury is able to achieve its goals of logarithmic-hop routing and near-uniform load balancing.

We also show that Mercury can be used to solve a key problem for an important class of distributed applications: distributed state maintenance for distributed games. We show that the Mercury-based solution is easy to use, and that it reduces the game's messaging overheard significantly compared to a naive approach.


This paper is available in Adobe PDF format.	TOP

Modeling and Performance Analysis of Bit Torrent-Like Peer-to-Peer Networks

Dongyu Qiu (U. Illinois at Urbana-Champaign), R. Srikant (U. Illinois at Urbana-Champaign)

Abstract:

In this paper, we develop simple models to study the performance of BitTorrent, a second generation peer-to-peer (P2P) application. We first present a simple fluid model and study the scalability, performance and efficiency of such a file-sharing mechanism. We then consider the built-in incentive mechanism of BitTorrent and study its effect on network performance. We also provide numerical results based on both simulations and real traces obtained from the Internet.


This paper is available in Adobe PDF format.	TOP

A Scalable Distributed Information Management System

Praveen Yalagandula (UT Austin), Mike Dahlin (UT Austin)

Abstract:

We present a Scalable Distributed Information Management System (SDIMS) that aggregates information about large-scale networked systems and that can serve as a basic building block for a broad range of large-scale distributed applications by providing detailed views of nearby information and summary views of global information. To serve as a basic building block, a SDIMS should have four properties: scalability to many nodes and attributes, flexibility to accommodate a broad range of applications, administrative isolation for security and availability, and robustness to node and network failures. We design, implement and evaluate a SDIMS that (1) leverages Distributed Hash Tables (DHT) to create scalable aggregation trees, (2) provides
flexibility through a simple API that lets applications control propagation of reads and writes, (3) provides administrative isolation through simple extensions to current DHT algorithms, and (4) achieves robustness to node and network reconfigurations through lazy reaggregation, on-demand reaggregation, and tunable spatial replication. Through extensive simulations and micro-benchmark experiments, we observe that our system is an order of magnitude more scalable than existing approaches, achieves isolation properties at the cost of modestly increased read latency in comparison to flat DHTs, and gracefully handles failures.


This paper is available in Adobe PDF format.	TOP

Last Modified: February 11, 2004