ACM SIGCOMM 2019, Beijing, China

ACM SIGCOMM 2019 Thursday Program

  • Thursday, August 22, 2019

  • 8:30 am - 7:00 pm Registration Desk

    Location: Shangri-La Hotel Lobby

  • Drink and Snack throughout the entire day (outside of conference rooms)

  • 9:00 am - 10:15 am Technical Session 6: New Ways to Solve Old Problems

    Location: Valley Wing Ballroom, 2nd floor
    Session Chair: Junchen Jiang

  • Leveraging Quantum Annealing for Large MIMO Processing in Centralized Radio Access Networks

    Minsung Kim (Princeton University), Davide Venturelli (USRA Research Institute for Advanced Computer Science), Kyle Jamieson (Princeton Univeristy)

    • Abstract: User demand for increasing amounts of wireless capacity continues to outpace supply, and so to meet this demand, significant progress has been made in new MIMO wireless physical layer techniques. Higher-performance systems now remain impractical largely only because their algorithms are extremely computationally demanding. For optimal performance, an amount of computation that increases at an exponential rate both with the number of users and with the data rate of each user is often required. The base station’s computational capacity is thus becoming one of the key limiting factors on wireless capacity. QuAMax is the first large MIMO centralized radio access network design to address this issue by leveraging quantum annealing on the problem. We have implemented QuAMax on the 2,031 qubit D-Wave 2000Q quantum annealer, the state-of-the-art in the field. Our experimental results evaluate that implementation on real and synthetic MIMO channel traces, showing that 10 µs of compute time on the 2000Q can enable 48 user, 48 AP antenna BPSK communication at 20 dB SNR with a bit error rate of 10−6 and a 1,500 byte frame error rate of 10−4.


  • Neural Packet Classification

    Eric Liang (UC Berkeley), Hang Zhu, Xin Jin (Johns Hopkins University), Ion Stoica (UC Berkeley)

    • Abstract: Packet classification is a fundamental problem in computer networking. This problem exposes a hard tradeoff between the computation and state complexity, which makes it particularly challenging. To navigate this tradeoff, existing solutions rely on complex hand-tuned heuristics, which are brittle and hard to optimize. In this paper, we propose a deep reinforcement learning (RL) approach to solve the packet classification problem. There are several characteristics that make this problem a good fit for Deep RL. First, many of the existing solutions are iteratively building a decision tree by splitting nodes in the tree. Second, the effects of these actions (e.g., splitting nodes) can only be evaluated once we are done with building the tree. These two characteristics are naturally captured by the ability of RL to take actions that have sparse and delayed rewards. Third, it is computationally efficient to generate data traces and evaluate decision trees, which alleviate the notoriously high sample complexity problem of Deep RL algorithms. Our solution, NeuroCuts, uses succinct representations to encode state and action space, and efficiently explore candidate decision trees to optimize for a global objective. It produces compact decision trees optimized for a specific set of rules and a given performance metric, such as classification time, memory footprint, or a combination of the two. Evaluation on ClassBench shows that NeuroCuts outperforms existing hand-crafted algorithms in classification time by 18% at the median, and reduces both time and memory footprint by up to 3×.


  • Learning Scheduling Algorithms for Data Processing Clusters

    Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan (MIT CSAIL), Zili Meng (Tsinghua University), Mohammad Alizadeh (MIT CSAIL)

    • Abstract: Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems use simple, generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly-efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective, such as minimizing average job completion time. However, off-the-shelf RL techniques cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves average job completion time by at least 21% over hand-tuned scheduling heuristics, achieving up to 2x improvement during periods of high cluster load.


  • 10:15 am - 10:45 am Tea/Coffee Break

  • 10:45 am - 12:00 pm Technical Session 7: Applications

    Location: Valley Wing Ballroom, 2nd floor
    Session Chair: Bruce Maggs

  • E2E: Embracing User Heterogeneity to ImproveQuality of Experience on the Web

    Xu Zhang (The University of Chicago), Siddhartha Sen (Microsoft Research), Daniar Kurniawan, Haryadi Gunawi, Junchen Jiang (The University of Chicago)

    • Abstract: Conventional wisdom states that to improve quality of experience (QoE), web service providers should reduce the median or other percentiles of server-side delays. This work shows that doing so can be inefficient due to user heterogeneity in how the delays impact QoE. From the of QoE, the sensitivity of a request to delays can vary greatly even among identical requests arriving at the service, because they differ in the wide-area network latency experienced prior to arriving at the service. In other words, saving 50ms of server-side delay affects different users differently. This paper presents E2E, the first resource allocation system that embraces user heterogeneity to allocate server-side resources in a QoE-aware manner. Exploiting this heterogeneity faces a unique challenge: unlike other application-level properties of a web request (e.g., a user's subscription type), the QoE sensitivity of a request to server-side delays cannot be pre-determined, as it depends on the delays themselves, which are determined by the resource allocation decisions and the incoming requests. This circular dependence makes the problem computationally difficult. We make three contributions: (1) a case for exploiting user heterogeneity to improve QoE, based on end-to-end traces from Microsoft's cloud-scale production web framework, as well as a user study on Amazon MTurk; (2) a novel resource allocation policy that addresses the circular dependence mentioned above; and (3) an efficient system implementation with almost negligible overhead. We applied E2E to two open-source systems: replica selection in Cassandra and message scheduling in RabbitMQ. Using traces and our testbed deployments, we show that E2E can increase QoE (e.g., duration of user engagement) by 28%, or serve 40% more concurrent requests without any drop in QoE.


  • Graphene: Efficient Interactive Set Reconciliation Applied to Blockchain Propagation

    A. Pinar Ozisik, Brian N. Levine, George Bissias (University of Massachusetts Amherst), Gavin Andresen, Darren Tapp (, Sunny Katkuri (University of Massachusetts Amherst)

    • Abstract: We introduce Graphene, a method and protocol for interactive set reconciliation among peers in blockchains and related distributed systems. Through the novel combination of a Bloom filter and an Invertible Bloom Lookup Table (IBLT), Graphene uses a fraction of the network bandwidth used by deployed work for one- and two-way synchronization. We show that, for this specific problem, Graphene is more efficient at reconciling n items than using a Bloom filter at the information theoretic bound. We contribute a fast and implementation-independent algorithm for parameterizing an IBLT so that it is optimally small in size and meets a desired decode rate with arbitrarily high probability. We characterize our performance improvements through analysis, detailed simulation, and deployment results for Bitcoin Cash, a prominent cryptocurrency. Our implementations of Graphene, IBLTs, and our IBLT optimization algorithm are all open-source code.


  • Offloading Distributed Applications onto SmartNICs using iPipe

    Ming Liu, Tianyi Cui, Henry N. Schuh, Arvind Krishnamurthy (University of Washington), Simon Peter (The University of Texas at Austin), Karan Gupta (Nutanix)

    • Abstract: Emerging Multicore SoC SmartNICs, enclosing rich computing, resources (e.g., a multicore processor, onboard DRAM, accelerators, programmable DMA engines), hold the potential to offload generic datacenter server tasks. However, it is unclear how to use a SmartNIC efficiently and maximize the offloading benefits, especially for distributed applications. Towards this end, we characterize four commodity SmartNICs and summarize the offloading performance implications from four perspectives: traffic control, computing capability, onboard memory, and host communication. Based on our characterization, we build iPipe, an actor-based framework for offloading distributed applications onto SmartNICs. At the core of iPipe is a hybrid scheduler, combining FCFS and DRR-based processor sharing, which can tolerate tasks with variable execution costs and maximize NIC compute utilization. Using iPipe, we build a real-time data analytics engine, a distributed transaction system, and a replicated key-value store, and evaluate them on commodity SmartNICs. Our evaluations show that when processing 10/25Gbps of application bandwidth, NIC-side offloading can save up to 3.1/2.2 beefy Intel cores and lower application latencies by 23.0/28.0 μs.


  • 11:30 am - 1:30 pm Lunch

    Location: Garden Wing Ballroom, 1st floor and Valley Wing Jade Room, 3rd floor

  • 1:30 pm - 2:45 pm Technical Session 8: NICs and Switches

    Location: Valley Wing Ballroom, 2nd floor
    Session Chair: Minlan Yu

  • NitroSketch: Robust and General Sketch-based Monitoring in Software Switches

    Zaoxing Liu (Carnegie Mellon University), Ran Ben Basat (Harvard University), Gil Einziger (Ben-Gurion University of the Negev), Yaron Kassner (Technion), Vladimir Braverman (Johns Hopkins University), Roy Friedman (Technion), Vyas Sekar (Carnegie Mellon University)

    • Abstract: Software switches are emerging as a vital measurement vantage point in many networked systems. Sketching algorithms or sketches, provide high-fidelity approximate measurements, and appear as a promising alternative to traditional approaches such as packet sampling. However, sketches incur significant computation overhead in software switches. Existing efforts in implementing sketches in virtual switches make sacrifices on one or more of the following dimensions: performance (handling 40 Gbps line-rate packet throughput with low CPU footprint), robustness (accuracy guarantees across diverse workloads), and generality (supporting various measurement tasks). In this work, we present the design and implementation of NitroSketch, a sketching framework that systematically addresses the performance bottlenecks of sketches without sacrificing robustness and generality. Our key contribution is the careful synthesis of rigorous, yet practical solutions to reduce the number of per-packetCPU and memory operations. We implement NitroSketch on three popular software platforms (Open vSwitch-DPDK,, and BESS) and evaluate the performance. We show that accuracy is comparable to unmodified sketches while attaining up to two orders of magnitude speedup, and up to 45% reduction in CPU usage.


  • PicNIC: Predictable Virtualized NIC

    Praveen Kumar (Cornell University), Nandita Dukkipati, Nathan Lewis, Yi Cui, Yaogong Wang, Chonggang Li, Valas Valancius, Jake Adriaens, Steve Gribble (Google), Nate Foster (Cornell University), Amin Vahdat (Google)

    • Abstract: Network virtualization stacks are the linchpins of public clouds. A key goal is to provide performance isolation so that workloads on one Virtual Machine (VM) do not adversely impact the network experience of another VM. Using data from a major public cloud provider, we systematically characterize how performance isolation can break in current virtualization stacks and find a fundamental tradeoff between isolation and resource multiplexing for efficiency. In order to provide predictable performance, we propose a new system called PicNIC that shares resources efficiently in the common case while rapidly reacting to ensure isolation. PicNIC builds on three constructs to quickly detect isolation breakdown and to enforce it when necessary: CPU-fair weighted fair queues at receivers, receiver-driven congestion control for backpressure, and sender-side admission control with shaping. Based on an extensive evaluation, we show that this combination ensures isolation for VMs at sub-millisecond timescales with negligible overhead.


  • Fast, Scalable, and Programmable Packet Scheduler in Hardware

    Vishal Shrivastav (Cornell University)

    • Abstract: With increasing link speeds and slowdown in the scaling of CPU speeds, packet scheduling in software is resulting in lower precision and higher CPU utilization. By offloading packet scheduling to the hardware such as a NIC, one can potentially overcome these drawbacks. However, to retain the flexibility of software packet schedulers, packet scheduler in hardware must be programmable, while also being fast and scalable. State-of-the-art packet schedulers in hardware either compromise on scalability (Push-In-First-Out (PIFO)) or the ability to express a wide range of packet scheduling algorithms (First-In-First-Out (FIFO)). Further, even a general scheduling primitive like PIFO is not expressive enough to express certain key classes of packet scheduling algorithms. Hence in this paper, we propose a generalization of the PIFO primitive, called Push-In-Extract-Out (PIEO), which like PIFO, maintains an ordered list of elements, but unlike PIFO which only allows dequeue from the head of the list, PIEO allows dequeue from arbitrary positions in the list by supporting a programmable predicate-based filtering at dequeue. Next, we present a fast and scalable hardware design of PIEO scheduler and prototype it on a FPGA. Overall, PIEO scheduler is both more expressive and over 30x more scalable than PIFO.


  • 2:45 pm - 3:15 pm Tea/Coffee Break

  • 3:15 pm - 4:30 pm Technical Session 9: Video

    Location: Valley Wing Ballroom, 2nd floor
    Session Chair: Zafar Ayyub Qazi

  • Vantage: Optimizing video upload for time-shifted viewing of social live streams

    Devdeep Ray, Jack Kosaian, K. V. Rashmi, Srinivasan Seshan (Carnegie Mellon University)

    • Abstract: Social live video streaming (SLVS) applications are becoming increasingly popular with the rise of platforms such as Facebook-Live, YouTube-Live, Twitch and Periscope. A key characteristic that differentiates this new class of applications from traditional live streaming is that these live streams are watched by viewers at different delays; while some viewers watch a live stream in real-time, others view the content in a time-shifted manner at different delays. In the presence of variability in the upload bandwidth, which is typical in mobile environments, existing solutions silo viewers into either receiving low latency video at a lower quality or a higher quality video with a significant delay penalty, without accounting for the presence of diverse time-shifted viewers. In this paper, we present Vantage, a live-streaming upload solution that improves the overall quality of experience for diverse time-shifted viewers by using selective quality-enhancing retransmissions in addition to real-time frames, optimizing the encoding schedules to balance the allocation of the available bandwidth be-tween the two. Our evaluation using real-world mobile network traces shows that Vantage can provide high quality simultaneously for both low-latency and delayed viewing. For delayed viewing, Vantage achieves an average improvement of 19.9% over real-time optimized video streaming techniques across all the network traces and test videos, with observed gains of up to 42.9%. These benefits come at the cost of an average drop in real-time quality of 3.3%, with a maximum drop of 7.1%. This represents a significant performance improvement over current techniques used for SLVS applications, which primarily optimize the video upload for real-time viewing.


  • Pano: Optimizing 360 Video Streaming with a Better Understanding of Quality Perception

    Yu Guan, Chengyuan Zheng, Xinggong Zhang, Zongming Guo (Peking University), Junchen Jiang (The University of Chicago)

    • Abstract: 360° video streaming has gained tremendous popularity, but achieving high video quality requires much more bandwidth than non-360° videos. The reason is that current solutions assume users perceive the quality of 360° videos in the same way they perceive that of non-360° videos. However, our analysis reveals several quality determining factors unique to 360° videos, including the moving speed of user's viewpoint (center of user's field of view), the recent change of video luminance, and the difference in depth-of-fields of visual objects close to the viewpoint. This paper presents Pano, a 360° video streaming system that leverages these 360° video-specific factors. We make three technical contributions. (1) We build a 360° video quality perception model to systematically capture the impact of these 360° video-specific factors. (2) Pano replaces the grid-based spatial tiling by a variable sized tiling scheme, which strikes a better balance between the perceived quality and video encoding efficiency. (3) Pano uses a new quality-adaptation scheme that maximizes the user-perceived quality and is readily deployable in the current video delivery architecture. Our evaluation (based on user study and trace analysis) shows that compared with state-of-the-art techniques, Pano can save 41-46% bandwidth without any drop in the perceived quality, or it can raise the perceived quality (user rating) by 25%-142% without using more bandwidth.


  • End-to-End Transport for Video QOE Fairness

    Vikram Nathan, Vibhaalakshmi Sivaraman, Ravichandra Addanki, Mehrdad Khani, Prateesh Goyal, Mohammad Alizadeh (MIT CSAIL)

    • Abstract: The growth of video traffic makes it increasingly likely that multiple clients share a bottleneck link, giving video content providers an opportunity to optimize the experience of multiple users jointly. But today’s transport protocols are oblivious to video streaming applications and provide only connection-level fairness. We design and build Minerva, the first end-to-end transport protocol for multi-user video streaming. Minerva uses information about the player state and video characteristics to adjust its congestion control behavior to optimize for QoE fairness. Minerva clients receive no explicit information about other video clients, yet when multiple of them share a bottleneck link, their rates converge to a bandwidth allocation that maximizes QoE fairness. At the same time, Minerva videos occupy only their fair share of the bottleneck link bandwidth, competing fairly with existing TCP traffic. We implement Minerva on an industry standard video player and server and show that, compared to Cubic and BBR, 15-32% of the videos using Minerva experience an improvement in viewing experience equivalent to switching from 720p to 1080p.


  • 4:30 pm - 5:45 pm Technical Session 10: New Control Plane Operations

    Location: Valley Wing Ballroom, 2nd floor
    Session Chair: Dan Li

  • Towards Highly Available Clos-Based WAN Routers

    Sucha Supittayapornpong, Barath Raghavan, Ramesh Govindan (University of Southern California)

    • Abstract: The performance and availability of cloud and content providers often depends on the wide area networks (WANs) they use to interconnect their datacenters. WAN routers, which connect to each other using trunks (bundles of links), are sometimes built using an internal Clos topology connecting merchant-silicon switches. As such, these routers are susceptible to internal link and switch failures, resulting in reduced capacity and low availability. Based on the observation that today's WAN routers use relatively simple trunk wiring and routing techniques, we explore the design of novel wiring and more sophisticated routing techniques to increase failure resilience. Specifically, we describe techniques to 1) optimize trunk wiring to increase effective internal router capacity so as to be resilient to internal failures, 2) compute the effective capacity under different failure patterns, and 3) use these to compute compact routing tables under different failure patterns, since switches have limited routing table sizes. Our evaluations show that our approach can mask failures of up to 75% of switches in some cases without exceeding routing table limits, whereas competing techniques can sometimes lose half of a WAN router's capacity with a single failure.


  • On Optimal Neighbor Discovery

    Philipp H. Kindt, Samarjit Chakraborty (Technical University of Munich (TUM))

    • Abstract: Mobile devices apply neighbor discovery (ND) protocols to wirelessly initiate a first contact within the shortest possible amount of time and with minimal energy consumption. For this purpose, over the last decade, a vast number of ND protocols have been proposed, which have progressively reduced the relation between the time within which discovery is guaranteed and the energy consumption. In spite of the simplicity of the problem statement, even after more than 10 years of research on this specific topic, new solutions are still proposed even today. Despite the large number of known ND protocols, given an energy budget, what is the best achievable latency still remains unclear. This paper addresses this question and for the first time presents safe and tight, duty-cycle-dependent bounds on the worst-case discovery latency that no ND protocol can beat. Surprisingly, several existing protocols are indeed optimal, which has not been known until now. We conclude that there is no further potential to improve the relation between latency and duty-cycle, but future ND protocols can improve their robustness against beacon collisions.


  • Elmo: Source Routed Multicast for Public Clouds

    Muhammad Shahbaz (Stanford University), Lalith Suresh (VMware), Jennifer Rexford (Princeton University), Nick Feamster (Princeton University), Ori Rottenstreich (Technion), Mukesh Hira (VMware)

    • Abstract: We present Elmo, a system that addresses the multicast scalability problem in multi-tenant data centers. Modern cloud applications frequently exhibit one-to-many communication patterns and, at the same time, require sub-millisecond latencies and high throughput. IP multicast can achieve these requirements but has control- and data-plane scalability limitations that make it challenging to offer it as a service for hundreds of thousands of tenants, typical of cloud environments. Tenants, therefore, must rely on unicast-based approaches (e.g., application-layer or overlay-based) to support multicast in their applications, imposing bandwidth and end-host CPU overheads, with higher and unpredictable latencies. Elmo scales network multicast by taking advantage of emerging programmable switches and the unique characteristics of data-center networks; specifically, the hypervisor switches, symmetric topology, and short paths in a data center. Elmo encodes multicast group information inside packets themselves, reducing the need to store the same information in network switches. In a three-tier data-center topology with 27,000 hosts, Elmo supports a million multicast groups using an average packet-header size of 114 bytes, requiring as few as 1,100 multicast group-table entries on average in leaf switches, and having a traffic overhead as low as 5% over ideal multicast.


  • 5:45 pm - 6:05 pm Closing

    Yong Cui (Tsinghua University, China)

  • 6:30 pm - 9:00 pm N2Women Dinner

    Location: Garden Wing Ballroom, 1st floor

The final program may be adjusted.