ACM SIGCOMM 2017, Los Angeles, CA

ACM SIGCOMM 2017 Workshop on Kernel-Bypass Networks (KBNets’17)

Workshop Program

  • Monday, August 21, 2017, Laureate Room (Luskin Center)

  • 9:00am - 9:15am Opening Remarks

    Room: Laureate Room (Luskin Center)

  • 9:15am - 10:20am Keynote 1

    Session Chair: Arvind Krishnamurthy

    Room: Laureate Room (Luskin Center)

  • 10:20am - 10:50am Coffee Break (Foyer)

  • 10:50am - 12:30pm Session 1: High Performance Networks & Apps

    Session Chair: TBD

    Room: Laureate Room (Luskin Center)

  • LogMemcached - An RDMA based Continuous Cache Replication

    Samyon Ristov (The Hebrew University of Jerusalem, Israel), Yaron Weinsberg (Microsoft), Danny Dolev (The Hebrew University of Jerusalem, Israel), and Tal Anker (Mellanox Technologies)

    • Abstract:

      One of the advantages of cloud computing is its ability to quickly scale out services to meet demand. A common technique to mitigate the increasing load in these services is to deploy a cache.

      Although it seems natural that the caching layer would also deal with availability and fault tolerance, these issues are nevertheless often ignored, as the cache has only recently begun to be considered a critical system component. A cache may evict items at any moment, and so a failing cache node can simply be treated as if the set of items stored on that node have already been evicted. However, setting up a cache instance is a time-consuming operation that could inadvertently affect the service’s operation.

This paper addresses this limitation by introducing cache replication at the server side, extending Memcached (which currently provides availability only via client-side replication). This paper presents the design and implementation of LogMemcached, a modification of Memcached’s internal data structures that replicates state via RDMA, providing increased system availability, improved failure resilience and enhanced load-balancing capabilities without compromising performance and while introducing very low CPU load, all while preserving the main principles of Memcached’s design philosophy.
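The log-replication idea can be pictured with a toy sketch (hypothetical code, not LogMemcached itself): a primary appends every mutation to an append-only log, and a replica replays the log tail to catch up. In the real system the log lives in RDMA-registered memory and is propagated with one-sided RDMA writes rather than a shared Python list.

```python
# Toy sketch of log-based cache replication (illustrative only; the real
# system propagates the log with one-sided RDMA writes, not a shared list).

class Primary:
    def __init__(self, log):
        self.store = {}
        self.log = log                           # stands in for the log buffer

    def set(self, key, value):
        self.store[key] = value
        self.log.append(("set", key, value))     # the "RDMA write" of an entry


class Replica:
    def __init__(self, log):
        self.store = {}
        self.log = log
        self.applied = 0                         # log offset already replayed

    def catch_up(self):
        # Replay entries appended since the last call, in order.
        for op, key, value in self.log[self.applied:]:
            if op == "set":
                self.store[key] = value
        self.applied = len(self.log)


log = []
primary, replica = Primary(log), Replica(log)
primary.set("user:1", "alice")
primary.set("user:2", "bob")
replica.catch_up()
assert replica.store == primary.store
```

If the primary fails, a caught-up replica can serve reads immediately instead of warming a cold cache, which is the availability benefit the abstract describes.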


  • Accelerating Open vSwitch with Integrated GPU

    Janet Tseng, Ren Wang, James Tsai, Yipeng Wang, and Charlie Tai (Intel Labs)

  • VIRTIO-USER: A New Versatile Channel for Kernel-Bypass Networks

    Jianfeng Tan, Cunming Liang, Huawei Xie, Qian Xu, Jiayu Hu, Heqing Zhu, and Yuanhan Liu (Intel)

    • Abstract:

Kernel-bypass networks still face some challenging problems: (1) it is hard for containers to gain better performance from a kernel-bypass virtual switch; (2) there is no stable and efficient way to inject packets back into the kernel stack from a kernel-bypass network interface.

To solve these problems, we propose VIRTIO-USER as a versatile, performant, secure and standardized channel. Instead of using the hypervisor to bridge the frontend and backend drivers, we implement an embedded vhost adapter in the frontend driver that talks to the vhost backend directly. Other mechanisms, such as the memory-sharing model, ring layout and feature negotiation, are kept the same as in VIRTIO. We implement VIRTIO-USER and upstream it into DPDK. Compared with kernel-based container networking and the existing exception-path solution, our evaluation shows a 3.5x performance boost in both scenarios.
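The shared-ring idea underlying this channel can be sketched in a few lines (a hypothetical simplification; the real virtio layout has separate descriptor, available and used rings plus feature negotiation):

```python
# Minimal single-producer/single-consumer ring, standing in for the shared
# memory ring a virtio frontend and a vhost backend would use (illustrative).

class Ring:
    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.head = 0        # next slot the frontend fills
        self.tail = 0        # next slot the backend drains

    def put(self, pkt):      # frontend side (e.g. the container application)
        if (self.head + 1) % self.size == self.tail:
            return False     # ring full; frontend must back off
        self.slots[self.head] = pkt
        self.head = (self.head + 1) % self.size
        return True

    def get(self):           # backend side (e.g. the vhost adapter)
        if self.tail == self.head:
            return None      # ring empty
        pkt = self.slots[self.tail]
        self.tail = (self.tail + 1) % self.size
        return pkt
```

Because both sides touch only shared memory, no hypervisor or kernel crossing is needed on the data path, which is what makes such a channel attractive for containers.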


  • Towards a Scalable Modular QUIC Server

    Yufeng Duan (Politecnico di Torino), Massimo Gallo (Nokia Bell Labs), Stefano Traverso (Politecnico di Torino), Rafael Laufer (Nokia Bell Labs), and Paolo Giaccone (Politecnico di Torino)

    • Abstract:

      QUIC has been recently proposed as an alternative transport protocol for web services requiring both low latency and end-to-end encryption. In a different direction, recent kernel-bypass techniques enabling high-speed packet I/O have fostered the development of scalable middleboxes and servers with the introduction of user-space network stacks. Attempting to join the best of both solutions, we introduce in this paper a modular L2–L7 network stack in user space based on QUIC. Our modular and scalable QUIC transport protocol called cQUIC is implemented in Click and uses Intel DPDK for high-speed packet I/O. We prototype cQUIC and show at least an order of magnitude improvement over the Google QUIC server. We also show that cQUIC scalability is CPU (and not I/O) bounded due to the high cost of cryptographic operations. From real-world traffic traces, we observe that up to 18% of QUIC connections are established using the expensive 2-RTT handshake, limiting scalability further.
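The modular, Click-style composition can be illustrated with a toy pipeline (hypothetical Python, not cQUIC's actual C++/Click elements): each element transforms a packet and pushes it to the next element in the chain.

```python
# Toy Click-style pipeline: elements are composed by linking, and a packet
# is pushed through the chain (illustrative; cQUIC's real elements are
# Click modules in C++ over DPDK).

class Element:
    def __init__(self, fn, nxt=None):
        self.fn, self.nxt = fn, nxt

    def push(self, pkt):
        pkt = self.fn(pkt)
        return self.nxt.push(pkt) if self.nxt else pkt


# Compose a two-stage chain: strip a fake L2 header, then an "L7" handler.
app = Element(str.upper)
l2 = Element(lambda p: p.split(":", 1)[1], app)
result = l2.push("hdr:quic")
```

Swapping or adding a stage only requires relinking elements, which is the modularity argument the abstract makes for an L2–L7 user-space stack.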


  • 12:30pm - 1:45pm Lunch Break (Centennial Terrace)

  • 1:45pm - 2:50pm Keynote 2

    Session Chair: Daniel Firestone

    Room: Laureate Room (Luskin Center)

  • 2:50pm - 3:40pm Session 2: Congestion Control

    Session Chair: TBD

    Room: Laureate Room (Luskin Center)

  • RoCE Rocks without PFC: Detailed Evaluation

    Alexander Shpiner, Eitan Zahavi, Omar Dahley, Aviv Barnea, Rotem Damsker, Gennady Yekelis, Michael Zus, Eitan Kuta, and Dean Baram (Mellanox Technologies)

    • Abstract:

      In recent years, the usage of RDMA in data center networks has increased significantly, with RDMA over Converged Ethernet (RoCE) emerging as the canonical approach to deploying RDMA in Ethernet-based data centers. Initial implementations of RoCE required a lossless fabric for optimal performance. This is typically achieved by enabling Priority Flow Control (PFC) on Ethernet NICs and switches. The RoCEv2 specification introduced RoCE congestion control, which allows throttling the transmission rate in response to congestion. Consequently, packet loss is minimized and performance is maintained, even if the underlying Ethernet network is lossy.

      In this paper, we discuss the latest developments in RoCE congestion control. Hardware congestion control reduces the latency of the congestion control loop; it reacts promptly in the face of congestion by throttling the transmission rate quickly and accurately. The short control loop also prevents network buffers from overfilling under various congestion scenarios. In addition, fast hardware retransmission complements congestion control in severe congestion scenarios, by significantly reducing the performance penalty of packet drops. We survey architectural features that allow deployment of RoCE over lossy networks and present real lab test results.
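The rate-based control loop can be modeled with a toy sketch (constants are illustrative assumptions, not values from the paper): on a congestion notification the sender cuts its rate multiplicatively, then recovers toward the pre-cut rate on a timer.

```python
# Toy DCQCN-style rate control in the spirit of RoCEv2 congestion control
# (illustrative; real hardware uses finer-grained state machines and timers).

class RateLimiter:
    def __init__(self, line_rate=100.0):   # e.g. Gb/s; illustrative unit
        self.rate = line_rate
        self.target = line_rate

    def on_cnp(self):
        # Congestion notification packet: remember the current rate as the
        # recovery target, then cut the sending rate multiplicatively.
        self.target = self.rate
        self.rate *= 0.5

    def on_timer(self):
        # Fast recovery: move halfway back toward the target rate.
        self.rate = (self.rate + self.target) / 2
```

With a line rate of 100, one notification drops the rate to 50, and two timer ticks recover it to 87.5; because this loop runs in NIC hardware, the reaction is fast enough to keep buffers from overfilling even without PFC.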


  • Sharing CPUs via endpoint congestion control

    Laura Vasilescu, Vladimir Olteanu, and Costin Raiciu (University Politehnica of Bucharest)

    • Abstract:

      Software network processing relies on dedicated cores and hardware isolation to ensure appropriate throughput guarantees. Such isolation comes at the expense of low utilization in the average case, and severely restricts the number of network processing functions one can execute on a host.

      In this paper we propose that multiple processing functions simply share a CPU core, turning the CPU into a special type of “link”. We use multiple NIC receive queues and the Fastclick suite to test the feasibility of this approach. We find that, as expected, per-core throughput decreases when more processes are contending; however, the decrease is not dramatic: around 10%. Finally, we implement and test in simulation a solution that enables efficient CPU sharing by sending congestion signals proportional to the per-packet cost of each flow. This enables endpoint congestion control (e.g. TCP) to react appropriately and share the CPU fairly.
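The "CPU as a link" idea can be sketched as follows (a hypothetical helper, not the paper's implementation): when total demand exceeds a core's capacity, mark each flow's packets in proportion to its per-packet CPU cost, so endpoint congestion control backs off expensive flows harder.

```python
def mark_fraction(flows, capacity):
    """flows: {name: (pkt_rate, cpu_cost_per_pkt)} sharing one core.
    Returns the fraction of each flow's packets to mark (ECN-style),
    proportional to per-packet cost, when demand exceeds capacity.
    Illustrative policy, not the authors' exact scheme."""
    demand = sum(rate * cost for rate, cost in flows.values())
    if demand <= capacity:
        return {f: 0.0 for f in flows}           # core not congested: no marks
    overload = (demand - capacity) / demand      # share of work to shed
    max_cost = max(cost for _, cost in flows.values())
    return {f: overload * cost / max_cost for f, (_, cost) in flows.items()}


# A flow whose packets cost 4x more CPU is marked 4x more aggressively.
marks = mark_fraction({"cheap": (100, 1.0), "heavy": (100, 4.0)}, capacity=250)
```

A TCP-like sender reacting to these marks would then throttle the heavy flow more, equalizing CPU consumption rather than packet rate.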


  • 3:40pm - 4:10pm Coffee Break (Foyer)

  • 4:10pm - 5:25pm Session 3: Measurement & Performance Analysis

    Session Chair: TBD

    Room: Laureate Room (Luskin Center)

  • How to Measure the Killer Microsecond

    Mia Primorac, Edouard Bugnion, and Katerina Argyraki (EPFL)

    • Abstract:

      Datacenter-networking research requires tools to both generate traffic and accurately measure latency and throughput. While hardware-based tools have long existed commercially, they are primarily used to validate ASICs and lack flexibility, e.g. to study new protocols. They are also too expensive for academics. The recent development of kernel-bypass networking and advanced NIC features such as hardware timestamping have created new opportunities for accurate latency measurements. This paper compares these two approaches, and in particular whether commodity servers and NICs, when properly configured, can measure the latency distributions as precisely as specialized hardware.

      Our work shows that well-designed commodity solutions can capture subtle differences in the tail latency of stateless UDP traffic. We use hardware devices as the ground truth, both to measure latency and to forward traffic. We compare the ground truth with observations that combine five latency-measuring clients and five different port-forwarding solutions and configurations. State-of-the-art software such as MoonGen, which uses NIC hardware timestamping, provides sufficient visibility into tail latencies to study the effect of subtle operating-system configuration changes. We also observe that the kernel-bypass-based T-Rex software, which relies only on the CPU to timestamp traffic, can provide solid results when NIC timestamps are not available for a particular protocol or device.
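At its core, any such latency-measurement tool pairs transmit and receive timestamps and summarizes the tail; a minimal sketch (a hypothetical helper, independent of MoonGen or T-Rex):

```python
def tail_latencies(tx, rx, percentiles=(50, 99, 99.9)):
    """tx[i]/rx[i]: send and receive timestamps of packet i on the same
    clock (e.g. NIC hardware timestamps, or CPU readings when the NIC
    cannot timestamp a protocol). Returns the requested latency
    percentiles. Illustrative sketch only."""
    samples = sorted(r - t for t, r in zip(tx, rx))
    out = {}
    for p in percentiles:
        idx = min(len(samples) - 1, int(len(samples) * p / 100))
        out[p] = samples[idx]
    return out


# 100 packets with latencies 1..100 time units:
tx = [0.0] * 100
rx = [float(i) for i in range(1, 101)]
tails = tail_latencies(tx, rx, (50, 99))
```

The paper's question is essentially whether commodity timestamps feeding a computation like this are precise enough for the far tail, compared with specialized hardware.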


  • Performance Isolation Anomalies in RDMA

    Yiwen Zhang, Juncheng Gu, Youngmoon Lee, Mosharaf Chowdhury, and Kang G. Shin (University of Michigan)

    • Abstract:

      To meet the increasing throughput and latency demands of modern applications, many operators are rapidly deploying RDMA in their datacenters. At the same time, developers are re-designing their software to take advantage of RDMA’s benefits for individual applications. However, when it comes to RDMA’s performance, many simple questions remain open.

      In this paper, we consider the performance isolation characteristics of RDMA. Specifically, we conduct three sets of experiments – three combinations of one throughput-sensitive flow and one latency-sensitive flow – in a controlled environment, observe large discrepancies in RDMA performance with and without the presence of a competing flow, and describe our progress in identifying plausible root-causes.


  • Design Challenges for High Performance, Scalable NFV Interconnects

    Guyue Liu (The George Washington University), K.K. Ramakrishnan (University of California, Riverside), Mike Schlansker and Jean Tourrilhes (Hewlett Packard Labs), and Timothy Wood (The George Washington University)

    • Abstract:

      Software-based network functions (NFs) have seen growing interest. Increasingly complex functionality is achieved by chaining multiple functions together to support the required network-resident services. Network Function Virtualization (NFV) platforms need to scale and achieve high performance, potentially utilizing multiple hosts in a cluster. Efficient data movement, a cornerstone of kernel bypass, is crucial: a packet must be delivered from the network interface to an NF, moved across functions on the same host, and finally carried across yet another network to NFs running on other hosts in a cluster/data center. In this paper we measure the performance characteristics of different approaches for moving data at each of these levels. We also introduce a new high-performance inter-host interconnect using InfiniBand. We evaluate the performance of Open vSwitch and the OpenNetVM NFV platform, considering a simple forwarding function and Snort, a popular intrusion detection system.


Call For Papers

Kernel-Bypass Networks (including, but not limited to RDMA and DPDK) have recently drawn much attention from the research community and the industry. Emerging applications such as AI training, distributed storage systems, and software middle-boxes/NFV have been shown to benefit significantly from technologies that bypass the conventional OS network stack. At the same time, recent switch and NIC developments (e.g., RoCE) have paved the way to the large-scale deployment of KBNets.

We believe that our community must expedite the research on kernel bypass networks. There are significant open questions, for example, regarding the merits of different kernel bypass architectures, how to design control plane and management systems for KBNets, and how to deal with inherent problems such as congestion spreading and deadlocks in such networks. As importantly, much more work is needed to rethink how we design distributed systems and applications to fully take advantage of KBNets.

The ACM SIGCOMM Workshop on Kernel-Bypass Networks (KBNets’17) is organized with the goal of bringing together researchers from the networking, operating systems, and distributed systems communities to promote the development and evolution of kernel-bypass networks. We welcome submissions related to all aspects of KBNets and KBNets-based systems, including network/system architecture, design, implementation, simulation, modeling, analysis, and measurement. We highly encourage novel and innovative early stage work that will encourage discussion and future research on KBNets.

Topics of Interest

Topics include but are not limited to:

  • Network transport for kernel-bypass networks
  • Control plane for kernel-bypass networks
  • Security issues regarding kernel-bypass networks
  • Distributed systems that are based on kernel-bypass networks, e.g., AI training, distributed storage, database and in-memory caches
  • Data center network architectures for kernel-bypass networks
  • Virtualization for kernel-bypass networks
  • NIC/switch hardware design for kernel-bypass networks
  • Middle-boxes/NFV optimization with kernel-bypass networks
  • Diagnosing and troubleshooting kernel-bypass networks
  • Experiences and best-practices in deploying kernel-bypass networks
  • Measurement and performance studies of kernel-bypass networks and applications
  • Deployment strategies and backward compatibility with traditional network stacks
  • Other approaches such as high performance OS data-plane architectures

Contact workshop co-chairs.

Submission Instructions

Submissions must describe original, previously unpublished research that is not currently under review by another conference or journal. Papers must be submitted electronically via the submission site. Papers must be no more than 6 pages long, including tables, figures and references, and must use the same template as SIGCOMM submissions (see the SIGCOMM submission instructions). The cover page must contain the name and affiliation of the author(s), as reviewing by the program committee is single-blind. Each submission will receive at least three independent blind reviews from the TPC. At least one author of every accepted paper must register for and present the work at the workshop.


Important Dates

  • March 31, 2017 (extended from March 24, 2017)

    Submission deadline

  • May 3, 2017 (extended from April 30, 2017)

    Acceptance notification

  • May 26, 2017

    Camera ready deadline

Authors Take Note

The official publication date is the date the proceedings are made available in the ACM Digital Library. This date may be up to TWO WEEKS prior to the first day of your conference. The official publication date affects the deadline for any patent filings related to published work.