APNet SIGCOMM/NSDI Talks
KyoungSoo Park
Professor in the School of Electrical Engineering at KAIST
Paper Title:
Designing SmartNIC-accelerated TCP stacks
Abstract: Recent advances in high-bandwidth I/O devices present great potential for scalable delivery of online content. Unfortunately, today's programming model implicitly assumes that the CPU sits in a busy loop, which often severely limits the true capacity of modern I/O devices. Not surprisingly, we observe that over 70% of CPU cycles are spent on simple yet repetitive tasks such as disk and network I/O operations in online content delivery. In this talk, I'll introduce two SmartNIC-accelerated TCP stack designs that drastically reduce the CPU burden of online content delivery. First, I present AccelTCP, which offloads connection management and connection splicing to the SmartNIC while allowing the CPU to focus on application-level processing. Second, with IO-TCP, I show that one can offload the network and disk I/O of a content server to the SmartNIC. Both designs separate the control and data planes of a TCP stack: the CPU side retains full control of stack operation, while mechanical data plane operations are offloaded to the SmartNIC. Our evaluation shows that AccelTCP accelerates an in-memory key-value store and a popular load balancer by 2.3x and 11x, respectively, while IO-TCP achieves over 70 Gbps of encrypted video streaming with only two CPU cores.
Speaker Bio: KyoungSoo Park is a professor in the School of Electrical Engineering at KAIST. He received his B.S. degree from Seoul National University in 1997, and his M.A. and Ph.D. degrees from Princeton University in 2004 and 2007, respectively, all in computer science. His research interests include high-performance packet processing, scalable network stack and server architectures, and systems support for distributed deep learning. He is the main architect of CoBlitz, a scalable large-file content distribution network (CDN), and has served as CTO of its startup, which was eventually acquired by Akamai, Inc. His team won the USENIX NSDI Community Award in 2014 (mTCP) and the USENIX NSDI Best Paper Award in 2017 (mOS).
Wenfei Wu
Assistant Professor in the School of Computer Science at Peking University
Paper Title:
Accelerating Distributed Systems with In-Network Computing
Abstract: With Moore's Law no longer delivering more computing power at a single node, building distributed and heterogeneous systems has become a new trend for supporting large-scale applications such as large model training and big data analytics. In-Network Computing (INC) is an effective approach to building such distributed systems. INC leverages programmable network devices to process data packets as they traverse the network, providing line-rate, low-latency data processing that can reduce traffic volume and improve overall transmission and job efficiency. In this talk, we will share the progress and development of INC technologies, including INC protocol design for machine learning and data analytics, RDMA-compatible INC solutions, and runtime INC job management in clusters. This work has been published at NSDI 2021, NSDI 2023, ASPLOS 2023, and INFOCOM 2023.
Speaker Bio: Wenfei Wu is an assistant professor in the School of Computer Science at Peking University. He obtained his Ph.D. degree from the University of Wisconsin-Madison in 2015. Dr. Wu's research covers computer networks and distributed systems, and he has published more than 50 papers in these areas. His recent research focuses on building in-network computation (INC) methods for distributed systems. His work on ATP, an INC-empowered distributed machine learning system, won the best paper award at NSDI 2021, and his work on ASK, an INC-empowered distributed data analytics system, won the distinguished paper award at ASPLOS 2023. Dr. Wu's other awards include IPCCC best paper runner-up in 2019 and SoCC best student paper in 2013.
Qiao Xiang
Professor of Computer Science at Xiamen University
Paper Title:
You Don’t Need a Centralized Verifier: Scaling Data Plane Checking via Distributed, On-Device Verification
Abstract: Centralized data plane verification (DPV) faces significant scalability issues in large networks (i.e., the verifier is a performance bottleneck and a single point of failure, and it requires a reliable management network). In this talk, we tackle this scalability challenge of DPV from an architectural perspective. In particular, we circumvent the scalability bottleneck of the centralized design and advocate for a distributed, on-device DPV framework. Our key insight is that DPV can be transformed into a counting problem on a directed acyclic graph, which can be naturally decomposed into lightweight tasks executed at network devices, enabling fast data plane checking in networks of various scales and types. We build a small testbed of commodity switches equipped with low-end CPUs to demonstrate the feasibility and capability of distributed, on-device DPV. Additional experiments with real-world datasets (WAN/LAN/DC) show that our distributed DPV framework verifies a real, large DC in less than 41 seconds, while other tools need several minutes or up to tens of hours, and that it achieves up to a 2355x speedup at the 80% quantile of incremental verification with small overhead.
Speaker Bio: Qiao Xiang is a Professor of Computer Science at Xiamen University, China. Before that, he worked at McGill University and Yale University as a postdoc and later a research assistant professor. He received his PhD from Wayne State University in 2014, and BE and BEcon from Nankai University in 2007. His research focuses on building reliable, efficient large-scale networks and systems.
Yang Xu
Yaoshihua Chair Professor in the School of Computer Science at Fudan University
Paper Title:
BMW Tree: Large-scale, High-throughput and Modular PIFO Implementation using Balanced Multi-Way Sorting Tree
Abstract: The Push-In-First-Out (PIFO) queue has been extensively studied as a programmable scheduler. To achieve an accurate, large-scale, and high-throughput PIFO implementation, we propose the Balanced Multi-Way (BMW) Sorting Tree for real-time packet sorting. The tree is highly modularized, insertion-balanced, and pipeline-friendly, with autonomous nodes. Based on it, we present two simple and efficient hardware designs. The first is a register-based (R-BMW) scheme. With a pipeline, it features a high and stable throughput, theoretically without any frequency reduction even as the number of levels grows. We then propose Ranking Processing Units to drive the BMW-Tree (RPU-BMW) to improve scalability, where nodes are stored in SRAMs and dynamically loaded into and off-loaded from RPUs. Because the capacity of the BMW-Tree grows exponentially, only a few RPUs are needed at large scale. The evaluation shows that when deployed on the Xilinx Alveo U200 card, R-BMW improves throughput by 4.8x compared to the original PIFO implementation while exhibiting a similar capacity. RPU-BMW is synthesized in the GlobalFoundries 28nm process, costing a modest 0.522% (1.043mm^2) chip area and 0.57MB of off-chip memory to support 87k flows at 200 Mpps. To the best of our knowledge, RPU-BMW is the first accurate PIFO implementation supporting over 80k flows at rates as high as 200 Mpps.
Speaker Bio: Yang Xu is the Yaoshihua Chair Professor in the School of Computer Science at Fudan University. Prior to joining Fudan University, he was a faculty member in the Department of Electrical and Computer Engineering, New York University Tandon School of Engineering. He received his Ph.D. in Computer Science and Technology from Tsinghua University, China in 2007 and his Bachelor of Engineering degree from Beijing University of Posts and Telecommunications in 2001. His research interests include software-defined networks, data center networks, distributed machine learning, edge computing, network function virtualization, and network security. He has published more than 120 journal and conference papers in top venues including SIGCOMM, NSDI, INFOCOM, JSAC, TON, ICNP, CoNEXT, ICDCS, and MM, and received the Best Paper Award at ACM CoNEXT 2022. He holds more than 10 granted U.S. and international patents on various aspects of networking and computing. He has served as a TPC member for many international conferences, as an Editor for the Journal of Network and Computer Applications (Elsevier), and as a Guest Editor for the IEEE Journal on Selected Areas in Communications–Special Series on Network Softwarization & Enablers and the Wiley Security and Communication Networks Journal–Special Issue on Network Security and Management in SDN.