ACM SIGCOMM 2022 TUTORIAL: In-Network Machine Learning using Taurus
Call For Participation
This tutorial will expose attendees to the exciting new world of in-network machine learning (ML) enabled by Taurus. Through lectures and lab exercises, attendees will not only learn the internals of Taurus but also write per-packet ML applications (using KMeans, DNNs, and LSTMs) in Spatial (https://spatial-lang.org) and test them using Taurus’s behavioral model, thereby gaining hands-on experience with in-network machine learning.
We will present the design of the Taurus switch, emphasizing how its MapReduce block enables per-packet ML by supporting new computational primitives inside the switch. We will also provide an overview of the Spatial language and, through a series of exercises, show how to write ML applications using P4 + Spatial and compile them to the Taurus switch. By the end of the tutorial, attendees will be able to build and run novel per-packet ML models in Spatial and evaluate them using the Taurus behavioral model with Mininet, a virtual network environment.
Maintaining strict security and service-level objectives (SLOs) in next-generation hyperscale datacenter, enterprise, and edge networks demands that compute-intensive management and control decisions be made on the current state of the entire network (e.g., topology, queue sizes, and link and server loads) and applied per-packet at line rate, in a fast-and-intelligent way. A delay of even a few microseconds in today’s (petabit-bisection-bandwidth) networks would result in (a) missing millions of anomalous packets, (b) saturating switch queues and causing congestion, (c) excessive retransmissions due to packet drops, and (d) imbalanced traffic and server loads, ultimately resulting in loss of revenue, higher operating costs, and unsatisfied end-users.
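To make the "millions of packets" figure concrete, the following back-of-the-envelope sketch estimates how many packets cross a network during a short reaction delay. The 1 Pb/s aggregate rate, full utilization, and the two frame sizes are illustrative assumptions, not figures from the Taurus work; the 20-byte per-frame overhead is the standard Ethernet preamble plus inter-frame gap.

```python
# Back-of-the-envelope estimate. All constants are illustrative assumptions.
BISECTION_BPS = 1e15       # assumed 1 Pb/s aggregate bandwidth, fully utilized
WIRE_OVERHEAD_BYTES = 20   # Ethernet preamble + SFD (8 B) and inter-frame gap (12 B)


def packets_during_delay(delay_s, frame_bytes):
    """Number of frames of the given size that cross the network in delay_s seconds."""
    bits_per_packet = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return BISECTION_BPS * delay_s / bits_per_packet


if __name__ == "__main__":
    for frame_bytes in (64, 1518):       # minimum and maximum standard Ethernet frames
        for delay_us in (1, 3, 10):      # hypothetical reaction delays in microseconds
            n = packets_during_delay(delay_us * 1e-6, frame_bytes)
            print(f"{frame_bytes:>5} B frames, {delay_us:>2} us delay: {n:,.0f} packets")
```

Under these assumptions, roughly 1.5 million minimum-size frames (or about 81,000 maximum-size frames) cross the network every microsecond, so a reaction delay of a few microseconds can let millions of small packets pass before any decision takes effect.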
Unfortunately, the dominant solutions available today are either fast-yet-dumb or slow-but-intelligent. Network operators run services (like load balancing, anomaly detection, and congestion control) using switches and routers, which can react in nanoseconds to network conditions. However, these devices are designed for routing packets and have a constrained programming model (e.g., match-action tables or MATs), which limits these services to simple heuristics. Conversely, control-plane servers (managing the network) can make complicated data-driven decisions. However, the round trip (10 µs or more) between the controller and switch fundamentally limits the control plane’s reaction speed, even with fast packet IO (e.g., Intel’s DPDK) and dedicated hardware (e.g., TPU or GPU).
We believe it is now time to bridge this gap between speed and intelligence. To do so, we present and open-source Taurus, a novel data-plane switch architecture for per-packet ML (published at ASPLOS ’22 and winner of the IETF/IRTF ANRP Prize ’22). Taurus extends the Protocol-Independent Switch Architecture (PISA) with a new MapReduce (MR) block, based on a spatial SIMD architecture that supports a variety of ML models. The block is accompanied by an open-source language, Spatial, which, along with the P4 language, specifies the various components of the Taurus switch.
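As a rough illustration of why map and reduce primitives are a natural fit for per-packet ML, the sketch below expresses KMeans inference (nearest-centroid lookup) as a pair of map and reduce steps. It is plain Python, not Spatial or P4, and it does not reflect the Taurus API; the feature vector (packet size, inter-arrival time) and the centroid values are hypothetical.

```python
from functools import reduce


def squared_distance(features, centroid):
    """Map: per-dimension squared difference. Reduce: sum of those terms."""
    squared_diffs = map(lambda pair: (pair[0] - pair[1]) ** 2, zip(features, centroid))
    return reduce(lambda acc, term: acc + term, squared_diffs, 0.0)


def classify_packet(features, centroids):
    """Map: distance to every centroid. Reduce: keep the (distance, index) minimum."""
    scored = map(
        lambda item: (squared_distance(features, item[1]), item[0]),
        enumerate(centroids),
    )
    return reduce(min, scored)[1]


if __name__ == "__main__":
    # Hypothetical centroids over (packet size in bytes, inter-arrival time in us).
    centroids = [[100.0, 50.0], [1400.0, 2.0]]
    packet_features = [1350.0, 3.0]
    print("cluster:", classify_packet(packet_features, centroids))
```

Each map step is independent across dimensions or centroids, so it can run in parallel across SIMD lanes, while each reduce step combines the results into a single value. The same pattern underlies the dot products in DNN and LSTM layers.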
A rough outline follows:
Session I: An overview of the Taurus data-plane architecture and the Spatial language.
Session II: Hands-on exercises using the in-network ML development environment based on Taurus.
- ML processing pipeline
- Behavioral model
- Lab exercises
Session III: A mini workshop with invited talks focusing on the following categories.
- Emerging in-network ML use cases
- Current and future research directions
- New ML data-plane targets
- Network ML abstractions and software stacks
Session IV: A panel of luminaries from ML, architecture, and networking, brought together for the first time at SIGCOMM to discuss the role of ML in networking.
Audience Expectations and Prerequisites
Attendees are not expected to have any prior knowledge of the P4 or Spatial languages; the background needed to finish the lab exercises will be provided during the tutorial. However, attendees should note the following: (a) Attendees must bring their own laptops. (b) We will provide a VM image containing the required packages and tools, which attendees will run on their own machines. (c) We will provide detailed handouts to help attendees follow the tutorial.
Tushar Swamy is a Ph.D. candidate in the Electrical Engineering Department at Stanford University, where he is advised by Kunle Olukotun. His research is at the intersection of machine learning, networking, and architecture, where he develops the hardware/software stack for dataplane-based machine learning infrastructure and services. Tushar received the IETF/IRTF ANRP Prize '22 for his work on ML-capable switches and was named a Goldwater Scholar in 2014.
Annus Zulfiqar is a Ph.D. candidate and Ross Fellow in the Computer Science Department at Purdue University, where he is advised by Muhammad Shahbaz. His research focuses on designing the next-generation hardware/software abstractions and architectures for emerging workloads (e.g., in-network machine learning). Before joining Purdue, he worked as a Design Engineer at the Center for Advanced Research in Engineering (CARE), Pakistan, where he designed Wi-Fi/Ethernet/LTE-capable IoT Sensor Node Networks for Industrial Machine Telemetry. He received his undergraduate degree in Electrical Engineering from the National University of Sciences and Technology (NUST), Pakistan.
Muhammad Shahbaz is a Kevin C. and Suzanne L. Kahn New Frontiers Assistant Professor in Computer Science at Purdue University. His research focuses on the design and development of domain-specific abstractions, compilers, and architectures for emerging workloads (including machine learning and self-driving networks). Shahbaz received his Ph.D. and M.A. in Computer Science from Princeton University and B.E. in Computer Engineering from the National University of Sciences and Technology (NUST). Before joining Purdue, Shahbaz worked as a postdoc at Stanford University and a Research Assistant at Georgia Tech and the University of Cambridge. Shahbaz has built open-source systems, including Pisces, SDX, and NetFPGA-10G, that are widely used in industry and academia. He received the Facebook, Google, and Intel Research Awards; IETF/IRTF ANRP Prize; ACM SOSR Systems Award; APNet Best Paper Award; Best of CAL Paper Award; Internet2 Innovation Award; and Outstanding Graduate Teaching Assistant Award.
Kunle Olukotun is the Cadence Design Professor of Electrical Engineering and Computer Science at Stanford University. Olukotun is well known as a pioneer in multicore processor design and the leader of the Stanford Hydra chip multiprocessor (CMP) research project. Olukotun founded SambaNova Systems (to build AI hardware and integrated systems to run AI applications from the data center to the cloud) and Afara Websystems (to develop high-throughput, low-power multicore processors for server systems). The Afara multicore processor, called Niagara, was acquired by Sun Microsystems. Niagara-derived processors now power all Oracle SPARC-based servers. Olukotun currently directs the Stanford Pervasive Parallelism Lab (PPL), which seeks to proliferate the use of heterogeneous parallelism in all application areas using Domain Specific Languages (DSLs). Olukotun is a member of the Data Analytics for What’s Next (DAWN) Lab, which is developing infrastructure for usable machine learning. Olukotun is an ACM Fellow and IEEE Fellow for contributions to multiprocessors on a chip and multi-threaded processor design and is the recipient of the 2018 IEEE Harry H. Goode Memorial Award. Olukotun received his Ph.D. in Computer Engineering from The University of Michigan.
 Tushar Swamy, Alexander Rucker, Muhammad Shahbaz, Ishan Gaur, and Kunle Olukotun. Taurus: A Data Plane Architecture for Per-Packet ML. In ACM ASPLOS ’22.
 David Koeplinger, Matthew Feldman, Raghu Prabhakar, Yaqi Zhang, Stefan Hadjis, Ruben Fiszel, Tian Zhao, Luigi Nardi, Ardavan Pedram, Christos Kozyrakis, and Kunle Olukotun. Spatial: A Language and Compiler for Application Accelerators. In ACM/SIGPLAN PLDI ’18.
 Raghu Prabhakar, Yaqi Zhang, David Koeplinger, Matt Feldman, Tian Zhao, Stefan Hadjis, Ardavan Pedram, Christos Kozyrakis, and Kunle Olukotun. Plasticine: A Reconfigurable Architecture for Parallel Patterns. In ACM/IEEE ISCA ’17.
 Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, et al. P4: Programming Protocol-Independent Packet Processors. ACM SIGCOMM Computer Communication Review 44, 3 (2014), 87–95.
 Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, and Mark Horowitz. Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN. In ACM SIGCOMM ’13.
 Francis Y Yan, Jestin Ma, Greg D Hill, Deepti Raghavan, Riad S Wahby, Philip Levis, and Keith Winstein. Pantheon: The Training Ground for Internet Congestion-Control Research. In USENIX ATC ’18.
 Mo Dong, Qingxi Li, Doron Zarchy, P Brighten Godfrey, and Michael Schapira. PCC: Re-architecting Congestion Control for Consistent High Performance. In USENIX NSDI ’15.
 Mohammad Alizadeh, Tom Edsall, Sarang Dharmapurikar, Ramanan Vaidyanathan, Kevin Chu, Andy Fingerhut, Vinh The Lam, Francis Matus, Rong Pan, Navindra Yadav, and George Varghese. CONGA: Distributed Congestion-aware Load Balancing for Datacenters. In ACM SIGCOMM ’14.