Programming SmartNICs: From Packet Processing to Programmable Transport

A Hands-on Tutorial on SmartNIC Programming

Presenters

Presenter	Institution
Fernando Ramos	INESC-ID, IST, University of Lisbon
Muhammad Shahbaz	University of Michigan
Mina Tahmasbi Arashloo	University of Waterloo
Mario Baldi	NVIDIA

Tutorial Timetable

09:00 – 09:45	SmartNIC Architectures and Programming Models. SmartNIC/DPU/IPU architectures; match-action pipelines vs. programmable cores vs. FPGAs; Portable NIC Architecture (PNA); packet processing vs. transport programmability.
09:45 – 10:15	Packet Processing Programming on SmartNICs. Table lookups; packet transformations; control-plane integration. Programming with P4, DOCA Pipeline Language (DPL), and DOCA Flow.
10:15 – 10:45	Break
10:45 – 11:30	Hands-On I: Programming the SmartNIC Packet Pipeline. Hands-on development of packet-processing functionality, including forwarding, filtering, table lookups, and packet transformations, on BlueField-3, using DPL and DOCA Flow to program the packet-processing pipeline of the ConnectX-7 component.
11:30 – 12:00	Transport Programming on SmartNICs. Programmable congestion control; transport-layer extensions; host-NIC coordination; open research challenges.
12:00 – 12:45	Hands-On II: Transport Programmability on SmartNICs. Hands-on experimentation with programmable transport functionality using DOCA PCC (Programmable Congestion Control) and DPA (Data Path Accelerator) execution on BlueField-3; evaluating transport behavior and host-NIC interaction.

Summary

Programmable switches transformed networking research by making the data plane accessible and programmable. A similar shift is now happening at the network edge: SmartNICs, DPUs, and IPUs are evolving into programmable computing platforms capable not only of packet processing, but also of stateful services and transport-layer functionality. Rather than being fixed-function offload devices, they are becoming heterogeneous subsystems tightly integrated with host software stacks.

This tutorial provides a unified, systems-oriented introduction to SmartNIC programmability, spanning four tightly coupled dimensions: data-plane packet processing, stateful network function design, transport-layer programmability, and host-level integration. It combines conceptual foundations with guided hands-on exercises on NVIDIA BlueField platforms using NVIDIA LaunchPad, allowing participants to gain both architectural understanding and practical experience with packet-processing and transport programmability.

By the end of the tutorial, attendees will understand the design space across SmartNIC, DPU, and IPU platforms and will be able to write and deploy packet-processing logic on NIC targets, design and evaluate stateful services, experiment with transport-layer customization and programmable congestion control, and integrate NIC-based functionality with host control planes and software stacks.

Motivation

Over the past decade, P4 and programmable switches have opened the door to line-rate packet processing research. Today, SmartNICs extend that opportunity beyond switches and into end hosts, where networking, systems, and transport concerns intersect. Modern NIC platforms increasingly support programmable high-performance pipelines, embedded CPUs, accelerators, and tighter coordination with host software, enabling new designs for offload, isolation, efficiency, congestion control, and AI-aware networking.

Despite strong research and industrial momentum, the community still lacks a structured, hands-on tutorial that systematically teaches how to program SmartNIC packet processing, build stateful network functions, experiment with transport functionality on NICs, and integrate programmable NIC logic into end-host systems. This tutorial is designed to fill that gap.

Outline

The tutorial follows a progressive structure from foundations to advanced functionality.

Part I — Architectures and Packet Processing Foundations
We begin by introducing the architectural landscape of SmartNICs, DPUs, and IPUs, including their heterogeneous execution models and their relationship to host software stacks. We then discuss packet-processing programming models, including match-action pipelines, stateful logic, and control-plane coordination.

Part II — Hands-On Packet Processing
Participants then move to hands-on exercises on NVIDIA BlueField-3 platforms using NVIDIA LaunchPad. Using DPL (DOCA Pipeline Language) and DOCA Flow, they develop packet-processing functionality, including forwarding, filtering, table lookups, and packet transformations, in the packet-processing pipeline of the ConnectX-7 component, reinforcing the concepts introduced earlier.

Part III — Transport Programming on SmartNICs
The tutorial then expands from packet processing into transport-layer customization. This session covers programmable congestion control, custom transport logic, NIC-based transport abstractions, and the broader research challenges that arise when transport functionality moves onto programmable NICs.

Part IV — Hands-On Transport Experimentation and Host Integration
Finally, participants explore transport programmability experimentally. Through hands-on exercises using DOCA PCC (Programmable Congestion Control) and DPA (Data Path Accelerator) programming, they customize transport-layer functionality, evaluate performance trade-offs, and examine host-NIC coordination. The goal is to present SmartNICs not as isolated packet devices, but as heterogeneous, programmable systems spanning packet processing, transport functionality, slow-path execution, and host-level integration.

Expected Audience and Prerequisites

This tutorial is intended for:

networking researchers exploring programmable data planes,
systems researchers working on transport, offload, or AI networking,
educators introducing programmable networking platforms, and
industry practitioners deploying SmartNICs, DPUs, or related programmable NIC technologies.

Prerequisites

Participants should have:

basic networking knowledge,
familiarity with programming in C, P4, or similar languages, and
a laptop capable of accessing the remote lab environment used in the hands-on sessions.

Laptop Requirements

Participants should bring a laptop to access a remote lab environment used in the hands-on sessions. No specialized hardware is required. The exercises will run on NVIDIA BlueField-3 platforms and will include packet-processing and transport-programmability experiments using DOCA-based tooling. Tutorial materials will be made available before the session.

Biographies

Fernando M. V. Ramos (fvramos@tecnico.ulisboa.pt) is an Associate Professor at the Instituto Superior Técnico, University of Lisbon, and a Senior Researcher at INESC-ID. His research interests include programmable networking, networked systems, security, and the interplay of AI with systems and networking. He is currently co-chair of the P4 Education Working Group. Before joining IST, he held positions at the Faculty of Sciences of the University of Lisbon, the University of Cambridge, Telefónica Research, and Altice Labs. He holds a Ph.D. from the University of Cambridge.

Muhammad Shahbaz (msbaz@umich.edu) is an Assistant Professor of Computer Science at the University of Michigan. His research focuses on abstractions, compilers, and architectures for emerging workloads, including machine learning. He completed postdoctoral research at Stanford University and earned his Ph.D. and M.A. in Computer Science from Princeton University. His recognitions include the NSF CAREER Award and the ACM SOSR Systems Award.

Mina Tahmasbi Arashloo (mina.arashloo@uwaterloo.ca) is an Assistant Professor of Computer Science and Canada Research Chair at the University of Waterloo. Her research focuses on networked systems that are flexible, adaptable, and formally analyzable while still meeting strict performance requirements. She was previously a Presidential Postdoctoral Fellow at Cornell University and received her Ph.D. from Princeton University. Her other recognitions include the N2Women Rising Star Award, the ACM SIGCOMM Doctoral Dissertation Award, and the Microsoft Research Dissertation Grant.

Mario Baldi (mbaldi@nvidia.com) is a Distinguished Architect at NVIDIA. He served for several years as co-chair of the P4 Architecture Working Group and has held roles in both startups and established networking companies, as well as visiting professorships on four continents. He holds a Ph.D. in Computer and Systems Engineering from Politecnico di Torino.

Additional Information

All supporting materials will be made available at least one month before the tutorial.