Yusheng Zheng (UC Santa Cruz), Panayiotis Gavriil (The D. E. Shaw Group), Marios Kogias (Imperial College London)
Abstract: Modern network function (NF) deployments face a fundamental trade-off: kernel-based extended Berkeley Packet Filter (eBPF) NFs provide safety, portability, and an extensive tooling ecosystem, but are limited in performance, while kernel-bypass frameworks deliver high throughput but lack integrated verification and ease of deployment. We present uXDP, a new runtime that unifies these worlds by running unmodified, verified XDP programs in userspace. uXDP ensures compatibility and preserves the verification-driven safety, portability, and familiar workflows of eBPF while moving execution into userspace, enabling more aggressive optimizations and flexibility. Without recompiling eBPF code, uXDP achieves throughput gains of up to 3.3× over in-kernel execution and improves Meta's Katran load balancer performance by 40%, all while retaining the trusted eBPF development model and deployment simplicity.
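To make "unmodified, verified XDP programs" concrete, below is a minimal sketch (our illustration, not code from the paper) of the kind of standard XDP program a runtime like uXDP would need to accept as-is: it compiles to ordinary eBPF bytecode and carries the bounds check the verifier demands.

    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    SEC("xdp")
    int pass_ipv4_only(struct xdp_md *ctx)
    {
        void *data     = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;

        /* The verifier rejects any packet access without this bounds check. */
        if ((void *)(eth + 1) > data_end)
            return XDP_ABORTED;

        /* Illustrative policy only: pass IPv4 frames, drop everything else. */
        return eth->h_proto == bpf_htons(ETH_P_IP) ? XDP_PASS : XDP_DROP;
    }

    char _license[] SEC("license") = "GPL";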
No Two Snowflakes Are Alike: Studying eBPF Libraries' Performance, Fidelity and Resource Usage
Carlos Machado, Bruno Gião (INESC TEC & U. Minho), Sebastião Amaro, Miguel Matos (IST Lisbon & INESC-ID), João Paulo, Tânia Esteves (INESC TEC & U. Minho)
Abstract: As different eBPF libraries keep emerging, developers are left with the hard task of choosing the right one. Until now, this choice has been based on functional requirements (e.g., programming language support, development workflow), while quantitative metrics have been left out of the equation. In this paper, we argue that efficiency metrics such as performance, resource usage, and data collection fidelity also need to be considered for making an informed decision. We demonstrate this through an experimental study comparing five popular libraries: bpftrace, BCC, libbpf, ebpf-go, and Aya. For each, we implement three representative eBPF-based tools and evaluate them under different storage I/O workloads. Our results show that each library has its own strengths and weaknesses, as their specific features lead to distinct trade-offs across the selected efficiency metrics. These results further motivate experimental studies to increase the community's understanding of the eBPF ecosystem.
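As a reference point for what the lowest-level option looks like (the object and program names below are hypothetical), a minimal libbpf loader is sketched here; the higher-level libraries compared in the study differ chiefly in how much of this boilerplate they absorb and at what cost.

    #include <stdio.h>
    #include <unistd.h>
    #include <bpf/libbpf.h>

    int main(void)
    {
        /* Open a compiled BPF object and load it, which triggers
           in-kernel verification of every program it contains. */
        struct bpf_object *obj = bpf_object__open_file("iotrace.bpf.o", NULL);
        if (!obj || bpf_object__load(obj))
            return 1;

        struct bpf_program *prog =
            bpf_object__find_program_by_name(obj, "trace_block_rq_issue");
        if (!prog || !bpf_program__attach(prog))  /* auto-attach per SEC() */
            return 1;

        printf("tracing... Ctrl-C to exit\n");
        pause();
        return 0;
    }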
Performance Implications at the Intersection of AF_XDP and Programmable NICs
Marco Molè, Farbod Shahinfar, Francesco Maria Tranquillo, Davide Zoni (Politecnico di Milano), Aurojit Panda (NYU), Gianni Antichi (Politecnico di Milano)
Abstract: AF_XDP is emerging as an easier way to implement zero-copy network bypass applications. This is because it allows mixed-mode deployments, where zero-copy and socket-based applications share the same NIC. However, AF_XDP relies on NIC hardware and driver features; implementing these features on programmable NICs adds resource overheads, increases development complexity, and thus might not be desirable. To address this, we examine the feasibility of using eBPF-based kernel extensibility to implement the required features, and report on the tradeoff between an eBPF and a native NIC implementation. Our analysis involved updating the OpenNIC driver to support the loading of eBPF/XDP programs and zero-copy AF_XDP. Our implementation is of independent interest because it makes it easier to develop and evaluate alternate designs for mixed-mode zero-copy deployments, and new NIC-accelerated applications. Our implementation is open-sourced.
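For context, the mixed-mode split the abstract describes hinges on a small in-kernel XDP program that steers each frame either to an AF_XDP socket or to the regular stack; a minimal sketch (ours, not the paper's OpenNIC code) of that steering logic:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* One AF_XDP socket per RX queue, registered by user space. */
    struct {
        __uint(type, BPF_MAP_TYPE_XSKMAP);
        __uint(max_entries, 64);
        __type(key, __u32);
        __type(value, __u32);
    } xsks_map SEC(".maps");

    SEC("xdp")
    int xsk_steer(struct xdp_md *ctx)
    {
        /* Deliver zero-copy to the socket bound to this RX queue; the
           XDP_PASS flag falls back to the kernel stack if none is bound. */
        return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
    }

    char _license[] SEC("license") = "GPL";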
Toward eBPF-Accelerated Pub-Sub Systems
Beihao Zhou, Samer Al-Kiswany, Mina Tahmasbi Arashloo (University of Waterloo)
Abstract: Publish-subscribe (pub-sub) systems are a fundamental building block for real-time distributed applications, where high throughput and low latency are critical. Existing brokers can suffer performance bottlenecks as they operate in user space and rely on the socket API and full kernel stack traversal for every message. We present BPF-Broker, a novel pub-sub broker that leverages eBPF to accelerate message dissemination by decoupling the broker's control and data paths. Subscriber management is handled in user space, while message forwarding is done early in the kernel using the TC ingress and XDP hooks. Our evaluation shows that BPF-Broker achieves up to 3× higher throughput compared to our socket-based baseline broker under high subscriber counts, and 2-10× lower end-to-end latency. These results highlight the potential of eBPF in accelerating pub-sub systems.
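To illustrate the kind of early in-kernel fan-out such a data path can build on, here is a simplified sketch under our own assumptions (BPF-Broker's actual forwarding also rewrites headers per subscriber): a device map plus the broadcast flag clones a frame to every registered egress port without leaving the driver.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* Egress devices toward subscribers, maintained from user space. */
    struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP);
        __uint(max_entries, 16);
        __type(key, __u32);
        __type(value, __u32);
    } subscriber_devs SEC(".maps");

    SEC("xdp")
    int fanout(struct xdp_md *ctx)
    {
        /* Clone the frame to every device in the map (kernel >= 5.13),
           skipping the interface the packet arrived on. */
        return bpf_redirect_map(&subscriber_devs, 0,
                                BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS);
    }

    char _license[] SEC("license") = "GPL";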
A Memory Pool Allocator for eBPF Applications
Gyuyeong Kim (Sungshin Women's University), Dongsu Han (KAIST)
Abstract: eBPF enables high-performance kernel-level execution by eliminating networking stack traversal and context switching. Despite these advantages, eBPF applications face strict memory management constraints due to the eBPF verifier requirements that mandate static memory allocation. This limitation imposes a fundamental tradeoff between application performance and memory efficiency, ultimately restricting the potential of eBPF. We present Kerby, a dynamic memory pool allocator for eBPF that enables eBPF applications to dynamically manage pre-allocated memory by representing variable-length data as collections of fixed-size blocks. This allows applications to increase the amount of kernel-resident data while minimizing internal fragmentation. Our preliminary evaluation with key-value store implementations demonstrates that Kerby achieves significant improvements in both memory utilization and throughput.
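A hedged sketch of the core idea (all names illustrative, not Kerby's API): pre-allocate an array map of fixed-size blocks once at load time and represent each variable-length value as a chain of block indices, so nothing is allocated on the data path and the verifier sees only statically sized objects.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #define BLOCK_PAYLOAD 64
    #define POOL_BLOCKS   4096
    #define NIL           0xffffffffU   /* end-of-chain marker */

    struct block {
        __u32 next;                   /* index of next block, or NIL */
        __u16 used;                   /* bytes valid in payload */
        __u8  payload[BLOCK_PAYLOAD];
    };

    /* The whole pool is pre-allocated when the map is created, so the
       eBPF program never allocates memory at run time; a free list can
       be kept as a chain of unused indices starting at a head slot. */
    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, POOL_BLOCKS);
        __type(key, __u32);
        __type(value, struct block);
    } pool SEC(".maps");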
SchedBPF - Scheduling BPF programs
Kavya Shekar, Dan Williams (Virginia Tech)
Abstract: The Linux BPF framework enables the execution of verified custom bytecode in the critical path of various Linux kernel routines, allowing for efficient in-kernel extensions. The safety properties and low execution overhead of BPF programs have led to advancements in kernel extension use cases that can be broadly categorized into tracing, custom kernel policies, and application acceleration. However, BPF is fundamentally event-driven and lacks native support for periodic or continuous tasks such as background tracing, metric aggregation, or kernel housekeeping. Existing approaches such as kernel modules with kthreads, userspace daemons, or BPF timers fail to satisfy all the essential requirements for periodic kernel extensions such as fine-grained CPU control, kernel safety, and minimal overhead. To address this gap, we propose SchedBPF --- a conceptual framework that enables periodic execution of BPF programs on kernel threads. SchedBPF program executions are sandboxed and preemptible, as governed by the existing BPF verifier and JIT engine. They also adopt time-slice semantics, cgroup-style CPU quotas, and nice-level priority control, similar to kernel threads. SchedBPF aims to enable low-overhead, periodic execution of safe BPF code with fine-grained CPU resource management.
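For reference, the bpf_timer baseline that the abstract argues falls short can approximate periodic execution by re-arming itself; what it cannot express is the CPU-quota and priority control SchedBPF proposes. A minimal sketch (the arming trigger is chosen arbitrarily):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #define PERIOD_NS 100000000ULL  /* 100 ms */
    #define CLOCK_MONOTONIC 1       /* not defined by the BPF headers used here */

    struct elem { struct bpf_timer timer; };

    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, int);
        __type(value, struct elem);
    } timers SEC(".maps");

    static int tick(void *map, int *key, struct elem *e)
    {
        /* ... periodic work: metric aggregation, housekeeping ... */
        bpf_timer_start(&e->timer, PERIOD_NS, 0);   /* re-arm */
        return 0;
    }

    SEC("tp/syscalls/sys_enter_getpid")             /* arbitrary arming hook */
    int arm(void *ctx)
    {
        int key = 0;
        struct elem *e = bpf_map_lookup_elem(&timers, &key);
        if (!e)
            return 0;
        bpf_timer_init(&e->timer, &timers, CLOCK_MONOTONIC);
        bpf_timer_set_callback(&e->timer, tick);
        bpf_timer_start(&e->timer, PERIOD_NS, 0);
        return 0;
    }

    char _license[] SEC("license") = "GPL";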
ChainIO: Bridging Disk and Network Domains with eBPF
Zheng Cao, He Xuhang (UC Merced), Yanpeng Hu (ShanghaiTech University), Yusheng Zheng, Yiwei Yang (UC Santa Cruz), Jianchang Su, Wei Zhang (University of Connecticut), Andi Quinn (UC Santa Cruz)
Abstract: Modern data-driven services, from analytical databases and key-value stores to stream processors, suffer high tail latencies because each disk read and subsequent packet send/recv incurs a separate user-kernel crossing and redundant buffer copy. While Linux's io_uring now supports both block and socket I/O with asynchronous, batched submissions, it does not provide zero-copy transfers between storage and network domains; AF_XDP delivers high-performance packet I/O but is siloed to the network stack. No existing framework transparently unifies these mechanisms end-to-end. We present ChainIO, an eBPF-based system that intercepts and rewrites I/O syscalls, uses ring buffers to pass data descriptors directly between io_uring and AF_XDP, and orchestrates in-kernel execution to chain disk reads into network sends (and vice versa) with full POSIX semantics, fallback safety for unsupported cases, and zero application changes. Our prototype works with unmodified binaries and improves ClickHouse's TPC-H query throughput by up to 39%. ChainIO thus offers a general, safe, and high-performance path for cross-domain I/O optimization in diverse data-intensive workloads.
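The ordering half of such chaining is already expressible with io_uring's linked submissions; the sketch below (liburing, hypothetical descriptors) chains a disk read into a socket send, but still bounces the data through a user-supplied buffer, which is exactly the copy that descriptor passing between io_uring and AF_XDP is designed to eliminate.

    #include <liburing.h>

    int chain_read_send(int file_fd, int sock_fd,
                        void *buf, unsigned len, off_t off)
    {
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;

        if (io_uring_queue_init(8, &ring, 0) < 0)
            return -1;

        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, file_fd, buf, len, off);
        sqe->flags |= IOSQE_IO_LINK;        /* send starts only after read */

        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_send(sqe, sock_fd, buf, len, 0);

        io_uring_submit(&ring);
        for (int i = 0; i < 2; i++) {       /* reap both completions */
            io_uring_wait_cqe(&ring, &cqe);
            io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return 0;
    }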
bpfCP: Efficient and Extensible Process Checkpointing via eBPF
Juntong Deng (King's College London), Stephen Kell (King's College London)
Abstract: Live migration, snapshotting, and accelerated startup of applications or containers have long been implemented using checkpoint and restore primitives. To save, or 'checkpoint', a process, it is necessary to dump not only its userspace state but also a large amount of state held in the kernel. The current widely used implementation on Linux relies heavily on the /proc file system and special system call interfaces, but these suffer from poor performance and lack extensibility. In this paper, we propose bpfCP, a process checkpointing scheme that dumps in-kernel state via eBPF programs, which improves performance and extensibility. Our preliminary evaluation shows that bpfCP can achieve significant performance improvements in dumping multiple types of in-kernel state of processes.
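One existing kernel facility that points in this direction is the eBPF iterator: an iter/task program walks kernel task state in a single pass and streams it to user space through a seq_file, instead of one /proc read per field. A minimal sketch (assuming a BTF-enabled kernel and a generated vmlinux.h):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    SEC("iter/task")
    int dump_task(struct bpf_iter__task *ctx)
    {
        struct seq_file *seq = ctx->meta->seq;
        struct task_struct *task = ctx->task;

        if (!task)
            return 0;
        /* Emit one line of in-kernel state per task. */
        BPF_SEQ_PRINTF(seq, "%d %s\n", task->pid, task->comm);
        return 0;
    }

    char _license[] SEC("license") = "GPL";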
Automatic Synthesis of Abstract Operators for eBPF
Abstract: This paper proposes an approach to automatically synthesize sound and precise abstract operators for the static analyzer in the eBPF verifier. The eBPF verifier ensures that only safe user-defined programs are loaded into the kernel. An unsound operator can lead to unsafe programs being accepted, while an imprecise operator can cause safe programs to be rejected. Our approach starts by generating candidate operators using input-output examples tailored for the eBPF verifier's abstract operators and iteratively refines them for soundness and precision. Using this approach, we have generated more precise variants of existing operators. Our approach also generates numerous sound and unsound operators that can serve as test suites for existing eBPF verification and fuzzing frameworks.
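To make "sound" and "precise" concrete, consider textbook interval addition (our illustration from standard abstract interpretation; the verifier's actual domains also include tnums and signed/unsigned ranges):

    \[
      [a,b] \;\hat{+}\; [c,d] \;=\; [\,a+c,\; b+d\,],
      \qquad \text{sound iff}\quad
      \forall x \in [a,b],\; y \in [c,d]:\; x+y \in [\,a+c,\; b+d\,].
    \]

A variant returning [a+c-1, b+d+1] remains sound but is less precise, so it may cause safe programs to be rejected; one returning [a+c, b+d-1] is unsound, since it excludes the concrete outcome x=b, y=d.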
Pairwise BPF Programs Should Be Optimized Together
Milo Craun, Dan Williams (Virginia Tech)
Abstract: BPF programs are extensively used for tracing and observability in production systems where performance overheads matter. Many individual BPF programs do not incur serious performance-degrading overhead on their own, but increasingly more than a single BPF program is used to understand production system performance. BPF deployments have begun to look more like distributed applications; however, this is a mismatch with the underlying Linux kernel, potentially leading to high overhead costs. In particular, we identify that many BPF programs follow a pattern based on pairwise program deployment where entry and exit probes are attached to measure a single quantity. We find that the pairwise BPF program pattern results in unnecessary overheads. We identify three optimizations---BPF program inlining, context aware optimization, and intermediate state internalization---that apply to pairwise BPF programs. We show that applying these optimizations to an example pairwise BPF program can reduce overhead on random read throughput from 28.13% to 8.98% and on random write throughput from 26.97% to 8.60%. We then examine some key design questions that arise when seeking to integrate optimizations with the existing BPF system.
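The pattern in question, sketched below as a classic kprobe/kretprobe pair on vfs_read (our illustration, not the paper's example): the entry program stores a timestamp in a hash map keyed by thread ID, and the exit program looks it up to compute the elapsed time. The map round-trip between the two programs is exactly the intermediate state the third optimization would internalize.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, __u32);           /* thread id */
        __type(value, __u64);         /* entry timestamp, ns */
    } start SEC(".maps");

    SEC("kprobe/vfs_read")
    int entry(void *ctx)
    {
        __u32 tid = (__u32)bpf_get_current_pid_tgid();
        __u64 ts = bpf_ktime_get_ns();
        bpf_map_update_elem(&start, &tid, &ts, BPF_ANY);
        return 0;
    }

    SEC("kretprobe/vfs_read")
    int exit_probe(void *ctx)
    {
        __u32 tid = (__u32)bpf_get_current_pid_tgid();
        __u64 *ts = bpf_map_lookup_elem(&start, &tid);
        if (!ts)
            return 0;
        __u64 delta = bpf_ktime_get_ns() - *ts;  /* the measured quantity */
        bpf_map_delete_elem(&start, &tid);
        (void)delta;                             /* e.g., feed a histogram */
        return 0;
    }

    char _license[] SEC("license") = "GPL";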
Kernel Extension DSLs Should Be Verifier-Safe!
Franco Solleza, Justus Adam, Akshay Narayan, Malte Schwarzkopf (Brown University), Andrew Crotty (Northwestern University), Nesime Tatbul (Intel Labs and MIT)
Abstract: eBPF allows developers to write safe operating system extensions, but writing these extensions remains challenging because it requires detailed knowledge of both the extension's domain and eBPF's programming interface. Most importantly, the extension must pass the eBPF verifier. This paper argues that DSLs for extensions should guarantee verifier-safety: valid DSL programs should result in eBPF code that always passes the verifier. This avoids complex debugging and the need for extension developers to be eBPF experts. We show that three existing DSLs for different domains are compatible with verifier-safety. Beyond verifier-safety, practical extension DSLs must also achieve good performance. Inspired by database query optimization, we sketch an approach to creating DSL-specific optimizers capable of maintaining verifier-safety. A preliminary evaluation shows that optimizing verifier-safe extension performance is feasible.
Offloading the Tedious Task of Writing eBPF Programs
Xiangyu Gao, Xiangfeng Zhu (University of Washington), Bhavana Vannarth Shobhana (Rutgers University), Yiwei Yang (UC Santa Cruz), Arvind Krishnamurthy, Ratul Mahajan (University of Washington)
Abstract: eBPF offers a lightweight method to extend the Linux kernel without modifying the source code in existing modules. However, writing correct and efficient eBPF programs is hard due to its unique verifier constraints and cumbersome debugging processes specific to the kernel execution environment. To tackle this obstacle, we present SimpleBPF, a system that offloads the tedious parts of eBPF development. Developers only need to express their intent in a high-level domain-specific language, while the underlying eBPF code generation is handled automatically. SimpleBPF integrates four key components: a concise DSL, an LLM-based generator, a semantic checker, and an LLM-based optimizer. We use few-shot prompting to build both the code generator and optimizer in SimpleBPF, and evaluate the system on programs written in a representative DSL. The preliminary evaluation result shows that SimpleBPF can generate valid eBPF programs that pass the kernel verifier and exhibit competitive runtime performance. We also outline future directions based on current findings.
Empowering machine-learning assisted kernel decisions with eBPF^ML
Abstract: Machine-learning (ML) techniques can optimize core operating system paths---scheduling, I/O, power, and memory---yet practical deployments remain rare. Existing prototypes either (i) bake simple heuristics directly into the kernel or (ii) offload inference to user space to exploit discrete accelerators, both of which incur unacceptable engineering or latency cost. We argue that eBPF, the Linux kernel's safe, hot-swappable bytecode runtime, is the missing substrate for moderately complex in-kernel ML. We present eBPF^ML, a design that (1) extends the eBPF instruction set with matrix-multiply helpers, (2) leverages upcoming CPU matrix engines such as Intel Advanced Matrix Extensions (AMX) through the eBPF JIT, and (3) retains verifier guarantees and CO-RE portability.
eInfer: Unlocking Fine-Grained Tracing for Distributed LLM Inference with eBPF
Kexin Chu, Jianchang Su, Yifan Zhang (University of Connecticut), Chenxingyu Zhao (University of Washington), Yiwei Yang, Yusheng Zheng (UC Santa Cruz), Shengkai Lin, Shizhen Zhao (Shanghai Jiao Tong University), Wei Zhang (University of Connecticut)
Abstract: Modern large language model (LLM) inference workloads run on complex, heterogeneous distributed systems spanning CPUs, GPUs, multi-GPU setups, and network interconnects. Existing profiling tools either incur prohibitive overhead, provide limited visibility, or suffer from vendor lock-in, making real-time, fine-grained performance analysis impractical in production environments. We present eInfer, the first eBPF-based system enabling transparent, low-overhead end-to-end tracing of per-request performance across distributed LLM inference pipelines without requiring application modifications. eInfer uniquely correlates events across CPUs, accelerators, processes, and nodes, delivering unified, vendor-agnostic observability that approaches the accuracy of specialized GPU profiling tools. To address the challenges of scalability, dynamic workloads, and instrumentation gaps on accelerators, we design a runtime-adaptive tracing mechanism that maintains comprehensive visibility in real time. Our initial evaluation demonstrates that eInfer delivers precise, low-overhead profiling, enabling critical insights to optimize LLM serving performance in production environments.
InXpect: Lightweight XDP Profiling
Vladimiro Paschali, Andrea Monterubbiano, Francesco Fazzari (University of Rome "La Sapienza"), Michael Swift (University of Wisconsin—Madison), Salvatore Pontarelli (University of Rome "La Sapienza")
Abstract: The eBPF eXpress Data Path (XDP) enables high-speed packet-processing applications. Achieving high throughput requires careful design and profiling of XDP applications. However, existing profiling tools lack eBPF support. We introduce InXpect, a lightweight monitoring framework that profiles eBPF programs with fine granularity and minimal overhead, making it suitable for in-production XDP-based systems. We demonstrate how InXpect outperforms existing tools in profiling overhead and capabilities. InXpect is the first XDP/eBPF profiling system that provides real-time statistics streaming, enabling immediate detection of changes in program behavior.
BPFflow - Preventing information leaks from eBPF
Chinecherem Dimobi, Rahul Tiwari, Zhengjie Ji, Dan Williams (Virginia Tech)
Abstract: eBPF has seen major adoption by enterprises to enhance observability, tracing, and monitoring by hooking into different points in the kernel. However, since the kernel is a critical resource, eBPF can also pose a threat if misused, potentially leading to privilege escalation, information leaks, and more. While effective to some extent, existing mitigation strategies like interface filtering are coarse-grained and often over-restrictive. We propose BPFflow, a flexible framework for the system administrator to define policies that specify sensitive data sources, trusted sinks, and permitted flows between them. These policies are enforced by an Information Flow Control (IFC) system within the eBPF verifier to track the propagation of sensitive data and prevent unauthorized leakage to userspace or any other untrusted sinks, without any runtime overhead.
Gecko: High-Quality Video Streaming via Generative Prompt Chunks
Jiangkai Wu (Peking University), Liming Liu (Peking University), Yong Cui (Tsinghua University), Xinggong Zhang (Peking University)
Abstract: As internet video evolves towards higher quality, it poses challenges to the Quality of Experience (QoE) of adaptive streaming systems. To deliver high visual quality while avoiding rebuffering, we propose Gecko, an adaptive streaming system based on Prompt Inversion. At the media server, high-quality video is inverted to low-bitrate prompts. At the client, the received prompts are used to reconstruct high-fidelity video. To support videos with large-scale movements, a temporal-structural prompt is proposed to explicitly control temporal changes. To support high-resolution, an inverse upsampling algorithm is introduced, which integrates upsampling into the inversion. To further reduce the bandwidth usage, a chunk-wise inverse prompt is proposed. We implement Gecko on Puffer, with fine-grained integration of both browser client and media server. Evaluations under real-world network traces demonstrate that Gecko can reduce bandwidth usage by 10× compared to H.264 and reduce rebuffering by 91.2% compared to DVC. Moreover, Gecko can generate 4K videos at 69 FPS with a single RTX 4090D GPU.
TeleGS: End-to-End Monocular Gaussian Head for Immersive Telepresence
Zipeng Pan (Communication University of China), Yuan Zhang (Communication University of China), Tao Lin (State Key Laboratory of Media Convergence and Communication, Communication University of China)
Abstract: Current immersive telepresence systems face significant deployment barriers due to prohibitive hardware costs and stringent environmental requirements. To address these challenges, we propose TeleGS, a novel monocular 3D head reconstruction framework that synergizes 3D-GAN priors with 3D Gaussian Splatting (3DGS), offering the potential for immersive telepresence on accessible single-camera consumer-grade hardware. Specifically, we design a fast initialization process that directly generates 3D Gaussians from 3D-GAN features and introduce a decoupled appearance model that combines view-independent features from 3D-GAN priors with view-dependent color prediction to achieve high-fidelity rendering. Furthermore, we implement an end-to-end pipeline by integrating 3D-GAN inversion. Our system achieves 30× faster rendering and 1.22dB PSNR gain over the best-performing baseline, paving the way for practical, real-time 3D telepresence applications on consumer hardware.
Octavius: Towards Efficient Transmission of 3D Point Clouds via Adaptive Encoding and QUIC
Muhammad Haseeb (New York University), Eugene Chai (Nokia Bell Labs), Matteo Varvello (Nokia Bell Labs)
Abstract: The popularity of point cloud data is rising, driven by applications in virtual reality, urban planning, and volumetric videos. Real-time streaming of point clouds is challenging due to their large size. Current solutions focus on viewport adaptation, with less emphasis on transcoding, essential for streaming to diverse clients. Traditional transcoding involves creating multiple bitrate versions through subsampling and encoding, which is computationally intensive and storage-heavy. These methods also lack agility, limiting clients to predefined bitrates. Octavius advocates for an agile and scalable transcoding scheme for point clouds, requiring minimal computational resources and storage. By strategically withholding some data from the last layer of an octree -- a popular encoding data structure for point clouds -- clients can still decode the point cloud, with quality degradation based on the withheld data. This enables a continuous set of bitrates to be generated on-the-fly by simply withholding some amount of data during transmission. Octavius further uses a smart packing scheme (for forming packets) to evenly distribute video degradation in the event of consecutive packets being withheld or lost, and leverages a QUIC-based mixed-reliability protocol to reduce latency by avoiding packet retransmissions.
NETSPLAT: Data Plane Network Assistance for Streaming 3D Gaussian Splatting Scenes
Nehal Baganal Krishna (Leibniz University Hannover), Yuang Shi (National University of Singapore), Wei Tsang Ooi (National University of Singapore), Amr Rizk (Leibniz University Hannover)
Abstract: We present Netsplat, a system that leverages data plane programmability for providing network assistance to streaming dynamic 3D Gaussian Splatting scenes. 3DGS offers fast client-side rendering at the cost of huge frame sizes. Netsplat provides dynamic reconfiguration of multicast groups as well as alternative queueing and scheduling and in-band client-side reconfiguration capabilities to explicitly coordinate the 3DGS streaming clients with the network conditions.
miVirtualSeat: A Next Generation Hybrid Telepresence System
Klara Nahrstedt (University of Illinois at Urbana-Champaign), Ramesh Sitaraman (University of Massachusetts Amherst & Akamai Tech), Jacob Chakareski (New Jersey Institute of Technology), Michael Zink (University of Massachusetts Amherst), Mingyuan Wu (University of Illinois at Urbana-Champaign), Lingdong Wang (University of Massachusetts Amherst), Bo Chen (University of Illinois at Urbana-Champaign), Ruifan Ji (University of Illinois at Urbana-Champaign), Kuan-Ying Lee (University of Illinois at Urbana-Champaign), John Murray (University of Massachusetts Amherst), Simran Singh (New Jersey Institute of Technology)
Abstract: Post-COVID, teleconferencing systems such as Zoom, Teams, and Webex have become part of our daily life. As many employees have returned to work part-time in person, hybrid teleconferencing has become the norm. However, current solutions are not well equipped to make remote participants part of the holistic conversation that may exist in the physical meeting room space. In this lightning contribution, we present the design of miVirtualSeat, a telepresence system for small hybrid meetings with an architectural vision in which remote and physical participants feel co-presence. We will discuss the architectural design constructs of miVirtualSeat for remote and physical participants, and the services and protocols that provide underlying support.
FrameTrace: Frame-Level Telemetry for Media over QUIC
Birkan Denizer (Kiel University), Lasse Winkel (Technical University of Applied Sciences Lübeck), Olaf Landsiedel (Hamburg University of Technology & Kiel University)
Abstract: Live-streaming and remote control applications increasingly demand low end-to-end latency. Yet today's evaluation tools give only connection-level or segment-level metrics. Coarse telemetry obscures the distinction between stalls experienced during encoding, propagation, and rendering, thereby concealing system bottlenecks. For example, under lossy conditions, long Group-of-Pictures (GoP) sizes increase stall duration as delayed I-frames block decoding of dependent frames. Prior studies of Media over QUIC (MoQ) use average segment times or logs from video players, leaving the frame-level latency and quality trade-off largely unexplored. In this paper, we introduce FrameTrace, a frame-level logger for analyzing Quality of Experience telemetry. Using FrameTrace, we demonstrate that MoQ itself cuts latency by 83% and 63% at 5% packet loss and 10 Mbps bandwidth limitation, respectively, compared to Low-Latency DASH (LL-DASH). As a case study, we investigate how dynamically adapting GoP size changes stall duration and perceived quality. A shorter GoP size improves the VMAF score by up to 30% compared to a longer one. Driven by FrameTrace, our lightweight real-time GoP adaptation controller reduces latency by an additional 10% while increasing VMAF by 3.65%.
PulseQUIC: Enhancing QUIC-Based Video Streaming through DRL-Guided Adaptive Pacing
Abstract: The adoption of QUIC for HTTP Adaptive Streaming (HAS), coupled with the rise of 5G, enables seamless 4K video delivery. However, it also calls for smarter congestion control and bitrate adaptation to maintain high Quality of Experience (QoE). While QUIC's user-space architecture supports easy integration of client-side adaptive bitrate (ABR) algorithms, server-side learning-based congestion control---originally designed for TCP---remains difficult to integrate due to its reliance on kernel-level changes. To address these challenges, we propose PulseQUIC, a cross-layer learning-based pacing mechanism that combines server-side transport metrics with real-time client feedback. PulseQUIC dynamically adjusts the pacing rate based on current network conditions and can be seamlessly integrated into any QUIC implementation or congestion control algorithm, offering significant performance gains. Experimental results demonstrate that PulseQUIC increases the client-side VMAF by ~33% while reducing server-side RTT by ~30% for low-latency live (LLL) streaming mode.
Towards Available Bandwidth Estimation for Low-Latency Up-Streaming Amidst Application-Limited Scenarios
Zhidong Jia (Peking University), Li Jiang (Peking University), Zhang Yihang (Peking University), Xinggong Zhang (Peking University), Wei Zhang (ByteDance), Lan Xie (ByteDance), Feng Qian (ByteDance), Leju Yan (ByteDance), Bing Yan (ByteDance), Qiang Ma (ByteDance), Zhou Sha (ByteDance), Wei Yang (ByteDance), Yixuan Ban (ByteDance)
Abstract: Low-latency live streaming (LLS) has seen significant growth in recent years. However, a real-world data study shows that up-streaming from users to server often experiences stalling or poor video quality. One key factor is that the classical congestion control algorithms (CCAs) struggle to provide accurate bandwidth estimates in application-limited scenarios. In this paper, we introduce Camel, a novel bandwidth estimation system for LLS up-streaming. It estimates the available bandwidth by packet trains, decoupling the bandwidth estimation from application-layer throttling. The video bit-rate is determined by minimizing the potential congestion latency with fairness. To enable real-world deployment, Camel addresses three practical challenges: (1) Low-latency constrained bitrate adaptation. (2) Prioritization of various packets. (3) Shallow-buffer-aware bursting. Camel was evaluated on one of the world's largest video streaming platforms through an A/B test spanning a total of 250M users and 2B sessions. The results demonstrate that Camel improves streaming bitrate by up to 14.4% and reduces stalling ratio by up to 14.1% compared to the existing online system.
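For intuition, a first-order sketch (our simplification, not Camel's estimator) of why packet trains decouple estimation from application-limited sending: the receiver measures the dispersion of a back-to-back burst, so the estimate tracks what the path can carry rather than what the encoder happened to produce.

    #include <stdint.h>

    /* Estimate bandwidth from one n-packet train: arrival times in ns,
       per-packet sizes in bytes. Returns bits per second. */
    double train_estimate_bps(const uint64_t *t_ns,
                              const uint32_t *bytes, int n)
    {
        if (n < 2 || t_ns[n - 1] <= t_ns[0])
            return 0.0;
        uint64_t payload = 0;
        for (int i = 1; i < n; i++)   /* first packet only starts the clock */
            payload += bytes[i];
        double dispersion_s = (double)(t_ns[n - 1] - t_ns[0]) / 1e9;
        return (double)payload * 8.0 / dispersion_s;
    }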
MoQ Resilience: Implicit Fast Failover
Felicián Németh (Budapest University of Technology and Economics, HUN-REN-BME Cloud Applications Research Group), Zoltán Szatmáry (Budapest University of Technology and Economics), István Pelle (Budapest University of Technology and Economics, HUN-REN-BME Cloud Applications Research Group), Tamás Lévai (Budapest University of Technology and Economics, HUN-REN-BME Information Systems Research Group)
Abstract: Media over QUIC (MoQ) is an emerging IETF initiative that enables low-latency and large-scale media transmission for various applications. This paper presents an application-level approach for enabling fast and implicit failover mechanisms in MoQ networks, focusing on subscriber-driven strategies that require no protocol modifications and remain fully compatible with the current MoQ specification. We propose a configurable, fast, and implicit failover mechanism for MoQ, allowing subscribers to establish connections to multiple relays and to switch relays in case of a transmission error. Our implementation demonstrates that the failover mechanisms can significantly improve high availability and resource utilization in MoQ deployments. Early evaluation results validate the effectiveness of this approach.
Real-Time AI-Driven Avatar Generation for Sign Language in HTTP Adaptive Streaming
Daniele Lorenzi (Alpen-Adria-Universität Klagenfurt), Emanuele Artioli (Alpen-Adria-Universität Klagenfurt), Farzad Tashtarian (Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Alpen-Adria-Universität Klagenfurt)
Abstract: As digital media consumption over the Internet surges globally, ensuring accessibility for all users becomes paramount. For people with hearing impairments, this means providing inclusion beyond classic captioning, which does not convey the full emotional and contextual depth of spoken content. This work addresses this accessibility gap by exploring the use of AI-generated avatars capable of translating speech into sign language in real-time. After defining the multifaceted challenges in this domain, we propose a novel AI-driven task partition to animate avatars for accurate and expressive sign language interpretations in live streaming.
Evaluation of Packet Wash for Low-Latency High-Bitrate Game Streaming
Abstract: The rapid advancement of immersive multimedia applications, such as cloud gaming, necessitates streaming technologies that deliver both low latency and high bitrate to ensure a seamless Quality of Experience (QoE). Conventional transport protocols like TCP and UDP often fail to simultaneously meet these demands, with challenges such as bufferbloat exacerbating latency, or loss events disrupting the real-time experience in case of congestion. To address these limitations, this study evaluates the packet wash mechanism introduced in the Big Packet Protocol (BPP) and made possible by Scalable Video Coding (SVC), tailored for real-time applications like cloud gaming. The packet wash mechanism can discard higher-quality payload layers on the fly in network buffers during congestion events, preventing gameplay interruptions without requiring server-side negotiation or re-encoding. This network-based approach minimizes the effects of congestion compared to traditional bitrate adaptation methods. Experimental results for 2K game streaming demonstrate that the packet wash mechanism preserves visual quality with negligible degradation during sudden bandwidth drops.
CROSS: A Dual-Sided Scheduling Framework for Efficient Multipath Video Streaming
Bowen Hu (Renmin University of China, ByteDance), Tong Li (Renmin University of China), Chunyu Qiao (ByteDance), Jingkun Cao (Renmin University of China, ByteDance)
Abstract: Emerging video applications require more bandwidth than single-path transmission can deliver reliably. Multipath transmission aggregates bandwidth across multiple network paths, yet introduces packet reordering caused by data and ACK path heterogeneities. We propose CROSS, a dual-sided scheduling framework for efficient multipath video streaming. CROSS features (a) a client-driven data scheduler that assigns parallel requests to dedicated paths to mitigate reordering and, when bandwidth is insufficient, falls back to minRTT scheduling, and (b) a link-state-aware ACK scheduler that intelligently routes ACK traffic via two-stage scoring. Implemented atop XQUIC and evaluated with real-world network traces, CROSS increases transport throughput by up to 38%, boosts cached data by up to 24%, and reduces rebuffering time by as much as 52% across diverse network conditions.
QUIC Performance Anomalies Diagnosis with Qlens
Ziliang Zhang (Beijing University of Posts and Telecommunications), Bo Wang (Tsinghua University), Wufan Wang (Beijing University of Posts and Telecommunications)
Abstract: As live video streaming increasingly dominates network traffic, transport protocol optimization becomes crucial for achieving sub-second latency and stable throughput in edge computing environments. The QUIC (Quick UDP Internet Connections) protocol [1] emerges as a transformative solution, combining zero-RTT handshakes, stream multiplexing, and user-space flexibility - characteristics particularly suited for edge-based video delivery systems. However, the user-space implementation may give rise to anomalies in QUIC, resulting in suboptimal transport performance. Consequently, efficiently detecting performance anomalies during QUIC transmission and identifying whether they stem from network issues or from QUIC's implementation is pivotal for maintaining service reliability in production environments.
Enhancing Immersive Telepresence through Super-Resolution for Adaptive Streaming
Peyman Mashhadi (Halmstad University), Eirini Liotou (Harokopio University of Athens)
Abstract: Immersive media such as telepresence and holographic communication demand high-quality 3D visuals under fluctuating network conditions. We propose integrating super-resolution (SR) techniques into adaptive streaming to enhance point cloud quality without increasing bandwidth. By deploying deep SR models at 5G edge servers, we enable scalable, high-fidelity experiences even under limited connectivity. Our system bridges the gap between network-aware adaptation and perceptual enhancement, providing a robust solution for next-generation telepresence applications.
DNScope: Detecting DNS Misconfigurations through Graph Neural Static Reasoning
Kaiqiang Hu, Haizhou Du (Shanghai University of Electric Power); Ziyi Wang (Xiamen University)
Abstract: The Domain Name System (DNS) underpins Internet reliability but remains vulnerable to subtle, multi-zone misconfigurations that can trigger widespread outages. We present DNScope, a purely static analysis framework that encodes DNS zone files as a Trace Configuration Graph (TCG), capturing cross-nameserver, intra-zone, and inter-zone dependencies. A Gated Graph Neural Network with an attention mechanism learns node representations that integrate local record features and global structural patterns, enabling unsupervised anomaly detection of malformed subgraphs without any runtime data or manually defined rules. On DNS datasets, DNScope pinpoints misconfiguration root causes, paving the way for automated DNS reliability and self-healing.
ConfSum: Towards Automatic Summarization of Network-scale Operational Intents from Device Configurations
Rundi Zhai (Beijing University of Posts and Telecommunications, Zhongguancun Laboratory); Jianmin Liu (Tsinghua University); Yukai Miao, Li Chen (Zhongguancun Laboratory); Dan Li (Tsinghua University); Baojiang Cui (Beijing University of Posts and Telecommunications); Peng Zhang (Xi'an Jiaotong University); Ennan Zhai (Alibaba); Zishuo Ding (The Hong Kong University of Science and Technology (Guangzhou))
Abstract: When network operators need to understand the high-level intent behind a network's existing device configurations, they must engage in a tedious and error-prone process of manually reverse-engineering the low-level commands. We propose Configuration Intent Summarization (CIS), a new task that aims to automate this process by generating human-readable summaries of the intents embedded across a network's configurations. CIS is challenging due to the diversity of intents, the semantic gap between device-specific configurations and network-wide intents, and the need to reason about interactions between multiple devices' configurations. We present ConfSum, a system that addresses these challenges by leveraging the unique ability of large language models (LLMs) to parse semi-structured configuration files and summarize them in natural language. However, the full CIS task requires reasoning about device interactions and other complexities that are beyond the capabilities of LLMs alone. To enhance the LLM's robustness to these challenges, ConfSum introduces novel techniques for retrieving relevant examples to augment LLM prompts, decomposing the generation process to handle multi-device intents, and integrating with formal validation tools. Our experiments demonstrate that ConfSum achieves high intent coverage while generating summaries that match the quality of human experts.
SafeMigration: Safe Large-scale Migration Planning via Symbolic Execution
Zibin Chen, Lixin Gao (Univ. of Massachusetts); Ying Zhang (Meta)
Abstract: Datacenter and Internet Service Provider networks undergo maintenance or upgrades frequently. To ensure a smooth migration without negatively impacting live traffic, network operators must carefully plan the steps to take. In this paper, we present SafeMigration, a system for safe migration planning. SafeMigration avoids exhaustive searches by using symbolic variables to represent network states, and encodes the planning problem into an SMT (Satisfiability Modulo Theories) problem. Evaluations using real-world migration scenarios demonstrate that SafeMigration generates migration plans 50 times faster and can be up to 2 orders of magnitude faster in the best case.
A Forwarding-Path-Aware Sampling Strategy for Config2Spec
Shangsen Li, Lailong Luo, Changhao Qiu, Bangbang Ren (National University of Defense Technology); Deke Guo (Sun Yat-sen University)
Abstract: Network specification plays an important role in network management tasks, such as configuration verification, synthesis, repair, and network change. Although Config2Spec has proposed an efficient mining workflow by combining the power of data plane analysis and control plane verification, its sampling strategy for selecting the next failure environment only considers the effect of a link's crossing policies. To further improve the efficiency of the mining process, in this paper we propose a forwarding-path-aware sampling strategy for Config2Spec. By considering the effects of both forwarding paths and crossing policies, we assign an exponentially decreasing weight to the links in a forwarding path. Thus the mining process can analyze more new forwarding paths with fewer sampling steps. Comprehensive experiments on real topologies (Bics and Columbus) demonstrate the effectiveness of the forwarding-path-aware sampling strategy. Compared with the policy-based sampling strategy, the forwarding-path-aware sampling strategy improves the efficiency of the mining process by 11.52% to 40.85%.
Incremental Network Configuration Verification via Localized Subspecification
Haoxian Chen (ShanghaiTech University)
Abstract: Verifying large network configurations can take hours or even days. Yet, network updates occur frequently as operational goals and conditions evolve. In this work, we study the problem of incremental network verification. Leveraging the observation that many updates are small, we propose a lightweight verification method based on localized subspecifications. Given an update location, we extract from the original verification conditions a subset of constraints that is both necessary and sufficient to preserve the global property. If the update satisfies the subspecification, correctness is guaranteed without re-running full verification. This approach offers a promising direction for reducing verification overhead and enabling timely feedback during network evolution.
Scalable BGP Simulation of Hyper-Scale Data Center Networks
Mengrui Zhang, Xiaoqiang Zheng, Lizhao You (Xiamen University); Ziyang Yao, Yang Wang, Rui Wen, Zhi Zhang, Ronghua Sun, Yuanhui Zhong, Haihua Li (Huawei Technologies Co. Ltd); Fei Yuan (Xiamen University); Yuanxun Kang (Yealink Technologies Co. Ltd); Qiao Xiang (Xiamen University)
Abstract: Modern cloud data center networks (DCNs) are hyper-scale, consisting of tens of thousands of switches, which poses significant challenges for simulation-based control-plane verification. Existing simulation tools face three key limitations when simulating BGP in such large-scale DCNs: poor scalability in large topologies, inaccuracy under non-monotonic configurations, and a lack of support for incremental simulation. To overcome these challenges, we present vBGPSim, a versatile BGP simulator designed for hyper-scale DCNs. vBGPSim enhances scalability through topology compression and a generalized Dijkstra's algorithm that replaces the iterative algorithm. It improves simulation accuracy under the non-monotonic configurations by integrating a withdrawal mechanism into Dijkstra's algorithm and supports incremental simulation using the same mechanism. Extensive evaluations on production and synthetic DCNs show that vBGPSim scales to networks with over 10,000 switches, while accurately handling non-monotonic updates and supporting efficient incremental simulation.
LTL-based Specifications for P4 Program Synthesis
Lorenzo Theunissen, Sebastijan Dumančić, Fernando Kuipers (Delft University of Technology)
Abstract: Programmable networks enable us to define the behaviour of a network through software. This added freedom comes with added complexity because multiple switches need to coordinate and be programmed correctly. To ease this task, we focus on intent-based networking via program synthesis. In this paper, we explain how to leverage linear temporal logic to describe the desired behaviour of a program, how to verify a P4 program against that description, and how to use the formula describing the program's behaviour to reduce the search space of the program synthesiser.
Protocol Vulnerability Detection Method Based on Fuzzing in Low Earth Orbit Satellite Network
Chuntao Lan, Lin Yao, Guowei Wu (Dalian University of Technology); Ziyuan Tian (Shanghai Dahua Surveying & Mapping Company)
Abstract: Low Earth Orbit (LEO) satellites play a vital role in satellite Internet systems due to their low latency and wide coverage. However, the dynamic topology and limited resources of LEO networks make them vulnerable to protocol-level security threats. Most existing studies focus on network performance or protocol design, with limited attention to protocol vulnerability detection. To address this gap, this paper introduces a protocol vulnerability detection method for LEO networks based on AFLNet (LNAFLnet). To cope with LEO-specific challenges, an adaptive sliding window mechanism is proposed to ensure connection stability while mitigating congestion and routing pollution. Experiments demonstrate that the method improves target selection accuracy, reduces link occupancy, and lowers average delay compared with existing fuzzing methods. We apply the proposed method to a variant of the OSPF protocol designed for LEO networks, and further validate its effectiveness through testing.
Modeling and Evaluation of Elastic SFC Based on HCPN
Rulin Zhang, Hui Dong, Ju Zhang, Hua Li (College of Computer Science, Inner Mongolia University)
Abstract: Service Function Chaining (SFC) has enhanced the flexibility of Network Function Virtualization (NFV) through dynamic orchestration and reduced operational and maintenance costs, but challenges remain in achieving optimal elasticity. This study investigates the elasticity of SFC and addresses the current lack of an effective elasticity evaluation method. We propose a hierarchical Colored Petri Net (HCPN) model with temporal extensions to represent elastic SFC (ESFC) behavior. It accurately simulates request processing, resource allocation, and elasticity policy execution. Then, we design a quantitative evaluation model based on multidimensional metrics and develop an evaluation algorithm to select the optimal ESFC from a candidate pool. The evaluations show that the proposed model and the elastic strategies are effective.
Refining specifications for configuration repair with side effect diagnosis
Ryusei Shiiba (Sokendai); Satoru Kobayashi (Okayama University); Osamu Akashi (National Institute of Informatics); Kensuke Fukuda (NII/Sokendai)
Abstract: Writing specifications in configuration repair has become hard for network operators due to the increasing complexity of routing policies. If operators fail to describe critical routing policies in the specifications, the repair tools could introduce unexpected changes in routing behaviors (i.e., side effects) that violate the policies. To ease the operational burden of specification writing, we explore a specification refinement approach that (1) identifies such side effects caused by a repair and (2) refines the specifications to prevent the side effects, rather than requiring operators to write complete specifications from scratch. To realize this approach with existing repair tools, we propose SEA, a system that can diagnose the side effects of a repair. By designing the refinement process using an existing repair tool and SEA, we first demonstrate how SEA helps operators write specifications that prevent side effects. Second, we show that SEA completes the side effect diagnosis within ten seconds for various numbers of synthetic configuration changes on real network topologies.
Formal Specifications for Data Plane Programs
Keyu Yuan, Baber Rehman (Huawei Technologies)
Abstract: Networks are complex in nature, consisting of numerous configurations. Each network configuration requires careful consideration due to the high runtime impact. Software-defined Networking (SDN) opens control plane and data plane programmability to the end user. Network programmability, on the one hand, allows the flexibility of defining custom headers and protocols; on the other hand, it opens new horizons of network misconfigurations. Formal methods are being applied in various domains for building robust systems. Recent studies explored the applications of formal methods in network programming. This paper explores the possibility of using Backus-Naur Form (BNF) for writing formal specifications for P4 parsers and tables. The compiler checks at compile time that the specifications are consistent with the implementation and ensures correctness.
Poster: High-Performance Centralized Parallel Data-Plane Verification for Hyper-Scale DCNs
Abstract: Existing data-plane verifiers face severe performance challenges in verifying hyper-scale underlay data center networks (DCNs) - centralized verifiers often fail to fully exploit the parallelism offered by the modern multi-core CPUs, while distributed verifiers suffer from high overhead due to task distribution and inter-node communication. To overcome these limitations, this paper introduces HeTu, a high-performance centralized parallel data-plane verifier specifically for verifying hyper-scale underlay DCNs via three key designs: (1) a fully parallel verification framework with small graph construction overhead, (2) a new binary decision diagram management strategy that uses separated storage and selectively caching network-level predicates to reduce redundant operations, and (3) an optimized forwarding graph model that aggregates parallel tasks to eliminate redundant computation. Extensive evaluations on synthetic FatTree and large-scale production datasets show that HeTu outperforms state-of-the-art algorithms in runtime by at least 250× with the same-order memory usage, demonstrating its superior efficiency in data-plane verification of hyper-scale DCNs.
Optimising LEO Gateway Placement for the People
Prince Bhardwaj Pawankumar Sharma, Abdullahi K. Abubakar, Nishanth Sastry (University of Surrey)
Abstract: Low Earth Orbit (LEO) satellite networks are rapidly becoming the default broadband solution in developing countries, especially in areas lacking reliable terrestrial infrastructure. This paper investigates how gateway (GW) placement affects user experience in such networks, focusing on key performance metrics such as hop count and latency. Using real-world Starlink deployment data and global population distributions, we compare three gateway placement strategies: Starlink's current deployment (382 GWs), a country-centroid model (241 GWs), and Internet Exchange Point (IXP) co-location (382 and 787 locations). Our analysis shows that IXP co-location delivers superior performance, connecting 96.8% of global users via direct bent-pipe links compared to country-centroid placement (94%) and Starlink's current layout (83.9%). While country-centroid placement appears optimal theoretically, it fails practically in landlocked regions lacking Point of Presence (POP) backhaul infrastructure. Rwanda exemplifies this challenge, forcing Starlink to deploy community gateways instead. Conversely, Starlink's strategy of leveraging existing IXP and POP locations proves effective, demonstrating that even revolutionary satellite technology must align with terrestrial infrastructure realities. Since bent-pipe connectivity offers significantly lower latency than multi-hop inter-satellite paths, these results demonstrate that optimal gateway placement requires alignment with both population density and existing internet infrastructure. This is especially crucial in the Global South, where LEO networks are increasingly relied upon for everyday internet access.
Simulation and Comparison of Vehicle Satellite Connectivity under a 3D Foliage Environment: Starlink and OneWeb
Kevin T. Li, Christian A. Hofmann, Andreas Knopp (University of the Bundeswehr Munich)
Abstract: Satellite communication for mobility use-cases is emerging as a key market for low-Earth orbit satellite constellations, offering broadband internet with sufficiently low latency in regions lacking terrestrial coverage. A commonly overlooked challenge in these scenarios is dynamic blockage caused by dense foliage or complex topography, where a line-of-sight satellite link can be difficult to establish. This work introduces a simulation framework to estimate key performance indicators for given constellation designs in a 3D geospatial environment comprising natural and built features. We demonstrate the simulator in a selected 3D scenario, showcase its capabilities, and compare the performance of two commercially active mega-constellations in delivering broadband connectivity under these conditions.
The Internet from Space, Reimagined: Leveraging Altitude for Efficient Global Coverage
Chris Misa, Ramakrishnan Durairajan (University of Oregon and Link Oregon)
Abstract: The Internet from Space has recently attracted renewed attention following technological developments that enable massive constellations of small satellites in low Earth orbit (LEO). While LEO satellite networks (LSNs) promise low-latency global connectivity, they face several fundamental challenges, including the need for constant satellite replacement due to orbital decay, which affects environmental sustainability, and increasing congestion in orbital space from emerging players, which heightens collision risks. In this work, we propose to expand the LSN constellation design space by including use of altitude as a flexible design parameter to help solve the aforementioned challenges---i.e., by constructing constellations with orbits throughout the range implied by the classic LEO, medium Earth orbit (MEO), and geosynchronous orbit (GEO) designators. Although altitudes above LEO induce higher propagation latency, they also increase how much of the Earth's surface is visible to each satellite, thereby significantly reducing the total number of satellites required for global coverage. Building on this intuition, we provide an initial theoretical analysis of the tradeoffs enabled by orbital altitudes from LEO all the way up to GEO and conduct packet-level simulations demonstrating that MEO constellations can achieve present-day Internet latencies while using ~19× fewer satellites and ~14× fewer handovers than LEO.
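As a back-of-envelope companion to this intuition (our calculation, assuming a 0° minimum elevation angle rather than a realistic mask), the fraction of the Earth's surface visible from altitude h, with Earth radius R ≈ 6371 km, is

    \[
      f(h) \;=\; \frac{1 - \cos\theta}{2},
      \qquad \cos\theta = \frac{R}{R+h}
      \quad\Longrightarrow\quad
      f(h) \;=\; \frac{h}{2(R+h)}.
    \]

This gives f(550 km) ≈ 0.04 for a Starlink-like LEO shell versus f(10,000 km) ≈ 0.31 for MEO, roughly 8× more coverage per satellite under this simplification; realistic minimum-elevation masks penalize low altitudes further, consistent with the larger savings reported above.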
A Distributed Data Store in Orbit
Joerg Ott (Technische Universität München); Jussi Kangasharju (University of Helsinki); Nitinder Mohan (TU Delft)
Abstract: Emerging Low Earth Orbit (LEO) satellite constellations have been considered for uses beyond plain Internet access, including content caching and edge computing. Assuming satellites are equipped with inter-satellite links, we propose using these links and thus the space in-between satellites, paired with a dedicated satellite queuing system, to "store" data and provide access by keeping data in constant flux around the globe. We describe the properties and explore the capabilities of such a system and discuss some potential uses.
FjordLink: Comparison of Starlink and 5G Networks for Teleoperated Vessel Control
Birkan Denizer (Kiel University); Nils Dohse (ADDIX GmbH); Olaf Landsiedel (Hamburg University of Technology & Kiel University)
Abstract: The rapid growth of Low Earth Orbit satellite networks, such as Starlink, is increasing global connectivity by enabling low-latency broadband access in regions where wired and cellular networks fall short. Prior research focuses on the performance of Starlink in terrestrial settings. Yet, there is limited research on the performance of Starlink in coastal and maritime environments, raising the question of how Starlink performs in the presence of waves and tides. In this paper, we introduce FjordLink, a combined Starlink and 5G dataset for coastal maritime connectivity. We collect over 500,000 measurements using a Flat High Performance dish and 5G modems on a research vessel for four months. Starlink and 5G networks achieve median RTTs of less than 50 ms and mean upload throughputs exceeding 35 Mbps. Our results show that Starlink operates similarly (e.g., with a 10 ms median latency difference) in both maritime and terrestrial environments, and improves the 99th percentile latency compared to 5G networks. As a case study, we utilize traces from FjordLink in emulation to evaluate BBR, CUBIC, and Reno congestion control algorithms, where BBR achieves 18% higher upload throughput than CUBIC and Reno.
Better Fill Up Your Pipe – Revisiting Starlink's Burst Characterization
Till Zimmermann, Eric Lanfer, Dominic Laniewski, Simon Brinkmann, Nils Aschenbruck (Osnabrück University)
Abstract: We present a comprehensive analysis of Starlink's physical layer performance based on a year-long UDP measurement campaign. Building upon a known methodology, we replicate and improve the approach to capture frame-level data rates with high granularity, identify discrete modulation steps, and propose an improved modulation model. Our approach reveals distinct symbol allocation patterns and validates modulation schemes ranging from robust low-order to higher-order Quadrature Amplitude Modulation (QAM). The findings provide a foundation for further research into the influence of environmental and network conditions on link performance. Additionally, we offer an open dataset to support ongoing studies of Low Earth Orbit (LEO) satellite networks.
A Detailed Characterization of Starlink One-way Delay
Johan Garcia, Simon Sundberg (Karlstad University); Anna Brunstrom (Karlstad University and University of Malaga)
Abstract: Low Earth Orbit (LEO) satellite networks, such as Starlink, are transforming global Internet access by delivering high-speed connectivity to underserved and remote regions. Despite extensive research into Starlink's performance, latency characteristics remain under-explored. This study presents a comprehensive analysis of one-way delay components in the Starlink network using high-frequency, high-precision measurement probes. Over a 10-day period, more than 500 million probe packets were collected and analyzed. The results reveal minor diurnal latency variation and provide the means to separate out the delay components contributing to the observed one-way delay; we sketch a delay model and provide empirical distributions. By measuring both uplink and downlink paths, the study uncovers significant differences in scheduling behavior, with uplink delays more affected by Starlink's periodic 15-second reconfiguration cycles. The results also highlight the limitations of using too coarse measurement intervals, which can introduce aliasing effects. Our OWD data set and traffic generation tool are made available to support further research in the area.
How LLM Saved Me from Struggling with Experiment Reproduction: LEO Networking as a Case Study
Yibo Wang, Yunan Hou, Zeqi Lai, Hewu Li, Qian Wu, Jun Liu, Yuanjie Li, Xin Xie (Tsinghua University); Zhifeng Han (Xidian University)
Abstract: Reproducing network experiments is critical to advancing research in computer networks. However, in reality many researchers struggle with experiment reproduction: not only because reading, understanding, and debugging prior work is time-consuming and labor-intensive, but also because not all papers publicly release their code, forcing subsequent researchers to re-implement experiments from scratch. In this paper, we explore an intriguing question: can recent large language models (LLMs) assist in understanding research papers and generating code, thereby accelerating the reproduction of network experiments? Focusing on the rapidly evolving area of low-Earth-orbit (LEO) satellite networks (LSN), we present LASER, a semi-automated, LLM-assisted tool designed to facilitate the reproduction of LSN experiments. LASER judiciously integrates the capabilities of LLMs with LSN simulation to ease the burden of LSN experimentation. Our case studies provide preliminary evidence that LASER can efficiently reproduce experimental results consistent with those reported in the original papers, while substantially reducing the manual effort required by LSN researchers.
MOSAIC: Piecing Together 5G and LEOs for NTN Integration Experimentation
Revika Anand, Edward Austin, Charalampos Rotsos, Paul Smith, Nicholas Race (Lancaster University)
Abstract: Despite the rapid growth of 5G technologies, geographical network coverage remains a significant challenge. In certain areas - notably rural - the removal of these technologies is anticipated to result in a complete lack of service. To address this, standards bodies, such as 3GPP, have begun advancing toward 5G-and-beyond architectures incorporating Non-Terrestrial Networks (NTNs), most notably using Low-Earth Orbit (LEO) satellite constellations to expand coverage and improve the resilience of 5G terrestrial networks (TN). However, the integration of 5G and NTN introduces new challenges due to the nature of mobility, network characteristics, and deployment costs. To support the development of new 5G-NTN integration architectures, we propose MOSAIC (MObile-SAtellite Integration Cradle), a realistic end-to-end 5G-NTN emulation platform that can recreate the unique features and software of emerging mobile infrastructures. MOSAIC offers a reproducible environment for realistic 5G-NTN experiments, utilizing unmodified, off-the-shelf software components. MOSAIC models NTN network characteristics using a Generalized Additive Model for Location, Scale, and Shape (GAMLSS), evaluating it against open-source satellite link measurement data from Starlink. Additionally, using our platform, we assess the performance of the Multipath TCP (MPTCP) protocol to support seamless handover scenarios between TN and NTN. We believe MOSAIC provides a holistic and open environment for experimentation with beyond-5G technologies.
Towards Global Outage Detection for LEO Networks
Manda Tran, Khiet Huynh, Dravya Jain, Dylan Truong, Sirapop Theeranantachai (University of California, Los Angeles); Beichuan Zhang (University of Arizona); Lixia Zhang, Liz Izhikevich (University of California, Los Angeles)
Abstract: Low Earth Orbit (LEO) satellite networks are increasingly deployed, yet users continue to experience frequent, short-lived outages. We present Roman-HitchHiking, a system for measuring LEO satellite outages globally and in near-real time. Roman-HitchHiking significantly reduces the measurement overhead by leveraging path redundancy to eliminate duplicate probes to shared pre-satellite routers, thereby reducing overall network traffic and increasing coverage. With Roman-HitchHiking, we identify large clusters of simultaneous outages across geographically diverse regions, pointing to potential centralized failures that traditional outage detection systems overlook. Roman-HitchHiking is open-sourced to enable reproducibility and foster further research on LEO outages.
An investigation of Starlink's performance during the May '24 solar superstorm
Suvam Basak (PhD Candidate, Indian Institute of Technology Kanpur); Amitangshu Pal (Indian Institute of Technology Kanpur); Debopam Bhattacherjee (Microsoft Research India)
Abstract: Low Earth Orbit (LEO) satellites have revolutionized the consumer-grade Internet market. The main giant of this landscape, Starlink, already operates the world's largest LEO satellite fleet of 8,000 satellites, built from non-radiation-hardened components. The recent May 2024 solar superstorm created an opportunity to evaluate the performance and reliability of such a network under intense solar events. In this paper, we conduct a statistical study of the packet loss, latency, and orbital drag experienced by satellites from a long-term perspective. The results indicate marginal inflation in loss and latency during and immediately after the superstorm, while increasing the observation window size dilutes this inflation within regular performance fluctuations. Additionally, we list a few roadblocks that need to be addressed to pinpoint the impact of solar radiation on any specific satellite and on the end user's network connectivity experience.
A Deep Dive into the Impact of Solar Storms on LEO Satellite Networks
Abstract: Low Earth Orbit (LEO) satellite networks are an important part of the global communication infrastructure today. Despite ongoing efforts to improve their resilience, they remain vulnerable to component damage and deorbiting under harsh space weather conditions. Prior work identified a modest but noticeable impact on LEO satellite network performance during solar storms, typically manifesting as an immediate rise in packet loss and a sustained increase in round-trip time (RTT). However, these studies offer only coarse-grained insights and do not capture the nuanced spatial and temporal patterns of disruption across the LEO network. In this paper, we conduct a deep dive into the impact of solar storms on LEO satellite communications. By localizing the impact of increased atmospheric drag at the level of individual satellites and orbits, we reveal significant heterogeneity in how different parts of the network are affected. We find that the degree of performance degradation varies significantly across geographic regions, depending on satellite positioning during the storm. Specifically, we find that (i) not all satellite orbits are equally vulnerable, (ii) within a given orbit, certain satellites experience disproportionate impact depending on their position relative to geomagnetic conditions, and (iii) autonomous maneuvering of satellites might be a cause of the sustained increase in RTT. Our findings uncover previously overlooked patterns of vulnerability in LEO satellite constellations and highlight the need for more adaptive, region-aware mitigation strategies to address space weather-induced network disruptions.
Why choose when you can have both: Programmable data planes meet programmable optics
Chris Misa (University of Oregon), Matthew Nance-Hall (Scala Computing, Inc.), Reza Rejaie (University of Oregon), Walter Willinger (NIKSUN, Inc.), Ramakrishnan Durairajan (University of Oregon and Link Oregon)
Abstract: Recent advances in programmable optics have shown great promise in providing runtime control over a network's topological behavior to achieve spatial adaptability (e.g., dynamically provision new wavelengths). At the same time, the emergence of programmable data plane technologies has revolutionized how a network's forwarding behavior can be controlled at runtime to accomplish temporal flexibility (e.g., "on-the-fly" traffic aggregation). Unfortunately, a lingering chasm between optical systems and digital packet systems researchers prevents modern-day network applications from simultaneously benefiting from both of these exciting developments. To overcome this divide, we propose in this paper ShapeShifter, a novel and principled approach towards integrating programmability in both packet and optical layers and jointly realizing spatial adaptability and temporal flexibility in practice. To provide the necessary technological foundation for this integration, ShapeShifter relies on recent progress in runtime programmability in both communities.
Sparse Collectives: Exploiting Data Sparsity to Improve Communication Efficiency
Dhananjaya Wijerathne, Haris Javaid, Guanwen Zhong, Dan Wu, Xing Yuan Kom, Mario Baldi (AMD)
Abstract: The scalability of distributed Machine Learning (ML) systems heavily relies on efficient data exchange between multiple devices, which is typically achieved through collective operations such as all-gather, reduce-scatter and allreduce. Many ML models, including transformer-based models, exhibit significant sparsity in activations and gradients that are exchanged through these collectives. In this work, we introduce lightweight sparse collectives designed to exploit this data sparsity, aiming to minimize the communication volume while keeping the overhead of sparsity-aware compression and decompression low. These sparse collectives deliver substantial improvements in collective performance, significantly reducing their completion time. Experiments on AMD Instinct™ MI210 and MI300X GPU nodes demonstrate speedups of up to 2.96× for allreduce, 2.6× for all-gather, and 2.85× for reduce-scatter. These results highlight the potential of sparse collectives to accelerate large-scale distributed training and inference systems.
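As a concrete illustration of the compress-then-exchange idea behind sparse collectives, here is a minimal sketch (ours, not AMD's implementation; the Python loop stands in for the actual network exchange) in which each rank sends only (index, value) pairs for its non-zero entries:

```python
# Minimal sketch of a sparsity-aware allreduce: each rank compresses its
# tensor to (indices, values), the compressed forms are exchanged, and
# contributions are scattered back into a dense accumulator.
import numpy as np

def compress(t, eps=0.0):
    """Return (indices, values) for entries with magnitude above eps."""
    idx = np.flatnonzero(np.abs(t) > eps)
    return idx.astype(np.int32), t.flat[idx]

def sparse_allreduce(tensors):
    """Simulates the collective over a list of per-rank tensors."""
    out = np.zeros_like(tensors[0])
    for t in tensors:                  # stands in for the network exchange
        idx, val = compress(t)         # what each rank would actually send
        np.add.at(out.reshape(-1), idx, val)
    return out

# four ranks, ~30% dense gradients
ranks = [np.where(np.random.rand(8) < 0.3, np.random.randn(8), 0.0)
         for _ in range(4)]
assert np.allclose(sparse_allreduce(ranks), np.sum(ranks, axis=0))
```

The communication saving is the ratio of non-zero entries (plus index overhead) to the dense tensor size; the paper's contribution is keeping the compress/decompress cost low enough for this trade to pay off.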
Codesign of Tensors Encoding And Transcoding: A Building Block For Decentralized AI
Revant Teotia, Muhammad Haseeb (New York University)
Abstract: Decentralized AI systems must transmit data/model updates to clients operating under diverse bandwidth, compute, and memory constraints. Traditional approaches either downcast tensors to a fixed low-precision format, sacrificing accuracy, or transmit full-precision values, overwhelming resource-constrained clients and increasing training time due to stragglers. Moreover, sending different precisions to different clients rules out the use of (switch-based or overlay) broadcast/multicast primitives, as two different precisions are essentially two different bytestreams. The inability to use broadcast primitives leads to high resource utilization in a resource-constrained environment. We introduce xFP, a tensor encoding scheme that enables decoding multiple precisions from the same encoded data based on how much of it a client decodes (Figure 1). The same encoded tensors can thus be broadcast to all clients, and each client decodes as much precision as it can support based on how large a prefix of the packets it consumes. Github repository: https://github.com/HaseebLUMS/adaptive-fp-encoding
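To make the prefix-decodable idea concrete, here is a hypothetical byte-plane encoding in the spirit of xFP (the actual format is defined in the linked repository; this sketch only illustrates how consuming a longer prefix yields higher precision from one shared bytestream):

```python
# Hypothetical prefix-decodable float32 encoding: bytes are reordered by
# significance, so a client that reads only a prefix of the stream still
# recovers a coarser version of every tensor element.
import numpy as np

def encode(t):
    b = t.astype('>f4').view(np.uint8).reshape(-1, 4)  # big-endian float32
    return b.T.copy().tobytes()       # byte plane 0 (most significant) first

def decode(stream, n):
    planes = len(stream) // n         # how many byte planes were consumed
    b = np.zeros((n, 4), dtype=np.uint8)
    raw = np.frombuffer(stream[:planes * n], dtype=np.uint8).reshape(planes, n)
    b[:, :planes] = raw.T             # missing low-order bytes stay zero
    return b.view('>f4').ravel().astype(np.float32)

x = np.array([3.14159, -2.71828, 1e-3], dtype=np.float32)
full = encode(x)
for k in (1, 2, 4):                   # decode 1, 2, or all 4 byte planes
    print(k, decode(full[:k * len(x)], len(x)))
# A longer prefix means a smaller truncation error, while the broadcast
# payload is byte-identical for every client.
```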
NSX: Large-Scale Network Simulation on an AI Server
Sajy Khashab, Hariharan Sezhiyan, Rani Abboud, Alex Normatov, Stefan Kaestle, Eliav Bar-Ilan, Mohammad Nassar, Omer Shabtai, Wei Bai, Matty Kadosh, Jiarong Xing (Rice University), Mark Silberstein (NVIDIA and Technion), T. S. Eugene Ng (Rice University), Ang Chen (NVIDIA and University of Michigan)
Abstract: Network innovation is key to supporting AI workloads. Packet-level simulation is indispensable for testing new network features as it enables high-fidelity experimentation. However, today's simulators struggle to scale to the large topologies typical of AI clusters. To scale the simulation, we have built NSX, a new simulator that takes advantage of AI servers themselves (e.g., NVIDIA's DGX) to experiment with AI networks. Network simulation has unique workload characteristics that make AI servers an ideal fit: relatively simple, parallelizable compute with high memory bandwidth pressure. Yet, to fully leverage this platform, we need new techniques to rearchitect network simulators for GPU execution. We describe the design decisions that have gone into NSX and report evaluation results from our current prototype: NSX can scale simulation to networks of 524k nodes, and it finishes a 0.1 ms simulation in less than 2 seconds on a DGX-H100 box. NSX is used by NVIDIA's networking team on a daily basis for AI cluster design, and new features are added to it regularly.
MLSynth: Towards Synthetic ML Traces
Adel Sefiane (NVIDIA and Imperial College London), Alireza Farshin (NVIDIA), Marios Kogias (Imperial College London)
Abstract: AI infrastructure continues to grow rapidly to meet the escalating demand for the compute power required to train and run inference with increasingly capable models. This growth brings significant challenges in both the design and operation of ML pipelines. Exploring these challenges and evaluating potential solutions can be prohibitively expensive and time-consuming without effective simulation tools. This paper introduces MLSynth, a framework for synthesising ML workloads, which is essential for meaningful benchmarking of AI infrastructure. More specifically, MLSynth allows researchers to: (i) define a wide range of ML models with different parallelisation strategies, (ii) explore various sources of performance variability, and (iii) generate synthetic Chakra execution traces that can be used with existing simulation frameworks (e.g., ASTRA-Sim) to comprehensively model ML workloads.
Simulating LLM training workloads for heterogeneous compute and network infrastructure
Sumit Kumar, Arjun Temura, Naman Sharma, Ramanjeet Singh (Indraprastha Institute of Information Technology Delhi), Meet Dadhania, Praveen Tammana (Indian Institute of Technology Hyderabad), Satananda Burla, Abed Mohammad Kamaluddin (Marvell Technology Inc.), Rinku Shah (Indraprastha Institute of Information Technology Delhi)
Abstract: The growing demand for large-scale GPU clusters to train large language models (LLMs) poses a significant challenge to innovation due to high costs and limited accessibility. While state-of-the-art simulators address this issue, they assume a uniform infrastructure. However, device heterogeneity is unavoidable in cloud environments due to resource sharing, frequent updates in device generations, and the inherent intra-chip interconnect heterogeneity. We propose a heterogeneity-aware simulator for distributed LLM training that takes into account the real-world compute and network heterogeneity. Our simulator allows for custom configurations and models the impact of hardware diversity on training time.
RTT- or Bandwidth-Bound? Demystifying the KV Cache Transfer in Large Language Model Serving
Shengnan Yue (China Mobile), Mowei Wang (Huawei Technologies), Yu Yan (China Mobile), Weiqiang Cheng (China Mobile), Zihan Jiang (Huawei Technologies), Zhenhui Zhang (China Mobile and Nanjing University)
Abstract: Modern large language model (LLM) serving systems increasingly adopt a prefill-decode disaggregation architecture to enhance inference efficiency. While this design improves resource utilization, it introduces latency due to the transfer of key-value (KV) cache. The community has generally assumed that this latency is bandwidth-bound and can be effectively mitigated by high-speed interconnects. In this paper, however, we reveal a contrasting observation: in geo-distributed deployments, KV cache transfer is predominantly round-trip time (RTT)-bound, resulting in significantly reduced effective throughput even when physical bandwidth is abundant. This performance bottleneck stems from the sequential transmission of non-contiguous memory blocks allocated by Paged Attention. Through analysis, we show how current block-wise transmission mechanisms collapse under high RTT conditions, mirroring the classic TCP small send window problem. We conclude by outlining promising directions for overcoming these limitations and enabling efficient LLM inference in geo-distributed environments.
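A back-of-the-envelope model makes the RTT-bound regime concrete. Assuming one outstanding block per round trip (the numbers below are hypothetical, not the paper's measurements), goodput is capped at block_size/RTT regardless of link capacity, which mirrors the small-send-window analogy drawn above:

```python
# Toy model of sequential per-block KV-cache transfer: with one block in
# flight per round trip, goodput cannot exceed block_size / RTT, however
# much physical bandwidth is available. All numbers are hypothetical.
def effective_gbps(block_kb, rtt_ms, link_gbps):
    per_rtt = (block_kb * 8e3) / (rtt_ms * 1e-3)   # bits/s if RTT-limited
    return min(per_rtt / 1e9, link_gbps)

for rtt in (0.05, 1, 10, 50):          # intra-DC through geo-distributed
    print(f"RTT {rtt:6.2f} ms -> {effective_gbps(256, rtt, 100):7.3f} Gbps")
# For a 256 KB block on a 100 Gbps link: ~41 Gbps at 0.05 ms RTT, but only
# ~2 Gbps at 1 ms and ~0.04 Gbps at 50 ms; the link sits almost idle.
```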
Chronos: Prescheduled Circuit Switching for LLM Training
Sundararajan Renganathan, Nick McKeown (Stanford University)
Abstract: Hundreds of thousands of accelerators are used to train LLMs, with accelerators connected by packet-switched AI fabrics. In this paper, we ask if the fabric can be built entirely from time-synchronized circuit switches instead. The goal would be to reduce power, increase switching capacity, or reduce the number of network tiers. It appears to be possible, because traffic patterns are largely known a priori. We describe a tool that analyzes the training code and deduces a sequence of permutations that will correctly schedule a crossbar throughout the training run. Expert parallelism (used with mixture-of-experts models) is the only form of parallelism that cannot be pre-scheduled. For MoE traffic, we show how Birkhoff-von Neumann decomposition can be used to schedule the crossbar on demand.
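For readers unfamiliar with the technique, the sketch below peels a doubly stochastic demand matrix into weighted permutations, each usable as a crossbar configuration (an illustrative implementation of Birkhoff-von Neumann decomposition, not the paper's scheduler):

```python
# Birkhoff-von Neumann decomposition: express a doubly stochastic rate
# matrix as a convex combination of permutation matrices. Each permutation
# is a crossbar configuration held for a fraction of the schedule.
import numpy as np
from scipy.optimize import linear_sum_assignment

def bvn(M, tol=1e-9):
    M = M.astype(float).copy()
    schedule = []                       # list of (weight, permutation)
    while M.max() > tol:
        # find a perfect matching supported on the positive entries
        cost = np.where(M > tol, -M, 1e9)
        rows, cols = linear_sum_assignment(cost)
        w = M[rows, cols].min()         # largest weight removable at once
        schedule.append((w, cols.copy()))
        M[rows, cols] -= w              # peel this permutation off
    return schedule

# a 3x3 doubly stochastic demand matrix (rows and columns sum to 1)
D = np.array([[.5, .3, .2], [.2, .5, .3], [.3, .2, .5]])
for w, perm in bvn(D):
    print(f"hold crossbar permutation {perm} for fraction {w:.2f} of the slot")
```

Birkhoff's theorem guarantees the loop terminates for doubly stochastic input, since the support of the residual matrix always admits another perfect matching.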
Intent Fuel Station: A RAG-Enhanced Agent Hub for Realizing Networking Intents
Abstract: Intent-based networking configuration seeks to automatically translate high-level goals into low-level, executable network operations, enabling more intuitive and efficient network management. While existing research has demonstrated the ability to map high-level intents to configurations, approaches leveraging standardized documents remain scarce. To address these limitations, this paper proposes the Intent Fuel Station (IFS), a framework built upon multi-agent collaboration and Retrieval-Augmented Generation (RAG) techniques to support an end-to-end intent translation pipeline. IFS integrates hierarchical semantic injection with role-specialized agents, forming a structured system for intent realization. It introduces a tiered semantic fueling mechanism that adapts knowledge enhancement strategies to task complexity, and employs a threefold reliability scheme consisting of template constraints, parameter alignment, and error correction to ensure the structural integrity of generated configurations. Experimental results on the IntentCraft and IntentGen datasets demonstrate that IFS generates valid YAML configurations with high accuracy, establishing a modular and verifiable paradigm for intent-based network automation.
LAPS: Joint Load Balancing and Congestion Control on Unequal-cost Multi-path Data Center Networks
Ying Wan (Southeast University), Haoyu Song (Futurewei Technologies), Yu Jia, Yunhui Yang (China Mobile), Tao Huang (Purple Mountain Laboratories), Zhikang Chen (Tsinghua University)
Abstract: The assumption of equal-cost paths no longer holds for newer data center network topologies catering for HPC/AI workloads, challenging both load balancing and congestion control. The existing load-balancing schemes, including random packet spraying, fail to adapt to such networks. In this paper, we propose LAPS, a simple latency-aware packet spraying scheme, to achieve joint load balancing and congestion control regardless of network topology and traffic pattern. As a coherent load-balancing and congestion-control solution, LAPS manages the packet sending rate and distribution simultaneously based on real-time path latency. It adapts to both TCP and RoCE-based transport protocols and can be deployed on Smart-NICs at a low implementation cost. Evaluations show that LAPS consistently outperforms the other load-balancing and congestion-control schemes in unequal-cost multi-path topologies for HPC/AI workloads.
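A minimal sketch of the latency-aware spraying idea follows (our illustration; LAPS's joint rate control and its data-plane realization are not reproduced here). Each path keeps an EWMA of its measured latency, and packets are sprayed with probability inversely proportional to it:

```python
# Toy latency-aware packet sprayer: paths with lower observed latency
# receive proportionally more packets, adapting automatically as path
# conditions change. Parameters are illustrative.
import random

class Sprayer:
    def __init__(self, n_paths, alpha=0.2):
        self.lat = [1.0] * n_paths      # EWMA latency per path (ms)
        self.alpha = alpha

    def on_ack(self, path, sample_ms):
        """Fold a per-packet latency sample into the path's EWMA."""
        self.lat[path] += self.alpha * (sample_ms - self.lat[path])

    def pick_path(self):
        weights = [1.0 / l for l in self.lat]   # inverse-latency weighting
        return random.choices(range(len(self.lat)), weights=weights)[0]

s = Sprayer(n_paths=4)
for p, m in [(0, 2.0), (1, 2.0), (2, 8.0), (3, 2.0)]:  # path 2 is congested
    s.on_ack(p, m)
counts = [0] * 4
for _ in range(10000):
    counts[s.pick_path()] += 1
print(counts)   # path 2 receives proportionally less traffic
```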
Sparkle: Optimizing the Serverless AIGC Deployment over Crowdsourced Edge Environments
Kaile Zhu, Shihao Shen (Tianjin University), Tuo Zhang (University of Southern California), Xiaofei Wang (Tianjin University), Xiaoliang Wang (Nanjing University), Xin Jiang, Wenyu Wang (Paiou Cloud Computing), Hai Jin (Huazhong University of Science and Technology)
Abstract: The rise of Artificial Intelligence Generated Content (AIGC) applications demands scalable, cost-effective, and adaptable infrastructure. Crowdsourced edge networks, which utilize idle resources from third-party devices, offer a low-cost environment essential for AIGC. This decentralized approach reduces capital expenditures and carbon footprints but faces challenges such as hardware volatility, where servers can unpredictably join or leave the network. The dynamic availability of servers leads to frequent service redeployments, necessitating an efficient deployment procedure. In this paper, we propose a novel approach to deploying serverless AIGC applications, termed Sparkle. This method optimizes the deployment process by leveraging file-level granularity in image management and utilizing distributed image pulling and caching across edge networks. Experimental results from popular AIGC models demonstrate that Sparkle achieves up to 3.5× faster deployment times and a 28% reduction in registry storage space, outperforming current state-of-the-art technologies.
A Cloud-Edge Collaborative Inference System for Data-secure LLM Serving
Wenjie Chu, Chunhui Du, Yunfeng Shao (Huawei Technologies)
Abstract: The surge in private deployment of large language models (LLMs) driven by open-source advancements has intensified challenges in computational scalability, infrastructure costs, and data privacy. While cloud-edge collaborative inference frameworks alleviate local resource constraints through elastic cloud offloading, their efficacy in wide-area networks (WANs) is hindered by communication inefficiencies and privacy risks. This paper proposes CROSS-SEC, a novel cloud-edge collaborative inference framework integrating cross-WAN prefill-decode (PD) disaggregation with split learning (SL) for data security preservation. To mitigate transmission bottlenecks, CROSS-SEC introduces a layerwise KVCache computation-communication overlapping mechanism, coupled with asynchronous concurrent transmission to eliminate ACK-induced latency. For congestion control, a dual-grained scheduling strategy is proposed: (1) KVCache-level priority scheduling across multi-user/multi-prefill requests ensures first-come-first-served processing, and (2) latency-sensitive prioritization of latent variables over KVCache transfers guarantees TPOT compliance with SLAs. Experimental validation demonstrates that CROSS-SEC reduces TTFT by 19.83% and improves throughput by 3.63% compared to state-of-the-art frameworks, while maintaining data privacy through SL-based input/output confinement.
LLMs on Edge: Network Traffic Characteristics of Distributed Inference under the Loupe
Philippe Buschmann, Arne Bröring (Siemens AG), Georg Carle (Technical University of Munich), Andreas Blenk (Siemens AG)
Abstract: Large Language Models (LLMs) have revolutionized Natural Language Processing and now find their way into various deployments such as end customer appliances or industrial settings. Their deployment at the edge, however, presents unique challenges, particularly regarding network infrastructure and resource constraints. While existing research has focused on LLM distribution in cloud environments, there is a lack of studies addressing the specific requirements and characteristics of edge computing scenarios. Accordingly, there is a rise of distributed LLM frameworks that aim to optimize the deployment of LLMs in edge environments. Due to their work-in-progress nature, these frameworks lack comprehensive networking measurements in a real testbed. This paper presents a comprehensive analysis of distributed LLM frameworks in edge computing environments, focusing on their networking behavior and deployment requirements. Our measurement results reveal non-obvious behaviors, from performance degradation when adding compute nodes to significant traffic-pattern complexities.
LIFT: Automating Symbolic Execution Optimization with Large Language Models for AI Networks
Ruoxi Wang (Northeastern University), Kun Li, Minghui Xu, Yue Zhang (Shandong University), Kaidi Xu (Drexel University), Chunchi Liu (Huawei Technologies), Xiuzhen Cheng (Shandong University), Yinhao Xiao (Guangdong University of Finance and Economics)
Abstract: Dynamic Symbolic Execution (DSE) is a key technique in program analysis, widely used in software testing, vulnerability discovery, and formal verification. In distributed AI systems, DSE plays a crucial role in identifying hard-to-detect bugs, especially those arising from complex network communication patterns. However, traditional approaches to symbolic execution are often hindered by scalability issues and inefficiencies, particularly in large-scale systems. This paper introduces LIFT (Large-language-model Integrated Functional-equivalent-IR Transformation), a novel framework that leverages Large Language Models (LLMs) to automate the optimization of Intermediate Representations (IRs) in symbolic execution. LIFT addresses the challenges of symbolic execution by providing a scalable, context-sensitive solution for IR transformation. The framework consists of two phases: IR Analysis and Optimization, where LLMs optimize time-intensive IR blocks, and Symbolic Execution and Validation, which includes benchmarking and semantic verification to ensure correctness and generalizability. Experiments on real-world binaries demonstrated significant performance improvements, including a 53.5% reduction in execution time for bigtest and a 10.24% reduction for random, along with reductions in IR statements, PUT instructions, and temporary variables. These results demonstrate that LLMs simplify IRs while maintaining functional correctness, enhancing symbolic execution in distributed AI systems.
Reconfigurability within Collective Communication Algorithms
Rukshani Athapathu, George Porter (University of California San Diego)
Abstract: With the explosive growth of AI applications, models and datasets continue to evolve rapidly. Today's deep neural networks (DNNs) consist of billions of parameters, requiring many high-performance neural processing units (NPUs, such as GPUs or TPUs). Many large cloud providers use a torus topology as the interconnect for these large-scale training workloads (Google TPU, AWS Trainium, etc.). Resource allocation to multiple tenants in a torus network can result in topology slices of different sizes, and model training on these torus slices will often result in 1D, 2D, or 3D logical communication. In this paper, we first analyze the performance of state-of-the-art collective communication algorithms on various sizes of static torus topology slices. Then, we quantify the benefits that dynamically reconfigurable torus slices can bring to the state-of-the-art collective communication algorithms.
T3P: Topology-Tailored Tensor Parallelism
Saar Ben-Yochana, Chen Avin, Gabriel Scalosub (Ben Gurion University of the Negev)
Abstract: As deep learning models continue to grow in scale and complexity, methods of distributed machine learning training, and particularly those used for large language models (LLMs), have become a critical ingredient in making such computations efficient and feasible. In such contexts, tensor parallelism (TP) is widely employed to distribute computations across multiple accelerators. However, since TP mandates frequent and high-volume communication between devices, the underlying network characteristics significantly influence performance. Previous work was mostly either model-agnostic or topology-agnostic and did not pick provably optimal configurations. This study presents Topology-Tailored Tensor Parallelism, T3P, an efficient algorithm that identifies the communication-optimal TP sharding configuration (within the considered search space) based on both the model architecture and the network topology. In particular, we show that T3P is optimal for any given resharding cost model.
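In spirit, the search T3P performs can be pictured as minimizing a topology-dependent communication cost over per-layer sharding choices. The toy below uses hypothetical layer sizes, bandwidths, and an exhaustive search rather than T3P's efficient algorithm; it only illustrates the shape of the objective:

```python
# Toy topology-aware TP configuration search: pick, per layer, the sharding
# axis whose collective traffic (plus resharding cost when consecutive
# layers switch axis) is cheapest for the given link bandwidth.
from itertools import product

# hypothetical per-layer traffic (MB) under each sharding axis
traffic_mb = {"row": [64, 128, 64], "col": [96, 96, 96]}
reshard_mb = 32            # extra traffic when consecutive layers switch axis
link_gbps = 50             # effective inter-accelerator bandwidth

def cost_ms(config):
    total = sum(traffic_mb[axis][i] for i, axis in enumerate(config))
    total += reshard_mb * sum(a != b for a, b in zip(config, config[1:]))
    return total * 8 / link_gbps        # MB -> megabits -> ms at link_gbps

best = min(product(("row", "col"), repeat=3), key=cost_ms)
print(best, f"{cost_ms(best):.2f} ms")  # ('row', 'row', 'row') here
```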
Quantifying the Impact of Job Placement and Routing on Network Efficiency in AI Clusters
Dante Van Poucke, Didier Colle, Mario Pickavet, Wouter Tavernier (Ghent University - Imec)
Abstract: High-performance computing (HPC) clusters are essential for training large-scale AI models, yet they often suffer from severe underutilization due to network bottlenecks. This paper investigates the critical role of job placement in multi-tenant AI clusters and its impact on network performance. We propose a flow-level system model that jointly considers job placement, network topology, and routing strategy to evaluate link loads and congestion. By analyzing optimal, random and state-of-the-art placement strategies across modern network topologies, we demonstrate that placement decisions significantly influence network efficiency. Our results show that job placement cannot be ignored even under optimal routing, and that existing placement strategies are dependent on the routing strategy. This work underscores the importance of prioritizing job placement, as suboptimal placements can lead to significant performance degradation in AI workloads on HPC infrastructure.
Yuqi Li, Tianyu Chen (Jiangsu University), Vladimir Ciric (University of Nis), Changda Wang (Jiangsu University)
Abstract: In-band Network Telemetry (INT) provides fine-grained, real-time insights into network data flows. A critical component of INT is the packet marking strategy, which determines which packets carry telemetry data---directly influencing the balance between visibility, overhead, and responsiveness. Traditional marking methods typically rely on fixed thresholds or manually tuned heuristics, limiting their adaptability in dynamic environments. In this paper, we introduce AMSO-INT (Adaptive Marking Strategy Optimizer for INT)---a reinforcement learning (RL) approach implemented directly in the programmable data plane to optimize telemetry packet marking in real time. AMSO-INT maps key network indicators---such as link utilization gradient, queue occupancy, packet loss history, and INT header saturation---into a discrete state-action space and applies tabular Q-learning to autonomously adjust marking intervals. By embedding decision logic within the data plane, AMSO-INT enables low-latency, closed-loop adaptation without control-plane intervention, a fundamentally novel approach in INT design. AMSO-INT effectively balances telemetry coverage, INT header bandwidth overhead, and anomaly detection latency. Experiments on a P4-based FatTree topology demonstrate that AMSO-INT achieves over 99.5% telemetry coverage while reducing bandwidth consumption by 21.3% compared to classic periodic strategies---highlighting its potential for scalable, intelligent, and self-optimizing network telemetry.
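The control loop described above maps onto a small amount of state. The following sketch uses simplified stand-ins for AMSO-INT's state encoding, action set, and reward (the real system encodes this logic in P4 tables, not Python) to show the tabular Q-learning update for choosing a marking interval:

```python
# Tabular Q-learning for adaptive INT marking intervals (illustrative
# stand-in; states, actions, and rewards are simplified assumptions).
import random
from collections import defaultdict

ACTIONS = [1, 4, 16, 64]                 # candidate marking intervals (pkts)
Q = defaultdict(float)                   # Q[(state, action)]
alpha, gamma, eps = 0.1, 0.9, 0.1

def discretize(util_grad, queue_occ):
    """Map continuous indicators in [0, 1] to a coarse discrete state."""
    return (min(int(util_grad * 4), 3), min(int(queue_occ * 4), 3))

def step(state):
    if random.random() < eps:            # epsilon-greedy exploration
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(s, a, reward, s_next):
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])

# one illustrative transition; the reward would trade telemetry coverage
# against header overhead and detection latency
s = discretize(0.7, 0.5)
a = step(s)
update(s, a, reward=1.0, s_next=discretize(0.2, 0.1))
```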
Abstract: The Border Gateway Protocol (BGP) is the Internet's de facto routing protocol, responsible for exchanging reachability information between Autonomous Systems (ASes). As a policy-based protocol, BGP is implemented on the border routers of ASes. It is commonly assumed in research that routers within ASes use the same BGP policies and AS paths for the same prefix. However, each border router separately maintains multiple BGP sessions and selects the best route for a prefix by evaluating all learned routes. In this paper, we show that ASes do not always follow this assumption. Indeed, we find that almost 10% of observed ASes use different paths for the same prefix at the same time. We refer to such ASes as being heterogeneous. We analyze the timeframe from 1st January 2018 to 31st December 2023 and find that the most diverse region is RIPE, with 18% of observed ASes being heterogeneous, followed by ARIN with 10% and AFRINIC with 7%. Our findings suggest that neglecting AS heterogeneity in a study's methodology may result in skewed outcomes or misleading conclusions.
BGPFlow: Flow-based Feature Extraction for BGP Anomaly Detection
Yanxu Fu, Pei Zhang (Beijing University of Posts and Telecommunications); Han Zhang (Tsinghua University); Xiaohong Huang, Yan Ma, Kun Xie, Dandan Li (Beijing University of Posts and Telecommunications)
Abstract: The Border Gateway Protocol (BGP) plays a pivotal role in inter-domain routing, but its decentralized nature and lack of inherent security make it vulnerable to various anomalies. Efficient detection of these anomalies relies on appropriate aggregation methods that extract both quantitative and topological features from routing updates. However, due to the immense scale and complexity of the Internet, inappropriate feature aggregation methods may introduce data noise, thereby increasing detection latency and degrading the overall accuracy of the system. In this paper, we propose BGPFlow, a flexible and scalable feature extraction framework that supports multiple levels of aggregation and integrates both quantitative and graph-based features. We organize the BGP feature flows using the Original Autonomous System (Original AS) as the key and perform aggregation at the country level. To reduce the computational complexity of graph-based features, we apply a graph clustering method based on Graph Autoencoder (GAE) to compress country-level BGP topologies, preserving structural semantics while enhancing the detection of both large-scale and smaller anomalies. Through case studies involving routing anomalies of various scales, we demonstrate the efficiency of BGPFlow in detecting real-world network incidents.
TraffIX: Monitoring Global Internet Traffic Trends by Crawling IXP Statistics
Yasin Alhamwy, Oliver Hohlfeld (University of Kassel)
Abstract: This paper presents TraffIX, an automated system that collects, normalizes, and archives publicly available IXP traffic statistics at scale. TraffIX overcomes diverse format challenges---including structured data and image-based plots---and enables longitudinal analysis of Internet traffic trends.
Traffic Analysis and Recognition in Data Insufficient Scenarios
Sijiang Huang, Xiaohui Xie, Rui Xu (Tsinghua University); Cong Li, Yong Zhang (Industrial and Commercial Bank of China); Mowei Wang, Liang Zhang (Huawei Technologies); Yong Cui (Tsinghua University)
Abstract: Network traffic recognition is pivotal for modern network observability and has therefore garnered considerable attention from both academia and industry. However, existing solutions often rely on impractical assumptions regarding data abundance, which contradict our comprehensive analysis of traffic data from a live financial production environment. The practical landscape of traffic recognition is characterized by numerous challenges, including stringent privacy constraints, device limitations, and the scarcity of training samples in dynamic network conditions, resulting in severe data insufficiency. To address these challenges, we propose TARDIS, a network traffic recognition framework designed specifically for data-insufficient scenarios. TARDIS introduces two novel designs: Payload-agnostic Hierarchical Feature Extraction, which exploits domain knowledge to extract discriminative features from limited information, and Semi-supervised Pre-training based on Sequence Prediction, which effectively utilizes hybrid historical data to enhance model adaptation. Our comprehensive evaluation across multiple datasets, including a self-collected real-world financial dataset (FinApps) and several publicly available ones, demonstrates the effectiveness of TARDIS. Notably, in challenging few-shot scenarios with only 50 training samples per class, TARDIS outperforms the best-performing baseline by 7.13% in accuracy.
Lost in Encryption: Monitoring Audio and Video Flows without Payload in Video-Conferencing Applications
Abstract: With the increasing popularity of remote work, ensuring a sufficient level of Quality of Experience (QoE) in video conferencing applications (VCA) has become critical to ensure that employees can work reliably from anywhere. As such, monitoring of VCA has received much attention and requires solving several problems. First, because these applications typically generate a variety of network flows, those used for transporting critical media must be isolated from the rest of the traffic. Second, this identification must be performed at run-time because the VCA often selects the server IP addresses dynamically. Third, standards and apps are moving towards more encryption, making it harder to identify media flows and extract app-layer metrics. We present a method for efficient and near real-time identification and classification of media flows from VCA, for both native and WebRTC-based versions. Our method relies on insights drawn from traffic patterns to detect media flows accurately in seconds, without prior knowledge of the app's internals, relying only on IP/UDP layer metadata, without depending upon payload or even RTP headers. Then, we extract application-layer metrics used for QoE estimation of media flows by using only IP/UDP packet metadata, and demonstrate that our heuristic-based estimators perform well under network degradation for Microsoft Teams.
E2E energy monitoring for AI inference in mobile networks
Abhishek Dandekar, Ashrafur Rahman (TU Berlin); Julius Schulz-Zander (Fraunhofer HHI)
Abstract: The increasing adoption of Artificial Intelligence (AI), particularly large language models (LLMs) and vision-language models (VLMs), has led to a sharp rise in energy demand. While most studies predominantly assess energy consumption during training and inference, they often neglect the energy required to transport contextual data---such as text, images, or video---from far-edge devices to AI models, especially over mobile networks. We measure and analyze energy consumption for AI inference at both the model and network level. We leverage a combined cross-layer and in-band network telemetry approach to estimate application-level energy usage. Our experiments show that the energy used by the network can be on par with that used by energy-efficient AI models for certain tasks. Furthermore, we also estimate the total CO2 emissions of these inference workflows. These results highlight the critical need to incorporate the energy consumed by the network into sustainable AI system design.
Simurgh: Multi-Agent Adversarial Benchmarking for Proactive Microservice Observability
Navidreza Asadi, Răzvan-Mihai Ursu (Technical University of Munich); Leon Wong (Rakuten Mobile, Inc.); Wolfgang Kellerer (Technical University of Munich)
Abstract: Microservices autoscaling is essential for dynamically adjusting resources to meet fluctuating workload demands and maintain service-level objectives (SLOs), such as latency, while minimizing resource usage. However, the control logic of modern autoscalers is susceptible to exploitation. Assessing its performance requires more than passive monitoring; the rapid evolution of application development has outpaced the availability of observability tools to benchmark and identify corner cases in autoscaling configurations relative to an application's behavior. In this work, we aim to address a critical yet underexplored question: Can we systematically identify adversarial inputs, i.e., traffic anti-patterns that disproportionately increase SLO violations, operational costs, or both? We propose Simurgh, an adversarial benchmarking framework designed to generate traffic patterns tailored for finding autoscaling anti-patterns. It evolves strategies based on real-time observability signals from both the application and infrastructure layers. This problem is inherently complex due to its large solution space. To address this, we introduce heuristics that relax the problem while leveraging multiple parallel systems, each paired with a local controller and optimizer. These controllers act as individual agents managed by a global controller, asynchronously generating diverse traffic patterns while collectively optimizing toward a shared adversarial objective. We evaluate our framework on two applications and three optimization methods, including Bayesian optimization, chaos engineering, and a distributed reinforcement learning approach. Our preliminary empirical results illustrate Simurgh's effectiveness in identifying anti-patterns with respect to different objectives, such as SLO violations and operational costs, and demonstrate generalizability across different applications and cluster sizes up to 10× larger.
Portest: Port Scan Detection on Non-Programmable Switches using TCAM and Randomized Algorithm
Timon Krack, Martina Zitterbart (Karlsruhe Institute of Technology)
Abstract: Monitoring network traffic for detecting security events is crucial for the effective operation of intrusion detection systems (IDS). While programmable switches offer the flexibility to execute monitoring algorithms directly in the data plane, non-programmable switches lack such capabilities and traffic needs to be mirrored and processed externally, leading to scalability and performance challenges. In this paper, we present Portest, a novel algorithm that enables the detection of port scans on non-programmable switches without mirroring traffic. Portest installs a constant number of flow rules with specific stochastic properties in the Ternary Content Addressable Memory (TCAM) of the switch and uses the match counter values for detection. Our results demonstrate that Portest can efficiently detect real-world port scans on non-programmable hardware.
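The idea of detecting scans purely from TCAM match counters can be illustrated with a toy simulation (our illustration with a hypothetical rule construction; Portest's actual stochastic rules and detector are described in the paper). Each rule matches a random ternary slice of the destination-port space, so a scan touching many distinct ports lights up far more rules than ordinary traffic does:

```python
# Toy model of randomized ternary rules over destination ports: we only
# observe per-rule match counters, never the mirrored packets themselves.
import random

random.seed(7)
# each rule: (mask over the high byte of the port, pattern to match)
RULES = [(random.getrandbits(16) & 0xFF00, random.getrandbits(16) & 0xFF00)
         for _ in range(32)]

def counters(ports):
    c = [0] * len(RULES)
    for p in ports:
        for i, (mask, pat) in enumerate(RULES):
            if (p & mask) == (pat & mask):    # ternary match on masked bits
                c[i] += 1
    return c

scan = list(range(1, 10001))             # sequential port scan
normal = [443] * 8000 + [80] * 2000      # a few popular ports, same volume
print("rules hit by scan:   ", sum(x > 0 for x in counters(scan)))
print("rules hit by normal: ", sum(x > 0 for x in counters(normal)))
# The scan spreads across many random slices of the port space, so the
# set of non-zero counters is markedly wider than for benign traffic.
```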
Towards a Playground to Democratize Experimentation and Benchmarking of AI Agents for Network Troubleshooting
Zhihao Wang (University of Electronic Science and Technology of China); Alessandro Cornacchia (KAUST); Franco Galante (Politecnico di Torino); Carlo Centofanti (University of L’Aquila); Alessio Sacco (Politecnico di Torino); Dingde Jiang (University of Electronic Science and Technology of China)
Abstract: Artificial Intelligence (AI) and Large Language Models (LLMs) are increasingly finding application in network-related tasks, such as network configuration synthesis [22] and dialogue-based interfaces to network measurements [23], among others. In this preliminary work, we restrict our focus to the application of AI agents to network troubleshooting and elaborate on the need for a standardized, reproducible, and open benchmarking platform on which AI agents can be built and evaluated with low operational effort. This platform primarily aims to standardize and democratize experimentation with AI agents, enabling researchers and practitioners---including non-domain experts such as ML/AI engineers---to evaluate AI agents on curated problem sets, without concern for the underlying operational complexities. We present a modular and extensible benchmarking framework that supports widely adopted network emulators [3, 18, 20, 21]. It targets an extensible set of network issues in diverse real-world scenarios - e.g., data centers, access, WAN, etc. - and orchestrates end-to-end evaluation workflows, including failure injection, telemetry instrumentation and collection, and agent performance evaluation. Agents can be easily connected through a single Application Programming Interface (API) to an emulation platform and rapidly evaluated. The code is publicly available at https://github.com/zhihao1998/LLM4NetLab.
Proof-of-Concept Implementation of Mission Critical Push-to-Talk Services in Private 5G Networks
Erdem Kara (Fraunhofer FOKUS Institute, Germany), Elena-Ramona Modroiu (Technische Universität Berlin, Germany), Marius Corici (Fraunhofer FOKUS Institute, Germany), Thomas Magedanz (Fraunhofer FOKUS Institute, Germany)
Abstract: First responders require ultra-reliable, low-latency communication and push-to-talk (PTT) features for effective crisis response. Current mission critical PTT (MCPTT) implementations typically rely on proprietary solutions, limiting deployment flexibility in private 5G networks. This paper presents a novel proof-of-concept implementation of an open-source MCPTT system by integrating various open-source projects to meet 3rd Generation Partnership Project (3GPP) requirements and enabling broader accessibility. The system architecture is outlined, highlighting the functionalities of its components. We detail the configuration of the testbed and the selected open-source projects used to integrate a complex MCPTT service. We also provide the key MCPTT functionalities implemented in the developed system within a private 5G network. Our evaluation methodology is based on several key performance indicators (KPIs), including MCPTT access time, end-to-end MCPTT access time, and mouth-to-ear latency. Although the initial performance measurements do not fully satisfy the defined limits, they achieve promising results, revealing areas for further improvement. This work advances the field by demonstrating the feasibility of an open-source MCPTT solution and provides valuable insights for future developments in mission-critical communications.
EcoRAN: A Novel Programmable Framework for Dynamic and Energy-Efficient Resource Optimization in Multi-Tenant, Neutral Host O-RAN Systems
Mattia Bevilacqua (Politecnico di Milano, SIAE Microelettronica, Italy), Marco Mezzavilla (Politecnico di Milano, Italy), Eugenio Moro (Politecnico di Milano, Italy), Christian Mazzucco (SIAE Microelettronica, Italy), Maurizio Magarini (Politecnico di Milano, Italy)
Abstract: The disaggregation enabled by Open Radio Access Network (O-RAN) technology offers unprecedented flexibility for multi-tenant deployments on shared infrastructure, making it a promising solution for neutral hosts managing RAN resources across multiple operators. However, this flexibility introduces new challenges in resource allocation and cost optimization. This paper presents a programmable platform designed to help neutral hosts dynamically allocate resources to minimize energy and infrastructure costs while meeting tenant performance needs. We propose a lightweight heuristic to validate the platform's ability to adapt CPU core allocation and DU-level power consumption in real time, using Intel SST-CP and CPU power limit adjustments. The architecture is implemented on a bare-metal OKD cluster, offering a cost-effective and reconfigurable foundation for practical O-RAN experimentation.
VOTA: Parallelizing 6G-RAN Experimentation with Virtualized Over-The-Air Workloads
Chang Liu (Eindhoven University of Technology, Netherlands), Ta Dang Khoa Le (Eindhoven University of Technology, Netherlands), Rahul Saini (Eindhoven University of Technology, Netherlands), Kishor Joshi (Eindhoven University of Technology, Netherlands), Georgios Exarchakos (Eindhoven University of Technology, Netherlands)
Abstract: Testbed sharing, a practice in which different researchers concurrently develop independent use cases on top of the same testbed, is ubiquitous in wireless experimental research. Its key drawback is experimental inconvenience: one must delay experiments or tolerate compute and RF interference that harms experimental fidelity. In this paper, we propose VOTA, an open-source, software-only testbed scaling method that leverages real-time virtualization and frequency tuning to maximize parallel experiments while controlling interference. In a demonstration of two interference-sensitive 6G use cases - MIMO iDFT/DFT Offloading and O-RAN DoS Attack - running side-by-side on a 32-core host, we showcase VOTA's capabilities: dedicated-like results while allowing 2.67× more sharing opportunities.
Towards URLLC with Open-Source 5G Software
Aoyu Gong (EPFL, Switzerland), Arman Maghsoudnia (EPFL, Switzerland), Raphael Cannatà (EPFL, Switzerland), Eduard Vlad (RWTH Aachen, Germany), Néstor Lomba Lomba (EPFL, Switzerland), Dan Mihai Dumitriu (Pavonis LLC, USA), Haitham Hassanieh (EPFL, Switzerland)
Abstract: Ultra-Reliable Low-Latency Communication is a key feature of 5G, yet achieving its strict one-way latency target remains challenging in real-world deployments. While previous work proposes latency-reduction techniques, most are theoretical or simulation-based and overlook practical bottlenecks in actual systems. In this paper, we analyze and optimize latency with open-source 5G RAN software. We characterize latency sources arising from 5G specifications and implementation-level factors, along with their complex interplays. Guided by this analysis, we introduce improvements reducing one-way latency by 39.28% in the downlink and 55.38% in the uplink. Our results show the importance of system-level experimentation and provide a blueprint for advancing toward URLLC targets in both 5G and future cellular networks.
In-Band Context Signaling for Cross-Layer QoS in Software-Defined Local 5G
Akihiro Nakao (The University of Tokyo, Japan)
Abstract: In this paper, we propose a flexible cross-layer QoS control scheme for software-defined Local 5G networks. Our approach embeds application and wireless context into the 5G bearer using in-band PDCP-layer signaling. By inserting custom PDCP trailers, our method enables the base station to apply fine-grained QoS policies tailored to real-time application requirements and radio conditions. We prototype this approach using open-source 5G software (OAI/Free5GC) and demonstrate negligible latency overhead (~0.1 ms) on an SDR-based testbed. We further introduce a novel "spatiotemporal slicing" concept, dynamically controlling bandwidth based on UWB-derived location data. Our cross-layer design offers a practical, reproducible foundation for context-aware 6G research.
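As an illustration of what such in-band signaling could look like on the wire, the sketch below packs and parses a hypothetical PDCP trailer; all field names and sizes are our assumptions for illustration, not the paper's actual trailer format:

```python
# Hypothetical byte layout for a custom PDCP trailer carrying application
# and radio context. Fields and sizes are illustrative assumptions.
import struct

TRAILER_FMT = "!BBHhB"   # magic, app class, required rate (kbps/10),
                         # RSRP (dBm), UWB zone id -- 7 bytes, network order

def add_trailer(pdcp_sdu, app_class, rate_kbps, rsrp_dbm, zone):
    trailer = struct.pack(TRAILER_FMT, 0xC7, app_class,
                          rate_kbps // 10, rsrp_dbm, zone)
    return pdcp_sdu + trailer

def parse_trailer(pdu):
    size = struct.calcsize(TRAILER_FMT)
    magic, app, rate10, rsrp, zone = struct.unpack(TRAILER_FMT, pdu[-size:])
    assert magic == 0xC7, "not one of our tagged PDUs"
    return {"app_class": app, "rate_kbps": rate10 * 10,
            "rsrp_dbm": rsrp, "zone": zone}

pdu = add_trailer(b"payload...", app_class=2, rate_kbps=5000,
                  rsrp_dbm=-95, zone=4)
print(parse_trailer(pdu))
```

A fixed-size trailer like this keeps the base-station parsing cost constant per PDU, consistent with the ~0.1 ms overhead reported above.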
Modeling and Simulation of Trapped-ion Quantum Repeaters and Networks
Abstract: This paper explores the design and implementation of trapped-ion quantum repeaters and networks using modeling and simulation. We aim to quantitatively understand the practical architecture design and resource requirements of trapped-ion entanglement-based quantum repeater paradigms. Our simulation results explore entanglement rate and fidelity as key performance metrics, and we discuss the major challenges for practical deployment of quantum networks and future directions for research and development in order to meet these challenges.
Towards Blind Quantum Machine Learning in Entanglement Networks
Diego Medeiros de Abreu, Antônio Abelém
Abstract: Blind Quantum Computation (BQC) enables clients to delegate quantum computations to a quantum server while maintaining the privacy of their data and algorithms, even when the server is untrusted. In this work, we extend BQC frameworks to Quantum Machine Learning (QML) by implementing a network of entangled clients and a quantum server. Specifically, we explore the integration of Variational Quantum Classifiers (VQC) and Quantum Convolutional Neural Networks (QCNNs) within this paradigm. Our proposed model allows clients to perform classical preprocessing and optimization locally while leveraging the quantum server for computationally expensive quantum tasks. The entanglement-based network is managed by a controller that dynamically allocates resources according to the BQC protocol, allowing secure and efficient execution. We present simulation results indicating the feasibility of this approach, including an analysis of network efficiency and resource consumption, alongside the F1 score on QML benchmark datasets.
Entanglement improves coordination in distributed systems
Francisco Ferreira da Silva, Stephanie Wehner
Abstract: Coordination in distributed systems is often hampered by communication latency, which degrades performance. Quantum entanglement enables correlations stronger than classically possible without communication. Such correlations manifest instantaneously upon measurement, irrespective of the physical distance separating the systems. We investigate the application of shared entanglement to a dual-objective optimization problem in a distributed system comprising two servers. The servers process both a continuously available, preemptible baseline task and incoming paired customer requests, to maximize the baseline task throughput subject to a Quality of Service (QoS) constraint on average customer waiting time. We present a rigorous analytical model demonstrating that an entanglement-assisted routing strategy allows the system to achieve higher baseline throughput compared to communication-free classical strategies, provided the baseline task's output exhibits sufficiently increasing returns with processing time. This advantage stems from entanglement enabling better coordination, which allows the system to satisfy the customer QoS constraint with a lower overall probability of splitting customer requests, leading to more favorable conditions for baseline task processing and thus higher throughput. We further show that the magnitude of this throughput gain is particularly pronounced for tasks exhibiting increasing returns, where output grows super-linearly with processing time. Our results identify optimization of scheduling in distributed systems as a novel application domain for near-term quantum networks.
Linear Programming Approach for Demonstrating Network Nonlocality for Arbitrary Networks
Salome Hayes-Shuptar, Daniel Bhatti, David Elkouss
Abstract: Device-independent protocols will play a crucial role in the future of quantum networks. The properties of network settings lend themselves to the novel concept of network nonlocality. Detecting it is a challenging task due to the independence of multiple, causally separated sources, which renders the set of network local behaviors non-convex and forces the use of nonlinear inequalities or semi-definite programs. We propose a linear programming framework for demonstrating network nonlocality using linear optics and single-photon sources for an arbitrary network. Furthermore, we investigate natural network topologies where the number of sources is smaller than the number of parties, allowing a more efficient use of resources. The resulting linear program depends only on observed probabilities and a tunable experimental parameter, and can be solved more efficiently than current semi-definite program approaches. Our method provides a systematic, device-independent witness for network nonlocality, allowing us to explore the range of networks which demonstrate network nonlocality and work towards scalable certification in complex quantum networks.
Modeling and Simulation of All-photonic Quantum Repeaters and Networks
Chuen Hei Chan, Charu Jain, Ezra Kissel, Wenji Wu, Edwin Barnes, Sophia E. Economou, Inder Monga
Abstract: This paper explores the design and implementation of all-photonic quantum repeaters and networks using modeling and simulation. We aim to quantitatively understand the practical architecture design and resource requirements of all-photonic entanglement-based quantum repeater paradigms.
An extensible control plane software architecture for quantum networking research
Abstract: As quantum networking experiments move from the laboratory to larger-scale deployments, integrated control software becomes essential for managing complex interactions between the numerous distributed resources involved. While a number of laboratory-scale control systems have been developed for specific quantum platform demonstrations, an openly available and general solution for operating quantum networks has not emerged. With the QUANT-NET Control Plane (QNCP), we introduce a model-based, extensible control plane implementation that offers a framework for enabling network-wide orchestration in quantum information network environments. QNCP provides a quantum network data model, resource management, communication primitives, and a plugin interface for defining orchestration and protocol interactions across distributed quantum network devices and services. This paper describes the design and architecture of QNCP, its implementation, and opportunities for extensibility and deployment.
Quantum-Enabled Secure Computation Medical Service
Diogo Filipe Matos, Juan José Romero, Laura Ortiz, Vicente Martin, Armando Nolasco Pinto
Abstract: This work presents the implementation of a quantum-enabled Secure Multi-Party Computation (SMC) service applied to two medical use cases and demonstrated on a Quantum Key Distribution (QKD) network deployed in Aveiro, utilizing both research and commercial QKD devices. We show the integration of quantum resources into the execution of SMC protocols by using quantum-generated oblivious keys, along with an overview of the generic framework for deploying SMC services. The results demonstrate the feasibility of the proposed use cases, highlighting the need for continued research into the integration of quantum Oblivious Transfer (OT) and SMC services within QKD networks.
Arqon Suite of Quantum Network Control Applications
Scarlett Gauthier, Thomas R. Beauchamp, Stephanie Wehner
Abstract: The aim of a quantum network is to enable users to successfully execute applications on their quantum end nodes. Users of mature networks, such as the internet, the postal network, or the telephone network expect their demands for service to be satisfied reliably. Here, we present an extended abstract introducing Arqon, a suite of control applications capable of delivering reliable service to end nodes. We define a full set of reliability requirements and demonstrate through a numeric evaluation that Arqon is capable of simultaneously satisfying all requirements.
Simulation-based Analysis of Distributed Quantum Computing with Remote Gates
Benedikt Baier, Wolfgang Kellerer
Abstract: Quantum computers offer computational advantages for specific problems over classical systems [8], but today's devices are limited by qubit counts and susceptibility to noise. Distributed quantum computing (DQC) addresses these constraints by interconnecting quantum processors via quantum networks to enable cooperative execution of quantum algorithms [4]. DQC can be realized either through quantum teleportation [3], which transfers qubit states across nodes, or through remote gates [7], which enact quantum operations between spatially separated qubits while preserving their physical allocation. Both approaches depend on pre-shared entanglement, which must be distributed across potentially multi-hop quantum networks via entanglement swapping [11]. In realistic settings, DQC performance is influenced by factors such as entanglement fidelity, quantum memory coherence time, and routing path length, yet the impact of these parameters on computational fidelity remains insufficiently understood. This work investigates these effects through simulation of distributed Grover circuits executed over quantum networks with 1-4 routers, incorporating entanglement swapping, network delays, and varying physical constraints to assess how network-level properties affect circuit-level performance.
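A common back-of-envelope model for how swapping degrades fidelity, included here only as intuition and not as the paper's simulation model, assumes isotropic (Werner) pairs and perfect Bell-state measurements, under which the Werner parameters multiply at each swap:

```python
# Back-of-envelope model (our simplifying assumption, not the paper's
# simulator): each link delivers an isotropic (Werner) pair of fidelity
# F_link, and every perfect Bell-state measurement used for swapping
# multiplies the Werner parameters of the two joined pairs.
def werner_param(F):
    """Werner weight w from Bell-state fidelity F = (3w + 1) / 4."""
    return (4 * F - 1) / 3

def fidelity_after_swaps(F_link, n_routers):
    """End-to-end fidelity over n_routers + 1 links joined by n_routers swaps."""
    w = werner_param(F_link) ** (n_routers + 1)
    return (3 * w + 1) / 4

for routers in range(1, 5):            # the abstract's 1-4 router settings
    print(routers, round(fidelity_after_swaps(0.95, routers), 4))
```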
Towards a common framework for quantum information networking
Thomas R. Beauchamp, Alberto Sebastián-Lombraña, Scarlett Gauthier, Juan Jose Romero, Vicente Martin, Stephanie Wehner, Laura Ortiz
Abstract: Designing and developing future quantum information networks is currently a cutting-edge topic. However, advances made in quantum cryptographic networks and, independently, in entanglement-based networks create the need for a unified framework for quantum information networking. This work attempts to chart a path toward a common paradigm for future quantum information networking.
Quantum Oblivious Transfer through Coherent-One-Way Quantum-Key-Distribution
Javier Faba, Juan José Romero, Laura Ortiz, Vicente Martín Ayuso
Abstract: In this work, we investigate the feasibility of realizing Quantum Oblivious Transfer (QOT) using the experimental implementation of the Coherent-One-Way (COW) quantum key distribution protocol, requiring only minimal modifications.
A Vision for Integrated Quantum-Classical Network Operations
Kevin Bohan, Bella Bose, Steven Corbato, Thinh Nguyen, Inder Monga, Brian Smith, Wenji Wu, Ramakrishnan Durairajan
Abstract: We propose qcNOC, a next-generation hybrid quantum-classical Network Operations Center. qcNOC introduces a unified framework that encompasses hybrid performance metrics, real-time observability, advanced fault detection and localization, and operator training tailored to the hybrid network environment. This work outlines the architectural vision and strategic roadmap for qcNOC, with the goal of democratizing access to quantum networking capabilities and ensuring that research and education networks (RENs) and campus networks can actively participate in and benefit from the quantum revolution.
Reinforcement Learning for Entanglement Distribution in Quantum Networks
Andrés Agustí Casado, Álvaro Troyano Olivas, Javier Faba, Luis Miguel Robledo, Vicente Martin, Laura Ortiz
Abstract: Distributing entanglement efficiently across quantum networks remains a major challenge, primarily due to the intricate timing and coordination required for entanglement generation and swapping. Orchestrating these operations with precision is key to achieving efficient and reliable entanglement distribution as hardware performance continues to improve. In our approach, we model the quantum network using two graphs: one for the physical fiber layer handling entanglement generation, and another for the mapping of stored Bell pairs to quantum memories. Entanglement generation and swapping become edge transformations in this setting. We use reinforcement learning to discover optimized distribution strategies across complex, heterogeneous networks that extend beyond repeater chains.
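The sketch below illustrates the abstract's two-graph, edge-transformation view: generation adds a Bell-pair edge along a physical fiber, and a swap at an intermediate node merges two stored pairs into one longer-range pair. The environment class and its state encoding are our own placeholders, not the paper's implementation.

```python
# Hypothetical sketch of the edge-transformation view: a static physical
# fiber graph plus a dynamic multigraph of Bell pairs held in memories.
import networkx as nx

class EntanglementEnv:
    def __init__(self, fiber_edges):
        self.fiber = nx.Graph(fiber_edges)      # physical fiber layer
        self.pairs = nx.MultiGraph()            # stored Bell pairs
        self.pairs.add_nodes_from(self.fiber.nodes)

    def generate(self, u, v):
        """Entanglement generation along a physical fiber (u, v)."""
        assert self.fiber.has_edge(u, v)
        self.pairs.add_edge(u, v)               # new stored Bell pair

    def swap(self, node, u, v):
        """Entanglement swap at `node`, merging pairs (u,node) and (node,v)."""
        self.pairs.remove_edge(u, node)
        self.pairs.remove_edge(node, v)
        self.pairs.add_edge(u, v)               # longer-range pair

env = EntanglementEnv([("A", "R"), ("R", "B")])
env.generate("A", "R"); env.generate("R", "B")
env.swap("R", "A", "B")
assert env.pairs.has_edge("A", "B")
```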
Towards a wider optimisation of quantum communication infrastructure deployments
Alberto Sebastián-Lombraña, Qiaolun Zhang, Laura Ortiz, Vicente Martín
Abstract: The deployment of quantum communication infrastructures is costly, necessitating optimisation strategies to reduce deployment cost while ensuring the desired performance. Most current studies primarily focus on resource allocation in quantum communication networks after deployment. While some initial studies have begun to investigate deployment strategies, they remain limited to specific QKD types, such as point-to-point and measure-device-independent QKD. A broader approach is needed, however: one that optimises network deployment while incorporating insights relevant to the optical networking and quantum systems industries, as well as other external constraints beyond cost and performance. This paper introduces a vision for optimisation in quantum networks focused on planning their deployment in advance to determine which types of systems should be installed and where. The approach includes considering a broader set of requirements, such as trust and governance. Grounded in the experience gained from planning the deployment of the Madrid quantum network, MadQCI, this vision aims to support more realistic deployment strategies.
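As a toy illustration of the pre-deployment planning question raised here, the snippet below picks the cheapest QKD system type per link subject to a minimum key-rate requirement. The system catalogue, costs, and attenuation model are entirely made up for illustration and do not reflect real device characteristics.

```python
# Toy illustration (our construction) of choosing which QKD system type to
# install on each link so as to minimise cost while meeting a minimum
# secret-key-rate requirement. Catalogue and rate model are fabricated.
SYSTEMS = {"p2p": (10, 50), "mdi": (25, 20), "twin-field": (40, 5)}
#          name: (cost_units, base_rate_kbps)

def plan(links, min_rate):
    """links: {name: distance_km}. Returns the cheapest feasible system per
    link, or None when no catalogue entry meets the rate requirement."""
    best = {}
    for name, dist in links.items():
        feasible = []
        for sys_name, (cost, rate) in SYSTEMS.items():
            eff_rate = rate * 0.5 ** (dist / 25)   # toy attenuation model
            if eff_rate >= min_rate:
                feasible.append((cost, sys_name))
        best[name] = min(feasible)[1] if feasible else None
    return best

print(plan({"A-B": 10, "B-C": 60}, min_rate=2.0))
```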