ACM SIGCOMM 2023 Tutorial: Closed-Loop “ML for Networks” Pipelines
Draft Program
- Session 1
-
9:00 am - 10:00 am The Standard ML Pipeline: Problems and Challenges
-
10:00 am - 10:30 am Break
- Session 2
-
10:30 am - 12:00 pm Beyond the Standard ML Pipeline
- Trustee: An augmented ML pipeline for developing explainable ML models
- netUnicorn: A closed-loop ML pipeline for developing generalizable ML models
-
12:00 pm - 01:00 pm Lunch
- Session 3
-
01:00 pm - 02:00 pm “Is my ML model wrong?” - Using Trustee in Practice
-
02:00 pm - 03:00 pm “Does my ML model work?” - Using netUnicorn in Practice
-
03:00 pm - 03:30 pm Break
- Session 4
-
3:30 pm - 5:00 pm The Future of ML for Networks
- Open-sourcing tool and model development to support reproducibility at scale
- Assembling a “living” set of challenging (“high-stakes”) use cases
- Democratizing “ML for networks” research: It takes a village!
-
5:00 pm - 5:30 pm Discussion and Wrap-Up
Call For Participation
This tutorial will expose attendees to different existing learning problems for networks. For a subset of these problems, we will demonstrate how conventional ML pipelines used today leverage datasets to develop ML models that perform (or “work”) well in theory but poorly or not at all in practice. We use these examples to motivate the development of new ML pipelines that produce ML models that are explainable and generalizable; that is, not only do they perform well in practice, but we also know why and how well.
We will show how the participants can use Trustee [1] to understand a given model’s decision-making and identify different underspecification issues [2]. Using Trustee, participants will be able to identify (and rectify) different inductive biases that a learning model may have encoded and that prevent the model from being generalizable (e.g., shortcut learning, vulnerability to out-of-distribution samples).
We will also demonstrate how the participants can use netUnicorn [3] with its built-in Trustee-based explainability module to iteratively update the data-collection intents to generate new training datasets that result in newly trained models that are less prone to encoding inductive biases and are therefore more likely to generalize and perform well in practice.
We will end the tutorial with a mini research workshop where researchers who have already used Trustee and/or netUnicorn as part of their ongoing/past research projects will share their experience and discuss ways to improve the existing closed-loop ML pipeline currently employed by netUnicorn. Further advances in this area will be imperative for researchers to develop ML-based solutions for challenging network performance- or security-related problems that network operators find trustworthy and safe and are therefore willing to deploy in their production networks.
Lastly, we plan to synthesize recommendations for SIGCOMM reviewers on assessing and reviewing ML-related research papers and artifacts submitted to SIGCOMM and related venues (e.g., NSDI and CoNEXT).
Background
Learning is intrinsic to networking. Networking protocols (e.g., TCP and ABR) and systems (e.g., IDSes and load balancers) make decisions using partial state information (e.g., bottleneck bandwidth and queue size in the case of TCP) extracted from active and passive network measurements. Most networking solutions rely on domain-specific heuristics for decision-making. For more than two decades, networking researchers have been exploring how to employ machine learning to improve decision-making for networks. Recently, we have witnessed multiple solutions demonstrating how ML’s data-driven decision-making can outperform state-of-the-art heuristics. However, despite all the promises, “ML for networks” has failed to gain traction among network operators.
The main reason why ML-based approaches have not lived up to their promise of facilitating or even resolving real-world network security and performance issues is the black-box nature of the trained models. This black-box aspect, which has dominated the development of ML-based approaches to date, prevents network operators from deploying these models confidently in their production environments. To address this issue and encourage deployment, it is vital to demonstrate that the trained models perform well in real-world deployment environments, and that their strong performance is not a result of encoded inductive biases that prevent them from generalizing beyond the confines of the training environment.
To facilitate the study of a trained model’s generalizability, we have developed (and open-sourced) two frameworks: Trustee and netUnicorn. Used in combination, these two frameworks enable a closed-loop ML pipeline that iteratively collects the “right” training data from one or more network environments for disparate learning problems.
Trustee, a model explainability framework, cracks the trained black-box models open to explain their decision-making. The paper describing this work, “AI/ML for Network Security: The Emperor has no Clothes,” received a “Best Paper Honorable Mention” at ACM CCS’22 and a 2023 “Applied Networking Research Prize (ANRP)” from the IRTF. Specifically, Trustee generates high-fidelity, low-complexity, and stable decision trees that explain how a model makes its decisions for the majority of data points. These decision trees simplify the detection of the model’s vulnerability to different underspecification issues, such as shortcut learning, out-of-distribution (OOD) issues, and spurious correlations. Domain experts can then use these insights to iteratively calibrate their data-collection intents/policies such that the ML pipeline uses the “right” data to develop learning models that generalize as expected.
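To make the idea of explaining a black-box model with a simple surrogate concrete, here is a minimal sketch in plain Python. It is not Trustee's algorithm or API: it fits only a single-rule "tree" (a decision stump) to the predictions of an opaque model and reports the fidelity, that is, the fraction of samples on which the surrogate agrees with the model. The synthetic flows, the feature names, and the `black_box` function are all assumptions made up for illustration.

```python
# Hypothetical, deterministic synthetic flows: (ttl, pkt_size, duration_ms).
FEATURES = ("ttl", "pkt_size", "duration_ms")
flows = [((i * 53) % 128, (i * 37) % 1500, (i * 11) % 97) for i in range(200)]


def black_box(flow):
    """Stand-in for an opaque trained model; pretend its internals are hidden.

    It secretly keys on TTL only, a typical shortcut in lab-collected data.
    """
    ttl, _pkt_size, _duration_ms = flow
    return int(ttl > 64)


def fit_stump(samples, targets):
    """Return (feature_index, threshold, fidelity) of the best rule 'x[f] > t'.

    Fidelity is the fraction of samples where the rule matches the targets,
    which here are the black-box predictions rather than ground-truth labels.
    """
    best = (None, None, -1.0)
    for f in range(len(samples[0])):
        for t in sorted({s[f] for s in samples}):
            agree = sum(int(s[f] > t) == y for s, y in zip(samples, targets))
            fidelity = agree / len(samples)
            if fidelity > best[2]:
                best = (f, t, fidelity)
    return best


# Explain the model by imitating its outputs, not the true labels.
predictions = [black_box(flow) for flow in flows]
f_idx, threshold, fidelity = fit_stump(flows, predictions)
print(f"surrogate rule: {FEATURES[f_idx]} > {threshold} (fidelity {fidelity:.2f})")
# prints: surrogate rule: ttl > 64 (fidelity 1.00)
```

A domain expert reading this rule would notice that the model decides on TTL alone, which is the kind of signal that suggests shortcut learning and motivates revisiting how the training data was collected.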
Collecting the “right” data for a given learning problem from any network environment is challenging. Consequently, most researchers and practitioners end up using publicly available datasets, which are often unrelated to the target learning problem and/or network environment. To address this problem, we developed and open-sourced netUnicorn, a new modular data-collection framework. It simplifies collecting network data for different learning tasks, applies to diverse deployment settings or network environments, and supports a flexible, iterative data-collection approach. Its programming abstraction disaggregates data-collection intents (i.e., experiments) from data-collection mechanisms. Intents are further decomposed into flexible pipelines consisting of reusable modules, and mechanisms into independent microservices. As a result, netUnicorn simplifies both the collection of data for different learning tasks and the collection of data from different environments.
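The separation of intents from mechanisms described above can be sketched in a few lines of Python. This is an illustration of the abstraction only, not netUnicorn's actual API: the `Task`, `Pipeline`, `Experiment`, and `LocalExecutor` classes, the task names, and the node names are all hypothetical. The intent (what to collect, and where) is plain data, while the executor (how to run it) can be swapped without touching the intent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    """A reusable module: reads the running context, returns updates to it."""
    name: str
    run: Callable[[dict], dict]


@dataclass
class Pipeline:
    """An ordered sequence of tasks."""
    tasks: List[Task] = field(default_factory=list)

    def then(self, task: Task) -> "Pipeline":
        self.tasks.append(task)
        return self


@dataclass
class Experiment:
    """The intent: which pipeline should run on which node."""
    assignments: Dict[str, Pipeline] = field(default_factory=dict)

    def map(self, pipeline: Pipeline, nodes: List[str]) -> "Experiment":
        for node in nodes:
            self.assignments[node] = pipeline
        return self


class LocalExecutor:
    """One possible mechanism: run every pipeline in-process, in order."""

    def execute(self, experiment: Experiment) -> Dict[str, dict]:
        results = {}
        for node, pipeline in experiment.assignments.items():
            context = {"node": node, "log": []}
            for task in pipeline.tasks:
                context.update(task.run(context))
                context["log"].append(task.name)
            results[node] = context
        return results


# Hypothetical tasks; real ones would start tcpdump, drive a browser, etc.
start_capture = Task("start_capture", lambda ctx: {"capturing": True})
watch_video = Task("watch_video", lambda ctx: {"flows": 3})
stop_capture = Task(
    "stop_capture",
    lambda ctx: {"capturing": False, "pcap": f"{ctx['node']}.pcap"},
)

pipeline = Pipeline().then(start_capture).then(watch_video).then(stop_capture)
experiment = Experiment().map(pipeline, ["node-a", "node-b"])
results = LocalExecutor().execute(experiment)
print(results["node-b"]["pcap"], results["node-b"]["log"])
# prints: node-b.pcap ['start_capture', 'watch_video', 'stop_capture']
```

Because the experiment never refers to the executor, the same intent could in principle be handed to a different mechanism (for example, one that dispatches to remote nodes) to collect data from another environment.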
Outline
- Session 1&2: An overview of ML for networking and the use of Trustee and netUnicorn for Closed-Loop ML Pipelines
- Session 3: Hands-on exercises using Trustee (“Prove my ML model wrong!”) and netUnicorn (“Prove my ML model works!”).
- Develop a black-box ML model using state-of-the-art ML tools
- Use Trustee to crack open its decision-making and identify underspecification problems
- Use netUnicorn to fix the dataset for the given problem and recollect data, using real-world or virtual network infrastructure.
- Use Trustee’s results to demonstrate iterative data collection benefits.
- Extend for different learning problems and different network environments.
- Session 4: Mini research workshop, with topics such as practical experiences with Trustee and/or netUnicorn, use cases, existing roadblocks, community contributions, and recommendations for reviewers at SIGCOMM and related venues.
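The Session 3 steps (train, explain, refine the data-collection intent, recollect, retrain) can be sketched as a small closed loop in plain Python. Everything here is a toy under stated assumptions: the `collect` function, the `ttl` and `rate` features, and the `SUSPICIOUS` set are hypothetical, and the "model" is a one-rule classifier whose explanation is simply the feature it tests. With a real black-box model, a surrogate-tree explanation from a tool such as Trustee would play that role, and a data-collection platform such as netUnicorn would do the recollection.

```python
FEATURES = ("ttl", "rate")
SUSPICIOUS = {"ttl"}  # features a domain expert deems unrelated to the task


def collect(randomize_ttl, n=100, start=0):
    """Hypothetical data-collection intent; returns (features, label) rows."""
    rows = []
    for i in range(start, start + n):
        label = i % 2  # 1 = attack, 0 = benign
        if label:
            rate = 150 + i % 100
        elif i % 10 == 0:
            rate = 200  # benign flash crowd: makes rate an imperfect signal
        else:
            rate = 10 + i % 50
        if randomize_ttl:
            ttl = 64 if (i // 2) % 2 == 0 else 128  # independent of the label
        else:
            ttl = 64 if label else 128  # testbed artifact that leaks the label
        rows.append(({"ttl": ttl, "rate": rate}, label))
    return rows


def predict(model, x):
    feature, threshold, sign = model
    return int(sign * (x[feature] - threshold) > 0)


def accuracy(model, rows):
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


def train(rows):
    """Pick the single-feature threshold rule with the best training accuracy."""
    best, best_acc = None, -1.0
    for feature in FEATURES:
        for threshold in sorted({x[feature] for x, _ in rows}):
            for sign in (1, -1):
                model = (feature, threshold, sign)
                acc = accuracy(model, rows)
                if acc > best_acc:
                    best, best_acc = model, acc
    return best


def closed_loop(deployment, max_rounds=3):
    """Train, inspect the rule, and refine the intent until no shortcut is used."""
    randomize_ttl = False
    history = []
    for _ in range(max_rounds):
        model = train(collect(randomize_ttl))
        history.append((model[0], accuracy(model, deployment)))
        if model[0] not in SUSPICIOUS:
            break
        randomize_ttl = True  # refined intent: decorrelate TTL from the label
    return history


# Deployment-like data where the TTL artifact does not hold.
deployment = collect(randomize_ttl=True, n=60, start=1000)
history = closed_loop(deployment)
print(history)  # prints: [('ttl', 0.5), ('rate', 0.9)]
```

In the first round the model scores perfectly on its own training data by keying on the TTL artifact, yet only reaches chance-level accuracy on deployment-like data. After the intent is refined and data is recollected, the retrained model relies on the traffic rate and performs considerably better.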
Audience Expectations and Prerequisites
Most of the hands-on labs will be implemented in Python, so basic knowledge of the language will be useful. We do not require any prior knowledge of machine learning pipelines, tools, or datasets. All the information required to finish the lab exercises will be provided during the tutorial, along with detailed handouts to help attendees follow along. However, attendees of the on-site tutorial must meet the following expectations: (1) Attendees must bring their laptops. (2) Attendees must have a Google account. (3) We will provide a VM image containing all the required packages and tools, which attendees can run on their laptops. We will also provide access to the configured VM image over the cloud to accommodate participants with outdated OS or hardware.
Organizers
-
Roman Beltiukov
UCSB
-
Bio:
Roman Beltiukov is a Ph.D. student in the computer science department at UCSB, working under the supervision of Arpit Gupta. His research focuses mainly on developing ML solutions for the networking area, ensuring that the resulting artifacts are generalizable and credible. He is leading the netUnicorn project, a tool and framework for enabling more credible data collection efforts, and is working on democratizing network research by creating real-world networking test-beds (PINOT project).
-
-
Arthur Jacobs
UFRGS
-
Bio:
Arthur Jacobs is a Postdoctoral Research Scientist at the Federal University of Rio Grande do Sul (UFRGS). He received his Ph.D. in Computer Science in 2022, also from the Federal University of Rio Grande do Sul, advised by Prof. Dr. Lisandro Granville and co-advised by Prof. Dr. Ronaldo Ferreira. He was a visiting scholar at Princeton University in 2019 and 2020 under the supervision of Prof. Dr. Jennifer Rexford, and worked closely with Dr. Walter Willinger from NIKSUN, Inc. Arthur was the recipient of the 2020 IBM PhD Fellowship and the 2023 IRTF Applied Networking Research Prize (ANRP). His research interests include network management, Intent-based Networking, Natural Language Processing for network management, self-driving networks, programmable networks, and Artificial Intelligence and its application to networking and security.
-
-
Arpit Gupta
UCSB
-
Bio:
Arpit Gupta is an assistant professor in the computer science department at UCSB. His research focuses on building flexible, scalable, and trustworthy systems that solve real-world problems at the intersection of networking, security, and machine learning. He also develops systems that aid in characterizing and addressing digital inequity issues. He developed BQT, a tool to extract broadband plans offered by ISPs in the US; Trustee, a tool to explain the decision-making of ML artifacts for networking; Sonata, a streaming network telemetry system; and SDX, an Internet routing control system. His work on augmenting crowdsourced Internet measurement data using BQT received the Distinguished Paper Award; Trustee received the Applied Networking Research Prize and a Best Paper Award (honorable mention); SDX received the Internet2 Innovation Award, the USENIX NSDI Community Contribution Award, and the ACM SOSR Best Paper Award. Arpit received his Ph.D. from Princeton University. He completed his master's degree at NC State University and his bachelor's degree at the Indian Institute of Technology, Roorkee, India.
-
-
Walter Willinger
NIKSUN Inc.
-
Bio:
Walter Willinger is Chief Scientist at NIKSUN, Inc., a Princeton-based company developing industry-leading real-time and forensics-based cybersecurity and network performance solutions. Before joining NIKSUN, he worked at AT&T Labs-Research in Florham Park, NJ from 1996 to 2013 and at Bellcore Applied Research from 1986 to 1996. Dr. Willinger received his Dipl. Math. degree from the ETH Zurich and his M.S. and Ph.D. degrees in Operations Research and Industrial Engineering from Cornell University. He is a Fellow of ACM (2005), IEEE (2005), AT&T (2007), SIAM (2009), and AAIA (2023); co-recipient of the 1995 IEEE Communications Society W.R. Bennett Prize Paper Award and the 1996 IEEE W.R.G. Baker Prize Award; co-recipient of the 2005 and 2016 ACM/SIGCOMM Test-of-Time Paper Awards; co-recipient of the IRTF Applied Networking Research Prize (2023); and recipient of the 2024 IEEE Internet Award. His paper “On the self-similar nature of Ethernet traffic” was featured in the 2007 IEEE ComSoc publication “The Best of the Best — Fifty Years of Communications and Networking Research”, as one of the most influential papers in communications and networking in the last half century.
-
-
Ronaldo A. Ferreira
UFMS
-
Bio:
Ronaldo A. Ferreira is a Full Professor of Computer Science in the College of Computing at UFMS. He received his B.Sc. from UFMS in 1992, his M.S. from the University of Campinas in 1998, and his Ph.D. from Purdue University in 2006, all in Computer Science. Ronaldo was a Visiting Research Scholar and a Visiting Associate Professor at Princeton University from 2014 to 2016. He was chair of the Special Interest Group on Computer Networks and Distributed Systems of the Brazilian Computing Society (SBC) from 2011 to 2013. Ronaldo was a member of the board of administration of the Brazilian National Research and Educational Network (RNP) from 2011 to 2013. He was also a member of the Brazilian National Laboratory of Computer Networks (LARC) board of directors from 2014 to 2023. His research interests are in computer networks and distributed systems.
-
-
Wenbo Guo
UC Berkeley
-
Bio:
Wenbo Guo is a postdoctoral associate at UC Berkeley. His research interests are machine learning and cybersecurity. His work includes strengthening the fundamental properties of machine learning models and designing customized machine learning models to handle security-unique challenges. He is a recipient of multiple prestigious awards, including the IBM Ph.D. Fellowship (2020-2022) and the ACM CCS Outstanding Paper Award (2018).
-
-
Lisandro Granville
UFRGS
-
Bio:
Lisandro Granville is a Full Professor of Computer Science at the Institute of Informatics of the Federal University of Rio Grande do Sul (UFRGS), Brazil. He holds Ph.D. (2001) and M.Sc. (1998) degrees in Computer Science, both from UFRGS. From September 2007 to August 2008, he was a visiting researcher at the University of Twente, The Netherlands, with the Design and Analysis of Communication Systems group. He is a member of the Computer Networks Group, where he develops research projects on network and service management. As a Full Professor, he is also involved in supervision and teaching in undergraduate and graduate courses in both Computer Science and Computer Engineering.
-
-
Hooman Mohajeri
UCSB
-
Bio:
Dr. Hooman Mohajeri Moghaddam is a postdoctoral fellow at the University of California, Santa Barbara, and a recent Ph.D. graduate from Princeton University, where he was advised by Nick Feamster and Prateek Mittal. As part of his Ph.D., Hooman studied the privacy properties of Internet-connected devices, including smart TVs and streaming devices, through automated large-scale measurements. Hooman is broadly interested in privacy-enhancing technologies and network security. His research has been covered by Wired magazine, The New York Times, and The Wall Street Journal, and his work was selected as the runner-up for the Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies in 2021. Before joining Princeton, Hooman worked as a software engineer and architect at Cisco Systems. Hooman received his Master's degree from the University of Waterloo, as part of the Cryptography, Security, and Privacy (CrySP) research group, and his B.Sc. from the Sharif University of Technology.
-
References
[1] Arthur S. Jacobs, Roman Beltiukov, Walter Willinger, Ronaldo A. Ferreira, Arpit Gupta, and Lisandro Z. Granville. 2022. AI/ML for Network Security: The Emperor has no Clothes. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS '22). Association for Computing Machinery, New York, NY, USA, 1537–1551. https://doi.org/10.1145/3548606.3560609
[2] Alexander D'Amour et al. Underspecification Presents Challenges for Credibility in Modern Machine Learning. arXiv preprint arXiv:2011.03395, 2020.
[3] Roman Beltiukov, Wenbo Guo, Arpit Gupta, and Walter Willinger. In Search of netUnicorn: A Data-Collection Platform to Develop Generalizable ML Models for Network Security Problems. arXiv preprint arXiv:2306.08853, 2023.
[4] D. Arp, E. Quiring, F. Pendlebury, A. Warnecke, F. Pierazzi, C. Wressnegger, L. Cavallaro, and K. Rieck. Dos and Don'ts of Machine Learning in Computer Security. In 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, August 2022. USENIX Association.
[5] R. Geirhos, J. Jacobsen, C. Michaelis, R. Zemel, W. Brendel, M. Bethge, and F. A. Wichmann. Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11):665–673, Nov 2020.
[6] Laurens D’hooge, Tim Wauters, Bruno Volckaert, Filip De Turck. Inter-dataset generalization strength of supervised machine learning methods for intrusion detection. Journal of Information Security and Applications, Volume 54, 2020, 102564, ISSN 2214-2126, https://doi.org/10.1016/j.jisa.2020.102564.
[7] H. Nori, S. Jenkins, P. Koch, and R. Caruana. Interpretml: A unified framework for machine learning interpretability. arXiv preprint arXiv:1909.09223, 2019.