ACM SIGCOMM 2017, Los Angeles, CA

ACM SIGCOMM 2017 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks (Big-DAMA 2017)

Workshop Program

  • Monday, August 21, 2017, Optimist Room (Luskin Center)

  • 8:30am - 8:45am Opening

    Room: Optimist Room (Luskin Center)

  • 8:45am - 10:30am Session 1: Traffic Monitoring and Analysis with Big Data Analytics

    Session Chair: TBD

    Room: Optimist Room (Luskin Center)

  • Keynote: What I Learned about Big Data Working with Biologists, Neuroscientists and Climate Scientists

    Constantine Dovrolis (Georgia Institute of Technology)


    Abstract:

    "Big data" means different things to different people. My interests in this subject focus on the scientific discoveries that big data enable.
    Complex systems, such as the Internet, produce emergent phenomena and unexpected behaviors that cannot be studied through the classical, reductionist and hypothesis-driven research methodology. Instead, the researcher faces the challenge of first discovering those unknown and interesting patterns by mining huge volumes of multidimensional data, before trying to explain, model or predict them.
    During the last five years or so I have worked with scientists from other data-heavy disciplines (biology, neuroscience, climate science), applying network analysis methods in various data mining problems. In this talk, I will attempt to summarize my experiences and "lessons learned" from these collaborations, and contrast data mining research in computer networking with similar research in other disciplines. I will close with what I view as interesting opportunities for big data research in the field of computer networking.

     

    Bio: Dr. Constantine Dovrolis is a Professor at the School of Computer Science of the Georgia Institute of Technology. He received the Computer Engineering degree from the Technical University of Crete in 1995, the M.S. degree from the University of Rochester in 1996, and the Ph.D. degree from the University of Wisconsin-Madison in 2001. His current research focuses on cross-disciplinary applications of network analysis and data mining in neuroscience and biology. He has also worked on the evolution of the Internet, Internet economics, and on applications of network measurement.

     

  • Ensemble-learning Approaches for Network Security and Anomaly Detection

    Juan Vanerio (Universidad de la República/AIT Austrian Institute of Technology) and Pedro Casas (AIT Austrian Institute of Technology)

    • Abstract:

      The application of machine learning models to network security and anomaly detection problems has increased considerably in the last decade; however, there is still no clear best practice or silver-bullet approach to address these problems in a general context. While deep learning has produced major breakthroughs in other domains, it is difficult to say which model, or category of models, best addresses the detection of anomalous events in operational networks. We present a potential solution to fill this gap, exploring the application of ensemble learning models to network security and anomaly detection. We investigate different ensemble-learning approaches to enhance the detection of attacks and anomalies in network measurements, following a particularly promising model known as the Super Learner. The Super Learner performs asymptotically as well as the best possible weighted combination of its base learners, providing a very powerful approach for tackling multiple problems with the same technique. We test the proposed solution on two different problems, using the well-known MAWILab dataset for the detection of network attacks, and a semi-synthetic dataset for the detection of traffic anomalies in operational cellular networks. Results confirm that the Super Learner provides better results than any of the single models, opening the door to a generalized best-practice technique for these specific domains.
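The weighted-combination idea behind the Super Learner can be illustrated with a minimal sketch (not the authors' implementation): base detectors emit anomaly scores, and convex weights are chosen by grid search to minimize squared loss against labels. In the real method, the base predictions would be produced out-of-fold via cross-validation; the toy detectors and scores below are assumptions for illustration.

```python
from itertools import product

def super_learner_weights(base_preds, labels, step=0.1):
    """Grid-search convex weights for base learners under squared loss.

    base_preds: one list of scores in [0, 1] per base learner
    (ideally out-of-fold predictions, as the Super Learner prescribes).
    """
    k = len(base_preds)
    grid = [i * step for i in range(int(1 / step) + 1)]
    best_w, best_loss = None, float("inf")
    for w in product(grid, repeat=k):
        if abs(sum(w) - 1.0) > 1e-9:
            continue  # keep only convex combinations
        loss = sum(
            (sum(wi * p[i] for wi, p in zip(w, base_preds)) - y) ** 2
            for i, y in enumerate(labels)
        )
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w

# Two toy anomaly detectors scored against ground-truth labels:
detector_a = [0.9, 0.1, 0.8, 0.2]   # informative detector
detector_b = [0.5, 0.5, 0.5, 0.5]   # uninformative detector
labels     = [1, 0, 1, 0]
weights = super_learner_weights([detector_a, detector_b], labels)
```

As expected, the search assigns essentially all weight to the informative detector, mirroring the guarantee that the ensemble tracks the best achievable weighted combination.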

       

  • Cluster-Based Load Balancing for Better Network Security

    Gal Frishman (IDC) and Yaniv Ben-Itzhak and Oded Margalit (IBM Research)

    • Abstract:

      In the big-data era, the amount of traffic is rapidly increasing, so scaling methods are commonly used: for instance, an appliance composed of several instances (the scale-out method), with a load balancer that distributes incoming traffic among them. While the most common form of load balancing is round robin, some approaches optimize the load across instances according to the appliance-specific functionality, for instance load balancing for a scaled-out proxy server that increases the cache hit ratio.

      In this paper, we present a novel load-balancing approach for machine-learning-based security appliances. Our proposed load balancer uses a clustering method while keeping the load balanced across all of the network security appliance's instances. We demonstrate that our approach is scalable and improves the machine-learning performance of the instances compared to traditional load balancers.
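The tension the abstract describes, keeping similar traffic on the same instance while keeping instances evenly loaded, can be sketched as follows. The cluster-to-instance mapping and the capacity cap are illustrative assumptions, not the paper's actual algorithm.

```python
def cluster_balanced_assign(flows, n_instances, capacity):
    """Assign each flow to the instance serving its cluster,
    spilling over to the least-loaded instance when that one is full.

    flows: list of (flow_id, cluster_id) pairs. Cluster affinity keeps
    similar traffic together (helping per-instance ML models), while
    the capacity cap keeps the overall load balanced.
    """
    load = [0] * n_instances
    assignment = {}
    for flow_id, cluster_id in flows:
        preferred = cluster_id % n_instances      # simple cluster affinity
        if load[preferred] >= capacity:           # instance full:
            preferred = load.index(min(load))     # fall back to least loaded
        assignment[flow_id] = preferred
        load[preferred] += 1
    return assignment, load

# Six flows from one cluster plus two from others:
flows = [(i, 0) for i in range(6)] + [(6, 1), (7, 2)]
assignment, load = cluster_balanced_assign(flows, n_instances=3, capacity=3)
```

The capacity check is what distinguishes this from pure affinity routing: without it, all of cluster 0 would land on one instance.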

       

  • Hierarchical IP flow clustering

    Kamal Shadi (Georgia Institute of Technology), Preethi Natarajan (Cisco), and Constantine Dovrolis (Georgia Institute of Technology)

    • Abstract:

      The analysis of flow traces can help to understand a network’s usage patterns. We present a hierarchical clustering algorithm for network flow data that can summarize terabytes of IP traffic into a parsimonious tree model. The method automatically finds an appropriate scale of aggregation so that each cluster represents a local maximum of the traffic density from a block of source addresses to a block of destination addresses. We apply this clustering method on NetFlow data from an enterprise network, find the largest traffic clusters, and analyze their stationarity across time. The existence of heavy-volume clusters that persist over long time scales can help network operators to perform usage-based accounting, capacity provisioning and traffic engineering. Also, changes in the layout of hierarchical clusters can facilitate the detection of anomalies and significant changes in the network workload.
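As a rough illustration of prefix-level flow aggregation (a simplification of the paper's density-based tree construction, with made-up addresses and byte counts), flows can be rolled up into source/destination prefix-pair blocks at successively finer scales:

```python
from collections import defaultdict

def prefix(ip, bits):
    """Mask an IPv4 address string down to its first `bits` bits."""
    octets = list(map(int, ip.split(".")))
    value = sum(o << (8 * (3 - i)) for i, o in enumerate(octets))
    value &= (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
    return ".".join(str((value >> (8 * (3 - i))) & 0xFF) for i in range(4)) + f"/{bits}"

def flow_tree(flows, levels=(8, 16, 24)):
    """Aggregate (src, dst, bytes) flows per source/destination
    prefix pair at each aggregation level: a toy stand-in for the
    paper's hierarchical tree of traffic clusters."""
    tree = {bits: defaultdict(int) for bits in levels}
    for src, dst, nbytes in flows:
        for bits in levels:
            tree[bits][(prefix(src, bits), prefix(dst, bits))] += nbytes
    return tree

flows = [
    ("10.1.1.1", "192.168.0.5", 500),
    ("10.1.1.2", "192.168.0.9", 300),
    ("10.2.0.1", "172.16.0.1", 100),
]
tree = flow_tree(flows)
```

The actual method then searches this hierarchy for the aggregation scale at which each cluster is a local maximum of traffic density, rather than using fixed thresholds.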

       

  • 10:30am - 11:00am Coffee Break (Foyer)

  • 11:00am - 12:30pm Session 2: Embedding Machine Learning in Network Measurement

    Session Chair: TBD

    Room: Optimist Room (Luskin Center)

  • Keynote: Big Data Begets Big Data: Understanding Modern Datacenter Networks

    Alex C. Snoeren (UC San Diego)


    Abstract:

    While modern datacenters are typically viewed as the engine that powers big-data computation, they have reached such a scale that their management and operation can itself be considered a big data problem. In this talk, I will discuss our experiences studying the network traffic in one of Facebook’s datacenters, and show how we have been able to leverage properties of the traffic to design improved network management systems. In particular, we find that in many cases traffic is sufficiently well load balanced across multiple paths in the datacenter that statistical methods can be used to identify and localize performance-impacting faults within the network fabric. In contrast to most existing network fault detection systems, which deliberately inject probe traffic, we find the scale of modern datacenters allows signals to be extracted from the existing service traffic. Best of all, these signals can be monitored entirely at the end hosts and allow faster detection than existing approaches.
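The statistical idea, that under even load balancing every path should behave alike, so a deviating path is a fault suspect, can be sketched as a simple outlier test on per-path retransmission rates. The metric, threshold, and counts below are illustrative assumptions, not the production system's actual test.

```python
def suspect_paths(retx_counts, flows_per_path, z=2.5):
    """Flag paths whose retransmission rate deviates from the rest.

    Under even load balancing, every path should see a similar rate;
    a path several standard deviations above the others is a fault
    suspect (a toy version of the statistical test described above).
    """
    rates = [r / n for r, n in zip(retx_counts, flows_per_path)]
    mean = sum(rates) / len(rates)
    var = sum((x - mean) ** 2 for x in rates) / len(rates)
    std = var ** 0.5 or 1e-12
    return [i for i, x in enumerate(rates) if (x - mean) / std > z]

# Nine healthy paths and one lossy one, from existing service traffic:
retx  = [2, 3, 2, 1, 3, 2, 2, 3, 2, 40]
flows = [1000] * 10
bad = suspect_paths(retx, flows)
```

Because the signal comes from regular service traffic observed at end hosts, no probe injection is needed, which is the key contrast with most fault-detection systems.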

     

    Bio: Alex C. Snoeren is a Professor in the Computer Science and Engineering Department at the University of California, San Diego, where he is a member of the Systems and Networking Research Group. His research interests include operating systems, distributed computing, and mobile and wide-area networking. Professor Snoeren received a Ph.D. in Computer Science from the Massachusetts Institute of Technology (2003) and an M.S. in Computer Science (1997) and Bachelor of Science degrees in Computer Science (1996) and Applied Mathematics (1997) from the Georgia Institute of Technology. He is a recipient of the Alfred P. Sloan Fellowship (2009), a National Science Foundation CAREER Award (2004), the MIT EECS George M. Sprowls Doctoral Dissertation Award (Honorable Mention, 2003), and best-paper awards at the ACM SIGCOMM (2001, 2007) and USENIX OSDI (2008) conferences.

     

  • o'zapft is: Tap Your Network Algorithm's Big Data!

    Andreas Blenk and Patrick Kalmbach (Technical University of Munich), Stefan Schmid (Aalborg University), and Wolfgang Kellerer (Technical University of Munich)

    • Abstract:

      At the heart of many computer network planning, deployment, and operational tasks lie hard algorithmic problems. Accordingly, over the last decades, we have witnessed a continuous pursuit of ever more accurate and faster algorithms. We propose an approach to designing network algorithms that is radically different from most existing ones. Our approach is motivated by the observation that most existing algorithms for a given hard computer networking problem overlook a simple yet very powerful optimization opportunity in practice: many network algorithms are executed repeatedly (e.g., for each virtual network request or in reaction to user mobility), and hence with each execution generate interesting data: (problem, solution)-pairs. We make the case for leveraging the potentially big data of an algorithm's past executions to improve and speed up future, similar solutions by reducing the algorithm's search space. We study the applicability of machine learning to network algorithm design, identify challenges, and discuss limitations. We empirically demonstrate the potential of machine-learning network algorithms in two case studies, namely the embedding of virtual networks (a packing optimization problem) and k-center facility location (a covering optimization problem), using a prototype implementation.
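The core opportunity, reusing past (problem, solution)-pairs to shrink the search space of future runs, can be sketched as a nearest-neighbor lookup over past instances. The feature encoding of a "problem" as a numeric tuple and the example request format are assumptions for illustration, not the paper's prototype.

```python
def nearest_past_solution(history, problem):
    """Return the stored solution of the most similar past problem.

    history: list of (problem, solution) pairs from earlier executions,
    with problems encoded as numeric feature tuples; similarity is
    plain Euclidean distance. The retrieved solution can seed the
    exact algorithm and prune its search space.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    best = min(history, key=lambda ps: dist(ps[0], problem))
    return best[1]

# Past virtual-network requests encoded as (cpu, bandwidth),
# each paired with the embedding the exact solver produced:
history = [((4, 100), "embedding-A"), ((16, 1000), "embedding-B")]
warm_start = nearest_past_solution(history, (5, 120))
```

The exact solver still runs afterwards; the learned warm start only biases where it begins searching, so solution quality is preserved while runtime drops.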

       

  • Net2Vec: Deep Learning for the Network

    Roberto Gonzalez, Filipe Manco, Alberto Garcia-Duran, Jose Mendes, Felipe Huici, Saverio Niccolini, and Mathias Niepert (NEC Labs. Europe)

    • Abstract:

      We present Net2Vec, a flexible high-performance platform that allows the execution of deep learning algorithms in the communication network. Net2Vec is able to capture data from the network at more than 60Gbps, transform it into meaningful tuples and apply predictions over the tuples in real time. This platform can be used for different purposes ranging from traffic classification to network performance analysis.

      Finally, we showcase the use of Net2Vec by implementing and testing a solution able to profile network users at line rate using traces coming from a real network. We show that the use of deep learning for this case outperforms the baseline method both in terms of accuracy and performance.

       

  • 12:30pm - 2:00pm Lunch Break (Centennial Terrace)

  • 2:00pm - 3:30pm Session 3: Machine Learning for Network Performance Prediction

    Session Chair: TBD

    Room: Optimist Room (Luskin Center)

  • Keynote: Scaling BGP Big Data for Network Operations, SDN and Research

    Tim Evens (Cisco)


    Abstract:

    BGP is commonly analyzed alongside IP flow data (e.g., IPFIX, NetFlow). Leveraging the big data methods used to analyze IP flows for BGP analysis frequently results in the misunderstanding that BGP data can be stored and scaled in the same fashion as IP flow data. BGP big data is both immutable and mutable, whereas IP flow data is primarily immutable. Many current data stores and platforms efficiently scale immutable data objects, covering only part of what is needed to analyze BGP data. In this presentation, I will explain in detail how BGP big data requires more than a single method of data storage and handling. I will provide examples using RouteViews data, showing how SNAS effectively scales to monitor current state and historical changes for all RouteViews peers.
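The mutable/immutable distinction can be sketched as two complementary stores: an append-only update log (immutable history) next to a current-state RIB table (mutable best state per prefix). This is a toy illustration of the storage split the talk argues for, not how SNAS is actually implemented.

```python
class BgpStore:
    """Toy store separating BGP's immutable and mutable parts."""

    def __init__(self):
        self.log = []    # immutable: append-only update history
        self.rib = {}    # mutable: current state per prefix

    def update(self, ts, prefix, attrs):
        """Record a BGP update; attrs=None models a withdrawal."""
        self.log.append((ts, prefix, attrs))   # history always grows
        if attrs is None:
            self.rib.pop(prefix, None)         # withdrawal clears state
        else:
            self.rib[prefix] = attrs           # announcement replaces state

store = BgpStore()
store.update(1, "10.0.0.0/8", {"as_path": [65001]})
store.update(2, "10.0.0.0/8", {"as_path": [65002]})
store.update(3, "10.0.0.0/8", None)
```

After these three updates the log holds all of them, while the RIB is empty again: a store optimized only for immutable objects would scale the first well and the second poorly.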

     

    Bio: Tim Evens has over 20 years of experience as a network engineer and software developer in a wide range of industries, including carrier and Internet service providers, financial trading, healthcare, retail and technology. Tim has been an active double CCIE for over 16 years. In his current position, Tim is a Principal Engineer in the Chief Technology and Architecture Office at Cisco. His current projects span network data analytics, network automation and network programming. He is the primary developer and architect of SNAS.

     

  • NETPerfTrace – Predicting Internet Path Dynamics and Performance with Machine Learning

    Sarah Wassermann (Université de Liège), Pedro Casas (AIT Austrian Institute of Technology), and Thibaut Cuvelier and Benoit Donnet (Université de Liège)

    • Abstract:

      We study the problem of predicting Internet path changes and path performance using traceroute measurements and machine learning models. Path changes are frequently linked to path inflation and performance degradation, hence the relevance of the problem. We introduce NETPerfTrace, an Internet path tracking system that forecasts path changes and path latency variations. By relying on decision trees and using empirical-distribution-based input features, we show that NETPerfTrace can predict (i) the remaining lifetime of a path before it actually changes and (ii) the number of path changes in a certain time period, with relatively high accuracy. Through extensive evaluation, we demonstrate that NETPerfTrace significantly outperforms DTRACK, a previous system with the same prediction targets. NETPerfTrace also offers path performance forecasting capabilities. In particular, our tool can predict path latency metrics, providing a system which can not only predict path changes but also forecast their impact in terms of performance variations. We release NETPerfTrace as open-source software to the networking community, together with all evaluation datasets.
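A toy sketch of the empirical-distribution features such a predictor might use, plus a trivial stand-in for the trained decision tree. The specific percentiles, the median-based rule, and the sample durations are illustrative assumptions, not NETPerfTrace's actual features or model.

```python
def empirical_features(durations):
    """Percentile summary of a path's past route durations: the kind
    of empirical-distribution feature fed to the decision trees."""
    s = sorted(durations)
    def pct(p):
        return s[min(len(s) - 1, int(p * len(s)))]
    return {"min": s[0], "p50": pct(0.5), "p90": pct(0.9), "max": s[-1]}

def predict_remaining_life(features, age):
    """Toy stump in place of a trained tree: expect the path to last
    about its median historical duration, minus its current age."""
    return max(features["p50"] - age, 0)

# Minutes between the last five observed route changes on a path:
feats = empirical_features([60, 120, 90, 300, 45])
remaining = predict_remaining_life(feats, age=30)
```

A real deployment would train the tree on many (features, observed residual lifetime) pairs instead of hard-coding the median rule.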

       

  • Neural Network Based Wavelength Assignment in Optical Switching

    Craig Gutterman (Columbia University), Weiyang Mo, Shengxiang Zhu, Yao Li, and Daniel C. Kilper (University of Arizona), and Gil Zussman (Columbia University)

    • Abstract:

      Greater network flexibility through software defined networking and the growth of high bandwidth services are motivating faster service provisioning and capacity management in the optical layer. These functionalities require increased capacity along with rapid reconfiguration of network resources. Recent advances in optical hardware can enable a dramatic reduction in wavelength provisioning times in optical circuit switched networks. To support such operations, it is imperative to reconfigure the network without causing a drop in service quality to existing users. Therefore, we present a system that uses neural networks to predict the dynamic response of an optically circuit-switched 90-channel multi-hop Reconfigurable Optical Add-Drop Multiplexer (ROADM) network. The neural network is able to recommend wavelength assignments that contain the power excursion to less than 0.5 dB with a precision of over 99%.

       

  • 3:30pm - 4:00pm Coffee Break (Foyer)

  • 4:00pm - 4:45pm Session 4: End User Tracking with Big Data Analytics

    Session Chair: TBD

    Room: Optimist Room (Luskin Center)

  • Call Detail Records for Human Mobility Studies: Taking Stock of the Situation in the "Always Connected Era"

    Pierdomenico Fiadino and Víctor Ponce-López (EURECAT), Juan Antonio Torrero-Gonzalez (ORANGE SPAIN), Marc Torrent-Moreno (EURECAT), and Alessandro D'Alconzo (AIT)

    • Abstract:

      The exploitation of cellular network data for studying human mobility has been a popular research topic in the last decade. Indeed, mobile terminals can be considered ubiquitous sensors that allow the observation of human movements on a large scale without relying on non-scalable techniques, such as surveys, or on dedicated and expensive monitoring infrastructures. In particular, Call Detail Records (CDRs), collected by operators for billing purposes, have been extensively employed due to their rather wide availability compared to other types of cellular data (e.g., signaling). Despite the interest aroused by this topic, the research community has generally agreed on the scarcity of information provided by CDRs: the position of a mobile terminal is logged only when some kind of activity (calls, SMS, data connections) occurs, which translates into a picture of mobility somewhat biased by the activity degree of users. By studying two datasets collected by a nationwide operator in 2014 and 2016, we show that the situation has drastically changed in terms of data volume and quality. The increase in flat data plans and the higher penetration of “always connected” terminals have driven up the number of recorded CDRs, providing higher temporal accuracy for users’ locations.

       

  • Users' Fingerprinting Techniques from TCP Traffic

    Luca Vassio, Danilo Giordano, Martino Trevisan, and Marco Mellia (Politecnico di Torino) and Ana Paula Couto da Silva (Universidade Federal de Minas Gerais)

    • Abstract:

      Encryption at the application layer is often promoted to protect privacy, i.e., to prevent someone in the network from observing users’ communications. In this work we explore how to build a profile for a target user by observing only the names of the services contacted during browsing, names that are still not encrypted and are easily accessible from passive probes. Would it be possible to uniquely identify a target user within a large population that accesses the same network? Aiming to verify if and how this is possible, we propose and compare three methodologies to compute similarities between users’ profiles. We use real data collected in networks, and evaluate and discuss performance and the impact of the quality of the data being used. To this end, we propose a machine learning methodology to extract the services intentionally requested by users, which turn out to be important for profiling purposes. Results show that the classification problem can be solved with good accuracy (up to 94%), provided some ingenuity is used to build the model.
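One simple way to compare two service-name profiles, an illustrative choice rather than one of the paper's three methodologies, is cosine similarity over per-service visit counts. The service names and counts below are made up.

```python
from math import sqrt

def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two users' service-name histograms
    (mapping service name -> visit count). Values near 1 suggest the
    two profiles belong to the same user."""
    common = set(profile_a) & set(profile_b)
    dot = sum(profile_a[s] * profile_b[s] for s in common)
    na = sqrt(sum(v * v for v in profile_a.values()))
    nb = sqrt(sum(v * v for v in profile_b.values()))
    return dot / (na * nb) if na and nb else 0.0

# The same (hypothetical) user's browsing on two different days:
day1 = {"news.example.com": 10, "mail.example.com": 5}
day2 = {"news.example.com": 8, "mail.example.com": 6, "cdn.example.net": 1}
score = cosine_similarity(day1, day2)
```

Filtering the histograms down to intentionally requested services first, as the paper's machine learning step does, removes third-party noise (CDNs, trackers) that would otherwise inflate similarity between unrelated users.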

       

  • 4:45pm - 5:30pm Panel

    Room: Optimist Room (Luskin Center)

  • Challenges and Opportunities for Big-DAMA

    Alberto Dainotti (CAIDA UC San Diego), Constantine Dovrolis (Georgia Institute of Technology), Alex C. Snoeren (UC San Diego), Tim Evens (Cisco)

Call For Papers

Big-DAMA 2017

Big data is transforming the world, and the data communication networks domain is no exception. While the success of big data and machine learning in data communication networks has so far been mastered by the big players of the Internet, such as Google, network operators, practitioners and researchers now have within their reach a matchless opportunity to ride the big data wave as well. The complexity of today's networks has increased dramatically in the last few years, making it more important and more challenging to design scalable network measurement and analysis techniques and tools. Critical applications such as network monitoring, network security, and dynamic network management require fast mechanisms for the online analysis of thousands of events per second, as well as efficient techniques for the offline analysis of massive historical data. Beyond characterization, making operational sense of the ever-growing amount of network measurements is becoming a major challenge.

Despite recent major advances in big data analysis frameworks, their application to the network measurement analysis domain remains poorly understood and investigated, and most of the proposed solutions are in-house and difficult to benchmark. Furthermore, machine learning and big data analytics techniques able to characterize, detect, locate and understand complex behaviors and complex systems promise to shed light on this enormous amount of data, but smart and scalable approaches must be conceived to make them applicable in networking practice. Last but not least, the explosion in the volume and heterogeneity of measurement data generated across the entire network stack is opening the door to innovative solutions and out-of-the-box ideas to improve current networks, and many other networking applications besides monitoring and analysis are becoming more data- and measurement-driven than ever.

The Big-DAMA workshop seeks novel contributions in the field of machine learning and big data analytics applied to data communication network analysis, including scalable analytics techniques and frameworks capable of collecting and analyzing both online streams and offline massive datasets, network traffic traces, topological data, and performance measurements. In addition, Big-DAMA looks for novel, out-of-the-box approaches and use cases related to the application of machine learning and big data in networking. The workshop will allow researchers and practitioners to share their experiences designing and developing big data applications for networking, to discuss the open issues related to applying machine learning to networking problems, and to share new ideas and techniques for big data analysis in data communication networks.

Topics of Interest

We encourage both mature work and position papers describing systems, platforms, algorithms and applications addressing all facets of the application of machine learning and big data to the analysis of data communication networks. We are particularly interested in disruptive and novel ideas that unleash the power of machine learning and big data in the networking domain. The following is a non-exhaustive list of topics:

  • Big networking data analysis
  • Machine learning, data mining and big data analytics in networking
  • Data analytics for network measurements mining
  • Stream-based machine learning for networking
  • Big data analysis frameworks for network monitoring data
  • Distributed monitoring architectures for big networking data
  • Networking-based benchmarks for big data analysis solutions
  • Learning algorithms and tools for network anomaly detection and security
  • Network anomaly diagnosis through big networking data
  • Machine learning and big data analytics for network management
  • Big networking data integrity and privacy
  • Big data analytics and visualization for traffic analysis
  • Research challenges on machine learning and big data analytics for networking
  • Collection and processing systems for large-scale topology and performance measurements

Contact workshop co-chairs.

Submission Instructions

Submissions must be original, unpublished work that is not under consideration at another conference or journal. Submitted papers must be at most six (6) pages long, including all figures, tables, references, and appendices, in two-column 10pt ACM format. Papers must include authors' names and affiliations for single-blind peer review by the PC. Authors of accepted papers are expected to present their papers at the workshop.

Please submit your paper via https://sigcomm17big-dama.hotcrp.com/.

Important Dates

  • March 31, 2017 (extended from March 17, 2017)

    Paper registration deadline

  • March 31, 2017 (extended from March 24, 2017)

    Paper submission deadline

  • April 28, 2017

    Paper acceptance notification

  • May 31, 2017 (extended from May 19, 2017)

    Camera ready deadline

Authors Take Note

The official publication date is the date the proceedings are made available in the ACM Digital Library. This date may be up to TWO WEEKS prior to the first day of your conference. The official publication date affects the deadline for any patent filings related to published work.

Committees