Data-mining the Internet: What we know, what we don't, and how we can learn more?
Michalis Faloutsos, U.C. Riverside and Christos Faloutsos, CMU
(Full Day - Day 2)

Content:

What do we know about the Internet? How can we learn more about it? Despite significant research efforts, we actually know very little about the Internet. Furthermore, commonly used data analysis techniques based on averages, standard deviations, and Poisson processes have reached the limits of what they can reveal. Unfortunately, most network researchers are unaware of the wealth of data-mining and statistical analysis tools available. The goal of this tutorial is to bridge the gap between networking and data-mining research.

First, we present the state of the art in WHAT we know about modeling and simulating the Internet. Second, we present cutting-edge techniques on HOW to further our understanding of the network. The following two scenarios describe the types of questions this tutorial will answer:

· Scenario 1 (WHAT): You want to simulate your new protocol. What topology should you use? What is the distribution of sources and destinations? What is the traffic intensity of each connection? What kind of background traffic should you use?

· Scenario 2 (HOW): You just obtained a large amount of measured data on round-trip delays among several node pairs over a few hours. How can you characterize it? How do you compare the delays between different end-points? How do you cluster "similar" round-trip behavior? How can you identify abnormal behavior such as a Distributed Denial of Service (DDoS) attack?
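
As a rough illustration of the kind of question Scenario 2 raises, the sketch below summarizes a series of round-trip delays with the median and the median absolute deviation (MAD), then flags samples that lie far from the median. This is only one simple option and is not taken from the tutorial material: the function names, the sample values, and the 3.5 cutoff on the modified z-score are all assumptions chosen for illustration.

```python
from statistics import median


def robust_summary(delays_ms):
    """Return (median, MAD) of a list of round-trip delays in milliseconds."""
    med = median(delays_ms)
    mad = median(abs(d - med) for d in delays_ms)
    return med, mad


def flag_abnormal(delays_ms, threshold=3.5):
    """Return indices of samples whose modified z-score exceeds `threshold`.

    The 3.5 default is an assumed, commonly quoted rule of thumb.
    """
    med, mad = robust_summary(delays_ms)
    if mad == 0:
        # Score is undefined when more than half the samples are identical;
        # in that case this sketch flags every sample that differs from the median.
        return [i for i, d in enumerate(delays_ms) if d != med]
    return [
        i for i, d in enumerate(delays_ms)
        if 0.6745 * abs(d - med) / mad > threshold
    ]


if __name__ == "__main__":
    # Hypothetical delays (ms) for one node pair, with one obvious spike.
    rtt_ms = [20.1, 19.8, 20.5, 21.0, 19.9, 20.3, 250.0, 20.2]
    print(robust_summary(rtt_ms))
    print(flag_abnormal(rtt_ms))
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme delay shifts the latter two considerably, which can hide the very sample one is trying to detect.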

Intended Audience:

This tutorial is intended for network researchers who want to a) conduct realistic simulations, b) analyze real data by identifying patterns and abnormal behavior, and c) quickly get up to date with the latest data-mining tools. The tutorial is self-contained so that it is accessible to students, while also containing useful material for seasoned networking researchers from academia and industry.

Speakers' Biographies:

The instructors have collaborated for the past 4 years, resulting in multiple joint papers. This joint work fuses the collaborators' two research areas: networking and data-mining. It has focused on Internet modeling using advanced data-mining techniques and has led to discoveries that would not have been feasible otherwise.

MICHALIS FALOUTSOS received a B.Sc. degree in Electrical Engineering (1993) from the National Technical University of Athens, Greece and M.Sc. and Ph.D. degrees in Computer Science from the University of Toronto, Canada (1999). He is currently an assistant professor at the University of California, Riverside. He received an NSF CAREER award (2000) and two major DARPA grants. With Christos and Petros Faloutsos, he co-authored the highly cited paper "On Power-Law Relationships of the Internet Topology" (SIGCOMM'99), which renewed the interest of the community in modeling the Internet topology. His interests include Internet measurements, multicast protocols, real-time communications, and wireless networks.

CHRISTOS FALOUTSOS received a B.Sc. degree in Electrical Engineering (1981) from the National Technical University of Athens, Greece and M.Sc. and Ph.D. degrees in Computer Science from the University of Toronto, Canada. He is currently a professor at Carnegie Mellon University. He has received an NSF Presidential Young Investigator Award (1989), three "best paper" awards (SIGMOD 94, VLDB 97, KDD 01 (runner-up)), and four teaching awards. He has published over 100 refereed articles and one monograph, and holds four patents. His research interests include data-mining, network analysis, and indexing in relational and multimedia databases.