# Program

• 09:00 - 09:30 - Opening session
• Inferring BGP Blackholing Activity in the Internet  long
Vasileios Giotsas (CAIDA/TU Berlin), Georgios Smaragdakis (MIT / TU Berlin), Christoph Dietzel (TU Berlin / DE-CIX), Philipp Richter and Anja Feldmann (TU Berlin), Arthur Berger (MIT / Akamai)
Abstract: The Border Gateway Protocol (BGP) has been used for decades as the de facto protocol to exchange reachability information among networks in the Internet. However, little is known about how this protocol is used to restrict reachability to selected destinations, e.g., destinations that are under attack. While such a feature, BGP blackholing, has been available for some time, we lack a systematic study of its Internet-wide adoption, practices, and network efficacy, as well as the profile of blackholed destinations. In this paper, we develop and evaluate a methodology to automatically detect BGP blackholing activity in the wild. We apply our method to both public and private BGP datasets. We find that hundreds of networks, including large transit providers, as well as about 50 Internet eXchange Points (IXPs) offer blackholing service to their customers, peers, and members. Between 2014 and 2017, the number of blackholed prefixes increased by a factor of 6, peaking at 5K concurrently blackholed prefixes by up to 400 Autonomous Systems. We assess the effect of blackholing on the data plane using both targeted active measurements as well as passive datasets, finding that blackholing is indeed highly effective in dropping traffic before it reaches its destination, though it also discards legitimate traffic. We augment our findings with an analysis of the target IP addresses of blackholing. Our tools and insights are relevant for operators considering offering or using BGP blackholing services as well as for researchers studying DDoS mitigation in the Internet.
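As background for this talk, blackholed announcements are conventionally tagged with BGP communities. A minimal illustrative sketch (not the authors' detection methodology) flags likely blackhole announcements using two common signals: the well-known BLACKHOLE community 65535:666 from RFC 7999 and a very specific (host-route) prefix length:

```python
# Illustrative sketch, not the paper's method: flag likely blackholed
# IPv4 announcements via the RFC 7999 BLACKHOLE community plus a
# host-route prefix length. Real providers also use custom communities.
BLACKHOLE_COMMUNITIES = {"65535:666"}

def looks_blackholed(prefix, communities):
    addr, _, plen = prefix.partition("/")
    very_specific = plen != "" and int(plen) >= 32  # /32 host route
    return very_specific and bool(set(communities) & BLACKHOLE_COMMUNITIES)

# Toy BGP updates: (prefix, attached communities)
updates = [
    ("203.0.113.7/32", {"65535:666", "64496:120"}),
    ("198.51.100.0/24", {"64496:100"}),
]
flagged = [p for p, c in updates if looks_blackholed(p, c)]
```

In practice the community values are provider-specific, which is part of what makes Internet-wide inference non-trivial.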
• Pinpointing Delay and Forwarding Anomalies Using Large-Scale Traceroute Measurements  long
Romain Fontugne (IIJ Research Lab), Emile Aben (RIPE NCC), Cristel Pelsser (University of Strasbourg / CNRS), Randy Bush (IIJ Research Lab)
Abstract: Understanding data plane health is essential to improving Internet reliability and usability. For instance, detecting disruptions in distant networks can identify repairable connectivity problems. Currently this task is difficult and time consuming as operators have poor visibility beyond their network's border. In this paper we leverage the diversity of RIPE Atlas traceroute measurements to solve the classic problem of monitoring in-network delays and get credible delay change estimations to monitor network conditions in the wild. We demonstrate a set of complementary methods to detect network disruptions and report them in near real time. The first method detects delay changes for intermediate links in traceroutes. Second, a packet forwarding model predicts traffic paths and identifies faulty routers and links in cases of packet loss. In addition, we define an alarm score that aggregates changes into a single value per AS in order to easily monitor its sanity, reducing the effect of uninteresting alarms. Using only existing public data we monitor hundreds of thousands of link delays while adding no burden to the network. We present three cases demonstrating that the proposed methods detect real disruptions and provide valuable insights, as well as surprising findings, on the location and impact of the identified events.
• Through the Wormhole: Tracking Invisible MPLS Tunnels  long
Yves Vanaubel (Université de Liège), Pascal Mérindol and Jean-Jacques Pansiot (Université de Strasbourg), Benoit Donnet (Université de Liège)
Abstract: For years now, research on Internet topology has mainly been conducted through active measurements. For instance, CAIDA builds router level topologies on top of IP level traces obtained with traceroute. The resulting graphs contain a significant amount of nodes with a very large degree, often exceeding the actual number of interfaces of a router. Although this property may result from inaccurate alias resolution, we believe that opaque MPLS clouds made of invisible tunnels are the main cause. Using Layer-2 technologies such as MPLS, routers can be configured to hide internal IP hops to traceroute. Consequently, an entry point of an MPLS network appears as the neighbor of all exit points and the whole Layer-3 network turns into a dense mesh of high degree nodes. This paper tackles three problems: the MPLS deployment underestimation, the revelation of IP hops hidden by MPLS tunnels, and the overestimation of high degree nodes. We develop new measurement techniques able to reveal the presence and content of invisible MPLS tunnels. We validate them through emulation and perform a large-scale measurement campaign targeting suspicious networks on which we apply statistical analysis. Finally, based on our dataset, we look at basic graph properties impacted by invisible tunnels.
• 10:45 - 11:15 - Break
• Challenges in Inferring Internet Congestion using Throughput Measurements  long
Srikanth Sundaresan (Princeton University), Danny Lee (Georgia Tech), Xiaohong Deng and Yun Feng (University of New South Wales), Amogh Dhamdhere (CAIDA/UC San Diego)
Abstract: We revisit the use of crowdsourced throughput measurements to infer and localize congestion on end-to-end paths, with particular focus on points of interconnections between ISPs. We analyze three challenges with this approach. First, accurately identifying which link on the path is congested requires fine-grained network tomography techniques not supported by existing throughput measurement platforms. Coarse-grained network tomography can perform this link identification under certain topological conditions, but we show that these conditions do not always hold on the global Internet. Second, existing measurement platforms provide limited visibility of paths to popular web content sources, and only capture a small fraction of interconnections between ISPs. Third, crowdsourcing measurements inherently risks sample bias: using measurements from volunteers across the Internet leads to uneven distribution of samples across time of day, access link speeds, and home network conditions. Finally, it is not clear how large a drop in throughput to interpret as evidence of congestion. We investigate these challenges in detail, and offer guidelines for deployment of measurement infrastructure, strategies, and technologies that can address empirical gaps in our understanding of congestion on the Internet.
• Investigating the Causes of Congestion on the African IXP Substrate  short
Rodérick Fanou (IMDEA Networks Institute and Universidad Carlos III de Madrid), Francisco Valera (Universidad Carlos III de Madrid), Amogh Dhamdhere (CAIDA/UC San Diego)
Abstract: The goal of this work is to investigate the prevalence, causes, and impact of congestion on the African IXP substrate. Towards this end, we deployed Ark probes (within peering networks) at six African IXPs and ran the time-sequence latency probes (TSLP) algorithm, thereby collecting latency measurements to both ends of each mapped AS link for a whole year. We were able to detect congestion events and quantify their periods and magnitudes at four IXPs. We then verified the events and investigated the causes by interviewing the IXP operators. Our results show that only 2.2% of the discovered IP links experienced (sustained or transient) congestion during our measurement period. Our findings suggest the need for ISPs to carefully monitor the provision of their peering links, so as to avoid or quickly mitigate the occurrence of congestion. Regulators may also define the maximum level of packet loss in those links to provide some protection to communications routed through local IXPs.
• TCP Congestion Signatures  long
Srikanth Sundaresan (Princeton University), Amogh Dhamdhere (CAIDA/UCSD), Mark Allman (ICSI), k claffy (CAIDA/UCSD)
Abstract: We develop and validate Internet path measurement techniques to distinguish congestion experienced when a flow self-induces congestion in the path from when a flow is affected by an already congested path. One application of this technique is for speed tests, when the user is affected by congestion either in the last mile or in an interconnect link. This difference is important because in the latter case, the user is constrained by their service plan (i.e., what they are paying for), and in the former case, they are constrained by forces outside of their control. We exploit TCP congestion control dynamics to distinguish these cases for Internet paths that are predominantly TCP traffic. In TCP terms, we re-articulate the question: was a TCP flow bottlenecked by an already congested (possibly interconnect) link, or did it induce congestion in an otherwise idle (possibly a last-mile) link? TCP congestion control affects the round-trip time (RTT) of packets within the flow (i.e., the flow RTT): an endpoint sends packets at higher throughput, increasing the occupancy of the bottleneck buffer, thereby increasing the RTT of packets in the flow. We show that two simple, statistical metrics derived from the flow RTT during the slow start period — its coefficient of variation, and the normalized difference between the maximum and minimum RTT — can robustly identify which type of congestion the flow encounters. We use extensive controlled experiments to demonstrate that our technique works with up to 90% accuracy. We also evaluate our techniques using two unique real-world datasets of TCP throughput measurements using Measurement Lab data and the Ark platform. We find up to 99% accuracy in detecting self-induced congestion, and up to 85% accuracy in detecting external congestion. 
Our results can benefit regulators of interconnection markets, content providers trying to improve customer service, and users trying to understand whether poor performance is something they can fix by upgrading their service tier.
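The two slow-start RTT metrics described in this abstract are simple to compute. A minimal sketch (the decision threshold and sample values below are illustrative placeholders, not the paper's trained parameters):

```python
# Sketch of the two flow-RTT metrics from the abstract: coefficient of
# variation, and normalized max-min difference, over slow-start RTTs.
import statistics

def congestion_signature(slow_start_rtts_ms):
    mean = statistics.mean(slow_start_rtts_ms)
    cv = statistics.pstdev(slow_start_rtts_ms) / mean
    norm_range = (max(slow_start_rtts_ms) - min(slow_start_rtts_ms)) / max(slow_start_rtts_ms)
    return cv, norm_range

# Intuition: self-induced congestion fills an otherwise idle buffer, so
# RTTs grow sharply during slow start (high CV, large normalized range);
# on an already congested link RTTs start high and stay comparatively flat.
cv, nr = congestion_signature([20, 24, 35, 52, 80])  # hypothetical samples (ms)
```

Classifying a flow would then amount to thresholding (or feeding) these two features, as the paper does with its trained models.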
• High-Resolution Measurement of Data Center Microbursts  short
Qiao Zhang (University of Washington), Vincent Liu (University of Pennsylvania), Hongyi Zeng (Facebook), Arvind Krishnamurthy (University of Washington)
Abstract: Data centers house some of the largest, fastest networks in the world. In contrast to and as a result of their speed, these networks operate on very small timescales—a 100 Gbps port processes a single packet in at most 500 ns with end-to-end network latencies of under a millisecond. In this study, we explore the fine-grained behaviors of a large production datacenter using extremely high-resolution measurements (10s to 100s of microseconds) of rack-level traffic. Our results show that characterizing network events like congestion and synchronized behavior in data centers does indeed require the use of such measurements. In fact, we observe that more than 70% of bursts on the racks we measured are sustained for at most tens of microseconds: a range that is orders of magnitude higher-resolution than most deployed measurement frameworks. Congestion events observed by less granular measurements are likely collections of smaller microbursts. Thus, we find that traffic at the edge is significantly less balanced than other metrics might suggest. Beyond the implications for measurement granularity, we hope these results will inform future datacenter load balancing and congestion control protocols.
• 12:35 - 14:00 - Lunch
• Detection, Classification, and Analysis of Inter-Domain Traffic with Spoofed Source IP Addresses  long
Franziska Lichtblau, Florian Streibelt, Thorben Krüger, Philipp Richter, and Anja Feldmann (TU Berlin)
Abstract: IP traffic with forged source addresses - spoofed traffic - enables a series of threats ranging from the impersonation of remote hosts to massive Denial of Service attacks. Consequently, IP address spoofing received considerable attention with efforts to either suppress spoofing, to mitigate its consequences, or to actively measure the ability to spoof in individual networks. However, as of today, we still lack a comprehensive understanding both of the prevalence and the characteristics of spoofed traffic "in the wild" as well as of the networks that inject spoofed traffic into the Internet. In this paper, we propose and evaluate a method to passively detect spoofed packets in traffic exchanged between networks in the inter-domain Internet. Our detection mechanism identifies both source IP addresses that should never be visible in the inter-domain Internet (i.e., unrouted and bogon sources), as well as source addresses that should not be sourced by individual networks, as inferred from BGP routing information. We apply our method to classify the traffic exchanged between more than 700 networks at a large European IXP. We find that the majority of connected networks do not, or not consistently, filter their outgoing traffic. Filtering strategies and contributions of spoofed traffic vary heavily across networks of different types and sizes. Finally, we study qualitative characteristics of spoofed traffic, both regarding application popularity as well as structural properties of addresses. Combining our observations, we identify and study dominant attack patterns.
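The first class of sources this abstract describes ("should never be visible in the inter-domain Internet") can be illustrated with a minimal classifier. This is a toy sketch, not the authors' system: the routed-prefix set is a stand-in for a full BGP table, and the bogon check leans on Python's `ipaddress` special-registry flags:

```python
# Toy sketch of the "should never be visible" check: is a source address
# a bogon (private/reserved/etc.) or outside every routed prefix?
import ipaddress

ROUTED = [ipaddress.ip_network("1.2.3.0/24")]  # stand-in for a BGP table

def classify_source(src):
    ip = ipaddress.ip_address(src)
    if ip.is_private or ip.is_reserved or ip.is_loopback or ip.is_multicast:
        return "bogon"
    if not any(ip in net for net in ROUTED):
        return "unrouted"
    return "routed"
```

The paper's second, stricter check—whether a *routed* source should plausibly be emitted by the specific network sending it—additionally requires per-member BGP routing information, which is what the IXP vantage point provides.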
• Millions of Targets Under Attack: a Macroscopic Characterization of the DoS Ecosystem  long
Mattijs Jonker (University of Twente), Alistair King (CAIDA, UC San Diego), Johannes Krupp and Christian Rossow (CISPA, Saarland University), Anna Sperotto (University of Twente), Alberto Dainotti (CAIDA, UC San Diego)
Abstract: Denial-of-Service attacks have rapidly increased in terms of frequency and intensity, steadily becoming one of the biggest threats to Internet stability and reliability. However, a rigorous comprehensive characterization of this phenomenon, and of countermeasures to mitigate the associated risks, faces many infrastructure and analytic challenges. We make progress toward this goal, by introducing and applying a new framework to enable a macroscopic characterization of attacks, attack targets, and DDoS Protection Services (DPSs). Our analysis leverages data from four independent global Internet measurement infrastructures over the last two years: backscatter traffic to a large network telescope; logs from amplification honeypots; a DNS measurement platform covering 60% of the current namespace; and a DNS-based data set focusing on DPS adoption. Our results reveal the massive scale of the DoS problem, including an eye-opening statistic that one-third of all /24 networks recently estimated to be active on the Internet have suffered at least one DoS attack over the last two years. We also discovered that often targets are simultaneously hit by different types of attacks. In our data, Web servers were the most prominent attack target; an average of 3% of the Web sites in .com, .net, and .org were involved with attacks, daily. Finally, we shed light on factors influencing migration to a DPS.
• Your State is Not Mine: A Closer Look at Evading Stateful Internet Censorship  long
Zhongjie Wang, Yue Cao, Zhiyun Qian, Chengyu Song, and Srikanth V. Krishnamurthy (University of California, Riverside)
Abstract: Understanding the behaviors of, and evading state-level Internet-scale censorship systems such as the Great Firewall (GFW) of China, has emerged as a research problem of great interest. One line of evasion is the development of techniques that leverage the possibility that the TCP state maintained on the GFW may not represent the state at end-hosts. In this paper we undertake, arguably, the most extensive measurement study on TCP-level GFW evasion techniques, with several vantage points within and outside China, and with clients subscribed to multiple ISPs. We find that the state-of-the-art evasion techniques are no longer very effective on the GFW. Our study further reveals that the primary reason that causes these failures is the evolution of GFW over time. In addition, other factors such as the presence of middleboxes on the route from the client to the server also contribute to previously unexpected behaviors. Our measurement study leads us to new understandings of the GFW and new evasion techniques. Evaluations of our new evasion strategies show that they achieve success rates of ≈90% or higher, much better than prior schemes. Our results further validate our new understandings of the GFW’s evolved behaviors. We also develop a measurement-driven tool, INTANG, that systematically looks for and finds the best strategy that works with a server and network path. Our measurements show that INTANG can yield near perfect evasion rates and is extremely effective in aiding various protocols such as HTTP, DNS over TCP, and Tor in evading the GFW.
• lib·erate, (n): A library for exposing (traffic-classification) rules and avoiding them efficiently  long
Fangfan Li (Northeastern University), Abbas Razaghpanah (Stony Brook University), Arash Molavi Kakhki (Northeastern University), Arian Akhavan Niaki (Stony Brook University), David Choffnes (Northeastern University), Phillipa Gill (University of Massachusetts Amherst), Alan Mislove (Northeastern University)
Abstract: ISPs leverage middleboxes to implement a variety of network management policies (e.g., prioritizing or blocking traffic) in their networks. While such policies can be beneficial (e.g., blocking malware) they also raise issues such as network neutrality and freedom of speech when used for application-specific differentiation and censorship. There is in general a poor understanding of how such policies are implemented in practice, and how they can be evaded efficiently. As a result, most circumvention solutions are brittle, point solutions based on manual analysis. This paper presents the design and implementation of lib·erate, a general-purpose tool for automatically identifying middlebox policies, reverse-engineering their implementations, and adaptively deploying custom circumvention techniques. Our key insight is that differentiation is necessarily implemented by middleboxes using incomplete models of end-to-end communication protocols at the network and transport layers. lib·erate conducts targeted network measurements to identify the corresponding inconsistencies and leverages this information to transform arbitrary network traffic such that it is purposefully misclassified (e.g., to avoid shaping or censorship). Unlike previous work, our approach is application-agnostic, can be deployed unilaterally (i.e., only at one endpoint) on unmodified applications via a linked library or transparent proxy, and can adapt to changes to classifiers at runtime. We evaluate lib·erate both in a testbed environment and in operational networks that throttle or block traffic based on DPI-based classifier rules, and show that our approach is effective across a wide range of middlebox deployments.
• 15:40 - 16:10 - Break
• 16:10 - 17:45 - Posters
• 18:00 - 20:00 - Reception
• If you are not paying for it, you are the product: How much do advertisers pay to reach you?  long
Panagiotis Papadopoulos (FORTH-ICS, Greece), Nicolas Kourtellis (Telefonica Research, Spain), Pablo Rodriguez Rodriguez (Telefonica Alpha, Spain), Nikolaos Laoutaris (Data Transparency Lab)
Abstract: Online advertising is progressively moving towards a programmatic model in which ads are matched to actual interests of individuals collected as they browse the web. Leaving the huge debate around privacy aside, a very important question in this area, for which little is known, is: How much do advertisers pay to reach an individual? In this study, we develop a first of its kind methodology for computing exactly that -- the price paid for a web user by the ad ecosystem -- and we do that in real time. Our approach is based on tapping on the Real Time Bidding (RTB) protocol to collect cleartext and encrypted prices for winning bids paid by advertisers in order to place targeted ads. Our main technical contribution is a method for tallying winning bids even when they are encrypted. We achieve this by training a model using as ground truth prices obtained by running our own "probe" ad-campaigns. We design our methodology through a browser extension and a back-end server that provides it with fresh models for encrypted bids. We validate our methodology using a one-year-long trace of 1600 mobile users and demonstrate that it can estimate a user's advertising worth with more than 82% accuracy.
• Exploring the Dynamics of Search Advertiser Fraud  long
Joe DeBlasio (UC San Diego), Saikat Guha (Microsoft Research India), Geoffrey M. Voelker and Alex C. Snoeren (UC San Diego)
• The Ad Wars: Retrospective Measurement and Analysis of Anti-Adblock Filter Lists  long
Umar Iqbal and Zubair Shafiq (The University of Iowa), Zhiyun Qian (University of California-Riverside)
• On the Structure and Characteristics of User Agent Strings  short
Jeff Kline and Aaron Cahn (comScore), Paul Barford (comScore, University of Wisconsin - Madison), Joel Sommers (Colgate University)
Abstract: User agent (UA) strings transmitted during HTTP transactions convey client system configuration details to ensure that content returned by a server is appropriate for the requesting host. As such, analysis of UA strings and their structure offers a unique perspective on active client systems in the Internet and when tracked longitudinally, offers a perspective on the nature of system and configuration dynamics. In this paper, we describe our study of UA string characteristics. Our work is based on analyzing a unique corpus of over 1B UA strings collected over a period of 2 years by comScore. We begin by analyzing the general characteristics of UA strings, focusing on the most prevalent strings and dynamic behaviors. We identify the top 10 most popular User Agents, which account for 26% of total daily volume. These strings describe the expected instances of popular platforms such as Microsoft, Apple and Google. We then report on the characteristics of low-volume UA strings, which has important implications for unique device identification. We show that this class of user agent generates the overwhelming majority of traffic, with between 2M and 10M instances observed each day. We show that the distribution of UA strings has temporal dependence and we show the distribution measured depends on the type of content served. Finally, we report on two large-volume UA anomalies characterized by web browsers sending false and misleading UAs in their web requests.
• 10:30 - 11:00 - Break
• Cell Spotting: Studying the role of cellular networks in the Internet  long
John P. Rula (Northwestern University/Akamai), Fabian E. Bustamante (Northwestern University), Moritz Steiner (Akamai)
Abstract: The impressive growth of the mobile Internet has motivated several industry reports retelling the story in terms of number of devices or subscriptions sold per regions, or the increase in mobile traffic, both WiFi and cellular. Yet, despite the abundance of such reports, we still lack an understanding of the impact of cellular networks around the world. We present the first comprehensive analysis of global cellular networks. We describe an approach to accurately identify cellular network IP addresses using the Network Information API, a non-standard Javascript API in several mobile browsers, and show its effectiveness in a range of cellular network configurations. We combine this approach with the vantage point of one of the world’s largest CDNs, with servers located in 1,450 networks and clients distributed across 245 countries, to characterize cellular access around the globe. We find that the majority of cellular networks exist as mixed networks (i.e., networks that share both fixed-line and cellular devices), requiring prefix – not ASN – level identification. We discover over 350 thousand /24 and 23 thousand /48 cellular IPv4 and IPv6 prefixes, respectively. By utilizing address-level traffic from the same CDN, we calculate the fraction of traffic coming from cellular addresses. Overall we find that cellular traffic comprises 16.2% of the CDN’s global traffic, and that cellular traffic ranges widely in importance between countries, from capturing nearly 96% of all traffic in Ghana to just 12.1% in France.
• Measurement-based, Practical Techniques to Improve 802.11ac Performance  long
Apurv Bhartia, Bo Chen, Feng Wang, Derrick Pallas, Raluca Musaloiu-E., Ted Tsung-Te Lai, and Hao Ma (Cisco Meraki)
Abstract: Devices implementing newer wireless standards continue to displace older wireless technology. As 802.11ac access points (APs) are rapidly adopted in enterprise environments, new challenges arise. This paper first presents an overview of trends in enterprise wireless networks based on a large-scale measurement study, in which we collect data from an anonymous subset of millions of radio access points in hundreds of thousands of real-world deployments. Based on the observed data and our experience deploying wireless networks at scale, we then propose two techniques that we have implemented in Meraki APs to improve both overall network capacity and performance perceived by end users: (i) a dynamic channel assignment algorithm, TurboCA, that adjusts to frequent RF condition changes, and (ii) a novel approach, FastACK, that improves the end-to-end performance of TCP traversing high-throughput wireless links. Finally, we evaluate TurboCA with metrics taken from a variety of real-world networks and evaluate TCP performance of FastACK with extensive testbed experiments.
• Dissecting VOD Services for Cellular: Performance, Root Causes and Best Practices  long
Shichang Xu and Z. Morley Mao (University of Michigan), Subhabrata Sen (AT&T Labs Research), Yunhan Jia (University of Michigan)
Abstract: HTTP Adaptive Streaming (HAS) has emerged as the predominant technique for transmitting video over cellular for most content providers today. While mobile video streaming is extremely popular, delivering good streaming experience over cellular networks is technically very challenging, and involves complex interacting factors. We conduct a detailed measurement study of a wide cross-section of popular streaming video-on-demand (VOD) services to develop a holistic understanding of these services' design and performance. We identify performance issues and develop effective practical best practice solutions to mitigate these challenges. By extending the understanding of how different, potentially interacting components of service design impact performance, our findings can help developers build streaming services with better performance.
• Connected cars in a cellular network: A measurement study  short
Carlos E. Andrade, Simon D. Byers, Vijay Gopalakrishnan, Emir Halepovic, David J. Poole, Lien K. Tran, and Christopher T. Volinsky (AT&T Labs - Research)
Abstract: Connected cars are a rapidly growing segment of Internet-of-Things (IoT). While they already use cellular networks to support emergency response, in-car WiFi hotspots and infotainment, there is also a push towards updating their firmware Over-The-Air (FOTA). With millions of connected cars expected to be deployed over the next several years, and more importantly persist in the network for a long time, it is important to understand their behavior, usage patterns, and impact — both in terms of their own experience, as well as that of other users. Using one million connected cars on a production cellular network, we conduct network-scale measurements of over one billion radio connections to understand various aspects including their spatial and temporal connectivity patterns, the network conditions they face, use and handovers across various radio frequencies and mobility patterns. Our measurement study reveals that connected cars have distinct sets of characteristics, including those similar to regular smartphones (e.g. overall diurnal pattern), those similar to IoT devices (e.g. mostly short network sessions), but also some that belong to neither type (e.g. high mobility). These insights are invaluable in understanding and modeling connected cars in a cellular network and in designing strategies to manage their data demand.
• 12:30 - 13:55 - Lunch
• Target Generation for Internet-wide IPv6 Scanning  long
Austin Murdock, Frank Li, and Paul Bramsen (University of California, Berkeley), Zakir Durumeric (International Computer Science Institute), Vern Paxson (University of California, Berkeley)
Abstract: Fast IPv4 scanning has enabled researchers to answer a wealth of new security and measurement questions. However, while increased network speeds and computational power have enabled comprehensive scans of the IPv4 address space, a brute-force approach does not scale to IPv6. Systems are limited to scanning a small fraction of the IPv6 address space and require an algorithmic approach to determine a small set of candidate addresses to probe. In this paper, we first explore the considerations that guide designing such algorithms. We introduce a new approach that identifies dense address space regions from a set of known "seed" addresses and generates a set of candidates to scan. We compare our algorithm 6Gen against Entropy/IP, the current state of the art, finding that we can recover between 1 and 8 times as many addresses for the five candidate datasets considered in the prior work. However, during our analysis, we uncover widespread IP aliasing in IPv6 networks. We discuss its effect on target generation and explore preliminary approaches for detecting aliased regions.
• PacketLab: A Universal Measurement Endpoint Interface  short
Kirill Levchenko (UC San Diego), Amogh Dhamdhere, Bradley Huffaker, and kc claffy (CAIDA), Mark Allman (ICSI), Vern Paxson (UC Berkeley/ICSI)
Abstract: The right vantage point is critical to the success of any active measurement. However, most research groups cannot afford to design, deploy, and maintain their own network of measurement endpoints, and thus rely on measurement infrastructure shared by others. Unfortunately, the mechanism by which we share access to measurement endpoints today is not frictionless; indeed, issues of compatibility, trust, and a lack of incentives get in the way of efficiently sharing measurement infrastructure. We propose PacketLab, a universal measurement endpoint interface that lowers the barriers faced by experimenters and measurement endpoint operators. PacketLab is built on two key ideas: It moves the measurement logic out of the endpoint to a separate experiment control server, making each endpoint a lightweight packet source/sink. At the same time, it provides a way to delegate access to measurement endpoints while retaining fine-grained control over how one's endpoints are used by others, allowing research groups to share measurement infrastructure with each other with little overhead. By making the endpoint interface simple, we also make it easier to deploy measurement endpoints on any device anywhere, for any period of time the owner chooses. We offer PacketLab as a candidate measurement interface that can accommodate the research community's demand for future global-scale Internet measurement.
• Automatic Metadata Generation for Active Measurement  short
Joel Sommers (Colgate University), Ramakrishnan Durairajan (University of Oregon), Paul Barford (University of Wisconsin - Madison and comScore, Inc.)
Abstract: Empirical research in the Internet is fraught with challenges. Among these is the possibility that local environmental conditions (e.g., CPU load or network load) introduce unexpected bias or artifacts in measurements that lead to erroneous conclusions. In this paper, we describe a framework for local environment monitoring that is designed to be used during Internet measurement experiments. The goals of our work are to provide a critical, expanded perspective on measurement results and to improve the opportunity for reproducibility of results. We instantiate our framework in a tool we call SoMeta, which monitors the local environment during active probe-based measurement experiments. We evaluate the runtime costs of SoMeta and conduct a series of experiments in which we intentionally perturb different aspects of the local environment during active probe-based measurements. Our experiments show how simple local monitoring can readily expose conditions that bias active probe-based measurement results. We conclude with a discussion of how our framework can be expanded to provide metadata for a broad range of Internet measurement experiments.
• A High-Performance Algorithm for Identifying Frequent Items in Data Streams  long
Daniel Anderson and Pryce Bevan (Georgetown University), Kevin Lang (Oath Research), Edo Liberty (Amazon), Lee Rhodes (Oath), Justin Thaler (Georgetown University)
Abstract: Estimating frequencies of items over data streams is a common building block in streaming data measurement and analysis. Misra and Gries introduced their seminal algorithm for the problem in 1982, and the problem has since been revisited many times due to its practicality and applicability. We describe a highly optimized version of Misra and Gries' algorithm that is suitable for deployment in industrial settings. Our code is made public via an open source library called DataSketches that is already used by several companies and production systems. Our algorithm improves on two theoretical and practical aspects of prior work. First, it handles weighted updates in amortized constant time, a common requirement in practice. Second, it uses a simple and fast method for merging summaries that asymptotically improves on prior work even for unweighted streams. We describe experiments confirming that our algorithms are more efficient than prior proposals.
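The textbook Misra-Gries algorithm that the paper builds on can be sketched in a few lines. This is the classic 1982 version, not the optimized weighted-update variant from the DataSketches library the abstract describes:

```python
def misra_gries(stream, k):
    """Textbook Misra-Gries: maintain at most k-1 counters.

    Any item with true frequency greater than len(stream)/k is
    guaranteed to survive among the returned candidates, though its
    counter may undercount the true frequency.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Stream element has no counter and all slots are full:
            # decrement every counter and evict those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# 'a' occurs 5 times in a stream of 9; with k=3 (threshold 9/3 = 3)
# it is guaranteed to appear in the output.
stream = ['a', 'b', 'a', 'c', 'a', 'd', 'a', 'b', 'a']
print(misra_gries(stream, 3))
```

The merge step whose asymptotic improvement the paper claims is not shown here; this sketch only illustrates the per-item update that makes the summary so cheap to maintain.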
• Recursive Lattice Search: Hierarchical Heavy Hitters Revisited  short
Kenjiro Cho (IIJ)
Abstract: The multidimensional Hierarchical Heavy Hitter (HHH) problem identifies significant clusters in traffic across multiple planes such as source and destination addresses, and has been widely studied in the literature. A compact summary of HHHs provides an overview on complex traffic behavior and is a powerful means for traffic monitoring and anomaly detection. In this paper, we present a new efficient HHH algorithm which fits operational needs. Our key insight is to revisit the commonly accepted definition of HHH, and apply the Z-ordering to make use of a recursive partitioning algorithm. The proposed algorithm produces summary outputs comparable to or even better in practice than the existing algorithms, and runs orders of magnitude faster for bitwise aggregation. We have implemented the algorithm into our open-source tool and have made longitudinal datasets of backbone traffic openly available.
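The Z-ordering at the heart of the proposed HHH algorithm maps a two-dimensional key, such as a (source, destination) address pair, onto a single integer by interleaving bits, so that recursive halving of the key range corresponds to bitwise prefix splits on both planes. A minimal sketch of standard Morton encoding (illustrative only; the paper's implementation may differ):

```python
def z_order(src, dst, bits=32):
    """Interleave the bits of two integer addresses into a single
    Z-order (Morton) key, most significant bits first."""
    key = 0
    for i in range(bits - 1, -1, -1):
        key = (key << 1) | ((src >> i) & 1)  # bit from the source plane
        key = (key << 1) | ((dst >> i) & 1)  # bit from the destination plane
    return key

# Interleaving 1010 and 0101 yields 10 01 10 01.
print(bin(z_order(0b1010, 0b0101, bits=4)))
```

Sorting flow keys by this value places traffic that shares long source and destination prefixes into contiguous runs, which is what enables the recursive partitioning the abstract refers to.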
• 15:30 - 16:00 - Break
• Taking a Long Look at QUIC: An Approach for Rigorous Evaluation of Rapidly Evolving Transport Protocols  long
Arash Molavi Kakhki (Northeastern University), Samuel Jero (Purdue University), David Choffnes, Alan Mislove, and Cristina Nita-Rotaru (Northeastern University)
Abstract: Google’s QUIC protocol, which implements TCP-like properties at the application layer atop a UDP transport, is now used by the vast majority of Chrome clients accessing Google properties but has no formal state machine specification, limited analysis, and ad-hoc evaluations based on snapshots of the protocol implementation in a small number of environments. Further frustrating attempts to evaluate QUIC is the fact that the protocol is under rapid development, with extensive rewriting of the protocol occurring over the scale of months, making individual studies of the protocol obsolete before publication. Given this unique scenario, there is a need for alternative techniques for understanding and evaluating QUIC when compared with previous transport-layer protocols. First, we develop an approach that allows us to conduct analysis across multiple versions of QUIC to understand how code changes impact protocol effectiveness. Next, we instrument the source code to infer QUIC’s state machine from execution traces. With this model, we run QUIC in a large number of environments that include desktop and mobile, wired and wireless environments and use the state machine to understand differences in transport- and application-layer performance across multiple versions of QUIC and in different environments. QUIC generally outperforms TCP, but we also identify performance bugs related to window sizes, re-ordered packets, and multiplexing large numbers of small objects; further, we identify that QUIC’s performance diminishes on mobile devices and over cellular networks.
• Large-Scale Scanning of TCP's Initial Window  short
Jan Rüth, Christian Bormann, and Oliver Hohlfeld (RWTH Aachen University)
Abstract: Improving web performance is fueling the debate of sizing TCP’s initial congestion window (IW), which is a critical performance parameter especially for short-lived flows. This debate yielded several RFC updates to recommended IW sizes, e.g., an increase to IW10 in 2010. The current adoption of IW recommendations is, however, unknown. In this paper, we therefore conduct large-scale measurements covering the entire IPv4 space inferring the IW distribution size by probing HTTP and HTTPS servers. We present an HTTP and TLS scanning method implemented in ZMap, enabling quick estimations of IW sizes at Internet scale. For the first time since the standardization and implementation of IW 10, we shed light on the rugged landscape of IW configurations on the Internet.
• The Record Route Option is an Option!  short
Brian Goodchild (Rutgers Camden, USC, Columbia), Yi-Ching Chiu and Haonan Lu (USC), Rob Hansen (Northeastern), Matt Calder (USC, Microsoft), Dave Choffnes (Northeastern), Wyatt Lloyd (USC), Matthew Luckie (Waikato), Ethan Katz-Bassett (USC, Columbia)
Abstract: The IPv4 Record Route (RR) Option instructs routers to record their IP addresses in a packet. RR is subject to a nine hop limit and, traditionally, inconsistent support from routers. Recent changes in interdomain connectivity—the so-called “flattening Internet”—and new best practices for how routers should handle RR packets suggest that now is a good time to reassess the potential of the RR Option. We quantify the current utility of RR by issuing RR measurements from PlanetLab and M-Lab to every advertised BGP prefix. We find that 75% of addresses that respond to ping without RR also respond to ping with RR, and 66% of these RR-responsive addresses are within the nine hop limit of at least one vantage point. These numbers suggest the RR Option is a useful measurement primitive on today’s Internet.
• Initial Measurements of the Cuban Street Network  short
Eduardo Pujol (Universidad de las Ciencias Informáticas, Havana), Will Scott (University of Michigan), Eric Wustrow (University of Colorado), J. Alex Halderman (University of Michigan)
Abstract: Internet access in Cuba is severely constrained, due to limited availability, slow speeds, and high cost. Within this isolated environment, technology enthusiasts have constructed a disconnected but vibrant IP network that has grown organically to reach tens of thousands of households across Havana. We present the first detailed characterization of this deployment, which is known as the SNET, or Street Network. Working in collaboration with SNET operators, we describe the network's infrastructure and map its topology, and we measure bandwidth, available services, usage patterns, and user demographics. Qualitatively, we attempt to answer why the SNET exists and what benefits it has afforded its users. We go on to discuss technical challenges the network faces, including scalability, security, and organizational issues. To our knowledge, the SNET is the largest isolated community-driven network in existence, and its structure, successes, and obstacles show fascinating contrasts and similarities to those of the Internet at large.
• 19:00 - 22:00 - Banquet
• Mission Accomplished? HTTPS Security after DigiNotar  long
Johanna Amann (International Computer Science Institute / Corelight / LBNL), Oliver Gasser and Quirin Scheitle (Technical University of Munich), Lexi Brent (The University of Sydney), Georg Carle (Technical University of Munich), Ralph Holz (The University of Sydney)
Abstract: Driven by recent CA compromises and the risk of man-in-the-middle attacks, new security features were added to TLS, HTTPS, and the Web PKI over the past five years. These include Certificate Transparency, for making the CA system auditable; HSTS and HPKP headers, to harden the HTTPS posture of a domain; the DNS-based extensions CAA and TLSA, for control over certificate issuance and pinning; and SCSV, for protocol downgrade protection. This paper presents the first large scale investigation of these improvements to the HTTPS ecosystem, explicitly accounting for their combinations. In addition to collecting passive measurements at the Internet uplinks of large university networks on three continents, we perform the largest domain-based active Internet scan to date, covering nearly 200M domains. Furthermore, we track the long-term deployment history of new TLS security features by leveraging passive observations dating back to 2012. We find that while deployment of new security features has picked up in general, only SCSV (49M domains) and CT (7M domains) have gained enough momentum to improve the overall security of HTTPS. Features with higher complexity, such as HPKP, are deployed scarcely and often incorrectly. Our empirical findings are placed in the context of risk, deployment effort, and benefit of these new technologies, and actionable steps for improvement are proposed.
• Tripwire: Inferring Internet Site Compromise  long
Joe DeBlasio, Stefan Savage, Geoffrey M. Voelker, and Alex C. Snoeren (UC San Diego)
Abstract: Password reuse has been long understood as a problem: credentials stolen from one site may be leveraged to gain access to another site for which they share a password. Indeed, it is broadly understood that attackers exploit this fact and routinely leverage credentials extracted from a site they have breached to access high-value accounts at other sites (e.g., email accounts). However, as a consequence of such acts, this same phenomena of password reuse attacks can be harnessed to indirectly infer site compromises---even those that would otherwise be unknown. In this paper we describe such a measurement technique, in which unique honey accounts are registered with individual third-party websites, and thus access to an email account provides indirect evidence of credentials theft at the corresponding website. We describe a prototype system, called Tripwire, that implements this technique using an automated Web account registration system combined with email account access data from a major email provider. In a pilot study monitoring more than 2,300 sites over a year, we have detected 19 site compromises, including what appears to be a plaintext password compromise at an Alexa top-500 site with more than 45 million active users.
• Measuring and Mitigating OAuth Access Token Abuse by Collusion Networks  long
Shehroze Farooqi (The University of Iowa), Fareed Zaffar (Lahore University of Management and Sciences), Nektarios Leontiadis (Facebook), Zubair Shafiq (The University of Iowa)
• Understanding the Role of Registrars in DNSSEC Deployment  long
Taejoong Chung (Northeastern University), Roland Rijswijk-Deij (University of Twente and SURFnet), David Choffnes, Alan Mislove, and Christo Wilson (Northeastern University), Dave Levin (University of Maryland), Bruce M. Maggs (Duke University)
Abstract: The Domain Name System (DNS) provides a scalable, flexible name resolution service. Unfortunately, its unauthenticated architecture has become the basis for many security attacks. To address this, DNS Security Extensions (DNSSEC) were introduced in 1997. DNSSEC's deployment requires support from the top-level domain (TLD) registries and registrars, as well as participation by the organization that serves as the DNS operator. Unfortunately, DNSSEC has seen poor deployment thus far: despite being proposed nearly two decades ago, only 1% of .com, .net, and .org domains are properly signed. In this paper, we investigate the underlying reasons why DNSSEC adoption has been remarkably slow. We focus on registrars, as most TLD registries already support DNSSEC and registrars often serve as DNS operators for their customers. Our study uses large-scale, longitudinal DNS measurements to study DNSSEC adoption, coupled with experiences collected by trying to deploy DNSSEC on domains we purchased from leading domain name registrars and resellers. Overall, we find that a select few registrars are responsible for the (small) DNSSEC deployment today, and that many leading registrars do not support DNSSEC at all, or require customers to take cumbersome steps to deploy DNSSEC. Further frustrating deployment, many of the mechanisms for conveying DNSSEC information to registrars are error-prone or present security vulnerabilities. Finally, we find that using DNSSEC with third-party DNS operators such as Cloudflare requires the domain owner to take a number of steps that 40% of domain owners do not complete. Having identified several operational challenges for full DNSSEC deployment, we make recommendations to improve adoption.
• 10:40 - 11:10 - Break
• Complexity vs. Performance: Empirical Analysis of Machine Learning as a Service  long
Yuanshun Yao and Zhujun Xiao (University of Chicago), Bolun Wang (UCSB/University of Chicago), Bimal Viswanath, Haitao Zheng, and Ben Y. Zhao (University of Chicago)
Abstract: Machine learning classifiers are basic research tools used in numerous types of network analysis and modeling. To reduce the need for domain expertise and costs of running local ML classifiers, network researchers can instead rely on centralized Machine Learning as a Service (MLaaS) platforms. In this paper, we evaluate the effectiveness of MLaaS systems ranging from fully-automated, turnkey systems to fully-customizable systems, and find that with more user control comes greater risk. Good decisions produce even higher performance, and poor decisions result in harsher performance penalties. We also find that server-side optimizations help fully-automated systems outperform default settings on competitors, but still lag far behind well-tuned MLaaS systems, which compare favorably to standalone ML libraries. Finally, we find classifier choice is the dominating factor in determining model performance, and that users can approximate the performance of an optimal classifier choice by experimenting with a small subset of random classifiers. While network researchers should approach MLaaS systems with caution, they can achieve results comparable to standalone classifiers if they have sufficient insight into key decisions like classifiers and feature selection.
• An Empirical Characterization of IFTTT: Ecosystem, Usage, and Performance  short
Xianghang Mi and Feng Qian (Indiana University), Ying Zhang (Facebook Inc.), Xiaofeng Wang (Indiana University)
Abstract: IFTTT is a popular trigger-action programming platform whose applets can automate more than 400 services of IoT devices and web applications. We conduct an empirical study of IFTTT using a combined approach of analyzing data collected for 6 months and performing controlled experiments using a custom testbed. We profile the interactions among different entities, measure how applets are used by end users, and test the performance of applet execution. Overall we observe the fast growth of the IFTTT ecosystem and its increasing usage for automating IoT-related tasks, which correspond to 52% of all services and 16% of the applet usage. We also observe several performance inefficiencies and identify their causes.
• The Web Centipede: Understanding How Web Communities Influence Each Other Through the Lens of Mainstream and Alternative News Sources  long
Savvas Zannettou (Cyprus University of Technology), Tristan Caulfield and Emiliano De Cristofaro (University College London), Nicolas Kourtellis and Ilias Leontiadis (Telefonica Research), Michael Sirivianos (Cyprus University of Technology), Gianluca Stringhini (University College London), Jeremy Blackburn (University of Alabama at Birmingham)
Abstract: As the number and the diversity of news outlets on the Web grows, so does the opportunity for "alternative" sources of information to emerge. Using large social networks like Twitter and Facebook, misleading, false, or agenda-driven information can quickly and seamlessly spread online, deceiving people or influencing their opinions. Also, the increased engagement of tightly knit communities, such as Reddit and 4chan, further compounds the problem, as their users initiate and propagate alternative information, not only within their own communities, but also to different ones as well as various social media. In fact, these platforms have become an important piece of the modern information ecosystem, which, thus far, has not been studied as a whole. In this paper, we begin to fill this gap by studying mainstream and alternative news shared on Twitter, Reddit, and 4chan. By analyzing millions of posts around several axes, we measure how mainstream and alternative news flows between these platforms. Our results indicate that alt-right communities within 4chan and Reddit can have a surprising level of influence on Twitter, providing evidence that "fringe" communities often succeed in spreading alternative news to mainstream social networks and the greater Web.
• 12:15 - 13:45 - Lunch
• Email Typosquatting  long
Janos Szurdi and Nicolas Christin (Carnegie Mellon University)
Abstract: While website domain typosquatting is highly annoying for legitimate domain operators, research has found that it relatively rarely presents a great risk to individual users. However, any application (e.g., email, ftp,...) relying on the domain name system for name resolution is equally vulnerable to domain typosquatting, and consequences may be more dire than with website typosquatting. This paper presents the first in-depth measurement study of email typosquatting. Working in concert with our IRB, we registered 76 typosquatting domain names to study a wide variety of user mistakes, while minimizing the amount of personal information exposed to us. In the span of over seven months, we received millions of emails at our registered domains. While most of these emails are spam, we infer, from our measurements, that every year, three of our domains should receive approximately 3,585 "legitimate" emails meant for somebody else. Worse, we find, by examining a small sample of all emails, that these emails may contain sensitive information (e.g., visa documents or medical records). We then project from our measurements that 1,211 typosquatting domains registered by unknown entities receive in the vicinity of 800,000 emails a year. Furthermore, we find that millions of registered typosquatting domains have MX records pointing to only a handful of mail servers. However, a second experiment in which we send "honey emails" to typosquatting domains only shows very limited evidence of attempts at credential theft (despite some emails being read), meaning that the threat, for now, appears to remain theoretical.
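The kinds of user mistakes typosquatters target can be illustrated with a small generator of classic single-character typo models (omission, adjacent transposition, duplication). This is a hypothetical helper for illustration, not the registration list or tooling used in the paper:

```python
def typo_variants(domain):
    """Generate classic single-character typo variants of a domain's
    second-level label: omission, duplication, and adjacent
    transposition. (Illustrative only.)"""
    name, dot, tld = domain.rpartition('.')
    variants = set()
    for i in range(len(name)):
        # Omission: drop one character.
        variants.add(name[:i] + name[i + 1:] + dot + tld)
        # Duplication: type one character twice.
        variants.add(name[:i] + name[i] * 2 + name[i + 1:] + dot + tld)
        # Transposition: swap two adjacent characters.
        if i < len(name) - 1:
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:] + dot + tld)
    variants.discard(domain)
    return sorted(variants)

print(typo_variants('gmail.com')[:5])
```

Real typosquatting studies also consider keyboard-adjacency substitutions and TLD swaps, which this sketch omits.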
• Fifteen Minutes of Unwanted Fame: Detecting and Characterizing Doxing  long
Peter Snyder (University of Illinois at Chicago), Periwinkle Doerfler (New York University), Chris Kanich (University of Illinois at Chicago), Damon McCoy (New York University)
Abstract: Doxing is online abuse where a malicious party attempts to harm another by releasing identifying or sensitive information. Motivations for doxing include personal, competitive, and political reasons, and web users of all ages, genders and internet experience have been targeted. Existing research on doxing is primarily qualitative. This work improves our understanding of doxing by being the first to take a quantitative approach. We do so by designing and deploying a tool which can detect dox files and measure the frequency, content, targets, and effects of doxing occurring on popular dox-posting sites. This work analyzes over 1.7 million text files posted to pastebin.com, 4chan.org and 8ch.net, sites frequently used to share doxes online, over a combined period of approximately thirteen weeks. Notable findings in this work include that approximately 0.3% of shared files are doxes, that online social networking accounts mentioned in these dox files are more likely to close than typical accounts, that justice and revenge are the most often cited motivations for doxing, and that dox files target males more frequently than females. We also find that recent anti-abuse efforts by social networks have reduced how frequently these doxing victims close or restrict their accounts in response to doxing attacks. We also propose mitigation steps, such as a service that can inform people when their accounts have been shared in a dox file, or law enforcement notification tools to inform authorities when individuals are at heightened risk of abuse.
• Ethical issues of research using datasets of illicit origin  long
Daniel R. Thomas, Sergio Pastrana, Alice Hutchings, Richard Clayton, and Alastair R. Beresford (University of Cambridge)
Abstract: We evaluate the use of data obtained by illicit means against a broad set of ethical and legal issues. Our analysis covers both the direct collection, and secondary uses of, data obtained via illicit means such as exploiting a vulnerability, or unauthorized disclosure. We extract ethical principles from existing advice and guidance and analyse how they have been applied within more than 20 recent peer reviewed papers that deal with illicitly obtained datasets. We find that existing advice and guidance does not address all of the problems that researchers have faced and explain how the papers tackle ethical issues inconsistently, and sometimes not at all. Our analysis reveals not only a lack of application of safeguards but also that legitimate ethical justifications for research are being overlooked. In many cases positive benefits, as well as potential harms, remain entirely unidentified. Few papers record explicit Research Ethics Board (REB) approval for the activity that is described and the justifications given for exemption suggest deficiencies in the REB process.
• 15:00 - 15:30 - Break
• A Look at Router Geolocation in Public and Commercial Databases  short
Manaf Gharaibeh and Anant Shah (Colorado State University), Bradley Huffaker (CAIDA / UC San Diego), Han Zhang (Colorado State University), Roya Ensafi (University of Michigan / Princeton University), Christos Papadopoulos (Colorado State University)
Abstract: Internet measurement research frequently needs to map infrastructure components, such as routers, to their physical locations. Although public and commercial geolocation services are often used for this purpose, their accuracy when applied to network infrastructure has not been previously assessed. Prior work focused on evaluating the overall accuracy of geolocation databases, which is dominated by their performance on end-user IP addresses. In this work, we evaluate the reliability of router geolocation in databases. We use a dataset of about 1.64M router interface IP addresses extracted from the CAIDA Ark dataset to examine the country- and city-level coverage and consistency of popular public and commercial geolocation databases. We also create and provide a ground-truth dataset of 16,586 router interface addresses and locations, with city-level accuracy, and we use it to evaluate the databases’ accuracy with a regional breakdown analysis. Our results show that the databases are not reliable for geolocating routers and that there is room to improve their country- and city-level accuracy. Based on our results, we present a set of recommendations to researchers concerning using the geolocation databases to geolocate routers and understanding their results.
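The cross-database consistency analysis the abstract describes can be illustrated with a toy agreement metric: given several databases' answers for one router interface, take the majority country and its share of the votes. This is an illustrative sketch, not the paper's actual methodology or metric:

```python
from collections import Counter

def country_consistency(geolocations):
    """Given {database_name: country_code} answers for one router
    interface IP, return (majority_country, agreement_fraction).
    Illustrative only; hypothetical database names."""
    counts = Counter(geolocations.values())
    country, votes = counts.most_common(1)[0]
    return country, votes / len(geolocations)

# Two of three hypothetical databases agree on 'US'.
print(country_consistency({'db_a': 'US', 'db_b': 'US', 'db_c': 'DE'}))
```

Aggregating such per-interface agreement over a large router dataset, and comparing against ground truth where available, gives the kind of country-level consistency and accuracy breakdown the paper reports.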
• Shortcuts through Colocation Facilities  short
Vasileios Kotronis, George Nomikos, Lefteris Manassakis, and Dimitris Mavrommatis (FORTH, Greece), Xenofontas Dimitropoulos (FORTH, Greece & University of Crete, Greece)
Abstract: Network overlays, running on top of the existing Internet substrate, are of perennial value to Internet end-users in the context of, e.g., real-time applications. Such overlays can employ traffic relays to yield path latencies lower than the direct paths, a phenomenon known as Triangle Inequality Violation (TIV). Past studies identify the opportunities of reducing latency using TIVs. However, they do not investigate the gains of strategically selecting relays in Colocation Facilities (Colos). In this work, we answer the following questions: (i) how Colo-hosted relays compare with other relays as well as with the direct Internet, in terms of latency (RTT) reductions; (ii) what are the best locations for placing the relays to yield these reductions. To this end, we conduct a large-scale one-month measurement of inter-domain paths between RIPE Atlas (RA) nodes as endpoints, located at eyeball networks. We employ as relays PlanetLab nodes, other RA nodes, and machines in Colos. We examine the RTTs of the overlay paths obtained via the selected relays, as well as the direct paths. We find that Colo-based relays perform the best and can achieve latency reductions against direct paths, ranging from a few to 100s of milliseconds, in 76% of the total cases; ~75% (58% of total cases) of these reductions require only 10 relays in 6 large Colos.