This is the conference schedule for IMC 2019.

  • 08:00 - 09:00 - Coffee, cookies & sweets
  • 09:00 - 09:30 - Opening Remarks
    • Moritz Müller (SIDN and University of Twente), Matthew Thomas and Duane Wessels (Verisign), Wes Hardaker (USC/ISI), Taejoong Chung (Rochester Institute of Technology), Willem Toorop (NLnet Labs), Roland van Rijswijk-Deij (University of Twente and NLnet Labs)
      Abstract: The DNS Security Extensions (DNSSEC) add authenticity and integrity to the naming system of the Internet. Resolvers that validate information in the DNS need to know the cryptographic public key used to sign the root zone of the DNS. Eight years after the introduction of this key and one year after it was scheduled originally, this key was replaced by ICANN for the first time in October 2018. ICANN considered this event, called a rollover, "an overwhelming success" and during the rollover they detected "no significant outages". In this paper, we independently follow the process of the rollover starting from the events that led to its postponement in 2017 until the removal of the old key in 2019. We collected data from multiple vantage points in the DNS ecosystem for the entire duration of the rollover process. Using this data, we study key events of the rollover. These include telemetry signals that led to the rollover being postponed, a near real-time view of the actual rollover in resolvers and a significant increase in queries to the root of the DNS once the old key was revoked. Our analysis contributes significantly to identifying the causes of challenges observed during the rollover. We show that while from an end-user perspective, the roll indeed passed without major problems, there are important lessons to be learned from events that occurred over the entire duration of the rollover. Based on these lessons, we propose improvements to the process for future rollovers.
    • Timm Böttger, Felix Cuadrado, and Gianni Antichi (Queen Mary University of London), Eder Leão Fernandes (Queen Mary, University of London), Gareth Tyson (Queen Mary University of London), Ignacio Castro (Queen Mary, University of London), Steve Uhlig (Queen Mary University of London)
      Abstract: DNS is a vital component for almost every networked application. Originally, however, it was designed as an unencrypted protocol, yielding concerns about user security. DNS-over-HTTPS (DoH) is the latest proposal to make the DNS system more secure. In this paper we study the current DNS-over-HTTPS ecosystem, especially the cost of the additional security. We start by surveying the current DoH landscape, by assessing standard compliance and supported features of public DoH servers. We then compare different transports for secure DNS, to highlight the improvements DoH makes over its predecessor DNS-over-TLS (DoT). These improvements explain in part the significantly bigger take-up of DoH in comparison to DoT. Finally, we quantify the overhead incurred by the additional layers of the DoH transport and their impact on web page load times. We find that these overheads only have limited impact on page load times, suggesting that it is possible to obtain the improved security of DoH with only marginal performance impact.
    • Chaoyi Lu and Baojun Liu (Tsinghua University), Zhou Li (UC Irvine), Shuang Hao (University of Texas at Dallas), Haixin Duan (Tsinghua University; Qi An Xin Security Research Institute), Mingming Zhang, Chunying Leng, and Ying Liu (Tsinghua University), Zaifeng Zhang (Qihoo 360), Jianping Wu (Tsinghua University)
      Abstract: DNS packets are designed to travel in unencrypted form through the Internet based on its initial standard. Recent discoveries show that real-world adversaries are actively exploiting this design vulnerability to compromise Internet users' security and privacy. To mitigate such threats, several protocols have been proposed to encrypt DNS queries between DNS clients and servers, which we jointly term DNS-over-Encryption. While some proposals have been standardized and are gaining strong support from industry, little has been done to understand their status from the view of global users. This paper performs the first end-to-end and large-scale analysis of DNS-over-Encryption. By collecting data from Internet scanning, user-end measurement and passive monitoring logs, we have gained several unique insights. In general, the service quality of DNS-over-Encryption is satisfying, in terms of accessibility and latency. For DNS clients, DNS-over-Encryption queries are less likely to be disrupted by in-path interception compared to traditional DNS, and the extra overhead is minor. However, we also discover several issues regarding how the services are operated. As an example, we find 25% of DNS-over-TLS service providers use invalid SSL certificates. Compared to traditional DNS, DNS-over-Encryption is used by far fewer users but we have witnessed a growing trend. As such, we believe the community should push broader adoption of DNS-over-Encryption and we also suggest the service providers carefully review their implementations.
  • 10:35 - 11:00 - Break
    • Louis F. DeKoven, Audrey Randall, Ariana Mirian, Gautam Akiwate, Ansel Blume, Lawrence K. Saul, Aaron Schulman, Geoffrey M. Voelker, and Stefan Savage (University of California, San Diego)
      Abstract: Security is a discipline that places significant expectations on lay users. Thus, there is a wide array of technologies and behaviors that we exhort end users to adopt and thereby reduce their security risk. However, the adoption of these "best practices", ranging from the use of antivirus products to actively keeping software updated, is not well understood, nor is their practical impact on security risk well-established. This paper explores both of these issues via a large-scale empirical measurement study covering approximately 15,000 computers over six months. We use passive monitoring to infer and characterize the prevalence of various security practices in situ as well as a range of other potentially security-relevant behaviors. We then explore the extent to which differences in key security behaviors impact real-world outcomes (i.e., that a machine shows clear evidence of having been compromised).
    • Ben Collier, Daniel R. Thomas, Richard Clayton, and Alice Hutchings (University of Cambridge)
      Abstract: Illegal booter services offer denial of service (DoS) attacks for a fee of a few tens of dollars a month. Internationally, police have implemented a range of different types of intervention aimed at those using and offering booter services, including arrests and website takedown. In order to measure the impact of these interventions we look at the usage reports that booters themselves provide and at measurements of reflected UDP DoS attacks, leveraging a five-year measurement dataset that has been statistically demonstrated to have very high coverage. We analysed time series data (using a negative binomial regression model) to show that several interventions have had a statistically significant impact on the number of attacks. We show that, while there is no consistent effect of highly-publicised court cases, takedowns of individual booters precede significant, but short-lived, reductions in recorded attack numbers. However, more wide-ranging disruptions have much longer effects. The closure of HackForums' booter market reduced attacks for 13 weeks globally (and for longer in particular countries) and the FBI's coordinated operation in December 2018, which involved both takedowns and arrests, reduced attacks by a third for at least 10 weeks and resulted in lasting change to the structure of the booter market.
    • Daniel Kopp and Matthias Wichtlhuber (DE-CIX), Ingmar Poese (BENOCS), José Jair Cardoso de Santanna (University of Twente), Oliver Hohlfeld (Brandenburg University of Technology), Christoph Dietzel (DE-CIX / MPI for Informatics)
      Abstract: Booter services continue to provide popular DDoS-as-a-service platforms and enable anyone, irrespective of their technical ability, to execute DDoS attacks with devastating impact. Since booters are a serious threat to Internet operations and can cause significant financial and reputational damage, they also draw the attention of law enforcement agencies and related counter activities. In this paper, we investigate booter-based DDoS attacks in the wild and the impact of an FBI takedown targeting 15 booter websites in December 2018 from the perspective of a major IXP and two ISPs. We study and compare attack properties of multiple booter services by launching Gbps-level attacks against our own infrastructure. To understand spatial and temporal trends of the DDoS traffic originating from booters, we scrutinize 5 months' worth of inter-domain traffic. We observe that the takedown only leads to a temporary reduction in attack traffic. Additionally, one booter was found to quickly continue operation by using a new domain for its website.
    • Sergio Pastrana (Universidad Carlos III de Madrid), Guillermo Suarez-Tangil (King's College London)
      Abstract: Illicit crypto-mining leverages resources stolen from victims to mine cryptocurrencies on behalf of criminals. While recent works have analyzed one side of this threat, i.e., web-browser cryptojacking, only white papers and commercial reports have partially covered binary-based crypto-mining malware. In this paper, we conduct the largest measurement of crypto-mining malware to date, analyzing approximately 4.4 million malware samples (1 million malicious miners), over a period of twelve years from 2007 to 2018. Our analysis pipeline applies both static and dynamic analysis to extract information from the samples, such as wallet identifiers and mining pools. Together with OSINT data, this information is used to group samples into campaigns. We then analyze publicly-available payments sent to the wallets from mining-pools as a reward for mining, and estimate profits for the different campaigns. All of this is done in a fully automated fashion. Our profit analysis reveals campaigns with multi-million earnings, associating over 4.3% of Monero with illicit mining. We analyze the infrastructure related to the different campaigns, showing that a high proportion of this ecosystem is supported by underground economies such as Pay-Per-Install services. We also uncover novel techniques that allow criminals to run successful campaigns.
  • 12:30 - 14:00 - Lunch and Poster Presentation
    • Pawel Foremski (Farsight Security, Inc. / IITiS PAN), Oliver Gasser (Technical University of Munich), Giovane Moura (SIDN Labs / TU Delft)
      Abstract: The Domain Name System (DNS) is thought of as having the simple-sounding task of resolving domains into IP addresses. With its stub resolvers, different layers of recursive resolvers, authoritative nameservers, a multitude of query types, and DNSSEC, the DNS ecosystem is actually quite complex. DNS Observatory provides a bird's-eye view and makes it possible to analyze the big picture of DNS. As a data source, DNS Observatory leverages globally distributed DNS probes acquiring a peak of 200K DNS queries per second between recursive resolvers and authoritative nameservers. For each observed query we extract traffic features, aggregate them, and track the top k DNS objects. This allows us to characterize DNS deployments and evaluate the median response delays of DNS queries, where we find that the top 10% of nameservers (which handle about half the traffic) indeed have a shorter response time than less popular nameservers. We also leverage DNS Observatory to show correlations between decreasing TTLs and increasing DNS traffic. Furthermore, the TTL data allows us to anticipate upcoming changes in the DNS infrastructure. Another aspect that we analyze in depth is the effect of the Happy Eyeballs algorithm in combination with low negative caching TTLs, which results in a share of up to 90% empty responses for some domains. Finally, we propose actionable measures to improve uncovered DNS issues and shortcomings and we offer interested researchers access to DNS Observatory.
    • Giovane C. M. Moura (SIDN Labs/TU Delft), John Heidemann (University of Southern California / Information Sciences Institute), Ricardo Schmidt (U. Passo Fundo), Wes Hardaker (USC/ISI)
      Abstract: DNS depends on extensive caching for good performance, and every DNS zone owner must set Time-to-Live (TTL) values to control their DNS caching. Today there is relatively little guidance backed by research about how to set TTLs, and operators must balance conflicting demands of caching against agility of configuration. Exactly how TTL value choices affect operational networks is quite challenging to understand for several reasons: DNS is a distributed service, DNS resolution is security-sensitive, and resolvers require multiple types of information as they traverse the DNS hierarchy. These complications mean there are multiple, frequently interacting places where TTLs can be specified. This paper provides the first careful evaluation of how these factors affect the effective cache lifetimes of DNS records, and provides recommendations for how to configure DNS TTLs based on our findings. We provide recommendations on TTL choice for different situations, and for where they must be configured. We show that longer TTLs have significant promise, reducing median latency from 183ms to 28.7ms for one country-code TLD.
    • Rami Al-Dalky and Michael Rabinovich (Case Western Reserve University), Kyle Schomp (Akamai Technologies)
      Abstract: Content delivery networks (CDNs) commonly use DNS to map end-users to the best edge servers. A recently proposed EDNS0-Client-Subnet (ECS) extension allows recursive resolvers to include end-user subnet information in DNS queries, so that authoritative nameservers, especially those belonging to CDNs, could use this information to improve user mapping. In this paper, we study the ECS behavior of ECS-enabled recursive resolvers from the perspectives of the opposite sides of a DNS interaction, the authoritative nameservers of a major CDN and a busy DNS resolution service. We find a range of erroneous (i.e., deviating from the protocol specification) and detrimental (even if compliant) behaviors that may unnecessarily erode client privacy, reduce the effectiveness of DNS caching, diminish ECS benefits, and in some cases turn ECS from facilitator into an obstacle to authoritative nameservers' ability to optimize user-to-edge-server mappings.
  • 15:15 - 15:45 - Break
    • Yi Cao, Arpit Jain, Kriti Sharma, Aruna Balasubramanian, and Anshul Gandhi (Stony Brook University)
      Abstract: This short paper presents a detailed empirical study of BBR’s performance under different real-world and emulated testbeds across a range of network operating conditions. Our empirical results help to identify network conditions under which BBR outperforms, in terms of goodput, contemporary TCP congestion control algorithms. We find that BBR is well suited for networks with shallow buffers, despite its high retransmissions, whereas existing loss-based algorithms are better suited for deep buffers. To identify the root causes of BBR’s limitations, we carefully analyze our empirical results. Our analysis reveals that, contrary to BBR’s design goal, BBR often exhibits large queue sizes. Further, the regimes where BBR performs well are often the same regimes where BBR is unfair to competing flows. Finally, we demonstrate the existence of a loss rate “cliff point” beyond which BBR’s goodput drops abruptly. Our empirical investigation identifies the likely culprits in each of these cases as specific design options in BBR’s source code.
    • Ranysha Ware (Carnegie Mellon University), Matt Mukerjee (Carnegie Mellon University / Nefeli Networks), Justine Sherry and Srinivasan Seshan (Carnegie Mellon University)
      Abstract: BBR is a new congestion control algorithm (CCA) released for the Linux kernel and widely deployed by Google. As the default CCA for YouTube (which commands 11+% of Internet traffic) BBR has rapidly become a major player in Internet congestion control. BBR’s fairness or friendliness to other connections has recently come under scrutiny as measurements from multiple research groups have shown undesirable outcomes when BBR competes with traditional CCAs. One such outcome is a fixed, 40% proportion of link capacity consumed by BBR when competing with loss-based algorithms like Cubic or Reno. In this short paper, we provide the first model capturing BBR’s behavior in competition with loss-based CCAs. Our model is coupled with practical experiments to validate its implications. The key lesson is this: a single variable – the ‘inflight cap’ – determines BBR’s bandwidth consumption in these scenarios; this cap can be configured to sustain an arbitrary fixed proportion of network capacity.
    • Philipp Richter (MIT / Akamai), Arthur Berger (Akamai / MIT)
      Abstract: Scanning of hosts on the Internet to identify vulnerable devices and services is a key component in many of today's cyberattacks. Tracking this scanning activity, in turn, provides us with an excellent signal to assess the current state-of-affairs for many vulnerabilities and their exploitation. So far, studies tracking scanning activity have relied on unsolicited traffic captured in darknets, focusing on random scans of the address space. In this work, we track scanning activity through the lens of unsolicited traffic captured at the firewalls of some 89,000 hosts of a major CDN. Our vantage point has two distinguishing features compared to darknets: (i) it is distributed across some 1,300 networks, and (ii) its servers are live, offering services and thus emitting traffic. While all servers receive a baseline level of probing caused by random and full scans of the IPv4 space, we show that some 30% of all logged scan traffic is the result of non-random scanning activity. We find that non-random scanning campaigns often target localized regions in the address space, and that their characteristics in terms of target selection strategy and scanned services differ vastly from the more widely known random scans. Our observations imply that conventional darknets can only partially illuminate scanning activity, and may severely underestimate widespread attempts to scan and exploit individual services in specific prefixes or networks. Our methods can be adapted for individual network operators to assess if they are subjected to targeted scanning activity.
    • Vivek Adarsh, Michael Nekrasov (University of California, Santa Barbara), Ellen Zegura (Georgia Institute of Technology), Elizabeth Belding (University of California, Santa Barbara)
      Abstract: Over 87% of US mobile wireless subscriptions are currently held by LTE-capable devices. However, prior work has demonstrated that connectivity may not equate to usable service. Even in well-provisioned urban networks, unusually high usage (such as during a public event or after a natural disaster) can lead to overload that makes the LTE service difficult, if not impossible, to use, even if the user is solidly within the coverage area. A typical approach to detect and quantify overload on LTE networks is to secure the cooperation of the network provider for access to internal metrics. An alternative approach is to deploy multiple mobile devices with active subscriptions to each mobile network operator (MNO). Both approaches are resource and time intensive. In this work, we propose a novel method to estimate overload in LTE networks using only passive measurements, and without requiring provider cooperation. We use this method to analyze packet-level traces for three commercial LTE service providers, T-Mobile, Verizon and AT&T, from several locations during both typical levels of usage and during public events that yield large, dense crowds. This study presents the first look at overload estimation through the analysis of unencrypted broadcast messages. We show that an upsurge in broadcast reject and cell barring messages can accurately detect an increase in network overload.
    • Stephen McQuistin (University of Glasgow), Sree Priyanka Uppu and Marcel Flores (Verizon Digital Media Services)
      Abstract: Anycast is a popular tool for deploying global, widely available systems, including DNS infrastructure and Content Delivery Networks (CDNs). The optimization of these networks often focuses on the deployment and management of anycast sites. However, such approaches fail to consider one of the primary configurations of a large anycast network: the set of networks that receive anycast announcements at each site (i.e., an announcement configuration). Altering these configurations, even without the deployment of additional sites, can have profound impacts on both anycast site selection and round-trip times. In this study, we explore the operation and optimization of anycast networks through the lens of deployments that have a large number of upstream service providers. We demonstrate that these many-provider anycast networks exhibit fundamentally different properties than few-provider networks when interacting with the Internet, having a greater number of single AS-hop paths, and reduced dependency on each provider. We further examine the impact of announcement configuration changes, demonstrating that in nearly 30% of vantage point groups, round-trip time performance can be improved by more than 25%, solely by manipulating which providers receive anycast announcements. Finally, we propose DailyCatch, an empirical measurement system for testing and validating announcement configuration changes, and demonstrate its ability to influence user-experienced performance on a global, anycast CDN.
  • 17:20 - 17:30 - Break
  • 17:30 - 19:30 - Opening Reception
  • 08:00 - 09:00 - Coffee, cookies & sweets
    • Brandon Schlinker (Facebook / USC), Italo Cunha (Universidade Federal de Minas Gerais / Columbia University), Yi-Ching Chiu (USC), Srikanth Sundaresan (Facebook), Ethan Katz-Bassett (Columbia University)
      Abstract: We examine the current state of user network performance and opportunities to improve it from the vantage point of Facebook, a global content provider. Facebook serves over 2 billion users distributed around the world using a network of PoPs and interconnections spread across 6 continents. In this paper, we execute a large-scale, 10-day measurement study of metrics at the TCP and HTTP layers for production user traffic at all of Facebook’s PoPs worldwide, collecting performance measurements for hundreds of trillions of sampled HTTP sessions. We discuss our approach to collecting and analyzing measurements, including a novel approach to characterizing user achievable goodput from the server side. We find that most user sessions have MinRTT less than 39ms and can support HD video. We investigate if it is possible to improve performance by incorporating performance information into Facebook’s routing decisions; we find that default routing by Facebook is largely optimal. To our knowledge, our measurement study is the first characterization of user performance on today’s Internet from the vantage point of a global content provider.
    • Santiago Vargas and Aruna Balasubramanian (Stony Brook University), Moritz Steiner and Utkarsh Goel (Akamai)
      Abstract: Content delivery networks serve a major fraction of the Internet traffic, and their geographically deployed infrastructure makes them a good vantage point to observe traffic access patterns. We perform a large-scale investigation to characterize Web traffic patterns observed from a major CDN infrastructure. Specifically, we discover that responses with 'application/json' content-type form a growing majority of all HTTP requests. As a result, we seek to understand what types of devices and applications are requesting JSON objects and explore opportunities to optimize CDN delivery of JSON traffic. Our study shows that mobile applications account for at least 52% of JSON traffic on the CDN and embedded devices account for another 12% of all JSON traffic. We also find that more than 55% of JSON traffic on the CDN is uncacheable, showing that a large portion of JSON traffic on the CDN is dynamic. By further looking at patterns of periodicity in requests, we find that 6.3% of JSON traffic is periodically requested and reflects the use of (partially) autonomous software systems, IoT devices, and other kinds of machine-to-machine communication. Finally, we explore dependencies in JSON traffic through the lens of ngram models and find that these models can capture patterns between subsequent requests. We can potentially leverage this to prefetch requests, improving the cache hit ratio.
    • Bahador Yeganeh, Ramakrishnan Durairajan, and Reza Rejaie (University of Oregon), Walter Willinger (NIKSUN, Inc.)
      Abstract: The growing demand for an ever-increasing number of cloud services is transforming the Internet’s interconnection or peering ecosystem in profound ways such as the emergence of "virtual private interconnections (VPIs)". However, due to the underlying technologies, these VPIs are not publicly visible and traffic traversing them remains largely hidden as it bypasses the public Internet. In particular, existing techniques for inferring Internet interconnections are unable to detect these VPIs and are also incapable of mapping them to the physical facility or geographic region where they are established. In this paper, we present a third-party measurement study aimed at revealing all the peerings between Amazon and the rest of the Internet. We describe our technique for inferring these peering links and pay special attention to inferring VPIs associated with this largest cloud provider. We group Amazon's peerings based on their key features and illustrate that each group plays a specific role in Amazon's peering ecosystem. We also present and evaluate a new method for pinning (i.e. geo-locating) each end of an inferred interconnection associated with each peering. Our study provides a first look at Amazon's peering fabric.
    • Aravindh Raman and Sagar Joglekar (King's College London), Emiliano De Cristofaro (University College London), Nishanth Sastry (King's College London), Gareth Tyson (Queen Mary University of London)
      Abstract: The Decentralised Web (DW) has recently seen a renewed momentum, with a number of DW platforms like Mastodon, PeerTube, and Hubzilla gaining increasing traction. These offer alternatives to traditional social networks like Twitter, YouTube, and Facebook, by enabling the operation of web infrastructure and services without centralised ownership or control. Although their services differ greatly, modern DW platforms mostly rely on two key innovations: first, their open source software allows anybody to set up independent servers ("instances") that people can sign up to and use within a local community; and second, they build on top of federation protocols so that instances can mesh together, in a peer-to-peer fashion, to offer a globally integrated platform. In this paper, we present a measurement-driven exploration of these two innovations, using a popular DW microblogging platform (Mastodon) as a case study. We focus on identifying key challenges that might disrupt continuing efforts to decentralise the web, and empirically highlight a number of properties that are creating natural pressures towards re-centralisation. Finally, our measurements shed light on the behaviour of both administrators (i.e., people setting up instances) and regular users who sign up to the platforms, also discussing a few techniques that may address some of the issues observed.
  • 10:30 - 11:00 - Break
    • Mshabab Alrizah, Sencun Zhu, and Xinyu Xing (The Pennsylvania State University), Gang Wang (University of Illinois at Urbana-Champaign)
      Abstract: Ad-blocking systems such as Adblock Plus rely on crowdsourcing to build and maintain filter lists, which are the basis for determining which ads to block on web pages. In this work, we seek to advance our understanding of the ad-blocking community as well as the errors and pitfalls of the crowdsourcing process. To do so, we collected and analyzed a longitudinal dataset that covered the dynamic changes of the popular filter list EasyList for nine years and the error reports submitted by the crowd in the same period. Our study yielded a number of significant findings regarding the characteristics of FP and FN errors and their causes. For instance, we found that false positive errors (i.e., incorrectly blocking legitimate content) still took a long time before they could be discovered (50% of them took more than a month) despite the community effort. Both EasyList editors and website owners were to blame for the false positives. In addition, we found that a great number of false negative errors (i.e., failing to block real advertisements) were either incorrectly reported or simply ignored by the editors. Furthermore, we analyzed evasion attacks from ad publishers against ad-blockers. In total, our analysis covers 15 types of attack methods including 8 methods that have not been studied by the research community. We show how ad publishers have utilized them to circumvent ad-blockers and empirically measure the reactions of ad blockers. Through in-depth analysis, our findings are expected to help shed light on any future work to evolve ad blocking and optimize crowdsourcing mechanisms.
    • Pelayo Vallina, Álvaro Feal, and Julien Gamba (IMDEA Networks Institute/Universidad Carlos III de Madrid), Narseo Vallina-Rodriguez (IMDEA Networks Institute/ICSI), Antonio Fernandez-Anta (IMDEA Networks Institute)
      Abstract: Modern privacy regulations, including the General Data Protection Regulation (GDPR) in the European Union, aim to control user tracking activities in websites and mobile applications. These privacy rules typically contain specific provisions and strict requirements for websites that provide sensitive material to end users such as sexual, religious, and health services. However, little is known about the privacy risks that users face when visiting such websites, and about their regulatory compliance. In this paper, we present the first comprehensive and large-scale analysis of 6,843 pornographic websites. We provide an exhaustive behavioral analysis of the use of tracking methods by these websites, and their lack of regulatory compliance, including the absence of age-verification mechanisms and methods to obtain informed user consent. The results indicate that, as in the regular web, tracking is prevalent across pornographic sites: 72% of the websites use third-party cookies and 5% leverage advanced user fingerprinting technologies. Yet, our analysis reveals a third-party tracking ecosystem semi-decoupled from the regular web in which various analytics and advertising services track users across, and outside, pornographic websites. We complete the paper with a regulatory compliance analysis in the context of the EU GDPR, and newer legal requirements to implement verifiable access control mechanisms (e.g., UK's Digital Economy Act). We find that only 16% of the analyzed websites have an accessible privacy policy and only 4% provide a cookie consent banner. The use of verifiable access control mechanisms is limited to prominent pornographic websites.
    • Sai Teja Peddinti, Igor Bilogrevic, Nina Taft, Martin Pelikan, Úlfar Erlingsson, Pauline Anthonysamy, and Giles Hogben (Google)
      Abstract: Users of mobile apps sometimes express discomfort or concerns with what they see as unnecessary or intrusive permission requests by certain apps. However, encouraging mobile app developers to request fewer permissions is challenging because there are many reasons why permissions are requested; furthermore, prior work has shown it is hard to disambiguate the purpose of a particular permission with high certainty. In this work we describe a novel, algorithmic mechanism intended to discourage mobile-app developers from asking for unnecessary permissions. Developers are incentivized by an automated alert, or "nudge", shown in the Google Play Console when their apps ask for permissions that are requested by very few functionally-similar apps, in other words, by their competition. Empirically, this incentive is effective, with significant developer response since its deployment. Permissions have been redacted by 59% of apps that were warned, and this attenuation has occurred broadly across both app categories and app popularity levels. Importantly, billions of users' app installs from Google Play have benefited from these redactions.
    • Jingjing Ren, Daniel J. Dubois, and David Choffnes (Northeastern University), Anna Maria Mandalari, Roman Kolcun, and Hamed Haddadi (Imperial College London)
      Abstract: Internet of Things (IoT) devices are increasingly found in everyday homes, providing useful functionality for devices such as TVs, smart speakers, and video doorbells. Along with their benefits come potential privacy risks, since these devices can communicate information about their users to other parties over the Internet. However, understanding these privacy risks in depth and at scale is a difficult challenge, resulting in prior work that only scratches the surface. In this work, we conduct a multidimensional analysis of privacy exposure from 81 devices located in labs in the US and UK. Through a total of 34,586 rigorous automated and manual controlled experiments, we characterize privacy exposure in terms of destinations of Internet traffic, whether the contents of communication are protected by encryption, which IoT-device interactions each destination learns about, and whether there are unexpected exposures of sensitive information (e.g., video surreptitiously transmitted by a recording device). Further, we determine whether there are regional differences between these properties, as the privacy regulations in the US (enforced by the FTC) and UK (GDPR) can have substantial impact on data collection. Last, we compare our controlled experiments with data gathered from an in situ user study comprising 36 participants.
  • 12:30 - 14:00 - Lunch, Poster Concourse, and N2Women meeting
    • Michalis Pachilakis (FORTH), Panagiotis Papadopoulos (Brave Software), Evangelos P. Markatos (FORTH), Nicolas Kourtellis (Telefonica Research)
      Abstract: In recent years, Header Bidding (HB) has gained popularity among web publishers, challenging the status quo in the ad ecosystem. Contrary to the traditional waterfall standard, HB aims to give back to publishers control of their ad inventory, increase transparency, fairness and competition among advertisers, resulting in higher ad-slot prices. Although promising, little is known about how this ad protocol works: What are HB’s possible implementations, who are the major players, and what is its network and UX overhead? To address these questions, we design and implement HBDetector: a novel methodology to detect HB auctions on a website in real time. By crawling 35,000 top Alexa websites, we collect and analyze a dataset of 800k auctions. We find that: (i) 14.28% of top websites utilize HB. (ii) Publishers prefer to collaborate with a few Demand Partners who also dominate the waterfall market. (iii) HB latency can be significantly higher (up to 3× in median case) than waterfall.
    • Muhammad Ahmad Bashir, Sajjad Arshad, Engin Kirda, William Robertson, and Christo Wilson (Northeastern University)
      Abstract: Programmatic advertising provides digital ad buyers with the convenience of purchasing ad impressions through Real Time Bidding (RTB) auctions. However, programmatic advertising has also given rise to a novel form of ad fraud known as domain spoofing, in which attackers sell counterfeit impressions that claim to be from high-value publishers. To mitigate domain spoofing, the Interactive Advertising Bureau (IAB) Tech Lab introduced the ads.txt standard in May 2017 to help ad buyers verify authorized digital ad sellers, as well as to promote overall transparency in programmatic advertising. In this work, we present a 15-month longitudinal, observational study of the ads.txt standard. We do this to understand (1) if it is helping ad buyers to combat domain spoofing and (2) whether the transparency offered by the standard can provide useful data to researchers and privacy advocates. With respect to halting domain spoofing, we observe that over 60% of Alexa Top-100K publishers that run RTB ads have adopted ads.txt, and that ad exchanges and advertisers appear to be honoring the standard. With respect to transparency, the widespread adoption of ads.txt allows us to explicitly identify over 1,000 domains belonging to ad exchanges, without having to rely on crowdsourcing or heuristic methods. However, we also find that ads.txt is still a long way from reaching its full potential. Many publishers have yet to adopt the standard, and we observe major ad exchanges purchasing unauthorized impressions that violate the standard. This opens the door to domain spoofing attacks. Further, ads.txt data often include errors that must be cleaned and mitigated before the data is practically useful.
    • Phani Vadrevu (University of New Orleans), Roberto Perdisci (University of Georgia)
      Abstract: Malicious ads often use social engineering (SE) tactics to coax users into downloading unwanted software, purchasing fake products or services, or giving up valuable personal information. These ads are often served by low-tier ad networks that may not have the technical means (or simply the will) to patrol the ad content they serve to curtail abuse. In this paper, we propose a system for large-scale automatic discovery and tracking of SE Attack Campaigns delivered via Malicious Advertisements (SEACMA). Our system aims to be generic, allowing us to study the SEACMA ad distribution problem without being biased towards specific categories of ad-publishing websites or SE attacks. Starting with a seed of low-tier ad networks, we measure which of these networks are the most likely to distribute malicious ads and propose a mechanism to discover new ad networks that are also leveraged to support the distribution of SEACMA campaigns. The results of our study aim to be useful in a number of ways. For instance, we show that SEACMA ads use a number of tactics to successfully evade URL blacklists and ad blockers. By tracking SEACMA campaigns, our system provides a mechanism to more proactively detect and block such evasive ads. Therefore, our results provide valuable information that could be used to improve defense systems against social engineering attacks and malicious ads in general.
  • 15:15 - 15:45 - Break
    • Ran Ben Basat (Harvard University), Gil Einziger (Ben Gurion University), Junzhi Gong (Harvard University), Jalil Moraney and Danny Raz (Technion)
      Abstract: Network measurement is an essential building block for a variety of network applications such as traffic engineering, quality of service, load-balancing and intrusion detection. Maintaining a per-flow state is often impractical due to the large number of flows, and thus modern systems use complex data structures that are updated with each incoming packet. Thus, designing measurement applications that operate at line speed is a major challenge in this domain. In this work, we address this challenge by providing a unified mechanism that improves the update time of a variety of network algorithms. We do so by identifying, studying and optimizing a common algorithmic pattern that we call q-MAX. The goal is to maintain the largest q values in a stream of packets. We formally analyze the problem and introduce interval and sliding window algorithms that have a worst-case constant update time. We show that our algorithms perform up to 20× faster than library algorithms, and using these new algorithms for some relevant measurement applications yields a throughput improvement of up to 12× on real network traces. Finally, we implemented the scheme within Open vSwitch, a state of the art virtual switch, and show that q-MAX based monitoring can be done in line-speed while current monitoring techniques are significantly slower.
    • Matthew Luckie (University of Waikato), Bradley Huffaker and kc claffy (UC San Diego)
      Abstract: We present the design, implementation, evaluation, and validation of a system that automatically learns to extract router names (router identifiers) from hostnames stored by network operators in different DNS zones, which we represent by regular expressions (regexes). Our supervised-learning approach evaluates automatically generated candidate regexes against sets of hostnames for IP addresses that other alias resolution techniques previously inferred to identify interfaces on the same router. Conceptually, if three conditions hold: (1) a regex extracts the same value from a set of hostnames associated with IP addresses on the same router; (2) the value is unique to that router; and (3) the regex extracts names for multiple routers in the suffix, then we conclude the regex accurately represents the naming convention for the suffix. We train our system using existing router aliases inferred from active probing to learn regexes for 2552 different suffixes. We then demonstrate the utility of this system by using the resulting regexes to find 104% additional aliases for these suffixes. Regexes inferred in IPv4 perfectly predict aliases for 86% of suffixes with IPv6 aliases, i.e., IPv4 and IPv6 addresses representing the same underlying router, and find 8.6 times more routers in IPv6 than found by prior techniques.
    • Johannes Naab, Patrick Sattler, Jonas Jelten, Oliver Gasser, and Georg Carle (Technical University of Munich (TUM))
      Abstract: Domain-based top lists such as the Alexa Top 1M portray the popularity of web domains. Even though their shortcomings (e.g., instability, no aggregation, lack of weights) have been pointed out, domain-based top lists still are an important element of Internet measurement studies. In this paper we present the concept of prefix top lists, which provide insights from the importance of addresses of domain-based top lists, while ameliorating certain of their shortcomings. With prefix top lists we aggregate domain-based top lists into network prefixes and apply a Zipf distribution to provide weights to each prefix. We find that different domain-based top lists provide differentiated views on Internet prefixes. In addition, we observe very small weight changes over time. We leverage prefix top lists to conduct an evaluation of the DNS to classify the deployment quality of domains. We show that popular domains adhere to name server recommendations for IPv4, but IPv6 compliance is still lacking. The Zipf weight aggregation allows us to create a single ranking for the providers of highly popular domains and providers used by many low ranked domains. Finally, we provide these enhanced and more stable prefix top lists to fellow researchers, who can use them to obtain more representative measurement results.
    • Mah-Rukh Fida, Evrim Acar Ataman, and Ahmed Elmukashfi (Simula Metropolitan CDE)
      Abstract: Understanding and characterizing the reliability of a mobile broadband network is a challenging task due to the presence of a multitude of root causes that operate at different temporal and spatial scales. This, in turn, limits the use of classical statistical methods for characterizing the mobile network's reliability. We propose leveraging tensor factorization, a well-established data mining method, to address this challenge. We represent a year-long time series of outages from two mobile operators as multi-way arrays, and demonstrate how tensor factorization helps extract the outage patterns at various time-scales, making it easy to locate possible root causes. Unlike traditional methods of time series analysis, tensor factorization provides a compact and interpretable picture of outages.
  • 17:05 - 17:30 - Transfer to social event
  • 17:30 - 19:00 - Social event
  • 19:00 - 19:30 - Transfer to dinner location
  • 19:30 - 22:00 - Conference dinner
  • 08:00 - 09:30 - Coffee, cookies & sweets (Cloakroom open 08:30)
    • Yi Cao, Javad Nejati, Aruna Balasubramanian, and Anshul Gandhi (Stony Brook University)
      Abstract: Given the growing significance of network performance, it is crucial to examine how to make the most of available network options and protocols. We propose ECON, a model that predicts performance of applications under different protocols and network conditions to scalably make better network choices. ECON is built on an analytical framework to predict TCP performance, and uses the TCP model as a building block for predicting application performance. ECON infers a relationship between loss and congestion using empirical data that drives an online model to predict TCP performance. ECON then builds on the TCP model to predict latency and HTTP performance. Across four wired and one wireless network, our model outperforms seven alternative TCP models. We demonstrate how ECON (i) can be used by a Web server application to choose between HTTP/1.1 and HTTP/2 for a given Web page and network condition, and (ii) can be used by a video application to choose the optimal bitrate that maximizes video quality without rebuffering.
    • Abstract: The Transport Layer Security (TLS) protocol has evolved in response to different attacks and is increasingly relied on to secure Internet communications. Web browsers have led the adoption of newer and more secure cryptographic algorithms and protocol versions, and thus improved the security of the TLS ecosystem. Other application categories, however, are increasingly using TLS, but too often are relying on obsolete and insecure protocol options, as we found through a study of applications that use TLS at global enterprises. To understand in detail what applications are using TLS, and how they are using it, we developed a novel system for obtaining process information from end hosts and fusing it with network data to produce a TLS fingerprint knowledge base. This data has a rich set of context for each fingerprint, is representative of enterprise TLS deployments, and is automatically updated from ongoing data collection. Our dataset is based on 96 million endpoint-labeled and 2.4 billion unlabeled TLS sessions obtained from enterprise edge networks in five countries, plus millions of sessions from a malware analysis sandbox. We actively maintain an open source dataset that, at 2,200+ fingerprints and counting, is both the largest and most informative ever published. In this paper, we use the knowledge base to identify trends in enterprise TLS applications beyond the browser: application categories such as storage, communication, system, and email. We study fingerprint prevalence, longevity, and succession across application versions, and identified a rise in the use of TLS by non-browser applications and a corresponding decline in the fraction of sessions using version 1.3. Finally, we highlight the shortcomings of naïvely applying TLS fingerprinting to detect malware, and we present recent trends in malware's use of TLS such as the adoption of cipher suite randomization.
    • Jordan Jueckstock and Alexandros Kapravelos (North Carolina State University)
      Abstract: Modern web security and privacy research depends on accurate measurement of an often evasive and hostile web. No longer just a network of static, hyperlinked documents, the modern web is alive with JavaScript (JS) loaded from third parties of unknown trustworthiness. Dynamic analysis of potentially hostile JS currently presents a cruel dilemma: use heavy-weight in-browser solutions that prove impossible to maintain, or use lightweight inline JS solutions that are detectable by evasive JS and which cannot match the scope of coverage provided by in-browser systems. We present VisibleV8, a dynamic analysis system hosted inside V8, the JS engine of the Chrome browser, that logs native function or property accesses during any JS execution. At less than 600 lines (only 67 of which modify V8’s existing behavior), our patches are lightweight and have been maintained from Chrome versions 63 through 72 without difficulty. VV8 consistently outperforms equivalent inline instrumentation, and it intercepts accesses impossible to instrument inline. This comprehensive coverage allows us to isolate and identify 46 JavaScript namespace artifacts used by JS code in the wild to detect automated browsing platforms and to discover that 29% of the Alexa top 50k sites load content which actively probes these artifacts.
  • 10:45 - 11:15 - Break
    • Taejoong Chung (Rochester Institute of Technology), Emile Aben (RIPE NCC), Tim Bruijnzeels (NLNetLabs), Balakrishnan Chandrasekaran (MPI), David Choffnes (Northeastern University), Dave Levin (University of Maryland, College Park), Bruce M. Maggs (Duke University and Akamai), Alan Mislove (Northeastern University), Roland van Rijswijk-Deij (University of Twente), John P. Rula (Akamai), Nick Sullivan (Cloudflare Inc.)
      Abstract: Despite its critical role in Internet connectivity, BGP remains highly vulnerable to attacks such as prefix hijacking, where an Autonomous System (AS) announces routes for IP space it does not control. To address this issue, the Resource Public Key Infrastructure (RPKI) was developed starting in 2008, resulting in deployment in 2011. This paper performs the first comprehensive, longitudinal study of the deployment and quality of RPKI. We use a unique dataset containing all RPKI Route Origin Authorizations (ROAs) from the moment RPKI was first deployed, more than 8 years ago. We combine this dataset with BGP announcements from more than 3,300 BGP collectors worldwide. Our analysis shows that after a gradual start, RPKI has seen a rapid increase in adoption over the past two years. We also show that although misconfigurations were rampant when RPKI was first deployed (causing many announcements to appear as RPKI invalid) they are quite rare today. We develop a taxonomy of invalid RPKI announcements, then quantify their prevalence. We further identify suspicious announcements indicative of prefix hijacking and present case studies of likely hijacks. Overall, we conclude that while misconfigurations do occur, RPKI is “ready for the big screen” and can be used to increase routing security by dropping invalid announcements.
    • Cecilia Testart and Philipp Richter (MIT), Alistair King (CAIDA, UC San Diego), Alberto Dainotti (CAIDA, UC San Diego), David Clark (MIT CSAIL)
      Abstract: BGP hijacks remain an acute problem in today's Internet, with widespread consequences. While hijack detection systems are readily available, they typically rely on a priori prefix-ownership information and are reactive in nature. In this work, we take on a new perspective on BGP hijacking activity: we introduce and track the long-term network behavior of serial hijackers, networks that repeatedly hijack address blocks for malicious purposes, often over the course of many months or even years. Based on a ground-truth dataset that we construct by extracting information from operator mailing lists, we illuminate the dominant network characteristics of serial hijackers, and how they differ from legitimate networks. We then distill features that can capture these behavioral differences and train a machine learning model to automatically identify Autonomous Systems (ASes) that exhibit characteristics similar to serial hijackers. Our classifier identifies some ~1,000 potentially misbehaving ASes in the global IPv4 routing table. We analyze and categorize these networks, finding a wide range of indicators of malicious activity and misconfiguration, as well as benign cases of hijacking activity. Our work presents a solid first step towards identifying and understanding this important category of networks, which can aid network operators in taking proactive measures to defend themselves against prefix hijacking and serve as input for current and future detection systems.
    • Marcin Nawrocki (Freie Universität Berlin), Jeremias Blendin (DE-CIX), Christoph Dietzel (DE-CIX / MPI for Informatics), Thomas C. Schmidt (HAW Hamburg), Matthias Wählisch (Freie Universität Berlin)
      Abstract: Large Distributed Denial-of-Service (DDoS) attacks pose a major threat not only to end systems but also to the Internet infrastructure as a whole. Remote Triggered Black Hole filtering (RTBH) has been established as a tool to mitigate inter-domain DDoS attacks by discarding unwanted traffic early in the network, e.g., at Internet eXchange Points (IXPs). As of today, little is known about the kind and effectiveness of its use, and about the need for more fine-grained filtering. In this paper, we present the first in-depth statistical analysis of all RTBH events at a large European IXP by correlating measurements of the data and the control plane for a period of 104 days. We identify a surprising practise that significantly deviates from the expected mitigation use patterns. First, we show that only one third of all 34k visible RTBH events correlate with indicators of DDoS attacks. Second, we witness over 2000 blackhole events announced for prefixes not of servers but of clients situated in DSL networks. Third, we find that blackholing on average causes dropping of only 50% of the unwanted traffic and is hence a much less reliable tool for mitigating DDoS attacks than expected. Our analysis also gives rise to first estimates of the collateral damage caused by RTBH-based DDoS mitigation.
  • 12:30 - 14:00 - Lunch and Poster Concourse
    • Peng Peng and Limin Yang (Virginia Tech), Linhai Song (Pennsylvania State University), Gang Wang (Virginia Tech)
      Abstract: Online scan engines such as VirusTotal are heavily used by researchers to label malicious URLs and files. Unfortunately, it is not well understood how the labels are generated and how reliable the scanning results are. In this paper, we focus on VirusTotal and its 68 third-party vendors to examine their labeling process on phishing URLs. We perform a series of measurements by setting up our own phishing websites (mimicking PayPal and IRS, 62 sites in total) and submitting the URLs for scanning. By analyzing the incoming network traffic and the dynamic label changes at VirusTotal, we reveal new insights into how VirusTotal works and the quality of their labels. Among other things, we show that vendors have trouble flagging all phishing sites, and even the best vendors missed 30% of our phishing sites. In addition, the scanning results are not immediately updated to VirusTotal after the scanning, and there are inconsistent results between VirusTotal scan and some vendors' own scan engines. Our results reveal the need for developing more rigorous methodologies to assess and make use of the labels obtained from VirusTotal.
    • Sergio Pastrana (Universidad Carlos III de Madrid), Alice Hutchings and Daniel R. Thomas (University of Cambridge), Juan Tapiador (Universidad Carlos III de Madrid)
      Abstract: eWhoring is the term used by offenders to refer to a type of online fraud in which cybersexual encounters are simulated for financial gain. Perpetrators use social engineering techniques to impersonate young women in online communities, e.g., chat or social networking sites. They engage potential customers in conversation with the aim of selling misleading sexual material (mostly photographs and interactive video shows) illicitly compiled from third-party sites. eWhoring is a popular topic in underground communities, with forums acting as a gateway into offending. Users not only share knowledge and tutorials, but also trade in goods and services, such as packs of images and videos. In this paper, we present a processing pipeline to quantitatively analyse various aspects of eWhoring. Our pipeline integrates multiple tools to crawl, annotate, and classify material in a semi-automatic way. It builds in precautions to safeguard against significant ethical issues, such as avoiding the researchers' exposure to pornographic material, and legal concerns, which were justified as some of the images were classified as child exploitation material. We use it to perform a longitudinal measurement of eWhoring activities in 10 specialised underground forums from 2008 to 2019. Our study focuses on three of the main eWhoring components: (i) the acquisition and provenance of images; (ii) the financial profits and monetisation techniques; and (iii) a social network analysis of the offenders, including their relationships, interests, and pathways before and after engaging in this fraudulent activity. We provide recommendations, including potential intervention approaches.
    • Hiroaki Suzuki (Waseda University), Daiki Chiba (NTT Secure Platform Laboratories), Yoshiro Yoneya (JPRS), Tatsuya Mori (Waseda University / RIKEN AIP / NICT), Shigeki Goto (Waseda University)
      Abstract: The internationalized domain name (IDN) is a mechanism that enables us to use Unicode characters in domain names. The set of Unicode characters contains several pairs of characters that are visually identical with each other; e.g., the Latin character 'a' (U+0061) and Cyrillic character 'а' (U+0430). Visually identical characters such as these are generally known as homoglyphs. IDN homograph attacks, which are widely known, abuse Unicode homoglyphs to create lookalike URLs. Although the threat posed by IDN homograph attacks is not new, the recent rise of IDN adoption in both domain name registries and web browsers has resulted in the threat of these attacks becoming increasingly widespread, leading to large-scale phishing attacks such as those targeting cryptocurrency exchange companies. In this work, we developed a framework named "ShamFinder," which is an automated scheme to detect IDN homographs. Our key contribution is the automatic construction of a homoglyph database, which can be used for direct countermeasures against the attack and to inform users about the context of an IDN homograph. Using the ShamFinder framework, we perform a large-scale measurement study that aims to understand the IDN homographs that exist in the wild. On the basis of our approach, we provide insights into an effective countermeasure against the threats caused by the IDN homograph attack.
  • 15:05 - 15:35 - Closing Remarks
  • 15:35 - 16:05 - Closing coffee
  • 16:05 - 17:00 - Cloakroom service (closes at 17:00 strict)