IMC '20: Proceedings of the ACM Internet Measurement Conference
SESSION: COVID-19 at IMC
Due to the COVID-19 pandemic, many governments imposed lock-downs that forced hundreds of millions of citizens to stay at home. The implementation of confinement measures increased Internet traffic demands of residential users, in particular, for remote working, entertainment, commerce, and education, which, as a result, caused traffic shifts in the Internet core.
In this paper, using data from a diverse set of vantage points (one ISP, three IXPs, and one metropolitan educational network), we examine the effect of these lockdowns on traffic shifts. We find that the traffic volume increased by 15-20% almost within a week---while overall still modest, this constitutes a large increase within this short time period. However, despite this surge, we observe that the Internet infrastructure is able to handle the new volume, as most traffic shifts occur outside of traditional peak hours. When looking directly at the traffic sources, it turns out that, while hypergiants still contribute a significant fraction of traffic, we see (1) a higher increase in traffic of non-hypergiants, and (2) traffic increases in applications that people use when at home, such as Web conferencing, VPN, and gaming. While many networks see increased traffic demands, in particular, those providing services to residential users, academic networks experience major overall decreases. Yet, in these networks, we can observe substantial increases when considering applications associated with remote working and lecturing.
During early 2020, the SARS-CoV-2 virus rapidly spread worldwide, forcing many governments to impose strict lock-down measures to tackle the pandemic. This significantly changed people's mobility and habits, subsequently impacting how they use telecommunication networks. In this paper, we investigate the effects of the COVID-19 emergency on a UK Mobile Network Operator (MNO). We quantify the changes in users' mobility and investigate how this impacted the cellular network usage and performance. Our analysis spans from the entire country to specific regions and geodemographic area clusters. We also provide a detailed analysis for London. Our findings bring insights at different geotemporal granularities into the status of the cellular network, from the decrease in data traffic volume and the lower load on the radio network to a counterposed surge in conversational voice traffic volume.
The Covid-19 pandemic has led to unprecedented changes in the way people interact with each other, which as a consequence has increased pressure on the Internet. In this paper we provide a perspective on the scale of Internet traffic growth and how well the Internet coped with the increased demand, as seen from Facebook's edge network.
We use this infrastructure, serving multiple large social networks and their related family of apps, as vantage points to analyze how traffic and product properties changed during the early stages of the Covid-19 pandemic. We show that there have been changes in traffic demand, user behavior, and user experience. We also show that different regions of the world saw different magnitudes of impact, with predominantly less developed regions exhibiting larger performance degradations.
Concern has been mounting about Internet centralization over the last few years -- the consolidation of traffic, users, and infrastructure into the hands of a few market players. We measure DNS and computing centralization by analyzing DNS traffic collected at a DNS root server and two country-code top-level domains (ccTLDs) -- one in Europe and the other in Oceania -- and show evidence of concentration. More than 30% of all queries to both ccTLDs are sent from 5 large cloud providers. We compare the cloud providers' resolver infrastructure and highlight a discrepancy in behavior: some cloud providers heavily employ IPv6, DNSSEC, and DNS over TCP, while others simply use unsecured DNS over UDP over IPv4. We show one positive side to centralization: once a cloud provider deploys a security feature -- such as QNAME minimization -- it quickly benefits a large number of users.
This paper presents and evaluates Trufflehunter, a DNS cache snooping tool for estimating the prevalence of rare and sensitive Internet applications. Unlike previous efforts that have focused on small, misconfigured open DNS resolvers, Trufflehunter models the complex behavior of large multi-layer distributed caching infrastructures (e.g., Google Public DNS). In particular, using controlled experiments, we have inferred the caching strategies of the four most popular public DNS resolvers (Google Public DNS, Cloudflare Quad1, OpenDNS and Quad9). The large footprint of such resolvers presents an opportunity to observe rare domain usage, while preserving the privacy of the users accessing them. Using a controlled testbed, we evaluate how accurately Trufflehunter can estimate domain name usage across the U.S. Applying this technique in the wild, we provide a lower-bound estimate of the popularity of several rare and sensitive applications (most notably smartphone stalkerware) which are otherwise challenging to survey.
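The estimation idea at the heart of DNS cache snooping can be illustrated with a small sketch (our own simplification, not Trufflehunter's actual model of multi-layer caches): each cached copy of a record counts its TTL down independently, so the instant at which an observed answer will expire (observation time plus remaining TTL) identifies a distinct cache entry.

```python
def estimate_cache_entries(observations, authoritative_ttl, slack=1):
    """Lower-bound the number of distinct cache entries behind a
    resolver frontend.

    observations: list of (unix_time, remaining_ttl) pairs collected
    by repeatedly querying the same name at the same resolver."""
    expiries = []
    for t, ttl in observations:
        if ttl > authoritative_ttl:  # bogus reply, ignore
            continue
        exp = t + ttl
        # Two observations of the same cache entry yield (almost) the
        # same expiry instant; merge anything within `slack` seconds.
        if not any(abs(exp - e) <= slack for e in expiries):
            expiries.append(exp)
    return len(expiries)

obs = [(1000, 290), (1001, 289),  # same entry (expires ~1290)
       (1002, 250),               # a second, older entry
       (1003, 287)]               # the first entry again
print(estimate_cache_entries(obs, authoritative_ttl=300))  # -> 2
```

Because distinct frontends may answer successive queries, the count is only a lower bound on how many caches hold the record.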
Networks not employing destination-side source address validation (DSAV) expose themselves to a class of pernicious attacks which could be easily prevented by filtering inbound traffic purporting to originate from within the network. In this work, we survey the pervasiveness of networks vulnerable to infiltration using spoofed addresses internal to the network. We issue recursive Domain Name System (DNS) queries to a large set of known DNS servers worldwide, using various spoofed-source addresses. We classify roughly half of the 62,000 networks (autonomous systems) we tested as vulnerable to infiltration due to lack of DSAV. As an illustration of the dangers these networks expose themselves to, we demonstrate the ability to fingerprint the operating systems of internal DNS servers. Additionally, we identify nearly 4,000 DNS server instances vulnerable to cache poisoning attacks due to insufficient---and often non-existent---source port randomization, a vulnerability widely publicized 12 years ago.
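The source-port weakness mentioned above can be made concrete with a toy classifier (thresholds and names are ours, purely illustrative): a resolver whose outgoing queries use a fixed or sequentially incrementing source port is far easier to cache-poison than one drawing ports from the full ephemeral range.

```python
def port_randomization(ports):
    """Classify a sequence of observed DNS query source ports as
    'fixed', 'sequential', 'randomized', or 'weak'."""
    if len(set(ports)) == 1:
        return "fixed"
    deltas = {b - a for a, b in zip(ports, ports[1:])}
    if deltas <= {1}:  # strictly incrementing by one
        return "sequential"
    # Crude heuristic: randomized if the observed ports are distinct
    # and spread over a large slice of the 1024-65535 ephemeral range.
    if max(ports) - min(ports) > 10000 and len(set(ports)) == len(ports):
        return "randomized"
    return "weak"

print(port_randomization([5353] * 6))                  # -> fixed
print(port_randomization([40000, 40001, 40002]))       # -> sequential
print(port_randomization([1201, 60411, 33007, 9120]))  # -> randomized
```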
Phishing is one of the most common cyberattacks these days. Attackers constantly look for new techniques to make their campaigns more lucrative by extending the lifespan of phishing pages. To achieve this goal, they leverage different anti-analysis (i.e., evasion) techniques to conceal the malicious content from anti-phishing bots and only reveal the payload to potential victims. In this paper, we study the resilience of anti-phishing entities to three advanced anti-analysis techniques based on human verification: Google reCAPTCHA, alert box, and session-based evasion. We have designed a framework for performing our testing experiments, deployed 105 phishing websites, and equipped each of them with one of the three evasion techniques. In the experiments, we report phishing URLs to major server-side anti-phishing entities (e.g., Google Safe Browsing, NetCraft, APWG) and monitor their occurrence in the blacklists. Our results show that Google Safe Browsing was the only engine that detected all the reported URLs protected by alert boxes. However, none of the anti-phishing engines could detect phishing URLs armed with Google reCAPTCHA, making it so far the most effective evasion technique available to malicious actors for protecting phishing content. Overall, the major server-side anti-phishing bots detected only 8 out of our 105 phishing websites protected by human verification systems. As a mitigation plan, we intend to disclose our findings to the impacted anti-phishing entities before phishers exploit human verification techniques on a massive scale.
Consumer Internet of Things (IoT) devices are extremely popular, providing users with rich and diverse functionalities, from voice assistants to home appliances. These functionalities often come with significant privacy and security risks, with notable recent large-scale coordinated global attacks disrupting large service providers. Thus, an important first step to address these risks is to know what IoT devices are where in a network. While some limited solutions exist, a key question is whether device discovery can be done by Internet service providers that only see sampled flow statistics. In particular, it is challenging for an ISP to efficiently and effectively track and trace activity from IoT devices deployed by its millions of subscribers---all with sampled network data.
In this paper, we develop and evaluate a scalable methodology to accurately detect and monitor IoT devices at subscriber lines with limited, highly sampled data in-the-wild. Our findings indicate that millions of IoT devices are detectable and identifiable within hours, both at a major ISP as well as an IXP, using passive, sparsely sampled network flow headers. Our methodology is able to detect devices from more than 77% of the studied IoT manufacturers, including popular devices such as smart speakers. While our methodology is effective for providing network analytics, it also highlights significant privacy consequences.
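The core matching idea of detecting IoT devices from sparsely sampled flow headers can be sketched as follows (a minimal simplification of such a methodology, not the paper's actual system; the signatures below are invented placeholders, not real backends): match each subscriber's sampled flows against known (server prefix, port, protocol) signatures of IoT backend infrastructure, and report a device once enough distinct signatures match.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# Hypothetical device -> backend signature mapping (placeholder data).
SIGNATURES = {
    "smart_speaker_X": [(ip_network("203.0.113.0/24"), 443, "tcp"),
                        (ip_network("198.51.100.0/25"), 8883, "tcp")],
    "camera_Y":        [(ip_network("192.0.2.0/24"), 443, "tcp")],
}

def detect(flows, min_signatures=2):
    """flows: iterable of (dst_ip, dst_port, proto) tuples sampled
    from one subscriber line. Returns devices for which at least
    `min_signatures` distinct backend signatures matched."""
    hits = defaultdict(set)
    for dst, port, proto in flows:
        for device, sigs in SIGNATURES.items():
            for i, (net, p, pr) in enumerate(sigs):
                if ip_address(dst) in net and port == p and proto == pr:
                    hits[device].add(i)
    return sorted(d for d, s in hits.items() if len(s) >= min_signatures)

flows = [("203.0.113.7", 443, "tcp"), ("198.51.100.9", 8883, "tcp"),
         ("192.0.2.5", 443, "tcp")]
print(detect(flows))  # -> ['smart_speaker_X']
```

Requiring multiple distinct signatures is what keeps the approach workable under heavy sampling: any single flow may be missed, but popular devices contact several backends repeatedly.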
Due to increasing digitalization, formerly isolated industrial networks, e.g., for factory and process automation, move closer and closer to the Internet, mandating secure communication. However, securely setting up OPC UA, the prime candidate for secure industrial communication, is challenging due to a large variety of insecure options. To study whether Internet-facing OPC UA appliances are configured securely, we actively scan the IPv4 address space for publicly reachable OPC UA systems and assess the security of their configurations. We observe problematic security configurations such as missing access control (on 24% of hosts), disabled security functionality (24%), or use of deprecated cryptographic primitives (25%) on in total 92% of the reachable deployments. Furthermore, we discover several hundred devices in multiple autonomous systems sharing the same security certificate, opening the door for impersonation attacks. Overall, in this paper, we highlight commonly found security misconfigurations and underline the importance of appropriate configuration for security-featuring protocols.
SESSION: Dissent, Censorship, and Interception
Shadowsocks is one of the most popular circumvention tools in China. Since May 2019, there have been numerous anecdotal reports of the blocking of Shadowsocks from Chinese users. In this study, we reveal how the Great Firewall of China (GFW) detects and blocks Shadowsocks and its variants. Using measurement experiments, we find that the GFW uses the length and entropy of the first data packet in each connection to identify probable Shadowsocks traffic, then sends seven different types of active probes, in different stages, to the corresponding servers to test whether its guess is correct.
We developed a prober simulator to analyze the effect of different types of probes on various Shadowsocks implementations, and used it to infer what vulnerabilities are exploited by the censor. We fingerprinted the probers and found differences relative to previous work on active probing. A network-level side channel reveals that the probers, which use thousands of IP addresses, are likely controlled by a set of centralized structures.
Based on our gained understanding, we present a temporary workaround that successfully mitigates the traffic analysis attack by the GFW. We further discuss essential strategies to defend against active probing. We responsibly disclosed our findings and suggestions to Shadowsocks developers, which has led to more censorship-resistant tools.
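The first-packet heuristic described above can be sketched in a few lines (the thresholds here are illustrative guesses, not the GFW's actual values): fully encrypted protocols like Shadowsocks produce first payloads whose bytes are close to uniformly random, approaching 8 bits of entropy per byte, while plaintext protocols score far lower.

```python
import math
import os
from collections import Counter

def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the payload, in bits per byte."""
    counts = Counter(payload)
    n = len(payload)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def looks_fully_encrypted(payload, min_len=16, min_bits=7.0):
    # Flag payloads that are long enough and near-uniformly random.
    return len(payload) >= min_len and byte_entropy(payload) >= min_bits

print(looks_fully_encrypted(os.urandom(512)))             # likely True
print(looks_fully_encrypted(b"GET / HTTP/1.1\r\n" * 32))  # -> False
```

A censor acting on such a heuristic alone would misclassify other high-entropy traffic, which is why, per the findings above, the GFW follows up with active probes rather than blocking outright.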
Increased adoption of HTTPS has created a largely encrypted web, but these security gains are on a collision course with governments that desire visibility into and control over user communications. Last year, the government of Kazakhstan conducted an unprecedented large-scale HTTPS interception attack by forcing users to trust a custom root certificate. We were able to detect the interception and monitor its scale and evolution using measurements from in-country vantage points and remote measurement techniques. We find that the attack targeted connections to 37 unique domains, with a focus on social media and communication services, suggesting a surveillance motive, and that it affected a large fraction of connections passing through the country's largest ISP, Kazakhtelecom. Our continuous real-time measurements indicated that the interception system was shut down after being intermittently active for 21 days. Subsequently, supported by our findings, two major browsers (Mozilla Firefox and Google Chrome) completely blocked the use of Kazakhstan's custom root. However, the incident sets a dangerous precedent, not only for Kazakhstan but for other countries that may seek to circumvent encryption online.
Efforts by content creators and social networks to enforce legal and policy-based norms, e.g., blocking hate speech and users, have driven the rise of unrestricted communication platforms. One such recent effort is Dissenter, a browser and web application that provides a conversational overlay for any web page. These conversations hide in plain sight -- users of Dissenter can see and participate in them, whereas visitors using other browsers are oblivious to their existence. Further, the website and content owners have no power over the conversation, as it resides in an overlay outside their control.
In this work, we obtain a history of Dissenter comments, users, and the websites being discussed, from the initial release of Dissenter in Feb. 2019 through Apr. 2020 (14 months). Our corpus consists of approximately 1.68M comments made by 101k users commenting on 588k distinct URLs. We first analyze macro characteristics of the network, including the user-base, comment distribution, and growth. We then use toxicity dictionaries, Perspective API, and a Natural Language Processing model to understand the nature of the comments and measure the propensity of particular websites and content to elicit hateful and offensive Dissenter comments. Using curated rankings of media bias, we examine the conditional probability of hateful comments given left and right-leaning content. Finally, we study Dissenter as a social network, and identify a core group of users with high comment toxicity.
SESSION: Cellular Everything
Support for "things" roaming internationally has become critical for Internet of Things (IoT) verticals, from connected cars to smart meters and wearables, and explains the commercial success of Machine-to-Machine (M2M) platforms. We analyze IoT verticals operating with connectivity via IoT SIMs, and present the first large-scale study of commercially deployed IoT SIMs for energy meters. We also present the first characterization of an operational M2M platform and the first analysis of the rather opaque associated ecosystem.
For operators, the exponential growth of IoT has meant increased stress on the infrastructure shared with traditional roaming traffic. Our analysis quantifies the adoption of roaming by M2M platforms and the impact they have on the underlying visited Mobile Network Operators (MNOs). To manage the impact of massive deployments of devices operating with IoT SIMs, operators must be able to distinguish between the latter and traditional inbound roamers. We build a comprehensive dataset capturing the device population of a large European MNO over three weeks. With this, we propose and validate a classification approach that allows operators to distinguish inbound roaming IoT devices.
Natural disasters can wreak havoc on Internet infrastructure. Short term impacts include impediments to first responders and long term impacts include requirements to repair or replace damaged physical components. In this paper, we present an analysis of the vulnerability of cellular communication infrastructure in the US to one type of natural disaster - wildfires. Three data sets are the basis for our study: historical wildfire records, wildfire risk projections, and cellular infrastructure deployment. We utilize the geographic features in each data set to assess the spatial overlap between historical wildfires and cellular infrastructure and to analyze current vulnerability. We find wide variability in the number of cell transceivers that were within wildfire perimeters over the past 18 years. In a focused analysis of the California wildfires of 2019, we find that the primary risk to cellular communication is power outage rather than cellular equipment damage. Our analysis of future risk based on wildfire hazard potential identifies California, Florida and Texas as the three states with the largest number of cell transceivers at risk. Importantly, we find that many of the areas at high risk are quite close to urban population centers, thus outages could have serious impacts on a large number of cell users. We believe that our study has important implications for governmental communication assurance efforts and for risk planning by cell infrastructure owners and service providers.
The emerging 5G services offer numerous new opportunities for networked applications. In this study, we seek to answer two key questions: i) is the throughput of mmWave 5G predictable, and ii) can we build "good" machine learning models for 5G throughput prediction? To this end, we conduct a measurement study of commercial mmWave 5G services in a major U.S. city, focusing on the throughput as perceived by applications running on user equipment (UE). Through extensive experiments and statistical analysis, we identify key UE-side factors that affect 5G performance and quantify to what extent the 5G throughput can be predicted. We then propose Lumos5G -- a composable machine learning (ML) framework that judiciously considers features and their combinations, and apply state-of-the-art ML techniques for making context-aware 5G throughput predictions. We demonstrate that our framework is able to achieve 1.37X to 4.84X reduction in prediction error compared to existing models. Our work can be viewed as a feasibility study for building what we envisage as a dynamic 5G throughput map (akin to Google traffic map). We believe this approach provides opportunities and challenges in building future 5G-aware apps.
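The notion of context-aware throughput prediction can be illustrated with a toy sketch (this is not Lumos5G itself, and the features and data below are invented): predict throughput from the k most similar past measurements, where "context" is a vector of UE-side features such as signal strength, mobility speed, and whether the radio is on mmWave.

```python
def knn_predict(history, context, k=3):
    """Toy k-nearest-neighbor throughput predictor.

    history: list of (feature_vector, throughput_mbps) measurements.
    context: feature vector describing the current moment."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    # Average the throughput of the k most similar past contexts.
    nearest = sorted(history, key=lambda h: dist(h[0], context))[:k]
    return sum(tp for _, tp in nearest) / k

# Features: (normalized signal strength, speed in m/s, mmWave flag).
history = [((0.9, 0.0, 1), 1800), ((0.8, 1.0, 1), 1500),
           ((0.4, 8.0, 1), 300),  ((0.7, 0.5, 0), 120),
           ((0.85, 0.2, 1), 1700)]
print(knn_predict(history, (0.88, 0.1, 1)))  # high predicted throughput
```

A real model of this kind must also weigh and compose features carefully (e.g., normalize speed against the signal-strength scale), which is exactly the kind of feature engineering such frameworks automate.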
SESSION: Overseas and Outerspace
Nearly all international data is carried by a mesh of submarine cables connecting virtually every region in the world. It is generally assumed that Internet services rely on this submarine cable network (SCN) for backend traffic, but that most users do not directly depend on it, as popular resources are either local or cached nearby. In this paper, we study the criticality of the SCN from the perspective of end users. We present a general methodology for analyzing the reliance on the SCN for a given region, and apply it to the most popular web resources accessed by users in 63 countries from every inhabited continent, collectively capturing ≈80% of the global Internet population. We find that as many as 64.33% of all web resources accessed from a specific country rely on the SCN. Despite the explosive growth of data center and CDN infrastructure around the world, at least 28.22% of the CDN-hosted resources traverse a submarine cable.
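The reliance test can be caricatured in a few lines (a drastic simplification of such a methodology: real analyses work from traceroutes, cable landing points, and geolocation, and the landmass adjacency below is a coarse placeholder): if two consecutive hops of a path sit on landmasses with no terrestrial connection, the traffic must have crossed a submarine cable (or, rarely, a satellite link).

```python
# Hypothetical terrestrial-adjacency map; illustrative only.
LAND_CONNECTED = {
    frozenset({"north_america", "south_america"}),
    frozenset({"europe", "asia"}),
    frozenset({"asia", "africa"}),
}

def uses_submarine_cable(hop_landmasses):
    """hop_landmasses: landmass of each consecutive path hop."""
    for a, b in zip(hop_landmasses, hop_landmasses[1:]):
        if a != b and frozenset({a, b}) not in LAND_CONNECTED:
            return True
    return False

print(uses_submarine_cable(["europe", "europe", "north_america"]))  # -> True
print(uses_submarine_cable(["europe", "asia", "asia"]))             # -> False
```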
In the last two decades, the communication technologies used for supervision and control of critical infrastructures such as the power grid, have been migrating from serial links to Internet-compatible network protocols. Despite this trend, the research community has not explored or measured the unique characteristics of these industrial systems, and as a result, most of these networks remain unstudied. In this paper we perform the first measurement study of a Supervisory Control And Data Acquisition (SCADA) network in the bulk power grid. We develop a new protocol parser that can be used to analyze packets not conforming to standards, find attributes to profile the SCADA network, and identify several outliers which underscore the difficulties in managing a federated network where different devices are under the control of different power companies.
SpaceX, Amazon, and others plan to put thousands of satellites in low Earth orbit to provide global low-latency broadband Internet. SpaceX's plans have matured quickly, such that its under-deployment satellite constellation is already the largest in history, and it may start offering service in 2020.
The proposed constellations hold great promise, but also present new challenges for networking. To enable research in this exciting space, we present Hypatia, a framework for simulating and visualizing the network behavior of these constellations by incorporating their unique characteristics, such as high-velocity orbital motion.
Using publicly available design details for the upcoming networks to drive our simulator, we characterize the expected behavior of these networks, including latency and link utilization fluctuations over time, and the implications of these variations for congestion control and routing.
SESSION: Measuring the Interconnect
The Tier-1 ISPs have been considered the Internet's backbone since the dawn of the modern Internet 30 years ago, as they guarantee global reachability. However, their influence and importance are waning as Internet flattening decreases the demand for transit services and increases the importance of private interconnections. Conversely, major cloud providers -- Amazon, Google, IBM, and Microsoft -- are gaining in importance as more services are hosted on their infrastructures. They ardently support Internet flattening and are rapidly expanding their global footprints, which enables them to bypass the Tier-1 ISPs and other large transit providers to reach many destinations.
In this paper we seek to quantify the extent to which the cloud providers can bypass the Tier-1 ISPs and other large transit providers. We conduct comprehensive measurements to identify the neighbor networks of the major cloud providers and combine them with AS relationship inferences to model the Internet's AS-level topology and calculate a new metric, hierarchy-free reachability, which characterizes the reachability a network can achieve without traversing the networks of the Tier-1 and Tier-2 ISPs. We show that the cloud providers are able to reach over 76% of the Internet without traversing the Tier-1 and Tier-2 ISPs, more than virtually every other network.
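A simplified version of the hierarchy-free reachability metric can be sketched as a graph traversal (our own toy reduction: real computation must respect valley-free AS routing policies, which this version ignores by treating the AS graph as undirected): compute the fraction of ASes reachable without any path traversing a forbidden set of large transit ASes.

```python
from collections import deque

def hierarchy_free_reachability(adj, src, forbidden):
    """Fraction of other ASes in `adj` reachable from `src` via BFS
    without entering any AS in `forbidden`."""
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen and v not in forbidden:
                seen.add(v)
                queue.append(v)
    others = set(adj) - {src}
    return len(seen - {src}) / len(others) if others else 0.0

# Hypothetical toy topology: a cloud AS peering at an IXP and also
# buying transit from a Tier-1 that reaches a stub network.
adj = {"cloud": ["ixp1", "tier1"], "ixp1": ["eyeball1", "eyeball2"],
       "tier1": ["stub"], "eyeball1": ["ixp1"], "eyeball2": ["ixp1"],
       "stub": ["tier1"]}
print(hierarchy_free_reachability(adj, "cloud", {"tier1"}))  # -> 0.6
```

In the toy topology, avoiding the Tier-1 still leaves the IXP and both eyeball networks reachable, but cuts off the stub that is only reachable through transit.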
Many systems rely on traceroutes to monitor or characterize the Internet. The quality of the systems' inferences depends on the completeness and freshness of the traceroutes, but the refreshing of traceroutes is constrained by limited resources at vantage points. Previous approaches predict which traceroutes are likely out-of-date in order to allocate measurements, or monitor BGP feeds for changes that overlap traceroutes. Both approaches miss many path changes for reasons including the difficulty in predicting changes and the coarse granularity of BGP paths.
This paper presents techniques to identify out-of-date traceroutes without issuing any measurements, even if a change is not visible at BGP granularity. We base our techniques on two observations. First, although BGP updates encode routes at AS granularity, routers issue updates when they change intra-domain routes or peering points within the same AS path. Second, route changes correlate across paths, and many publicly available traceroutes exist. Our techniques maintain an atlas of traceroutes by monitoring BGP updates and publicly available traceroutes for signals to mark overlapping atlas traceroutes as stale. We focus our analysis of traceroute path changes at the granularity of border router IPs which provides an abstraction finer than AS- or PoP-level but is not affected by the periodicity of intra-domain load balancers. Our evaluation indicates that 80% of the traceroutes that our techniques signal as stale have indeed changed, even though the AS hops remained the same. Our techniques combine to identify 79% of all border IP changes, without issuing a single online measurement.
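The staleness signal described above can be sketched with a minimal data structure (the structure and names are ours, not the paper's implementation): index the atlas traceroutes by the border-router IPs they traverse, and when a BGP update or a fresh public traceroute shows a change at one of those border IPs, mark every overlapping atlas traceroute as stale instead of re-measuring it.

```python
from collections import defaultdict

class Atlas:
    """Toy atlas mapping border-router IPs to the traceroutes that
    traverse them, with lazy staleness marking."""
    def __init__(self):
        self.by_border_ip = defaultdict(set)
        self.stale = set()

    def add(self, tr_id, border_ips):
        for ip in border_ips:
            self.by_border_ip[ip].add(tr_id)

    def border_ip_changed(self, ip):
        # Called when an external signal (BGP update, public
        # traceroute) shows a change at border router `ip`.
        self.stale |= self.by_border_ip.get(ip, set())

atlas = Atlas()
atlas.add("t1", ["10.0.0.1", "10.0.1.1"])
atlas.add("t2", ["10.0.2.1"])
atlas.border_ip_changed("10.0.1.1")
print(sorted(atlas.stale))  # -> ['t1']
```

The payoff is that no online measurement is needed to flag t1: the change signal arrives for free from feeds that are already being collected.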
Knowledge of the Internet topology and the business relationships between Autonomous Systems (ASes) is the basis for studying many aspects of the Internet. Despite the significant progress achieved by the latest inference algorithms, their results still suffer from errors on some critical links due to limited data, thus hindering many applications that rely on the inferred relationships. We conduct an in-depth analysis of the challenges inherent in the data, especially the limited coverage and biased concentration of the vantage points (VPs). Some of these challenges have been largely overlooked, and they will become more acute as the Internet grows further. We then develop TopoScope, a framework for accurately recovering AS relationships from such fragmentary observations. TopoScope uses ensemble learning and Bayesian Networks to mitigate the observation bias originating not only from a single VP, but also from the uneven distribution of available VPs. It also discovers the intrinsic similarities between groups of adjacent links, and infers the relationships on hidden links that are not directly observable. Compared to state-of-the-art inference algorithms, TopoScope reduces the inference error by 2.7-4 times, discovers the relationships for around 30,000 upper-layer hidden AS links, and remains more accurate and stable under more incomplete or biased observations.
SESSION: DNS 2
The modern Internet relies on the Domain Name System (DNS) to convert between human-readable domain names and IP addresses. However, the correct and efficient implementation of this function is jeopardized when the configuration data binding domains, nameservers, and glue records is faulty. In particular, lame delegations, which occur when a nameserver responsible for a domain is unable to provide authoritative information about it, introduce both performance and security risks. We perform a broad-based measurement study of lame delegations, using both longitudinal zone data and active querying. We show that lame delegations of various kinds are common (affecting roughly 14% of the domains we queried), that they can significantly degrade lookup latency (when they do not lead to outright failure), and that they expose hundreds of thousands of domains to adversarial takeover. We also explore the circumstances that give rise to this surprising prevalence of lame delegations, including unforeseen interactions between the operational procedures of registrars and registries.
The DNS Security Extensions (DNSSEC) add data origin authentication and data integrity to the Domain Name System (DNS), the naming system of the Internet. With DNSSEC, signatures are added to the information provided in the DNS using public key cryptography. Advances in both cryptography and cryptanalysis make it necessary to deploy new algorithms in DNSSEC, as well as deprecate those with weakened security. If this process is easy, then the protocol has achieved what the IETF terms "algorithm agility".
In this paper, we study the lifetime of algorithms for DNSSEC. This includes: (i) standardizing the algorithm, (ii) implementing support in DNS software, (iii) deploying new algorithms at domains and recursive resolvers, and (iv) replacing deprecated algorithms. Using data from more than 6.7 million signed domains and over 10,000 vantage points in the DNS, combined with qualitative studies, we show that DNSSEC has only partially achieved algorithm agility. Standardizing new algorithms and deprecating insecure ones can take years. We highlight the main barriers for getting new algorithms deployed, but also discuss success factors. This study provides key insights to take into account when new algorithms are introduced, for example when the Internet must transition to quantum-safe public key cryptography.
Internet traffic generally relies on the Domain Name System (DNS) to map human-friendly hostnames to IP addresses. While the community has studied many facets of the system in isolation, this paper aims to study the DNS in context. Using data from a residential ISP, we study DNS along with both the activity before an application needs a given mapping and the subsequent application transaction. We find that (i) a majority of application transactions incur no direct DNS costs, and (ii) for those that do, the cost is minimal.
Privacy laws like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) have pushed Internet firms processing personal data to obtain user consent. Uncertainty around sanctions for non-compliance led many websites to embed a Consent Management Provider (CMP), which collects users' consent and shares it with third-party vendors and other websites. Our paper maps the formation of this ecosystem using longitudinal measurements. Primary and secondary data sources are used to measure each actor within the ecosystem. Using 161 million browser crawls, we estimate that CMP adoption doubled from June 2018 to June 2019 and then doubled again by June 2020. Sampling 4.2 million unique domains, we observe that CMP adoption is most prevalent among moderately popular websites (Tranco top 50-10k), but a long tail exists. Using APIs from the ad-tech industry, we quantify the purposes and lawful bases used to justify processing personal data. A controlled experiment on a public website provides novel insights into how the time-to-complete of two leading CMPs' consent dialogues varies with the preferences expressed, showing how privacy-aware users incur a significant time cost.
The success of platforms such as Facebook and Google has been due in no small part to features that allow advertisers to target ads in a fine-grained manner. However, these features open up the potential for discriminatory advertising when advertisers include or exclude users of protected classes---either directly or indirectly---in a discriminatory fashion. Despite the fact that advertisers are able to compose various targeting features together, the existing mitigations to discriminatory targeting have focused only on individual features; there are concerns that such composition could result in targeting that is more discriminatory than the features individually.
In this paper, we first demonstrate how compositions of individual targeting features can yield discriminatory ad targeting even for Facebook's restricted targeting features for ads in special categories (meant to protect against discriminatory advertising). We then conduct the first study of the potential for discrimination that spans across three major advertising platforms (Facebook, Google, and LinkedIn), showing how the potential for discriminatory advertising is pervasive across these platforms. Our work further points to the need for more careful mitigations to address the issue of discriminatory ad targeting.
Online messaging platforms such as WhatsApp, Telegram, and Discord, each with hundreds of millions of users, are one of the dominant modes of communicating and interacting with one another. Despite the widespread use of public group chats, there exists no systematic or detailed characterization of them. More importantly, there is a lack of a general understanding of how these (public) groups differ in characteristics and use across the different platforms. We also do not know whether the messaging platforms expose personally identifiable information, and we lack a comprehensive view of the privacy implications of such leaks for users.
In this work, we address these gaps by analyzing the messaging platforms' ecosystem through the lens of a popular social media platform---Twitter. We search for WhatsApp, Telegram, and Discord group URLs posted on Twitter over a period of 38 days and amass a set of 351K unique group URLs. We analyze the content accompanying group URLs on Twitter, finding interesting differences in the topics of the groups across the messaging platforms. By monitoring the characteristics of these groups every day for more than a month, and by joining a subset of 616 groups across the different messaging platforms, we share key insights into the discovery of these groups via Twitter and reveal how they change over time. Finally, we analyze whether the messaging platforms expose personally identifiable information. We show that (a) Twitter is a rich source for discovering public groups on the different messaging platforms, (b) group URLs from messaging platforms are ephemeral, and (c) the considered messaging platforms expose personally identifiable information, with such leaks being more prevalent on WhatsApp than on Telegram and Discord.
Blocklists, consisting of known malicious IP addresses, can be used as a simple method to block malicious traffic. However, blocklists can lead to unjust blocking of legitimate users due to IP address reuse, where more users could be blocked than intended. IP addresses can be reused either at the same time (Network Address Translation) or over time (dynamic addressing). We propose two new techniques to identify reused addresses: we build a crawler using the BitTorrent Distributed Hash Table to detect NATed addresses, and we use RIPE Atlas measurement logs to detect dynamically allocated address space. We then analyze 151 publicly available IPv4 blocklists to show the implications of reused addresses, finding that 53-60% of blocklists contain reused addresses, accounting for about 30.6K-45.1K listings. We also find that reused addresses can potentially affect as many as 78 legitimate users for as many as 44 days.
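The overlap analysis the abstract describes amounts to testing each blocklist entry against the inferred reused address space. A minimal sketch in Python's stdlib follows; the function name and input shapes are hypothetical, not the paper's implementation:

```python
import ipaddress

def flag_reused_listings(blocklist, reused_prefixes):
    """Return blocklist entries that fall inside known reused address space.

    blocklist: iterable of IPv4 address strings.
    reused_prefixes: iterable of CIDR strings inferred to be NATed or
    dynamically allocated (e.g., via a DHT crawl or RIPE Atlas logs).
    """
    nets = [ipaddress.ip_network(p) for p in reused_prefixes]
    flagged = []
    for entry in blocklist:
        addr = ipaddress.ip_address(entry)
        # An entry in reused space may block many users beyond the offender.
        if any(addr in net for net in nets):
            flagged.append(entry)
    return flagged
```

For example, `flag_reused_listings(["203.0.113.7", "198.51.100.9"], ["203.0.113.0/24"])` flags only the first address.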
Who Touched My Browser Fingerprint?: A Large-scale Measurement Study and Classification of Fingerprint Dynamics
Browser fingerprints are dynamic, evolving as feature values change over time. Previous fingerprinting datasets are either small-scale, covering only thousands of browser instances, or do not consider fingerprint dynamics. Thus, it remains unclear how an evolution-aware fingerprinting tool behaves in a real-world setting, e.g., on a website with millions of browser instances, let alone how fingerprint dynamics implicate privacy and security.
In this paper, we perform the first large-scale study of millions of fingerprints to analyze fingerprint dynamics on a real-world website. Our measurement study answers the question of how and why fingerprints change over time by classifying fingerprint dynamics into three categories based on their causes. Our measurements also yield several insights; e.g., we show that a state-of-the-art fingerprinting tool performs poorly in terms of F1-score and matching speed in this real-world setting.
We present the design, implementation, evaluation, and validation of a system that learns regular expressions (regexes) to extract Autonomous System Numbers (ASNs) from hostnames associated with router interfaces. We train our system with ASNs inferred by Router-ToAsAssignment and bdrmapIT using topological constraints from traceroute paths, as well as ASNs recorded by operators in PeeringDB, to learn regexes for 206 different suffixes. Because these methods for inferring router ownership can infer the wrong ASN, we modify bdrmapIT to integrate this new capability to extract ASNs from hostnames. Evaluating against ground truth, our modification correctly distinguished stale from correct hostnames for 92.5% of hostnames with an ASN different from bdrmapIT's initial inference. This modification allowed bdrmapIT to increase the agreement between extracted and inferred ASNs for these routers in the January 2020 ITDK from 87.4% to 97.1% and reduce the error rate from 1/7.9 to 1/34.5. This work opens a broader horizon of opportunity for evidence-based router ownership inference.
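The core primitive the system learns is a suffix-specific regex that pulls an ASN out of a router interface hostname. The pattern and hostname below are illustrative of one common operator convention (embedding "as<number>" in a label); the paper learns a distinct regex per DNS suffix:

```python
import re

# Illustrative pattern for hostnames like "et-0-0-1.cr2.as64500.example.net".
# The learned regexes in the paper are suffix-specific; this is a sketch.
ASN_RE = re.compile(r'\bas(\d+)\b', re.IGNORECASE)

def extract_asn(hostname):
    """Return the ASN embedded in a router hostname, or None if absent."""
    m = ASN_RE.search(hostname)
    return int(m.group(1)) if m else None
```

The word boundaries (`\b`) keep the pattern from matching "as" inside unrelated labels such as "atlas".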
SESSION: The Last Mile
Accurate broadband coverage data is essential for public policy planning and government support programs. In the United States, the Federal Communications Commission is responsible for maintaining national broadband coverage data. Observers have panned the FCC's broadband maps for overstating availability, due to coarse-grained data collection and a low coverage threshold.
We demonstrate a new approach to building broadband coverage maps: automated large-scale queries to the public availability checking tools offered by major internet service providers. We reverse engineer the coverage tools for nine major ISPs in the U.S., test over 19 million residential street addresses across nine states for service, and compare the results to the FCC's maps.
Our results demonstrate that the FCC's coverage data significantly overstates the availability of each ISP's service, access to any broadband, connection speeds available to consumers, and competition in broadband markets. We also find that the FCC's data disproportionately overstates coverage in rural and minority communities. Our results highlight a promising direction for developing more accurate broadband maps and validating coverage reports.
Last-mile is the centerpiece of broadband connectivity, as poor last-mile performance generally translates to poor quality of experience. In this work, we investigate last-mile latency using traceroute data from RIPE Atlas probes located in 646 ASes and focus on recurrent performance degradation. We find that in normal times probes in only 10% of ASes experience persistent last-mile congestion, but we recorded 55% more congested ASes during the COVID-19 outbreak. Persistent last-mile congestion is not uncommon; it is usually seen in large eyeball networks and may span years. With the help of CDN access log data, we dissect results for major ISPs in Japan, the most severely affected country in our study, and ascertain bottlenecks in the shared legacy infrastructure.
IP address classification and clustering are important tools for security practitioners in understanding attacks and employing proactive defenses. Over the past decade, network providers have begun transitioning from IPv4 to the more flexible IPv6, and a third of users now access online services over IPv6. However, there is no reason to believe that the properties of IPv4 addresses used for security applications should carry over to IPv6, and to date there has not yet been a large-scale study comparing the two protocols at a user (as opposed to a client or address) level.
In this paper, we establish empirical grounding on how both ordinary users and attackers use IPv6 in practice, compared with IPv4. Using data on benign and abusive accounts at Facebook, one of the largest online platforms, we conduct user-centric analyses that assess the spatial and temporal properties of users' IP addresses, and IP-centric evaluations that characterize the user populations on IP addresses. We find that compared with IPv4, IPv6 addresses are less populated with users and shorter lived for each user. While both protocols exhibit outlying behavior, we determine that IPv6 outliers are significantly less prevalent and diverse, and more readily predicted. We also study the effects of subnetting IPv6 addresses at different prefix lengths, and find that while /56 subnets are closest in behavior to IPv4 addresses for malicious users, either the full IPv6 address or /64 subnets are most suitable for IP-based security applications, with both providing better performance tradeoffs than IPv4 addresses. Ultimately, our findings provide guidance on how security practitioners can handle IPv6 for applications such as blocklisting, rate limiting, and training machine learning models.
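The subnetting analysis suggests keying IP-based security state (blocklists, rate limiters, model features) on IPv6 subnets rather than full addresses. A minimal sketch with the stdlib `ipaddress` module, with a hypothetical function name; the /64 default reflects the paper's finding for most applications, with /56 closer to IPv4 behavior for malicious users:

```python
import ipaddress

def subnet_key(addr, v6_prefix=64):
    """Map an address to the key used for IP-based security state.

    IPv4 addresses are kept as-is; IPv6 addresses are aggregated to the
    given prefix length before being used for blocklisting, rate
    limiting, or as a model feature.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return str(ip)
    # strict=False lets us pass a host address rather than a network base.
    net = ipaddress.ip_network((addr, v6_prefix), strict=False)
    return str(net)
```

Two hosts in the same /64 then share one key, so per-key counters and listings cover the whole subnet.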
SESSION: New Tools for Your Toolbox
We propose a new traceroute tool, FlashRoute, for efficient large-scale topology discovery. FlashRoute reduces the time required to traceroute the entire /24 IPv4 address space by a factor of three and a half compared to the previous state of the art. Additionally, we present a new technique to measure the hop distance to a destination using a single probe, and uncover a bias of the influential ISI Census hitlist in topology discovery.
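A common way to estimate hop distance from a single probe is to read the TTL of the reply and subtract it from the nearest common initial-TTL default; FlashRoute's exact method may differ, so treat this as an illustrative sketch:

```python
def hop_distance(reply_ttl):
    """Estimate hop distance from the TTL of a single reply packet.

    Assumes the destination set its initial TTL to one of the common
    OS defaults (32, 64, 128, 255): the distance is the gap between the
    observed value and the smallest default at or above it.
    """
    for initial in (32, 64, 128, 255):
        if reply_ttl <= initial:
            return initial - reply_ttl
    raise ValueError("invalid TTL: %d" % reply_ttl)
```

A reply arriving with TTL 52, for instance, is inferred to have crossed 12 hops from a host that started at 64.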
Anycast addressing - assigning the same IP address to multiple, distributed devices - has become a fundamental approach to improving the resilience and performance of Internet services, but its conventional deployment model makes it impossible to infer from the address itself that it is anycast. Existing methods to detect anycast IPv4 prefixes present accuracy challenges stemming from routing and latency dynamics, and efficiency and scalability challenges related to measurement load. We review these challenges and introduce a new technique we call "MAnycast2" that can help overcome them. Our technique uses a distributed measurement platform of anycast vantage points as sources to probe potential anycast destinations. This approach eliminates any sensitivity to latency dynamics, and greatly improves efficiency and scalability. We discuss alternatives to overcome remaining challenges relating to routing dynamics, suggesting a path toward establishing the capability to complete, in under 3 hours, a full census of which IPv4 prefixes in the ISI hitlist are anycast.
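The classification rule behind this approach can be sketched simply: when probes are sourced from an anycast prefix, replies from a unicast target all route back to one measurement site, while replies from an anycast target's distinct instances land at multiple sites. The function and data shape below are a hypothetical distillation, not the MAnycast2 implementation:

```python
def classify_anycast(replies_by_site):
    """Infer whether a probed target is anycast, MAnycast2-style.

    replies_by_site maps a measurement-site name to the number of
    replies to probes sent toward one target from an anycast source
    prefix. Replies at more than one site imply multiple target
    instances answered. (Routing dynamics can still cause false
    positives, which the paper discusses.)
    """
    sites = [s for s, n in replies_by_site.items() if n > 0]
    return len(sites) > 1
```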
Limited data access is a longstanding barrier to data-driven research and development in the networked systems community. In this work, we explore if and how generative adversarial networks (GANs) can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge. As a specific target, our focus in this paper is on time series datasets with metadata (e.g., packet loss rate measurements with corresponding ISPs). We identify key challenges of existing GAN approaches for such workloads with respect to fidelity (e.g., long-term dependencies, complex multidimensional relationships, mode collapse) and privacy (i.e., existing guarantees are poorly understood and can sacrifice fidelity). To improve fidelity, we design a custom workflow called DoppelGANger (DG) and demonstrate that across diverse real-world datasets (e.g., bandwidth measurements, cluster requests, web sessions) and use cases (e.g., structural characterization, predictive modeling, algorithm comparison), DG achieves up to 43% better fidelity than baseline models. Although we do not resolve the privacy problem in this work, we identify fundamental challenges with both classical notions of privacy and recent advances to improve the privacy properties of GANs, and suggest a potential roadmap for addressing these challenges. By shedding light on the promise and challenges, we hope our work can rekindle the conversation on workflows for data sharing.
SESSION: Routing and Reachability
In this paper, we introduce a framework to observe RPKI relying parties (i.e., those that fetch RPKI data from the distributed repository) and present insights into this ecosystem for the first time. Our longitudinal study of data gathered from three RPKI certification authorities (AFRINIC, APNIC, and our own CA) identifies different deployment models of relying parties and (surprisingly) prevalent inconsistent fetching behavior that affects Internet routing robustness. Our results reveal that nearly 90% of relying parties are unable to connect to delegated publication points under certain conditions, which leads to erroneous invalidation of IP prefixes and likely widespread loss of network reachability.
Pinpointing autonomous systems which deploy specific inter-domain techniques such as Route Flap Damping (RFD) or Route Origin Validation (ROV) remains a challenge today. Previous approaches to detect per-AS behavior often relied on heuristics derived from passive and active measurements. Those heuristics, however, often lacked accuracy or imposed tight restrictions on the measurement methods.
We introduce an algorithmic framework for network tomography, BeCAUSe, which implements Bayesian Computation for Autonomous Systems. Using our original combination of active probing and stochastic simulation, we present the first study to expose the deployment of RFD. In contrast to the expectation of the Internet community, we find that at least 9% of measured ASes enable RFD, most using deprecated vendor default configuration parameters. To illustrate the power of computational Bayesian methods, we compare BeCAUSe with three RFD heuristics. Thereafter, we successfully apply a generalization of the Bayesian method to a second challenge, measuring the deployment of ROV.
Inbound traffic engineering (ITE)---the process of announcing routes to, e.g., maximize revenue or minimize congestion---is an essential task for Autonomous Systems (ASes). AS Path Prepending (ASPP) is an easy-to-use and well-known ITE technique that routing manuals present as one of the first options to influence other ASes' routing decisions. We observe that origin ASes currently prepend more than 25% of all IPv4 prefixes.
ASPP consists of inflating the BGP AS path. Since the length of the AS path is the second tie-breaker in the BGP best path selection, ASPP can steer traffic to other routes. Despite being simple and easy to use, the appreciation of ASPP among operators and researchers is diverse. Some have questioned its need, effectiveness, and predictability, as well as voiced security concerns. Motivated by these mixed views, we revisit ASPP. Our longitudinal study shows that ASes widely deploy ASPP and that its utilization has slightly increased despite public statements against it. Surprisingly, we spot roughly 6k ASes originating at least one prefix with prepends that achieve no ITE goal. With active measurements, we show that ASPP's effectiveness as an ITE tool depends on the AS location and the number of available upstreams, and that its security implications are practical; we also identify that more than 18% of the prepended prefixes contain unnecessary prepends that achieve no apparent goal other than amplifying existing routing security risks. We validate our findings in interviews with 20 network operators.
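Detecting prepending in BGP data reduces to spotting consecutive repeats of the same ASN in an AS path. A minimal sketch (the function name and return shape are illustrative):

```python
def prepend_info(as_path):
    """Detect AS-path prepending: consecutive repeats of the same ASN.

    Returns (deflated_path, prepend_count), where prepend_count is the
    number of extra ASN occurrences that inflate the path length.
    """
    deflated = []
    for asn in as_path:
        # Keep an ASN only when it differs from its predecessor.
        if not deflated or deflated[-1] != asn:
            deflated.append(asn)
    return deflated, len(as_path) - len(deflated)
```

For example, the path `[3356, 64500, 64500, 64500]` deflates to `[3356, 64500]` with two prepends by the origin.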
SESSION: Measure All Networks
Low latency is of interest for a variety of applications. The most stringent latency requirements arise in financial trading, where sub-microsecond differences matter. As a result, firms in the financial technology sector are pushing networking technology to its limits, giving a peek into the future of consumer-grade terrestrial microwave networks. Here, we explore the world's most competitive network design race, which has played out over the past decade on the Chicago-New Jersey trading corridor. We systematically reconstruct licensed financial trading networks from publicly available information, and examine their latency, path redundancy, wireless link lengths, and operating frequencies.
Distributed deep learning (DDL) uses a cluster of servers to train models in parallel. It has been applied to a multiplicity of problems, e.g., online advertising and friend recommendation. However, distributing the training means that the communication network becomes a key component in system performance. In this paper, we measure Alibaba's DDL system, with a focus on understanding the bottlenecks introduced by the network. Our key finding is that the communication overhead has a surprisingly large impact on performance. To explore this, we analyse latency logs of 1.38M Remote Procedure Calls between servers during model training for two real applications on high-dimensional sparse data. We reveal the major contributors to the latency, including concurrent write/read operations of different connections and network connection management. We further observe a skewed distribution of update frequency for individual parameters, motivating us to propose using in-network computation capacity to offload server tasks.
Scalability has been a bottleneck for major blockchains such as Bitcoin and Ethereum. Despite the significantly improved scalability claimed by several high-profile blockchain projects, there has been little effort to understand how their transactional throughput is being used. In this paper, we examine recent network traffic of three major high-scalability blockchains---EOSIO, Tezos and XRP Ledger (XRPL)---over a period of seven months. Our analysis reveals that only a small fraction of the transactions are used for value transfer purposes. In particular, 96% of the transactions on EOSIO were triggered by the airdrop of a currently valueless token; on Tezos, 76% of throughput was used for maintaining consensus; and over 94% of transactions on XRPL carried no economic value. We also identify a persisting airdrop on EOSIO as a DoS attack and detect a two-month-long spam attack on XRPL. The paper explores the different designs of the three blockchains and sheds light on how they could shape user behavior.
SESSION: Crime and Protection
Trust and reputation play a core role in underground cybercrime markets, where participants are anonymous and there is little legal recourse for dispute arbitration. These underground markets exist in tension between two opposing forces: the drive to hide incriminating information, and the trust and stability benefits that greater openness yields. Revealing information about transactions to mitigate scams also provides valuable data about the market. We analyse the first dataset that we are aware of covering the transactions created and completed on a well-known and high-traffic underground marketplace, Hack Forums, along with the associated threads and posts made by its users over two recent years, from June 2018 to June 2020. We use statistical modelling approaches to analyse the economic and social characteristics of the market over three eras, especially its performance as an infrastructure for trust. In the Set-up era, we observe the growth of users making only one transaction, as well as 'power-users' who make many transactions. In the Stable era, we observe a wide range of activities (including large-scale transfers of intermediate currencies such as Amazon Giftcards) which declines slowly from an initial peak. Finally, we analyse the effects of the Covid-19 pandemic, concluding that while we see a significant increase in transactions across all categories, this reflects a stimulus of the market rather than a transformation. New users overcome the 'cold start' problem by engaging in low-level currency exchanges to prove their trustworthiness. We observe that currency exchange accounts for most contracts, and that Bitcoin and PayPal are the preferred payment methods by trading values and number of contracts involved. The market is becoming more centralised over time around influential users and threads, with significant changes observed during the Set-up and Covid-19 eras.
As technologies to defend against phishing and malware often impose an additional financial and usability cost on users (such as security keys), a question remains as to who should adopt these heightened protections. We measure over 1.2 billion email-based phishing and malware attacks against Gmail users to understand what factors place a person at heightened risk of attack. We find that attack campaigns are typically short-lived and at first glance indiscriminately target users on a global scale. However, by modeling the distribution of targeted users, we find that a person's demographics, location, email usage patterns, and security posture all significantly influence the likelihood of attack. Our findings represent a first step towards empirically identifying the most at-risk users.
Across the world, government websites are expected to be reliable sources of information, regardless of their view count. Interactions with these websites often contain sensitive information, such as identity, medical, or legal data, whose integrity must be protected for citizens to remain safe. To better understand the government website ecosystem, we measure the adoption of HTTPS, including the "long tail" of government websites around the world, which are typically not captured in the top-million datasets used for such studies. We identify and measure major categories and frequencies of HTTPS adoption errors, including misconfiguration of certificates via expiration, reuse of keys and serial numbers between unrelated government departments, use of insecure cryptographic protocols and keys, and untrustworthy root Certificate Authorities (CAs). Finally, we observe an overall lower HTTPS adoption rate and a steeper drop-off with decreasing popularity among government sites compared to commercial websites, and provide recommendations to improve the usage of HTTPS in governments worldwide.
SESSION: Sensitive Domains
Domain classification services have applications in multiple areas, including cybersecurity, content blocking, and targeted advertising. Yet, these services are often a black box in terms of their methodology for classifying domains, which makes it difficult to assess their strengths, aptness for specific applications, and limitations. In this work, we perform a large-scale analysis of 13 popular domain classification services on more than 4.4M hostnames. Our study empirically explores their methodologies, scalability limitations, label constellations, and their suitability for academic research as well as other practical applications such as content filtering. We find that coverage varies enormously across providers, ranging from over 90% to below 1%. All services deviate from their documented taxonomy, hampering sound usage for research. Further, labels are highly inconsistent across providers, who show little agreement over domains, making it difficult to compare or combine these services. We also show how the dynamics of crowd-sourced efforts may be obstructed by scalability and coverage aspects as well as subjective disagreements among human labelers. Finally, through case studies, we showcase that most services are not fit for detecting specialized content for research or content-blocking purposes. We conclude with actionable recommendations on their usage based on our empirical insights and experience. In particular, we focus on how users should handle the significant disparities observed across services, both in technical solutions and in research.
Several data protection laws include special provisions for protecting personal data relating to religion, health, sexual orientation, and other sensitive categories. Having a well-defined list of sensitive categories is sufficient for filing complaints manually, conducting investigations, and prosecuting cases in courts of law. Data protection laws, however, do not define explicitly what type of content falls under each sensitive category. Therefore, it is unclear how to implement proactive measures such as informing users, blocking trackers, and filing complaints automatically when users visit sensitive domains. To empower such use cases, we turn to the Curlie.org crowdsourced taxonomy project to draw training data for building a text classifier for sensitive URLs. We demonstrate that our classifier can identify sensitive URLs with accuracy above 88%, and even recognize specific sensitive categories with accuracy above 90%. We then use our classifier to search for sensitive URLs in a corpus of 1 billion URLs collected by the Common Crawl project. We identify more than 155 million sensitive URLs in more than 4 million domains. Despite their sensitive nature, more than 30% of these URLs belong to domains that fail to use HTTPS. Also, in sensitive web pages with third-party cookies, 87% of the third parties set at least one persistent cookie.
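The paper trains a proper text classifier on Curlie.org labels; as a toy stand-in, the idea of scoring page text against per-category vocabularies can be sketched as follows. The keyword lists, names, and threshold here are entirely illustrative:

```python
# Toy stand-in for a trained sensitive-content classifier: count keyword
# hits per category. The real system learns from Curlie.org labels.
SENSITIVE_KEYWORDS = {
    "health": {"diagnosis", "symptom", "therapy", "clinic"},
    "religion": {"church", "mosque", "prayer", "scripture"},
}

def classify_sensitive(text, threshold=2):
    """Return the sensitive categories whose keyword hits reach threshold."""
    tokens = set(text.lower().split())
    return sorted(
        cat for cat, kws in SENSITIVE_KEYWORDS.items()
        if len(kws & tokens) >= threshold
    )
```

A trained model replaces the keyword sets with learned weights, but the input/output shape (page text in, category labels out) is the same.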
Analyzing Third Party Service Dependencies in Modern Web Services: Have We Learned from the Mirai-Dyn Incident?
Many websites rely on third parties for services (e.g., DNS, CDN, etc.). However, this also exposes them to shared risks from attacks (e.g., the Mirai DDoS attack) or cascading failures (e.g., the GlobalSign revocation error). Motivated by such incidents, we analyze the prevalence and impact of third-party dependencies, focusing on three critical infrastructure services: DNS, CDN, and certificate revocation checking by CAs. We analyze both direct (e.g., Twitter uses Dyn) and indirect (e.g., Netflix uses Symantec as CA, which uses Verisign for DNS) dependencies. We also take two snapshots, in 2016 and 2020, to understand how the dependencies evolved. Our key findings are: (1) 89% of the Alexa top-100K websites critically depend on third-party DNS, CDN, or CA providers, i.e., if these providers go down, these websites could suffer service disruption; (2) the use of third-party services is concentrated, and the top-3 providers of CDN, DNS, or CA services can affect 50%-70% of the top-100K websites; (3) indirect dependencies amplify the impact of popular CDN and DNS providers by up to 25X; and (4) some third-party dependencies and concentration increased marginally between 2016 and 2020. Based on our findings, we derive key implications for different stakeholders in the web ecosystem.
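Indirect dependencies of the kind described (a site depends on a CA, which in turn depends on a DNS provider) are a transitive closure over a direct-dependency mapping. A minimal sketch, with hypothetical data shapes and provider names used purely as placeholders:

```python
def all_dependencies(site, direct_deps):
    """Compute direct plus indirect provider dependencies of a site.

    direct_deps maps an entity (site or provider) to the set of
    providers it directly uses; indirect dependencies are reached
    transitively via depth-first traversal.
    """
    seen, stack = set(), [site]
    while stack:
        for dep in direct_deps.get(stack.pop(), set()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen
```

With `{"netflix.com": {"symantec"}, "symantec": {"verisign"}}`, the site's full dependency set includes both the CA and the CA's DNS provider.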
SESSION: Careful What You Measure
Fast IPv4 scanning has enabled researchers to answer a wealth of security and networking questions. Yet, despite widespread use, there has been little validation of the methodology's accuracy, including whether a single scan provides sufficient coverage. In this paper, we analyze how scan origin affects the results of Internet-wide scans by completing three HTTP, HTTPS, and SSH scans from seven geographically and topologically diverse networks. We find that individual origins miss an average of 1.6-8.4% of HTTP, 1.5-4.6% of HTTPS, and 8.3-18.2% of SSH hosts. We analyze why origins see different hosts, and show how permanent and temporary blocking, packet loss, geographic biases, and transient outages affect scan results. We discuss the implications for scanning and provide recommendations for future studies.
On Landing and Internal Web Pages: The Strange Case of Jekyll and Hyde in Web Performance Measurement
There is a rich body of literature on measuring and optimizing nearly every aspect of the web, including characterizing the structure and content of web pages, devising new techniques to load pages quickly, and evaluating such techniques. Virtually all of this prior work used a single page, namely the landing page (i.e., root document, "/"), of each web site as the representative of all pages on that site. In this paper, we characterize the differences between landing and internal (i.e., non-root) pages of 1000 web sites to demonstrate that the structure and content of internal pages differ substantially from those of landing pages, as well as from one another. We review more than a hundred studies published at top-tier networking conferences between 2015 and 2019, and highlight how, in light of these differences, the insights and claims of nearly two-thirds of the relevant studies would need to be revised for them to apply to internal pages.
Going forward, we urge the networking community to include internal pages for measuring and optimizing the web. This recommendation, however, poses a non-trivial challenge: How do we select a set of representative internal web pages from a web site? To address the challenge, we have developed Hispar, a "top list" of 100,000 pages updated weekly comprising both the landing pages and internal pages of around 2000 web sites. We make Hispar and the tools to recreate or customize it publicly available.
SESSION: False Advertisement
"Incentivized" advertising platforms allow mobile app developers to acquire new users by directly paying users to install and engage with mobile apps (e.g., create an account, make in-app purchases). Incentivized installs are banned by the Apple App Store and discouraged by the Google Play Store because they can manipulate app store metrics (e.g., install counts, appearance in top charts). Yet, many organizations still offer incentivized install services for Android apps. In this paper, we present the first study to understand the ecosystem of incentivized mobile app install campaigns in Android and its broader ramifications through a series of measurements. We identify incentivized install campaigns that require users to install an app and perform in-app tasks targeting manipulation of a wide variety of user engagement metrics (e.g., daily active users, user session lengths) and revenue. Our results suggest that these artificially inflated metrics can be effective in improving app store metrics as well as helping mobile app developers to attract funding from venture capitalists. Our study also indicates lax enforcement of the Google Play Store's existing policies to prevent these behaviors. It further motivates the need for stricter policing of incentivized install campaigns. Our proposed measurements can also be leveraged by the Google Play Store to identify potential policy violations.
This work presents a large-scale, longitudinal measurement study on the adoption of application updates, enabling continuous reporting of potentially vulnerable software populations worldwide. Studying the factors impacting software currentness, we investigate and discuss the impact of the platform and its updating strategies on software currentness, device lock-in effects, as well as user behavior. Utilizing HTTP User-Agent strings from end-hosts, we introduce techniques to extract application and operating system information from myriad structures, infer version release dates of applications, and measure population adoption at a global scale. To deal with loosely structured User-Agent data, we develop a semi-supervised method that can reliably extract application and version information for some 87% of requests served by a major CDN every day. Using this methodology, we track release and adoption dynamics of some 35,000 applications. Analyzing over three years of CDN logs, we show that vendors' update strategies and platforms have a significant effect on the adoption of application updates. Our results show that, on some platforms, up to 25% of requests originate from hosts running application versions that are out-of-date by more than 100 days, and 16% by more than 300 days. We find pronounced differences across geographical regions; overall, less developed regions are more likely to run out-of-date software versions. Still, for every country, we find that at least 10% of requests reaching the CDN run software that is out-of-date by more than three months.
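For well-formed User-Agent strings, the extraction step reduces to pattern matching on a product token; real UA data is far messier, which is why the paper needs a semi-supervised method. The regex and application name below are illustrative:

```python
import re

# Illustrative pattern for User-Agents shaped like "AppName/1.2.3 (...)".
UA_RE = re.compile(r'^([A-Za-z][\w.-]*)/(\d+(?:\.\d+)*)')

def parse_user_agent(ua):
    """Extract (application, version) from a User-Agent string, or None."""
    m = UA_RE.match(ua)
    return (m.group(1), m.group(2)) if m else None
```

Once application and version are extracted, joining against known version release dates yields the per-host out-of-date age the study reports.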
The rapid growth of online advertising has fueled the growth of ad-blocking software, such as new ad-blocking and privacy-oriented browsers or browser extensions. In response, both ad publishers and ad networks are constantly pursuing new strategies to sustain their revenues. To this end, ad networks have started to leverage the Web Push technology enabled by modern web browsers. As web push notifications (WPNs) are relatively new, their role in ad delivery has not yet been studied in depth. Furthermore, it is unclear to what extent WPN ads are being abused for malvertising (i.e., to deliver malicious ads). In this paper, we aim to fill this gap. Specifically, we propose a system called PushAdMiner that is dedicated to (1) automatically registering for and collecting a large number of web-based push notifications from publisher websites, (2) finding WPN-based ads among these notifications, and (3) discovering malicious WPN-based ad campaigns.
Using PushAdMiner, we collected and analyzed 21,541 WPN messages by visiting thousands of different websites. Among these, our system identified 572 WPN ad campaigns, for a total of 5,143 WPN-based ads that were pushed by a variety of ad networks. Furthermore, we found that 51% of all WPN ads we collected are malicious, and that traditional ad-blockers and URL filters were mostly unable to block them, thus leaving a significant abuse vector unchecked.