Realistic BGP Traffic for Test Labs Olaf Maennel and Anja Feldmann Saarland University, Saarbršucken {olafm,anja}@net.uni-sb.de ABSTRACT 1. INTRODUCTION This paper examines the possibility of generating realistic New features (e.g., fair queuing), new components (e.g., routing tables of arbitrary size along with realistic BGP new router architectures, new software versions), are incor- updates of arbitrary frequencies via an automated tool de- porated into the Internet every day. Ideally each component ployable in a small-scale test lab. Such a tool provides the should be tested for its correctness and evaluated for its ef- necessary foundations to study such questions as:the limits fectiveness in a test environment before it is deployed in the of BGP scalability, the reasons behind routing instability, network. Unfortunately the ability to test many features is and the extent to which routing instability influences the limited by the simplicity of current test setups. Typical test- forwarding performance of a router. beds consist of a small number of routers and test-traffic gen- We find that the answer is affirmative. In this paper we erators, e.g., IXIA [1] from Ixiacom, Chariot [2] from Net- identify important characteristics/metrics of routing tables IQ, TeraRouter Tester [3] from Spirent Communication and and updates which provide the foundation of the proposed RIG [4] from Arsin. Some [2] are only capable of generating BGP workload model. Based on the insights of an extensive test traffic, others [1, 3, 4] can also generate routing protocol characterization of BGP traffic according to such metrics traffic, but all of them suffer a severe shortcoming:the traf- as prefix length distributions, fanout, amount of nesting of fic they generate is not necessarily consistent with the traffic routing table prefixes, AS path length, number and times in the Internet. For example test-traffic often does not re- between BGP update bursts and number and times between flect temporal variability as captured by self-similarity nor BGP session resets, etc., we introduce our prototype tool, does it capture the full range of IP addresses. While some RTG. RTG realizes the workload model and is capable of gen- commercial tools support routing protocols their abilities erating realistic BGP traffic. Through its flexibility and pa- are limited to the basic operations:propagation of simple rameterization RTG enables us to study the sensibilities of updates and participation in the exchange of routing tables test systems in a repeatable and consistent manner while or synchronization of topology databases. This is at least still providing the possibility of capturing the different char- partly due to our limited understanding of the dynamics of acteristics from different vantage points in the network. routing protocols, e.g., [5, 6, 7]. Therefore it is currently im- possible to, e.g., recreate complex but realistic BGP (Border Gateway Protocol [8, 9, 10]) routing instabilities as observed in the Internet in a test-lab, except by replaying a captured Categories and Subject Descriptors trace. C.2.2 [Computer Communication Networks]:Routing BGP controls the routing between autonomous systems Protocols (AS). Recent work by researchers [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 5], and the standardization body, IETF (espe- cially the working groups on Inter Domain Routing (idr) [21, General Terms 22] and on Benchmarking Methodology (bmwg) [23, 24]) have shown that BGP's dynamics are poorly understood. Measurement, Design, Performance On the other hand for network operators it is crucial to un- derstand BGP's dynamics, as can be seen by the numerous Keywords presentations and panels on BGP at the network operator forums, e.g., NANOG [25]. In this work we set out to BGP, Workload * identify a structure in BGP traffic * characterize the structure using actual measurements * exploit the structure for a BGP workload model Permission to make digital or hard copies of all or part of this work for * propose a tool, RTG, to realize the workload model personal or classroom use is granted without fee provided that copies are * and show, in parts, that RTG can create realistic BGP not made or distributed for profit or commercial advantage and that copies traffic. bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific In short the goal of this paper is two-fold:first to identify permission and/or a fee. SIGCOMM'02, August 19-23, 2002, Pittsburgh, Pennsylvania, USA. and characterize BGP traffic and therefore contribute to the Copyright 2002 ACM 1-58113-570-105/02/0008 ...$5.00. basic understanding of BGP dynamics and second to repro- duce realistic BGP traffic, using our tool RTG, in a small peers have to exchange their BGP routing tables. This is test-lab consisting of only a few physical components. done by sending BGP updates for each prefix in the routing Such a tool will allow us to explore BGP in many differ- tables. Each router receiving a BGP announcement applies ent ways:establishing basic BGP implementation regression some local policy regarding accepting the update, adds the tests, testing BGP implementation features, finding good update to its BGP routing table (possibly replacing an al- settings for the many BGP parameters (especially the BGP ready existing route for this prefix-peer-combination) and, timers), testing BGP's scalability, testing the interactions of if the "best" route changes, updates the forwarding table. BGP's routing table updates with packet forwarding, exper- In a next step, the router applies its outbound policy to imenting with changes in the BGP workload (e.g., different the new best route and, after potentially rewriting some at- vantage points or changes to the protocol), understanding tributes, sends the update to its other peers. Each router the reasons for BGP instability, etc. But RTG is not just use- receiving a BGP withdraw, deletes the entry for this pre- ful for understanding BGP traffic, its scalable routing table fix from the peer's BGP table. It possibly calculates a new models are also useful for packet lookup and classification best route to replace the withdrawn route, forwards the up- algorithm designers. date to the other peers and updates its forwarding table. RTGs approach differs from black-box approaches, that just If no announcement or withdraw is sent for a specific time replay a trace, in that it captures the structure of BGP up- period BGP uses keepalive messages to determine if the ses- dates and their imposed changes. It is highly configurable, sion is up or down. If a router notices a session as down parameterizable and scalable and therefore allows insights the corresponding routes have to be deleted from the table into the reasons behind certain behaviors. In contrast our and updates, either announcements of alternative routes or ability to scale a trace and/or adapting it to a different sce- withdraws, have to be sent to the other BGP peers. nario is rather limited. In addition traces are usually con- BGP updates are limited by timers:e.g., the Min-Route sidered proprietary. But tools, such as RTG, can be used by Advertisement Interval timer [31] limits the number of up- people at different locations and companies, that normally dates for each prefix/session for each peer to one every x could not share data to normalize their workloads [26]. But seconds. (A typical value for x is 30 seconds.) Routing up- maybe most importantly the tool highlights how well we un- dates that flow through a network can cause other updates derstand the measured data. Can we capture the essential to be generated, e.g., consider the scenario shown in Fig- pieces in a workload model so that we can reproduce reality ure 1. AS1 has added a prefix P and is therefore sending based on a modeled structure? a BGP update to AS2 and AS3 for P with AS path:AS1. Our model-based BGP reference stream is designed around This update is received by AS2, added to the routing tables, the notion of a workload model that, we believe, captures and sent to AS4, since AS2 has not sent an update to AS4 the structure of BGP traffic in a similar way as other work- within the last 30 seconds. AS4 receives the update, adds load models, such as SURGE [27] or tcplib [28, 29], capture the prefix to its routing table and forwards the update to the structure and characteristics of Web or TCP traffic. In AS5. AS3 also receives the update and adds it to its rout- this paper we ing table, but instead of sending the update immediately to * explain the characteristics we decided to include in AS4, AS3 has to wait until the Min-Route Advertisement the model and our reasoning for including them (Sec- Interval timer expires. Once AS4 receives the update from tion 3). AS3 it realizes that this is a better path, reannounces its * present a novel characterization of BGP traffic follow- routing table entry for this prefix and sends another update ing the elements of the workload model (Section 4). for prefix P to AS5. In this rather simple example AS1 orig- * discuss how the workload model is used by RTG to gen- inated one update for prefix P, yet AS4 is originating two erate BGP traffic (Section 5). updates for the same prefix. More general a single update * demonstrate with examples that RTG generates realistic originated by some AS can cause a sequence of updates, BGP traffic (Section 6). called update sequence, to be observed at some other AS in * summarize our experience and suggest future research the Internet. directions (Section 7). To further limit the number of updates route flap damp- ing [32] has been introduced. Route flaps can be caused Note that RTGs configuration files can either be automati- by administrative changes, such as additions and removals cally generated via BGP traffic analysis or manually derived of network interfaces and network links, or administrative or any combination of the two methods. changes to BGP session characteristics, or due to session resets due to link failures or transport layer connectivity 2. BGP BACKGROUND failures and other scenarios [22]. Since flaps can generate The Internet is divided into a collection of autonomous update sequences that propagate through the Internet, con- systems. Routing through the Internet is accomplished on sume router resources, and may cause other routing updates, a prefix by prefix basis and depends on protocols for rout- one wants to limit the scope of route flap propagation via ing within individual ASes, e.g., EIGRP, OSPF, IS-IS, and route flap damping. The idea behind route flap damping RIP [30] and for routing between ASes [8, 31], for which is to use the history of updates associated with a prefix to BGP [9], a path-vector protocol, is the de facto standard. predict its future behavior and in this way suppress oscil- BGP advertisements are exchanged over BGP sessions be- lating routes until they have stabilized. Typical values for tween pairs of routers. route flap damping, according to the recommendations from Upon startup the routing tables and the forwarding tables RIPE [33], are suppression periods of 30 - 60 (10 - 30) min- of a router need to be initialized. For BGP this means that utes for /22 to /32 (all others) after the 4th change. Other BGP sessions to all peers of the routers have to be estab- parameter settings impose damping in a progressive fashion: lished. Once a BGP session to a peer is established, the two the more flaps the longer the suppression times. AS1 T1: P (AS1) T1: P (AS1) /HYHO GHSWK AS2 AS3 IDQRXW IDQRXW IDQRXW $6 $6 T3: P (AS3, AS1) GL GL VW V VW W GLVW GL T1: P (AS2, AS1) /HYHO GHSWK AS4 IDQRXW IDQRXW IDQRXW IDQRXW G ; T4: P (AS4, AS3, AS1) GLVW /HYHO T2: P (AS4, AS2, AS1) GHSWK AS5 IDQRXW IDQRXW Fig. 3: Example: Fraction of pre- Fig. 1: Update propagation. Fig. 2: Example: prefix vs. forest. fixes updates in session reset. 3. BGP WORKLOAD INGREDIENTS case of an internal BGP (IBGP) session two things may hap- The goal of this Section is to identify essential components pen:a sizeable set of prefixes experiences attribute changes of BGP traffic that will help us build a workload model for and these changes are propagated to EBGP or a small num- BGP traffic. With BGP dynamics one typically refers to a ber of prefixes suffer from the effects. This depends on the series of updates caused by routing changes. Accordingly internal structure of the network and the IBGP/IGP (In- we need a notion that captures the cause of routing insta- terior Gateway Protocol) configuration. For backbones of bilities and their effects:the BGP convergence process [18, tier-1 providers the latter should affect no2 prefix or a small 34, 35] that they create and the changes in the BGP table number of prefixes3. If a sizeable fraction of the prefixes are that they impose. To understand the changes we need a affected the session AS is the creator, otherwise we treat baseline that captures the basic structure/hierarchy of the the updates as prefix additions/deletions or prefix policy prefixes and their attributes1 in the BGP table. In general changes discussed later. the set of AS path attributes reflects the interconnectivity The second case equals the first one since historically of the various ASes and their peering policies. But from changes to the BGP session parameters and the local fil- the viewpoint of a single BGP peering session the AS path tering policies did not take effect until after the BGP ses- is the ingredient that captures correlations between routing sion has been cleared using a hard reset. Using soft resets updates for different prefixes as well as for the same prefix. it is not necessary to tear down the BGP session. Yet if Other attributes such as MED and communities reflect the the parameter changes affect a sizeable set of the prefixes external policy of the peering AS. using this peering session, a sizeable set of updates will be We propose to capture the cause of a routing instability by created by the two peering ASes or the single AS. Therefore the notion of an instability creator and the temporal charac- this case is not distinguishable from the previous one and teristic of the resulting sequence of updates by the notion of the two peering ASes are or the AS or the prefixes are an instability burst. To capture the BGP table we propose considered to be the instability creators. If only a small set to view the set of prefixes as nodes in a graph, where edges of the prefixes is affected we treat the created instabilities capture the structure (the nesting) of the prefixes. Cor- as prefix policy changes. respondingly we propose to view the prefixes in the BGP A link failure/repair of a peering link more or less im- table via a prefix forest. Instead of modeling the AS topol- plies that the corresponding BGP session is torn down/re- ogy [36, 37] and their peering policies [38, 39, 40] explicitly established. Therefore we do not need to consider this case we propose to focus on AS path properties as seen via a sin- separately. A link failure/repair of a backbone link is dif- gle peering session. After all we are looking for a workload ferent. It does not cause an IBGP session reset but might model that can stimulate a system under test/router or a change some IGP path cost and therefore it can create some simulation but not a full BGP simulation/emulation as for number of updates. But most of the time only a small set example provided by SSFNet [41]. To capture the correla- of the prefixes is affected and we again treat this as a prefix tions within an instability burst we focus on the attribute policy change. If an access link fails only prefixes connected changes between updates for the same prefix. via the access link will be affected. Most of the time this will be a small number and therefore we can again tread this Instability creator as prefix addition/deletion or policy change. Routing instabilities can be caused by an AS or a prefix in In the last two cases the instability creator is the prefix one of the following ways: rather than any AS. The distinction between these two cases is that additions and deletions of network prefixes occur only * BGP session establishment/teardown/reset at the originating AS, while policy changes, e.g., changes to * BGP session parameter change, including local filter- the AS path, the communities, etc., can happen anywhere ing policy changes as well as misconfigurations along the AS path. * link failure/repair In summary, we consider an AS or two ASes as the insta- * addition/deletion of network prefixes bility creator if some sizeable fraction of the prefixes using * prefix policy changes. this AS are involved in updates within a reasonable short time period. Otherwise the prefix itself is responsible for In the first case it is easy to identify the instability cre- the instability. In this way we are able to capture the cor- ator/creators. The two peering ASes are the instability creators if an external BGP (EBGP) session is struck. In 2For example if IBGP with route reflectors is run on all backbone routers and each router is peering with at least 1Typical BGP attributes are the AS path, MED, communi- two route reflectors. ties, etc. 3For example with a fully meshed IBGP configuration. updates interarrival time attributes changes update interarrival time # of originating routes bursts duration # of transiting routes # of updates routes within the IP address range AS path length possible interarrival time prefix length # of unique ASes on AS path session duration depth # of duplicate ASes on AS path resets # of updates fanout position duplicate ASes on AS path # of prefixes distance distance of ASes to peer Tab. 1: Metrics: BGP updates. Tab. 2: Metrics: Routing table. Tab. 3: Metrics: Attributes. relations between updates for different BGP prefixes. Note subset of the address space. A prefix S (son) is a child of that human misconfigurations of BGP [43] can be expressed another prefix F (father) if no other prefix exists that covers either as instabilities caused by an AS or by a prefix. This a larger address than S but a smaller address space than F. only depends on the number of affected prefixes. A possible root node for all prefixes corresponds to the full address space (the default route:0.0.0.0/0). If the default Instability bursts route is present the table corresponds to a tree, otherwise An instability creator may generate several instability events. to a forest. Figure 2 shows an example of a set of prefixes For example a prefix instability creator associated with a and their forest. flapping link to a single homed customer would create a Once we view prefixes as forest we can use graph termi- withdraw, followed by an announce, followed by a withdraw, nology to describe its properties. The fanout of a prefix is followed by an announce, etc., while an AS instability cre- the fanout of its node in the forest, which is the number of ator associated with a session reset to a single homed cus- its children. Intuitively, the fanout of a prefix specifies how tomer AS will create instability events for all prefixes origi- many prefixes are more specific than a given prefix. The nated by the AS. Each instability event is an update which depth of a prefix is the depth of the corresponding node in may or may not be observable in the measured data. Only the forest, which is the number of ancestors on the path if the AS hop distance is one we can expect to see all insta- from the node to the root of its subtree (including the root bility events. The larger the distance between the creator node). Intuitively the depth of a prefix specifies how often and the observer is the more likely it is that an intermediary this prefix is a more specific prefix of another prefix. The will have an alternative way of reaching the prefix. Accord- distance of two prefixes, whose nodes in the forest are son ingly the intermediary may or may not relay the original and father, is the absolute difference of their prefix mask instability event and the related BGP updates. Therefore lengths. Intuitively the distance specifies how much more the observer will note a chain of updates, called an update specific a prefix is. sequence, that are triggered by the original instability event. The prefix structure in BGP tables reflect address allo- Each observed update sequence captures the set of updates cation, aggregation and traffic engineering policies in the created by the route convergence process and may not con- Internet. These policies have led to dependencies between tain the original instability event update. We refer to the the prefixes which is reflected in the structure of the prefix total resulting set of updates as an update burst. forest. Prefixes in the same subtree of the prefix forest are Ideally we would like to build our workload model around more likely to be correlated in terms of attribute values than instability event updates and update sequences. Unfortu- two random prefixes. Actually, this is one of the reasons why nately distinguishing between instability event updates and the prefix structure influences the memory needed for stor- related updates is an unresolved problem [7]. But a second ing the BGP tables on routers. A better understanding of look reveals that to a system under test it does not matter the prefix structure may lead to better packet lookup and if the update burst is the result of n or m instability events. classification algorithms [45]. What matters is the number of updates it has to handle and the relationship between the updates. Therefore we propose AS path properties to build the workload model around the notion of update The AS path properties that are important for capturing bursts. the correlations between routing instabilities as observed by Using this terminology we can say that each instability a peer include the properties of the ASes themselves and the creator is either generating a single update burst, in case peering policies reflected in the path. The AS properties in- of a prefix, or a set of update bursts, in the case of an AS. clude the distributions of the number of originated prefixes For example we can express BGP protocol divergence [5, 22, and transiting prefixes. These have been shown to be con- 44] for a prefix as a single update burst that lasts for the sistent with heavy-tailed distributions [36, 46, 47]. A small duration of the workload (or until it is fixed) and consists of number of tier-1 ISPs provide transit for a huge number of a large number of updates. prefixes, while a huge number of customers provide transit Prefix forest for none, and a sizeable number of tier-3 ISPs provide transit for some number of customers. Routing updates are applied to and may change the existing In general the peering policies between all ASes deter- routing table. Each routing/forwarding table consists of a mine the AS level topology of the Internet. From the view set of prefixes. In abstract terms prefixes can be viewed as point of a single router this general graph is restricted to an, nodes in a forest. Each node covers a certain address space at least ideally, directed acyclic graph (DAG) of the BGP and a prefix is a descendant of another prefix if it covers a announcements in its routing table (ignoring replicated AS RRC:1-E RRC:1-D new reconvergence 1.2 RRC:1-C duplicate 4-way RRC:1-C RRC:1-B flapping > 4-way RRC:1-D 500,000 RRC:1-A 1.0 RRC:1-E 100 LISP:1-A 0 0.8 LISP:1-B 300,000 08 percent 0.6 number of updates 06 probability density 0.4 0.2 100,000 204 0 0 0.0 01/14/02 01/16/02 01/19/02 01/14/02 01/16/02 01/19/02 1 5 10 50 100 500 5,000 time time distance Fig. 4: # of updates for four peers. Fig. 5: Relative # of updates: Fig. 6: # updates between up- x-way change. dates with same set of attrib.. entries4). From the view point of a workload model we do cific test-lab experiment, e.g., communities and MED, if the not want to include the whole set of policies and the full router under test ignores these values. On the other hand topology. Rather we want to capture the important depen- one might want to study in a test-lab what would happen if dencies. Here we point out an interplay between the number the peer starts to export community and MED attributes. of prefixes that an AS is transiting and its position in the In general we propose to distinguish announcements from DAG. The default policy for a router peering with a tier-1 withdraws, and for consecutive announcements we either ISP and a small local ISP is that a large number of its best note the kind of change or the value change. The distinction routes will be via the tier-1 ISP and only a small number via is necessary in order not to disturb the other abstractions. the small ISP. If the router is only peering with the small For example for changes to the AS path it may not matter ISP or it prefers the small ISP then a huge number of its which AS is added to/deleted from the path. What mat- best routes will be via the small ISP. Instead of incorporat- ters is that the length is increased/decreased/constant. On ing all BGP policies and all BGP peering agreements into the other hand a community encodes a certain policy of the our workload model we propose to "just" incorporate the AS and one might want to keep the meaning of the specific effects as observed by a BGP peer. This implies that we value. are not interested in which AS is peering with which other It is important to consider attribute changes not just be- AS. Rather we are interested in the distance of the ASes tween two consecutive updates but over some number of up- from the peer and the number of originating and transiting dates. This is especially true within update bursts. Overall routes. The latter matters since routes are coalesced at in- we note that the kind and the number of attribute changes termediate routers of the DAG and aggregated routes may capture some aspects of the dynamics of the convergence be added. The richness of the connectivity does not get lost process and therefore the latency of route convergence. An- by considering just the distances either. The fanout of the other point that we need to deal with when considering DAG is captured in the number of ASes at a certain distance. multiple updates for the same prefix is timing. How much In addition the number of nodes at each distance limits the time separates two consecutive updates and how much time number of alternative paths that may be explored by an up- passes until two updates with the same attributes are ob- date for a prefix originated by a distant router during route served. The latter corresponds to the time until the original convergence. Traffic engineering and routing policies are re- route has been restored. To understand how many updates flected in the announced routes, AS replications on the AS are involved before a route gets restored we propose to use path, and other attribute values. the notion of an n-way change. An n-way change refers to In summary, we do not consider the full AS topology. a set of n + 1 consecutive updates, where the last and the Rather we propose to use the following ingredients in our first updates are the first updates with the same attributes BGP workload model:position of ASes in the BGP DAG, values. distribution of transiting/originated prefixes and distribu- Overall attribute changes and update burst are closely tion of AS path replication. related to each other. Update burst capture more of the temporal characteristics while attribute changes capture the Attribute changes structural relationships between updates. Whenever we create an update for some prefix we need to decide if this is a new prefix or which attribute and in what 4. CHARACTERIZATION fashion the attribute is to be changed. Some attributes are almost fixed, e.g., originator, others reflect the policies of the Having discussed the key ingredient of our workload model peer, e.g., communities and multi exit discriminator (MED), we need to understand the probability distributions needed others reflect the convergence process, e.g., AS path and for the workload model. Contrary to other areas, including community. Which attribute changes we want to consider Web [48], telnet [28, 29], etc., the statistical properties of depends on the test-lab scenario that we have in mind. Cer- many of our workload ingredients have not been character- tain test-lab scenarios might imply certain attribute values. ized before. In this paper we characterize both dynamic For example the next-hop attribute is fixed for external BGP BGP traffic (Section 4.2), e.g., attribute changes/update sessions. Other attributes might be uninteresting for a spe- burst/session resets, as well as static BGP tables (Section 4.3, 4.4), e.g., prefix structure, AS path. While the ingredients of 4Replication is a traffic engineering instrument used to make the workload model are derived top-down the characteriza- a path less desirable. tion has to proceed bottom-up. This implies that our anal- 0.6 Path Policy Path Policy Path/Policy Policy/Att Path/Policy Policy/Att consecutive 0.5 new Path/Policy/Att Att Path/Policy/Att Att fapping Path/Att neither Path/Att neither 100 100 0.4 > 2-way and dups 80 80 0.3 percent 60 percent 60 0.2 probability density 40 40 0.1 20 20 0.0 0 0 10 1,000 100,000 01/14/02 01/16/02 01/19/02 01/14/02 01/16/02 01/19/02 interarrival time time time Fig. 7: Interarrival time: x-way Fig. 8: Relative # of updates: Fig. 9: Relative # of updates: change. attribute change. attribute change. ysis starts with the dynamics of the updates, then moves influences the frequency as well as the kinds of observed up- to sets of related updates, followed by the prefix forest, and dates. Third, all peers seem to show an overall similar be- ends with the AS path. havior in terms of update rates, except during major peaks. 4.1 Data sets Fourth, we observe few withdraws. Fifth, most announce- ments change some BGP attribute, especially after session Our characterization work is based on raw external BGP resets with the collector have been removed. This confirms routing table dumps and update traces that we obtain from that the number of pathological instabilities (Labovitz et Ripe [49], SaarGate, a local ISP [50], Routeviews [51] and al. [14]) has been reduced substantially. Merit [52]. Throughout this section we only present re- While consecutive duplications are not all that common, sults in an exemplary fashion for the following raw data duplicate updates (same set of attributes) are rather fre- sets. BGP update traces: RRC:1 refers to the trace quent, due to flapping, etc. A flapping prefix using some from RIPE's Remote Route Collector (RRC00) [49] (from route may cause the announcement of a new route, followed 01/14/02, 1am to 01/20/02, 1:10am). It consists of 577 by the old route, followed by the new route, . . . . To char- BGP update files with 8, 442, 000 updates from 13 differ- acterize the process of convergence we want to know how ent peering session, including Tier 1 ISPs and major Euro- many updates appear before the same update is repeated. pean ISPs. LISP:1 was gathered via two peering sessions To capture this we propose to use the concept of new, dupli- with a local ISP, SaarGate (12/23/01, 10:05pm to 12/28/01, cate, flapping, re-convergence, and n-way change. Based on 1:05am). It contains roughly 959, 000 updates in 794 update the count of intermediary updates we call an update a new files from two peering sessions. A lower bound on the num- change, if this is the first time an update for a prefix with ber of missing updates is estimated by applying the updates this set of attributes is seen. It is a duplicate, if it is the same to the starting BGP routing table and computing the differ- update as the previous update for this prefix. An update is ences between the resulting and the final routing table. We a flapping change, if there is one update in between two up- found 44 (4446) missing updates in LISP:1 (RRC:1). BGP dates with the same set of attributes. It is a re-convergent routing table dumps: RRC:2 refers to all BGP table change, if there are two updates in between and an n-way dumps, every 8 hours, from RIPE's RRC00 (from 12/31/01, change, if there are n-1 updates in between. Figure 5 plots 11:36pm to 01/08/02, 3:50pm). LISP:2 refers to all BGP the stacked relative distribution of updates over time table dumps from the local ISP during the same time pe- according to this classification for data set RRC:1 and peer riod. Depending on the peer a BGP routing table consists of C, starting from new at the bottom to > 4-way at the top. 90, 300 up to 109, 200 entries. To eliminate artifacts due to We first note that even during the later parts of the week routing table fluctuations we calculate all table statistics for new attribute sets are introduced. Even though we observe each table and then consider the mean over all table dumps. only 0.4% new prefixes, 20.8% new attribute sets are intro- 4.2 BGP updates duced. While we observe a trend towards a smaller number of new changes, it is apparent that during certain time peri- The purpose of this section is to characterize BGP dy- ods spikes of new attribute sets are observed. During most namics. Accordingly we start with the relationship between time periods we observe few duplicates, and a large fraction two updates for the same prefix then move onward to update of flapping prefixes (29.7% of the total). The fraction of up- bursts and finally propose a method for identifying session dates that observe more than 2 updates in between is rather resets. In summary, we characterize BGP updates with re- substantial (33.1% of the total). spect to the metrics shown in Table 1. Figure 6 shows the density of the logarithm5 of the BGP updates: Figure 4 shows the number of updates number of updates in between reoccurrence of the for each two-hour period for four peers for the week long same updates for peers C, D, E and A,B of the RRC:1 and data set RRC:1: First, some events, such as a failure of the the LISP:1 data set. The fact that the maximum number collection machine, e.g., the first large spike, relative to time, of updates in between is larger than 1, 000 and that we need can create updates that effect all peers. Other events can to plot this on a logarithmic scale is in itself rather amaz- create a large number of updates that influence some subset ing. Overall this indicates that at times the time for routing of the peers. To eliminate artifacts due to errors in the data collection process we eliminated all updates caused by re- 5Coupled with a logarithmic scale on the x-axis, plotting sets of sessions with the collector. Second, peers experience the density of the logarithm of the data facilitates direct different numbers of updates during the same time period. comparisons between different parts of the graphs based on This indicates that local peering policies and peer location the area under the curve. 0.30 RRC:1-C 0.5 follows attrib change RRC:1-C RRC:1-D RRC:1-E 0.25 follows no change -10-10 RRC:1-E 0.4 follows failure/repair LISP:1-B 0.20 -2-2 0.3 0.15 -3-3 0.2 probability density 0.10 probability density -4-4 log10(P[ duration > u]) 0.1 0.05 -5-5 0.00 0.0 10 1,000 100,000 5,000 20,000 50,000 200,000 500,000 11 10 10 100 100 1,000 1,000 10,000 seconds seconds u Fig. 10: Duration of bursts. Fig. 11: Burst interarrival time. Fig. 12: updates per bursts. convergence is quite long. This is consistent with the exper- (31.2%), or shorter (33.5%) path lengths (for peer C). The iments from Labovitz et al. [17]. This is also confirmed by fact that most updates contain larger or equal path length the distribution of the interarrival times of routing updates. indicates that convergence after failures and routing changes Figure 7 shows the density distribution of the loga- are dominating the update process. rithm of the time between an update and the previ- ous update as well as the interarrival time of updates with BGP update burst: After characterizing individual up- the same set of attributes for peer C of data set RRC:1. dates within their context we now move to understanding From the density for consecutive updates we can see that the the correlations between updates. Accordingly we group interarrival times of most updates is rather small, roughly 30 updates for each prefix into update bursts in the same way seconds which corresponds to a typical setting of the Min- as one groups packets into flows. If a peer sends two updates Route Advertisement Interval timer [31]. Indeed 53.1% of for the same prefix within a short time window, defined via all interarrival times are smaller than 60 seconds but big- a timeout, they are considered to be part of the same up- ger than 20 seconds, indicating that the MinAS Origination date burst. Based on the results by Varghese et al. [35], we Interval timer, with a typical value of 15 seconds, is not use a timeout value of a bit larger than one hour (4000s). influencing the spacing of updates. The probability curve Our motivation is that each update burst should summarize flattens as the interarrival time reaches values of around 15 all updates caused by one or multiple instability events and minutes, indicating that the time for most routing changes therefore capture the BGP convergence process. to take effect (except for route flap damping) is less than 15 We are interested in understanding the characteristics of minutes. Effects of route flap damping can be seen at 3 - 10 update bursts such as arrival process, duration, number of minutes (first stage) and 30min to 1 hour later stages. Inter- updates. Correspondingly Figures 10, 11 show the den- estingly more than 11.8% of all interarrival times are larger sity of the logarithm of the duration and interarrival than 12 hours, an indication for changes that remain stable. times of update bursts for peers C/D/E (B) of data sets The interarrival times between new updates and their pre- RRC:1 (LISP:1). While a specific timeout value changes vious update is quite different from the distribution for con- the curves, we found that the general characteristics do not secutive updates. Many of the new updates are stable up- change. Route flap damping explains the various spikes of dates and fewer have interarrival times less than 60 seconds the different peers at 10, 15, 30 minutes. Since each peer is and they are less likely to suffer from damping. The differ- free to use its own parameters for their routers the values can ence between the interarrival times of flapping prefixes and differ quite a bit and are biased by the last peer along the consecutive prefixes is that interarrival times of flaps are path. Yet larger damping values will prevail and accordingly much more likely to be in the 1 - 10, 30 - 60 minute range, its not surprising that all peers have a spike at 30 minutes. indicating that they are more likely to be subject to route Surprisingly the median duration of update bursts is rather flap damping. small with 113 (87) seconds for RRC:1-C (LISP:1-B) and The next question is what is changed by an update. We the 90% and 95% quantiles are less than 13.5 (15) and 19.5 distinguish between changes to the AS path and changes to (23.2) minutes. But the maximum durations are surpris- other attributes, such as community, the later are denoted as ingly large ­ they span the full trace duration. The only "Attr". Changes to the AS path that are only due to policy possible explanation is that some prefixes are constantly ex- considerations, e.g., duplication of ASes on the path, are periencing updates. called "Policy" changes; other changes to the AS path are While the distribution of the durations of the individual "Path" changes. Figures 8, 9 show the stacked relative update bursts is consistent with heavy-tailed distributions distribution of updates over time for peer C, peer D the distribution of the interarrival times is not. Figure 11 of data set RRC:1. We observe that most updates cause plots the probability density of the logarithm of the changes to the AS path. For some peers, e.g., peer D, most interarrival time distribution for peer C of RRC:1. To updates involve only the AS path. For other peers, e.g., find such interarrival times we need at least two update peer C, combinations of path changes and attribute changes bursts per prefix. Observing multiple update bursts indi- explain most updates. This depends on the policy of the cates that multiple instability events occurred that each lead peer. If the peer announces communities or other attributes, to a "stable" route. To understand how these instability that can be used to influence routing policy decisions, as in events are related we distinguish three cases: no change ­ the case of peer C, attribute changes are more prevalent, both bursts converged to an update with the same attributes as if the peer does not announce such attributes. Changes (65.4%), failure/repair ­ one burst ended with a withdraw to the AS path can either result in longer (35.3%), equal and the other with an announcement (12.4%), attribute change ­ the bursts end with announcements with different 0.14 LISP:1-A originating RRC:1-A RRC:1-A LISP:1-A transiting LISP:1-A LISP:1-A RRC:1-A originating 0.5 0.25 RRC:1-A transiting 0.10 0.4 0.20 0.3 0.15 0.06 probability probability 0.2 probability density 0.10 0.1 0.02 0.05 0.0 0.0 0.0 0 20 40 60 80 100 1 10 100 1,000 100 1,000 10,000 100,000 1,000,000 fraction of changed ASes duration interarrival time Fig. 13: % prefixes with updates Fig. 14: Duration of session resets. Fig. 15: Session reset interarr. time. attributes (22.2%). The fact that a lot of consecutive update inating and transiting the ASes. Presumably the richer the bursts end with the same update indicates that most pre- peering connectivity of an AS is, the smaller is its reliance on fixes, even if they experience an instability event, converge to any single BGP peering session and therefore the number of a main route. Note that this indicates that most instabilities best path changes. Therefore we presume that the fraction last only for a short time period and are contained within of prefixes with routing updates decreases as the distance an update burst, which validates our methodology. Some between the AS and the measurement point increases since failures cannot be repaired automatically and therefore in- the connectivity increases with the distance. Depending on volve two update bursts (captured by failure/repair). Here the policy (export of IGP metrics via communities or MEDs) we cannot expect certain prevalent time periods which is of the ISP a session reset of an IBGP session may not change supported by the interarrival time distribution. For update the best BGP path or may change a substantial fraction of bursts that result in different attribute values we notice two the best paths. This either causes only a small number of spikes. Both spikes contain a large percentage of updates updates or a sizeable number of updates. For example in the with an AS path of the same length. But the relative frac- example shown in Figure 3 an IBGP session reset will cause tion of updates with longer paths is bigger in the first spike no updates as long as no advanced BGP features are used. than in the second one. This is not surprising either since In summary, we presume that most session resets result in after a failure one can expect an alternative route announce- routing updates for a significant fraction of the prefixes of ment. Interesting features are the peaks in the distributions the AS/ASes involved in the reseted BGP session. at time periods, 10-14 hours and 1-2 days, known from hu- To understand the magnitude of these fractions we want man behavior periods. The interarrival time distribution is to compute, for each peer and for each AS, the fraction of not consistent with a heavy-tailed distribution. In contrast, updated prefixes associated with this AS within some small, the distribution of the number of updates within each appropriately chosen, time period. We approximate this update burst is consistent with heavy-tailed distributions. value by computing, for each update and for each AS on the (Figure 12 plots the complementary cumulative distribution AS path, which fraction of the prefixes transiting/originated (ccdf) of the number of updates in a update burst.) Some by that AS has changed during a window of plus/minus 3 bursts contain more than 10, 000 updates and some bursts minutes relative to the update timestamp. Averaged over have more than 1, 000 different updates. Furthermore the 5 minute periods this approximates the fraction of changed duration of a burst is strongly correlated to the number of originating/transiting prefixes for each AS. Figure 13 plots updates (correlation > 0.92). the density of these fractions, in percentages, for peer Possible session resets: In Section 3 we argued that an C (A) of data sets RRC:1 (LISP:1). For originating ASes instability creator affects either a single prefix or many pre- the distribution is rather bimodal, either close to 80% or fixes from the same AS. Indeed we argued that in the latter smaller than 20%. Therefore we only call events, where at case the updates are or look like session resets. This also is least 80% of the prefixes originating at an AS saw updates, confirmed by the presumption that a significant part of the a possible session reset, in accordance with the above rea- routing table updates are results of BGP session resets [21]. soning. Due to the high degree of peering at most transit A reset causes the two involved peers to update their BGP ASes, each transit AS session reset is bound to only affect routing table and select a new best path for each prefix. If a much smaller fraction of the prefixes. This is reflected in this results in a new "best path"/"no path" for a prefix, an the distribution which shows small peaks around 1/4, 1/3, announcement/withdraw has to be sent to all other peers. 2/3. Given the shape of the distribution we only identify A session reset on an access link connecting ISP A and an update as belonging to a possible session reset, if at least B implies that all prefixes from A are now unreachable or 20% of the prefixes associated with a transit AS experienced reachable via some other AS. A session reset of the only updates. peering link between two ISPs C and D implies updates for After identifying which updates are part of a possible ses- all prefixes that uses the ASes C and D on their AS path. sion reset, we group the updates into session resets in the But most ISPs peer at more than one location, see Figure 3. same way that we grouped updates into update bursts. We This implies that we will only see updates for prefixes whose explicitly choose a much smaller timeout value, 90 seconds, best path included the link with the failed BGP session but for session resets than for update bursts, since session resets not for all prefixes. Assuming that prefixes are evenly dis- are events localized in time. Experiments have shown that tributed across peering links, this explanation predicts up- the results are not sensitive to this specific value. Figures 14 dates at fractions such as 1/4, 1/3, 2/3 of the prefixes orig- and 15 display the density plots of the logarithm of 0.6 Class A Class A CIDR ClassA 0.25 CIDR ClassA Class B Class B 0.5 Class C/CIDR 5,000 Class C/CIDR 0.20 0.4 0.15 # routes 0.3 3,000 probability 0.10 0.2 probability density 0.1 0.05 1,000 0 0.0 0.0 1/8 50/8 100/8 150/8 200/8 250/8 0 1 2 3 4 5 0 5 10 15 20 25 network prefix depth distance Fig. 16: # of routes per IP ad- Fig. 17: Prefix depth. Fig. 18: Prefix distances. dress range. the duration and interarrival times of session resets rates of BGP routing tables, neither are we analyzing the for the peer A (A) of data sets RRC:1 (LISP:1). Surpris- long-term churn of BGP routing tables [56]. ingly some possible session resets last for a long time (> 30 To highlight the dependency of the routing table on the minutes). Note that due to the 90 second timeout this im- history of address allocation policies, Figure 16 plots how plies a continuous sequence of updates. Such updates are many prefixes exist with the same first octet of the likely caused by persistently flapping interfaces not subject IP address. Intuitively this represents the usage of address to route flap damping. The fact that quite a large fraction space within each /8 prefix6. Looking back to the days of (18.4%) of the session resets lasts longer than 90 seconds classful routing [53] class A networks should only announce (second spike), indicates that routes are propagated along a single prefix, class B networks should announce a maxi- different paths. Some follow one AS path, others follow an- mum of 254 prefixes, and class C networks may announce other longer or shorter path. Along the path they are sub- up to 64K prefixes for each first byte of the IP address. This ject to the Min-Route Advertisement Interval. This explains is clearly not the case today. For example within the former session reset durations of 90 seconds. The interarrival time class B address range we usually observe between 100 to 254 distributions show the effect of synchronization due to route prefixes. Still 23 of 45 groups within the same /8 prefixes flap damping (different route flap damping parameters are announced slightly larger number of prefixes. The variabil- in use by different ASes). After a session reset some routes ity in terms of announced prefixes within each /8 is much may be subject to damping. After some time they all will larger in the former class C address range and extreme in be eligible for new updates again. To us, the observer, these the class A/A-CIDR range. While only 41 of the 126 pos- later updates look like another session reset. Therefore it sible class A networks did announce any prefixes, there are should be noted that we only identify candidate session re- some that announce a lot of more specific prefixes, e.g., 12/8 sets. We are neither able to capture all session resets, nor (AT&T), announces over 650 more specific prefixes, or 24/8 are all captured ones actual session resets. But manual vali- (designated for data-over-cable networks), announces up to dation shows that the methodology is promising. Again the 1, 970 prefixes. Some of these more specific prefixes are used interarrival time distribution is neither consistent with an to implement the CIDR allocation strategies, others appear exponential distribution (too many dependencies), nor with to be used for routing policies such as multi-homing, traf- a heavy-tailed one. On the other hand the number of up- fic splitting/sharing, load balancing, etc. Due to the large dates within a possible session reset is, just as the number differences we study the forest metrics separately for each of (unique) updates within a update burst, consistent with address range:class A, B, C and A-CIDR. a heavy-tailed distribution, although maybe not always a Figures 17, 18 confirm this decision. The plots show the Pareto distribution. density of the depth and the distances of the pre- fixes separated according to address ranges. The depth 4.3 BGP prefix forest of the prefixes (see Figure 17) reflects how many holes are punched into the address space, i.e., how often one address The structure of the prefixes in the BGP tables reflects the block is more specific than another one. Surprisingly, we history of address allocation policies in the Internet. This have observed that more than 84 prefixes have depth 4 - 5, policy has led to dependencies between the prefixes which a rather large number. Just because a prefix is at depth 4 is reflected in the structure of the prefix forest. We there- or 5 does not imply that it is a point-to-point link (/30) or fore analyze the BGP tables according to the properties of a host route (/32). Ironically no /30 or /32 can be found at the prefix forest:fanout, depth and distances (for defini- depth 4 or 5 (indeed over 80% are nested exactly 1 time). tions of these terms see Section 3). To capture the history Rather we observe that for prefixes with large depth the dis- of classful address allocation we analyze the difference be- tances between the specific routes are rather small, resulting tween the usage of the address space [53] of class A, B, C in progressions of, e.g., /17, /19, /20, /21, /22. This sug- and CIDR blocks. Furthermore we consider the relationship gests that this technique is used for traffic engineering and between prefix length and distance in the prefix forest. In multi-homing. Punching holes is most extreme in the class summary, we characterize routing tables with respect to the C address block. metrics shown in Table 2. The forest metrics are crucial if The distribution of prefix lengths is as expected. Less we want to study how routing instabilities influence forward- ing performance (see [45, 54]). In contrast to the work by 6Note, that not all prefixes with a mask of /8 need to be Huston [55] we are not analyzing the reasons of the growth present in the routing table. 0.30 Depth 1 0.30 Depth 1 Depth 0 Depth 2 Depth 2 Depth 1 0.25 Depth 3 Depth 3 Depth 2 0.25 0.3 Depth 3 Depth 4 0.20 0.20 0.2 0.15 0.15 probability density 0.10 probability density 0.10 probability density 0.1 0.05 0.05 0.0 0.0 0.0 0 5 10 15 20 25 0 5 10 15 20 25 / 5 / 10 / 15 / 20 / 25 / 30 distance distance length Fig. 19: Prefix distances (class A). Fig. 20: Prefix distances (class B). Fig. 21: Prefix length (A-CIDR). than 3% of the prefixes are larger than 24 and more than of sub-netting within class A/B networks. 54% of the prefixes have length 24. Prefixes from class While the dependencies between the metrics, especially B are mostly /16s or /24s with some prefix length in be- on the depth of a prefix can be significant, e.g., for the tween. Class B is the only address block where another fanout, each metric behaves differently. For example Fig- prefix length, in this case /16, is more prevalent than /24. ure 21 shows the density of the prefix length for class For class C we observe mostly /24s with some number of A-CIDR for each depth. Here we observe the artifacts /19s and /20s and a smaller number of prefixes between 20 of the class A network at depth 0. But the prefix length and 24. Class A and A-CIDR (e.g., see Figure 21) prefixes distributions at other depths are very similar to each other. have relatively speaking the smallest number of /24s and the Actually, it is remarkable how little difference there is for largest number of /19s, /20s and /21s-23s. For example the depth 1 - 4 given the differences in the prefix distance dis- peak at /19 reflects a common filter policy, some ISPs filter tributions. prefixes with more specific prefix mask than /19. This forces others to allocate at least a /19 address range to be visible 4.4 AS path in the whole Internet. Instead of trying to understand the AS-graph level In- The characteristics of the prefix length and the depth of ternet topology, we need to characterize the AS path with a prefix significantly influences the distribution of the dis- respect to the ingredients for our BGP workload model. Ac- tances between prefixes. Figure 18 shows the density of cordingly we consider the AS-graph from the view point of the distributions of the distances for the various ad- the peer, a BGP DAG, and capture the position of each AS dress blocks. As expected the distance 8 dominates the by the distribution of the distances of ASes from the peer. distribution with more than 20% of the total. While overall In addition we need to consider the number of originated distances of less than 8 are very frequent, 1, 2, 3 with roughly and transiting prefixes7. Furthermore we explore some of 5% and 4, 5, 6, 7 with roughly 8% each, are the most promi- the basic characteristics of the AS path, e.g., the length of nent distances for class A (class A-CIDR) prefixes are 16 (12 the AS path, the number of unique ASes on the AS path, the and 14) reflecting the different policies. positions of replicated AS on the AS path, and the number With regards to the fanout we have observed (not shown) of replicated AS entries. This results in a characterization that A-CIDR has the largest fanout followed by class C and of the AS path according to the metrics shown in Table 3. A. Class B prefixes in general have smaller fanout than other Since session resets are a prevalent reason for routing up- prefixes. Overall the tails of the fanout distributions are dates we need to understand how many prefixes are transit- consistent with heavy-tailed distributions such as the Pareto ing or are originated by an AS. Figure 22 plots8 the den- distribution. This indicates that the density distribution sity of the logarithm of the number of prefixes orig- might be biased by a few providers using a large fanout. inated/transiting an AS for the data sets RRC:2 and Naturally, each of the forest metrics does not just depend LISP:2. It is not surprising that the distributions from dif- on the address block of the prefix, but also on the depth of ferent peers are almost equal, because default-free routing the prefix in the tree. For example, Figures 19 and 20, show tables contain most of the prefixes of the Internet. A much the density of the distance of the class A and class larger number of ASes are providing connectivity to only a B prefixes for each prefix depth. While the distribu- very small number of prefixes. Well connected peers, closer tions for the prefixes at depth 3 are similar, the distribution to the center of the topology, provide connectivity for a lot of distances for depth 1 and 2 are quite distinct. For both of other prefixes and ASes. This is reflected in the tails of classes depth 1 is dominated by /16 or rather /8, which the distributions which are consistent with heavy-tailed dis- brings us to /24 prefixes, the most specific prefixes that are tributions (not shown). Still, for a workload ingredient, the allowed by most ISPs. The differences are most likely re- shape of the body of the distribution is at least as important sults of folks with class A prefixes taking more advantage of as the tail of the distribution. Note that the distribution of the available address space and using intermediary aggrega- the transiting ASes reflects the coalescing of AS paths at tion levels. This is reflected by Figure 17 which shows that intermediary routers. prefixes in class A are more likely to be at higher depth. As argued above we do not care about the exact sequence At the beginning the results for class B and depth 2 seem counterintuitive. But the peek at distance 11 is the result 7A prefix is transiting an AS if an entry/update for this of classless routing even within the class B address block. prefix uses this AS on its AS path. 8 Otherwise the distribution at depth 2 reflects the flexibility The dip in the originating curve is due to discretization effects. 0.6 RRC:2 transiting RRC:2-B AS path RRC:2 originating 0.5 unique AS path 0.5 LISP:2 transiting RRC:2-C 50,000 Replications AS path LISP:2 originating RRC:2-D Position replications 0.4 RRC:2-A 0.4 0.3 0.3 count 30,000 0.2 0.2 probability density probability density 0.1 0.1 10,000 0.0 0.0 0 1 10 100 1,000 10,000 100,000 2 4 6 8 10 0 5 10 15 20 number of ASes number of prefixes number of ASes Fig. 22: Distribution of prefixes per Fig. 23: Distance of AS from Fig. 24: Length of AS path. AS. peering point. of ASes on the AS path, only about locating an AS at ap- different AS path than their parent prefix (e.g., multihomed proximately the right distance from the peer. (Note the ASes with other upstream ISPs than the one responsible for distance influences the routing update latencies via interme- their address space or an AS that has switched from one diary ASes, the richness of the connectivity, etc.) Therefore provider to another while keeping its address space). De- we, for every AS, calculate the mean distance of this AS pending on the routing table data structure such common to the BGP peer in numbers of unique AS hops on the AS parts can be used to optimize memory usage. This in turn path of the prefix9. While this distance does not have to be explains some of the dependencies between router memory non-ambiguous, we observed, that it is non-ambiguous for requirements and prefix forest characteristics. 85% of the ASes. The standard deviation of the ones that are ambiguous is usually less than 0.5 and the standard de- 4.5 Summary viation relative to all ASes is less than 0.01, a rather small Our results confirm that it is possible to characterize the value. Figure 23 plots the relative frequency of the dis- proposed ingredients of a workload model via empirical prob- tances for all ASes for four of the data sets RRC:2. For ability distributions according to the metrics outlined in Ta- some peers the curves appear to be shifted by one or two. bles 1, 2 and 3. We find that some of these cannot be easily For others the distributions are more bell-shaped. This in- captured by simple, one or two parameter, distributions. dicates that connectivity characteristics depend very much Therefore, at least for the moment, we propose to either on the location of the peering point in the Internet hierar- rely on experimentally derived probability distributions or chy. This is due to the different distances to the "core of the manually edited probability distributions to instantiate the Internet", e.g., tier-1 providers. If a tier-1 provider is reach- workload. In addition, we have identified some important able within a short distance then most of the ASes will be dependencies between the ingredients, e.g., the fanout distri- reachable within a slightly larger distance due to the huge bution is dependent on the depth in the prefix tree. In such connectivity of tier-1s [57, 38]. a case we propose to use combined probability distributions In terms of general characteristics we find that while most for both. paths are short (93.5% are less than 6 AS hops) some are sizeable (0.75% are greater than 10 AS hops). Figures 24 5. RTG: ROUTING TABLE GENERATOR shows the histogram of the number of ASes on the AS path. Eliminating replicated ASes from the AS path Having identified and characterized key ingredients of BGP reduces the average AS path length from 3.5 to 3.2 affecting traffic we can, in this Section, turn to a proposal for generat- 10.5% of the AS paths. The median length of the replicas ing realistic BGP traffic and its prototype implementation, is 3. There are many short replications of 2 or 3 replicas RTG. The main idea is the possibility of generating rout- but also some rather long ones with 8, 10, or 11 replications. ing updates off-line, storing them in a file, and then feeding The position of the replications on the AS path is rather them to the system, e.g., a router under test, using a simple early (not shown). Almost all duplications appear between program that is capable of maintaining BGP sessions and position 2 and 5, indicating that this instrument is mainly sending BGP updates (see Figure 25). Accordingly the tool used in the center of the Internet. On the other hand most consists of two independent pieces:(a) a routing table gen- replications appear closer to the originator of the prefix in- erator (RTG) which generates routing tables and updates and dicating that the instrument is applied close to the edge of (b) sbgp from the Merit toolkit MRTd [58] for feeding the the network. updates to the system under test. Each entry generated by To understand the correlation between the location of a RTG is characterized by a timestamp, the originating peer, prefix in the prefix forest and its AS path we consider the the prefix and its attributes. The timestamp specifies when similarities between the AS paths of parent and sons. We the update is supposed to be issued by sbgp. find that 26.5% of all nested prefixes have the same AS RTG itself consists of three parts. The first part is re- paths as their fathers. Furthermore 20.6% of the AS paths sponsible for building a routing table and is parameterized of nested prefixes just contain additional ASes. For example in terms of size of routing table and characteristics (pre- this happens if the AS number of a multihomed AS is added fix length distribution, depth and fanout via configuration to the AS path. But 52.9% of all nested prefixes have a files). The second part of RTG associates each prefix of the routing table with a set of attributes. (This process is again 9Note, that replicated ASes are eliminated from the AS driven by configuration files.) The output of the first two path, before this metric is calculated. parts is used to instantiate the initial routing table. The DUT starting at the specified time. Each update burst is real- MRTd MRTd Peer A Peer B ized by selecting the number of involved updates and the update interarrival time according to the appropriate prob- ability distribution or it may be taken from a trace. Next Output File R TG Output File the updates are spaced within the available time frame. To Peer A Peer B instantiate a single update the attribute changes have to be selected. This is done according to the observed attribute changes:AS path changes, policy changes, or other attribute Fig. 25: Example: RTG scenario. changes. RTG advantages: RTGs approach differs from black-box ap- third part of RTG is responsible for generating the actual proaches that just replay a trace, in that it captures the routing updates and is again driven by configuration files. structure of BGP traffic. It is highly configurable, parame- RTG prefix structure: The table generation piece is emu- terizable and scalable and therefore allows insights into the lating the address allocation strategy in the Internet. The reasons behind certain behaviors. Through its flexibility and instantiation of the routing table proceeds top down, from parameterization RTG enables us to study the sensibilities of the root of each tree in the forest to the leaves. The gen- test systems in a repeatable and consistent manner while eration process starts by picking how many trees should be still providing the possibility of capturing the different char- generated. The process of generating a tree starts by picking acteristics from different vantage points in the network. On a prefix length l, followed by a network prefix of this length the other hand the ability of scaling a trace and/or adapting P/l. The requirement for the prefix is that it does not con- it to a different scenario is rather limited. Traces taken from flict with any previously chosen prefixes. This means that different vantage points in the network can have significantly P/l is a more specific prefix only within the current subtree. different characteristics and may or may not be appropriate The subtrees of P/l are generated recursively. The differ- for certain tests [23]. In addition traces are usually consid- ence is that the new prefix lengths have to be larger than ered proprietary. But tools, such as RTG, can be used by l and that the address range is limited by P/l. The prefix people at different locations and companies, that normally length and the fanout are chosen according to various empir- could not share data to normalize their workloads [26]. In ical probability distributions for each level. At the first level addition it allows us to take one characteristic from one trace the prefix is selected according to the distribution of routes and other characteristics from another trace. within the former classful address ranges and uniformly at Just consider two examples where RTG may help us:packet random for all other levels. classification performance and interactions of routing up- RTG attributes: Once a prefix has been chosen RTG selects dates and classification. Having a realistic routing table is attributes for it. The degree of choice depends on the at- crucial for the performance evaluation of the classification al- tribute. For some there is none since it only depends on gorithm, e.g., to determine the memory requirements. Also the physical test-bed setup, others are selected according the performance of some classification algorithms depends to empirical distributions, e.g., community, while another on the depth of a prefix in the tree. Each routing update attribute, the AS path, requires more care. The AS path has the potential to change the routing table and therefore attribute is essential for building the updates, since it ap- interact with the forwarding performance. How many up- proximates the underlying topology. We first generate an dates and what kind of updates are necessary to break a AS path pattern which specifies the length of the path, the router depends on the structure of the updates and the size number of duplicated ASes on the path, and the location of the table [20]. RTG allows us to explore this relationship of these duplications. This is constructed out of the proba- in a defensible fashion, instead of relying on the traces to bility distributions for the AS path length for the peer, the contain the necessary event sequences. number of duplications and the position of duplicates. In a next step this AS path pattern is filled with ASes based on 6. VALIDATION a probability distribution (transiting and originating ASes The goal of the workload generator should be to generate are constructed separately). If the prefix is nested within synthetic traffic that reflects the real workload as accurately another the attribute values are either copied, modified, or as possible. Therefore its validation is crucial for interpret- newly constructed, based on a probability distribution spec- ing the results derived from using the workloads. Ideally the ified in the configuration file. validation of the workload model is not just a verification of RTG updates: The various kinds of routing updates are gen- the parameters of the workload but also a demonstration erated in two steps. First an event log specifies which events that the performance of a system subjected to the work- are created by the instability creators. Possible events for an load is similar to the performance of the system under a AS instability creator are BGP session resets for some (ran- real workload. This is beyond the scope of this paper and dom) AS, BGP session resets at some (random) AS path will be subject of another paper which examines the behav- distance. Possible events for an prefix instability creator are ior of a router subjected to trace-based and RTG generated update bursts for some (random) prefix, or single changes workloads. for some (random) prefix. Note that this file can be auto- Our trace analysis has shown that BGP traffic has many matically generated based on the distribution of interarrival dependencies that are in part captured by our workload times of session resets and update bursts. Based upon the model. To highlight some of RTGs features we examine, in event log we construct a detailed list of updates. Each ses- an exemplary fashion, if the characteristics of the generated sion reset implies that some fraction (chosen according to a workload are consistent with the characteristics of the mea- probability distribution) of the prefixes (chosen uniformly) sured workload. For this we concentrate on derived mea- of a given or randomly chosen AS experience an update burst sures rather than distributions that are already part of RTGs 0.7 0.15 RRC:2 RTG:0 0.5 0.6 RTG:0 LISP:2 RRC:2-C RRC:2-C RTG:0 LISP:2-A 0.5 random 0.4 0.10 0.4 0.3 0.3 probability density 0.2 probability density probability density 0.05 0.2 0.1 0.1 0.0 0.00 0.0 1 10 100 1,000 10,000 0 2 4 6 8 10 12 14 16 18 20 1 10 100 1,000 10,000 updates per min distance number of prefixes Fig. 26: Distance class C. Fig. 27: Prefix dist. per AS. Fig. 28: Update rate per min.. configuration. We show one example for each of the three notion of a prefix forest. We show how to derive the dis- major components of RTG:BGP prefix forest, AS path, and tributional parameters of RTG from actual BGP tables and BGP updates. updates. Indeed we present an analysis of instability cre- BGP prefix forest: Since distances are not used as pa- ators and instability events involved in the current Internet rameterized distribution in the table construction process routing instabilities. In summary, we find that RTG is capa- we present results about their distributions in order to verify ble of generating realistic BGP traffic in the lab. the nesting, fanout and prefix length at the different levels. The development of our prototype tool is part of a larger Figure 26 shows the distribution of the prefix distances research effort of bringing the variability of the Internet into for the class C address block, for a RTG generated table test labs. The goal of the project is to study the impact of with roughly twice the number of prefixes (227,747) and for variability in a controlled environment. RTG adds an impor- the two routing tables RRC:2 and LISP:2. One can observe tant component, routing, to the existing toolset of work- good agreement even though the RTG routing table contains load generators and traffic shaping tools. A test bed with twice the number of prefixes as the original routing tables. all these components will not just enable us to answer the The later underlines some of RTGs capabilities. In addition questions stated in the introduction but experiment with, the plot also contains a routing table with IP addresses cho- evaluate, and judge most Internet components. sen randomly and prefix mask chosen according to the prefix length distribution. Here we observe a sizeable difference. Acknowledgments Checks of the other metrics show good agreements at all depths. We are in debt to Jan Bankstahl of SaarGate and the RIPE AS path: Figure 27 plots the density of the logarithm RIS project for access to the BGP data. Numerous col- of the number of prefixes per AS (this includes transit- leagues at Saarland University have provided valuable feed- ing and originating). We observe an apparent disagreement back on this work. We thank Jennifer Rexford, Andrew between the curves for the RTG generated table and the two Moore, Mark Crovella and the anonymous reviewers for their datasets for small numbers of prefixes per AS. But one just detailed and insightful comments that greatly improved the has to recall that the RTG generated table contains twice paper. This research has been partly supported by Cisco the number of prefixes as the other two tables but the same Systems and the Defense Advanced Research Projects Agency number of ASes. If we double the number of ASes as well we (DARPA), under grant N66001-00-8065 from the U.S. De- again observe good agreement. The tails of the distributions partment of Defense. Its contents are solely the responsibil- (not shown) agree rather well. ity of the authors and do not necessarily represent the official BGP updates: The analysis of Section 4.2 has highlighted views of Cisco Systems nor the Department of Defense. some of the dependencies between BGP updates. Therefore we cannot hope to find a single plot that validates this pro- 8. REFERENCES cess. Rather we pick one kind of instability creators:prefix [1] IXIA BGP Routing Protocol Emulation Software, 2002. addition/deletion which each generates an update burst. We http://www.ixiacom.com/. generate an event log with update bursts based upon the in- [2] Chariot, 2002. terarrival time distributions of update bursts. In a first step http://www.netiq.com/products/chr/default.asp. each update burst in the log file is replaced with an update [3] TeraRouter Tester, 2002. http://www.netcomsystems.com/ solutions/products/applications/pdf/TeraRouting/index.htm. burst of the same kind from the actual trace. Figure 28 [4] A. Kršamer, "RIG, A BGP Routing Instability Generator," shows the resulting rate of updates for both the actual 2002. Diploma Thesis, ETH Zšurich. update trace and the generated update trace (with twice as http://www.barman.ch/rig/. many prefixes). In a second step we generate the update [5] T. G. Griffin and G. Wilfong, "An analysis of BGP bursts according to the burst characteristics (not shown). convergence properties," in Proc. ACM SIGCOMM, 1999. Note the good agreement of the rate plot. [6] A. Shaikh, R. Dube, and A. Varma, "Avoiding instability during graceful shutdown of OSPF," in Proc. IEEE INFOCOM, 2002. 7. SUMMARY [7] T. Griffin, "What is the Sound of One Route Flapping?," This paper motivated and presented our workload model 2002. IPAM talk. [8] B. Halabi, Internet Routing Architectures. Cisco Press, for BGP traffic and its prototype realization, RTG. The work- 1997. load model is based on the key notion of a BGP instability [9] J. W. Stewart, BGP4: Inter-Domain Routing in the creator who creates correlated instability bursts via the AS Internet. Addison-Wesley, 1999. path, and effects the BGP routing, characterized via the [10] C. Huitema, Routing in the Internet. Prentice Hall, 1999. [11] B. Chinoy, "Dynamics of Internet routing information," in [35] G. Varghese, R. Govindan, R. Katz, and Z. Mao, "Route Proc. ACM SIGCOMM, pp. 45­52, 1993. flapdamping exacerbates Internet routing convergence," in [12] R. Govindan and A. Reddy, "An analysis of Internet Proc. ACM SIGCOMM, 2002. inter-domain topology and route stability," in Proc. IEEE [36] M. Faloutsos, P. Faloutsos, and C. Faloutsos, "On INFOCOM, 1997. power-law relationships of the Internet topology," in Proc. [13] K. Varadhan, R. Govindan, and D. Estrin, "Persistent ACM SIGCOMM, 1999. route oscillations in inter-domain routing," tech. rep., [37] K. Calvert, M. Doar, and E. W. Zegura, "Modeling Internet 96-631, USC/ISI, 1996. topology," in IEEE Communication Magazine, 1997. [14] C. Labovitz, R. Malan, and F. Jahanian, "Internet routing [38] L. Gao, "On inferring autonomous system relationships in instability," IEEE/ACM Trans. Networking, vol. 6, no. 5, the Internet," in Proc. IEEE Global Internet, 2000. pp. 515­558, 1998. [39] L. Gao and J. Rexford, "Stable Internet routing without [15] C. Labovitz, R. Malan, and F. Jahanian, "Origins of global coordination," in Proc. ACM SIGMETRICS, 2001. Internet routing instability," in Proc. IEEE INFOCOM, [40] B. Norton, "The art of peering: The peering playbook," 1999. 2002. [16] C. Labovitz, A. Ahuja, and F. Jahanian, "Experimental [41] "SSFNet, Scalable Simulation Framework." study of Internet stability and wide-area network failures," http://www.ssfnet.org/. in Proc. International Symposium on Fault-Tolerant [42] O. Maennel and A. Feldmann, "BGPcharacter: Tool for Computing, 1999. processing and characterizing BGP data," February 2002. [17] C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian, "Delayed NANOG 24 talk. Internet routing convergence," in Proc. ACM SIGCOMM, [43] D. Wetherall, R. Mahajan, and T. Anderson, 2000. "Understanding BGP misconfigurations," in Proc. ACM [18] T. G. Griffin and B. J. Premore, "An experimental analysis SIGCOMM, 2002. of BGP convergence time," in Proc. International [44] T. G. Griffin, F. B. Shepherd, and G. Wilfong, "Policy Conference on Network Protocols, 2001. disputes in path vector protocols," in Proc. International [19] A. Shaikh, L. Kalampoukas, R. Dube, and A. Varma, Conference on Network Protocols, 1999. "Routing stability in congested networks: Experimentation [45] A. Feldmann and S. Muthukrishnan, "Tradeoffs for packet and analysis," in Proc. ACM SIGCOMM, 2000. classification," in Proc. IEEE INFOCOM, 2000. [20] D. Chang, R. Govindan, and J. Heidemann, "An Empirical [46] W. Fang and L. Peterson, "Inter-AS traffic patterns and Study of Router Response to Large BGP Routing Table their implications," in Proc. IEEE Global Internet, 1999. Load," tech. rep., USC/ISI, 2001. [47] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, [21] S. Ramachandra, Y. Rekhter, R. Fernando, J. Scudder, and and W. Willinger, "Network topology generators: E. Chen, "Graceful restart mechanism for BGP," 2001. Degreebased vs. structural," in Proc. ACM SIGCOMM, Internet Draft (draft-ietf-idr-restart-05.txt). 2002. [22] D. McPerson, V. Gill, D. Walton, and A. Retana, "BGP [48] B. Krishnamurthy and J. Rexford, Web Protocols and persistent route oscillation condition," 2001. Internet Draft Practice. Addison-Wesley, 2001. (draft-ietf-idr-route-oscillation-01.txt). [49] RIPE's Routing Information Service Raw Data Page. [23] H. Berkowitz, A. Retana, S. Hares, and P. Krishnaswamy, http://data.ris.ripe.net/. "Benchmarking methodology for basic BGP convergence," [50] SaarGate. http://www.saargate.de/. 2002. Internet Draft (draft-ietf-bmwg-bgpbas-01.txt). [51] University of Oregon RouteViews project. [24] H. Berkowitz, A. Retana, S. Hares, P. Krishnaswamy, and http://www.routeviews.org/. M. Lepp, "Terminology for benchmarking external routing convergence measurements," 2002. Internet Draft [52] Merit. http://www.merit.edu/. (draft-ietf-bmwg-conterm-01.txt). [53] V. Fuller, T. Li., J. Yu, and K. Varadhan, "Classless [25] NANOG: The North American Network Operators Group. Inter-Domain Routing (CIDR): an Address Assignment and http://www.nanog.org/. Aggregation Strategy," 1993. RFC 1519. [26] S. Hare, P. Krishnaswamy, M. Lepp, A. Retana, [54] Light Reading, "The Internet Core Router Test," 2001. H. Berkowitz, and E. Davis, "BGP convergence [55] G. Huston, "Analyzing the Internet BGP routing table," in measurement issues," 2001. IETF/bmwg talk. Internet Protocol Journal, 2001. [27] P. Barford and M. Crovella, "Generating Representative [56] A. Broido, E. Nemeth, and K. Claffy, "Internet Expansion, Web Workloads for Network and Server Performance Refinement and Churn," in ETT, 2002. Evaluation," in Proc. ACM SIGMETRICS, pp. 151­160, [57] L. Subramanian, S. Agarwal, J. Rexford, and R. H. Katz, 1998. "Characterizing the Internet hierarchy from multiple [28] P. Danzig, R. Caceres, D. Mitzel, and D. Estrin, "An vantage points," in Proc. IEEE INFOCOM, 2002. empirical workload model for driving wide-area TCP/IP [58] C. Labovitz, "Multithreaded routing toolkit," in Merit network simulations," IEEE/ACM Trans. Networking, Technical Report to the National Science Foundation, 1996. vol. 3, no. 1, pp. 1­26, 1992. [29] P. Danzig and S. Jamin, "tcplib: A Library of TCP Internetwork Traffic Characteristics," tech. rep., USC, 1991. [30] C. Huitema, Routing in the Internet. Prentice Hall, 1995. [31] Y. Rekhter and T. Li, "A Border Gateway Protocol 4 (BGP-4)," 1995. RFC 1771. [32] C. Villamiyar, R. Chandra, and R. Govindan, "BGP route flapdamping," 1998. RFC 2439. [33] C. Panigl, J. Schmitz, P. Smith, and C. Vistoli, "RIPE Routing-WG Recommendation for Coordinated Route-flap Damping Parameters ," 2001. http://www.ripe.net/ripe/docs/ripe-229.html. [34] M. Musuvathi, S. Venkatachary, R. Wattenhofer, C. Labovitz, and A. Ahuja, "BGP-CT: A First Step Towards Fast Internet Fail-Over," in Microsoft Research Technical Report, 2000.