Session 8: Network Measurement
Scribe: a.botta@unina.it

---
Kompella: Every microsecond counts: tracking fine-grain latencies with a lossy difference aggregator
(Original title: "Lossy difference aggregation: enabling fine-grain measurement of network latency and loss")
---

Problem: networks with end-to-end microsecond latency guarantees are important for many new network applications (trading, HPC, cluster computing), but how do we measure these delays? Traditional solutions: SNMP (too coarse), active probes (too many probes needed at this resolution), special-purpose hardware, e.g. the London Stock Exchange uses Corvil boxes (too expensive). We want to use cheap hardware instead!

Another possibility: assume one-way flows, discretize time, and let the source (S) and destination (D) maintain a small amount of state. For losses it is enough to store a counter at S and at D and take the difference. For delays you could store the timestamps of all the packets, but that needs a lot of communication (the current solution is sampling). If there were no loss, S could store the sum of the send timestamps and send it to D, which keeps the sum of the receive timestamps; the difference of the two sums, divided by the packet count, is the average delay. Main issue: if a packet is lost, this delay measure becomes inconsistent.

Idea: the lossy difference aggregator (LDA). Use a hash table and keep, for each bucket, the sum of the timestamps and the count of the packets hashing into it. The hash should spread the losses among the different buckets (sets of packets with the same hash); you can then compute the average delay using only the buckets with no losses, i.e. those whose packet counts match at S and D.

Remaining issues: it is necessary to sample the packets beforehand in order to cope with high loss rates (there is a formula in the paper). To cope with the unpredictability of loss rates, they sample at multiple rates tuned to different expected loss rates, using multiple banks (there is a formula in the paper).

Experiments: in a comparison with active approaches, achieving the same accuracy for loss estimation requires probing at 10 Kpps. They compared the loss rate estimation with that of Papagiannaki03; the relative error is always below 5%. They also ran experiments to decide how many banks to use.
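To make the bucket mechanics concrete, here is a minimal sketch of one LDA bank in Python (a single bank and no pre-sampling; the CRC32 hash, class names, and bucket count are illustrative choices, not taken from the paper):

    import zlib

    class LDA:
        # One bank of a lossy difference aggregator: each bucket holds a
        # (timestamp sum, packet count) pair, updated identically at S and D.

        def __init__(self, num_buckets=1024):
            self.sums = [0.0] * num_buckets    # per-bucket sum of timestamps
            self.counts = [0] * num_buckets    # per-bucket packet count
            self.n = num_buckets

        def update(self, pkt_id: bytes, timestamp: float):
            # Hash an invariant packet identifier so the same packet lands
            # in the same bucket at sender and receiver.
            b = zlib.crc32(pkt_id) % self.n
            self.sums[b] += timestamp
            self.counts[b] += 1

    def average_delay(sender: LDA, receiver: LDA):
        # Use only "usable" buckets: those whose counts match at both ends,
        # meaning no lost packet hashed into them.
        ts_diff, pkts = 0.0, 0
        for b in range(sender.n):
            if sender.counts[b] == receiver.counts[b] and sender.counts[b] > 0:
                ts_diff += receiver.sums[b] - sender.sums[b]
                pkts += sender.counts[b]
        return ts_diff / pkts if pkts else None

The usable buckets are exactly those the losses missed, so the estimate averages over all packets that happened to share loss-free buckets; this is why the hash must spread losses evenly across buckets.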
q-a
q- For routers this is OK, but across the network it is difficult to obtain synchronization? a- Yes, to estimate delay you MUST have time synchronization.
q- Can you assign a certain delay to the lost packets, so that you do not trash the buckets that have packet loss? a- I have to think about that.
q- What happens with bursty losses? a- With LDA you can estimate the distribution of packet loss.
q- What are all the metrics you can estimate? a- We have not considered metrics other than loss and delay.
q- How do you cope with out-of-sequence packets? a- Modern routers actually do FIFO, and FIFO is a requirement.
q- How much overhead do you have for the delay distribution? a- We do not estimate the delay distribution.
q- Can you get higher-order statistics? a- You need more computation, but I don't know how useful they are.
q- How do you synchronize S and D on when to start? a- You can think about special packets.

---
Zhang: Spatio-temporal compressive sensing and Internet traffic matrices
---

Problem: how to fill in missing values in a matrix (mostly traffic matrices, but also delay matrices, ...). Missing values are common in TM measurements (measurements are expensive, collection is unreliable, anomalies, future traffic), yet many networking tasks are sensitive to missing values. Main issue: the problem is under-constrained in real networks.

Idea: exploit 1- the low-rank nature of TMs, 2- spatio-temporal properties, 3- local structures in TMs. How: 1- compressive sensing: increase the number of unknowns and then use sparsity-regularized SVD to cope with the problems arising in real scenarios; 2- sparsity-regularized matrix factorization, which uses a temporal constraint matrix T and a spatial constraint matrix S; 3- combining global and local methods, since local correlation among individual elements can be stronger than the correlation among TM rows/columns.
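As a rough illustration of the global low-rank interpolation step, here is a generic alternating-least-squares completion sketch in Python (the function name and parameters are made up; the talk's sparsity-regularized matrix factorization additionally folds in the S and T constraint matrices and a local refinement, both omitted here):

    import numpy as np

    def low_rank_fill(M, mask, rank=8, lam=0.1, iters=50):
        # Fill missing entries of traffic matrix M (mask[i, j] == True where
        # observed) with a rank-`rank` factorization X @ Y.T, found by
        # alternating ridge regressions over the observed entries only.
        m, n = M.shape
        rng = np.random.default_rng(0)
        X = rng.standard_normal((m, rank))
        Y = rng.standard_normal((n, rank))
        reg = lam * np.eye(rank)
        for _ in range(iters):
            for i in range(m):                 # update rows of X given Y
                Yo = Y[mask[i]]
                X[i] = np.linalg.solve(Yo.T @ Yo + reg, Yo.T @ M[i, mask[i]])
            for j in range(n):                 # update rows of Y given X
                Xo = X[mask[:, j]]
                Y[j] = np.linalg.solve(Xo.T @ Xo + reg, Xo.T @ M[mask[:, j], j])
        return X @ Y.T                         # completed-matrix estimate

The ridge terms here are the Frobenius-norm regularization that comes up in the Q&A below.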
Evaluation: a- interpolation performance, with datasets from Abilene, a commercial ISP, and GEANT, compared against several other methods: even when you lose 98% of the data the error stays below 30%; but this is with random losses, which is not what happens in practice. The method is always the best. b- three possible applications: inference (tomography), prediction, anomaly detection. They obtain very good results that outperform or even generalize previous methods.

q-a
q- Can you say something more about future work on S and T? a- It should be tied more tightly to the application.
q- Have you looked at norms other than the Frobenius norm? a- That one provided the best results.
q- The classical solution is Kalman filtering; if you want to use S and T modeling, probably you will end up doing PCA + Kalman? a- We didn't look at Kalman filters; that can be done in the future.
q- It seems we have not yet fully understood previous methods and you want to add yet another one? a- The additional complexity is very small, and we just tuned some methods that were already there.
q- How much time is required? a- A few seconds.
q- The leftmost side of the random-losses plot shows that the error is not that small? a- If the loss rate is small, then the error is small for all the techniques.
q- Actually it is 10%, why? a- This is the minimum you can achieve.

---
Papageorgiou: Passive aggressive measurement with MGRP
---

Problem: you are the provider of a video conferencing service and you want to get the most out of the Internet; the network will not give you anything, so you have to measure it. Traditional solutions: 1- use your own traffic (efficient but inadequate); 2- active measurement, i.e. probing traffic (bandwidth intensive, can interfere with the application); 3- custom active measurement, bundling measurement inside the application by shaping application data for measurement (efficient but not modular). We need a more generic solution.

Idea: MGRP, a new measurement manager protocol that piggybacks application data inside active probes. The packets from any application using sockets pass through this system, which schedules probes for transmission and piggybacks application data on them. It is transparent to applications, independent of the measurement algorithm, easy to adapt, and has microsecond precision. It sits alongside TCP: the probing application opens a socket, and so does the normal application. Issues: fragmentation and reassembly are needed because TCP packets are typically full, and the probing technique must be made aware of the piggybacked traffic. (A toy sketch of the piggybacking mechanics follows the Q&A below.)

First experiments, with pathload: the overhead is smaller and the completion time is shorter. MGRP was implemented in the kernel. There is some intelligence to cope with the fact that the probe packets differ from the original ones and that some packets can be lost, and to decide the maximum buffering delay.

Case study: MediaNet overlay. Without MGRP the overlay never recovers from a decrease of available bandwidth, i.e. it never switches back to the more bandwidth-consuming codec once the congestion is over; with MGRP it tracks the real available bandwidth much more closely. Moreover, when pathload runs separately, its probing packets actually interfere and cause a reduction of video quality (the system promptly switches to a lower-quality codec); with MGRP the packets are piggybacked and there is no additional overhead.

q-a
q- Why do you piggyback the application packets into the probes and not vice versa? a- We start from the perspective that there are already a lot of measurement tools, and we did not want to throw everything away.
q- What if you were running UDP applications? a- There are many streaming applications that use TCP, and they do not rely on TCP for setting the rate.
q- Is it applicable to data centers, where you have a lot of bandwidth, etc.? a- This is not targeting data centers.
q- Did you do tests on gigabit links with dummynet? a- We did not use dummynet on gigabit because its time resolution is too coarse.
q- There is a lag between the availability of bandwidth and the codec changes; this is the measurement overhead. Would it be better to directly increase the rate? a- You would ramp up frequently, and this would be annoying.
q- How do you deal with TCP trying to discover the MTU? a- We did not allow TCP to do this.
q- You are mixing two flows that were originally separate; are you introducing some bias? a- In fact we need a compensation.
q- You are putting pieces of TCP packets into UDP; this will cause a lot of problems because of losses and reordering? a- Yes.
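To illustrate the piggybacking mechanics, here is a toy sender-side sketch in Python (the header size, names, and padding scheme are invented; real MGRP also enforces the maximum buffering delay mentioned above and handles receiver-side reassembly and loss):

    from collections import deque

    HDR = 8  # bytes reserved per probe for piggyback metadata (made-up size)

    class PiggybackScheduler:
        # Toy model of the MGRP idea: probes keep the size and timing the
        # measurement tool asked for, but their padding carries fragments
        # of queued application data instead of junk bytes.

        def __init__(self):
            self.app_queue = deque()   # application payload waiting to ride on probes

        def enqueue_app_data(self, payload: bytes):
            self.app_queue.append(payload)

        def build_probe(self, probe_size: int) -> bytes:
            # Fill up to probe_size bytes: piggybacked fragments first,
            # junk padding for whatever room is left.
            room = probe_size - HDR
            chunks = []
            while self.app_queue and room > 0:
                head = self.app_queue.popleft()
                frag, rest = head[:room], head[room:]
                if rest:
                    self.app_queue.appendleft(rest)  # fragment: send the rest later
                chunks.append(frag)
                room -= len(frag)
            return b"\x00" * HDR + b"".join(chunks) + b"\x00" * room

Since probe size and timing are fixed by the measurement tool, the application data rides for free, which is the source of the overhead savings reported in the pathload experiments.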