================================================================================
Session 1: Rearchitecting data center networks (chaired by Arvind Krishnamurthy)
--------------------------------------------------------------------------------
Your Data Center Is a Router: The Case for Reconfigurable Optical Circuit Switched Paths
Guohui Wang (Rice University), David G. Andersen (Carnegie Mellon University), Michael Kaminsky, Michael Kozuch (Intel Labs Pittsburgh), T. S. Eugene Ng (Rice University), Konstantina Papagiannaki, Madeleine Glick, Lily Mummert (Intel Labs Pittsburgh)

Talk: Data center networks can be restructured to provide full bisection bandwidth among all the servers, but this may not be necessary, e.g., for earthquake simulation (spatial traffic locality) or MapReduce (temporal traffic locality). Full bisection bandwidth provides "too much" at too high a cost.

Hybrid network: electrical and optical circuit switching. Optical switching is different: it's about reflection of light (not just a store-and-forward mechanism), so switching runs at the modulation rate of the I/O ports. Reconfiguring a path takes up to tens of milliseconds.

What's been done so far? Well, we can get 40 Gbps, 100 Gbps, and now 15.5 Tbps over a single fiber! These optical-switched networks are simple and flexible to construct, expand, and manage, and they offer high b/w and low power. But the fat pipes are not all-to-all and there's reconfiguration overhead (XXXXXX).

Lots of open research questions come from this work:
1. Is there enough traffic locality? A few optical paths can offload a lot of data. Still open though, as the MSR folks will share.
2. Can optical paths be reconfigured fast enough? Is this even solvable? Choosing which paths to set up is a maximum weight perfect matching problem, which is solvable in polynomial time (Edmonds' algorithm).
3. How to manage optical paths in data centers? Ethernet spanning tree and link-state routing won't work well in this environment. VLAN-based dual-path routing is the solution! But how do we measure application traffic demand?
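To make question 2 concrete, here is a minimal sketch of picking optical circuits as a maximum weight perfect matching over inter-rack demand. It is an illustration, not the speakers' implementation: the function name `best_circuit_matching`, the rack labels, and the demand numbers are all hypothetical, and it uses brute-force enumeration rather than Edmonds' algorithm, so it is only practical for a handful of racks.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[str, str]


def best_circuit_matching(
    racks: List[str], demand: Dict[FrozenSet[str], float]
) -> Tuple[float, List[Pair]]:
    """Return (total_demand, pairs) for the maximum weight perfect matching.

    Brute force (NOT Edmonds' algorithm): pair the first rack with every
    other rack in turn and recurse on the racks that remain.
    """
    if len(racks) % 2 != 0:
        raise ValueError("a perfect matching needs an even number of racks")
    if not racks:
        return 0.0, []
    first, rest = racks[0], racks[1:]
    best_weight, best_pairs = float("-inf"), []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_weight, sub_pairs = best_circuit_matching(remaining, demand)
        # Missing entries in the demand map count as zero traffic.
        weight = demand.get(frozenset((first, partner)), 0.0) + sub_weight
        if weight > best_weight:
            best_weight, best_pairs = weight, [(first, partner)] + sub_pairs
    return best_weight, best_pairs


# Hypothetical inter-rack demand (arbitrary units), invented for illustration.
demand = {
    frozenset(("A", "B")): 9.0,
    frozenset(("C", "D")): 1.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "D")): 7.0,
    frozenset(("A", "D")): 2.0,
    frozenset(("B", "C")): 3.0,
}
total, pairs = best_circuit_matching(["A", "B", "C", "D"], demand)
print(total, pairs)
```

With these made-up numbers, greedily wiring the single heaviest pair first (A-B, 9) forces C-D (1) for a total of 10, whereas the matching picks A-C and B-D for a total of 13. That gap is why the problem is posed as a matching rather than a per-link choice.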
A scheduler in the kernel can be configured by the configuration manager; a daemon on the server acts as the intermediary.

----
And the questions began:

Phil Levis began this round of questions by challenging the claims drawn from the traffic matrix (TM). In particular, looking at the fractions that were presented: if it's a few Kb, then why does it matter? Otherwise this is a really surprising value. Look at MapReduce: we're trying to push traffic into the rack.
Guohui responded that they're using this graph to show the fraction, though. Sometimes the traffic volume is high, and sometimes the traffic can be congested.

Bryan Ford (Yale): What are the technical limits on how many optical ports can be on one of these switches, and how much cascading is possible? It seems like huge port counts are necessary.
Answer: Currently on the market, there are switches with 80-100s of ports. These can scale up to 1000 ports, which can be expensive. And can you cascade these ports? Yes.

Kevin Fall: You should expound on the cost of optical transceivers. Can you evaluate the relative cost-benefit of these transceivers?
Answer: Adding these into the network can give you comparable cost.
KF: But aren't you in an arms race with electrical signaling?
Eugene Ng: Optical switching will most likely be able to switch faster than electrical, so we have that advantage.
KF: (pushes on this point)
EN: Electrical has reached its limit and its cost won't drop very much. The cost of optical switches may drop a lot more, though! Complexity is also an issue: with electrical you create a factory of wires, whereas with optical switches each path has higher bandwidth, so you can achieve the raw capacity with a lot less gear.

Bryan Lyles: Another way to think about cost-benefit: Ethernet currently is 10 Mbps, etc... (things scale up). We must consider the cost-benefit for the generational gap. If the generational time was 6 months, who'd care? But 10 years, well, that timeline justifies this work.
How far ahead does an optical system have to be to make for an effective tradeoff?

The final question from the audience began with the observation that the traffic matrix presented was a bit myopic. If you have scalable Ethernet, then you can move applications more freely. Can you comment on this? How general are your TMs, now and for the future?
GW: Not easy to achieve in the near future. If you can adjust the location of applications, then you can adjust the optical paths.

--------------------------------------------------------------------------------
Flyways To De-Congest Data Center Networks
Srikanth Kandula, Jitendra Padhye, Paramvir Bahl (Microsoft Research)

Talk given by Jitu. The goal is to provide good connectivity at low cost. As you go up the hierarchy, link capacity does not keep up with the number of servers. Possible solutions were presented; every proposal, though, aims to eliminate oversubscription. If you instead add capacity at hotspots as they form, you may not run into the oversubscription problem.

Base network + Flyways - wireless links when needed.

60 GHz (57-64 GHz):
- 7 GHz of b/w (802.11 b/g has only 80 MHz). Available worldwide, high b/w. 1-4 Gbps are already available.
- Low range (1-10 meters). Improves spatial reuse.
- Needs line of sight (easy to achieve, though!).
- Small form factor.

Flyways only need modest bandwidth. Add capacity as needed to a data center with flyways. 60 GHz wireless appears to be apt for this.

----
Discussion.

Brad Karp: Early days for 60 GHz. Has anyone tried to deploy this at an appreciable density? There's no examination in this work of that sort of scaling.
JP: One advantage is that this b/w is wide.
BK: FDMA?
JP: Yes.

Bryan Ford: It seems like there are economic and generational issues here. How do you account for data centers that already exist?
JP: I think that the cost will work out in our favor. We don't actually need large capacity from these flyways.
BF: But high and low b/w are relative to the generation.
JP: When servers start adding 10 Gbps next, what will happen?
This depends on traffic capacity. CPU bound. We carry less traffic by definition.

From the audience, someone then pointed out that data centers with wireless just seem like a bad idea.
JP: This is a research problem - we don't know yet. Generally wireless is more flaky, sure, but data centers are a much more controlled environment. The performance should be fair.

So, how do you handle reconfiguration?
JP: Reconfigured on a sub-second basis. There is stability over seconds.

Eugene Ng asked a question that JP addressed as follows:
JP: Much more careful alignment between racks is required.

Phil Levis: We may only have, say, two racks with a certain number of machines. Why not make racks bigger?
JP: Yes and no. Depends how much you want to spend on ToR. The point is we can construct networks for less.

Rick McGeer (HP): You don't have an interference model for 60 GHz, but why not a noise model? Motes interfere, right?
JP: We don't have to worry about motes, right? (Audience agrees.)

Aren't there security problems?
JP: Yeah, when people think wireless they wonder about security. We don't really address that in this work.

KF: Oxygen model and data center.
JP: Oxygen concentration and data center throughput.

Bryan Lyles: If you step back and look at what people are doing in networks: blocking issues - an architectural issue to avoid blocking. So, how is 60 GHz used now?
JP: Dongle for in-home devices.

This market is still pretty young, right?
JP: Well, they are mature enough that they already split (audience laughs). But mostly used for in-home entertainment.

Someone then challenged the point of exploring wireless communications for this application.
JP: Cost and energy and such make this useful.
PL: The key thing that wireless provides is that there are no wires!

Kevin Fall: Optics. Free-space optics. EN stole my first question. It might be able to address both issues that you brought up.
JP: Phased-array optical. Does anyone know?
The main worry when we think about free-space optics. You can use a stable wired network to coordinate.

--------------------------------------------------------------------------------
Applying NOX to the Datacenter
Arsalan Tavakoli (UC Berkeley), Martin Casado, Teemu Koponen (Nicira Networks), Scott Shenker (UC Berkeley / ICSI)

- How do you apply NOX and OpenFlow technology to the data center?

Datacenter networking requirements: scaling, location independence in the context of scaling, service quality, datacenter-specific needs (middlebox traversal).

Two tracks of networking research: specialized datacenter networking and general network management (4D, Routing Control Platform, Tesseract, Ethane, NOX).

Flow-granularity control. With OpenFlow switches, you can control the switch and its flow table.

VL2. The bottleneck doesn't have to do with the outside network.
- Clos topology with oversubscription
- Valiant Load Balancing
- Two distinct L3 addressing schemes

NOX managing a datacenter.
- VL2 routing, naming, and addressing. Scalability issues are discussed.

QUESTION: Latency of order of magnitude.
AT: Makes more sense to do proactive routing. Regardless of latency, it just makes sense to handle this proactively.

Additional capabilities of NOX: VM migration, network monitoring. NOX can be effective in managing a datacenter network. OpenFlow commercial switches.

----
Questions

Q: Can't you just have a network with OF and regular switches?
AT: Why not just use OF?

Q: Reducing entries in the bottom row of your table.
AT: When you have multiple entries you have multiple decisions. Network with more paths.

Gian (Intel): Are OpenFlow and NOX better than VL2? Why would I do that if OF is expensive?
AT: You can use cheaper switches with OF. "Effective" means we are doing what VL2 does but with more flexibility. It's much easier to generalize a network management platform. More effective to that extent.

Phil Levis: Go back to the outline. I like your second point, "Overview of Networking Research."
What's something that you wouldn't want to use it for?
AT: Something might be better done with a local primitive. We noticed we had to stick a coreID in the packet. What are the local primitives that make it easier to do than with NOX?

Kevin Fall: Slippery slope. If I needed encrypted links or some other feature, I'd have to add it.
AT: I don't have a good answer. On-switch hashing, for instance. You're putting some intelligence in, but I feel more comfortable starting from scratch and adding what I want than using a bulky thing.

EOS
================================================================================