Transport Area (tsvarea) Session
IETF-84 Vancouver, BC, Canada
Area Directors: Wesley Eddy, Martin Stiemerling
Mailing List: https://www.ietf.org/mailman/listinfo/tsv-area
Draft Agenda as of 2012-07-17

* Agenda bashing
none

* WG News
PCN WG closed; it completed its work items. The mailing list remains open.
draft-ietf-tsvwg-rsvp-pcn is the last related draft.

* IP Pseudowire Congestion Considerations
draft-stein-pwe3-congcons-01
David Black
Initial results on what to do with fixed-bandwidth pseudowires over IP when mixed with congestion-responsive traffic.
The draft has initial promising results: when pushed into the range where it is affected by performance problems, the pseudowire is already offering bad service.
A bunch of graphs in the PDF version; please come look. Find Yakov, David, or Bob Briscoe if interested.
Want: operational considerations when mixing pseudowires with congestion-responsive traffic.

* Bufferbloat Topics:

* Controlled Delay -- Van Jacobson (35 minutes)
http://datatracker.ietf.org/doc/draft-nichols-tsvwg-codel/
http://queue.acm.org/detail.cfm?id=2209336
CoDel (the "coddle" algorithm). Work primarily done by Kathie Nichols.
A way of dealing with bufferbloat: the huge queues inside today's routers.
As soon as you think about it, you reach for auto traffic queues -- but the congestion stuff in routers is fundamentally different.
With autos, one bozo can cause a jam, but there is no one thing you can do to fix the problem: you could add lanes, public transit, gas taxes...
Bufferbloat and queues in routers can be fixed with "one thing".
Better physical model: a fountain (picture from somewhere in England). It's a closed-loop system, and that's how a transport connection works. Better model, better intuition.
Does the water level have anything to do with the flow rate? Not really. The pond is the queue in the closed-loop servo system: backlog so the pump doesn't suck air.
You can make the queue anything you want, independent of the pump and the height of the water column; the diameter of the pipe is the bandwidth. A change of level doesn't affect the other two.
-- Sender + receiver queue picture.
High to low bandwidth: packets have to stretch out, and the guys behind wait.
Vertical is bandwidth; horizontal is time; BW * time is bits. A packet doesn't change size, so it can deform, but the area stays constant.
Once spread out, packets stay spread out, and at this point the five-packet queue can't go away -- just like the pond. Arriving and leaving at the same rate, running in flow balance. That's bufferbloat.
No analog in Erlang traffic theory; those systems don't run at 100% -- it's impossible there, yet it's the way every bottleneck in the Internet runs.
Standing queues can be caused by one traffic flow. You get a queue at the fast-to-slow transition. For any path there is only one slowest point, so a single bottleneck. Once packets spread out, they stay spread out, so no further change.
Because the Internet is sparse (that's how it gets its economy), it is likely there is just one bottleneck, particularly for residential home users sitting on a skinny tail; that tail is likely the bottleneck for the user.
Two things you can do. Get a bigger bottleneck (somewhere else) -- move it, throwing away some bandwidth to do that. About a dozen AQM algorithms have shown up over the last 12 years that involve moving stuff around; that doesn't fix the problem. [Put it somewhere where it doesn't affect other traffic, though?]
Other alternative: cooperate with the endpoints to get rid of the queue; if they don't cooperate, toss packets to make it go away anyway. That is CoDel.
-- To make it go away, make the right thing go away: good vs. bad queue.
To get something started, you need a queue as a shock absorber; it is supposed to do that, given extremely heterogeneous bandwidths.
Queue length over time: big burst, metered out, goes to steady state. The queue at the front is "good queue": it's doing its job, and the delay goes away. The queue at the end is just adding delay -- not helping, just slowing you down.
No static algorithm can figure this out. Example at the bottom: an ack-the-window TCP, something done to make a sucky protocol stack work in tests. Completely legal, nasty for the network (big sawtooths). Its min queue is 0; you wouldn't want to take that queue away.
If you try to control based on the mean queue level... stupid: you drop packets needlessly. Good queues go away over an RTT. Stuff that doesn't go away over an RTT is the bad stuff. Separate them by taking the minimum over a sliding window... but the window must last at least as long as an RTT. [Animation of the minimum moving around because the window width is less than an RTT.] There is no uniform Internet RTT. If you make the window too wide, no big deal; too narrow, and you decide it's bad queue when it isn't and drop packets.
First principle of design for AQM: do no harm. Don't drop unless you are sure it will make conditions better.
Once you separate good & bad, how do you measure the size of the queue?
-- How big is the queue: today, we measure queue size in bytes.
We started with that because in the 1970s/1980s memory was expensive; the IMP had 4 KB of buffers, O($10,000). Interesting if you are worried about queue overflow. Today we don't really think about memory. And measuring in bytes is a pain for queue measurement: you have to know the bandwidth to convert to delay, and bandwidth is hard to know. People are busy writing Kalman-filter algorithms to measure bandwidth in order to measure the queue.
Kathie said: just measure the time directly -- "packet sojourn time". A direct measure that is trivial to compute. To measure bytes in flight you need to stop the enqueue to measure, which can be seriously hard. Instead, the enqueue side puts a timestamp in the packet; the dequeue side pulls it out and compares it to the current clock. [No need for synchronized clocks?]
-- Sojourn time. If you have a multi-queue structure, bytes are very difficult to deal with.
-- Two views of the queue: the bottom is like the RED algorithm sees it; the top is sojourn time. 10 Mbps bottleneck, 80 ms RTT, TCP Reno linear window ramp in the congestion-avoidance phase. Blow up the bottom and you see individual packets; the phasing between arrival and departure is random, which is why there is oscillation of +/- one packet at the packet arrival rate (2x the frequency of the packet arrival rate). For the actual queue you have to integrate the ups and downs; the relative area is bytes in queue. The pedestal is 100 bytes of web traffic. If you integrate the bottom, you get the top. Take-away: measuring bytes sucks.
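The timestamp-on-enqueue scheme just described can be sketched in a few lines. A minimal illustration only (class and method names are hypothetical, not the Linux implementation):

```python
import time
from collections import deque

class TimestampedQueue:
    """Queue that stamps each packet on enqueue so the dequeue side
    can compute the packet's sojourn time directly -- no byte counts,
    no bandwidth estimate, and only one local monotonic clock."""

    def __init__(self):
        self._q = deque()

    def enqueue(self, packet):
        # Enqueue side records the current clock alongside the packet.
        self._q.append((packet, time.monotonic()))

    def dequeue(self):
        # Dequeue side compares the stored stamp to the current clock;
        # the difference is the sojourn time through this whole system.
        packet, enqueued_at = self._q.popleft()
        sojourn = time.monotonic() - enqueued_at
        return packet, sojourn
```

Because the stamp and the comparison use the same local clock, there is indeed no need for synchronized clocks, and the measurement covers every queue and service discipline the packet passed through inside the box.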
Sojourn is easier, and it tells you what is going on.
-- Multi-queue behavior: round-robin queues. LHS/RHS: the same traffic, in one queue versus spread across all queues. The intuition is that you have to drop the same amount in either case, but that is impossible to do with anything measuring queue volumes -- you must look across all the queues to get the interaction. Sojourn time includes the effect of all queues and the service discipline: the entire system rather than the one queue a packet lives in.
-- Controlling the queue. Measure the current state of the system against a model of what the system state should be; if it doesn't match, try to move it. Control systems 101: estimator, set point, control loop. So what about the set point and the control loop?
-- How much bad queue do we want? (the estimator) Drive it to 0? Can't. [Pump sucks air :)] If the goal is to do no harm, you want the bottleneck link to stay active; zero queue means not busy. Packets get spaced out over the path, one packet time apart at the bottleneck. Therefore you had better keep one or two MTU as a backlog -- two because we ack every other packet today. Negligible on a fast link... If you increase above two packets, there is an impact on the TCP control system: it takes many RTTs for the control system to ramp up; with a smaller window there is less impact from a loss.
-- Quantify: graph for Reno, as you increase delay (as % of RTT, on the bottom axis). With no delay you get 75% of link bandwidth; it goes up like sqrt(n) (Mathis et al. equation). Delay goes up linearly as you add more queue. You want the tradeoff, so take the ratio of bandwidth to delay -- power. Power is linear going down except at the top; blow that up and it is flat up to 5% or so, then falls off. On the downside, by a 10% delay increase you have thrown away 1% of link bandwidth; you keep 99.9% at a 5% delay increase.
-- Hence a set point / target of 5% of the nominal RTT gets a big win in bandwidth for a small increase in delay. The result holds for many congestion control algorithms! [Reno, CUBIC, Westwood tested.] The basic result is how TCP fills the hole after a loss.
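The estimator / set point / control loop pieces above fit together roughly as follows. This is a simplified sketch, not the reference code: TARGET is the 5%-of-nominal-RTT set point and INTERVAL the 100 ms nominal RTT from the talk; the interval/sqrt(count) drop spacing is from the published CoDel description, and real CoDel tracks the windowed minimum sojourn more carefully than this.

```python
import math

TARGET = 0.005    # set point: ~5% of the nominal RTT, in seconds
INTERVAL = 0.100  # sliding-window width: a worst-case "nominal" RTT

class CoDelState:
    def __init__(self):
        self.first_above_time = 0.0  # deadline set when sojourn first exceeds TARGET
        self.dropping = False        # currently in the dropping state?
        self.drop_next = 0.0         # time of the next scheduled drop
        self.count = 0               # drops in the current dropping episode

    def should_drop(self, sojourn, now):
        """Per-dequeue decision: start dropping only after the sojourn
        time has stayed above TARGET for a full INTERVAL (a standing,
        i.e. bad, queue), then keep dropping on a schedule that
        tightens as INTERVAL / sqrt(count)."""
        if sojourn < TARGET:
            # Good queue: it drained within the window. Do no harm.
            self.first_above_time = 0.0
            self.dropping = False
            return False
        if self.first_above_time == 0.0:
            # First time above TARGET: arm the interval timer.
            self.first_above_time = now + INTERVAL
            return False
        if not self.dropping:
            if now >= self.first_above_time:
                # Above TARGET for a whole interval: begin dropping.
                self.dropping = True
                self.count = 1
                self.drop_next = now + INTERVAL / math.sqrt(self.count)
                return True
            return False
        if now >= self.drop_next:
            self.count += 1
            self.drop_next = now + INTERVAL / math.sqrt(self.count)
            return True
        return False
```

Note how the "do no harm" principle shows up: a burst that drains within one interval never triggers a drop.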
Reno sucks pretty badly; Westwood sucks because it wants to keep the queue full; CUBIC is the Goldilocks :)
There are no free parameters here: you need an interval (RTT) big enough to separate good and bad queue, and once you pick that, it tells you the set point -- about 5% of that number. Pretty much set it and forget it.
-- Eric Dumazet of Google did the initial implementation for Linux (in Linux 3.5) and put it in a stochastic fair queuing system too (a simple one: 256-1024 buckets with a round-robin queue discipline). Provides great isolation; concerns with IW10 go away... Gets rid of ack-compression due to bottleneck bi-directional traffic problems. An across-the-board win.
-- The CoDel algorithm is in a journal paper. Would have liked to spend another 6 months on it, but publication protects the ideas and puts them into the public domain; online publication doesn't protect :). Open-source, dual-licensed code: ns2, ns3, Linux. To the best of our knowledge there is no prior art on the ideas. In simulation and execution, the top-level design goal seems to be met. Todo: still looking at the algorithm, but these are all second-order things.
Would like it everywhere (on residential/SOHO links), but there are deployment issues: need to understand the dequeueing process and what its time structure is. There is a lot of engineering in running a slow link efficiently. WiFi has many queues, power management, etc.; want to make sure we are not fooled by it.
-- Home gateway deployment issue: the fast-to-slow transition is in the cable modem, and you can't hack that box (a CableLabs box). You can hack the router/AP, which means moving the queue from the cable modem to the router box -- but that puts a rate limiter in the router box.
This matches a generic problem, "getting things into the Linux kernel": protocol stack to device driver. Queue scheduling and management is done in the protocol stack, but driver-to-device is the bottleneck (the descriptor ring). Again, move the queue to where the AQM lives. Google did byte counting to push packets out of the device driver, upstream.
Problem: that is easy to do for Ethernet. With 1 to 7 queues (if priorities), and power management, there are 4 queues with 4 states.
The problem gets nasty: it's hard to represent what the queue was doing earlier, hence no CoDel in WiFi yet, which is where we want to put it. Probably byte counting is not the right way to move it.
The last one is phones: phone CPU, to onboard 3G modem, to the radio access network, to (in 3G) an atrocity called the SGSN (...service node). Same thing: packets queue in the modem, and the problem may be between the RAN and the SGSN. Gotta talk to the people that own the various parts of the hardware ecosystem; it will take a while for deployment to take effect. Thanks to all parties.
Chris Donley(?), CableLabs: will talk to you offline about DOCSIS, queueing, and what might be done.
Stuart Cheshire: Thanks for a wonderful presentation. Particularly liked that you started with the LA freeway and pointed out that it's garbage -- comfortable metaphors get used for reasoning, and if a metaphor is comfortable we tend to use it anyway, even though it's wrong. AT&T raises rates and lowers caps because "delays are due to congestion". Nope: zero evidence of congestion; it's bufferbloat at various places. Some of it can be fixed, some can't. And we're paying for it.
Cheshire: Action item -- we need to see this in the places that have the queues: cable modems, iPhones, ...
Cheshire: In the kernel case, think in reality the device firmware buffers are the real problem.
High bit: find the place where the queue really is -- "you can't manage a queue you don't have". In most areas, find the real queue and look hard at its behavior.
Fred Baker: Comment on WiFi. When we talked to the hardware people about how to get AQM into this, they said they want to maximize queue depth; the problem is that moving the beam around takes time. If you keep the queues really shallow you get less than 10% utilization, because you spend the time moving the beam. We could use some guidance as to where the tradeoff works out there.
Baker, question: Looking at incast in data centers. One of the attributes of that: a request goes from processor A to processor B, B asks its 5000 best friends, and all reply using TCP. You get 4000 "thanks but no thanks" and 500-1000 replies that have a few segments. As a result, the AQM algorithm in the system starts tossing; every 10 seconds there is a head-of-line blocking problem for the thread.
When the next thread asks, the problem is it's stuck behind TCP. Would like to think in terms of ECN: CoDel plus ECN, not dropping packets but signaling congestion when appropriate.
For Linux, just turn it on -- "but there are details". In terms of mechanism, it doesn't matter how you signal drops. [Van:] I believe ECN doesn't work, and will never work, on the Internet -- but it's great in a data center. I believe CoDel is the way to set the CE bit. But that forces you into HOL blocking. Think of Eric's model for SFQ: using DRR, and it should be packet-based. It's not about being fair, it's about getting good mixing; use packet-based round-robin to spread the flows out and recombine them once mixed. That gets rid of the HOL blocking; CoDel works on the aggregate.
Running very late. Andrew McGregor: Don't think Fred's beam steering or link aggregation is that nasty. The device has a requirement for a certain amount of queue; the correct level of minimum queue is non-zero, but it doesn't need to be large. Not that large a tweak to do. Code-wise, not sure moving CoDel down to the bottom of a Linux wireless driver is that hard.
The next talks are all lightning talks now, compressed due to congestion. [20 min left?]

* DCTCP & CoDel: the Best Need not be the Enemy of the Good -- Bob Briscoe (20 minutes)
7 minutes for a 20-minute talk :)
A result out of Data Center TCP; presenting on behalf of Murari Sridharan.
The two attack different problems and can complement each other. To argue with Voltaire: DCTCP is best and CoDel is good :) Untested roadmap: use DCTCP with AQM.
-- Big idea of DCTCP: if you change the endpoint algorithm to not be so sawtoothy, you can get high utilization and stable delay, and still have plenty of buffer. If you really want to do good, change both the queue and the end system; it's those large sawtooths that give you a problem in the stable situation.
-- DCTCP algorithm: the sender side is at the bottom. The switch side is easy: mark if the queue is above a threshold. The TCP change is only a few lines from Reno.
-- Shows it works, in a graph.
-- In terms of insensitivity to parameter problems: setting K to about 17% is ideal, but larger is OK.
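The sender-side change ("only a few lines from Reno") can be sketched as below. A minimal illustration of the update rule from the DCTCP paper -- g = 1/16 is the paper's suggested gain, and the class and method names are hypothetical:

```python
class DctcpSender:
    """Sketch of the DCTCP sender change: instead of halving cwnd on
    any congestion signal (Reno), scale the window cut by a smoothed
    estimate of the fraction of CE-marked acks."""

    G = 1.0 / 16.0  # EWMA gain suggested in the DCTCP paper

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.alpha = 1.0  # smoothed estimate of the marked fraction

    def on_window_of_acks(self, acked, marked):
        # F: fraction of this window's packets that carried CE marks
        # (the switch sets CE when its instantaneous queue exceeds K).
        f = marked / acked if acked else 0.0
        self.alpha = (1.0 - self.G) * self.alpha + self.G * f
        if marked:
            # Few marks -> small cut; everything marked -> Reno-style halving.
            self.cwnd = self.cwnd * (1.0 - self.alpha / 2.0)
        return self.cwnd
```

Because the smoothing lives in the sender, the switch's marking rule stays a one-line instantaneous-queue comparison, which is the division of labor the talk describes.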
If you set K too low, you can only get to 94% (and vanilla TCP gets 75%, so it's quite insensitive to getting things wrong).
-- Is DCTCP only for data centers? It was named that because the guys bringing it out couldn't figure out how to do it in the general Internet: you would have to change all senders, all receivers, and the switches. But it works in the wider area, given that constraint :) It addresses high bandwidth-delay products, but that is not fully tested.
-- ...
-- Differences from traditional AQMs (e.g. CoDel): the important point in DCTCP is that the source smooths the signal, not the queue. No need for a nominal RTT: the network doesn't have to know the RTT and does no smoothing, so it gets the signal out straight away. The end system knows its RTT and smooths over its own RTT. ... (see slides)
-- If you assume a nominal RTT of 100 ms, things that can respond in 5 or 3 ms see that as infinite time; so they build queue for themselves without getting a signal about it. This is the reason the approach can only work with ECN.
-- If you want to not smooth in the net and only smooth at the end systems, you have got to get the signals out immediately. You can do that with ECN. A drop is both an impairment and a signal, so you don't want to send too many of them; that is why we smooth in the net -- so we don't have to send them out if we don't want to.
-- They can work together.
MM: Although the name was an admission of the narrowness of the scope of the original problem, the name is now an impediment to progress. Have used low-threshold ECN; the focus should be on ECN, not TCP. This algorithm might not be the best one, but that is the approach.

* LEDBAT and Real-Time Media -- Randell Jesup (15 minutes)
Scavenger protocol, background flows, keeps delay low. Deployed by BitTorrent, and by Apple for updates and backups.
It has a set point, officially 100 ms, and it will induce that delay. That causes some problems. One: when it hits an AQM it stops being a scavenger and starts behaving like normal TCP. RED and LEDBAT... showing it isn't letting TCP own the world; they fight it out. It also delays VoIP flows: 100+ ms of delay on that flow, which can be bad. RTCWEB is trying to use delay-based congestion control (a one-day Saturday discussion); those tend to target zero or low delay, and then LEDBAT will out-compete them and end up being the foreground protocol.
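Why a 100 ms set point "induces that delay" falls out of LEDBAT's linear controller: the window only stops growing once the measured queuing delay reaches the target. A minimal sketch following RFC 6817's per-ack update (the constant values here are illustrative):

```python
LEDBAT_TARGET = 0.100  # LEDBAT's standardized delay target: 100 ms
LEDBAT_GAIN = 1.0      # max cwnd ramp, in MSS per RTT, at zero delay
MSS = 1500             # bytes

def ledbat_cwnd_update(cwnd, bytes_acked, queuing_delay):
    """One step of LEDBAT's linear controller: grow cwnd while the
    measured queuing delay is below LEDBAT_TARGET, shrink it when
    above -- so the flow settles with a standing queue of about
    LEDBAT_TARGET at the bottleneck."""
    off_target = (LEDBAT_TARGET - queuing_delay) / LEDBAT_TARGET
    cwnd += LEDBAT_GAIN * off_target * bytes_acked * MSS / cwnd
    return cwnd
```

A delay-based algorithm targeting near-zero delay sees LEDBAT's 100 ms standing queue as congestion and backs off first, which is the out-competing behavior described above.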
Conclusions: LEDBAT was designed to not hurt you, but it's not free; it can hurt you if AQM is deployed. Made worse by the fact that if people think it's free, they let it run in the background with no indications to the user or ... So: don't do it silently; give the user knowledge & control, and see if we can find a better solution that works with these use cases. We could fix LEDBAT so it doesn't change characteristics under AQM... or fix other delay-based algorithms. RTCweb's parameters are different; it might work.
MM: Two things discussed early on (wasn't paying attention when the doc closed...) that would have made a difference. 1. The original threshold was 25 ms; understand it was changed late in the process, and that wasn't well discussed. 2. The original implementation marked the traffic with LBE QoS tags; don't know if that got preserved. Think a renewed effort at QoS deployment with LBE marking end-to-end would fix the problem. Don't know about QoS, though.
Mirja: The reason for the change was home routers: 25 ms was problematic, 100 ms works. Compare LEDBAT to standard TCP: if you were using standard TCP it would be even worse. It is just "not free": better than TCP, but it causes problems.
"Think BitTorrent got fed up with the IETF and went off and did whatever."
Where is the discussion of CoDel going to be? TSVWG.

* Other Topics:

* Self-forming P2P -- Johan Pouwelse (15 minutes)
https://datatracker.ietf.org/doc/draft-pouwelse-censorfree-scenarios/
Bar BoF Wednesday evening, 2000, Regency A. Less than half an hour, then move to a real bar :)
Early stage; trying to generate interest in the IETF. Faculty member, 10 years' experience with P2P; questions in parliament, court cases, ...
Proposing a clear goal: move P2P to a new level. Idea: a system with no single points of failure, based on cellphones, etc. Self-organizing, running only in a smartphone app. Go as a service. Beyond DHT: have implemented a zero-server implementation of search ... and streaming. Also working on TV, because of a 0-day exploit.
World's first TV-to-TV net; can download firmware. http://www.tribler.org/SwiftTV
26M in European funding. Very difficult to hit the overall goal; hope for more in the bar BoF. There is a draft with scenarios to help with the definition.