TSVWG Interim Meeting, Monday Apr-27-2020
0700-0930 UTC-7 (San Francisco)
= 1000-1230 UTC-4 (Boston)
= 1500-1730 UTC+1 (London)
= 1600-1830 UTC+2 (Berlin)
= 2200-0030[+1] UTC+8 (Beijing - not on summer time, other 4 are)

WG Chairs: Gorry Fairhurst (remote), David Black (remote), Wes Eddy (remote)

Meeting started 0708, Note Well shown

TSVWG Interim 2: ECT(1) - 3rd interim of 2020.

1. Agenda - as advertised.

2. Chairs Update: Note Well

3. L4S and ECN
Overview of slides
- Stuart comment 0724: agreeing with the chairs that input/output is a useful decision question
- Greg discussion of L4S slide 5
- Jonathan discussion of SCE slide 6
- Wes moved on to comparison overview
- slide clarification (some unintended slide modifications happened)
- Greg covering at 0746, discussing L4S deployment plans - 2.5 minutes (slide 13), then Jonathan on SCE deployment (slide 14)
- Stuart clarifying comments about Apple interest
- Jake clarifying comments about Akamai interest
- Wes moving on (if no consensus)
- several comments from Dave Taht: please consider asymmetric paths, and more testing is needed before decisions.
- Spencer question 0758: Do we have recent information about 3GPP intentions to adopt L4S? That was the case when I was talking with the 3GPP liaison to IETF three or four years ago, but I haven't heard anything since, and 3GPP does remove content from releases when it's not available to be included. That's a good question to ask them now.
- Answer from Kevin Smith (Vodafone) saying it's strongly supported by them, and has a change request with Ericsson in 3GPP
- Answer from Per Willars (Ericsson) 0801: L4S input is working as-is, matches with carrier signal well, some discussion has been postponed for a future release (but L4S can be deployed today from a 3GPP point of view.)

4. Discussion of ECT(1)
- Discussion of slides
- Jana: Thanks, this was a good summary.
  - The most important point is that low latency requires multiple queues in the network.
  - An explicit classifier is preferred; it is not tractable to detect a lower latency intent in the network device
  - I prefer no further AQM innovation. Timescales for queue devices are intractable, vs. endpoints.
  - I like maturity. L4S is more mature, has had a lot of time in the IETF and much engagement. L4S preferred.
- Stuart Cheshire (speaking for Apple with caveats):
  - ULL is critical.
  - Anecdote: 300kbps DSL outperforming gigabit link as perceived by voice connection partners, latency is important.
  - Stuart is fully behind L4S.
- Andrew Macgregor: +1 to L4S
  - +1 Jana and Stuart. L4S wins us more and trying it teaches us more. It's the bold thing to do and we should do it. SCE is feasible, but L4S is feasible on deployed hardware.
- Uma: Are there MPLS queuing implications?
  - Bob: There are specs. This is not necessarily impossible, but maybe it depends on the deployment scenario.
  - Uma: MPLS is important in 3GPP, how will the L4S signaling be implemented?
  - Chairs: This is a topic we can hear more of in a later TSVWG meeting as an ID or possible presentation.
- Bob:
  - Answering multiple questions:
    - re Jana multiple queues: 2 queues are better than multiple queues. FQ can get 8ms, dualq can get lower.
    - re David Taht's points: we've done more than congestion avoidance tests, disagree with criticism; we will add asymmetric testing, good point there.
- Mirja:
  - I want to underline the point about strong incentives. The whole point is not about being fair.
  - If we pick something up that has only a minor benefit, we are missing an important opportunity!
- David Taht:
  - Musicians are a good point. Conflict with greedy traffic is the issue.
  - FQ solves the problem very well, non-greedy traffic does not experience latency on the link.
  - We have got end-to-end collaborative music over the Internet using diffserv
- Martin Duke (Individual, no hat):
  - SCE is good work and would be a significant improvement if adopted, but deployment concerns may be a problem.
  - L4S is transformative and we should go big if possible.
  - My only concern is whether it is safe or not. I ask the WG to consider it; it is hard to assess but I encourage consideration here, many tests have been run on L4S
  - unfairness issues in 3168 queues are complicated, but people are working to address them. The limits of scoping testing and of queueing testing are hard
  - personally comfortable with the tradeoffs
- Jonathan: This is a big "if": our tests demonstrate this is currently not safe
  - detection of RFC 3168 AQMs is not reliable yet, we doubt it can be made so
  - SCE offers 2 levels of service:
    - one designed for general internet with long paths targeting 2.5ms
  - shaped to 1 Mbps to make this call, works well because normal AQM
  - the problem that Stuart describes is a cable network not using AQM at all. Any normal AQM solves the problem.
- Roland: The debate is not SCE vs. L4S, the decision is about ECT(1) semantics. Using a DSCP as a classifier and another as signal is a discussion he'd like to see.
- Pete:
  - I want to add to Jonathan's testing: we found false negatives on RFC 3168 detection in L4S. There are also false positives: 2ms of jitter can cause under-utilization
  - detection needs to work before deployment. Not clear if it can
  - question: Is low latency <1ms achievable for bursty Internet traffic?
  - Stuart: I have a question of clarification: is it not possible or not useful for 2.5ms->1ms?
  - Pete: This is a tradeoff; under-utilization is an issue and needs consideration.
  - Stuart: This is a good argument for segregating the traffic.
  - Pete: The goal is both will offer high throughput.
  - Stuart: That's why it needs input differentiation.
- Jake
  - Please don't break upgrading of RFC 3168. This is the big problem.
- Jana
  - I am not holding my breath on RFC 3168
  - The point is we must have at least 2 queues to get low latency
  - implicit or explicit is a question, implicit is
  - traffic mix: many connections are small, but 80% of bytes is long flows.
- Martin: We need to consider newer CCs, not just sawtooth. A real deployment needs to consider this. Yes, safety is important, but we can discuss going forward. The deployability is important to consider here.
  - RTT fairness is a silly discussion now. We need to move forward with a deployable solution. Safety can take a hit on legacy traffic.
- Bob
  - In answer to David Taht: L4S does not assume all traffic is greedy, it aims to allow greedy traffic to have low latency
  - on testing of fallback: not yet happy with it, not complete yet. It took longer than expected to get a visualization, but now that we have it, more algorithms can be tested more quickly than before.
  - In answer to Pete & Jonathan: About false positives - you think it's a classic queue but it isn't and you continue to behave aggressively. These occurred in our 4k experiments only when no classic traffic was present, with 1 exception so far
  - Regarding false negatives (stop L4S even though not classic): this is ok, it just drops to 50/50 vs. cubic, and is fine in terms of fairness, though it underutilizes the link
  - In reply to Jonathan: Though SCE is safer on fairness issues, that's not the only possible safety issue. In L4S there's ambiguity on CE, but in SCE there's ambiguity on ECT(0). Then we need to consider that a firewall that blackholes ECT(1) packets would cause unsafety even for non-participating codepoints. There are more safety problems
- David: The discussion so far is mostly about edge networks. Consider datacenters, also important for Internet traffic. Datacenter incast is a big problem, DCTCP is a good solution, but it is slow to deploy because of risks of getting into the same queue as conventional traffic. If we use 2 codepoints to distinguish, this is very attractive, making it easy to deploy in a DC
  - to the extent that the problems vary according to network type, this is the area diffserv is intended to solve.
- Aidan (Mellanox), regarding the datacenter case: This can be a controlled environment: useful to distinguish with explicit marking...
  - Stuart: Do you mean different CCs or different queuing?
  - Aidan: Differentiated in DCs by priority to avoid getting confused using DSCP. People are doing it today with high speed transports. There are only 2 bits in the ECN field, a better CC implies lower latency. Using the ECN bits as output would enable low latency traffic in datacenters by enabling CC algorithms that would react faster to congestion state, and would be significantly important for datacenters.
  - L4S does not appear to enable something better than DSCP enables today for datacenters.
- Ingemar:
  - Jitter in WiFi is also an issue, but we shouldn't use today's snapshot for tomorrow's technology!
  - In 3GPP, we have solved this with timeslots and there's room to fix jitter in other ways.
  - About the non-greedy case: We are looking at L4S for non-greedy traffic primarily. The upper limit of bitrate is a common use case. It's not about only greedy traffic in this case.
- Nikki Pantelias (Broadcom): Where we sit is usually neither endpoint, nor datacenter switch. Our implementation concerns are:
  - not having to inspect layer 4 header
  - being compatible with ACK-thinning
  - being able to do this with a dualq implementation
  - FQ is not practical for us. We favor ECT(1) as input for a classifier.
- Ron Raganathan (comscope?) cable modem/headend equip:
  - latency much more critical now, we're looking at it
  - same space as Nikki
  - AQM is doing its thing, very powerful tool
  - challenge is when there's a lot of bursty traffic from different flows on different pipes
  - consistent latency under 10ms is hard when bursty traffic arrives
  - with L4S we so far see usually <8ms
  - our experiments not finding impact on classic traffic
  - support L4S
- Greg:
  - these bits are precious; L4S looks to reclaim ECT(0) at some point, but SCE requires using all the codepoints forever.
- Lars:
  - no matter what we do we need to make a decision now, we've waited too long already. Don't want more discussion, need to decide something now.
- Stuart:
  - to "no point lowering latency, what about wifi": disagree, this is worthwhile (1ms vs. 2.5ms i think?)
  - to "video is inelastic": video will scale to fit capacity
  - to "some traffic will cheat": there's no benefit to cheating in L4S.
- Luca:
  - (first point missed, please check video)
  - semantic disagreement with Stuart over inelasticity of video conferencing
  - need to support wireless as well. The network will be heterogeneous, need to support them all. Many bottlenecks will exist in the future, important for all apps, not just low latency apps
- Jonathan:
  - In considering that "L4S is mature" there is the sunk cost fallacy. It's not demonstrated that the ambiguity can be resolved. We have counterexamples and all sides agree at least more work is needed. An ECT(1)-as-input decision would be premature today. Deciding not to use ECT(1) at all would be a better decision for today. The main thing is getting deployment, and we have much better today than in prior years. Please deploy ECN in some form.
- Sebastian:
  - In reply to Stuart: "cheating backfires": as currently implemented L4S gives both high throughput and high latency, there is no incentive not to cheat.
- Anna:
  - If the goal for L4S is for all traffic to be in the L4S queue, how is the throughput lower for ECT(1)?
  - Stuart: The expectation is we have legacy greedy traffic for a long time. The hope of reclaiming ECT(0) is like retiring IPv4, we need to keep it for the foreseeable future. We focus on what we want, the long tail will be there for a while.
  - Anna: I agree we have a long tail, but both high throughput and low latency would go in the L4S queue
  - Koen: For now we still need 2 queues, we will have non-ECT flows too. L4S is possible for high throughput with some gaps, but traffic can always use the classic queue.
  - Bob: It took 20 years to reclaim ECN nonce from only 1 use somewhere in Scandinavia... It will take time to adopt ECN.

5. Chairs Summary of Position
- Gorry recap:
  - strong consensus in 2016 to continue L4S. Do we have enough information today to make the decisions we need to make today?
- Wes, to summarize:
  - There continues to be support to use ECT(1) as input, but we heard a range of opinions, not unanimous support. Not sure if consensus, but the majority wants input, some would like to use DSCP+output.
  - If we go forward using ECT(1) as input, there are concerns about classic detection/fallback, which will remain an important bit of work to be doing.
  - We saw open work to continue discussing data links like wifi and cheating scenarios.
- Gorry: During the L4S BoF people were unconcerned about transport fallback, but it seems today to be the sticking point. Is this what should stop us doing standardization?
- Jana: We expect to have work to do, need to stop running in circles. We can do the work you've listed in the working group, and just need solutions. Let's keep going and work out the problems.
- David: Safety is crucial. This must not break things. Interesting point mentioned: queue protection that might apply the disincentive, but right now the safety case is not there.
- Stuart: A procedural request: Apple is not supporting SCE, please update the slides.
- Gorry: OK, the L4S contributors checked this and we amended the slides to reflect company positions, we'll do the same for SCE.

A consensus poll cannot be done in this meeting.

Summary of tentative positions from Etherpad at conclusion of meeting:

Sebastian Moeller (currently, I lean towards ECT(1) as output, this IMHO is orthogonal to the question whether L4S and/or SCE are to proceed)
Dave Taht (TekLibre, aka Rube GoldBerg) I am opposed to wide deployment of undroppable packets.
L4S mandates that, SCE offers a gradual and OPTIONAL deployment path, and not widely deploying any form of ECN at all except for traffic that truly benefits is the sanest path forward.
Greg White, CableLabs (Support L4S and ECT(1) as input)
George Hart, Rogers (support L4S and input signaling for ECT(1))
Ingemar Johansson, Ericsson (support L4S and input signaling for L4S)
Per Willars, Ericsson (support L4S, especially ECT(1) as input to 5G networks)
Julien Maisonneuve, Nokia (support L4S and input signaling for L4S)
Sebnem Ozer, Comcast (support L4S)
Kyle Rose, Akamai: I am agnostic *if* part of the L4S work is to explicitly deprecate classic ECN

The chairs will review the outcomes and summarise the next steps on the list.