TSVWG - Virtual Interim Meeting
May 10, 2021

Attendees:
Wes Eddy (MTI Systems) - Chair
Gorry Fairhurst (University of Aberdeen) - Chair
David Black (Dell EMC) - Chair
Rodney W. Grimes (rgrimes@freebsd.org, netDEF/SCE)
Pete Heist (SCE)
Jonathan Morton (SCE)
Ilpo Järvinen
Greg White (CableLabs)
Martin Duke (F5)
Ingemar Johansson (Ericsson AB)
Steve Blake
Huichen Dai (Huawei)
Mirja Kühlewind (Ericsson)
Michael Scharf (HS Esslingen) - partly
Ermin Sakic (NVIDIA)
Stuart Cheshire (Apple)
Vidhi Goel (Apple)
Alex Burr (Kano Computing)
Mohit P. Tahiliani (NITK Surathkal)
Koen De Schepper (Nokia)
David Millman
Spencer Dawkins (Tencent)
Michael Tüxen (Münster University of Applied Sciences)
Sebastian Moeller
Philip Anderson (Charter Communications)
Bob Briscoe (Independent)
Jason Livingood (Comcast)
Asad Sajjad Ahmed
Neal Cardwell (Google)
Anna Brunstrom (Karlstad University)
Jake Holland
chi-jiun su (Hughes Network systems)

Minutes:

-- 1. Introduction & IETF Note Well - chairs (5 minutes)
Note Well and chairs' slides shown (see slides).

-- 2. Transport requirements from: draft-ietf-tsvwg-ecn-l4s-id
- Status update from Bob Briscoe & Koen De Schepper et al. (15 minutes)
Use of SHOULD vs. MUST for reducing RTT bias will be worked out via email.
The summary of the secure VPN reordering concern on the slide was disagreed with in the meeting; this will be taken to the list. The result will need to be added to the security considerations.
Ingemar: I believe that the possible replay issue is not only an L4S problem. For that reason, I believe it deserves a wider discussion (not just in TSVWG) to understand under which conditions, if any, packet reordering becomes a problem. This is, however, not something that will or should be handled now, and I don't see it as something that should affect the L4S RFC timeline.
David: @Ingemar: There are other causes of reordering, diffserv in particular. Diffserv-caused reordering has been addressed in the relevant RFCs, e.g., for IPsec.
In contrast, L4S is new at this juncture. With luck, this and Ingemar's comment will get picked up in the list discussion.
Sebastian: L4S adds a whole new reordering mechanism: ECT(1) over ECT(0), even for packets with identical DSCPs. The effect is that C-queue traffic can easily be starved/suppressed by ECT(1) traffic. This happens if the ECT(1) packets are hoisted early enough to advance the replay window so far that the Not-ECT/ECT(0) packets delayed in the C-queue arrive with a sequence number below the replay window's lower end. Conceptually, this is similar to reordering caused by paths of different latency selected via DSCPs. However, DSCPs are rarely end-to-end, and until now the ECN bits have not mattered for this kind of reordering.
- Open discussion (40 minutes)
Jonathan: Draft should provide a reference algorithm to implement monitoring. The reference code that is available is not reliable under lab conditions. The algorithm is not documented in an IETF draft.
Koen: Reference code has parameters that need to be tuned. Anticipates deployment-specific tuning.
David: Ought to add discussion of the algorithm and its tuning to the draft (Bob suggests: possibly in an appendix).
Bob: Only affects long-running flows.
Jonathan: Disputes that.
Wes (chair): Would like to hear from implementers on this topic.
Bob: Whitepaper contains an out-of-band detection algorithm for RFC 3168 AQMs; could add it to the draft.
David & Bob: There is a problem in the interaction between L4S reordering and anti-replay in secure VPNs; will take the discussion to the list. (Sebastian's comment above on reordering is related to this.)
Jonathan: Experiment success criteria are deployment-centric. Need to look at safety, particularly with respect to RFC 4774 Option 2 (check that routers understand the new ECN semantics) vs. Option 3 (new ECN semantics coexist well [friendly] with competing traffic).
David: L4S was originally designed for RFC 4774 Option 3. Whether it has met the criteria to use that option is an open issue for the WG to discuss.
The RFC 4774 options are in Section 4: https://datatracker.ietf.org/doc/html/rfc4774#section-4
Sebastian: The success measure for an AQM needs to be active use and NOT simply passive deployment.
Jake: Asks about the expected timeliness of, and responsibility for, responses to problems detected by monitoring.
Bob: Recommendation is for real-time monitoring; this relies upon the absence (or near absence) of false negatives.
Pete: Is the congestion-control interaction of L4S/non-L4S flows in a shared RFC 3168 queue similar/analogous to the interaction of DCTCP/non-DCTCP flows?
Ingemar: SCReAM is driven by video encoders. Network queues will often be empty because there is not an always-present backlog of data to send.
Bob: Not sure why the question is being asked; DCTCP does not meet the L4S "Prague requirements".
Pete: The reason for the question was whether DCTCP restrictions settle the coexistence question.
Bob: The L4S "Prague requirements" have improved on DCTCP; does not consider DCTCP to have settled the RFC 3168 coexistence question. Prague *in L4S mode* is not expected to coexist well. More discussion to come on the list.
Koen: Sees a role for both L4S and non-L4S services in the future of the Internet.

-- 3. Safe Internet-wide experimentation: draft-white-tsvwg-l4s-ops (or newer WG version)
- Status update from Greg White et al. (15 minutes)
Sebastian: A recent paper reporting 5% use of ECN seems fishy, but 0.3% use on HTTP/HTTPS traffic agrees with the Akamai results reported on the slide.
David: DSCP material ought to be added.
Greg: There was further discussion, but could not figure out what to do from the (confusing) list discussion.
David: Will send a note to the list on network-only use of DSCPs, without endpoint reaction to received DSCPs.
- Open discussion (40 minutes)
Stuart: Would like to see latency improved (it has been ~0.5 sec for too long, and RFC 3168 is not widely deployed). Need a selector for L4S treatment. End devices want the best behavior at bottlenecks, independent of whether those bottlenecks are RFC 3168 or L4S.
Interested in whether DSCP marking will provide a feasible path forward as an alternative to ECT(1). Seeing more and more kinds of traffic that want both bandwidth and low latency, e.g., video streaming.
Greg: Hopes to see widespread deployment of L4S, with deployed classic ECN systems improved over time.
Stuart: fq_codel is deployed and deployments are increasing; classic ECN will be with us for at least a decade. Just by moving a mobile phone around the house, the network bottleneck may shift between the cable modem infrastructure and the home WiFi AP. If the latter has fq_codel/classic ECN, it is unlikely to be upgraded.
Spencer: I strongly agree with Stuart that the idea that "some applications want high bandwidth and others want low latency" is not a useful long-term strategy. In QUIC working group discussions about applications that want to use multiple connections, I am seeing more and more people saying that they really care about close control of latency. (https://datatracker.ietf.org/doc/draft-dawkins-quic-what-to-do-with-multipath/ and https://datatracker.ietf.org/doc/draft-dawkins-quic-multipath-questions/) I am especially pleased with the discussion about possible DSCP guard usage, which may move this work forward soon.
Jake: 0.3% of Internet users seems small, but that's millions of people. Need to consider reactions to the breakage that occurs.
Greg: Looking to a future where RFC 3168 support is L4S-aware.
Bob: RFC 3168 only causes problems when multiple flows are in the same queue. Not convinced that short flows are an important part of the problem; long flows may be the appropriate focus. It will be important for the DSCP discussion to distinguish DSCP usage as 1) traffic marking, 2) a classifier, and 3) part of transport protocol behavior. (David: Agrees.)
Koen: Has not seen a bulletproof DSCP solution. Existing usage of DSCPs constrains what's possible.
Stuart: If the network treats ECT(1) as Not-ECT, that's bad. It gives app developers an incentive not to use ECT(1), because the result could be worse than with ECT(0) [RFC 3168].
(Bob: Agrees that this would be bad.)
Jonathan: fq_codel (RFC 8290) provides an improvement over prior technology and should be the baseline for judging the utility of L4S improvements.
Ingemar (from chat): I have tried over the years to push ECN support into LTE, but so far it has not materialized. The main reason is that it did not give a large enough improvement when I tried (~5 years ago). The situation is a bit different now with the emerging interest in XR/cloud gaming/remote control, so ECN may be easier to push. There are, however, a few aspects related to 5G access that make L4S more appealing. One important aspect is fast fading, which is a natural part of cellular access. The high marking intensity of L4S makes it possible for an interactive application to react promptly and reach an operating point that gives sufficient headroom for the fading dips. It has been hard to reach a similarly good balance with classic ECN, mainly because of its sparser marking.

-- 4. Wrap-Up - chairs (5 minutes)
Wes (chair): Does the L4S operations draft contain sufficient information to run an L4S experiment? Specific question asked: Does the group agree that, with these guidelines available, L4S will be suitable for experimentation in parts of the Internet?
Over half of the attendees (close to 20 of about 35) agree. A smaller number, about half a dozen attendees, do not agree. A number of people did not express an opinion.