TSVAREA @ IETF-97 (Seoul)
Wednesday, Nov 16, 2016
Afternoon session I, 13:30-15:00
Room: Grand Ballroom 2
Scribed by: Mat Ford

* Administrativa (15 minutes) - TSV ADs
  - Note Well, Blue Sheets, Jabber Scribes, Agenda Bashing

Mirja Kühlewind (MK) reviewed the agenda.

  - TSV Overview and status

MK thanked the Area Review Team for their service. WG status is at https://datatracker.ietf.org/group/all-status/. The TSV ADs have been encouraging chairs to add their status report to the datatracker before every meeting.

Spencer Dawkins (SD) commented that three WGs closed as Martin was stepping down and Mirja was stepping up. The area has had some BoFs since then and has chartered one new WG, so it is now more on an upswing.

MK welcomed new chairs Colin Perkins (rmcat) and Michael Tüxen (tcpm). Mark Nottingham and Lars Eggert are also new chairs, of the newly formed QUIC WG. QUIC is planning an interim meeting in January.

SD reviewed TSV document output since IETF 96. NFSv4.2 is the major new document output; GRE & UDP encapsulation and 5405-bis are also notable. The AQM WG has one more document in the pipeline and is then basically done. L4S had a good non-WG-forming BoF at the last meeting with lots of interest from the community, but will not be forming a new WG. L4S documents are going into either tcpm or tsvwg: everything related to TCP in tcpm, everything else in tsvwg.

David Black (DB), one of the tsvwg chairs: the sense of the room was to adopt the L4S docs; we need to adopt the ECN experimentation draft to clear the way for what L4S wants to do.

* BoF Announcement: BANANA (5 min)

Margaret Cullen (MC) introduced the BANANA BoF (Bandwidth Aggregation for Network Access). It considers the case where multiple Internet connections exist and you want some way to use the aggregate bandwidth. It is specifically interested in solutions that are in the network, not in the hosts themselves.

Question from the floor: IPv4, IPv6 or both?
MC: Haven't discussed that yet.
We will discuss the problem and points in the solution space, specifically GRE tunnel bonding, MPTCP proxy and a bare UDP experiment. Coordination with the Broadband Forum is an aspect here (they have a single-provider use case for aggregating DSL and LTE links). We hope to determine a way forward: a new WG, an existing WG, or work towards a further BoF. This is not a WG-forming BoF, so there is no draft charter right now.

* Report from NetDev 1.2 (Oct 5-7, Tokyo) (10 minutes + 5 minutes Q&A) - Tom Herbert (TH)

TH presented his slides (https://www.ietf.org/proceedings/97/slides/slides-97-tsvarea-linux-netdev-update-tom-herbert-00.pdf).

MK: What are the dates for the next meeting?
TH: NetDev 2.1 - March 2017, Montreal. Dates are not fixed yet; we will try to avoid a clash with the IETF in Chicago.

Question from Jabber: What is XDP offload?
TH: For more information on XDP, look at iovisor.org. There is also some documentation in the Linux docs, and on the BPF side. We haven't organised all the XDP documentation yet.

Bob Briscoe (BB): Not just avoiding an IETF clash; getting the adjacent week would help people wanting to attend both.

* Open discussion on Transition Strategies (Brian Trammell + chairs)

Brian Trammell (BT): This was also presented at TSVAREA in Berlin by Dave Thaler. There was very limited time for discussion on that occasion, so this is another opportunity to discuss it. [Based on a show of hands, around half the room was not at TSVAREA in Berlin.]

BT presented his slides (https://www.ietf.org/proceedings/97/slides/slides-97-tsvarea-planning-for-protocol-transitions-draft-iab-protocol-transitions-03-dave-thaler-and-brian-trammell-00.pdf).

Michael Welzl: Part of the ECN problem was that if you used it as it was, the benefits were not very clear. Missing documentation was a piece of the problem; Stuart Cheshire helped with that when he presented about head-of-line blocking going away when using ECN.
The ABE proposal helps with this as it is a simple change on one side, which fixes the incentives.

Lorenzo Colitti (LC): QUIC works 94% of the time, but there's no guarantee that that 94% won't go down. A transition plan mostly moves monotonically, but QUIC has built-in security, so there are incentives against letting it work; the fraction of the time that it works may go down until something else changes, but this is still way better than the IPv6 experience. Also, QUIC is not a transition: it is totally opaque. Port 443 is the only port on the Internet. That tells you not to design a replacement but to design something that looks like the thing that people think they already understand. Then the only real problems are MTU problems. DNS over HTTP does SNI, so either you block the whole transaction or you can evolve it. Trojan horses are the way of the future.

BT: I will remember this 25 years from now when QUIC is widely deployed and the only way we can change things is by piggybacking over something that has the right QUIC version number on UDP port 443.

LC: I have heard that some QUIC release has already been rolled back after breakage involving some firewall vendor. Already.

BT: We have also seen certain kinds of UDP rate limiting go away because people will complain if QUIC goes away.

MK: I'm wondering if these things are transitions or deployments of new features. In the IPv6 case we want IPv4 to go away, but is that the case here?

BT: You keep the old tech but you hope everybody uses the new one. Another transition that went relatively well was pre-SACK to SACK, so loss recovery works a bit better. RACK similarly deployed well because it only needs changes at a single endpoint. We've also learned that you do have to think about these things when designing transitions in transport.

MK: Maybe not all transition points on the previous slide apply here.
On MPTCP, the emergence of proxies helps deployment, but not on endpoints, so we need a transition plan to get proxies out of the network, which wasn't even something we thought of before.

BT: Yes, how to transition away from a transition technology.

LC: QUIC works some of the time and is slowly getting better, but we don't know what would happen if we use new features. QUIC can do 0-RTT reconnect and seamlessly resume the session from a different IP address. I believe we only have maybe a year to launch this before it becomes impossible: middlebox vendors will wise up and treat this as a security problem. The lesson might be: don't make things extensible, make them monolithic, to avoid middleboxes that only half understand something and consequently half break it.

BT: Crypto gives us the opportunity to make the wire image monolithic, while putting extensibility behind the veil. That's an interesting architectural principle.

LC: Using a different IP address is the problem, and that is outside the crypto by definition. Mobility is a special case. If you're trying to use a protocol to address the shortcomings of the layer below you, you can only do that for as long as the middleboxes don't know what is going on.

BT: If a middlebox vendor in the outer control loop is your threat model, change the wire image quickly enough that as soon as they deploy the thing, it breaks.

David Black (DB): ECN may be in transition from what we thought it was to what it is going to be. L4S is important; ABE is another one. Both promise the network that you will get a better, earlier response to congestion if you don't drop traffic. That might be an incentive for routers to do it. ECN was a plug-in replacement for drops, which lacks something in the motivation and incentives framework. An example of a failed transition is UDP-Lite: it deals with UDP checksum problems and could solve them if it carried the UDP protocol number, but it doesn't.
The latest we've got is Joe Touch's clever hack to hide something between the end of the UDP packet and the end of the IP packet to tell you the checksum coverage is incomplete. It's an interesting example of where we've ossified around something else.

BT: I'm also a big fan of that hack. But as someone who has written automated verification of security bugs, this is one where the compliant thing is a security bug. It's a bounds-checking error.

DB: Definitely a clever hack if it works. If we'd had UDP-Lite, we wouldn't have had the horrendous discussions about turning off UDP checksums for IPv6.

MK: Layering can help us. If you can separate nicely who needs to know what and do which action, and not mix everything, that can be very helpful in a transition.

BT: That's what I think I'm getting at with the wire-image concept.

Colin Perkins (CP): ECN may be enabling operators to do what they want to do, e.g. not drop packets. I also wonder if L4S runs the risk of muddying the perception of when it is safe to turn ECN on ('it's still changing, so keep it turned off').

BT: There is a work item for us to be very clear about the impact of L4S on ECN.

CP: Picking a name for the former ECT(1) codepoint that doesn't involve ECN could be important.

LC: On the ability to evolve once you've launched a protocol: if you have something that provides additional benefit to users of the protocol, you have to launch it simultaneously to avoid naïve blocking.

SD: Transitioning away from transition mechanisms: how do you coexist with a coexistence mechanism?

BT: We don't address that in this particular document; we probably should. A multiplexed transition mechanism is one that endpoint implementations then have to pay attention to. With MPTCP you have multiple interfaces and could have multiple paths, some using proxies. How do I deal with this? Should I build something into the protocol to discover and interact with that? I'm probably going to have to build something to discover the proxy anyway, but then it's turtles all the way down.
It's another situation where you've added code paths to work around something that will hopefully go away, but the code paths won't go away.

Martin Duke (MD): When we're talking about incentives, I don't think endpoints are an important stakeholder. Convincing OS implementors is the key.

BT: Apple deployed ECN because testing showed a better user experience. It's not clear why the decision was made to add it to the Linux kernel.

MD: It's not so much adding it to the kernel; turning it on by default is the key decision.

BT: There was one problem with ECN in particular, where we were talking to some kernel devs who owned that code about making RFC 3168 ECN fallback work: if ECN negotiation doesn't work, you remember that and don't try it again in future. In the discussion, one of the developers observed serious breakage because he was on a network that was messing with ECN signalling due to a bug in how they had done QoS inside their network. It took a while to convince him, because it was this little tiny cable operator in Germany. The lesson from that is that local conditions can also affect people's willingness to help you in a transition.

SD: I was around when NATs were deployed in the first place. By the time IPv6 was a realistic thing to deploy, you had all these auditors with network security checklists asking 'do you have a NAT?'. That shows the impact of operational practices on our ability to evolve the Internet.

BT: That's a good point, because you have the same thing with certain compliance regimes that make TLS OK and anything that's not TLS not OK. That may have influenced the decision to move QUIC to TLS 1.3. I think we're reasonably safe from compliance-level meddling in transport, apart from the NAT thing.

SD: I think this is something to discuss between the IAB and IESG: how we can get a better handle on what practices are out there and what their impacts are going to be.

Mat Ford: Following on from the point about endpoints not being relevant.
It's generally accepted that flag days are impractical for the entire Internet, but they can be of use with some smaller constituencies. For example, with World IPv6 Launch, identifying key actors and having them agree to jump off the cliff at the same time meant they could bring a lot of the Internet with them. We're seeing similar things now with the centralisation of web content and infrastructure services, which has its downsides but can help with getting large portions of the Internet to adopt some new technology.

BT: Fewer phone calls to make, absolutely.

Jen Linkova: Looking at the cost-benefit graph, the benefits are for one player but the costs go elsewhere. A critical factor might be the timeframe in which benefits can be realised; some operators do not care about 10-year timeframes.

BT: That's beyond their reporting horizons. It's hard to predict inflection points, and engineers are unlikely to give inspirational estimates.

BB: On the subject of timing, it's not necessarily the operators; it could be operator forums like 3GPP or GSMA. E.g. 3GPP release cycles: if you miss those you can be very disadvantaged. What you're doing has to be good enough to be compelling considering the impacts of the next release cycle.

BT: This was good input to the document, thank you very much.

MK: Discussion can be continued on the stackevo-discuss list and the tsv-area mailing list. The IAB is working on the document, but what will come after publication?

BT: We had the idea of a transition considerations section, but that seemed bureaucratic to everyone. A question on the document shepherd's checklist (is there discussion of how this is going to coexist with or replace other things?) might help.

DB: I suggest having a look at the OPS area checklist: if the OPS area does a full review, the reviewer will hit transition and coexistence questions. It's an appendix to RFC 5706.

BT: We could have the opsdir take a look.

DB: That is the right area to look at it. We probably need to have a discussion of resources to do the reviews.
CP: We're quite good at producing guidelines documents and not so good at advertising them. A plenary would be fine, but we need to reach beyond the IETF to the wider networking community. We're not very good at reaching the academic teaching community, who could use the documents to educate the next generation to be better at this stuff.

MK: Talking about protocol transition is technical, but we have also identified non-technical aspects (phone calls, getting the right people talking to each other) which are more political, about outreach and educating people.

SD: Thanks to Brian Trammell for bringing this here. Any other business?

BB: AQM working group: if it closes, where does AQM work go?

MK: It's not closed yet, we're still discussing, but presumably future work will go to tsvwg. AQM is closing because we had a meeting in Berlin with lots of discussion, but there are no new work items coming out of that group. Falling back to tsvwg allows a way to move forward.

SD: If there is work that needs to be done and a community of folks willing to do it, we will find a home for it.

DB: As tsvwg co-chair: we pick up a lot of stuff in tsvwg, and adding AQM to the list is fine. It doesn't have to stay put. tsvwg is a 'this really needs to be done' WG, but it would also be fine for stuff to find its way out of tsvwg into its own WG too.