# IETF 116 RTGWG Minutes {#ietf-116-rtgwg-minutes}

Chairs: Jeff Tantsura (jefftant.ietf@gmail.com), Yingzhen Qu (yingzhen.ietf@gmail.com)

WG Page: https://datatracker.ietf.org/group/rtgwg/about/

Materials: https://datatracker.ietf.org/meeting/116/session/rtgwg

* * *

## 17:30-18:30 - Monday Session IV, March 27 {#1730-1830---monday-session-iv-march-27}

* * *

#### 0. Meeting Administrivia and WG Update {#0-meeting-administrivia-and-wg-update}

Chairs (10 mins)

=============================================

### WG document Update {#wg-document-update}

=============================================

#### 1. YANG Models for Quality of Service (QoS) {#1-yang-models-for-quality-of-service-qos}

https://datatracker.ietf.org/doc/draft-ietf-rtgwg-qos-model/

Aseem Choudhary (10 mins)

No questions at the end of the presentation.

=============================================

### Individual draft {#individual-draft}

=============================================

#### 2. Considerations for Protection of SR Networks {#2-considerations-for-protection-of-sr-networks}

https://datatracker.ietf.org/doc/draft-liu-rtgwg-sr-protection-considerations/

Yisong Liu / Changwang Lin (10 mins)

* Jeff Tantsura: What's your plan for this document? You listed all the protection techniques.
* YingZhen: The draft is informational. It doesn't seem like you need any extensions.
* Weiqiang: The draft is about the deployment of protection. All the components have been discussed in the relevant groups. This draft is for information only.
* Greg Mirsky: Slide #6: it is not clear what the purpose of using a BFD session is. BFD doesn't provide information about the locator SID; it only verifies the forwarding path. It seems to be overcomplicating the OAM.
* Jeff T: You can add some deployment considerations about when to use which protection technology.
* Weiqiang: We give the full picture of all the possible solutions; not all of them are necessarily needed, and some operators might only need a subset. We will optimize the OAM. We think the document is mature enough. We would like WG adoption.

#### 3. Scenarios and Challenges of Overlay Routing for SD-WAN {#3-scenarios-and-challenges-of-overlay-routing-for-sd-wan}

https://datatracker.ietf.org/doc/draft-sheng-rtgwg-overlay-routing-requirement/

Hang Shi / Cheng Sheng (10 mins)

* Jeff T: You can use BGP for multicast, which has been there for many years. Deployment of BIER is non-trivial, and it's not supported by silicon. What you described has been vanilla SD-WAN deployment for years; looking forward to your protocol enhancements. It is not clear what you want to do. What is your next step?
* Shi Hang: This is a requirements document. We are looking for feedback and want to see if there is interest in collaborating. If so, we will propose a solution based on these requirements.
* Tony P: The BIER BGP extension draft is under WGLC in IDR. If you need any changes, please go to the relevant WGs to propose the extension.

#### 4. Signaling In-Network Computing operations (SINC) {#4-signaling-in-network-computing-operations-sinc}

https://datatracker.ietf.org/doc/draft-zhou-rtgwg-sinc/

Signaling In-Network Computing operations (SINC) deployment considerations

https://datatracker.ietf.org/doc/draft-zhou-rtgwg-sinc-deployment-considerations/

Zhe Lou (20 mins)

* Adrian Farrel: Wearing my CATS chair hat. I am trying to see the difference between this and CATS. I see this as an overlay as well, but it's on-path computation between two endpoints. It is similar to SFC. CATS is about A -> B, where B does the calculation; CATS is about which B to use and how to get to it.
* Zhe: Correct. We try to define what is within the network.
* Adrian: How does a transit node know there is a SINC header under the encapsulation?
* Zhe: Transit nodes don't know; they just pass the traffic. The SINC-capable node will find the header.
* Adrian: So the SINC-capable nodes are doing deep parsing.
* Jeff Tantsura: You are defining a collective capability. The operation is stateful; you're not really building routes, you're building trees. How do you signal when an operation starts and ends? What about resiliency? Collective operations can take a long time, as with large language models. I'm expecting more from the document. We're looking at collective tree operations rather than just encapsulation.
* Zhe: We start from a fixed domain, like a DC, so a controlled environment.
* David Lamparter: You should focus on the characteristics of the computation (primarily: processing single packets vs. aggregation) and ignore what the computation itself actually is. Separate the problems. Just describe the different cases you are talking about.
* Zhe: Routers should announce their capabilities. That should be put somewhere else.

### Chat History {#chat-history}

* Jim Uttaro 00:26:44 Along with scalability it would be helpful to understand the operational complexity

=======================================================

## 09:30--11:30 - Friday Session I, March 31 {#09301130---friday-session-i-march-31}

* * *

#### 0. Welcome and Introduction {#0-welcome-and-introduction}

Chairs (5 mins)

#### Agenda bashing {#agenda-bashing}

* David Lamparter: I do not believe the BGP Blockchain draft has sufficient merit to be worth our time here; I would just like this to be recorded for future sessions.

=============================================

### Individual drafts {#individual-drafts}

=============================================

#### 1. Routing on Service Addresses {#1-routing-on-service-addresses}

https://datatracker.ietf.org/doc/draft-trossen-rtgwg-rosa/

Dirk Trossen (15 mins)

* David Lamparter: The Extension Header needs clarification, maybe for the next presentation.
* Dirk: You can find some technical details in Section 7.
* Aijun Wang: The SAR needs to have a full table, no? How does it get the information?
* Dirk Trossen: The problem is understood, but the routing table is limited to the services a ROSA domain serves, thus not ALL services of the Internet.
* Aijun: ICNRG has some work going on, but it does not depend on an IP network.
* Dirk: This one is meant to run over an IP network.

#### 2. BGP Blockchain {#2-bgp-blockchain}

https://datatracker.ietf.org/doc/draft-mcbride-rtgwg-bgp-blockchain/

Dirk Trossen (10 mins)

* David Lamparter: All of the use cases have existing authorities that control the topic at hand. Applying distributed consensus to that is entirely useless.
* Dirk Trossen: Trying to do permissioned DCS, not permissionless.
* Q Misell: The cryptographic parts would require buy-in from, e.g., the RIRs. Have you been in contact with them?
* Dirk Trossen: Still trying to figure that out; need to facilitate discussion somehow.
* Andrew Alston: (personal) Answer to facilitating discussion: take this to the IRTF. It is too early and not even vaguely ready for standardization; it's a research topic, better suited to the IRTF.
* Rüdiger Volk: Not seeing a direction for what this is trying to tackle; it is a kitchen sink of problems that are raised once in a while by people using BGP. Better to cut it down to specific problems. There is a possible cyclic dependency with the network operating itself.
* Dirk Trossen: ACK on Andrew's comment; will be taking this there.
* Jeff T: Speaking for myself, I support taking this to the IRTF rather than the IETF.

#### 3. Protocol Assisted Protocol (PASP) {#3-protocol-assisted-protocol-pasp}

https://datatracker.ietf.org/doc/draft-li-rtgwg-protocol-assisted-protocol/

Zhen Tan (10 mins)

* Greg Mirsky: Characterization of problems only, no notification/propagation of the failure information?
* Zhen Tan (ZT): It is a protocol for gathering information on a device. It helps to locate problems in the Internet.
* Greg Mirsky: Will devices keep some history log of events? (ZT: Yes) How long?
* Zhen Tan: Vendor dependent.
* Greg Mirsky: So the data may already be gone when the operator looks at it? (ZT: Yes) Notification would allow the data to be stored elsewhere.
* Zhen Tan: We have notifications, like in use case #2. There will be pre-configuration for sending notifications.
* Aijun Wang: The protocol already knows the reason for the failure; what is the benefit of having a separate protocol?
* Zhen Tan: This does not need to keep another connection up. PASP uses UDP; the connection is on demand.
* Aijun Wang: OK, I need to understand the reasons the protocol itself can't do this. Is PASP an on-demand protocol?
* Zhen Tan: Yes.
* Adrian Farrel: You said RSVP, did you mean RSVP-TE? (ZT: Yes) This might not be as applicable; RSVP-TE already has mechanisms for collecting and propagating fault information, e.g., RFC 4873.
* Zhen Tan: The goal is to have one way to get the information for all protocols; otherwise it is hard to gather the information.

#### 6. Routing in Dragonfly topologies - problem space and solutions {#6-routing-in-dragonfly-topologies---problem-space-and-solutions}

Dmitry Afanasiev (20 min)

* Tony Przygienda: Add-path will cause massive path hunting; you need this (tunneling?). You need to shout everything off, it's a broadcast domain. Dynamic routing will be too slow; we had that discussion before. Shifting traffic based on congestion is a viable thing, but outside the realm of building a routing protocol that can keep up. Doable? Yes, look at DAR (dynamic adaptable routing), which looked at previous traffic and statistics on success. Works stunningly well. For dragonfly, when the network grows big, you will have to tunnel to keep paths to 3 hops, then you will have to broadcast it and recompute. It is a flooding & broadcasting approach versus reactively shifting flows around. Reactive might be better at keeping up.
* Dmitry Afanasiev: Using VRFs rather than tunnels, but yes. Adaptive routing works in milliseconds.
* Tony Przygienda: Amorphous broadcast domain?
* Tianji Jiang: Reminiscent of previous work 15 years ago by Brocade; L2 was done using TRILL. Has that been looked at?
* Jeff Tantsura: A lot of development has been happening on this topic recently; some of it needs to happen at the IETF.

#### 4. Tactical Traffic Engineering (TTE) {#4-tactical-traffic-engineering-tte}

https://datatracker.ietf.org/doc/html/draft-li-rtgwg-tte-00

Colby Barth (15 mins)

* Greg Mirsky: This is to be monitored at the link level, not the path level? (CB: Yes) Detection of congestion happens on egress to the link, and action is supposed to be taken by the ingress (upstream node)?
* Colby Barth: Action is to be taken at the point of (local) repair that would otherwise act. Congestion is detected on a node's outgoing interface, and that node also serves as the repairing node.
* Greg Mirsky: So, monitoring the outgoing interface and taking action on that; not monitoring the incoming queue, rather the outgoing one. (CB: Yes) Not taking notification from sources of traffic? (CB: Yes) The action is local, so the overall effect on other flows cannot be considered, right?
* Colby Barth: Yes, only flows transiting the affected node can be considered. Not attempting to come up with a global fix.
* Himanshu Shah: This is doing TI-LFA on a congested outgoing link; that has been done before. Isn't this just a local implementation that doesn't need an IETF specification?
* Colby Barth: Absolutely correct; this is an informational draft, and it's a local node decision.
* Himanshu Shah: What happens to other ongoing traffic? Won't this make things worse elsewhere?
* Colby Barth: The example uses TI-LFA & tunnels, but other mechanisms can be used. We call them TTE tunnels in the draft.
* Himanshu: E2E tunnels are precalculated; better to switch to those than to use TI-LFA.
* Zhenbin Li: For TE tunnels, if you switch in the middle, will this cause out-of-order packets?
* Colby Barth: Are you asking about this causing out-of-order packets? (Yes)
* Colby Barth: Typically, the hashing algorithms should be flow-based, which should alleviate problems.
* Tony Li: The delay change is more significant and will impact congestion control. There is a performance impact.
* Jeff Tantsura: Other similar approaches suffer a lot from their locality of action and may cause downstream congestion. Adding some non-local decision-making might help.

#### 5. Requirement of Fast Fault Detection for IP-based Network {#5requirement-of-fast-fault-detection-for-ip-based-network}

https://datatracker.ietf.org/doc/draft-guo-ffd-requirement

Framework of Fast Fault Detection for IP-based Networks

https://datatracker.ietf.org/doc/draft-wang-ffd-framework

Haibo Wang (20 Mins)

* Greg Mirsky: Terminology: the mechanism used isn't failure detection, it's more about failure notification? Are you using other mechanisms, e.g., BFD, to detect the defect? What you described is about propagating the information in the management plane.
* Haibo Wang: Yes, other mechanisms are in use in parallel, but we also don't want to run heavyweight things on endpoints.
* Greg Mirsky: Is the motivation the large delay in discovery (>10s)?
* Haibo Wang: It's based on keep-alives. In some scenarios it's much longer: 5-15 seconds, or 15 mins.
* Greg Mirsky: That hints to me that there's some OAM mechanism missing. It doesn't seem like a good design to me.
* Zhenbin Li: Is there overlap with the CATS working group? Coordination?
* David Black: For an unconverged failure, how do you detect whether the failure is unconverged or converged? The network examples on the slides are simple and obvious; how would the determination that a failure is unconverged be made for a more complex network, such as the dragonfly networks described in item 6 earlier in the meeting? (Communication issues at this point. To continue on the list.)

===========================================================

### Side Meeting Update if time allows {#side-meeting-update-if-time-allows}

============================================================

#### 7. APN Update {#7-apn-update}

https://datatracker.ietf.org/doc/draft-li-apn-problem-statement-usecases/

https://datatracker.ietf.org/doc/draft-li-apn-framework/

Zhenbin Li/Shuping Peng (10 mins)

* Joel Halpern: Presentations of unchartered side meetings do not seem appropriate for this working group; the next one on the agenda seems to have the same problem. Please try to get to a problem statement we can progress on. Frustrated with the structure.

#### 8. Summary of GIP6 Side Meeting {#8-summary-of-gip6-side-meeting}

Hongyi Huang/Qiangzhou Gao (5 mins)

*presentation skipped, not enough time*

### Chat History {#chat-history-1}

* Louis Chan 00:18:57 For ROSA, is there development requirement for client application?
* David Lamparter 00:24:45 There seems to be no note-taker, I've hopped in but I'm a bit multitasking-limited as I have comments to make too :)
* David Lamparter 00:24:59 (or is someone taking notes outside the pad?)
* Jeff Tantsura 00:25:32 David - hope you could do it
* David Lamparter 00:26:31 I'll try my best. Would still appreciate if you could ask the room if someone else wants to share the load.
* David Lamparter 00:27:24 https://notes.ietf.org/notes-ietf-116-rtgwg?edit
* Andrew Alston 00:36:27 I cannot see how this is vaguely ready to look at in terms of standardization - I can see how someone may wanna try and do some research on this in the irtf - maybe
* Yingzhen Qu 00:39:57 https://notes.ietf.org/notes-ietf-116-rtgwg?both
* Yingzhen Qu 00:40:19 Please contribute to notes
* David Lamparter 00:40:37 (Uh. That comment was very disingenuous. "Just asking questions. Can't take questions to the IETF?" … you asked your question, you just didn't like the answer. I'll point this out to Dirk after the session.)
* Anthony Somerset 00:47:42 MD5 is not considered secure anymore surely?
* Jeff Tantsura 00:48:37 for quite some time
* Joel Halpern 00:49:33 This PASP thing seems to be addressing an already multiply-solved problem.
* David Lamparter 00:54:59 The agenda copied into notetaking pad doesn't match the room… I assume the agenda in the notetaking pad wasn't updated for some rescheduling
* Yingzhen Qu 00:59:18 @David. you're right, I got the presentation sequence wrong, this is supposed to be #6. Sorry about that
* David Lamparter 01:00:18 OK, no problem, I was just confused in the notes for a moment :)
* Hesham ElBakoury 01:07:24 When sinc will be presented?
* Yingzhen Qu 01:08:25 @Hesham, SINC was presented on Monday
* Greg Mirsky 01:20:50 voice is breaking. Perhaps not using video feed might help
* John Scudder 01:21:20 Audio seems better now. I assume it was affecting everyone and not just those of us onsite?
* David Black 01:21:35 Yes, affected me - remote.
* Jeff Tantsura 01:23:44 me too
* Tony Li 01:30:48 There's no signaling at all. Nothing to interoperate.
* David Black 01:33:33 Still have an opportunity to misorder when an in-progress flow is switched to another path.
* Tony Li 01:34:58 Misordering is more likely when deactivating a prefix. You're moving a flow from a presumably suboptimal path back to an optimal one.
* Tony Li 01:35:22 In any case, ordering and latency are possible issues ANY time we change the routing table.
* Shaofu Peng 01:36:53 Hi Tony, In the absence of a central orchestration of controller, when a node in the network implement local path switch, they cannot perceive the impact on how much traffic will be affected, which may lead to congestion on a link in the new path. Of course, it is exactly difficult to learn how much traffic will be affected, but if we have that knowledge, that will be more perfect.
* Jeff Tantsura 01:38:31 @Tony - you might consider using a similar strategy as with adaptive routing/DLB and move the flow only if the interpacket gap is large enough not to cause reordering, it is somewhat less of an issue in the WAN (perceptionally) than in DC, but still something to think about
* Tony Li 01:38:54 No argument. One of the more intensive ways of using this technique is also to monitor per-prefix traffic levels and decide to select prefixes to balance bandwidth utilization.
* Tony Li 01:39:35 @Jeff we're not too worried about this, given that the alternative is packet loss.
* Tony Li 01:39:51 But we don't want to thrash, either.
* Jeff Tantsura 01:40:09 absolutely, rebalancing usually yields better results than binary on/off
* Shaofu Peng 01:40:42 IMO misorder is out the scope of this proposal...
* Tony Li 01:44:44 It's not really out of scope. It's more that it's the lesser of two evils. :-)
* Shaofu Peng 01:48:32 Agree, I just think that it is another local behavior, similar to LFA, TI-LFA, and previously, we have not raised any concerns about the disorder of these local behaviors. This issue is addressed by other technology.
* David Lamparter 01:54:58 I'm incredibly confused \[by the discussion, not the problem David Black describes\], not sure what to put in the notes here.
* John Scudder 01:56:13 I think David's point is very well-taken. If this problem (insofar as I understand what the speaker is trying to do!) were easy, it would already have been fixed. If there are low-hanging fruit special cases, then identify them and make the case that they're worth addressing, but I don't think that's been done.
* Jeff Tantsura 01:56:52 +1 John