IDR meeting at IETF 99 (version 06)
9:30-12:00 am, 7/20/2017
Room: Congress Hall III

Sue: The show^Wmeeting is about to start in a few minutes; we do not have that full an agenda this time.
Sue: Mike procedures. I am Sue, the chair of IDR.
JGS: Note Well, the new Note Well.
Sue: Please take a look at the new Note Well. It covers more sessions. Anything you say here is under the NW.
JGS: It references RFC8179 on IPR.
JGS: Please state your name clearly at the mike.

Chair's administrivia [9:30-9:40 am]

0) Agenda bashing and Chair's slides (10)

JGS: We have been a busy WG. No new RFCs issued, one is pretty close. 2 sent to the IESG, barring surprises those should move soon. Finished WGLC, waiting for shepherd. 2 WGLCs in progress, please provide feedback. WGLC for tunnel encapsulations, active discussion that called for a revision. This will be discussed during the meeting today, WGLC to follow soon. Plus one is better than silence, you are encouraged to provide better feedback. Anything but silence. 5 new WG documents. Discussion on whether SR related ones need to be merged. There was no enthusiasm to merge them, they still remain as 3 separate ones. Update on the OPEN policy, waiting for an implementation. We suspect there are implementations but that is not reported. One thing that is not on the slides, which will likely have to be deferred to a later discussion, is the question of early allocations. I will discuss allocation stuff after the status. It is not allocation, it is implementation. IDR has a relatively unusual tradition of requiring protocol implementations before sending to the IESG. The question came up a few months ago of why we allow things to go to WGLC before we have evidence of implementation. That is a fair question. If WGLC has concluded and you do not have an implementation then you end up in this state, and if you sit there for long enough then we will ask the drafts to fix the found defects and will do another WGLC.
Should we change our procedures in this group so that we want two reported implementations before we proceed to WGLC? Pausing now and asking the room.
Randy [identifying himself as John Scudder/Juniper Networks :-) :-)]: This will make a queue that has mixed things in it, now we have two separate queues.
JGS: This is a reordering of the queue.
Randy: There will be many things waiting for implementation.
Job Snijders: We should reorder the procedure. We have fairly large drafts that have tons of codepoints assigned. WGLC time would be a much better time to say that we have an implementation that at least compiles. If we discover mistakes later we have to deal with that. I would expect the quality to improve this way.
JeffH: I disagree with Job. The purpose of this is to have the clear specification that would allow for doing an interoperable implementation. If you find issues as part of implementation.
Bruno: If you delay for a long time before you get a second implementation, people from the first implementation will not accept any technical changes.
Acee: I like to agree with Jeff. If you delay you are backing the whole process further away; how do you manage the change? You are raising the barrier for implementations.
Sue: I was at GROW and we missed yet another RFC that was useful for operations people - AS4 communities. How does that affect the waiting?
Mahesh: Any questions on BGP YANG model. What is the status, what are we doing with it?
Sue: We have a challenge. The YANG doctors' new requirement is that all models adopt the revised datastore. We will see whether the models are deployed, I have spoken to a friendly AD and we will likely publish the current one and then the revised one. It is a compromise mode to be friendly to other WGs. This will be brought to the list.
Keyur: I agree with that. There are implementations. For those implementations that are shipping it makes sense to document the existing work.
JeffH: This is what versioning in YANG is for.
It is fine to ship 2.0 if it fits the new thing. It is not a huge challenge to reorganize the code.
Sue: Question for you Jeff. If that versioning does not work and we use two distinct models?
JeffH: Even if we have versioning, the BGP model will not be backward compatible with revised datastores. Therefore, we will need 2 different versions.
Mahesh: If you are asking for two different namespaces then probably it is not a big problem. Do you need those models to be backwards compatible?
Sue: Generally yes. To reiterate the question, do we require that two models be backwards compatible?
JeffH: That is fundamentally not backwards compatible.
JGS: Back to the boring status update. We have 2 new IPR disclosures.
Randy: Is my perception correct that we are doing more code points than in the past, what has changed? Is there a permanent change that we need to adapt to?
JGS: That is a great question and I will ask to hold it until the next talk.
JGS: We have a wiki.
JGS: Hum, if you use a wiki once a quarter. Some people do use it. Hum or raise your hand if you never used a wiki. Quite a few. It would be interesting to know what we are not doing right. Is there information that is not available that you would need?
Robert: There is another draft waiting for implementation, ORR, both are reported in the implementation report as of 10 minutes ago, it is ready for WGLC now.
JGS: Please add that to the minutes.
Alvaro: I was wondering before we were discussing YANG model, are we finished with that?
JGS: To summarize what I heard - I heard more people saying let's try keeping it roughly the way it is. I do not want to say what the WG policy is on the fly, instead to report to the WG in a few weeks.
JGS: We keep having a topic on codepoints. It is thought to be a good policy in IETF to get codepoints from the registry and not just use out of band or self allocation of codepoints. As of right now the WG policy is not to put codepoints into documents until they are allocated.
If you are targeting your individual drafts as a WG document, you need to address this. People have reasons to do what they are doing, what might those reasons be, what could we do differently. Three possible ways forward. I started the conversation on the list a week ago. One of the answers is that publishing in my draft is better than quietly allocating and using it without telling anyone. I hope we have a better option. Code development moves and IETF process moves at different timescales, I do not want to slow the development process in order to get a codepoint. Let's talk about this. I was looking through RFC8126, it has good stuff in it. It captures the essence of our problem. [quote from 8126, see slides]. This is what we are seeing in a nutshell. There is a zoo of 10 different policies outlined in that RFC. We are not required to use those templates. If we can reuse some work from there, that would be better than inventing our own. How many people have registered something in an FCFS registry here? It is easy to use. It may be slightly slower than putting a number and sticking it into the draft, but that is the same time scale. Virtually no overhead. One of the objections to changing everything to FCFS was no sanity check at all and whether we do not need to go all the way of cats and dogs, there is also the expert review process. It is almost the same as FCFS, there is an expert who guards the registry. As long as the expert has the right expectations of what the job is, it is rather lightweight. All of those processes basically say that you have to have an RFC number to get a codepoint. This is a circular dependency. To break that we have an early allocation. The criteria for early allocation are that you have to have a spec that people can read and understand, the spec has to be stable, there has to be interest from the community in deploying this thing. If you cannot get your document adopted you haven't demonstrated that those bottom expectations are met.
Once you have done that in parallel with WG adoption, you can request early allocation, we poll the list, if there are no objections we ask for a codepoint and the AD generally says ok, then we send to IANA and we are done. This may take 3 weeks-ish. What can we do about getting people the codepoint? Three options: beatings will continue until morale improves, we should yell louder at those putting codepoints into drafts. We can embrace the anarchy and say do whatever you have to do. We will stop controlling what you put into your drafts. Speaking as a WG member that does not make sense to me. Maybe we should stop maintaining the registries at all in this case. We can also say that we have a process and people find that the process is not working, maybe we should change the process. Should we reclassify some or all of our registration policies? It is administrative work to do that, you then write a short RFC to cover IANA considerations. I do not intend to come to a final conclusion in this room.
Keyur: Typically these days when someone implements an extension it is a bit more formal. 3 weeks is a long time. If there is a process that can get us a codepoint in 2 days that is wonderful.
Bill Fenner: When we went through this exercise 8 years ago, there were a lot of concerns about converting 200 point registries to FCFS. What if someone came and registered 200 codepoints? That question needs to be considered.
JGS: I thought about that. RFC8126 talks about that. My own answer is that if we are concerned we can choose expert review instead of FCFS. This gives a gatekeeper layer. Another option is to divide the space into parts for FCFS and other allocations.
Randy: Not that I am uncomfortable with this approach, but what is missing is the "ignore the squatting" option. We are taking the cost, the blame is over there.
JGS: That is the anarchy option.
Alexander Azimov: The proper process is idea - draft - allocation - RFC.
What if I have an idea, then implementation, then draft in order to find out whether my idea is working? This way I have squatted a codepoint for testing purposes. What is the proper process here?
JGS: It is a good question. Some registries have experimental use points, they are supposed to be used for this. The problem is what if you start with an idea, you get an implementation, your implementation is fielded, and then you need to get a properly allocated one. Then you need a flag date to replace that code. Another answer is that you make your codepoint configurable and force your user to do a configuration. We should never ship another spec that has code fields smaller than 16 bits. That would allow for use of permissive policies.
Alexander: If I am not aware of any processes? When I got an idea I was unaware how the IETF process works.
JGS: I have no good answer for that.
Sue: Take Randy's answer for that.
Alexander: My suggestion is to have a pool of experimental codepoints in each registry.
Job: I like to respond to what Randy said - to railroad the squatter.
Job: The developers and the operators that were experimenting with codepoints. RFC8093 was written for exactly that problem. If we observe squatted codepoints we need to obsolete them. Let's make getting codepoints easier.
Bruno: Plus one. We may have very large registries in future with very permissive policy. We deprecate a squatted codepoint and therefore it is very permissive. We lose the deprecated code point.
JGS: Excellent discussion. I hope that in the next few weeks we will come to the decision. What I hear is the request to reorganize registries to fit people's needs better. Would someone write a draft that clarifies what that policy would be? We need someone stepping up and writing the draft.
Keyur: For me the problem is not the process, you can put any process as long as it takes less than 2 days. That is the source of squatting.
JGS: The fact that we have a problem does not mean that people are bad.
We need to fix the process and the problem will go away.
Robert: Why are we discussing this at IDR? All of RTG area has this problem, it is not unique to BGP.
JGS: My answer is because I am IDR co-chair and not the RTG AD. The way the IETF is structured is that most of the work is done in the WGs. We need to pick a process that works well for the community.
JeffH: This is a problem for other WGs too. BGP tends to have a global scope. The person that suffers from the squatted codepoint is the one that tries to get the codepoint legitimately.

Updates on existing IDR drafts [9:40-10:40 am]

1) Dissemination of Flow Specification Rules [Christoph Loibl] (10)
https://tools.ietf.org/html/draft-ietf-idr-rfc5575bis/

Christoph presenting.
[presentation]
Requesting WGLC.
[discussion]
JeffH: This is very useful for dealing with both internal defects and also other implementations. Extended communities of this type have this problem in the IETF, it is littered with the problems of magic communities and what to do when you have more than one of them. We probably need to write a general draft on what to do with it.
JGS: My reaction is that the magic communities semantics depends on the use case.
JeffH: One of the possibilities is to say that for these classes of things there needs to be exactly one and there is a value associated with it.
Sue: Work in this draft is good and it can go forward - is my understanding right, and we need to drain the swamp?
[Jeff nods.]

2) The BGP Tunnel Encapsulation Attribute [Keyur Patel] (10)
https://tools.ietf.org/html/draft-ietf-idr-tunnel-encaps/

Keyur presenting.
[presentation]
[discussion]
JGS: We can start WGLC next week.

3) Making Route Servers Aware of Data Link Failures at IXPs [Jeff Haas] (10)
https://tools.ietf.org/html/draft-ietf-idr-rs-bfd

Jeff Haas presenting.
[presentation]
[discussion]
Keyur: Back when we did ORR we had NH-SAFI to simplify ORR computations.
Can we leverage that SAFI here with the idea that we can use a generic NH-SAFI that requires per nexthop computations?
JeffH: Previous draft used that, and we decided that it was not a good fit for the purpose of this solution.
Acee: Given that this is a lot of new mechanisms that need to be implemented on the client and RS, why can you not run OSPF among all the clients and BFD and reuse the existing technology?
JeffH: Two things - route server has a little more work to do than a router does, and the RS has policy that is the key for IX, it is not only plumbing reachability. Speaking of OSPF, using a shared instance for thousands of peers.
Acee: That could be done?
Joel Jaeggli: What if one of the peers gets out of sync - that happens all the time.
JeffH: Not with our implementation.
Joel: Good, maybe you should talk to Cisco.
Joel: Blackholes exist in the infrastructure all the time even when things are working normally. The way you solve that is not to use route reflectors. You cannot see which neighbor is blackholing through you. I think this work is super useful.
JeffH: This problem has existed since frame relay times.
Job: The case of a partition is a very clean cut, but operational practice is that if an IX becomes partitioned it means that things are failing not in a clean way. I would prefer to shut everything down that goes through that exchange. If participants have very fast detection that is a good solution. Otherwise you need to wait a long time, a few hours, to get routes back. There are other use cases where RSes can be useful other than IX, it would be good to document that in the document.
JeffH: It can be a telemetry for a route server to see that something has gone awry.
Randy: This all came from DE-CIX. This is an attempt to fix a problem in their environment. The reason I like this idea, despite Jeff trying to complicate it :-), is that this allows measuring things.
The distinct problem is that a fair number of participants in the IX are on routers with insufficient resources. Two reasons to use RS - less configuration, and two - I cannot hold all the routes. And cannot handle addpath. The small routers will take a while before they have BFD. The route servers, if you read the RFC, they are designed for this, they are keeping a separate RIB for each customer. If that customer says that they cannot get to /32.
Chris Loibl: Thank you for this draft, we have hit this problem a few times. I think it is a good idea to use some form of BFD autoprovisioning. I really like route server to give me routes. Not sure about putting this state on the routers that are connected to IX and making more overhead and making the decision on RS more complex, I would vote for addpath. And I want to select what routes I want to put in my table.
JeffH: This proposal does not preclude addpath.
Chris: It adds much to the complexity. If you just have BFD autoprovisioning in there it may be much easier. No blackholing at IXes.
JeffH: Every single BGP implementation needs to implement NH reachability. Most of them do proxying from the route server. This is a question of how you get state to an RS.
Robert: You made this complex for current hardware.
JeffH: I did not say complex, I said unsupported.
Robert: Unsupported?
JeffH: BFD requires endpoints to be provisioned. That is a simple change to protocol.
Robert: That changes the establishment moment. The runtime for dataplane is the same.
JeffH: You are mistaken on how BFD operates on dataplane.
John Scudder: Due to time, we'll need to take this to the mailing list.

4) Route Leak Prevention using Roles in Update and Open messages [Alexander Azimov] (10)
https://tools.ietf.org/html/draft-ietf-idr-bgp-open-policy

Alexander presenting.
[presentation]
Alexander: Please share your thoughts on this.
Sriram/NIST: Since complex is removed now, do we need to have a recommendation in the document on what to do when OTC is not used?
Alexander: IOTC is a path attribute. If the roles are configured the attribute is filled in correctly. In your case you would not need to do anything.
Sriram: Should your draft say that it is possible to derive IOTC from the configuration?
Alexander: It is not done automatically. You need to configure it manually. If you have another complex partner you can set IOTC on a per prefix policy, but you need to do that yourself.
Sriram: You can automate from configuration, can't you?
Alexander: What do you mean automatically?
Sriram: You can take from the configuration what the role is and you can set IOTC from that.
Alexander: My meaning of automation is that you are not configuring IOTC per peer, you derive that from existing configuration.
Sriram: Something in the configuration does not automatically mean what the role is. We have intra-AS messaging, and that messaging is IOTC in this draft, someone else may be using communities. Those are per prefix roles and I will send IOTC or community, that can happen automatically and it does not need to be at OPEN time.
Alexander: [].
Alexander Lyamin: Don't you see that the complex case should be moved to a separate document?
Alexander Azimov: I do not think so. It shows that the prefix was learned from a peer.
AL: You are providing instructions without hurting the other peer?
AA: Maybe we need to clarify that in a separate draft?
Sue: You may want to talk offline about the specifics.
Job: Somewhere in the draft you say that if the OPEN policy mechanism is used it is consistent with the reject policy. Reject RFC applies to all AFIs. Exchanging nothing unless you do configure otherwise is a safe thing to do. OPEN policy applies to 1/1 (IPv4) and 2/1 (IPv6) and not other AFIs, the authors may consider limiting that explicitly. Are you saying this should be mandatory?
AA: Not any more.
Job: Thank you for clarifying.
Keyur: I am in favor of making it generic and making it default for all address families, particularly for VPN AFIs. In VPN it can be used in a different manner than at the intra-AS level.
Job: Internet use and VPN use are similar, but we do not know what future AFIs will mean. It is unsafe to open that door and go against the BGP reject RFC.
AA: Reject RFC does not specify which AFI it applies to?
Job: It applies to all AFIs.
AA: We do not add anything here, if there is no policy you need to configure manually.
Job: If you configure a role, you also open other AFIs.
AA: Same as BGP reject. You are able to configure it in a way that you want.
JGS: Time check, the conversation is useful.
Ben Maddison: On multiple AFIs - I have come across instances where people have multiple AFIs active on the same session. In this case it is not clear how the peer role for the internet AFI will work with the VPN AFI. It may be pretty hard. What I like with this - route distribution policy lives in AFs, and we are trying to enforce it on a session level.
AA: I will try to discuss this during lunch with you.
AA: We do not have a consensus here. First one is minor - we have two scenarios to send notifications. If we have a conflict in roles, we are sending a notification. If we have one side having a strict role and the other no role, we also send a notification. Do we need separate subcodes for the notification code? Do whatever we want? No best practice?
Ruediger: I do not know about the BCP or precedent, but looking at the question these are different scenarios where the signaling to the potentially failing end should be distinct, so use two codes.
AA: Will do if there is no objection.
Job: On the flip side the subcode within its context will mean something and strict/no strict context may have a different meaning.
AA: I may have a different opinion.
Randy: It is 128 in the registry but you can use the first 64 only :-)
AA: If the roles are globally deployed, do we need route leak detection and mitigation then?
There are two drafts, they give out some hint about the peering relationship between SPs. Route leak detection and mitigation is also important. Route leak detection and mitigation could be helpful in partial deployments to limit the damage. Do we need it?
JGS: Please keep brief.
Sriram/NIST: It is important to have detection and mitigation, there will be partial deployments for a long time. SPs would like to have a mechanism to work in a case where the customer AS is not using it.
Sriram: [].
AA: The problem is that this mechanism will give a hint about peering relationship. Are those SPs willing to reveal that relationship in trade for the benefits of this mechanism?
AL: Given the publicly available information, can you say with a good probability what is the relationship between Job and Randy?
Job: Lovers. :-)
AA: Complex. :-) There is such possibility.
Sue: We are out of time, encourage for this topic to go a few layers more. Is there an interest in having an interim? Hum please. I hear some hums.
[discussion]

Updates on existing individual drafts [10:40-11:10 am]

1) Carry congestion status in BGP extended community [Zhenqiang Li] (10)
https://tools.ietf.org/html/draft-li-idr-congestion-status-extended-community/

Zhenqiang presenting.
[presentation]
[discussion]
JGS: We can have a discussion at the adoption call on the list.
Ruediger: I missed in the presentation what the dynamics of this attribute will be. That should be well understood. The other comment - you are limited to links at 256Gbps. In this we are beyond that scope in actual deployments.
Zhenqiang: The new unit is 10Gbps.
Ruediger: Still this gives limited life time.
Zhenqiang: I want the WG to adopt the document and optimize the solution.
Sue: Jie, could you sit with Zhenqiang to give him a few pointers?
JGS: The bar for accepting the document should be to check whether there is a problem, how to address it, and then we should discuss accepting it.
?/Cisco: Why can you not use MED or local preference instead of the extended community?
Ruediger: Large communities can do everything. :-)
Randy: Just ask Job. :-)
Zhenqiang: Maybe I should choose another community to deliver this. The community container (wide community) is a possible solution too.
Sue: Encourage to have a discussion at the back of the room.
JGS: This is deal time - the adoption poll time - then engage with questions like that.

2) Populate to FIB Action for FlowSpec [Zhenqiang Li] (10)
https://tools.ietf.org/html/draft-li-idr-flowspec-populate-to-fib

Zhenqiang presenting.
[presentation]
Zhenqiang: Asking for adoption.
[discussion]
Robert: Who sets the L bit? If the guy who sets the L bit injects the route then you do not inject the route at all.
Zhenqiang: [].
Robert: It makes no sense at all.
JGS: There are no other comments. We can take to the list.
Sue: Robert, perhaps you should ask this question again on the list.

3) BGP Logical Link Discovery Protocol (LLDP) Peer Discovery [Acee Lindem] (10)
https://tools.ietf.org/html/draft-acee-idr-lldp-peer-discovery

Acee presenting.
[presentation]
[discussion]
David Lamparter: We are one of those implementations that have ICMP based discovery. Regardless of having that feature, my view is that LLDP is completely the wrong place for this. I do not see why we are going for an L2 protocol that may not be ported on a GRE tunnel that may not even support L2. Why don't you take all of the protocol mechanism and run it over IPv6 with a new protocol id?
Acee: Do you realize I may retire in the next 5 years? :-)
David: Are you saying that it is easier to get IEEE OUIs than ND?
Acee: It is easier to implement and get.
David: It is multiple times more complex to implement.
Sue: We need very brief comments, we are running out of time.
Keyur: Do not implement it, we will get you a free code? There is no generic discovery protocol that applies to many of the networks. It is a compromise.
Robert: Error handling - if you mess up and BGP fails it is even worse. You can get a peering address in the OPEN. You do not need any additional TLVs.
Donald: This is RFC7177.

New Work [11:00-12:00]

1) BGP Support for Fast Link Status Notification [Marcus Sun] (10)
https://tools.ietf.org/html/draft-sun-idr-bgp-ls-notification/

Marcus presenting.
[presentation]
[discussion]
Acee: Did you think about BGP-LS?
Marcus: Yes, we did think something.
Burjiz via Meetecho: To Acee's comment - we can use BGP-LS but it is too heavy, we are configuring link detection via community that makes detection a bit better. You can do a proprietary protocol also.
Sue: Is there an interest in having a virtual interim on link availability and BGP? Hum please? No hum. We will take that to the list.

[End of meeting]