Summary: Has 2 DISCUSSes. Has enough positions to pass once DISCUSS positions are resolved.
I want to thank the authors for a very readable draft. It was a pleasure to review, and that's a high bar for the subject.

I have loads of questions, but my first set of questions is an expansion of Alvaro's comment that I think rises to the level of a Discuss. Please note that I'm asking questions, not proposing text changes, so I really do want to discuss it.

---------- my first set of questions

In this text,

   Three typical ring protection mechanisms are described in this section: wrapping, short wrapping and steering. All nodes on the same ring MUST use the same protection mechanism.

I would like to understand what happens if they aren't - and I'm asking, mostly as a way of encouraging guidance for operators in debugging cases where they're not all using the same mechanism. I'm not asking for a full mesh of possible misconfigurations, only for a sentence or two ("If they aren't all using the same protection mechanism, the following things may happen").

More broadly, I'd like to understand why wrapping and short wrapping are both defined. It seems like the only functional difference is that short wrapping doesn't give you as much latency. Is that right?

24 pages in, I see this:

   o  In rings utilizing the wrapping protection, each node detects the failure or receives the RPS request as the destination node MUST perform the switch from/to the working ring tunnels to/from the protection ring tunnels if it has no higher priority active RPS request.

   o  In rings utilizing the short wrapping protection, each node detects the failure or receives the RPS request as the destination node MUST perform the switch only from the working ring tunnels to the protection ring tunnels.

so I'm pretty sure there are differences beyond what I was seeing, earlier in the document.

And, of course, I'm not sure what the effect of choosing steering over wrapping/short wrapping would be, for my users, but that can wait until we talk about wrapping and short wrapping ...

At a minimum, I'd like to see guidance for operators in choosing among the three protection mechanisms. Why would they choose any one of the three? I also note that this MUST seems to be repeated using different words in section 5.1, as All nodes in the same ring MUST use the same protection mechanism, Wrapping, steering or short-wrapping. If that's saying the same thing, one MUST is all you need.
---------- all the other questions

In this text,

   When the service LSP passes through the interconnected rings, the direction of the working ring tunnels used on both rings SHOULD be the same. For example, if the service LSP uses the clockwise working ring tunnel on Ring1, when the service LSP leaves Ring1 and enters Ring2, the working ring tunnel used on Ring2 SHOULD also follow the clockwise direction.

I'm not understanding why this is a SHOULD, and not a MUST. If the direction of the working ring tunnels used on both rings is not the same, does this still work? If it still works, why does this matter? But, either way, you might usefully say something about why this isn't always the right thing to do, even if you just give one example. The point of SHOULD is that implementers make their own informed decisions, so providing information that will inform those decisions seems important.

I wanted to call out

   Ring switches MUST be preempted by higher priority RPS requests. For example, consider a protection switch that is active due to a manual switch request on the given link, and another protection switch is required due to a failure on another link. Then an RPS request MUST be generated, the former protection switch MUST be dropped, and the latter protection switch established.

   MSRP mechanism SHOULD support multiple protection switches in the ring, resulting in the ring being segmented into two or more separate segments. This may happen when several RPS requests of the same priority exist in the ring due to multiple failures or external switch commands.

as really good examples of the kind of text I think would help the places in this document ("For example", "This may happen when") where no examples are given. Thanks for providing those examples!

Ouch. Do I understand from

   o  Protection Switching Mode (M): This 2-bit field indicates the protection switching mode used by the sending node of the RPS message. This can be used to check that the ring nodes on the same ring use the same protection switching mechanism. The defined values of the M field are listed as below:

   +------------------+-----------------------------+
   | Bits (MSB-LSB)   | Protection Switching Mode   |
   +------------------+-----------------------------+
   | 0 0              | Reserved                    |
   | 0 1              | Wrapping                    |
   | 1 0              | Short Wrapping              |
   | 1 1              | Steering                    |
   +------------------+-----------------------------+

that you already have three protection mechanisms, and have only one possible codepoint to allocate for any future optimizations? Assuming that "0 0" can be unReserved ...

Could you clarify what "anyway" means in this text?

   When multiple MS RPS requests exist at the same time addressing different links and there is no higher priority request on the ring, no switch SHOULD be executed and existing switches MUST be dropped. The nodes MUST signal, anyway, the MS RPS request code.

I'm seeing that the commands like LP described in section 126.96.36.199 are used in the document before these (I'm serious) helpful and clear explanations appear. If it's possible to move section 188.8.131.52 up in the document, that would be great, but if it isn't possible, a forward pointer would be helpful to readers who don't already know what the command abbreviations mean.

I'm really confused by this SHOULD:

   The PSC protocol [RFC6378] is designed for point-to-point LSPs, on which the protection switching can only be performed on one or both of the end points of the LSP. The RPS protocol is designed for ring tunnels, which consist of multiple ring nodes, and the failure could happen on any segment of the ring, thus RPS SHOULD be capable of identifying and handling the different failures on the ring, and coordinating the protection switching behavior of all the nodes on the ring.

I suspect that's because it's not a 2119 SHOULD, but if people think it is, I wouldn't mind understanding why.

Section 5.3, "RPS and PSC Comparison on Ring Topology" is really helpful, but it appears 43 pages in. Given that I'd expect people to be asking why they should implement a new protection switching protocol when they've already implemented PSC, I'd think this would be much more useful, early in the document.

I'm somewhat confused about the code point allocation strategy in this text:

   The RPS Request Field is 8 bits, the allocated values are as follows:

   Value    Description                  Reference
   -------  ---------------------------  ---------------
   0        No Request (NR)              this document
   1        Reverse Request (RR)         this document
   2        unassigned
   3        Exercise (EXER)              this document
   4        unassigned
   5        Wait-To-Restore (WTR)        this document
   6        Manual Switch (MS)           this document
   7-10     unassigned
   11       Signal Fail (SF)             this document
   12       unassigned
   13       Forced Switch (FS)           this document
   14       unassigned
   15       Lockout of Protection (LP)   this document
   16-254   unassigned
   255      Reserved

My first question is, why the highest priority RPS value is 15, given that the field is 8 bits wide. If anyone ever needs to add a code point higher than the highest priority code point, will that work well? I can imagine code that says "if operation_priority is greater than highest_priority, it's an error", for example. I may have other questions depending on your answer, but let's start there.

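To make the concern about code that compares against the highest priority concrete, here is a minimal sketch of the kind of check imagined above. The request names and values come from the quoted table; the function names and the validation style are hypothetical, and the sketch assumes, as the question does, that a larger value means a higher priority. It is not taken from the draft or from any known implementation.

```python
# Values and names are from the table quoted above. Everything else
# (function names, validation strategy) is a hypothetical illustration.
RPS_REQUESTS = {
    0: "NR", 1: "RR", 3: "EXER", 5: "WTR",
    6: "MS", 11: "SF", 13: "FS", 15: "LP",
}

# Highest value assigned today (15, Lockout of Protection).
HIGHEST_PRIORITY = max(RPS_REQUESTS)


def validate_brittle(code: int) -> bool:
    """Range check: anything above today's highest code is an error.

    Accepts unassigned values such as 2, and would reject a code point
    assigned above 15 in the future.
    """
    return 0 <= code <= HIGHEST_PRIORITY


def validate_by_registry(code: int, registry=RPS_REQUESTS) -> bool:
    """Membership check: only assigned code points are valid."""
    return code in registry


if __name__ == "__main__":
    # A hypothetical future assignment above LP; "NEW" is a placeholder.
    future_registry = {**RPS_REQUESTS, 16: "NEW"}
    print(validate_brittle(16))                       # range check result
    print(validate_by_registry(16, future_registry))  # registry check result
```

An implementation written in the first style would keep rejecting a new code point above 15 even after it was registered, which is why the allocation strategy seems worth spelling out.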
The security considerations of this document seem unacceptably incomplete, as they basically just point to other documents.

   The RPS protocol defined in this document is carried in the G-ACh [RFC5586], which is a generalization of the Associated Channel defined in [RFC4385]. The security considerations specified in these documents apply to the proposed RPS mechanism.

The security considerations of those documents don't seem that great either. However, I believe that they miss a new security issue raised by the mechanism in this draft, which is that a member of the ring appears to be able to forge reports of errors at other parts of the ring. Specifically, S 184.108.40.206 says:

   When a node is in a pass-through state, it MUST transfer the received RPS Request in the same direction.

   When a node is in a pass-through state, it MUST enable the traffic flow on protection ring tunnels in both directions.

This seems not to involve any filtering, which suggests that node B can send a forged SF from C->D and from D->C, which at least potentially temporarily breaks the link there, causing traffic diversion.

More generally, this system assumes that every node trusts every other node completely. That must at least be stated.

Incidentally, the text above appears to contain a bug in that it doesn't talk about processing incoming RPS requests intended for the receiving node, but I may just have missed the section where it says that.

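The forging concern can be illustrated with a small sketch. It assumes a six-node ring A through F like the one in the S 4.1.1 example, treats B as the misbehaving node, and models pass-through as "forward unchanged unless I am the destination", which is one reading of the quoted text. The message fields, class, and function names are hypothetical simplifications, not the draft's wire format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RpsMessage:
    """Hypothetical, simplified RPS message (not the draft's format)."""
    request: str      # request code name, e.g. "SF"
    source: str       # claimed source node ID
    destination: str  # destination node ID


def pass_through(node_id: str, msg: RpsMessage) -> Optional[RpsMessage]:
    """Forward the message unchanged unless it is addressed to this node.

    There is deliberately no check on who actually originated the message,
    mirroring the absence of filtering noted above.
    """
    if msg.destination == node_id:
        return None  # would be handled locally rather than forwarded
    return msg


def deliver(intermediate: List[str], msg: RpsMessage) -> Optional[RpsMessage]:
    """Carry a message across intermediate nodes; return what arrives."""
    current: Optional[RpsMessage] = msg
    for node_id in intermediate:
        current = pass_through(node_id, current)
        if current is None:
            return None
    return current


# B forges a Signal Fail that claims to come from C and is addressed to D,
# and sends it the long way around the ring through A, F, and E.
forged = RpsMessage(request="SF", source="C", destination="D")
arrived = deliver(["A", "F", "E"], forged)
```

In this model the message reaches D byte-for-byte as B wrote it, so D has nothing to distinguish it from a genuine request originated by C.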
S 4.1.1.

   protect these LSPs that traverse the ring, a clockwise working ring tunnel (RcW_D) via E->F->A->B->C->D, and its anticlockwise protection ring tunnel (RaP_D) via D->C->B->A->F->E->D are established, Also, an anti-clockwise working ring tunnel (RaW_D) via C->B->A->F->E->D, and its clockwise protection ring tunnel (RcP_D) via D->E->F->A->B->C->D

Why does the protection tunnel include D on both ends whereas the working tunnel does not?

S 4.2.

   packets are periodically exchanged between each pair of MEPs to monitor the link health. Three consecutive lost CC packets will be interpreted as a link failure.

Is this a normative statement (i.e., does it need a MUST)?

S 220.127.116.11.

Why do you ever not use short wrapping?

S 18.104.22.168

   A node MUST revert from pass-through state to the idle state when it detects NR codes incoming from both directions. Both directions revert simultaneously from the pass-through state to the idle state.

Incoming within what time frame?

Substantive:

- The abbreviation "MSRP" is already used by RFC 4975. Please avoid overloading it if at all possible. (And you probably want to collide with "Manufacturer's Suggested Retail Price" even less.)

- 4.4.2: "When the service LSP passes through the interconnected rings, the direction of the working ring tunnels used on both rings SHOULD be the same." Would it ever make sense for the directions to be different? (That is, why not MUST?) If so, a few words about that would be helpful.

- 5.1, 3rd bullet: "Determination of the affected traffic SHOULD be performed by examining the RPS requests (indicating the nodes adjacent to the failure or failures) and the stored ring map (indicating the relative position of the failure and the added traffic destined towards that failure)." Would it ever make sense to violate that SHOULD? (That is, why not MUST?)

- 6.2: Why "standards action"? That's a high bar. Are there reasons why a lower bar like "specification required" would not be appropriate? For example, are we in danger of running out of code points? Is this registry at unusual risk for poor quality registrations?

Editorial:

- 3: Is this section expected to be useful to implementors? It reads more like evidence to the WG that this meets the requirements. I suspect people won't much care about that once this is published as an RFC. Please consider moving it to an appendix, or even removing it entirely.

- 4.4.2: "For example, if the service LSP uses the clockwise working ring tunnel on Ring1, when the service LSP leaves Ring1 and enters Ring2, the working ring tunnel used on Ring2 SHOULD also follow the clockwise direction." Please avoid repeating the 2119 "SHOULD" in the example.

- 5.1: "The MSRP protection operation MUST be controlled with the help of the Ring Protection Switch protocol (RPS)." That seems like a statement of fact, rather than an implementation requirement.

- Starting around 5.1, I notice several uses of the word "source" as a verb, where from context it seems like you mean "to send" or "to originate". Is that a term of art? I usually think of "source" as a verb to mean "acquire", "find", or "find a source for".

- 5.3: "... thus RPS SHOULD be capable of identifying and handling the different failures on the ring ..." That seems like a statement of fact.

I'd like to see the discussion with gen-art reviewer conclude and the associated changes folded into the next version of the document.
Some nits and a question:

3. MPLS-TP Ring Protection Criteria and Requirements

a. The number of OAM entities... "Each ring-node requires only one instance of the RPS protocol." Not super important, but is this "Each ring-node requires only one instance of the RPS protocol (regardless of the number of rings)" or "Each ring-node requires only one instance of the RPS protocol per ring"? If a node participates in multiple rings, does it need an instance for each ring? (I suspect that this is somewhat of an implementation choice, but am not sure.)

4. Shared Ring Protection Architecture

4.1. Ring Tunnel "... ring tunnels which provides a server layer for the LSPs traverse the ring." I think "for the LSPs traversing the ring." (or perhaps "which traverse the ring.")

Two technical comments that I think are important to address but do not warrant a discuss:

1) section 5.2: "As shown in Figure 14, when no protection switching is active on the ring, each node MUST send RPS requests with No Request (NR) to its two adjacent nodes periodically." What does periodically mean here? Can you maybe give a number or even a normative statement like "and MUST NOT send more often than every X seconds" to avoid unnecessary congestion...?

2) section 5.1.1: "A ring node which is not the destination of the received RPS message MUST forward it to the next node along the ring immediately." Why would you forward these? I thought you only send messages to your neighbors? Maybe I missed this, but is there a use case for this scenario? Otherwise it might be safer not to forward, to avoid messages with a wrong destination node ID circling around forever. If you forward, maybe you also need a hop count to decrease, or at least say that messages that are received and have the node's own ID as source node ID MUST be dropped...?

Further, as mentioned by Ben for a couple of cases, some of the uses of normative language in section 5 seem not to be appropriate as they don't specify a concrete implementation action. Please check carefully and change some to lower case instead, e.g.

   "The MSRP protection operation MUST be controlled with the help of the Ring Protection Switch protocol (RPS)."

   "The RPS protocol MUST carry the ring status information and RPS requests,.." (this sounds like a requirement on the protocol design but when you implement the protocol as specified there is no way to not do it, so this MUST is unnecessary)

   "Each node on the ring MUST be uniquely identified by assigning it a node ID." (also requirement-like; the MUST in the next sentence is the important one)

   "When a node detects a failure and determines that protection switching is required, it MUST send the appropriate RPS request in both directions to the destination node."

   "MSRP mechanism SHOULD support multiple protection switches in the ring, resulting in the ring being segmented into two or more separate segments."

   "The first three RPS protocol messages carrying new RPS request SHOULD be transmitted as fast as possible." (Again the later SHOULD is the more important one)

There may be more…

This document describes 3 different protection mechanisms and it specifies that all nodes "MUST use the same protection mechanism". When should these mechanisms be used? What are the conditions that an operator should take into account when selecting between them? I would like to see operational considerations explained.