The Babel Routing Protocol
Summary: Has a DISCUSS. Needs 4 more YES or NO OBJECTION positions to pass.
Alvaro Retana Discuss
Discuss (2019-08-07 for -12)
I really enjoyed reading this document! Thank you for the work and time that has gone into it. However, I don't think that this specification is ready to be published as a Proposed Standard. In general, I don't think that the document is clear or specific enough to be considered in the Standards Track -- that is the main reason for this DISCUSS. (A) Clear Defaults and Operational Guidance While I appreciate Babel's flexibility in terms of the ability to use different strategies, I believe that both defaults and clear guidance should be provided. Given that "not all...strategies will give good results" and that in most cases these are listed as possible choices, I don't think that this document "has resolved known design choices" [BCP9/rfc7127]. The cost/metric computation and route selection specially concern me because I believe that a robust/clear specification is at the heart of any routing protocol. In general what I am looking for to resolve this part of the DISCUSS are two items: (A1) Clear defaults. For example, Appendix B talks about constants/default values. I would assume that, given the existing experience, that the values there are probably sensible defaults. Is that not the case? (A2) Operational Considerations. Given that Babel can be (and is) used in different environments, I would like to see guidance to operators as they deploy the protocol in their networks. An example of the type of discussion I would like to see expanded is: "a mobile node that is low on battery may choose to use larger time constants (hello and update intervals, etc.) than a node that has access to wall power" (§1.1). Consider §2 in rfc5706 (Operational Considerations - How Will the New Protocol Fit into the Current Environment?). I believe that both items are important, specially in a protocol as flexible as Babel. Some of this guidance could have been included in draft-ietf-babel-applicability -- but this information is not there either. (B) Error Handling Many sections of the document describe functionality, or even Normatively mandate it, but there is no discussion about Error Handling. (B1) Router-Id Setting §4.5: o the current router-id; this is undefined at the start of the packet, and is updated by each Router-ID TLV (Section 4.6.7) and by each Update TLV with Router-Id flag set. It took me some time to figure out the reason for being able to carry the router-id in two different places inside the same packet, which is my interpretation of the "and" above. Let me see if I understood: a packet can carry multiple updates...updates contain routes that were either originated by the local node, OR, learned from other routers...the router-id matches the originator... So...if a packet carries multiple updates, some locally originated and some learned, then it is possible for the packet to first include (for example) a Router-ID TLV (indicating router-id_A), followed by some Update TLVs (without the R-bit set), than then some other Update TLVs (with the R-bit set)... Did I understand correctly? If so, I think there are significant pieces of this operation that are not clearly specified in the document. There is mention of the effect of the Router-ID TLV (or the Update TLV w/R=1) on subsequent Update TLVs...there is an very subtle hint (for my taste) in §4.5 (Parser state) about the state learned for each packet from those TLVs...but there is no explicit text that talks about the need for strict ordering when sending and later when processing...it is all simply implied. What should happen if no Router-Id has been defined? For example, an Update (R = 0) is received but no Router-ID TLV is present... What if the Router-ID TLV is present, but *after* the Update? There are many possible combinations... (B2) Default Prefix Similar comments as above... "P (Prefix) flag...establishes a new default prefix for subsequent Update TLVs with a matching address encoding within the same packet" (§4.6.9). What if an update with an AE that allows compression is received *before* the one that sets the new default prefix? (B3) Next Hop §4.6.9: The next-hop address for this update is taken from the last preceding Next Hop TLV with a matching address family (IPv4 or IPv6) in the same packet even if it was otherwise ignored due to an unknown mandatory sub-TLV; if no such TLV exists, it is taken from the network-layer source address of this packet. What if the Next Hop TLV doesn't exist and the network-layer doesn't correspond to the address family in the Update? For example, let's say IPv6 is used as the network-layer protocol and the Update contains IPv4 prefixes... (B4) For the Normative behavior listed here (I may have missed other instances), I have basically the same question: what should a receiver do if it is not the case? - §188.8.131.52: "A node MUST NOT increase its sequence number by more than 1 in response to a seqno request." - §4: "A Babel packet MUST be sent as the body of a UDP datagram, with network-layer hop count set to 1..." - §4.6.9: "If the metric is finite, AE MUST NOT be 0. If the metric is infinite and AE is 0, Plen and Omitted MUST both be 0." - §4.6.10: "...if AE is 0 (in which case Plen MUST be 0 and Prefix is of length 0)." - §4.6.10/§4.6.11: Is AE 3 a valid value in a request? I assume it isn't. What should a receiver do if AE = 3. (C) Mandatory Bit §4.4: "The most-significant bit of the sub-TLV, called the mandatory bit..." The most significant bit of which part of the sub-TLV? As written, that bit would be the first one in the Type, which corresponds to the text in the IANA section. Please be specific. In the IANA considerations section, please include the whole registry in the table to avoid confusion. Note that because of the mandatory bit, the 128-239 range should be Reserved...but it is currently marked as Unassigned. Even worse, value 128 is assigned already [draft-ietf-babel-source-specific]. The impact may not be too bad because I doubt that Pad1 would need to be mandatory, but it at least causes confusion and inconsistency, and (as currently specified) there would be no way to differentiate between Pad1 and the Source Prefix sub-TLV.
Comment (2019-08-07 for -12)
(a) §3.1 introduces the term "urgent TLVs". (1) It might be a good idea to explicitly mention/list which are these TLVs. There are some references in subsequent sentences, which may or may not be enough for most readers. (2) There are some Normative actions applied to them, for example "MUST be sent in a timely manner". While the intent may be ok, the Normative enforcement of "in a timely manner" is not clear at all -- how do you comply with that? Appendix B says: The amount of jitter applied to a packet depends on whether it contains any urgent TLVs or not (Section 3.1). Urgent triggered updates and urgent requests are delayed by no more than 200ms; acknowledgments, by no more than the associated deadline; and other TLVs by no more than one-half the Multicast Hello interval. I think it would help if this text is moved to §3.1 to make it explicitly clear what a timely delay is...and the text was changed to (something like) "MUST NOT be delayed more than 200ms". (b) §3.2.2: "SHOULD NOT increment its sequence number (seqno) spontaneously" When it is ok to increase the seqno spontaneously? IOW, why not use MUST NOT? I think it would be better if there was a clear indication of when the seqno is increased. Scanning the rest of the document, it seems that those indications are in place. (c) There seems to be no specific explanation of how the timers are handled, what happens when they expire, etc.. For example, §3.2.4 includes this text: There are three timers associated with each neighbour entry -- the multicast hello timer, which is initialised from the interval value carried by scheduled Multicast Hello TLVs, the unicast hello timer, which is initialised from the interval value carried by scheduled Unicast Hello TLVs, and the IHU timer, which is initialised to a small multiple of the interval carried in IHU TLVs. But there is no explanation (that I could find) about how to manage those timers. The only place where hello timers are mentioned is in Appendix A.1...but that is just an example. (d) §3.7.2 includes two instances of "SHOULD make a reasonable attempt at ensuring that all [reachable] neighbours receive this update/retraction". What does making "a reasonable attempt" mean? How can that be Normatively enforced? (e) §3.7.2 Finally, a node MAY send a triggered update when the metric for a given prefix changes in a significant manner, due to a received update, because a link's cost has changed, or because a different next hop has been selected. A node SHOULD NOT send triggered updates for other reasons, such as when there is a minor fluctuation in a route's metric, when the selected next hop changes, or to propagate a new sequence number (except to satisfy a request, as specified in Section 3.8). How much is "a significant manner"? What about "a minor fluctuation"? Are the modifiers (next hop change, for example) the only conditions to take into account, or are they just examples of when these significant/minor changes may occur? How can these terms be Normatively enforced? (f) §184.108.40.206: "When a node receives a wildcard route request, it SHOULD send a full route table dump." When is it ok to not send a full table dump? IOW, why is MUST not used? (g) §220.127.116.11: "a node SHOULD repeat such a request a small number of times if no route becomes feasible within a short time." What does "a small number of times" and "within a short time" mean? How can that be Normatively enforced? Please be specific. (h) §4.6.9: "Omitted...that should be taken from a preceding Update TLV in the same address family with the Prefix flag set." What if that Update TLV is not in the packet? (i) Security Considerations The initial vulnerability listed ("attacker can misdirect data traffic by advertising routes with a low metric or a high seqno") is only one of several actions an attacker can take. More importantly, if the attacker happens to be in control of an authenticated node, then the mitigation proposed doesn't help. This type of rogue node can, for example, set the mandatory bit in an unknown TLV (as in completely made up!) to cause whole TLVs to be ignored, resulting in loss of routes, etc.. I am not sure what can be done to mitigate this type of vulnerability...but I think it is important that it is at least called out. (j) Are the appendices intended to be Normative or not? I'm assuming the answer is no...but I can base that only on the references in the text to Appendix A.*, pointing to them as examples. What about the others? They are not even referenced in the text. Some comments: - Appendix B talks about constants/default values. See my DISCUSS comments above. - Appendix C "is intended to guide designers of protocol extensions in chosing a particular encoding." I think is is valuable information. It would be very nice if there was a reference (or perhaps several, from where the different extensibility methods are presented) in the main body of the specification. I can see how this is an informative section. - Appendix D defines a "stub implementation". This is also valuable information. But...there's no reference from the text, and Normative language is used... Why is this type of implementation (which I would think might be relatively common) not normative? - Appendix E simply points to the sample implementation. Personally, I would prefer to see an rfc7942 section instead -- it would have been nice to also mention other implementations. (k) "The length of..." is used everywhere in the document, but no units are mentioned. Some seem to obviously be in octets, but others could easily be in bits... (l) §4: s/SHOULD attempt to maximise the size of the packets/SHOULD maximise the size of the packets (m) §4.1.3: The description of AE 1 and 2 says that "Compression is allowed." -- but it looks like the only place where it can happen is in an Update. It might be nice to indicate that...and avoid indicating that compression is not allowed where it can't be done anyway. (n) rfc8126 should be a Normative reference. (o) Please include Informative references to rfc6126 and rfc7557. (p) s/Bellman-Ford protocol/Bellman-Ford algorithm (q) §2.4: Include an Informative reference to AODV (rfc3561). (r) §2.4: "if A has selected B as its successor" This is the only place where "successor" is used. For clarity, perhaps use a different word/description.
Martin Vigoureux Yes
Deborah Brungard No Objection
Roman Danyliw (was Discuss) No Objection
Comment (2019-11-11 for -15)
Thank you for addressing my DISCUSS and COMMENT points, and further explaining the protocol where necessary.
Benjamin Kaduk (was Discuss) No Objection
Comment (2019-08-22 for -14)
Thank you for addressing my Discuss points! I am happy to see the expanded text on update representation and parser state in Section 4.5 -- that's enough to clear my discuss point, though I probably would have added even a bit more text if I was writing it myself. Section 4.5 In addition to the above, an Update TLV can omit a prefix of the prefix being announced, which is then extracted from the preceding Update TLV in the same address family (IPv4 or IPv6). Finally, as a nit: from a rhetorical sense, I'd suggest "omit an initial portion of the prefix", to avoid using the word "prefix" with two different meanings in the same sentence. Appendix B Link cost: estimated using ETX on wireless links; 2-out-of-3 with C=96 on wired links. Perhaps "ETX as described in Appendix A.2.2". Appendix C At a minimum, they discard routes with a destination prefix in fe80::/64, ff00::/8, 127.0.0.1/32, 0.0.0.0/32 and 18.104.22.168/8. 127.0.0.1/32 as opposed to /8? Appendix F There are two optioal features that make the new protocol incompatible with its predecessor. First of all, RFC 6126 did not nit: "optional" Two changes need to be made to an implementation of RFCs 6126 and 7557 so that it can safely interoperate in all cases with implementations of this protocol. First, it needs to be modified to either ignore or process Unicast Hellos. Second, it needs to be modified to parse sub-TLVs of all the TLVs that it understands and that allow sub-TLVs, and to ignore the TLV is an unknown mandatory sub-TLV is found. It is not necessary to parse unknown TLVs, since nit: "if an unknown" There are other changes, but these are not of a nature to prevent interoperability: [...] o the compression state is now specific to an address family rather than an address encoding Section 4.5; It seems like an old implementation that decompresses an update to different contents than a new implementation does, would have some effect on routing. Am I missing something?
(Suresh Krishnan) (was Discuss) No Objection
Comment (2019-08-20 for -14)
Thanks for including explanatory text for describing expectations and limitations of Backward Compatibility in Appendix F, in order to address my DISCUSS point regarding issues with backward compatibility with RFC6126 implementations.
(Mirja Kühlewind) (was Discuss) No Objection
Thanks for all the work to address my discuss points. I still think that it is important for PS spec to normatively define default values as part of the main spec, also because it is rather uncommon to use normative language in the appendix. However, there are references to the appendix in all relevant places in the spec now and I'm okay to clear my discuss on this basis. Please note that the following comment from my discuss ballot is not fully addressed: "In section 4.1.1 the update interval needs a lower limit (e.g. 3 seconds) and a recommend default value would be could as well (Note that there are other part in section 3 where the update value is discussed as well)." We had a long discuss about this value specifically and the recommended default in the appendix is fine with me. However, section 4.1.1 does not have a pointer to the appendix. Further I think it would be appropriate to add a warning here that low values mean higher network load which can have a negative impact in certain networks. ------------------------------- Old comments (here for the record; didn't check these points): 1) While this point might not raise discuss-level, it would probably also be good to provide more concrete advise on how to implement jitter: Sec 3.1.: “ A moderate amount of jitter may be applied to packets sent by a Babel speaker: outgoing TLVs are buffered and SHOULD be sent with a small random delay.” Sec 4: “a Babel node SHOULD buffer every TLV and delay sending a packet by a small, randomly chosen delay [JITTER].” 2) Sec 4.1.2. (Router-Id) should probably state again that the router-id is assumed to be unique within a domain. 3) Sec 4: “The most-significant bit of the sub-TLV, called the mandatory bit, indicates how to handle unknown sub-TLVs.” I would recommend to also indicate this bit in the image. 4) Sec 4.4: “If a TLV has a self-terminating format, then it MAY allow a sequence of sub-TLVs to follow the body.” Initially I wasn’t quite sure what you wanted to say here. I guess you say that the length would indicate a larger value that needed for the body and therefore a subTLV might be present? I recommend to clarify this here a bit. 5) I recommend to move Appendix C (Considerations for protocol extensions) in the body of the document.
Barry Leiba No Objection
Comment (2019-08-07 for -12)
I support the DISCUSS points by Ben and Álvaro.
(Alexey Melnikov) No Objection
Éric Vyncke No Objection
Comment (2019-08-05 for -11)
Dear authors, Thank you for the work put into this document and its companion documents. The text is usually clear and explanations are concise and easy to understand. I have nevertheless some COMMENTs and a few NITs. Regards, -éric == COMMENTS == -- Section 1.1 -- In the properties bullet points, "its diameter" is it about the loop diameter or the network diameter? I suspect the latter but this is ambiguous IMHO. -- Section 3 -- It is unclear by reading "The protocol encoding is slightly more compact when router-ids are assigned in the same manner as the IPv6 layer assigns host IDs." whether EUI-64 is referred here. I also fail to see in section 4.6.7 (router-id TLV) what is the encoding benefit? -- Section 3.1 -- Should there be a mention of maximum UDP datagram size? and some words on layer-3 fragmentation ? I understand that section 4 has a section on this, so, perhaps refer already to that section for completeness ? -- Sections 3.2.3 and 3.2.4 -- Does a dual-stack host have to send 2 hello? One on each protocol stack (v6 or v4) ? Unclear from the explanation. -- Section 3.4.2 -- It is unclear to me whether a link to a neighbor (router-id) can have different cost based on the v6 or v4. -- Section 4 -- Is there a reason why the well-known ports and multicast group addresses are not spelled out in this section ? They only appear in the IANA considerations section. Also, should the hello be sent over v6 _AND_ v4 ? -- Section 4.2 -- No a real comment, just an appreciation of your humor: "The arbitrary but carefully chosen value 42" ;-) you made my Monday morning ! -- Section 4.4 -- Is there any reason why the 'mandatory bit' does not appear in the packet structure? -- Section 4.6.2 -- Is there any reason why the "MBZ" is not expanded ? Must Be Zero ? -- Section 4.6.3 -- I wonder how the receiver could estimate the propagation time (in each direction BTW) + queuing time + whatever delay... -- Section 4.6.7 -- Should the router-id field length be repeated here as well ? == NITS == -- Section 1.1 -- Rather than using 'naive' to describe RIP, let's rather use 'trivial' or 'simple' ;-) s/the routers involved participate/the involved routers participate/ ? -- Section 3.2.6 -- Suggest to use '0xffff' rather than 'FFFF' and be consistent in the use of lowercase / uppercase for hexadecimal numbers. -- Section 4.5 -- Parsing "since for correct parsing it must be identical across implementations" is not easy... a comma would be welcome. -- Section 4.6.3 -- s/receiver send/receiver sends/
Alissa Cooper (was Discuss) Abstain
Comment (2019-08-24 for -14)
Part of the response to my DISCUSS argued that making this specification comply with BCP 61 would harm the reputation of the IETF. I'd rather continue to support BCP 61 than have this protocol standardized in the IETF, if that is what the choice is. If the WG consensus is that deploying this protocol with no MTI security protections is appropriate despite the mountain of evidence that exists showing that deployments of insecure protocols on private networks or in limited domains still often get compromised, I don't see a need to discuss this further.