Summary: Has 3 DISCUSSes. Has enough positions to pass once DISCUSS positions are resolved.
I have a few points that I think merit IESG discussion.

(1) I see that several directorate reviewers expressed unease at the destination (IP and) MAC address assignment procedure for the inner VXLAN headers, and appreciate that there was extensive on-list discussion (more than I could follow). That said, I failed to find a clear statement of why the current text is believed to be safe, and in fact my reading of the current text is that the described procedure is *not* safe. Pointers to key parts of the WG discussion would be more than welcome!

To take something of a high-level view of my concerns, if we think of the VXLAN as being a tunnel between VTEPs that carries encapsulated tenant traffic, then what we're trying to do is roughly like BFD between VTEPs, but we want to get fault-detection over as broad a coverage as we can (the "outermost part of the tunnel"), so we want to have the option of per-VNI BFD instead of just endpoint-to-endpoint (VTEP-to-VTEP). However, we end up having to do this by trying to insert a thin filter into the tenant's address space (i.e., the inner VXLAN header) and pick out the specific stream of BFD traffic that we're introducing. This is, in some sense, a namespace grab in what is conceptually the tenant's namespace, and we have to be careful that what we do is either guaranteed not to impact the tenant or is well-documented and compartmentalized (akin to the "well-known URIs").

I've made comments at several places in the document that are more directly tied to specific pieces of text, but in general, if we assume that the tenant can add/remove addresses at will within their VXLAN abstraction, then any attempt to preconfigure by mutual agreement the BFD addresses to use at the VTEPs, or to use the VTEP's normal (outer) address as the sentinel value, seems subject to the tenant coming in and subsequently trying to use that address, leading to (some of) the tenant's traffic getting silently filtered and interpreted by the VTEP.
If we were using domain names as identifiers, we could allocate something under .arpa or similar, but I think our options are more limited when numerical addresses are used. The option suggested by the rtg-dir reviewer of always using the management VNI does not suffer from this namespacing issue, though I recognize that it does reduce the scope over which fault-detection is available, for the cases when different VNIs' traffic is routed or handled differently.

(2) Section 6 says:

   The selection of the VNI number of the Management VNI MUST be controlled through management plane. An implementation MAY use VNI number 1 as the default value for the Management VNI. All VXLAN packets received on the Management VNI MUST be processed locally and MUST NOT be forwarded to a tenant.

It seems like the management VNI concept is something that would apply to the entire VXLAN deployment and not just to the BFD-using portions; is this already defined somewhere (in which case we should reference it), or is it new with this document? In the latter case, wouldn't it be an update to the core VXLAN spec? (I note that there are some procedural hoops to jump through for an IETF-stream document to update an ISE-stream document...)
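The collision described in (1) can be illustrated with a minimal Python sketch of the receive-side filter, under one reading of Sections 4 and 5: a decapsulated inner frame is consumed locally when its destination MAC is one of the MACs associated with the VTEP, its destination IP is in 127/8 or equals the VTEP's own address, and its UDP destination port is 3784. The class and function names and all addresses are hypothetical, and only IPv4 is modeled. Because the filter keys only on addresses and a port, a tenant frame that later reuses the same values is handled exactly like the VTEP's own BFD Control packet.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv4Network

BFD_SINGLE_HOP_PORT = 3784                  # RFC 5881 destination port
LOOPBACK_V4 = IPv4Network("127.0.0.0/8")


@dataclass
class VtepConfig:
    # Hypothetical VTEP state: MACs "associated with the VTEP" (configured
    # or learned) and the VTEP's own (outer) IPv4 address.
    own_macs: set[str] = field(default_factory=set)
    own_ip: IPv4Address = IPv4Address("192.0.2.1")


@dataclass
class InnerFrame:
    # Only the inner-header fields this filter looks at.
    dst_mac: str
    dst_ip: IPv4Address
    udp_dst_port: int


def classify_inner_frame(vtep: VtepConfig, frame: InnerFrame) -> str:
    """Return "local-bfd" or "forward-to-tenant" for a decapsulated frame."""
    if frame.dst_mac not in vtep.own_macs:
        return "forward-to-tenant"          # RFC 7348 Section 4.1 path
    ip_matches = frame.dst_ip in LOOPBACK_V4 or frame.dst_ip == vtep.own_ip
    if ip_matches and frame.udp_dst_port == BFD_SINGLE_HOP_PORT:
        return "local-bfd"
    # The quoted text does not say what happens to a MAC match that fails
    # the later checks; this sketch simply forwards it.
    return "forward-to-tenant"


vtep = VtepConfig(own_macs={"02:00:00:00:00:01"})
# The VTEP's own BFD Control packet: consumed locally, as intended.
probe = InnerFrame("02:00:00:00:00:01", IPv4Address("127.0.0.1"), 3784)
# A tenant that starts using the same MAC and runs its own BFD toward the
# VTEP's address produces a frame the filter cannot tell apart.
tenant = InnerFrame("02:00:00:00:00:01", IPv4Address("192.0.2.1"), 3784)
print(classify_inner_frame(vtep, probe))    # local-bfd
print(classify_inner_frame(vtep, tenant))   # local-bfd
```

Nothing in the frame itself lets the filter know that the second packet belongs to the tenant; only out-of-band knowledge of which addresses the tenant uses could, and that knowledge can go stale.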
Section 1

   In the case where a Multicast Service Node (MSN) (as described in Section 3.3 of [RFC8293]) resides behind a Network Virtualization Endpoint (NVE), the mechanisms described in this document apply and can, therefore, be used to test the connectivity from the source NVE to the MSN.

I'm not sure that I'm parsing "resides behind" properly. Is the idea that the multicast traffic starts off at a tenant-system source, hits an NVE gateway to enter the VXLAN, traverses the VXLAN a bit before getting to the MSN, and is replicated from the MSN to various NVE termini? I think I'd be less confused if this was described as "participates in the VXLAN" or "is part of the virtualized environment", as the current "behind" wording makes me think of a firewall-like topology where the NVE behind which the MSN resides will be decapsulating traffic.

   This document describes the use of Bidirectional Forwarding Detection (BFD) protocol to enable monitoring continuity of the path between VXLAN VTEPs, performing as Network Virtualization Endpoints, and/or availability of a replicator multicast service node.

All the commas here potentially make the parsing ambiguous; assuming that "performing as Network Virtualization Endpoints" is just describing the VXLAN VTEPs, I'd suggest dropping the first comma and instead joining those clauses with "that are".

Section 3

   between the same pair of VTEPs. BFD packets intended for a VTEP MUST NOT be forwarded to a VM as a VM may drop BFD packets leading to a false negative. This method is applicable whether the VTEP is a

[This "MUST NOT" is a very strict requirement, so we have to be sure that it's achievable without disruption to tenant traffic, per the Discuss point.]

   At the same time, a service layer BFD session may be used between the tenants of VTEPs IP1 and IP2 to provide end-to-end fault management. In such case, for VTEPs BFD Control packets of that session are indistinguishable from data packets.
nit(?): I suggest s/indistinguishable from/regular/ -- the tenants' BFD sessions are just regular data to the VXLAN infrastructure, though IIUC a VTEP could, if so inclined, peek inside and "distinguish" them from non-BFD tenant data based on heuristics and packet format.

   0:0:0:0:0:FFFF:7F00:0/104 range for IPv6). There could be a firewall configured on VTEP to block loopback addresses if set as the destination IP in the inner IP header. It is RECOMMENDED to allow addresses from the loopback range through a firewall only if it is used as the destination IP address in the inner IP header, and the destination UDP port is set to 3784 [RFC5881].

I think we should reword this to make it clear that the default behavior is still "block all incoming traffic with loopback destination" and that the exception is tightly scoped to the encapsulated VXLAN traffic discussed in this document and the specific destination port *and when BFD has been configured for the VTEP*. I note that well-known ports are not reserved ports, and we have no guarantee that only a BFD implementation would be listening on port 3784. I think the rewording would include some phrasing like "RECOMMENDED that the only firewall exception to allow incoming traffic with destination address from the loopback range is when [...]", and of course, mention the need to have BFD configured.

Section 4

   VXLAN packet. The choice of Destination MAC and Destination IP addresses for the inner Ethernet frame MUST ensure that the BFD Control packet is not forwarded to a tenant but is processed locally at the remote VTEP. [...]

This has to be 100% reliable, and I think we need to provide some example mechanism that has that property even if we don't mandate that it be the only allowed mechanism.

   Destination MAC: This MUST NOT be of one of tenant's MAC addresses. The destination MAC address MAY be the address

But the tenant can start using new MAC addresses at any time!
How is BFD-over-VXLAN going to dynamically detect and avoid that?

   associated with the destination VTEP. The MAC address MAY be configured, or it MAY be learned via a control plane protocol. The details of how the MAC address is obtained are outside the scope of this document.

This all talks about the MAC address being relatively static configuration, but per above, I don't think that's safe in the face of a MUST-level requirement to avoid conflicting with tenant MAC addresses.

   IP header: Destination IP: IP address MUST NOT be of one of tenant's IP addresses. The IP address SHOULD be selected from the range 127/8 for IPv4, for IPv6 - from the range 0:0:0:0:0:FFFF:7F00:0/104. Alternatively, the destination IP address MAY be set to VTEP's IP address.

As for MAC addresses, can't the tenant start using new ones at any time? Loopback is mostly safe in that the tenant generally shouldn't expect incoming traffic to that destination address ... but what if the tenant is also using a BFD scheme that expects incoming (single-hop) packets to loopback as an exception to RFC 1122?

nit: please use a parallel grammatical construction for describing the IPv4 and IPv6 recommended behavior.

   TTL or Hop Limit: MUST be set to 1 to ensure that the BFD packet is not routed within the Layer 3 underlay network. This addresses the scenario when the inner IP destination address is of VXLAN gateway and there is a router in underlay which removes the VXLAN header, then it is possible to route the packet as VXLAN gateway address is routable address.

nit: the grammar here is a bit wonky; I think the following preserves the meaning with better grammar:

% TTL or Hop Limit: MUST be set to 1 to ensure that the BFD
% packet is not routed within the Layer 3 underlay network. This
% addresses the scenario where the inner IP destination address is
% that of a VXLAN gateway and there is a router in the underlay
% that removes the VXLAN header; in such cases it is possible for
% the packet to be routed, as the VXLAN gateway's address is a
% routable address.

Section 5

   Once a packet is received, VTEP MUST validate the packet. If the Destination MAC of the inner Ethernet frame matches one of the MAC addresses associated with the VTEP the packet MUST be processed further. If the Destination MAC of the inner Ethernet frame doesn't

What prevents the scenario where the MAC address associated with the VTEP is also in use by the tenant?

   match any of VTEP's MAC addresses, then the processing of the received VXLAN packet MUST follow the procedures described in Section 4.1 [RFC7348]. If the BFD session is using the Management VNI (Section 6), BFD Control packets with unknown MAC address MUST NOT be forwarded to VMs.

nit: either "an unknown" or "MAC addresses"

   The UDP destination port and the TTL of the inner IP packet MUST be validated to determine if the received packet can be processed by BFD.

Can you give a pointer to or description of what this validation consists of?

Section 5.1

   case of VXLAN, the VNI number identifies that logical link. If BFD packet is received with non-zero Your Discriminator, then BFD session MUST be demultiplexed only with Your Discriminator as the key.

nits: "If a BFD packet", "then the BFD session"

Section 6

   In most cases, a single BFD session is sufficient for the given VTEP to monitor the reachability of a remote VTEP, regardless of the number of VNIs. When the single BFD session is used to monitor the reachability of the remote VTEP, an implementation SHOULD choose any of the VNIs.
   An implementation MAY support the use of the Management

nit: I feel like this is trying to say that the choice is arbitrary and it doesn't matter which one is picked, but "SHOULD choose any of" is more of a recommendation to make a choice than guidance on how to make that choice, as written.

Section 9

I think we need to discuss the risk/potential consequences of a VTEP failing to properly filter BFD traffic and incorrectly passing it through to the tenant. Relatedly, I'd also consider discussing the case of a mixed deployment where one peer attempts to speak BFD-VXLAN to a peer that does not implement that mechanism.

   The document requires setting the inner IP TTL to 1, which could be used as a DDoS attack vector. Thus the implementation MUST have

An attack vector on what part of the system?
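The Section 5.1 rule quoted above (a non-zero Your Discriminator is the only demultiplexing key) can be sketched in Python as follows. The zero-discriminator fallback, keyed on the VNI plus the peer's source address, is an assumption modeled on RFC 5880 Section 6.8.6 (which allows selection by "some combination of other fields") and on the quoted statement that the VNI identifies the logical link; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BfdSession:
    local_discr: int      # our My Discriminator; the peer echoes it as Your Discriminator
    vni: int              # VNI carrying the session (the "logical link")
    peer_ip: str          # remote VTEP address


class SessionTable:
    """Hypothetical BFD-over-VXLAN session table (illustration only)."""

    def __init__(self) -> None:
        self.by_discr: dict[int, BfdSession] = {}
        self.by_link: dict[tuple[int, str], BfdSession] = {}

    def add(self, session: BfdSession) -> None:
        self.by_discr[session.local_discr] = session
        self.by_link[(session.vni, session.peer_ip)] = session

    def demux(self, your_discr: int, vni: int, src_ip: str) -> Optional[BfdSession]:
        if your_discr != 0:
            # Non-zero Your Discriminator is the only key: VNI and source
            # address are ignored, and no match means the packet is discarded.
            return self.by_discr.get(your_discr)
        # Zero Your Discriminator (session still coming up): assumed
        # fallback on the logical link, i.e. (VNI, peer address).
        return self.by_link.get((vni, src_ip))


table = SessionTable()
session = BfdSession(local_discr=7, vni=100, peer_ip="192.0.2.2")
table.add(session)
print(table.demux(7, 999, "198.51.100.1") is session)   # True
```

As sketched, a packet carrying a known non-zero Your Discriminator still matches when it arrives on a different VNI; if that is not the intended behavior, Section 5.1 may want to say so.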
I support Eric's DISCUSS point about the TTL, but I want to go a step further because this document contradicts rfc5881, which is clear about the TTL setting (from §5):

   If BFD authentication is not in use on a session, all BFD Control packets for the session MUST be sent with a Time to Live (TTL) or Hop Limit value of 255. All received BFD Control packets that are demultiplexed to the session MUST be discarded if the received TTL or Hop Limit is not equal to 255. A discussion of this mechanism can be found in [GTSM].

   If BFD authentication is in use on a session, all BFD Control packets MUST be sent with a TTL or Hop Limit value of 255. All received BFD Control packets that are demultiplexed to the session MAY be discarded if the received TTL or Hop Limit is not equal to 255. If the TTL/Hop Limit check is made, it MAY be done before any cryptographic authentication takes place if this will avoid unnecessary calculation that would be detrimental to the receiving system.

OTOH, Section 4 of this document specifies:

   TTL or Hop Limit: MUST be set to 1 to ensure that the BFD packet is not routed within the Layer 3 underlay network. This addresses the scenario when the inner IP destination address is of VXLAN gateway and there is a router in underlay which removes the VXLAN header, then it is possible to route the packet as VXLAN gateway address is routable address.

Not wanting the packet to be routed in the underlay sounds like a reasonable justification -- but I couldn't find the specification in rfc7348 about "a router in underlay which removes the VXLAN header". Maybe I missed it...

Independent of VXLAN, the conflict with rfc5881 remains -- given the text above, it seems to me that it would be ok if the TTL was set to 1 if authentication is in use, but this document doesn't talk about requiring authentication.
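The receive-side half of the conflict can be sketched as a Python predicate that follows RFC 5881 Section 5 as quoted above. (The transmit side needs no code: RFC 5881 requires 255 whether or not authentication is in use.) The function name and the `check_when_authenticated` knob are hypothetical; the knob models a receiver exercising the MAY that applies to authenticated sessions.

```python
GTSM_TTL = 255  # value required by RFC 5881 Section 5 (GTSM)


def rfc5881_accepts(received_ttl: int, authenticated: bool,
                    check_when_authenticated: bool = True) -> bool:
    """Return True if a BFD Control packet survives the RFC 5881 TTL check.

    Without authentication, a TTL/Hop Limit other than 255 MUST be
    discarded. With authentication, such packets MAY be discarded;
    check_when_authenticated selects whether this receiver does so.
    """
    if received_ttl == GTSM_TTL:
        return True
    if not authenticated:
        return False
    return not check_when_authenticated


# An inner TTL of 1, as Section 4 of the draft requires, arriving at a
# receiver that follows RFC 5881 without authentication: discarded.
print(rfc5881_accepts(1, authenticated=False))   # False
```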
I also support Ben's DISCUSS.

Non-blocking comments:

(1) §3: "...a service layer BFD session may be used between the tenants of VTEPs IP1 and IP2..." Just to be clear, the use of BFD in a "service layer session" is out of scope of this document, right? It might be nice to say so.

(2) §3: "As per Section 4, the inner destination IP address SHOULD be..." If the specification is already in Section 4, then there doesn't seem to be a need to repeat it. It might make more sense to put the text about a potential firewall there.

(3) §4: "...SHOULD ensure that the BFD packets follow the same lookup path as VXLAN data packets within the sender system." What is a "lookup path"? When would it be ok to not ensure it? BTW, a better Normative statement might be (something like): MUST follow the same lookup path...

(4) §4: "The MAC address MAY be configured, or it MAY be learned via a control plane protocol. The details of how the MAC address is obtained are outside the scope of this document." The Normative MAYs are really just stating a fact, and out of scope anyway. s/MAY/may

(5) §5: "If the Destination MAC of the inner Ethernet frame doesn't match any of VTEP's MAC addresses, then the processing of the received VXLAN packet MUST follow the procedures described in Section 4.1 [RFC7348]." §4.1 of rfc7348 is about Unicast VM-to-VM Communication -- which makes me think that if the procedures from that section are followed then the BFD packet may be forwarded to a VM, which seems to be in contradiction with this statement (from §3): "BFD packets intended for a VTEP MUST NOT be forwarded to a VM as a VM may drop BFD packets leading to a false negative." What am I missing?

(6) Related to the last point, the following sentences also mention that BFD packets MUST NOT be forwarded to VMs...but with qualifications:

   §5: "If the BFD session is using the Management VNI (Section 6), BFD Control packets with unknown MAC address MUST NOT be forwarded to VMs."
   §6: "All VXLAN packets received on the Management VNI MUST be processed locally and MUST NOT be forwarded to a tenant."

The difference between these 2 statements and the one from §3 is that they seem to be intended only when using the Management VNI...but it would seem that the general statement applies there too, right? IOW, the specific statements about the Management VNIs are simply affirming what was already said more generally in §3, right?

(7) Nits:
   s/of VXLAN gateway and there is a router in underlay/of the VXLAN gateway and there is a router in the underlay
   s/VTEP MUST validate/the VTEP MUST validate
   s/then BFD session/then the BFD session
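One way to read the narrower firewall exception that Ben proposes for Section 3 (and that (2) above suggests moving to Section 4) is as a default-deny predicate on loopback-range destinations. This Python sketch is an illustration under stated assumptions: the function name and its inputs are hypothetical, the IPv6 range is the one currently written in the draft, and destinations outside those ranges are left to whatever other firewall rules apply. The `bfd_configured` input is what distinguishes it from a port-only rule, since 3784 is a well-known port but not a reserved one.

```python
from ipaddress import ip_address, ip_network

BFD_SINGLE_HOP_PORT = 3784  # RFC 5881
# 127/8 plus the IPv6 range as currently written in the draft
# (an IPv4-mapped range, not IPv6 loopback).
DRAFT_LOOPBACK_RANGES = (ip_network("127.0.0.0/8"),
                         ip_network("::ffff:127.0.0.0/104"))


def permit_loopback_destination(dst_ip: str, udp_dst_port: int,
                                in_decapsulated_vxlan: bool,
                                bfd_configured: bool) -> bool:
    """Default-deny for loopback-range destinations, with one narrow exception.

    Returns True for destinations outside those ranges, since this rule
    does not govern them; other firewall rules still apply.
    """
    addr = ip_address(dst_ip)
    # Containment is False across IP versions, so one loop covers both.
    if not any(addr in net for net in DRAFT_LOOPBACK_RANGES):
        return True
    return (in_decapsulated_vxlan
            and udp_dst_port == BFD_SINGLE_HOP_PORT
            and bfd_configured)


# Inner header of a decapsulated VXLAN frame, port 3784, but BFD not
# configured on this VTEP: still blocked.
print(permit_loopback_destination("127.0.0.1", 3784, True, False))   # False
```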
Thank you for the work put into this document. I fully second Adam's COMMENT that should be fixed before publication (IMHO this is a DISCUSS).

Answers to my COMMENTs below will be welcome, even if those COMMENTs are not blocking. As usual, an answer to the DISCUSS is required to clear my DISCUSS though.

I hope that this helps to improve the document,

Regards,

-éric

== DISCUSS ==

Maybe I am not familiar enough with BFD, but RFC 5881 (the one defining BFD for IPv4 and IPv6 single hop) specifies the use of TTL = Hop Limit = 255. Why does this document use a value of 1?

-- Section 3 --

IPv4-mapped IPv6 addresses are only to be used inside a host and should never be transmitted in real packets (including packets inside a tunnel); see section 4.2 of RFC 4038 (even if informational). Like other IESG reviewers, I wonder why ::1/128 is not used.

-- Section 8 --

The document specifies no IANA actions while the shepherd write-up talks about an IANA action.

-- Section 9 --

This section is only about IPv4 (TTL and RFC 1812). Please address IPv6 as well.
== COMMENTS ==

RFC 5881 (BFD) states that it applies to IPv4/IPv6 tunnels; may I infer that this document is only required to address the Ethernet encapsulation, i.e., specifying the Ethernet MAC addresses?

-- Section 3 --

At first sight, I was surprised by having a BFD session per VXLAN VNI, as it will create some scalability issues, but I assume that this is to detect misconfiguration as well. If so, perhaps it is worth mentioning the reasoning behind it?

In "the inner destination IP address SHOULD" it is unclear whether this applies to all BFD packets, or only the request one, or ...?

-- Section 4 --

While probably defined in RFC7348, should "FCS" be renamed as "Outer Ethernet FCS" for consistency with the "Outer Ethernet Header" in figure 2?

Why not use the Source MAC address as the Destination MAC address? This would ensure that there is no conflict, at the expense of "forcing" the transmission of the frame even if addressed to itself.

Please consider rewriting the section about TTL/Hop Limit as it is not easy to parse/read.

-- Section 9 --

It is unclear to me (see also Ben's comment) what the 'attack vector' of sending packets with TTL=1 is.
I support Ben Kaduk’s DISCUSS position.

* Section 9. Per “The document requires setting the inner IP TTL to 1, which could be used as a DDoS attack vector”, could you please clarify what part(s) of the notional architecture would be impacted (e.g., physical, virtual; and how)?

* Section 9. Per:

   Thus the implementation MUST have throttling in place to control the rate of BFD Control packets sent to the control plane. On the other hand, over-aggressive throttling of BFD Control packets may become the cause of the inability to form and maintain BFD session at scale. Hence, throttling of BFD Control packets SHOULD be adjusted to permit BFD to work according to its procedures.

  I’m having difficulty parsing the guidance above – it points out the need to throttle and the ramifications of doing so. Per the last sentence, could you please clarify how the throttling should be calibrated?

* Section 9. Per “this specification does not raise any additional security issues beyond those of the specifications referred to in the list of normative references”, I recommend being clearer about which references you mean (i.e., I’m assuming you don’t mean RFC2119, RFC8174, etc.)

* Editorial Nits:
  - Abstract. s/forming up/used to form/
  - Section 9. s/such address/such an address/
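As one illustration of what "adjusted to permit BFD to work according to its procedures" could mean, a rate limiter can be sized from the sessions actually configured rather than from a fixed constant. The Python sketch below is hypothetical and not taken from the draft: it assumes a token bucket, one expected packet per session per negotiated transmit interval, and an arbitrary headroom factor and burst window. Per-VNI sessions multiply the session count, which ties into the scalability point raised above.

```python
import time
from typing import Optional


def expected_bfd_pps(tx_intervals_us: list[int]) -> float:
    """Expected BFD Control packets/s: one per session per negotiated interval."""
    return sum(1_000_000 / interval for interval in tx_intervals_us)


class TokenBucket:
    """Token bucket: refills at `rate` tokens/s, holding at most `burst` tokens."""

    def __init__(self, rate: float, burst: float, now: Optional[float] = None) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def calibrated_bucket(tx_intervals_us: list[int], headroom: float = 2.0,
                      burst_window_s: float = 0.1) -> TokenBucket:
    """Size the limiter from configured sessions (headroom and window are assumptions)."""
    rate = expected_bfd_pps(tx_intervals_us) * headroom
    return TokenBucket(rate=rate, burst=max(1.0, rate * burst_window_s))


# Hypothetical example: 100 remote VTEPs x 10 VNIs, one session per VNI,
# 300 ms negotiated transmit interval.
sessions = [300_000] * (100 * 10)
bucket = calibrated_bucket(sessions)
print(round(expected_bfd_pps(sessions)))  # 3333
print(round(bucket.rate))                 # 6667
```

The point of the sketch is only that the limit tracks configuration: adding sessions or shortening intervals raises the allowance, so throttling does not silently become "over-aggressive" as the deployment grows.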
Support Eric's DISCUSS on the IPv6 destination address and would like to see this clarified and resolved.
I support Benjamin and Eric's DISCUSSes - I considered holding a DISCUSS on the "loopback address" terminology and formatting (which was also noted in the excellent OpsDir review by Jürgen Schönwälder), but think that Eric can carry it. In addition, like Jurgen, I think it would be helpful to have pointers to where terms are defined - the "Terminology" section isn't really terminology, but rather just an acronym expansion section.
UPDATE: I did not see a reply to the original issue raised by the TSV-ART review (Thanks Olivier!). Please have a look and provide a response. I don't think this rises to DISCUSS level, but I think some clarifications would be good!

This document describes the use of BFD in VXLAN; however, it does not specify any new protocol elements or extensions. Therefore, I would expect such a document to be Informational. The shepherd write-up doesn't give any additional information about why this doc is PS.
I support Ben’s DISCUSS. In addition, I have a number of editorial comments.

General: there are a lot of missing or incorrect articles, making the document harder to read than it should be. It would be good to fix that. If you let the RFC Editor fix it, it will require careful review during AUTH48 to make sure it’s correct.

— Abstract —

The phrase “forming up” is odd; I suggest just “forming”.

— Section 3 —

   BFD packets intended for a VTEP MUST NOT be forwarded to a VM as a VM may drop BFD packets leading to a false negative.

This needs two commas: one before “as” and one before “leading”. And what does “leading to a false negative” mean here? I don’t understand.

   It is RECOMMENDED to allow addresses from the loopback range through a firewall only if it is used as the destination IP address in the inner IP header, and the destination UDP port is set to 3784 [RFC5881].

I THINK the antecedent for “it” is meant to be “addresses from the loopback range”, though because of the number mismatch it looks like the antecedent is “a firewall”. One fix is to change “addresses” to “an address”, correcting the number error, but that leaves the ambiguity. Maybe better to make it “only if they are used as destination IP addresses”. Also, remove the comma. That fixes the sentence as written, but I also agree with Ben’s comment that this might need more significant rewording.

— Section 4 —

   BFD packet MUST be encapsulated and sent to a remote VTEP as explained in this section.

This needs to be either “A BFD packet” or “BFD packets” and “VTEPs”.

   The MAC address MAY be configured, or it MAY be learned via a control plane protocol.

Are those the only two choices? As both “MAY” are optional, as written it allows for others.
I suggest not using BCP 14 key words here, and instead saying, “The MAC address is either configured or learned via a control plane protocol.”

   This addresses the scenario when the inner IP destination address is of VXLAN gateway and there is a router in underlay which removes the VXLAN header, then it is possible to route the packet as VXLAN gateway address is routable address.

This sentence is too fractured for me to make any sense of it, so I can’t suggest a fix. Please fix it. It looks like Ben made more sense of it than I could, so maybe his suggestion will work.

— Section 5 —

   received VXLAN packet MUST follow the procedures described in Section 4.1 [RFC7348].

This needs to say “Section 4.1 of [RFC7348].”
Thanks for the work that everyone has put into this document. I have a couple of relatively important, related comments that should be taken into account prior to publication.

---------------------------------------------------------------------------

§3:

> As per Section 4, the inner destination IP address SHOULD be set to
> one of the loopback addresses (127/8 range for IPv4 and
> 0:0:0:0:0:FFFF:7F00:0/104 range for IPv6).

Please consider reformatting this IPv6 address according to the recommendations of RFC 5952 (paying particular attention to sections 4.2.1, 4.3, and 5):

   ::ffff:127.0.0.0/104

It's also worth noting that, as a practical matter, modern operating systems do not seem to bind to anything in the IPv4-mapped range assigned to IPv4 loopback:

Linux:

   ~$ ping6 ::ffff:127.0.0.1
   PING ::ffff:127.0.0.1(::ffff:127.0.0.1) 56 data bytes
   ^C
   --- ::ffff:127.0.0.1 ping statistics ---
   14 packets transmitted, 0 received, 100% packet loss, time 13316ms

MacOS:

   ~$ ping6 ::ffff:127.0.0.1
   PING6(56=40+8+8 bytes) ::ffff:127.0.0.1 --> ::ffff:127.0.0.1
   ping6: sendmsg: Invalid argument
   ping6: wrote ::ffff:127.0.0.1 16 chars, ret=-1

It is not clear to me whether this poses an issue for your intended usage.

In any case, please do not refer to ::ffff:127.0.0.0/104 as "loopback addresses": IPv6 has only one loopback address defined (::1). The range you cite is best described as "IPv4-mapped IPv4 loopback addresses." Alternately -- and this is probably better -- use "::1/128" instead of "::ffff:127.0.0.0/104" for the inner IP header destination address.

As an aside, I share Benjamin's unease around the use of loopback addresses in this fashion. It may be worth noting that IETF protocols can reserve addresses in the 192.0.0.0/24 and 2001::/23 blocks if necessary, and such reserved addresses won't ever correspond to a valid destination.
(There is corresponding text in section 4 that all of the preceding pertains to as well)

---------------------------------------------------------------------------

§9:

> This document recommends using an address from the Internal host
> loopback addresses (127/8 range for IPv4 and
> 0:0:0:0:0:FFFF:7F00:0/104 range for IPv6) as the destination IP
> address in the inner IP header. Using such address prevents the
> forwarding of the encapsulated BFD control message by a transient
> node in case the VXLAN tunnel is broken as according to [RFC1812]:
>
>   A router SHOULD NOT forward, except over a loopback interface, any
>   packet that has a destination address on network 127. A router
>   MAY have a switch that allows the network manager to disable these
>   checks. If such a switch is provided, it MUST default to
>   performing the checks.

In addition to the comments above about IPv6 address formatting, the improper use of "loopback" terminology as it applies to IPv6, and concerns about using localhost: it's worth noting that this text in RFC 1812 refers to IPv4 routers -- RFC 8504 has no equivalent language, and so the use of ::ffff:127.0.0.0/104 implies no special router handling. ::1 *probably* does, at least as a practical matter.
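The formatting and terminology points above can be checked with Python's standard `ipaddress` module. In this minimal sketch, `rfc5952_text` is a hypothetical helper: it applies only the RFC 5952 Section 5 recommendation of a dotted-quad tail for IPv4-mapped addresses and otherwise uses the module's `compressed` form. The checks confirm that the draft's spelling and ::ffff:127.0.0.0/104 denote the same prefix, and that ::1 lies outside it.

```python
from ipaddress import IPv6Address, IPv6Network

DRAFT_SPELLING = IPv6Network("0:0:0:0:0:FFFF:7F00:0/104")  # as written in the draft
RFC5952_SPELLING = IPv6Network("::ffff:127.0.0.0/104")      # suggested form
IPV6_LOOPBACK = IPv6Address("::1")                          # the only IPv6 loopback


def rfc5952_text(addr: IPv6Address) -> str:
    """Text form with a dotted-quad tail for IPv4-mapped addresses."""
    mapped = addr.ipv4_mapped            # IPv4Address, or None if not mapped
    if mapped is not None:
        return f"::ffff:{mapped}"
    return addr.compressed               # lower-case, "::"-compressed


print(DRAFT_SPELLING == RFC5952_SPELLING)             # True
print(rfc5952_text(DRAFT_SPELLING.network_address))   # ::ffff:127.0.0.0
print(IPV6_LOOPBACK in DRAFT_SPELLING)                # False
```

That the embedded IPv4 address is loopback is a property of IPv4; it says nothing about how IPv6 routers treat the mapped form, which is the RFC 8504 point made above.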