Network Service Header (NSH)
Summary: Has 2 DISCUSSes. Has enough positions to pass once DISCUSS positions are resolved.
Kathleen Moriarty Discuss
Discuss (2017-09-26 for -24)
First, I'd like to thank the authors and WG for your efforts in recent revisions of this draft; it has come a long way. I still want to poke at the lack of a requirement for either integrity protection on the NSH itself or for MUSTs on protections from the transport encapsulation. Attacks inside of a data center or single operator domains happen all too often. The number from 2016 is up 164% as of a statistic I saw earlier today. We can't shrug this off anymore.

Security Considerations section: First two sentences say: NSH is designed for use within operator environments. As such, it does not include any mandatory security mechanisms.

I think you intended the first sentence to say, "within a single operator environment" as what you have now could be multiple networks managed separately with that statement. Then for the second sentence, I know you don't have an integrity mechanism mandated, but I really think one should be. Couldn't the path be altered and not detectable if there is no integrity checking? This could be used to avoid security protections or to route it inappropriately through a multi-tenant environment. Sure, the underlying protocol should provide session encryption on application traffic, but there's no reason why security shouldn't have been baked into this protocol as a requirement.

From the architecture document, the security considerations section calls attention to possible issues related to lack of integrity checking. Since no encapsulating transport is specified with required session encryption, and the NSH addition doesn't have integrity protection, how will you meet this architecture requirement from RFC7665:

Service Overlay: Underneath the service function forwarders, the components that are responsible for performing the transport forwarding consult the outer-transport encapsulation for underlay forwarding. Used transport mechanisms should satisfy the security requirements of the specific SFC deployment.
These requirements typically include varying degrees of traffic separation, protection against different attacks (e.g., spoofing, man-in-the-middle, brute-force, or insertion attacks), and can also include authenticity and integrity checking, and/or confidentiality provisions, for both the network overlay transport and traffic it encapsulates. It seems from this text, something should be specified for the transport encapsulation. From the text in the draft under review: As with many other protocols, without enhancements, the NSH encapsulation could can be spoofed or otherwise modified and is subject to snooping and modification in transit. However, the deployment scope (as defined in [RFC7665]) of the NSH encapsulation is limited to a single network administrative domain as a controlled environment, with trusted devices (e.g., a data center) hence mitigating the risk of unauthorized manipulation of the encapsulation headers or metadata. This is in direct conflict with the Service Overlay requirements in the Security Considerations of RFC7665. Section 8.1 I'd like to see some MUSTs to address the concerns listed in RFC7665 for encapsulation requirements or an addition of integrity protection on NSH itself.
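To make the integrity concern concrete, here is a minimal sketch of what a keyed integrity check over the NSH could look like. Nothing in it comes from the draft: the function names, the choice of HMAC-SHA256, the decision to zero the TTL and Service Index before computing the tag (because those fields change hop by hop), and the assumed bit layout (TTL in bits 27-22 of the first word, Service Index in byte 7) are all assumptions made for illustration. Key distribution and where the tag would be carried are left out entirely.

```python
import hashlib
import hmac

# Assumed layout: TTL occupies bits 27..22 of the first 32-bit word.
TTL_MASK = 0x0FC00000


def _mask_mutable(nsh: bytes) -> bytes:
    """Return a copy of the NSH with hop-by-hop mutable fields zeroed."""
    buf = bytearray(nsh)
    word = int.from_bytes(buf[0:4], "big") & ~TTL_MASK & 0xFFFFFFFF
    buf[0:4] = word.to_bytes(4, "big")
    buf[7] = 0  # Service Index (assumed to be the last byte of the Service Path Header)
    return bytes(buf)


def compute_tag(key: bytes, nsh: bytes) -> bytes:
    """Hypothetical integrity tag: HMAC-SHA256 over the masked NSH."""
    if len(nsh) < 8:
        raise ValueError("NSH needs at least a Base Header and a Service Path Header")
    return hmac.new(key, _mask_mutable(nsh), hashlib.sha256).digest()


def verify_tag(key: bytes, nsh: bytes, tag: bytes) -> bool:
    """Constant-time comparison of the recomputed tag with the received one."""
    return hmac.compare_digest(compute_tag(key, nsh), tag)
```

With a check of this kind, a change to the Service Path Identifier or to the metadata would be detectable by any element holding the key, while legitimate TTL and Service Index decrements would not invalidate the tag.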
Comment (2017-09-26 for -24)
This introductory text is much improved from a previous version and my comments, thanks for the update. This helps quite a bit. The Network Service Header (NSH) specification defines a new protocol and associated encapsulation for the creation of dynamic service chains, operating at the service plane. The NSH is designed to encapsulate an original packet or frame, and in turn be encapsulated by an outer transport encapsulation (which is used to deliver the NSH to NSH-aware network elements), as shown in Figure 1: Section 8.1: I don't think you need the text on BCP38. It's a helpful recommendation in general, but I don't see how it's directly applicable to this specification. Thank you for adding the text on Boundary protections per the SecDir review, I think this is very helpful.
Eric Rescorla Discuss
Discuss (2017-09-27 for -24)
I concur with Kathleen's DISCUSS. To state my view of things:

1. The assumption that the datacenter is a secure environment is not a reasonable one. As Kathleen and Adam both observe, datacenter breaches are common and that is why people are moving towards encryption inside the data center. I see that this draft has text claiming that this is only to be deployed in safe environments, but we know that technologies like this get deployed outside the locations for which we claim they are to be deployed, and there's nothing here to stop that. Moreover, the whole trend towards cloud computing pushes us away from designs in which you can safely talk about single secure zones.

2. The text in S 8.1 about how you might want to use some kind of transport security does not seem sufficient. As above, we know that if we don't specify something, people will deploy this technology in insecure settings without any kind of security. I concur with Kathleen's point that this document should provide built-in security mechanisms rather than just punting to the under-layer. Given that, as S 1 makes clear, all these SFs are part of the same administrative domain, this seems like a comparatively less challenging setting. If there is some reason why that's infeasible, that needs to be explained.
Comment (2017-09-27 for -24)
Line 143
| Original Packet / Frame | +------------------------------+
Nit: I would have expected this stack to go the other way, with TE on the bottom.

Line 165
overlay domain using virtual connections and tunnels. A corollary is that a network administrative domain has a well defined perimeter.
This is not a reasonable assumption in modern datacenter environments, especially if you have virtualized services.

Line 372
1 prior to NSH forwarding lookup. Decrementing by 1 from an incoming value of 0 shall result in a TTL value of 63. The packet MUST NOT be forwarded if TTL is, after decrement, 0.
I am having trouble following this. Is the point that I can emit a packet with TTL 0, which is effectively TTL 64?

Line 375
This TTL field is the primary loop prevention This TTL mechanism represents a robust complement to the Service Index, as the TTL is
Nit: "prevention mechanism. This"?

Line 379
better, although not perfect, interoperation with pre-standard implementations that do not support this TTL field.
This point would be clearer if it were made before the rule about decrement.

Line 403
0x0 - This is a reserved value. Implementations SHOULD silently discard packets with MD Type 0x0.
Why is this a SHOULD and not a MUST? That seems like it will create potential interop problems.

Line 651
encapsulated packet. It is therefore the last node operating on the service header.
Can you also nest NSHs?
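The TTL rule quoted at Line 372 can be read as a wrapping decrement on a 6-bit field. The sketch below is one literal reading of that sentence, not text from the draft; the function names and the 6-bit width are assumptions made for illustration.

```python
from typing import Tuple

TTL_BITS = 6                    # assumed width of the TTL field
TTL_MAX = (1 << TTL_BITS) - 1   # 63


def decrement_ttl(ttl: int) -> Tuple[int, bool]:
    """Return (new_ttl, may_forward) for an incoming TTL value.

    An incoming 0 wraps to 63; a result of 0 means the packet
    must not be forwarded.
    """
    if not 0 <= ttl <= TTL_MAX:
        raise ValueError("TTL out of range for a 6-bit field")
    new_ttl = (ttl - 1) & TTL_MAX
    return new_ttl, new_ttl != 0


def count_forwards(initial_ttl: int) -> int:
    """Count how many times a packet would be forwarded before being dropped."""
    forwards = 0
    ttl = initial_ttl
    while True:
        ttl, ok = decrement_ttl(ttl)
        if not ok:
            return forwards
        forwards += 1
```

Under this reading, a packet arriving with TTL 0 is forwarded with TTL 63, so it travels one hop further than a packet that arrives with TTL 63, which is the behavior the question above is probing.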
Alia Atlas Yes
Deborah Brungard No Objection
Ben Campbell No Objection
Comment (2017-09-27 for -24)
Substantive:

- General: This is a mechanism to add metadata to user flows. There is very little discussion about how that metadata may relate to the application layer payloads. It's likely that some of those payloads will be encrypted by the user in an attempt to control what information is shared with middleboxes. I'd like to see some discussion about how this relates to the guidance in RFC 8165. (Note: I am on the fence about whether this should be a DISCUSS. But since "on the fence" is probably insufficient grounds for a DISCUSS, I'm leaving it as a comment.)

- General: I support Kathleen's DISCUSS points concerning integrity protection. The document leaves that up to the transporting protocol. I think it's reasonable to recommend that that protocol at least default to providing integrity protection unless there's a good reason not to.

-2.2, "version": How is the version field to be used by consumers? That is, what should a recipient do if the field contains a version number it doesn't support/recognize?

-2.2, MD type 0x0: "Implementations SHOULD silently discard packets with MD Type 0x0." Why not MUST?

-- MD type 0xF: "Implementations not explicitly configured to be part of an experiment SHOULD silently discard packets with MD Type 0xF." Why not MUST?

-2.2, Next Protocol Values: Why are there 2 experimental values? (as opposed to 1, or, well, 3).

-2.3, last paragraph (and several other places): This draft seems to take a position that a failed SFP means the service level flow fails. Are there no use cases where delivery of the service flow is critical and should happen even if the chain of middleboxes fails?

-2.4, paragraph starting with "An SFC-aware SF MUST receive the data semantics..." I'm not sure what the intent of this paragraph is. Is that MUST really a statement of fact? Or is there really an expectation of an out-of-band delivery of some semantic definition?

-3, list item 1: "A service classifier MUST insert an NSH at the start of an SFP." What if an initial classifier receives a packet that already has an NSH? Can multiple NSHs be stacked?

-7.1, last paragraph: "Depending on the information carried in the metadata, data privacy considerations may need to be considered. " "may need to be considered" is weak sauce. Data privacy always needs to be considered, even if the _output_ of that consideration is that there is nothing sensitive being carried. Please consider dropping the "may". Also, this seems like an odd place to bury a privacy discussion. Please consider moving this to a "Privacy Considerations" section.

-8, first paragraph: It seems like insider attacks are worth at least a mention when discussing a single operator environment as a mitigator against attacks.

-8.1, 2nd paragraph: This doesn't seem like a single operator scenario, in the sense that part of the flow crosses a network that is not controlled by that operator.

-8.3, 4th paragraph: Please elaborate on what is meant by "obfuscating" subscriber identifying information (as opposed to "encrypting" or "leaving it out in the first place".)

Editorial:

-2.2, "O bit", last paragraph: "The configurable parameter MUST be disabled by default." Does "disabled" mean "unset" (or "set to zero")?

-2.2, "unassigned bits": "At reception, all elements MUST NOT modify their actions based on these unknown bits." Isn't that MUST NOT just a restatement of the "MUST ignore" from the previous sentence? There's no problem with reinforcing a point, but there shouldn't be multiple instances of the same 2119 requirement. Also, would logging a warning violate the "MUST NOT modify their actions/MUST ignore" requirement?

-8, first paragraph: "NSH is designed for use within operator environments." Is there a missing "single" before "operator"?
Benoit Claise No Objection
Comment (2017-09-28 for -25)
- Section 2.5.1., you might want to mention that no metadata are specified at this point in time. Indeed, the "New IETF Assigned Optional Variable Length Metadata Type Registry" is specified in this doc., but empty.

- Section 2.3 OPS question: SPI must be unique per admin domain? Otherwise, you're looking for trouble, right? This would be typically addressed in an "Operational Considerations" section. Where is my "Operational Considerations" section...?

- Section 2.4 Fixed length metadata.
This specification does not make any assumptions about the content of the 16 byte Context Header that must be present when the MD Type field is set to 1, and does not describe the structure or meaning of the included metadata. An SFC-aware SF MUST receive the data semantics first in order to process the data placed in the mandatory context field. The data semantics include both the allocation schema and the meaning of the included data.
I understand that the order of the metadata in the Fixed Length Context Header is important, right? Should it be mentioned? I understand that the fixed length metadata are specific per service, and that's the reason why there is no IANA for fixed length. Should this be mentioned?

- If you publish a new version, change the order of these two paragraphs:
Unassigned bits: All other flag fields, marked U, are unassigned and available for future use, see Section 11.2.1. Unassigned bits MUST be set to zero upon origination, and MUST be ignored and preserved unmodified by other NSH supporting elements. At reception, all elements MUST NOT modify their actions based on these unknown bits.
Length: The total length, in 4-byte words, of the NSH including the Base Header, the Service Path Header, the Fixed Length Context Header or Variable Length Context Header(s). The length MUST be 0x6 for MD Type equal to 0x1, and MUST be 0x2 or greater for MD Type equal to 0x2. The length of the NSH header MUST be an integer multiple of 4 bytes, thus variable length metadata is always padded out to a multiple of 4 bytes.

Lacking some time before the telechat, but not worth deferring (there are enough DISCUSSes). FYI, I arrived at section 5.
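The Length rule quoted above lends itself to a small validity check. The following is a sketch only: the bit positions assume a Base Header laid out as Ver (2 bits), O (1), U (1), TTL (6), Length (6), U (4), MD Type (4), Next Protocol (8), and the function names are hypothetical.

```python
import struct
from typing import Dict


def parse_base_header(data: bytes) -> Dict[str, int]:
    """Split the first 32-bit word of an NSH into its fields (assumed layout)."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes for the Base Header")
    (word,) = struct.unpack("!I", data[:4])
    return {
        "version": word >> 30,
        "o_bit": (word >> 29) & 0x1,
        "ttl": (word >> 22) & 0x3F,
        "length": (word >> 16) & 0x3F,   # in 4-byte words
        "md_type": (word >> 8) & 0xF,
        "next_protocol": word & 0xFF,
    }


def length_is_valid(md_type: int, length: int) -> bool:
    """Apply the quoted rule: 0x6 for MD Type 0x1, at least 0x2 for MD Type 0x2."""
    if md_type == 0x1:
        return length == 0x6
    if md_type == 0x2:
        return length >= 0x2
    return False  # other MD Types are not covered by the quoted rule
```

Because Length counts 4-byte words, the total NSH size in bytes is always `length * 4`, which is why variable length metadata has to be padded to a 4-byte boundary.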
Spencer Dawkins No Objection
Comment (2017-09-26 for -24)
Thank you for responding to Wes Eddy's TSV-ART review of -19 (and, of course, for making text changes that seemed appropriate). It seems to me that you describe expectations about the applicability of NSH in various places in the document, and in various ways. You might consider (for example) pulling the common elements of statements like (from Section 5) Within a managed administrative domain, an operator can ensure that the underlay MTU is sufficient to carry SFC traffic without requiring fragmentation. Given that the intended scope of the NSH is within a single provider's operational domain, that approach is sufficient. and (from Section 8) NSH is designed for use within operator environments. As such, it does not include any mandatory security mechanisms. As with many other protocols, without enhancements, the NSH encapsulation can be spoofed and is subject to snooping and modification in transit. However, the deployment scope (as defined in [RFC7665]) of the NSH encapsulation is limited to a single network administrative domain as a controlled environment, with trusted devices (e.g., a data center) hence mitigating the risk of unauthorized manipulation of the encapsulation headers or metadata. This controlled environment is an important assumption for NSH. There is one additional important assumption: All of the service functions used by an operator in service chains are assumed to be selected and vetted by the operator. into one section describing the applicability of NSH, appearing MUCH earlier in the document (the most detailed description of your expectations looks like it appears in the Security Considerations section, but parts of that description are applicable to the Fragmentation Considerations section, which appears three sections earlier in the document). 
The reader would have your intended applicability in mind much earlier and more clearly, and you could just invoke your expectations by reference when you need to explain how they apply elsewhere in the document, so the expectations in play would be consistent across mentions throughout the document. I'm still bothered that this document doesn't explicitly mention ICMP blocking as a problem for PMTUD with IP encapsulations. We're just not good at path MTU discovery, so it seems useful to call this out explicitly when a document expects to use PMTUD. That way, people who use NSH will know to check for ICMP blocking on their networks before they receive their first trouble reports. This almost reached my threshold for balloting Discuss, so I'd hope you folks would consider that. I see that the applicability of NSH includes encapsulations that don't provide a path MTU discovery mechanism, and that your resolution for those encapsulations is to log events when a "too big" packet is dropped. Could you educate me, as to whether all encapsulations detect that this is happening? It might be that encapsulations are using a fixed maximum MTU by definition, so that what you're logging is an attempt to send a payload that violates the protocol definition of the encapsulation, but I don't know that that's true in all cases, so thought I should ask. I saw a suggestion from Joe Touch (in a response to the TSV-ART review) to consider looking at the terminology developed for draft-ietf-intarea-tunnels. I didn't see a reply to that suggestion, and I didn't see a reference to draft-ietf-intarea-tunnels in -24 - was this considered? (I'm also asking because I want to keep track of whether people applying encapsulations find that document useful, of course) (Joe's follow-up is at https://mailarchive.ietf.org/arch/msg/tsv-art/CsdWwR9B5_AB64D0eFl-KIE7_NA)
Suresh Krishnan (was Discuss) No Objection
Comment (2017-10-03 for -25)
Thanks for quickly addressing my DISCUSS and COMMENT points.
Warren Kumari No Objection
Comment (2017-09-27 for -24)
I provided long (and somewhat grumpy!) comments on the previous version of this document -- I'd like to thank the authors, especially Carlos for addressing them. This version is, IMO, much improved.
Mirja Kühlewind (was Discuss) No Objection
I'm clearing my discuss now; however, I don't think all of my comments have been adequately addressed. However, some points could be clarified such that these open points no longer warrant holding a discuss.

-------------------- Old discuss text: --------------------

I have a couple of comments on the design. I know, as always in IESG review state, it's probably too late to make any changes to the actual header format, therefore most of my comments are actually in the comment section below. I still decided to note them so at least people can consider these points. However, there are a few things that I need clarification for before publication, which I note in this section:

1) Sec 2.2 "SF/SFF/SFC Proxy/Classifier implementations that do not support SFC OAM procedures SHOULD discard packets with O bit set, but MAY support a configurable parameter to enable forwarding received SFC OAM packets unmodified to the next element in the chain. Forwarding OAM packets unmodified by SFC elements that do not support SFC OAM procedures may be acceptable for a subset of OAM functions, but can result in unexpected outcomes for others; thus, it is recommended to analyze the impact of forwarding an OAM packet for all OAM functions prior to enabling this behavior. The configurable parameter MUST be disabled by default."

This part is really unclear to me and I believe needs to be further specified. Where should this configurable parameter be? In the Context header? Why don't you just use one of the unassigned bits to indicate if an unknown (OAM) packet should be forwarded or not? Moreover, I also disagree with this text. If there is a bit/a way to indicate if a not supported OAM packet should be forwarded or not, it should just be defined like this, while any considerations if that bit should be set or not depend on the OAM function itself and do not need to be discussed here. Finally, it is not well explained what an OAM packet is at all.
Is that a 'fake' packet that is generated by the operator to actively test the (potentially newly configured) SFP? If so, why does a SF need to know if a packet is an OAM packet or not? Usually it's a bad idea to use a different kind of traffic for testing compared to what will be used in operations. Please provide more explanation here!

2) section 2.4 "An SFC-aware SF MUST receive the data semantics first in order to process the data placed in the mandatory context field. The data semantics include both the allocation schema and the meaning of the included data. How an SFC-aware SF gets the data semantics is outside the scope of this specification."

This is really confusing to me. I think this is what you need an actual data semantics (aka type) field for in the base header. Or is there an actual reason to not put this information directly in the base header where it is needed but instead assuming some magical way this information may take to reach the node? If the assumption is that the SF is configured to know based on the SFI what the content of the context header has to be, a) you need to say that in the draft, and b) that's really error-prone because it's really hard to tell if the context header actually holds the information that you need or just random crap (of course depending on the expected data type of this information). In short, I think you really need a type field somewhere here. In any case, you really need to explain this more!

Also, the text further says: "An SF or SFC Proxy that does not know the format or semantics of the Context Header for an NSH with MD Type 1 MUST discard any packet with such an NSH..." How does the SFC proxy know that it knows the format or not if there is no type field or identifier that indicates what the format should be? Also, a related question from me: why is the context header present in all types of NSH if there is no use for it defined in this document yet? Why is there no fixed length NSH without a context header then?
3) Section 2.5.1: "If multiple instances of the same metadata are included in an NSH packet, but the definition of that context header does not allow for it, the SFC-aware SF MUST process the first instance and ignore subsequent instances."

This seems error prone to me. If the same metadata appears multiple times where it should not, that seems clearly like an error case to me. Just using the first one and proceeding normally might not be the right thing to do. In any case I think such an occasion should at least be logged. If the multiple instances are just a copy of each other and carry the same information, it's probably okay to use that information and proceed. If the different instances carry different information, it may be a bit dangerous to just use the first one and ignore others silently. In this case I would rather recommend to drop the packet...

4) In line with the second comment from the tsv-art review (thanks Wes!), I don't really understand why this document says (sec 6) that there can be multiple next hops for the same SFP or SFs can be traversed in a different order. My understanding (from a quick look at RFC7665) would be that, if those things are needed e.g. for load balancing, then one should define different SFPs and the Classifier must have the knowledge that two SFPs are equivalent and select them respectively. The reason why I'm really concerned about this is that usually a number of packets belong to a flow and all packets belonging to the same flow should ideally take the same route. But usually only the Classifier has a notion of what a flow is and respectively will assign the SFI to the packets belonging to the same flow. If now any SF on the path can more or less randomly decide to forward packets belonging to the same flow to one or another next node, I would assume that this is not only a problem for the flow, e.g. reordering, but also for the SF itself in many cases.
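The handling suggested in point 3 (use the first instance, but log duplicates and treat conflicting duplicates as more serious) can be sketched as follows. This is an illustration of the reviewer's suggestion, not behavior defined by the draft. Metadata is modeled as already-parsed `(md_class, md_type, value)` tuples, and the function name, the `allow_multiple` parameter, and the choice of log levels are assumptions.

```python
import logging
from typing import Dict, FrozenSet, Iterable, List, Tuple

log = logging.getLogger("nsh.metadata")

Tlv = Tuple[int, int, bytes]  # (md_class, md_type, value), already parsed


def select_metadata(
    tlvs: Iterable[Tlv],
    allow_multiple: FrozenSet[Tuple[int, int]] = frozenset(),
) -> List[Tlv]:
    """Keep the first instance of each metadata type and log any repeats."""
    first_seen: Dict[Tuple[int, int], bytes] = {}
    accepted: List[Tlv] = []
    for md_class, md_type, value in tlvs:
        key = (md_class, md_type)
        if key in first_seen and key not in allow_multiple:
            if value == first_seen[key]:
                log.info("duplicate metadata %s ignored (identical copy)", key)
            else:
                log.warning("conflicting duplicate metadata %s ignored", key)
            continue
        first_seen.setdefault(key, value)
        accepted.append((md_class, md_type, value))
    return accepted
```

A stricter variant could raise an exception on conflicting duplicates so that the caller drops the packet, which is the alternative recommended above.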
------------------------ Old comment text: ------------------------

Further considerations:

1) I don't really see why a TTL in the base header is needed. I mostly understand why there is the Service Index in the service header, though I think there should be a better means to validate that your SFP is correct, and ideally you should really not need this. However, loop prevention can be provided by both mechanisms and moreover there is probably also often a TTL in the encapsulation protocol and loop prevention should really be a function of that forwarding protocol and not the NSH.

2) I don't see why you need the type field in the base header. This is fully redundant because all you need is the length field. If the length is 0x2 it is what you have defined as type 1; if the length field is larger it is type 2. I also don't see the need for any other types in the future because you also have the version field; if you need anything else you should go or probably have to go for a new version. Note that the general problem with unnecessary redundancy is that it adds complexity. If you keep this redundancy you have to separately handle and implement the case(s) where the type is 1 but the length is larger than expected. If at all you could probably just use one bit to indicate that the length field is present and if not the length is 0x2. However, saving bits does not really seem to be a concern for you, so that might not actually be an advantage.

3) sec 2.5.1 "Unassigned bit: One unassigned bit is available for future use. This bit MUST NOT be set, and MUST be ignored on receipt." Is there an actual reason to have an unassigned bit here? Because I would assume that the type already provided enough flexibility for ways to extend the metadata format in any way needed.

4) Also section 2.5.1: " Length: Indicates the length of the variable metadata, in bytes.
In case the metadata length is not an integer number of 4-byte words, the sender MUST add pad bytes immediately following the last metadata byte to extend the metadata to an integer number of 4-byte words. The receiver MUST round up the length field to the nearest 4-byte word boundary, to locate and process the next field in the packet." Your definition of the length field might be more error-prone than needed. It would probably be easier to simply define the length as 4-byte words, and the type of course defines the content of the metadata field and as such can simply define which part of the total metadata field holds certain data of a certain type and which part is padding.

5) And finally I have to say it is unclear to me why the SFI and SI fields are described as a separate header. Given they have to be present in all SFH, I would consider them as two fields of the base header. But it is after all really just an editorial issue. However, all this together with my previous comments makes the protocol spec actually much more complicated than it needs to be...

6) I also have to agree to the last comment of the tsv-art review: I think it would have been nicer to not only describe the NSH but also define mappings to a set of possible encapsulations because I would assume that for each encapsulation there are a couple of specific considerations that need to be made to make things work successfully. I don't think that all encapsulations can be captured by general consideration and I cannot make up my mind to go through all cases in my head to figure out if there are things that need to be noted.
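The padding rule quoted in point 4 comes down to rounding a byte count up to the next multiple of four. A minimal sketch follows; the function names are hypothetical, and the use of zero-valued pad bytes is an assumption, since the quoted text does not state a pad value.

```python
def padded_length(length_in_bytes: int) -> int:
    """Round a metadata length in bytes up to the next 4-byte boundary."""
    if length_in_bytes < 0:
        raise ValueError("length must be non-negative")
    return (length_in_bytes + 3) & ~3


def pad_metadata(metadata: bytes) -> bytes:
    """Sender side: append pad bytes (zeros assumed) to reach a 4-byte multiple."""
    return metadata + b"\x00" * (padded_length(len(metadata)) - len(metadata))


def next_field_offset(current_offset: int, length_field: int) -> int:
    """Receiver side: skip the metadata plus padding to find the next field."""
    return current_offset + padded_length(length_field)
```

The receiver has to keep two numbers apart: the Length field tells it how many bytes are real metadata, while the rounded value tells it where the next field starts. Expressing the length in 4-byte words, as proposed above, would remove the rounding step but would require each metadata type to say which bytes are padding.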
Terry Manderson No Objection
Alexey Melnikov No Objection
Comment (2017-09-27 for -24)
I agree with Kathleen's DISCUSS. Also, have you thought about the likelihood of introducing new versions, and if it is likely, what kind of restrictions do you want to impose on future versions (e.g. requirements on backward compatibility) and what are the criteria for bumping the version number? For example, future versions must use the same Base Header and Service Path header, but can add new mandatory fields after that. Etc.
Alvaro Retana No Objection
Comment (2017-09-26 for -24)
(1) While describing the MD Type field, Section 2.2. (NSH Base Header) talks about the specific scenario in which "a device will support MD Type 0x1 (as per the MUST) metadata, yet be deployed in a network with MD Type 0x2 metadata packets", and it specifies that "the MD Type 0x1 node, MUST utilize the base header length field to determine the original payload offset if it requires access to the original packet/frame." This is the case where the node in question *does not* support MD Type 0x2, right? If so, then the specification above seems to go against (in the last sentence of the same paragraph): "Packets with MD Type values not supported by an implementation MUST be silently dropped." IOW, if the node doesn't support 0x2, why wouldn't it just drop the packet? (2) Section 2.5.1. (Optional Variable Length Metadata) says that this document "does not make any assumption about Context Headers that are mandatory-to-implement or those that are mandatory-to-process. These considerations are deployment-specific." But the next couple of paragraphs specify explicit actions for them (mandatory-to-process): Upon receipt of a packet that belongs to a given SFP, if a mandatory- to-process context header is missing in that packet, the SFC-aware SF MUST NOT process the packet and MUST log an error at least once per the SPI for which the mandatory metadata is missing. If multiple mandatory-to-process context headers are required for a given SFP, the control plane MAY instruct the SFC-aware SF with the order to consume these Context Headers. If no instructions are provided and the SFC-aware SF will make use of or modify the specific context header, then the SFC-aware SF MUST process these Context Headers in the order they appear in an NSH packet. Maybe I'm confused about considerations being deployment specific vs specifying what to do here. Can you please clarify? (3) "SFFs MUST use the Service Path Header for selecting the next SF or SFF in the service path." 
Section 6 explains most of what has to be done -- what I think is still not clear in this document is where the information in Tables 1-4 comes from. There may be different ways for an SFF to learn that, and I would imagine that it is out-of-scope of this document. Please say so -- maybe there's a relevant reference to rfc7665 (?). (4) Section 11.1. (NSH EtherType) seems out of place in this document because (1) the document doesn't discuss the transport itself, and (2) it is in the IANA section... (5) What is the "IETF Base NSH MD Class" (Section 11.2.4)? Ahh, I see that Section 11.2.6 talks about "the type values owned by the IETF"; it would be good to say that MD Class 0x0000 is being assigned to the IETF (in 11.2.4). Nits: In section 2.2. (NSH Base Header), it would be nice to have a forward reference when the Service Index is first mentioned. It may be nice to explicitly state in the description of the MD Type field (Section 2.2) that for length = 0x2 and MD Type = 0x2, there are in fact no optional context headers. (I know there's some text about this later in section 2.5.) "...all domain edges MUST filter based on the carried protocol in the VxLAN-gpe". That "MUST" is out of place because the text is an example.
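The scenario in point (1), where a node uses the Base Header length field to find the original payload regardless of which metadata follows, can be sketched as below. The position of the Length field (bits 21-16 of the first word, counted in 4-byte words) is an assumption about the header layout, and the function name is hypothetical.

```python
def payload_offset(packet: bytes) -> int:
    """Return the byte offset of the original packet/frame after the NSH.

    Only the Base Header Length field is consulted, so the node does not
    need to understand the metadata that follows the Service Path Header.
    """
    if len(packet) < 8:
        raise ValueError("too short to hold a Base Header and Service Path Header")
    word = int.from_bytes(packet[:4], "big")
    length_words = (word >> 16) & 0x3F
    offset = length_words * 4
    if length_words < 2 or offset > len(packet):
        raise ValueError("Length field inconsistent with packet size")
    return offset
```

This shows why skipping unknown metadata is mechanically possible, which is what makes the "MUST be silently dropped" sentence look contradictory, as the comment points out.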
Adam Roach No Objection
Comment (2017-09-27 for -24)
I have the same concern as Kathleen's DISCUSS, and would have blocked the draft on the same grounds if such a position were not already in place. The "crunchy perimeter, soft center" model of security was flawed to start with; and, even in those arenas where it was once fashionable, it's starting to be considered dated (e.g., much of the traffic inside data centers is secured using TLS -- see the recent discussions in the TLS working group for evidence of this situation). More notably, this "unconditionally trusted network zone" approach to security has led to some spectacular exploits recently (cf. https://www.wired.com/2016/04/the-critical-hole-at-the-heart-of-cell-phone-infrastructure/). Rather than explicitly fostering this model, the security section really needs to normatively disallow it. (n.b., I reviewed version -21 of the document -- but I don't find the changes between that version and -24 to address the issue Kathleen raises) ---- Section 3 says the following about reclassification behavior: When the logical classifier performs re- classification that results in a change of service path, it MUST replace the existing NSH with a new NSH with the Base Header and Service Path Header reflecting the new service path information and MUST set the initial SI. The O bit, as well as unassigned flags, MUST be copied transparently from the old NSH to a new NSH. Metadata MAY be preserved in the new NSH. I don't see anything here about copying the TTL. If the TTL isn't copied, you can end up with a stable (and unending) loop involving two classifiers (which seems even more damaging than usual, as the SI value won't generally survive a reclassification, right?). I would suggest adding "TTL" to the list of things that MUST be copied when reclassification occurs.
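The reclassification suggestion above can be sketched as a bit-mask copy on the first 32-bit word of the Base Header. The O bit and unassigned flags are copied as the quoted text requires; copying the TTL is the addition proposed in this comment, not something the quoted text mandates. The mask values assume a layout of Ver (2 bits), O (1), U (1), TTL (6), Length (6), U (4), MD Type (4), Next Protocol (8), and the function name is hypothetical.

```python
O_BIT_MASK = 0x20000000        # O bit (assumed bit 29)
UNASSIGNED_MASK = 0x1000F000   # unassigned flags (assumed bit 28 and bits 15..12)
TTL_MASK = 0x0FC00000          # TTL (assumed bits 27..22)

# Fields carried over from the old NSH on reclassification.
COPY_MASK = O_BIT_MASK | UNASSIGNED_MASK | TTL_MASK


def reclassify_base_word(old_word: int, new_word: int) -> int:
    """Build the new Base Header word, preserving O bit, unassigned flags and TTL."""
    kept_from_new = new_word & ~COPY_MASK & 0xFFFFFFFF
    copied_from_old = old_word & COPY_MASK
    return kept_from_new | copied_from_old
```

With the TTL carried across, a packet bouncing between two classifiers still runs out of hops even though each reclassification resets the Service Index.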