Ballot for draft-ietf-sfc-oam-framework

Comment (2020-05-06 for -13) Sent

[[ nits ]]

There are several minor grammar issues, which others have identified and which
hopefully can be fixed in these final stages of publication. I mention a few of
them below.

[ section 2 ]
* The first paragraph describing Figure 1 has a few grammar bugs. May I suggest
  something to the effect of:

   In Figure 1, the service layer elements such as classifier and SF are
   depicted as virtual machines that are interconnected using an overlay
   network.  The underlay network may comprise multiple intermediate nodes
   not shown in the figure that provide underlay connectivity between the
   service layer elements.

[ section 3.1.1 ]
* s/the got/the/

* The second-to-last paragraph could use a simplifying rewrite. I think the
  conversational tone impedes the direct, clear transmission of the intended
  meaning.

* "The SF availability can be performed": does this refer to the *measurement*
  of SF availability?

[ section 3.2.1 ]
* s/comprised of/composed of/

[ section 7 ]
* Do Sections 6.4.1 and/or 6.4.4 suffice to perform topology exploration of the
  SFC?

Comment (2020-05-07 for -13) Sent

Thank you for engaging my DISCUSS question about Section 3.1.1.  I don't feel it was fully resolved, but I also don't feel there's anything to be gained by pressing my point further.

My original comments:

I think this is pretty well done.  I had little trouble following it and this is my first foray into the realm of SFC.

I find myself almost suggesting that you don't need the BCP 14 boilerplate or its language at all in this document.  You barely use it, and it might not even be needed in the places where you do use it, especially since this is a framework document and not a protocol document.

Finally, lots of editorial stuff:

Section 2:
* “In Figure 1, the service layer element such as …” -- s/element/elements/
* “The underlay network may comprise of …” -- s/comprise/consist/, or alternatively, s/comprise of/comprise/
* “... nodes but not shown …” -- remove “but”, or s/but/but these/

Section 3, all three numbered list entries:
* “... applicable at this component includes …” -- s/includes/include/

Section 3.1.1:
OLD:
"SF availability is an aspect that raises an interesting question -- How to determine that a service function is available?."
NEW:
"SF availability is an aspect that raises an interesting question: How does one determine that a service function is available?"

* “... the packet did indeed get the got expected service.”  -- remove “got”
* This section uses both “a SF” and “an SF”.  I don’t know which one is grammatically correct, but we should find out and use that. Or, if both are, pick one and be consistent.

Section 3.2.1:
* “An SFC could be comprised of …” -- either remove “of”, or use “composed of”

Section 3.3:
* Either capitalize “Classifier” generally, or not at all.

Section 3.4:
* What constitutes an availability check of the underlay network?  You discussed this in 3.1.1, but not here.  If this is covered in Section 4.1, a forward reference would be helpful here.

Section 3.5:
* “... and are mostly transparent …” -- s/are/is/; “network” is singular
* What constitutes an availability check of the overlay network?

Section 4:
* “...for more than one SFC components.”  -- use “component”, or change “more than one” to “multiple”

Section 4.1:
* “Verify any packet re-ordering and corruption.” -- perhaps “detect” would be better than “verify” here

Section 4.2:
* “Ability to provision continuity check …” -- put an “a” before “continuity”
* “... supported by continuity function are as follows:” -- s/function/functions/

Section 4.3:
* “... from every transit devices …” -- s/devices/device/
* “Ability to trigger action from …” -- s/action/an action/

Section 5.1:
* I would quote “ping” and “trace”.
* “Table 3 below is not exhaustive” -- needs a period at the end
* Table 3 needs a bit of tidying in terms of alignment.

Section 6.1:
* “... network layer must mark the packet …” -- why isn’t that a MUST?

Section 6.2:
* “... skipping an SF might have implication …” -- s/implication/implications/
* “Any SFC-aware node that initiates an OAM packet must set …” -- why isn’t that a MUST?

Section 6.3:
* “... indicates it as OAM packet …” -- s/as/as an/

Section 6.4.1:
* “[RFC0792] and [RFC4443] describes …” -- s/describes/describe/
* “... verify the availability of SF or SFC.” -- s/SF/an SF/
* “... can generate ICMP echo …” -- s/ICMP/an ICMP/
* “... from last SFF and thereby …” -- add comma after “SFF”
* “Alternately, …  Alernatively, … ” -- use “Alternatively” for the first one, and “or” for the second, all in one sentence

Section 6.4.2:
* “[RFC5880] defines Bidirectional …” -- s/defines/defines the/
* “... to perform continuity function …” -- “functions”, or “the continuity function”
* “... value as last SFF …” -- s/last/the last/
* “... with relevant DIAG code.” -- s/with/with the/  (or “a”)

Section 6.4.3:
* “... transported using NSH header.” -- s/using/using the/

Section 6.4.4:
* “... is implemented in Open Daylight and available.”  -- s/and/and is/

Section 7:
* Table 4 needs some work on spacing consistency.

Section 8:
* “... are applicable for this document.” -- s/are/is/, as “consideration” is singular
* “The OAM information from service layer …” -- s/from/from the/
* “... service function paths etc.” -- s/paths etc./paths, etc./
* “... information from SFC layer raises a need for careful security considerations” -- s/from/from the/, and the sentence is missing a period
* “... SFC and SF OAM may provide mechanism …” -- s/mechanism/mechanisms/
* “... the OAM solution for SF component should …” -- s/component/components/

Comment (2020-05-06 for -13) Sent

** Section 1. Per “OAM controllers are assumed to be within the same administrative domain as the target SFC enabled domain”, is there a circumstance where the OAM controller would be working across administrative domains? Can this statement be made stronger – “OAM controller MUST be within the same administrative …”

** Section 2. Example technologies for the underlay (e.g., IP, MPLS) and link (e.g., POS, DWDM) layers are provided. It might help readability to provide such examples for the overlay network layer too.

** Section 3.0. “Testing an SF may not be restricted to connectivity to the SF, but also whether the SF is providing its intended service. Refer to Section 3.1.1 for a more detailed discussion.” Section 3.1.1 appears to discuss availability but not validation practices to ensure “the SF is providing its intended service”. Is providing the intended service as simple as the service being up? Or does it include confirmation that functionally the right service is happening?

** Section 3.4. Typo. s/and so/so/

** Section 3.5. Per “The overlay network establishes the service plane between the SFC components and are mostly transparent to the SFC data plane elements.”, in what way is the overlay network not transparent?

** Section 5.1. Per “Tools like ping and trace …” and “BFD is another tool …”, what specific instances of ping and trace are you referencing? Or is this generically saying ping = Section 6.4.1, trace = Section 6.4.4 and BFD = Section 6.4.2?

** Section 5.1. I initially found the contents of this table confusing. The paragraph introducing this table refers to “ping” and “trace” as tools. However, E-OAM, IPPM, etc., which are also noted in this table, don’t seem easily categorized by the “tool” designation.

** Section 6.4.3. Is there a reason that In-Situ OAM is mentioned here but not in Section 5.1?

** Section 8. Per “The sensitivity of the information from SFC layer raises a need for careful security considerations”, what are these concerns? It also isn’t clear what information is being mentioned.

** Section 8. (I’m not calling this a DISCUSS since this is a trivial editorial fix but it MUST be done before publication) Per “To address the above concerns, SFC and SF OAM may provide mechanism for:
o Misuse of the OAM channel for denial-of-services, …”

A crucial word is missing here, as in: s/provide mechanisms for:/provide mechanisms for mitigating:/

** Section 8. Per “To address the above concerns, SFC and SF OAM may provide mechanism for:
…
o Leakage of OAM packets across SFC instances, and
o Leakage of SFC information beyond the SFC domain.”

-- the text above mentions the risk of SFC evasion, but this isn’t mentioned here

-- the two bullets here seem to describe appropriate and needed OAM activities, but they don’t follow from the text above (as explicitly promised by the text “to address the above …”)

** Section 8. What is the difference between the following guidance:
-- “… SFC and SF OAM may provide mechanism to [mitigate]: … leakage of SFC information beyond the SFC domain”
-- “The documents proposing the OAM solution for any service layer component should consider some form of message filtering to prevent leaking any internal service layer information outside the administrative domain”

To me they are saying the same thing, except the second item explicitly notes message filtering.

Comment (2020-05-06 for -13) Sent

Thank you for the work put into this document. The document is easy to read; BTW, I found that its content is more about the justifications/requirements for an OAM system and descriptions of tools than about a framework description.

The core of the document appears to be Section 6: this should probably be reflected in the abstract and introduction.

Please address the comments raised in the Internet directorate review by Carlos Bernardos:
https://mailarchive.ietf.org/arch/msg/int-dir/TgQulH7hytGPNxdAPWcSgkTx1IM

Please find below a couple of non-blocking COMMENTs. I would really appreciate a reply to all these COMMENTs.

I hope that this helps to improve the document,

Regards,

-éric

== COMMENTS ==

I would not refer to BCP 14 (RFC 8174) as this is an architectural/framework document (informational) and not a protocol specification.

It seems that most of the described tools are about synthetic traffic. Are there any other means to do OAM in SFC (not that I have any suggestion...)?

-- Section 1 --
About "to be applied to packets and/or frames": for me, packets are layer-3 PDUs and frames are layer-2 PDUs. While I am not familiar with SFC, I could envision SFC being applied to transport- or application-layer PDUs. So, why restrict the use of this document to layers 2 and 3 only?

-- Section 2 --
Is there a reason why not all 'virtual links' are mentioned in this section (e.g., SR-IOV, tun/tap, ...)?

Similar question: why limit the example to VMs and not include containers?

-- Section 3 --
The word "performance" is often used in the document, but it is not described in depth: is it about SF CPU/memory, or about 'client traffic' latency and throughput? Section 4 partially addresses my question, but not completely; also, adding forward pointers to Section 4 would be nice.

-- Section 4.3 --
Please bear with my ignorance of the SFC world... but if an SF is proxying or rewriting the application message, how useful is an end-to-end PMTUD check, given that there are two stitched TCP connections?

The overall assumption of this section is that all SFs are pure layer-3 functions, leaving the IP header intact so that ECMP and TTL checks can be done. Is that always the case?

Section 5.2 addresses the above points, but I suggest that Section 4.3 be restricted to 'link-layer OAM'.

-- Section 6.4.1 --
"TTL field in NSH header to 63": I am not familiar with NSH, but if there is a TTL field in NSH, then it could be useful to point to the RFC and section describing it, especially in a section whose title is "ICMP" (obviously referring to the IP header).
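For reference, the TTL field and the O bit both sit in the first 32-bit word of the NSH base header (RFC 8300, Section 2.2). A minimal Python sketch of encoding and decoding that word follows; the class and field names are illustrative, not taken from the draft or from any implementation, and the unassigned (U) bits are simply written as zero.

```python
import struct
from dataclasses import dataclass

# First 32-bit word of the NSH base header (RFC 8300, Section 2.2), MSB first:
# Ver(2) | O(1) | U(1) | TTL(6) | Length(6) | U(4) | MD Type(4) | Next Protocol(8)


@dataclass
class NshBaseHeader:
    version: int
    oam: bool          # O bit: marks the packet as an OAM packet
    ttl: int           # maximum remaining SFF hops (default initial value 63)
    length: int        # total NSH length in 4-byte words
    md_type: int
    next_protocol: int

    def pack(self) -> bytes:
        # Unassigned (U) bits are written as zero in this sketch.
        word = (
            (self.version & 0x3) << 30
            | int(self.oam) << 29
            | (self.ttl & 0x3F) << 22
            | (self.length & 0x3F) << 16
            | (self.md_type & 0xF) << 8
            | (self.next_protocol & 0xFF)
        )
        return struct.pack("!I", word)

    @classmethod
    def unpack(cls, data: bytes) -> "NshBaseHeader":
        (word,) = struct.unpack("!I", data[:4])
        return cls(
            version=(word >> 30) & 0x3,
            oam=bool((word >> 29) & 0x1),
            ttl=(word >> 22) & 0x3F,
            length=(word >> 16) & 0x3F,
            md_type=(word >> 8) & 0xF,
            next_protocol=word & 0xFF,
        )
```

For example, `NshBaseHeader(version=0, oam=True, ttl=63, length=2, md_type=2, next_protocol=1)` packs to the bytes `2f c2 02 01`; Next Protocol 1 (IPv4) and MD Type 2 with no context headers are arbitrary example choices.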

-- Section 8 --
In this security section, I wonder whether the trace tool deserves a paragraph or two: if its output is trusted while being forgeable or spoofable, then operators could trust an SFC that is "owned" and not reliable (e.g., one that bypasses some security SF). I trust the security AD to raise a DISCUSS if they think this warrants one.

== NITS ==

-- section 6.3 --
Is it really required to re-specify the use of bit O in NSH ?

-- Section 6.4.1 --
Sigh... using the IPv4 terminology of TTL...

Yes (for -13) Unknown

No Objection (2020-07-05) Sent

Thanks for addressing my DISCUSS and COMMENT.

No Objection (2020-05-07 for -13) Sent

I support Murray's DISCUSS.

I appreciate the fact that "More specifics on the mechanism to characterize SF-specific OAM to validate the service offering are outside the scope of this document." (§3.1.1)

The issue I want to point out, which I believe is a significant omission in this document, is the lack of mention in §4 and §5 of the validation of the service offering as an SFC OAM function or in the gap analysis.  IOW, the availability of the SF from the point of view of its ability to provide the service is pointed out as important in §3.1.1, but there is no further consideration later in the document.  


[I believe that this issue borders on a DISCUSS -- and while I would really like to see further consideration in the text, I decided to trust that the authors and the responsible AD will take care of it.]

No Objection (2020-05-05 for -13) Sent

Please use the new BCP 14 boilerplate exactly as in RFC 8174.

No Objection (2020-05-06 for -13) Sent

Section 2

I'm not sure that I understand why a node in the underlay network (the
"VM2" one) doesn't line up with a node in the link layer, in Figure 1.

Section 3

       classifiers, controllers, other service nodes).  Testing an SF
       may not be restricted to connectivity to the SF, but also whether

nit(?) perhaps "may be more expansive than just checking connectivity to
the SF", to avoid questions about what the "not" binds to.

   3.  Classifier component: OAM functions applicable at this component
       includes testing the validity of the classification rules and
       detecting any incoherence among the rules installed in different
       classifiers.

It seems important to include both positive and negative tests of
classification functionality (i.e., that traffic that should not match
in fact does not match).  Section 3.3 might be a good place to mention
this.
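To illustrate what positive and negative tests could look like, here is a minimal Python sketch. The rule format (protocol plus destination port mapped to an SFP identifier) and the helper names are hypothetical simplifications; real classification rules are richer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    protocol: str  # e.g. "tcp"
    dst_port: int
    sfp_id: int    # service function path that matching traffic is steered to


def classify(rules: list[Rule], protocol: str, dst_port: int) -> Optional[int]:
    """Return the SFP of the first matching rule, or None if no rule matches."""
    for rule in rules:
        if rule.protocol == protocol and rule.dst_port == dst_port:
            return rule.sfp_id
    return None


def run_classifier_tests(rules, positive, negative):
    """positive: (protocol, port, expected_sfp) tuples that must match as expected.
    negative: (protocol, port) tuples that must not match any rule.
    Returns a list of failures (empty if all tests pass)."""
    failures = []
    for proto, port, expected in positive:
        got = classify(rules, proto, port)
        if got != expected:
            failures.append(("positive", proto, port, expected, got))
    for proto, port in negative:
        got = classify(rules, proto, port)
        if got is not None:
            failures.append(("negative", proto, port, None, got))
    return failures
```

The negative list is what catches an over-broad or stray rule (say, an extra `Rule("udp", 80, 10)`), which a positive-only test would never notice.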

Section 3.1.1

   service function is available?.  On one end of the spectrum, one
   might argue that an SF is sufficiently available if the service node
   (physical or virtual) hosting the SF is available and is functional.
   On the other end of the spectrum, one might argue that the SF's
   availability can only be concluded if the packet, after passing

I agree with the other reviewers that the first "end" of the spectrum
seems surprising.

   firewall).  The cost of this approach is that the OAM mechanism for
   some SF will need to be continuously modified in order to "keep up"
   with new functionality being introduced: lack of extendibility.

nit: the grammar doesn't seem right around "lack of extendibility"
(also, is "extendibility" preferred or "extensibility"?).

   The SF availability can be performed using a generalized approach
   (i.e., an adequate granularity to provide a basic SF service).  More

nit: I think it's an availability *check* that can be performed.

Section 3.2.2

Mandating the ability to measure every arbitrary segment of SFs within
an SFC seems like it might be over-constraining.

Section 4

   to perform OAM functionality at different layers.  In order to apply
   such OAM functions at the service layer, they need to be enhanced to
   operate a single SF/SFF to multiple SFs/SFFs in an SFC and also in
   multiple SFCs.

I don't understand what "operate a single SFF to multiple SFs/SFFs"
means.

Section 4.1

   o  Verify any packet re-ordering and corruption.

nit: this wording doesn't really make sense.  Do we want to verify the
absence of such things, or that it is within configured tolerances, or
something else?  Just noting any occurrences and verifying that the
noted occurrences are as noted doesn't seem useful...
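As an illustration of "detect" (with a tolerance) rather than "verify", the sketch below counts reordered probes from their sequence numbers and flags payload corruption with a CRC-32. It assumes the OAM probes carry a sequence number and a checksum of their payload, which the draft does not specify; the function names are hypothetical, and RFC 4737 defines more complete reordering metrics.

```python
import zlib


def count_reordered(received_seqs: list[int]) -> int:
    """Count probes that arrive after a probe with a higher sequence number."""
    highest = None
    reordered = 0
    for seq in received_seqs:
        if highest is not None and seq < highest:
            reordered += 1
        else:
            highest = seq
    return reordered


def reordering_within_tolerance(received_seqs: list[int], max_ratio: float) -> bool:
    """True if the fraction of reordered probes does not exceed max_ratio."""
    if not received_seqs:
        return True
    return count_reordered(received_seqs) / len(received_seqs) <= max_ratio


def is_corrupted(payload: bytes, expected_crc32: int) -> bool:
    """Detect payload corruption by comparing against a CRC-32 carried with the probe."""
    return zlib.crc32(payload) != expected_crc32
```

For example, the arrival order 1, 2, 4, 3, 5 contains one reordered probe (a ratio of 0.2), which passes a 25% tolerance but fails a 10% one.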

   o  Verify the policy of an SFC or SF.

nit: similarly, is this to verify the configuration, or to verify that
operation matches the expected configuration?

Section 4.2

   Continuity is a model where OAM messages are sent periodically to
   validate or verify the reachability to a given SF within an SFC or

nit: while it's "connectivity to", I think it's "reachability of".

   o  Notifying the detected failures to other OAM functions or
      applications to take appropriate action.

nit: the subject of a notification is the entity receiving notification,
not the content of the notification.  So "notifying other OAM functions
or applications of the detected failures so they can take appropriate
action", or "Sending notifications of the detected failures to other
[...]".

Section 4.4

   delay [RFC7679] is important.  In order to measure one-way delay,
   time synchronization MUST be supported by means such as NTP, PTP,
   GPS, etc.

I think (informational) references are in order for these.  (PTP is not
listed as "well-known" at
https://www.rfc-editor.org/materials/abbrev.expansion.txt , though the
other two are.)

   One-way delay variation [RFC3393] could also be calculated by sending
   OAM packets and measuring the jitter between the packets passing
   through an SFC.

Looking at jitter between (measurement) packets to ascertain delay
variation seems to require foreknowledge of the (e.g., periodic) pacing
of the initial packet transmission.  If, on the other hand, the idea is
to look at the jitter across the measured delay of each packet, then
that works fine (but that's not what the current text says).
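A small sketch of the distinction, assuming synchronized clocks and per-packet send and receive timestamps (integer milliseconds; the values and function names are hypothetical). Inter-arrival gaps fold the sender's pacing into the result, whereas the RFC 3393 notion of delay variation (the difference between the one-way delays of a selected packet pair, here consecutive packets) does not.

```python
def one_way_delays(send_ts, recv_ts):
    """Per-packet one-way delay; only meaningful if the clocks are synchronized."""
    return [r - s for s, r in zip(send_ts, recv_ts)]


def inter_arrival_gaps(recv_ts):
    """Gaps between consecutive arrivals; these include the sender's own pacing."""
    return [b - a for a, b in zip(recv_ts, recv_ts[1:])]


def delay_variation(delays):
    """Delay variation for consecutive pairs, D(i+1) - D(i), in the spirit of RFC 3393."""
    return [b - a for a, b in zip(delays, delays[1:])]
```

With irregular pacing (sends at 0, 10 and 30 ms) and arrivals at 5, 15 and 37 ms, the inter-arrival gaps are 10 and 22 ms, while the one-way delays are 5, 5 and 7 ms and the delay variation is only 0 and 2 ms.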

   o  Ability to measure the packet loss [RFC7680] within an SF or an
      SFP bound to a given SFC.

nit(?): packet loss "within an SF" (as opposed to between two SFs) is
not something I would have expected to need measuring, on first thought.
Though on further reflection it is less surprising; still, I wanted to
check that this is indeed as intended.

Section 5.2

   As shown in Table 3, there are no standards-based tools available for
   the verification of SFs and SFCs.

Some note about "at the time of this writing" or similar seems advised;
otherwise this statement is unlikely to age well.

Section 6.2

   An SFF may choose not to forward the OAM packet to an SF if the SF
   does not support OAM or if the policy does not allow to forward OAM
   packet to an SF.  The SFF may choose to skip the SF, modify the
   header and forward to next SFC node in the chain.  It should be noted
   that skipping an SF might have implication on some OAM functions
   (e.g. the delay measurement may not be accurate).  The method by

This behavior was initially surprising to me, and "might have
implication on" feels like weaker text than is merited.  While I can
perhaps imagine not forwarding an OAM packet to an SF that would choke
on it instead of doing something useful, silently bypassing a given SF
seems rather divergent from OAM expectations and is quite likely to
affect the accuracy of the resulting OAM results.

   process OAM packets is outside the scope of this document.  It could
   be a configuration parameter instructed by the controller or it can
   be done by dynamic negotiation between the SF and SFF.

(Is there an existing mechanism for dynamic negotiation between SF and
SFF?)

Section 6.3

   As described in Section 4, there are different OAM functions that may
   require different OAM solutions.  While the presence of the OAM
   marker in the overlay header (e.g., O bit in the NSH header)
   indicates it as OAM packet, it is not sufficient to indicate what OAM
   function the packet is intended for.  The Next Protocol field in NSH
   header may be used to indicate what OAM function is intended to or
   what toolset is used.

Elsewhere in the document we make reference to what would be required of
a non-NSH encapsulation header; is it appropriate to also do so here?

Section 6.4.2

   BFD or S-BFD could be leveraged to perform continuity function for SF
   or SFC.  An initiator could generate a BFD control packet and set the
   "Your Discriminator" value as last SFF in the control packet.  Upon

nit: I think this "your discriminator" would be the address or
identifier of the last SFF, not just "the last SFF" itself, right?
Or be set "to indicate" the last SFF, or similar.
(Also occurs a few sentences later.)

   with relevant DIAG code.  The TTL field in the NSH header could be
   used to perform partial SFC availability.  For example, the initiator

nit: availability checks/checking
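To make the TTL-based partial check concrete: per RFC 8300, each SFF decrements the NSH TTL by 1 and does not forward the packet once the decremented TTL is 0. The Python sketch below simulates that along a path of SFFs. It assumes that the SFF at which the TTL expires replies to the initiator, ignores the RFC 8300 wrap-around from an incoming TTL of 0, and uses hypothetical names throughout.

```python
def probe(path: list[str], ttl: int) -> str:
    """Return the SFF that handles a probe sent with the given initial NSH TTL.

    Each SFF decrements the TTL by 1; the SFF at which the TTL reaches 0 stops
    forwarding (and, in this sketch, replies). If the TTL never reaches 0, the
    probe is handled by the last SFF of the path.
    Assumes a non-empty path and 1 <= ttl <= 63.
    """
    for sff in path:
        ttl -= 1
        if ttl == 0:
            return sff
    return path[-1]


def trace(path: list[str], max_ttl: int = 63) -> list[str]:
    """Traceroute-like walk: probe with TTL 1, 2, ... until the last SFF answers.

    Assumes SFF names in the path are distinct.
    """
    hops = []
    for ttl in range(1, max_ttl + 1):
        hop = probe(path, ttl)
        hops.append(hop)
        if hop == path[-1]:
            break
    return hops
```

With a three-SFF path, a TTL of 2 checks the chain only as far as the second SFF, a TTL of 63 gives the end-to-end check, and incrementing the TTL probe by probe walks the path hop by hop.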

Section 6.4.4

   [I-D.penno-sfc-trace] defines a protocol that checks for path
   liveliness and traces the service hops in any SFP.  Section 3 of
   [I-D.penno-sfc-trace] defines the SFC trace packet format while
   Sections 4 and 5 of [I-D.penno-sfc-trace] defines the behavior of SF
   and SFF respectively.  While [I-D.penno-sfc-trace] has expired, the
   proposal is implemented in Open Daylight and available.

Why is draft-penno-sfc-trace not progressing towards publication?

Section 7

I think this table really needs more lead-in to what it's communicating.

   As depicted in Table 4, information and data models are needed for
   configuration, manageability and orchestration for SFC.  With

I don't see where that is actually indicated by the table.

Section 8

OAM information (though most usefully as summary statistics), if leaked,
could also be used by an attacker to gauge the efficacy of an ongoing
attack.

   Any security consideration defined in [RFC7665] and [RFC8300] are
   applicable for this document.

The rest of the document implies that NSH is not mandatory, so I'd
suggest rewording this reference to clarify what from RFC 8300 is (or is
not) applicable to the whole of the document.

   The mapping and the rules information at the classifier component may
   reveal the traffic rules and the traffic mapped to the SFC.  The SFC
   information collected at an SFC component may reveal the SF
   associated within each chain and this information together with

nit: s/the SF/the SFs/

   To address the above concerns, SFC and SF OAM may provide mechanism
   for:

(1) The crucial missing word that Roman notes is indeed crucial!
(2) Can we say something stronger than "may"?

   The documents proposing the OAM solution for any service layer
   components should consider some form of message filtering to prevent
   leaking any internal service layer information outside the
   administrative domain.

"should consider" is fairly weak guidance; would "should provide" be
appropriate?

Also, is it worth mentioning that this filtering would include dropping
OAM-marked messages from outside the domain (at least by default)?
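A minimal sketch of such default filtering at a domain boundary, assuming NSH is the encapsulation and that the boundary node knows whether a packet arrived from inside or outside the SFC domain (both are assumptions, and the function name is hypothetical). The O bit is the third most significant bit of the first byte of the NSH base header (RFC 8300, Section 2.2).

```python
# Mask for the O bit in the first byte of the NSH base header
# (Ver: 2 bits, O: 1 bit, U: 1 bit, then the top 4 bits of TTL; RFC 8300, Section 2.2).
O_BIT_MASK = 0x20


def admit_at_boundary(nsh_packet: bytes, from_inside_domain: bool) -> bool:
    """Default policy sketch: drop OAM-marked packets that arrive from outside the SFC domain."""
    if len(nsh_packet) < 4:
        return False  # too short to hold an NSH base header
    is_oam = bool(nsh_packet[0] & O_BIT_MASK)
    return from_inside_domain or not is_oam
```

Data packets (O bit clear) pass regardless of origin; only OAM-marked packets from outside are dropped, so an operator would need an explicit exception for any legitimate cross-domain OAM.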

Section 12.1

The only citation to RFC 8300 that appears to require a normative
reference is the bit in the security considerations that I noted was
unclear about its scope of applicability.

Section 12.2

RFC 6291 is a BCP, so we should probably cite it as such.  Also, given
its content, perhaps it ought to be normative?

No Objection (for -13) Not sent

No Objection (2020-05-04 for -13) Sent

Please address the issues in the (detailed!) Tsvarea review, in particular the bits from section 4 that incorrectly describe the work coming out of IPPM.

Additional nits:

Sec 3.1.1: s/extendibility/extensibility

Sec 3.1.2: there are also “hybrid” methods like IOAM that do not fit the active and passive definitions neatly.

4.1 s/packet to be of variable length packet size/packet to be of variable length

4.1 you are probably not trying to “verify” packet reordering and corruption! I suggest “detect” instead.

6.3 s/is intended to/is intended
 
6.3 s/in NSH header/in the NSH header

6.4.1 s/describes/describe

6.4.1 s/incrementing the ttl/incrementing ttl

6.4.3 s/using NSH header/using the NSH header

8. s/Any security consideration/Any security considerations

8. Missing period at end of second paragraph 

8. s/mechanism/mechanisms

No Objection (2020-05-07 for -13) Sent

I found this document to be pretty easy to read and understand, so thank you for your work in this area.

I have a few comments, which may have already been raised by other reviewers:

2. SFC Layering Model

While Figure 1 depicts an example where SFs are enabled as virtual
entities, the SFC architecture does not make any assumptions on how
the SFC data plane elements are deployed. The SFC architecture is
flexible and accommodates physical or virtual entity deployment. SFC
OAM accounts for this flexibility and accordingly it is applicable
whether SFC data plane elements are deployed directly on physical
hardware, as one or more Virtual Machines, or any combination
thereof.

Would "SF data plane elements" be more clear than "SFC data plane elements"?

3. SFC OAM Components

3. Classifier component: OAM functions applicable at this component
includes testing the validity of the classification rules and
detecting any incoherence among the rules installed in different
classifiers.

It was not entirely clear to me what is meant by "different classifiers", so perhaps this could be elaborated on slightly.
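One possible reading of "incoherence among the rules installed in different classifiers" is that several classifier instances meant to hold the same rule set disagree with each other. Under that assumption, a minimal sketch that compares per-classifier rule tables and reports match keys steered to different SFPs, or missing from some classifiers, could look like this; the table format and names are hypothetical.

```python
def find_incoherent_rules(tables: dict[str, dict[tuple, int]]) -> dict[tuple, dict]:
    """tables maps a classifier name to {match_key: sfp_id}.

    Returns {match_key: {classifier: sfp_id or None}} for every match key on
    which the classifiers disagree; None marks a classifier with no rule for it.
    """
    all_keys = set()
    for table in tables.values():
        all_keys.update(table)
    issues = {}
    for key in sorted(all_keys):
        actions = {name: table.get(key) for name, table in tables.items()}
        if len(set(actions.values())) > 1:
            issues[key] = actions
    return issues
```

For example, if two classifiers map ("tcp", 443) to SFPs 20 and 30 and a third has no rule for it, that key is reported, while a key mapped identically everywhere is not.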

4.3. Trace Functions

"TTL" is used in various places. Does that need to be listed in the acronyms?

6.2. OAM Packet Processing and Forwarding Semantic

Upon receiving an OAM packet, SFC-aware SFs may choose to discard the
packet if it does not support OAM functionality or if the local
policy prevents them from processing the OAM packet. When an SF
supports OAM functionality, it is desirable to process the packet and
provide an appropriate response to allow end-to-end verification. To
limit performance impact due to OAM, SFC-aware SFs should rate limit
the number of OAM packets processed.

Doesn't this mean that SFC is potentially altering the thing being measured? Wouldn't it be better instead to rate limit the number of OAM packets that are being generated in the first place?
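For concreteness, a token bucket is one common way to implement such rate limiting; the draft does not name an algorithm, so this is an assumption, and the class and parameter names are hypothetical. The same limiter could sit at the SF, as the draft text suggests, or at the OAM initiator, as suggested above; at the initiator it avoids the SF silently dropping probes that the measurement would then count as loss.

```python
class TokenBucket:
    """Token-bucket limiter for OAM packets (processing at an SF, or generation at an initiator)."""

    def __init__(self, rate_per_s: float, burst: int):
        self.rate = rate_per_s    # tokens added per second
        self.capacity = burst     # maximum number of stored tokens
        self.tokens = float(burst)
        self.last = 0.0           # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if one OAM packet may be sent/processed at time `now`."""
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

For example, with `rate_per_s=2` and `burst=2`, three probes at t=0 yield two admitted and one rejected; half a second later, one more is admitted.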

6.1. SFC OAM Packet Marker

The SFC OAM function described in Section 4 performed at the service
layer or overlay network layer must mark the packet as an OAM packet
so that relevant nodes can differentiate an OAM packet from data
packets. The base header defined in Section 2.2 of [RFC8300] assigns
a bit to indicate OAM packets. When NSH encapsulation is used at the
service layer, the O bit must be set to differentiate the OAM packet.
Any other overlay encapsulations used at the service layer must have
a way to mark the packet as OAM packet.

"must be set" => "MUST be set" & "must have a way" => "MUST have a way"?

But I question whether these should be musts at all. E.g. by setting an OAM bit you allow the intermediate functions to potentially modify their behaviour, making it harder to know that the thing under test isn't changing its behaviour because it is being tested. E.g. could another choice be to use some reserved address space to simulate flows without requiring the packets to be explicitly marked?

7. Manageability Considerations

It would probably be useful for YANG to be listed here as well under configuration and orchestration. RESTCONF or gNMI could potentially also be listed, although I note that this table is not intended to be exhaustive.

There is also a base YANG topology model, RFC 8345, and other extensions being defined, at least for Overlay and Underlay networks. Would they be appropriate for the topology column?

Regards,
Rob