Telechat Review of draft-ietf-sfc-oam-framework-13
review-ietf-sfc-oam-framework-13-tsvart-telechat-brockners-2020-05-04-00

Request Review of draft-ietf-sfc-oam-framework-13
Requested rev. 13 (document currently at 15)
Type Telechat Review
Team Transport Area Review Team (tsvart)
Deadline 2020-05-05
Requested 2020-04-20
Requested by Martin Duke
Authors Sam Aldrin, Carlos Pignataro, Nagendra Nainar, Ramki Krishnan, Anoop Ghanwani
Draft last updated 2020-05-04
Completed reviews Opsdir Last Call review of -13 by Tim Chown (diff)
Secdir Last Call review of -13 by Tirumaleswar Reddy.K (diff)
Tsvart Telechat review of -13 by Frank Brockners (diff)
Intdir Telechat review of -13 by Carlos Bernardos (diff)
Comments
I would like this OAM architecture to be vetted with respect to existing OAM work in IPPM. There might be willing candidates outside TSVART that are best position to do this review.
Assignment Reviewer Frank Brockners
State Completed
Review review-ietf-sfc-oam-framework-13-tsvart-telechat-brockners-2020-05-04
Posted at https://mailarchive.ietf.org/arch/msg/tsv-art/SMMGKEGLmC8Ds0K7VmYWqFMoNCc
Reviewed rev. 13 (document currently at 15)
Review result Ready with Nits
Review completed: 2020-05-04

Review
review-ietf-sfc-oam-framework-13-tsvart-telechat-brockners-2020-05-04

This document has been reviewed as part of the transport area review team's
ongoing effort to review key IETF documents. These comments were written
primarily for the transport area directors, but are copied to the document's
authors and WG to allow them to address any issues raised and also to the IETF
discussion list for information.

When done at the time of IETF Last Call, the authors should consider this
review as part of the last-call comments they receive. Please always CC
tsv-art@ietf.org if you reply to or forward this review.


This document provides a reference framework for OAM for SFC. 

Comments:

Section 3.1.1 SF availability: The text makes explicit reference to multiple instances of a SF. Consequently, it should be defined how availability of a SF is computed/determined in case multiple instances are deployed. This further leads to the question, whether availability is always a "binary" state (available / not-available), or could a SF be e.g. 99% available?
Section 3.1.2 SF performance: What is the impact of a "multiple instance SF deployment" on SF performance measurement? The section only talks about loss and delay as performance criteria. It would be good to state that other performance criteria (e.g. specific to the SF, throughput, etc.) exist.
Section 3.2.1 SFC availability: The current definition is very focused on connectivity verification, i.e. it tries to answer the question: "Does my SFC transport packets?". IMHO we should also ask the question "Does my SFC process the packets correctly?" - because if packets are not processed per the SFC definition, we might not call the SFC available. While 3.2.2 states that "any SFC-aware network device should have the ability to make performance measurements" a similar statement isn't found in 3.2.1. IMHO the ability for availability checks is probably a prerequisite for performance measurement.
Section 3.2.2 SFC performance measurement: The section only mentions the need for performance measurement. It misses the definition of what SFC performance measurement is.
Section 3.3. Classifier component: The section mentions the need for the ability to perform performance measurement of the classifier component. 
What is performance measurement of the classifier? What does performance measurement of the classifier component comprise?
Section 3.4. / 3.5. Availability/PM of the underlay and overlay network: It would be good to add a sentence that states that the mechanisms for availability/PM which are offered by the technologies used by the overlay/underlay are used, rather than new methods specifically for SFC would be defined.
Section 4. SFC OAM Functions: It would be good, if examples in section 4 could also include more "recent" methods such as OWAMP/TWAMP (RFC4656, RFC 5357).
Section 4.4. Performance Measurement: Focus is entirely on the PM of the connectivity, rather than on the SF. How about covering PM for the SF as well?
Section 5.1 OAM Tool Gap Analysis:
 - Not sure what "NVo3 OAM" refers to. Could that be explained below the table and in section 1.2.1?
 - E-OAM needs to be detailed. Is seems that CFM (802.1ag) and not 802.3ah is refered to here.
 - "Trace" in the "Trace" column need to be extended on. Is this traceroute? Paris-Traceroute? IOAM-Loopback?
 - IPPM needs to be detailed, because IPPM is not a tool as such but an IETF WG. Does this refer to OWAMP/TWAMP/etc. as defined by IPPM? 
Section 6.4.3 IOAM: 
- The section states that IOAM "may be used to perform various SFC OAM functions as well". It would be good to expand on this statement: E.g. 
IOAM Trace-Option Type could be leveraged for SFC tracing. IOAM Direct-Export Option Type could be leveraged. 
- How would we deal with the IOAM Active Flag (draft-ietf-ippm-ioam-flags-01) when used with SFC OAM?
- The text states "In-Situ OAM could be used with O bit set": Why would IOAM be used with the overflow bit set for SFC OAM? For details on IOAM's O-bit, see section 4.4.1 in https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09. 
Section 6.4.4 SFC Traceroute: 
- This section refers to an expired draft (even calling out the fact that the draft has exipred), but also mentions that functionality is available and implemented in OpenDaylight. Consider removing the references to the expired draft and rather add references to OpenDaylight documents.
- IOAM Loopback (see draft-ietf-ippm-ioam-flags-01) could apply SFC Traceroute as well.


Detailed set of nits that I encountered while reading through the document ([x] references line number x) – hope that they are helpful in further improving the doc: 

[global] s/an SF/a SF/ -- and similarly SFC/SFF
[176] "OAM Controller" not defined
[202] Why just Virtual Machines and no containers? Suggest to make things generic and talk about virtual and physical entities.
      This comment applies throughout the document.
[216] Ethernet OAM: Add reference. Do you refer to physical layer Ethernet OAM (802.3ah) or CFM (802.1ag)?
[243] s/uses the overlay network/uses the overlay network layer/
[246] Could we add a few examples of "various overlay network technologies"? For the underlay network layer several examples are listed.
[248] What does "mostly transparent" mean?
[254] What does "tight coupling" between the link layer and the physical technology mean?
[255] Suggest to avoid terms like "popular" - popularity can change, standards stay
[256] Acronyms "POS" and "DWDM" are not defined
[274] Link start/end-points don't seem to always align with the underlay network in the diagram
[287] s/may comprise of/may consist of/
[288] s/but not shown/but is not shown/
[307] s/devices/device/
[308] What is a "controller"?
[314] s/includes/include/
[319] Add hSFC to list of acronyms in section 1.2.1
[320] Add IBN to list of acronyms in section 1.2.1
[325] s/includes/include/
[359] The function/term "controller" requires definition.
[383] s/?./?/
[398] s/get the got/got/
[461] s/devices/device/
[469] Does it have to be equal cost multipath at the service layer, or could unequal cost multipath also be an option for load-balancing?
[521] Not sure whether the overlay network establishes the service plane. Isn't it that the overlay network establishes connectivity for the SFC-related functions in the service plane?
[531] s/components/component/
[545] remove "underlay"
[595] s/devices/device/
[600] s/action/an action/
[601] Expand on "TTL or other means" (TTL also needs to be added to acronyms in 1.2.1). Is this specific to NSH? Or specific to IPv4?
[630] Mention that for "approximation of packet loss for a given SFC can be derived" to be applicable, SFC OAM packets would need to be forwarded the same as live user traffic.
[636] Is uppercase "MUST" applicable to an informational document? Especially given that RFC2119/RFC8174 is explicitly referenced by the draft.
[666] Add MPLS, TRILL to acronyms in 1.2.1
[678] s/exhaustive/exhaustive./
[705] What are "underlying layers"?
[720] Is uppercase "SHOULD" applicable to an informational document? Especially given that RFC2119/RFC8174 is explicitly referenced by the draft.
[722] Is uppercase "MAY" applicable to an informational document? Especially given that RFC2119/RFC8174 is explicitly referenced by the draft.
[754] s/packet/packets/
[755] s/to next node/to the next node/
[771] How does this requirement align with the earlier paragraph, e.g. in case a node sends an ICMP reply? It would probably make sense to scope the statement to e.g. NSH.
[806] s/function/functions/
[809] s/from relevant node/from the relevant node/
[810] s/generate ICMP/generate an ICMP/
[812] s/from last/from the last/
[830] s/perform continuity/perform the continuity/
[834] s/with relevant/with the relevant
[835] s/perform partial SFC availability./perform a partial SFC availability check./
[851] For "In-Situ OAM data fields" add a normative reference to draft-ietf-ippm-ioam-data
[905] Add "CLI" to section 1.2.1 acronyms
[920] Add a reference for NETCONF ->RFC6241