Skip to main content

Early Review of draft-ietf-idr-sla-exchange-10
review-ietf-idr-sla-exchange-10-tsvart-early-black-2017-02-21-00

Request Review of draft-ietf-idr-sla-exchange
Requested revision No specific revision (document currently at 13)
Type Early Review
Team Transport Area Review Team (tsvart)
Deadline 2017-02-21
Requested 2017-01-31
Requested by Susan Hares
Authors Shitanshu Shah , Keyur Patel , Luis Tomotaki , Mohamed Boucadair
I-D last updated 2017-02-21
Completed reviews Rtgdir Early review of -05 by Bruno Decraene (diff)
Opsdir Early review of -10 by Joe Clarke (diff)
Rtgdir Early review of -10 by Ron Bonica (diff)
Tsvart Early review of -10 by David L. Black (diff)
Comments
Please review for the transport SLA considerations.  David Black had discussed this draft so he would be a good reviewer.  
If more questinos, ask Sue Hares. 
Assignment Reviewer David L. Black
State Completed
Request Early review on draft-ietf-idr-sla-exchange by Transport Area Review Team Assigned
Reviewed revision 10 (document currently at 13)
Result Not ready
Completed 2017-02-21
review-ietf-idr-sla-exchange-10-tsvart-early-black-2017-02-21-00
I've reviewed this document as part of the transport area directorate's ongoing
effort to review key IETF documents. These comments were written primarily for
the transport area directors, but are copied to the document's authors for
their information and to allow them to address any issues raised. When done at
the time of IETF Last Call, the authors should consider this review together
with any other last-call comments they receive. Please always CC
tsv-art@ietf.org if you reply to or forward this review.

Document: draft-ietf-idr-sla-10
Reviewer: David Black
Review Date: February 21, 2017

Review result: Not Ready

This is an early TSV-ART review of a working group draft, requested by the IDR
Working Group.

This draft defines an extension to BGP to allow exchange of traffic handling
parameters (e.g., configured rates, burst sizes, drop thresholds).  While this
is a useful area of technology to standardize across network operators, this
draft has significant problems, and parts of it could use some serious
rethought.

This reviewer has discussed small portions of this draft with some of the
authors in the past, but this is his first comprehensive reading and review of
the draft.

Major Issues:

[1] The draft is misnamed.  This is not an SLA (Service Level Agreement) draft
- it's a TCA (Traffic Conditioning Agreement) draft - see the definition of TCA
in RFC 2475.  This draft should start from that definition and generalize the
applicability of TCA beyond Diffserv.  Large areas of SLA content are not
covered by this draft - for more details, see the Wikipedia article on SLA:
https://en.wikipedia.org/wiki/Service-level_agreement .

[2] Section 3.3.2.1's reuse of the TSpec construct from RFC 2115 to specify a
token bucket is a good idea, but it's not a good idea to respecify that
construct in terms of L2 (link-layer, e.g., Ethernet) octets, as the RFC 2115
TSpec is specified in terms of IP octets.  This change to specification in
terms of L2 octets results in needing the L2_OVERHEAD TLV to cope with the
possible differences in L2 (link) framing overhead at sender and receiver of
this information.  The TSpec should be respecified at the IP layer, with L2
framing overhead left to the Producer and Consumer to factor into their
calculations based on each's direct knowledge of L2 functionality and
configuration in the local AS.  That ought to enable elimination of the
L2_OVERHEAD TLV, thereby reducing complexity.

[3] A token bucket should have two rate-related marking parameters based on its
token fill rate, i.e., min-rate, not the four rate-related marking parameters
in this draft.  The max-rate parameters in sections 3.3.2.5-6 ought to be
specified against a second token bucket.  In addition, the handling precedence
algorithm in section 3.3.2.7 is an overly complex way to specify the
relationship of two token buckets.  All of this is even more important, because
max-rate, as defined in RFC 2115, is only applicable to bursting - that
max-rate for bursting often turns out to be an interface line rate, which is
not generally useful for the traffic provisioning purposes of this draft.

The following should be done instead:
        - Define TSpecs for two token buckets, a primary/committed token bucket
                and a secondary/peak token bucket that MUST be nested, i.e.,
                traffic that is in-profile for the secondary/peak token bucket
                is always in-profile for the primary/committed token bucket. 
                Some of the details of how to specify this are subtle, see RFC
                2698 for a worked example. Use of a secondary/peak token bucket
                requires use of the primary/ committed token bucket, but a
                primary/committed token bucket can be used without a
                secondary/peak token bucket.
        - For a single token bucket, define two handling TLVs, Committed (in
        profile) and
                Excess (out of profile).
        - For two token buckets, define three handling TLVs, Committed (in
        profile for both
                token buckets), Peak (out of profile for primary/committed
                token bucket, but in profile for secondary/peak token bucket)
                and Excess (out of profile for both token buckets).
NB: Could use Green/Yellow/Red terms instead of Committed/Peak/Excess terms.

[4] The drop threshold TLV in section 3.3.2.8 is not specified sufficiently to
be implemented interoperably.  For example, I don't understand what an
implementation is supposed to do when it receives 3 drop thresholds.

[5] The relative priority TLV in section 3.3.2.9 has the same insufficient
specification problem as the drop threshold TLV, compounded by a functional
incompleteness problem - if the recipient is using a weighted packet
transmission scheduler (e.g., WRR), priorities cannot be used to configure that
scheduler.  Hence, some specification of weights and scheduling algorithms that
use weights needs to be added.

[6] I have no idea what the sub-traffic classes TLV in section 3.3.2.10 is
supposed to do, as that TLV is specified based on "Traffic Class TLVs" which is
an undefined term in this draft (e.g., that term is not used outside of section
3.3.2.10.

[7] This draft's QoS contents need to be functionally aligned with
work-in-progress on YANG QoS models, in order to provide some assurance that
that this draft is implementable for actual network switch/router data paths. 
The current acknowledgement of the existence of YANG, NETCONF and RESTConf at
the end of Section 1 does not suffice.

[8] There are significant complexity and correctness problems caused by the
option to not specify the Source AS - e.g., Section 3.2 defines an SLA ID as an
"identifier which is unique in the scope of Source AS" which is meaningless if
there is no Source AS.  It would be simpler to always specify Source AS, even
in the point-to-point case.

[9] In section 3.2, the "intended for the peer receiver of the BGP UPDATE
message" text in the specification of bit 0 of the SLA Subtype flags is
unclear.  I suspect that this is intended to differentiate the two usages
described in sections 4.1.1 (Point-to-Point) and 4.1.2 (Multiple Hops), in
which case the parenthesized terms (or similar terminology) should be used with
cross-references to those two sections.

[10] There needs to be a coherent discussion in one place about how SLA
advertisement, update and withdrawal work.  A single ADVERTISE method may
suffice on the wire, but the details on how initial advertisement, subsequent
advertisement (update) and withdrawal work need to be specified in one place. 
The third paragraph of Section 4 is a start on this material, but it's too
terse; it should be expanded into its own subsection, and moved earlier to come
before the ADVERTISE method in Section 3.2.  This text from Section 3.2 should
be moved into that new subsection and likewise expanded:

      If an advertised SLA ID is different from earlier advertised one,
      for the same prefix and from the same Source AS, indicates Source
      AS is advertising new SLA Content to replace the previous one
      advertised with the same SLA ID.

In addition, I wonder whether functionality should be added to allow withdrawal
of an advertisement by specifying its SLA ID, although that was not part of the
original design.

[11] Notions of context for interpretation of all the IPFIX parameters in 3.3.1
need to be added, e.g.:
        - The first three parameters (DSCP, MPLS EXP field in top label, 802.1q
        priority) can
                and do vary on a link-by-link or LSP-by-LSP basis along a
                traffic's network path.
        - The IP address parameters are rather likely to be VPN-specific when
        there's more
                than one BGP/MPLS VPN that spans or transits the ASs involved.
        - The transport port parameters need specification of which transport
        header and where it
                is located (e.g., for TCP traffic carried by in VXLAN, is this
                the inner TCP header or the outer UDP header in VXLAN).
There are probably simple approaches to specifying context in all cases, but
that context does need to be specified ... in all cases.

[12] The security considerations (section 10) are severely incomplete and
insufficient:

        - Discussion of possible abuse of this BGP option for denial-of-service
        and theft-of-service,
                needs to be added, including possible countermeasures and
                mitigations.

        - This sentence at the end of the second paragraph in Section 10 is
        content-free:

                   It is NOT RECOMMENDED to enable this attribute at the
                   scale of the Internet unless if means to prevent leaking
                   sensitive information are enforced.

                What exactly is an implementer or admin supposed to do?  How
                does one figure out whether an implementation or deployment is 
                "at the scale of the Internet" ??

        - The next to last paragraph in section 10 is almost content-free, as
        it leaves
                decisions on implementation and deployment of key security
                functionality as "an exercise for the reader" - that's not
                acceptable.

        - Last, but not least, the final paragraph in Section 10 is a joke that
        will
                be lost on a security directorate reviewer - to understand why,
                see Section 2 of RFC 6919, and take note of the publication
                date of RFC 6919.

------------------------

Having noted a dozen major issues, I'll end the review here for now, as I
believe a serious revision of the draft is called for, which would be a better
starting point to review for minor issues and editorial items.

--------------------------------------------------------
David L. Black, Distinguished Engineer
Dell EMC, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953     Cell: +1 (978) 394-7754
David.Black@dell.com  <=== NEW ===
--------------------------------------------------------