
Last Call Review of draft-ietf-pals-endpoint-fast-protection-04

Request Review of draft-ietf-pals-endpoint-fast-protection
Requested revision No specific revision (document currently at 05)
Type Last Call Review
Team Transport Area Review Team (tsvart)
Deadline 2016-12-06
Requested 2016-11-22
Authors Yimin Shen, Rahul Aggarwal, Wim Henderickx, Yuanlong Jiang
I-D last updated 2016-12-05
Completed reviews Genart Last Call review of -04 by Dale R. Worley
Secdir Last Call review of -04 by Chris M. Lonvick
Opsdir Last Call review of -04 by Susan Hares
Rtgdir Early review of -00 by Mach Chen
Rtgdir Early review of -00 by John Drake
Tsvart Last Call review of -04 by David L. Black
Assignment Reviewer David L. Black
State Completed
Request Last Call review on draft-ietf-pals-endpoint-fast-protection by Transport Area Review Team Assigned
Reviewed revision 04 (document currently at 05)
Result Ready w/issues
Completed 2016-12-05
I've reviewed this document as part of TSV-ART's ongoing effort to review key
IETF documents. These comments were written primarily for the transport area
directors, but are copied to the document's authors for their information and
to allow them to address any issues raised. When done at the time of IETF Last
Call, the authors should consider this review together with any other last-call
comments they receive. Please always CC if you reply to or
forward this review.

This draft specifies local pseudowire (PW) repair mechanisms that react quickly
to PW egress failures by rerouting traffic around the failure until
slower-to-react repair mechanisms with a larger scope are able to effect
longer-term repairs, e.g., via network topology changes.

-- TSV-ART review comments:

I found a couple of minor transport-related issues, both of which should be
resolvable with modest amounts of additional explanation:

* ECMP: The Equal-Cost Multi-Path (ECMP) discussion in Section 4.1
(Applicability) takes a conservative approach to avoiding packet reordering by
recommending (SHOULD) that the entire ECMP set be rerouted as part of local
repair. It's not clear what sort of ECMP is involved, as the draft uses that
acronym without a reference (or even expansion), so I'd suggest citing a
reference. If the ECMP used is flow-aware, so that all packets of any one flow
follow a single ECMP branch and differences across branches within an ECMP set
therefore cannot reorder packets within any of the flows involved, then it
ought to be safe from a reordering perspective to reroute an ECMP branch or
set of branches that is less than the full ECMP set. Such partial rerouting
could, however, cause potentially undesirable forwarding latency differences
within the ECMP set. This ought to be discussed, as situations in which
rerouting the entire ECMP bundle is overly conservative seem likely to arise
in practice.

* Traffic Engineering: Given the intended speed of local repair ("order of
tens of milliseconds" in the abstract), the bandwidth used by the repair
paths has to be provisioned in advance of any failure that causes repair path
usage; traffic engineering is a likely means of provisioning that bandwidth.
I see "TE domain," "TE metric" and "TE path," which I assume refer to Traffic
Engineering, but the TE acronym is not expanded, and I did not find text
requiring traffic engineering and/or advance (bandwidth) provisioning of repair
paths. I assume that advance bandwidth provisioning of repair paths is
intended as part of local repair, as omitting it invites immediate repair
path failure due to lack of forwarding resources, which is clearly
undesirable. A sentence or two ought to be added to point out this bandwidth
provisioning requirement, possibly in Section 4.1 (Applicability). Adding
that text would also reinforce the conclusion in the Security Considerations
section that local repair reroutes are not a security threat, as the new text
would add the rationale that local repair reroutes are anticipated and planned
for by the network operator's traffic engineering.
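To illustrate the ECMP point above, here is a minimal sketch of flow-aware ECMP with partial local repair. It is not taken from the draft: `FlowKey`, `select_branch`, `forward`, and the branch names are hypothetical, and SHA-256 over a 5-tuple merely stands in for whatever flow hash a real router would use (e.g., one computed over a PW flow label). The point it shows is that, if the hash is kept over the original branch set and only the failed branch is swapped for its repair path, flows hashed to healthy branches never move.

```python
import hashlib
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical flow identifier (here an IP 5-tuple)."""
    src: str
    dst: str
    proto: int
    sport: int
    dport: int


def select_branch(flow: FlowKey, branches: list[str]) -> str:
    """Flow-aware ECMP: every packet of a given flow maps to the same branch.

    A stable hash (SHA-256) is used instead of Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return branches[int.from_bytes(digest[:4], "big") % len(branches)]


def forward(flow: FlowKey, branches: list[str], repairs: dict[str, str]) -> str:
    """Return the path a flow uses under partial local repair.

    The hash is still taken over the original branch set; only a failed
    branch (a key in `repairs`) is replaced by its repair path, so flows
    on healthy branches keep their path and are not reordered by the repair.
    """
    primary = select_branch(flow, branches)
    return repairs.get(primary, primary)


if __name__ == "__main__":
    branches = ["b0", "b1", "b2", "b3"]
    flow = FlowKey("192.0.2.1", "198.51.100.7", 6, 1024, 443)
    print(forward(flow, branches, {}), forward(flow, branches, {"b1": "repair-b1"}))
```

The design choice worth noting is what the sketch avoids: rehashing over only the surviving branches (modulo N-1) would also move flows on healthy branches, and those moves are exactly what could reorder packets within a flow. The residual cost of partial rerouting is the latency difference between a repair path and the remaining branches.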

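To illustrate the traffic engineering point above, the following minimal sketch shows what "provisioned in advance" means for a repair path. It assumes a deliberately simple per-link bandwidth model; `Link`, `reserve_repair_path`, and the link names are hypothetical and are not defined by the draft. Bandwidth for the repair path is reserved on every link before any failure, all-or-nothing, so an under-provisioned repair path is detected at planning time rather than when local repair kicks in.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical per-link bandwidth model, for illustration only."""
    capacity_mbps: float
    reserved_mbps: float = 0.0

    def headroom(self) -> float:
        return self.capacity_mbps - self.reserved_mbps


def reserve_repair_path(links: dict[str, Link], path: list[str], demand_mbps: float) -> bool:
    """Reserve demand_mbps on every link of a repair path ahead of any failure.

    All-or-nothing: if any link lacks headroom, nothing is reserved and the
    caller learns up front that this path cannot carry the protected traffic.
    """
    if any(links[name].headroom() < demand_mbps for name in path):
        return False
    for name in path:
        links[name].reserved_mbps += demand_mbps
    return True


if __name__ == "__main__":
    links = {"PE1-P1": Link(1000.0, 600.0), "P1-PE3": Link(1000.0, 900.0)}
    print(reserve_repair_path(links, ["PE1-P1", "P1-PE3"], 50.0))   # fits
    print(reserve_repair_path(links, ["PE1-P1", "P1-PE3"], 100.0))  # P1-PE3 lacks headroom
```

A real network would perform this accounting through its traffic engineering machinery rather than in a standalone check; the sketch only shows why reroutes that were reserved for in advance do not take resources the operator had not planned to give them.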
--  Other comments:

* Having found two acronyms that were not expanded, I'd suggest a general check
for other such acronyms. On the other hand, this is an area of network
technology where many acronyms are in common use, and hence expanding every
acronym on first use may be excessive; one way to avoid this would be to cite
a reference at the start of Section 3, where commonly used PW terms and
acronyms are defined.