
An Architecture for IP/LDP Fast Reroute Using Maximally Redundant Trees (MRT-FRR)
draft-ietf-rtgwg-mrt-frr-architecture-10

Yes

(Alvaro Retana)

No Objection

(Alissa Cooper)
(Barry Leiba)
(Ben Campbell)
(Deborah Brungard)
(Spencer Dawkins)

Recuse

(Alia Atlas)

Note: This ballot was opened for revision 09 and is now closed.

Alvaro Retana Former IESG member
Yes
Yes (for -09)

                            
Alissa Cooper Former IESG member
No Objection
No Objection (for -09)

                            
Barry Leiba Former IESG member
No Objection
No Objection (for -09)

                            
Ben Campbell Former IESG member
No Objection
No Objection (for -09)

                            
Benoît Claise Former IESG member
No Objection
No Objection (2016-02-04 for -09)
We're lucky to have two OPS DIR reviews for this document.

From Fred Baker:

In my view, although I have concerns as I am about to state, I consider the draft to be ready for IESG review and potential publication as an RFC at Proposed Standard. I have no specific issues I would like to see addressed, nor do I believe the technology or draft to be fundamentally flawed.


Speaking in general terms, this draft describes a solution for the problem posed in RFC 5714, which is to say a solution for fast reroute in a network whose routing is implemented using IS-IS and LDP. It is not the only possible solution. In terms of graph theory, we might define a "connected graph" as a set of "nodes" and a set of "links" that interconnect them, such that every node is connected via some sequence of links and nodes to each of the other nodes in the connected graph. The Maximally Redundant Tree model seeks to divide the connected graph into two or more connected sub-graphs, each of which connects the same set of nodes, but using sets of interconnecting links whose intersection set is null, or is at least minimized. In the event that a link in one connected sub-graph fails, the network can continue to use another connected sub-graph to guide routing during the outage.

There are obvious degenerate cases, in which the sets of links in sub-graphs are forced to overlap to some degree, or some nodes are not found in all sub-graphs. Part of the architecture is designed to identify those cases (which might occur, for example, in the presence of multiple simultaneous failures, or when the network is inherently deficient for reasons unrelated to and perhaps in violation of the mathematics) and handle them as best it can.
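
As a rough illustration of the graph-theory description above (not the MRT algorithm itself, which is specified in a separate document), the Python sketch below checks whether two sets of links each connect the same nodes and whether they share any links. The topologies, the link sets, and the helper names `is_connected` and `shared_links` are all assumptions made up for this example.

```python
from collections import defaultdict


def is_connected(nodes, links):
    """Return True if the undirected `links` connect every node in `nodes`."""
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        cur = stack.pop()
        for nxt in adj[cur]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return nodes <= seen


def shared_links(links_a, links_b):
    """Return the links (ignoring direction) present in both sub-graphs."""
    set_a = {frozenset(link) for link in links_a}
    set_b = {frozenset(link) for link in links_b}
    return set_a & set_b


NODES = ["A", "B", "C", "D"]

# Hypothetical full mesh of four nodes (six links), split into two
# sub-graphs that span all nodes and have no link in common.
MESH_1 = [("A", "B"), ("B", "C"), ("C", "D")]
MESH_2 = [("A", "C"), ("A", "D"), ("B", "D")]

# Hypothetical four-node ring (four links): the degenerate case, where two
# spanning sub-graphs cannot avoid sharing links.
RING_1 = [("A", "B"), ("B", "C"), ("C", "D")]
RING_2 = [("B", "C"), ("C", "D"), ("D", "A")]

# Simulate the failure of link B-C in the first mesh sub-graph.
MESH_1_AFTER_FAILURE = [link for link in MESH_1 if link != ("B", "C")]

if __name__ == "__main__":
    print(is_connected(NODES, MESH_1), is_connected(NODES, MESH_2))
    print(shared_links(MESH_1, MESH_2))
    print(is_connected(NODES, MESH_1_AFTER_FAILURE))
    print(shared_links(RING_1, RING_2))
```

In the mesh, losing link B-C disconnects the first sub-graph while the second still reaches every node. In the ring, any two sub-graphs that each span all four nodes need at least three links apiece out of only four, so they must overlap.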

As one might imagine, this is not trivial. My first comment on reading the architecture (and on reading the algorithm, which is a separate document) is that the algorithm is complex, and therefore (like anything that is complex) prone to errors and failures of various kinds, and potentially has failure modes that have not yet been detected. This is not to be considered as a strike against it, but a point of caution; the operator using the approach wants to ensure that s/he has the tools necessary to monitor network health, and to quickly discover and correct errors if and when they occur. The algorithm draft contains several proofs of correctness for various parts or in various cases, and refers to papers containing such proofs, with the intent of minimizing the inherent risk. That said, to my knowledge there is not a global proof of correctness, as there is for example in the Shortest Path First algorithm or other algorithms used in the network. The risk is therefore not zero.

From the perspective of the IETF, that is precisely the reason a protocol like this should be used operationally at the Proposed Standard level, updated as needed, and ultimately re-released as an Internet Standard when the algorithm and implementations have been operationally proven.


With that introduction, the first question in my mind is whether the description is such that two implementors are likely to be able to produce interoperable implementations, or whether ambiguities or lack of clarity would prevent that. This draft identifies two proprietary prototype implementations, by Huawei and Juniper, which if they are interoperable would address the question to a considerable degree. The draft does not, however, describe interoperability testing between them, which at least suggests that such testing has yet to be done. On this score, given the complexity of the design, I personally would be greatly comforted by a test report along the lines of RFC 1246. Since such tests usually find text that needs tweaking, I might suggest that publication as an RFC be delayed until such testing can be performed and the lessons learned, whatever they are, incorporated in the documents. Failing that, experience leads me to believe that there will be subsequent documents that update or obsolete these.

The corollary question in my mind is whether an operator reading the architecture will be able to figure out how to effectively use it. On this score, I give the draft a thumbs-up. It is well written, the various issues are raised and dealt with, and the ramifications are in my view clear.



Now the review from Nevil Brownlee:

This is a long draft, presenting the MRT-FRR architecture, and exploring
in some detail the design alternatives that were possible during that
process.

There are many acronyms used throughout the draft; that will work well
for readers familiar with Routing in general, and MPLS in particular.
Others will find it useful to keep a browser window at hand!  For me,
PLR (Point of Local Repair) was new.

In section 11.1, the equations that test whether a path is loop-free
for nodes S and F use D_opt() as an abbreviation for Distance_opt()
[RFC 5286] - I understand the authors wish to get these equations onto
single lines, but the phrase "where D_opt() means Distance_opt()" would
be helpful.
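
For readers who want to see the kind of test such equations express,
here is a minimal Python sketch of the basic loop-free condition from
RFC 5286, written with the D_opt() shorthand. It is not a transcription
of the equations in section 11.1; the topologies, link costs, and
function names are hypothetical.

```python
import heapq


def d_opt(graph, src):
    """Shortest-path distances from `src`; stands in for Distance_opt(src, x)."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def is_loop_free(graph, s, n, d):
    """RFC 5286 basic condition: D_opt(N, D) < D_opt(N, S) + D_opt(S, D)."""
    from_n = d_opt(graph, n)
    from_s = d_opt(graph, s)
    return from_n[d] < from_n[s] + from_s[d]


# Hypothetical topology: S reaches D via F; N is a candidate alternate.
GOOD = {
    "S": {"F": 1, "N": 1},
    "F": {"S": 1, "D": 1},
    "N": {"S": 1, "D": 1},
    "D": {"F": 1, "N": 1},
}

# Same topology, but N's own link to D is so costly that N's shortest
# path to D goes back through S.
BAD = {
    "S": {"F": 1, "N": 1},
    "F": {"S": 1, "D": 1},
    "N": {"S": 1, "D": 10},
    "D": {"F": 1, "N": 10},
}

if __name__ == "__main__":
    print(is_loop_free(GOOD, "S", "N", "D"))
    print(is_loop_free(BAD, "S", "N", "D"))
```

In the first topology N can reach D without returning to S, so the
condition holds; in the second, N's best path to D passes through S,
so the strict inequality fails.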

Throughout the draft the phrase "protocol extensions to .. will be
defined elsewhere" appears; similarly, the IANA Considerations section
defines an MPLS Multi-Topology Identifiers Registry, but says that
codepoints in it will be defined elsewhere.  Clearly this draft is
the first in what will become a cluster of RFCs.

On the Operations side of things, section 1.2 notes that "MRT-FRR
supports partial deployment."  That will allow Operators to deploy
it in stages (one MRT Island at a time?).

Further, several sections consider the possibility of "link-protecting
alternates causing route looping"; it seems that MRT-FRR should remain
loop-free.

Section 13, Implementation Status [to be removed by the RFC Editor],
demonstrates that at least two implementations exist; clearly that has
helped the authors to work through the design decisions I commented on
above.

Section 14, Operational Considerations, works through the most important
of the decisions an Operator will need to make if they plan to implement
MRT-FRR - this seems very useful.

Overall, the draft is well-written and easy to read (apart from its
high acronym density). I believe it is ready for publication as an RFC.

Brian Haberman Former IESG member
No Objection
No Objection (2016-02-03 for -09)
The IANA Considerations section creates a new registry for the MRT Profiles. It states that "Values 221-255 are for vendor private use." Are there limitations/guidance on how vendors use this range? Should Section 8, 14, or 17 say something about dealing with these ranges in operational networks?
Deborah Brungard Former IESG member
No Objection
No Objection (for -09)

                            
Joel Jaeggli Former IESG member
No Objection
No Objection (2016-02-04 for -09)
Nevil Brownlee performed the opsdir review.
Spencer Dawkins Former IESG member
No Objection
No Objection (for -09)

                            
Stephen Farrell Former IESG member
No Objection
No Objection (2016-02-04 for -09)
- abstract: "IP/LDP" is a bit ambiguous - it could be read as
"IP over LDP," "IP and LDP" or "IP or LDP" not all of which
make sense I guess:-) Be better if the abstract said "IP and
LDP" I think.

- section 3: the definitions are very dense - would re-ordering
them help maybe? Not sure myself, but maybe think about
it.

- Section 8: with so many parameters, why choose an 8-bit
profile ID? That seems to be a bit short-sighted maybe? 

- 12.2: The sequence presented here has the look of something
that might be a potential DoS vector, but maybe the
computation isn't that significant, not sure.  Did you
consider a potential attack where the bad actor takes down
then re-instates one or more links so as to increase the
computation to the max? (Perhaps you did and the max is still
not a big deal.)

- As noted by the secdir review [1] this is highly acronym
laden. I'd encourage an editing pass to try to reduce that where
possible. I'll also bet a beer that not every acronym used is
either expanded or else present in the list of "well known"
acronyms (which is of course pretty outdated).

   [1] https://www.ietf.org/mail-archive/web/secdir/current/msg06344.html
Alia Atlas Former IESG member
Recuse
Recuse (for -09)