Early Review of draft-bashandy-rtgwg-segment-routing-ti-lfa-00
review-bashandy-rtgwg-segment-routing-ti-lfa-00-rtgdir-early-bryant-2017-05-31-00

Request Review of draft-bashandy-rtgwg-segment-routing-ti-lfa
Requested revision No specific revision (document currently at 05)
Type Early Review
Team Routing Area Directorate (rtgdir)
Deadline 2017-06-15
Requested 2017-05-25
Requested by Min Ye
Authors Ahmed Bashandy, Clarence Filsfils, Bruno Decraene, Stephane Litkowski, Pierre Francois, Daniel Voyer, Francois Clad, Pablo Camarillo
I-D last updated 2017-05-31
Completed reviews Rtgdir Early review of -00 by Stewart Bryant
Assignment Reviewer Stewart Bryant
State Completed
Request Early review on draft-bashandy-rtgwg-segment-routing-ti-lfa by Routing Area Directorate Assigned
Reviewed revision 00 (document currently at 05)
Result Has issues
Completed 2017-05-31
These review comments were incorrectly posted against the uloop draft;
apologies for any confusion.

I have been asked to perform an early review of this document on behalf of the
Routing Directorate.

Summary:

A document on this subject is something that the WG should publish, but I think
that there are a number of issues that the WG needs to discuss and reach consensus
on before deciding whether or not they should adopt this draft as a starting
point for that work.

Major Issues:

Before I get into the substance I am surprised that there are no IPR
disclosures. In an earlier and related work
(draft-francois-segment-routing-ti-lfa-00) there were three IPR disclosures.

The work has four basic components: the concept of resolving the problem of P
and Q being non-adjacent, the use of SR to solve the non-adjacency, the use of
the post-convergence path following failure, and the applicability of these
techniques to an SR network. The first and second points seem of utility in
non-SR networks, and so I am surprised that they are not called out as such, in
the first case perhaps with consideration to strategically placed RSVP tunnels,
or binding segments.

The issue of mapping the repair path to the post-convergence path is something
that has always concerned me in this concept. It is true that traffic that
always passes through the PLR will experience the properties the authors
describe, but not all traffic will pass through the PLR post convergence. The
post-failure path will be topology dependent, and may take a different path
from the point of ingress.

I am also concerned that the authors do not discuss the need for loop free
convergence, since although traffic going through the repair path will be
loop-free, traffic arriving at the PLR might not be. Consider for example a
topology fragment that looks like a clock with a router at each minute. Traffic
enters at 9 o'clock, leaves at 3 o'clock and goes via 12 o'clock, and then
12 o'clock fails. The routers 9..12 will re-converge at different times and this may give
rise to the micro-looping of traffic trying to get to the PLR. A summary of the
problem and a pointer to the companion draft may be sufficient.
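
To make the clock example concrete, here is a minimal Python sketch of it.
It is an illustration only and assumes several things that are not in the
draft: 12 routers (one per hour) instead of one per minute, unit link costs
apart from a heavier 6-7 link so that the 9 -> 3 path is the one over 12, and
hypothetical helper names (clock_ring, next_hops, forward). A router in the
"converged" set forwards on its post-failure table, the rest still use the
pre-failure table.

```python
import heapq

def clock_ring(n=12, heavy=(6, 7)):
    """Ring of routers 1..n; unit cost except one heavier link (an assumption
    made so that the 9 -> 3 shortest path runs over 12 rather than under 6)."""
    links = {}
    for a in range(1, n + 1):
        b = a % n + 1
        cost = 2 if {a, b} == set(heavy) else 1
        links[(a, b)] = links[(b, a)] = cost
    return links

def next_hops(links, dest, failed=None):
    """Dijkstra rooted at dest (costs are symmetric); returns
    {router: next hop toward dest}, never using the failed router."""
    dist, nh, heap = {dest: 0}, {}, [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for (a, b), cost in links.items():
            if a != u or b == failed:
                continue
            if d + cost < dist.get(b, float("inf")):
                dist[b] = d + cost
                nh[b] = u
                heapq.heappush(heap, (d + cost, b))
    return nh

def forward(src, dest, old, new, converged, failed):
    """Hop-by-hop forwarding while only some routers have re-converged."""
    path, node = [src], src
    while node != dest:
        table = new if node in converged else old
        node = table[node]
        if node == failed:
            return path + [node], "dropped"
        if node in path:
            return path + [node], "loop"
        path.append(node)
    return path, "delivered"

links = clock_ring()
old = next_hops(links, dest=3)             # before 12 o'clock fails
new = next_hops(links, dest=3, failed=12)  # post-convergence tables

# Only router 11 has converged: 10 still sends toward 11, 11 sends back to 10.
early = forward(9, 3, old, new, converged={11}, failed=12)
# Routers 9, 10 and 11 have all converged: traffic goes the other way round.
late = forward(9, 3, old, new, converged={9, 10, 11}, failed=12)
```

In this sketch the first call ends in a micro-loop between routers 10 and 11,
and the second is delivered via 8, 7, 6, 5 and 4 without touching the PLR.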

Finally, on the basic concept it would be good to state up front whether the
proposal is constrained solely to SR networks, or whether the authors believe
that the concept is of wider applicability. I see no reason why it would be
constrained to only work on SR networks.

There is no discussion of multiple failures, nor as far as I can see of
failures that are worse than anticipated. This is an important point that needs
to be established early. Some methods (MRT) intrinsically address multiple
failures; others (NV) intrinsically exclude them. Simple LFA needs a supervisor
to quickly abandon all hope when they occur.

In an SR network the paths used are not the shortest paths, they are a
collection of shortest paths, so there needs to be some discussion on the
interaction between the SR paths and repair paths to consider whether it is
unconditionally safe against forwarding loops. It would presumably be so if the
authors borrowed the concept of repair addresses rather than normal forwarding
addresses from not-via, but I don't think they have done this.

There should also be some discussion on the original path constraints that are
applicable to the repair. Presumably the ingress node constrained the traffic
to go through failed node F for a reason. If the repair is unconstrained that
reason could be violated, but this is not discussed in the text.

In the Security section you say:

   The behavior described in this document is internal functionality
   to a router that result in the ability to guarantee an upper bound
   on the time taken to restore traffic flow upon the failure of a
   directly connected link or node. As such no additional security
   risk is introduced by using the mechanisms proposed in this
   document.

SB> I am not sure that the above is correct. There may be a security reason
SB> why a packet was steered along a path which breaks when you use this
SB> technique.

In the conclusion you say:

   The
   mechanism is able to calculate the backup path irrespective of the
   topology as long as the topology is sufficiently redundant.

SB> That is certainly true in classic. I am not sure this is universally
SB> true under SR which includes the use of non-shortest path and
SB> binding segments.

Minor issues:

   For each destination in the network, TI-LFA prepares a data-plane
   switch-over to be activated upon detection of the failure of a
   link used to reach the destination.

SB> To make the scaling clearer to the reader, I think you need
SB> to make it clear that for each protected link, you determine
SB> the repair needed to reach every destination reachable over that
SB> link. You sort of say that, but it's a bit hidden.
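
The scaling point can be sketched as follows. This is a minimal Python
illustration, not the draft's algorithm: the topology, node names and the
helper names (first_hops, destinations_per_protected_link) are hypothetical,
and ECMP is ignored, so each destination is attributed to a single primary
link. For each link of S it lists the destinations that would need a
pre-computed repair if that link failed.

```python
import heapq
from collections import defaultdict

def first_hops(graph, src):
    """Dijkstra from src; returns {dest: neighbor of src used as first hop}.
    Only one shortest path per destination is kept (no ECMP)."""
    dist, hop, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                hop[v] = v if u == src else hop[u]
                heapq.heappush(heap, (nd, v))
    return hop

def destinations_per_protected_link(graph, src):
    """Group destinations by the local link that carries their primary path."""
    per_link = defaultdict(list)
    for dest, nbr in first_hops(graph, src).items():
        per_link[(src, nbr)].append(dest)
    return {link: sorted(dests) for link, dests in per_link.items()}

# Hypothetical topology: S has two links, S-F and S-A.
example = {
    "S": {"F": 1, "A": 1},
    "F": {"S": 1, "D1": 1, "D2": 1},
    "A": {"S": 1, "D3": 1},
    "D1": {"F": 1},
    "D2": {"F": 1},
    "D3": {"A": 1},
}
repairs_needed = destinations_per_protected_link(example, "S")
```

Here protecting S-F requires repairs for F, D1 and D2, while protecting S-A
requires repairs for A and D3, which is the per-link, per-destination cost
that the text should make explicit.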

   We provide the TI-LFA approach that achieves guaranteed coverage
   against link, node, and local SRLG failure, in any IGP network,
   relying on the flexibility of SR.

SB> Should that be any SINGLE link.... failure?

In the text (and the text that follows)

   To do so, S applies a "NEXT" operation on Adj(S-F) and then two
   consecutive "PUSH" operations: first it pushes a node segment for F,
   and then it pushes a protection list allowing to reach F while
   bypassing S-F.

You need to reference the SR operations.

Also you are considering Adj segments, and presumably they were there
for a reason, but you do not discuss that.
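
For readers unfamiliar with the operations, the quoted step can be modelled
on a segment list as below. This is a minimal sketch under assumptions: the
segment names, the contents of the protection list and the function name
protect_adjacency are hypothetical, and the list is a plain Python list whose
first element is the active segment. NEXT and PUSH are used in the sense of
the SR architecture document that the draft should reference.

```python
def protect_adjacency(stack, failed_adj, node_sid_f, protection_list):
    """Rewrite a segment list whose active segment is the failed adjacency.

    stack[0] is the active segment. Returns a new list; the input is not
    modified.
    """
    if not stack or stack[0] != failed_adj:
        raise ValueError("active segment is not the protected adjacency")
    rest = stack[1:]                       # NEXT: the Adj(S-F) segment is done
    rest = [node_sid_f] + rest             # PUSH: node segment for F
    rest = list(protection_list) + rest    # PUSH: protection list on top
    return rest

# Hypothetical packet: cross adjacency S-F, then go to node D.
original = ["Adj(S-F)", "Node(D)"]
repaired = protect_adjacency(
    original,
    failed_adj="Adj(S-F)",
    node_sid_f="Node(F)",
    protection_list=["Node(P)", "Adj(P-Q)"],  # assumed repair segments
)
```

The packet still reaches F before continuing to D, but no longer over the
S-F adjacency itself, which is why the reason for the original Adj segment
matters.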

In 5.3.1 and 5.3.2 you have a list of conditions, but do not make it clear
whether any or all must be true.

Nits

1. Introduction

   Segment Routing aims at supporting services with tight SLA
   guarantees [1]. This document provides a local repair mechanism
   relying on SR-capable of restoring end-to-end connectivity in the
   case of a sudden failure of a network component.

SB> Grammar needs a little work in the last sentence.

In Fig 1, I assume that the blobs are network fragments.

In the conclusion you say:
   This document proposes a mechanism that is able to pre-calculate a
   backup path for every primary path so as to be able to protect
   against the failure of a directly connected link or node.
SB> You need to add SRLG.