Network Working Group                                        P. Francois
Internet Draft                                            O. Bonaventure
Expiration Date: January 2006                                   M. Shand
                                                              S. Previdi
                                                               S. Bryant

                                                              July 2005

            Loop-free convergence using ordered FIB updates
                   draft-francois-ordered-fib-00.txt


Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet Drafts are working documents of the Internet Engineering
   Task Force (IETF), its Areas, and its Working Groups. Note that other
   groups may also distribute working documents as Internet Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than a "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract

   This draft proposes a mechanism to prevent transient loops during
   non-urgent topology changes by ordering the FIB updates on routers,
   in networks which use link state routing protocols. This mechanism
   can be used in case of graceful link shutdowns, metric changes and
   when a link is enabled. The solution can also be used in conjunction
   with an IPFRR mechanism wich turns a sudden link failure in a non-
   urgent change, by ensuring a local protection of the link. After a
   non-urgent topology change, each routers computes a rank that defines
   the time at which it can safely update its FIB. A completion message
   mechanism is also proposed in order to speed up the convergence
   process.



Francois et al.           Expires January 2006                  [Page 1]


Internet Draft             Ordered FIB Updates                July 2005


1. Introduction

   With link-state protocols [ISIS,OSPF], when the network topology
   changes, some routers need to modify their Forwarding Information
   Base (FIB) to take into account the new topology. Each topology
   change causes a convergence phase. During this phase, routers may
   transiently have inconsistent FIBs, which leads to packet loops and
   losses, even if the reachability of the destinations is not
   compromised after the topology change. When the link state change is
   a metric update and when a new link is brought up in the network,
   there is no direct loss of connectivity, but transient loops during
   the convergence phase may still cause packet loss. The same
   observation can be made in the case of a link failure. If the link is
   protected [IPFRR,MPLSFRR], a loop free convergence mechanism may also
   be used in place of the usual fast, best effort approach.

   The remainder of this document is organised as follows. We first show
   in section 2 that there is an ordering for the FIB updates when a
   single link fails, is activated or when an IGP metric changes. We
   also present an ordering for the cases of node failure and node
   activation. In section 3, we present an implementation of the
   ordering schemes. In section 4, a mechanism aimed at speeding up the
   loop-free convergence process is presented. In section 5, we present
   additional shortcuts to the ordering scheme.

2. Ordering of the FIB updates

   In this section, we briefly present an ordering of the FIB updates
   that allows to avoid transient forwarding loops in the network in the
   case of topology changes. A more precise analysis of the rerouting
   dynamics and correctness proofs of the mechanism can be found in
   [PFOB05].

2.1 Link down / metric increase

   We first consider the failure of a link or the increase of the metric
   associated with one directed link. In this case, to ensure the
   transient consistency of the forwarding tables of the routers, a
   router R should update its FIB AFTER all the routers that used R
   BEFORE the event, to reach the affected link.

   Let us show that this rule ensures the transient consistency of the
   routers.  We assume that the link X->Y goes down and is protected, or
   its metric is increased.

   Firstly, once a packet reaches a router R with an outdated FIB for
   its destination, it will only traverse routers with an outdated FIB,
   according to the rule.  It thus reaches X, whose protection will



Francois et al.           Expires January 2006                  [Page 2]


Internet Draft             Ordered FIB Updates                July 2005


   ensure that the packet reaches its destination. In the case of a
   metric increase, the packet will simply be forwarded along X->Y and
   reach its destination.

   Secondly, a packet cannot loop while being exclusively forwarded by
   routers with an updated FIB, so that any packet is proved to never
   enter a loop.

   In other words, when the ordering is respected, a packet entering the
   network that follows a path composed of updated or unaffected routers
   cannot loop. If it reaches an outdated router, it cannot reach an
   updated router anymore, so that it will not loop.

2.2 Link up / metric decrease

   In the case of link up events or metric decreases, to ensure the
   transient consistency of the forwarding tables of the routers, a
   router R should update its FIB BEFORE the routers that WILL use R to
   reach the affected link.

   Let us show that this rule ensures the transient consistency of the
   routers. We assume that the link X->Y goes up or that its metric is
   decreased.

   Firstly, when a packet reaches a router R that has updated its FIB,
   all the routers on the path from R to X have updated their FIB, so
   that the packet will reach X and be forwarded along X->Y, to finally
   reach its destination.

   Secondly, a packet could not loop between routers that have not yet
   updated their FIB, so that any packet is proved to never enter a
   loop.

2.3 Router down

   An ordering of the FIB updates can also be performed to avoid
   transient loops in the case of a router down event. There are
   multiple implementation choices to advertise the failure of a node.
   Firstly, a router being shut down can send a new link-state packet
   that describes the failure of all its links, by setting the age of
   each link to MAX_AGE. Secondly, the router can purge its own link-
   state packet. Finally, and only in IS-IS, the router can send its
   link-state packet by setting the overload-bit.  All those solutions
   will let the other routers of the network know about the failure by
   receiving one single message.

   The other routers of the network can avoid loops while adapting to
   the node failure by respecting a rule that is very similar to the one



Francois et al.           Expires January 2006                  [Page 3]


Internet Draft             Ordered FIB Updates                July 2005


   mentioned in the case of a link down event.

   A router R should update its FIB AFTER all the routers that use R to
   reach the router that is shutdown.

   Let us show that this ordering ensures the transient consistency of
   the forwarding.  We assume that the router X is shut down.

   Firstly, if a packet to a destination D reaches a router that has not
   updated its FIB and that used X to reach D, it will only traverse
   routers that have not updated their FIB, to finally reach X. X will
   then forward the packet to its nexthop for D, and the packet will
   reach its destination. This nexthop could not have X in its path(s)
   to D before the event, as the contrary would mean that there was a
   loop before the event.  Secondly, a packet that does not reach a
   router with an outdated FIB cannot loop.


2.4 Router up

   In IS-IS, when a router X is brought up, it should firstly send its
   link-state packet with an overload bit set.  After a while, each of
   its neighbours will have sent a link-state packet describing their
   links to X as being up. At that moment, X can send a link-state
   packet with an overload bit unset, so that the processing of this
   link-state packet by the other routers will make them consider all
   the links connected to X as being up, in one single FIB update.

   In OSPF, the overload-bit does not exists. An OSPF router that is
   brought up should thus wait for a while before sending its own LSA,
   or firstly send its LSA with a MAX-METRIC assigned to each of its
   links, and only send a LSA with the real metrics after a while.

   By proceeding like this, we have the opportunity to order those FIB
   updates and avoid forwarding inconsistencies during the convergence.

   The ordering of the FIB updates follows a rule that is very similar
   to the rule mentioned in the case of a link up event. A router R
   should update its FIB BEFORE all the routers that WILL use R to reach
   the router that is brought up.

   Let us show that this ordering ensures the transient consistency of
   the forwarding.  We assume that the router X comes in the network.

   Firstly, when a packet with destination D reaches a router R that has
   already updated its FIB, and that uses X to reach D, the packet will
   reach X as it will traverse routers that have already updated their
   FIB. X will then forward the packet to its nexthop for destination D.



Francois et al.           Expires January 2006                  [Page 4]


Internet Draft             Ordered FIB Updates                July 2005


   The path from this nexthop to D cannot contain X, so that the packet
   is proved to be consistently forwarded to D.  Secondly, a packet that
   exclusively traverses routers with an outdated FIB cannot loop.



3. Implementation of the ordering

   In this section, we show how the proposed ordering can be applied by
   routers that have to adapt to a topology change.

3.1 Link down / metric increase

   To respect the proposed ordering, routers can compute a rank that
   will be used to determine the time at which they can perform their
   FIB update. In the case of the failure of link X->Y or an increase of
   the metric of link X->Y, router R will compute the reverse Shortest
   Path Tree (rSPT) of Y. This rSPT gives the shortest paths to reach Y.
   The branch of the reverse SPT that is below R corresponds to the set
   of shortest paths to R that are used by the routers that reach X->Y
   via R.

   The rank of router R is defined as the depth (in number of hops) of
   this branch.  To avoid confusion in the case of ECM paths, we can say
   that the the maximum depth must be taken into account.

   Router R will update its FIB at time : T0 + rank * MAX_FIB where T0
   is the arrival time of the link-state packet containing the topology
   change and MAX_FIB a network-wide constant that reflects the maximum
   time required to update a FIB. The value of MAX_FIB is implementation
   specific and is out of the scope of this document.  This value must
   be agreed by all the routers of the network. This agreement can be
   done by using a capability TLV as defined in [SLFTV].

   All the routers that use R to reach X->Y will compute a lower rank
   than R, so that the order will be respected.  It should be noted that
   only the routers that used X->Y before the event have to compute
   their rank.


3.2 Link up / metric decrease

   In the case of a link up or a link metric decrease affecting link
   X->Y, a router R must have a rank that is higher than the rank of the
   routers that it will use to reach X->Y, according to the rule
   mentioned in 2.2. The rank of R is thus equal to the number of hops
   between R and X in its Shortest Path Tree. When R has multiple equal
   cost paths to X, the rank is the length of the longest path in hops



Francois et al.           Expires January 2006                  [Page 5]


Internet Draft             Ordered FIB Updates                July 2005


   to X.

   Router R will update its FIB at time : T0 + rank * MAX_FIB where T0
   is the arrival time of the link-state packet containing the topology
   change and MAX_FIB a network-wide constant that reflects the maximum
   time required to update a FIB.

   It should be noted that only the routers that use X->Y after the
   event have to compute a rank, i.e. only the routers that have X->Y in
   their SPT after the link-state change.

   3.3 Router down

   When a router X is shutdown, the rank that has to be applied by a
   router R is equal to the depth of the branch under R in rSPT(X). In
   the case of ECM paths to R, the maximum of the length must be taken
   into account.  Note that X will only be shutdown once all the routers
   have updated their FIB.  The rank that is computed by X thus gives
   the time at which X can be effectively shut down.


   3.4 Router up

   When a router X comes in the network, the rank that has to be applied
   by a router R is equal to the maximum length of its ECM paths to X,
   in its updated Shortest Path Tree. Note that X will be the first to
   update its FIB, as it will have a rank equal to 0. This translates
   the fact that X must have built its FIB before any router can use it
   to forward packets.

4. Completion messages

   As noted in the previous sections, only the routers that are
   currently using a link affected by a topology change need to update
   their FIB. The FIB of all the other routers does not need to be
   updated at all. Furthermore, the routers that need to update their
   FIB do not necessarily need MAX_FIB seconds to perform their FIB
   update [CCRJULY].

   In this section, we show how completion messages can be used to speed
   up the convergence after a non-urgent topology change by shortcuting
   the computed rank while respecting the transient consistency of the
   routers.

   To do this, we adapt the mechanism proposed in [INFOCOM].  Routers
   maintain a waiting list, that lists the neighbours from which they
   have to wait for a completion message before being allowed to update
   their own FIB. Upon reception of a completion message from a



Francois et al.           Expires January 2006                  [Page 6]


Internet Draft             Ordered FIB Updates                July 2005


   neighbour, a router removes this neighbour from its waiting list.
   Once its waiting list becomes empty, the router is allowed to update
   its FIB.  The next sections explain how the waiting list must be
   built for each type of event, and when completion messages are sent
   by the routers.


4.1 Link down / metric increase

   Let us assume that the link X->Y goes down, or its metric is
   increased.  A router R that currently has X->Y in its SPT computes
   rSPT(Y) to determine its rank.  By doing this, it computes the set of
   routers (and thus implicitly the set of its neighbours) that use it
   to reach link X->Y. The waiting list of R is equal to the set of
   neighbours that use it to reach X->Y. These are the neighbours below
   R in rSPT(Y).

   The current SPT of R gives the set of neighbours that it uses to
   reach X->Y.  These neighbours have to wait for R, and R is thus
   present in their waiting list. This set of neighbours is the set of
   neighbours to which R will send its completion messages, once it has
   updated its FIB.


4.2 Link up / metric decrease

   Let us assume that the link X->Y goes up, or its metric is updated.
   A router R will compute its new SPT (SPT_new(R)). If X->Y is in
   present in SPT_new(R), R will use link X->Y and thus needs to compute
   its rank.  For this, R computes the set of routers (and implicitly
   its set of neighbours) that it uses to reach X->Y. The nexthops for X
   in SPT_new(R).  R must reroute after those routers to respect the
   ordering. Its waiting list is thus equal to those nexthops.

   When R has updated its FIB, it can send a completion message to its
   neighbours, so that a neighbour that has R in its waiting list will
   be allowed to remove R from it.

4.3 Router down

   Let us assume that a router X goes down. A router R will compute
   rSPT(X) to determine its rank.  When doing this, it also computes the
   set of its neighbours that uses R to reach X. The waiting list of R
   is equal to this set.

   The current SPT of R gives the set of neighbours that will have R in
   their waiting list.  Once R has updated its FIB, it will send its
   completion messages to those neighbours.



Francois et al.           Expires January 2006                  [Page 7]


Internet Draft             Ordered FIB Updates                July 2005


4.4 Router up

   Let us assume that a router X comes in the network. A router R will
   compute its new SPT.  It thus also computes its nexthops for X. This
   set of nexthops (more than one in the case of equal cost paths to X)
   is the waiting list that R will use.

   Once R has updated its FIB, it sends a completion message to its
   neighbours, so that a neighbour that has R in its waiting list will
   be allowed to remove R from it.


5. Ranking Shortcuts.

   In the case of a link (X->Y) down or metric increase, a router R can
   sometimes shortcut its rank because it can decide, when computing the
   rSPT, which neighbours are not affected by the event. These are the
   neighbours that did not use the link X->Y, so that their paths to all
   the destinations do not change.

   Let us assume that link X->Y goes down. While R computes rSPT(Y), it
   can mark the routers that use the link X->Y. These are the routers
   that are below X->Y in the rSPT. If a neighbour N of R is not marked,
   all the FIB updates that will let packets be forwarded to N by R can
   be performed directly.  Note that if N uses Y->X, R can still reroute
   to N because R and N are not affected for the same set of
   destinations, so that the destinations affected in R, using X->Y, are
   never affected in a router N using Y->X.

   If a FIB update contains an ECMP entry for a destination d, and not
   all the nexthops for d were not marked, then the normal ranking can
   be applied for this destination, or the FIB update will have to be
   modified in order to only reroute to the unmarked nexthops.  In this
   case, a new FIB update will have to be performed for d when the rank
   timer has elapsed or all the necessary completion messages have
   arrived.  The choice between these two solutions is a trade-off
   between an increased number of FIB updates and a faster convergence.


References

   [OSPF]    J. Moy, OSPF Version 2. RFC 2328. Apr. 1998.

   [IS-IS]   "Intermediate system to Intermediate system routeing
             information exchange protocol for use in conjunction
             with the Protocol for providing the Connectionless-mode
             Network Service (ISO 8473)," ISO/IEC 10589:2002,
             Second Edition.



Francois et al.           Expires January 2006                  [Page 8]


Internet Draft             Ordered FIB Updates                July 2005


   [IPFRR]   M. Shand, S. Bryant, IP Fast Reroute Framework,
             draft-ietf-rtgwg-ipfrr-framework-03.txt, June 2005

   [MPLSFRR] Pan, P. et al, "Fast Reroute Extensions to RSVP-TE for LSP
             Tunnels", RFC 4090

   [PFOB05]  P. Francois, O. Bonaventure. Avoiding transient loops
             during IGP convergence. IEEE INFOCOM 2005, March 2005,
             Miami, Fl., USA

   [CCRJULY] P. Francois, C. Filsfils, J. Evans, O. Bonaventure.
             Achieving sub-second IGP convergence in large IP networks,
             ACM SIGCOMM Computer Communication Review, July 2005

   [SLFTV]   A. Atlas, S. Bryant, M. Shand, Synchronization of Loop Free
             Timer Values, draft-atlas-bryant-shand-lf-timers-00.txt



Appendix A : Pseudo-codes

   Figure 1 shows the pseudocode for the router reaction in case of non-
   urgent link failure or metric increase.

   Pseudo-code for link (X->Y) down/metric increase in router R :

   If(X->Y in SPT(R)){
     //R is affected by the event
     //Compute rspt(Y)
     rspt = rSPT(Y);

     //compute the rank from rspt (can also be done directly
     //during rspt computation)

     worst_case_update_delay = depth(R,rspt) * MAX_FIB

     //Build the waiting list : these are the neighbours below R in rspt
     //
     WaitingList = getNeighboursUpstream(R,rspt);
     //Build the list of neighbours to which R will send a
     //completion message :
     //these are the nexthops for X in the current SPT of R
     completionMessageSendingList = nexthops(X,SPT(R));

     //Wait for the waiting list to be empty or the rank time
     //to elapse.
     while(not WaitingList.empty()
           || worst_case_update_delay.elapsed()){



Francois et al.           Expires January 2006                  [Page 9]


Internet Draft             Ordered FIB Updates                July 2005


       if(completionMessageReceived()){
         //New completion message received
         WaitingList.remove(completionMessage.sender);
       }
     }
     //Rank time has elapsed or waiting list is empty
     UpdateFIB();
     //FIB has been updated, send completion messages
     ForEach(N in completionMessageSendingList){
     send(N,"FIB Updated for X->Y");
     }
   }

   else{
   //R is not affected by the event, nothing to do
   }
            Figure 1 : Pseudocode for link down/metric increase

   Figure 2 provides the pseudocode for the reaction of a router to a
   non-urgent link up or metric decrease event.


   Pseudo-code for link (X->Y) up/metric decrease in router R :

   //Compute the new SPT.
   newSPT = computeSPT(R)
   If(X->Y in newSPT){
     //R is affected by the event
     //compute the rank of the router. This is the length (in hops)
     //from R to X in the new spt. (note that it is equal to the
     //length from R to X in the old spt.
     worst_case_update_delay = PathLength(R,X,newSPT) * MAX_FIB

     //Build the waiting list : these are the nexthops for X in the new SPT.
     WaitingList = nexthops(R,X,newSPT);

     //Wait for the waiting list to be empty or the rank time to elapse.
     while(not WaitingList.empty() || worst_case_update_delay.elapsed()){
       if(completionMessageReceived()){
         //New completion message received
         WaitingList.remove(completionMessage.sender);
       }
     }
     //Rank time has elapsed or waiting list is empty
     UpdateFIB();
     //Fib updated, send completion messages to the neighbours.
     ForEach(N in Neighbours(R))
     Send(N,"Fib Updated X->Y");



Francois et al.           Expires January 2006                 [Page 10]


Internet Draft             Ordered FIB Updates                July 2005


   }

   else{
   //R is not affected by the event, nothing to do
   }
             Figure 2 : Pseudocode for link up/metric decrease

Authors Addresses

   Pierre Francois
   Universite catholique de Louvain
   Place Sainte Barbe, 2
   B-1348, Louvain-La-Neuve
   Belgium
   Email: pfr@info.ucl.ac.be

   Olivier Bonaventure
   Universite catholique de Louvain
   Place Ste Barbe, 2
   B-1348, Louvain-La-Neuve
   Belgium
   Email: Bonaventure@info.ucl.ac.be

   Mike Shand
   Cisco Systems,
   250, Longwater Avenue,
   Green Park,
   Reading, RG2 6GB,
   United Kingdom
   Email: mshand@cisco.com

   Stefano Previdi
   Cisco Systems
   Via Del Serafico 200
   00142 Roma - Italy
   Email : sprevidi@cisco.com


   Stewart Bryant
   Cisco Systems,
   250, Longwater Avenue,
   Green Park,
   Reading, RG2 6GB,
   United Kingdom
   Email: stbryant@cisco.com






Francois et al.           Expires January 2006                 [Page 11]


Internet Draft             Ordered FIB Updates                July 2005


Intellectual Property Considerations

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at ietf-
   ipr@ietf.org.

Full Copyright Statement

   Copyright (C) The Internet Society (2005).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.














Francois et al.           Expires January 2006                 [Page 12]