INTERNET DRAFT A Framework for Loop-free Convergence Oct 2004
Network Working Group S. Bryant
Internet Draft M. Shand
Expiration Date: Apr 2005 Cisco Systems
Oct 2004
A Framework for Loop-free Convergence
<draft-bryant-shand-lf-conv-frmwk-00.txt>
Status of this Memo
By submitting this Internet-Draft, we certify that any applicable
patent or other IPR claims of which we are aware have been
disclosed, or will be disclosed, and any of which we become aware
will be disclosed, in accordance with RFC 3668.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than a "work in
progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Abstract
This draft describes mechanisms that may be used to prevent or to
suppress the formation of micro-loops when an IP or MPLS network
undergoes topology change due to failure, repair or management
action.
Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC 2119
[RFC2119].
Bryant, Shand Expires Apr 2004 [Page 1]
Table of Contents
1. Introduction
2. The Nature of Micro-loops
3. Micro-loop Control Strategies
4. Micro-loop Prevention
4.1. Incremental Cost Advertisement
4.2. Single Tunnel Per Router
4.3. Distributed Tunnels
4.4. Ordered SPFs
4.5. Synchronised FIB Updates
5. Loop Suppression
6. Loop mitigation
7. Compatibility Issues
8. IANA considerations
9. Security Considerations
10. Intellectual Property Statement
11. Full copyright statement
12. Normative References
13. Informative References
14. Authors' Addresses
1. Introduction
When the topology of a network changes (due to link or router
failure, recovery or management action), the routers need to
converge on a common view of the new topology. During this process,
referred to as a routing transition, packet delivery between
certain source/destination pairs may be disrupted. This occurs due
to the time it takes for the topology change to be propagated
around the network plus the time it takes each individual router to
determine and then update the forwarding information base (FIB) for
the affected destinations. During this transition, packets are lost
due to continuing attempts to use the failed component, and
due to forwarding loops. Forwarding loops arise due to the
inconsistent FIBs that occur as a result of the difference in time
taken by routers to execute the transition process. This is a
problem that occurs in both IP networks and MPLS networks that use
LDP [LDP] as the label switched path (LSP) signaling protocol.
The service failures caused by routing transitions are largely
hidden by higher-level protocols that retransmit the lost data.
However new Internet services are emerging which are more sensitive
to the packet disruption that occurs during a transition. To make
the transition transparent to their users, these services require a
short routing transition. Ideally, routing transitions would be
completed in zero time with no packet loss.
Regardless of how optimally the mechanisms involved have been
designed and implemented, it is inevitable that a routing
transition will take some minimum interval that is greater than
zero. This has led to the development of a TE fast-reroute
mechanism for MPLS [MPLS-TE]. Alternative mechanisms that might be
deployed in an MPLS network and mechanisms that may be used in an
IP network are work in progress in the IETF [IPFRR]. Any repair
mechanism may however be disrupted by the formation of micro-loops
during the period between the time when the failure is announced,
and the time when all FIBs have been updated to reflect the new
topology.
The disruptive effect of micro-loops is not confined to periods
when there is a component failure. Micro-loops can, for example,
form when a component is put back into service following repair.
Micro-loops can also form as a result of a network maintenance
action such as adding a new network component, removing a network
component or modifying a link cost.
There is an emerging need for extremely reliable networks, with
fast repair. However there is little point in providing this level
of reliability without also deploying mechanisms that prevent the
disruptive effects of micro-loops which may starve the repair or
cause congestion loss as a result of looping packets.
This framework provides a summary of the mechanisms that have been
proposed to address the micro-loop issue.
2. The Nature of Micro-loops
Micro-loops form during the periods when a network is reconverging
following a topology change, and are caused by inconsistent FIBs in
the routers. Micro-loops may occur over a single link between a
pair of routers that have each other as the next hop for a prefix.
Micro-loops may also form when a cycle of routers have the next
router in the cycle as a next hop for a prefix. Cyclic micro-loops
always include at least one link with an asymmetric cost, and/or at
least two cost changes on symmetric-cost links.
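The single-link case can be illustrated with a minimal sketch. The
three-router topology, the router names and the trace() helper
below are hypothetical and are not taken from any of the proposals
summarized in this framework.

```python
def trace(fibs, src, dst, ttl=8):
    """Follow next hops from src towards dst; return the list of hops."""
    path, node = [src], src
    while node != dst and ttl > 0:
        node = fibs[node][dst]
        path.append(node)
        ttl -= 1
    return path

# Hypothetical topology: A - B - D, plus a more expensive path A - C - D.
# Link B-D fails.  B has already updated its FIB (D via A), but A still
# holds the old entry (D via B), so the two FIBs are inconsistent.
fibs_during = {
    "A": {"D": "B"},   # stale entry
    "B": {"D": "A"},   # updated entry
    "C": {"D": "D"},
}
# Once A has also converged (D via C) the loop disappears.
fibs_after = dict(fibs_during, A={"D": "C"})

looping = trace(fibs_during, "A", "D")
converged = trace(fibs_after, "B", "D")
```

In this sketch the packet alternates between A and B until the hop
limit is exhausted, whereas after A converges it is delivered via C.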
Micro-loops have two undesirable side-effects: congestion and
repair starvation. A looping packet consumes bandwidth until it
either escapes as a result of the re-synchronization of the FIBs,
or its TTL expires. This transiently increases the traffic over a
link by as much as 128 times, and may cause the link to congest.
This congestion reduces the bandwidth available to other traffic
(which is not otherwise affected by the topology change). As a
result the "innocent" traffic using the link experiences increased
latency, and is liable to congestive packet loss.
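The scale of the traffic increase can be estimated with a short
worked example. It assumes that the packet enters a two-router loop
with a TTL of 255 and that every TTL decrement corresponds to one
crossing of the link; both are simplifying assumptions made here,
and link_crossings() is a hypothetical helper.

```python
def link_crossings(initial_ttl):
    """Count crossings of the A-B link, per direction, for one packet
    caught in a two-router loop until its TTL reaches zero."""
    a_to_b = b_to_a = 0
    at_a, ttl = True, initial_ttl
    while ttl > 0:
        ttl -= 1              # each forwarding router decrements the TTL
        if at_a:
            a_to_b += 1
        else:
            b_to_a += 1
        at_a = not at_a       # the packet is now at the other router
    return a_to_b, b_to_a

crossings = link_crossings(255)
```

Under these assumptions crossings is (128, 127), which is
consistent with the factor of 128 quoted above. A packet that
enters the loop with a smaller TTL loads the link proportionally
less.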
In cases where the link or node failure has been protected by a
fast re-route repair, the inconsistency in the FIBs prevents some
traffic from reaching the failure and hence being repaired. The
repair may thus become starved of traffic and hence become
ineffective. Thus in addition to the congestive damage, the repair
is rendered ineffective by the micro-loop. Similarly, if the
topology change is the result of management action, the link could
have been retained in service throughout the transition (i.e. the
link acts as its own repair path). However, if micro-loops form,
they prevent productive forwarding during the transition.
Unless otherwise controlled, micro-loops may form in any part of
the network that forwards (or in the case of a new link, will
forward) packets over a path that includes the affected topology
change. The time taken to propagate the topology change through the
network, and the non-uniform time taken by each router to calculate
the new SPT and update its FIB may significantly extend the
duration of the packet disruption caused by the micro-loops. In
some cases a packet may be subject to disruption from micro-loops
which occur sequentially at links along the path, thus further
extending the period of disruption beyond that required to resolve
a single loop.
3. Micro-loop Control Strategies
Micro-loop control strategies fall into three basic classes:
1. Micro-loop prevention
2. Micro-loop suppression
3. Micro-loop mitigation
A micro-loop prevention mechanism controls the re-convergence of
the network in such a way that no micro-loops form. Such a micro-loop
prevention mechanism allows the continued use of any fast repair
method until the network has converged on its new topology, and
prevents the collateral damage that occurs to other traffic for the
duration of each micro-loop. These mechanisms normally extend the
duration of the re-convergence process. In the case of a fast
re-route repair this means that the network requires the repair to
remain in place longer than would otherwise be the case. This
causes extended problems to any traffic which is NOT repaired by an
imperfect repair (as does ANY method which delays re-convergence).
When a component is returned to service, or when a network
management action has taken place, this additional delay does not
cause traffic disruption, because there is no repair involved.
However the extended delay is undesirable because it leaves the
network vulnerable to multiple failures for a longer period.
A micro-loop suppression mechanism attempts to eliminate the
collateral damage done by micro-loops to other traffic. This may be
achieved by, for example, using a packet monitoring method, which
detects that a packet is looping and drops it. Such schemes make no
attempt to productively forward the packet throughout the network
transition.
A micro-loop mitigation scheme works by converging the network in
such a way that it reduces, but does not eliminate, the formation
of micro-loops. Such schemes cannot guarantee the productive
forwarding of packets during the transition.
4. Micro-loop Prevention
Five micro-loop prevention strategies have been proposed:
o Incremental cost advertisement
o Single Tunnel
o Distributed Tunnels
o Ordered SPF
o Synchronised FIBs
4.1. Incremental Cost Advertisement
When a link fails, the cost of the link is normally changed from
its assigned metric to "infinity". However, it can be proved that
if the link cost is increased in suitable increments, and the
network is allowed to stabilize before the next cost increment is
advertised, then no micro-loops will form. Once the link cost has
been increased to a value greater than that of the lowest
alternative cost around the link, the link may be disabled without
causing a micro-loop.
This approach has the advantage that it requires no change to the
routing protocol and hence will work in any network that uses a
link-state IGP. However the method can be extremely slow,
particularly if large metrics are used. For the duration of the
transition some parts of the network continue to use the old
forwarding path, and hence use any repair mechanism for an extended
period. In the case of a failure that cannot be fully repaired,
some destinations may become unreachable for an extended period.
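The effect of metric size on the duration of this method can be
sketched as follows. The fixed increment of 1 is an assumption made
for illustration only (this framework does not define what a
"suitable" increment is), and cost_steps() is a hypothetical helper.

```python
def cost_steps(current, alternative, increment=1):
    """Metrics to advertise, one per flood-SPF-FIB update cycle, until
    the link cost exceeds the lowest alternative cost around the link."""
    steps = []
    cost = current
    while cost <= alternative:
        cost += increment     # assumed step size, not a specified value
        steps.append(cost)
    return steps

small = cost_steps(current=10, alternative=14)
large = cost_steps(current=10, alternative=5000)
```

Here small is [11, 12, 13, 14, 15], i.e. five network-wide
convergence cycles, whereas large contains 4991 steps. Each step
requires the network to stabilize before the next is advertised.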
Where the micro-loop prevention mechanism was being used to support
a fast re-route repair the network may be vulnerable to a second
failure for the duration of the controlled re-convergence. This is
because of the difficulty of producing non-conflicting repair
paths.
Where the micro-loop prevention mechanism was being used to support
a reconfiguration of the network the extended time is less of an
issue. In this case, because the real forwarding path is available
throughout the whole transition, there is no conflict between
concurrent change actions throughout the network.
Similarly, when a link is returned to service, its cost can be
reduced in small steps from "infinity" to its final cost,
thereby providing similar micro-loop prevention during a
"good-news" event.
4.2. Single Tunnel Per Router
This mechanism works by creating an overlay network using tunnels
whose path is not affected by the topology change and carrying the
traffic affected by the change in that new network. When all the
traffic is in the new, tunnel based, network, the real network is
allowed to converge on the new topology. Because all the traffic
that would be affected by the change is carried in the overlay
network no micro-loops form. When all micro-loop preventing routers
have their tunnels in place, all the routers in the network are
informed of the change in the normal way, at which point
micro-loops may form within isolated islands of non-micro-loop
preventing routers. However, only traffic entering the network via
such routers can micro-loop. All traffic entering the network via a
micro-loop preventing router will be tunneled correctly to the
nearest repairing router, including, if necessary being tunneled
via a non-micro-loop preventing router, and will not micro-loop.
When all the non-micro-loop preventing routers have converged, the
micro-loop preventing routers can change from tunneling the packets
to forwarding normally according to the new topology. This
transition can occur in any order without micro-loops forming.
When a failure is detected (or a link is withdrawn from service),
the router adjacent to the failure issues a new ("covert") routing
message announcing the topology change. This message is propagated
through the network by all routers, but is only understood by
routers capable of using one of the tunnel based micro-loop
prevention mechanisms.
Each of the micro-loop preventing routers builds a tunnel to the
closest router adjacent to the failure. They then determine which
of their traffic would transit the failure and place that traffic
in the tunnel. When all of these tunnels are in place, the failure
is then announced as normal. Because these tunnels will be
unaffected by the transition, and because the routers protecting
the link will continue the repair (or forward across the link being
withdrawn), no traffic will be disrupted by the failure. When the
network has converged these tunnels are withdrawn, allowing traffic
to be forwarded along its new "natural" path. The order of tunnel
insertion and withdrawal is not important, provided that the
tunnels are all in place before the normal announcement is issued.
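The step in which a router determines which of its traffic would
transit the failure can be sketched with an ordinary SPF
computation. The topology, the function names and the handling of
a single shortest path per destination (no equal-cost multipath)
are assumptions made for this illustration.

```python
import heapq

def spf(graph, root):
    """Dijkstra: return {node: (cost, predecessor)} as seen from root."""
    best = {root: (0, None)}
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best[node][0]:
            continue                      # stale queue entry
        for nbr, metric in graph[node].items():
            new = cost + metric
            if nbr not in best or new < best[nbr][0]:
                best[nbr] = (new, node)
                heapq.heappush(queue, (new, nbr))
    return best

def tunnelled_destinations(graph, router, failed_link):
    """Destinations whose shortest path from router crosses failed_link."""
    tree = spf(graph, router)
    affected = set()
    for dest in tree:
        node = dest
        while tree[node][1] is not None:  # walk back towards the router
            parent = tree[node][1]
            if {parent, node} == set(failed_link):
                affected.add(dest)
                break
            node = parent
    return affected

# Hypothetical symmetric-cost topology.
graph = {
    "S": {"A": 1, "C": 5},
    "A": {"S": 1, "B": 1},
    "B": {"A": 1, "D": 1, "C": 5},
    "C": {"S": 5, "B": 5, "D": 5},
    "D": {"B": 1, "C": 5},
}
in_tunnel = tunnelled_destinations(graph, "S", ("A", "B"))
```

In this example router S would place traffic for B and D in its
tunnel, while traffic for A and C continues to be forwarded
normally.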
This method is faster than the incremental cost method because it
completes in fewer flood-SPF-FIBupdate cycles, and more importantly
completes in bounded time.
This technique has the disadvantage that it requires traffic to be
tunneled during the transition. This is an issue in IP networks
because not all router designs are capable of high performance IP
tunneling. It is also an issue in MPLS networks because the
encapsulating router has to know the label set that the
decapsulating router is distributing.
A further disadvantage of this method is that it requires
co-operation from all the routers within the routing domain to
fully protect the network against micro-loops. However it can be
shown that any micro-loops that do form will be confined to
contiguous groups of routers not executing this micro-loop
prevention mechanism, and that they will only affect traffic
arriving at the network through one of those routers.
It can be shown that this mechanism also works correctly when a
link is repaired or a new link added.
When a management change to the topology is required, again exactly
the same mechanism protects against micro-looping of packets by the
micro-loop preventing routers.
4.3. Distributed Tunnels
This is similar to the single tunnel per router approach except
that all micro-loop preventing routers calculate a set of link
failure paths using the methods described in [TUNNEL].
This reduces the load on the tunnel endpoints, but the length of
time taken to calculate the repairs increases the convergence time.
This method suffers from the same disadvantages as the single
tunnel method.
4.4. Ordered SPFs
Micro-loops occur when a node closer to the failed component
revises its routes to take account of the failure before a node
which is further away. By analyzing the reverse spanning tree over
which traffic is directed to the failed component, it is possible
to determine a strict ordering which ensures that nodes closer to
the root always process the failure after any nodes further away,
and hence micro-loops are prevented.
When the failure has been announced, each router waits a multiple
of some time delay value. The multiple is determined by the node's
position in the reverse spanning tree, and the delay value is
chosen to guarantee that a node can complete its processing within
this time. The convergence time may be reduced by employing a
signaling mechanism to notify the parent when all the children have
completed their processing, and hence when it is safe for the
parent to instantiate its new routes.
This approach therefore imposes a delay which is bounded by the
network diameter, although in most cases it will be much less.
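One way of deriving the delay multiples is sketched below. It
assumes that the reverse spanning tree is already available as a
child-to-parent map and that a node's multiple is the height of the
subtree below it, so that leaves update first; that choice of
multiple, the router names and the 500 ms delay value are all
assumptions made for illustration.

```python
def update_delays(parent, delay_unit):
    """Map each router to the time it waits before updating its FIB.

    parent maps each router in the reverse spanning tree to its next
    hop towards the failed component (the root maps to None)."""
    children = {node: [] for node in parent}
    for node, up in parent.items():
        if up is not None:
            children[up].append(node)

    rank = {}

    def height(node):
        # Leaves have height 0; a parent is one more than its tallest child.
        if node not in rank:
            rank[node] = 1 + max((height(c) for c in children[node]),
                                 default=-1)
        return rank[node]

    return {node: height(node) * delay_unit for node in parent}

# Hypothetical tree: R is adjacent to the failure; D reaches it via C and A.
parent = {"R": None, "A": "R", "B": "R", "C": "A", "D": "C"}
delays = update_delays(parent, delay_unit=500)   # milliseconds
```

With this input B and D update immediately, C after 500 ms, A after
1000 ms and R after 1500 ms, so every node updates after all of the
nodes below it in the tree.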
When a link is returned to service the convergence process above is
reversed. A router first calculates the reverse spanning tree
rooted at the far end of the new link, and determines its distance
from the new link (in hops). It then waits a time that is
proportional to that distance before updating its FIB. It will be
seen that network management actions can similarly be undertaken by
treating a cost increase in a manner similar to a failure and a
cost decrease similar to a restoration.
The ordered SPF mechanism requires all nodes in the domain to
operate according to these procedures, and the presence of non
co-operating nodes can give rise to loops for any traffic which
traverses them (not just traffic which is originated through them).
Without additional mechanisms these loops could remain in place for
a significant time.
It should be noted that this method requires per router ordering,
but not per prefix ordering. A router must wait its turn to update
its FIB, but it should then update its entire FIB.
Another way of viewing the operation of this method is to realize
that there is a horizon of routers affected by the failure. Routers
beyond the horizon do not send packets via the failure. Routers at
the horizon have a neighbor that does not send packets via the
failure. It is then obvious that routers on the horizon can use
that neighbor as a loop free alternate to the destination and can
update their FIBs immediately. Once these routers have updated
their FIBs, they move over the horizon and it is their neighbors
closer to the failure that become the new horizon routers.
Only routers within the horizon need to change their FIBs and hence
only those routers need to delay changing their FIBs.
4.5. Synchronised FIB Updates
Micro-loops form because of the asynchronous nature of the FIB
update process during a network transition. In many router
architectures it is the time taken to update the FIB itself that is
the dominant term. One approach would be to have two FIBs and, in a
synchronized action throughout the network, to switch from the old
to the new.
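The two-FIB arrangement can be sketched as follows. The DualFib
class, its method names and the use of an agreed switch time on a
shared clock are hypothetical; this framework does not specify how
the synchronized switch would be signalled.

```python
class DualFib:
    """Active and standby forwarding tables with a timed switch-over."""

    def __init__(self, routes):
        self.active = dict(routes)
        self.standby = None
        self.switch_time = None

    def stage(self, new_routes, switch_time):
        """Install the new table without using it for forwarding yet."""
        self.standby = dict(new_routes)
        self.switch_time = switch_time

    def lookup(self, dest, now):
        """Return the next hop, switching tables once switch_time is reached."""
        if self.standby is not None and now >= self.switch_time:
            self.active, self.standby = self.standby, None
        return self.active[dest]

fib = DualFib({"D": "B"})
fib.stage({"D": "C"}, switch_time=100)   # assumed network-wide agreed time
before = fib.lookup("D", now=99)
after = fib.lookup("D", now=100)
```

Because the slow step (building the new table) is completed before
the switch time, the change of forwarding behaviour itself reduces
to swapping a reference.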
This approach has a number of major issues. Firstly, two complete
FIBs are needed, which may create a scaling issue. Secondly, a
suitable network-wide synchronization method is needed. However,
neither of these is an insurmountable problem.
Since the FIB change synchronization will not be perfect there may
be some interval during which micro-loops form. Whether this scheme
is classified as a micro-loop prevention mechanism or a micro-loop
mitigation mechanism within this taxonomy is therefore dependent on
the degree of synchronization achieved.
5. Loop Suppression
A micro-loop suppression mechanism recognizes that a packet is
looping and drops it. One such approach would be for a router to
recognize, by some means, that it had seen the same packet before.
It is difficult to see how sufficiently reliable discrimination
could be achieved without some form of per-router signature such as
route recording. A packet recognizing approach therefore seems
infeasible.
An alternative approach would be to recognize that a packet was
looping by recognizing that it was being sent back to the place
that it had just come from. This would work for the types of loop
that form in symmetric cost networks, but would not suppress the
cyclic loops that form in asymmetric networks.
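This alternative check, and its limitation, can be sketched in a
few lines. The forward() helper and the router names are
hypothetical.

```python
def forward(fib, dest, arrived_from):
    """Return the next hop for dest, or None if the packet would be sent
    straight back to the neighbor it arrived from and is therefore dropped."""
    next_hop = fib[dest]
    if next_hop == arrived_from:
        return None
    return next_hop

# Two-router loop: this router's FIB points back at B, which sent the packet.
dropped = forward({"D": "B"}, "D", arrived_from="B")
# Three-router cycle A -> B -> C -> A, seen at B: the packet arrived from A
# and leaves towards C, so the check does not detect the loop.
missed = forward({"D": "C"}, "D", arrived_from="A")
```

Here dropped is None, so the looping packet is discarded, whereas
missed is "C" and the packet continues around the cycle.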
The problem with this class of micro-loop control strategies is
that whilst they prevent collateral damage they do nothing to
enhance the productive forwarding of packets during the network
transition.
6. Loop mitigation
The only known loop mitigation approach is described in [ZININ]. A
micro-loop free Next-hop safety condition is defined:
After a topology change, it is safe for router X to switch to
neighbor Y as its next-hop for a specific destination if the path
through Y satisfies both of the following criteria:
1. X considered Y as its loop-free neighbor based on the
topology before change AND
2. X considers Y as its downstream neighbor based on the
topology after change.
Based on these criteria, routers are then classified into three
classes:
Type A routers: Routers unaffected by the change and also routers
whose next hop after the change satisfies the safety criteria.
Type B routers: Routers whose new primary next-hops after the
topology change do not satisfy the safety condition, but that have
at least one other neighbor that does.
Type C routers: All other routers.
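The classification can be sketched as follows. The sketch assumes
the conventional distance-based readings of the two criteria, which
are not spelled out in this framework: Y is loop-free for X before
the change if D_old(Y,dest) < D_old(Y,X) + D_old(X,dest), and Y is
downstream of X after the change if D_new(Y,dest) < D_new(X,dest).
It also treats a router as unaffected when its next hop is
unchanged. The topology and all function names are hypothetical.

```python
import heapq

def distances(graph, root):
    """Shortest-path cost from root to every node (Dijkstra)."""
    dist = {root: 0}
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist[node]:
            continue
        for nbr, metric in graph[node].items():
            if nbr not in dist or cost + metric < dist[nbr]:
                dist[nbr] = cost + metric
                heapq.heappush(queue, (dist[nbr], nbr))
    return dist

def next_hop(graph, x, dest):
    """Neighbor of x on a shortest path to dest (lowest name breaks ties)."""
    return min(graph[x],
               key=lambda y: (graph[x][y] + distances(graph, y)[dest], y))

def is_safe(old, new, x, y, dest):
    """Assumed reading of the safety condition for X switching to Y."""
    d_old_y, d_old_x = distances(old, y), distances(old, x)
    loop_free_before = d_old_y[dest] < d_old_y[x] + d_old_x[dest]
    downstream_after = distances(new, y)[dest] < distances(new, x)[dest]
    return loop_free_before and downstream_after

def classify(old, new, x, dest):
    """Return 'A', 'B' or 'C' for router x with respect to dest."""
    new_hop = next_hop(new, x, dest)
    if new_hop == next_hop(old, x, dest) or is_safe(old, new, x, new_hop, dest):
        return "A"
    if any(is_safe(old, new, x, y, dest) for y in new[x]):
        return "B"
    return "C"

# Hypothetical topology before the change; link X-D then fails.
old = {
    "D": {"X": 1, "Y": 5, "Z": 3},
    "X": {"D": 1, "W": 1, "Z": 5},
    "W": {"X": 1, "Y": 1},
    "Y": {"W": 1, "D": 5},
    "Z": {"D": 3, "X": 5},
}
new = {n: {m: c for m, c in nbrs.items() if {n, m} != {"X", "D"}}
       for n, nbrs in old.items()}
types = {r: classify(old, new, r, "D") for r in ("X", "W", "Y", "Z")}
```

In this example Y and Z are type A, X is type B (its new shortest
path via W is not safe, but Z is a safe neighbor) and W is type C.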
Following a topology change, Type A routers immediately change to
the new topology. Type B routers immediately change to the next hop
that satisfies the safety criteria, even though this is not the
shortest path. Type B routers continue to use this path until all
Type C routers have switched to their new next hop. Type C routers
wait for the Type B routers to switch to their intermediate (safe)
next hop, and then change to their new next hop.
Simulations indicate that this approach produces a significant
reduction in the number of links that are subject to micro-looping.
However unlike all of the micro-loop prevention methods it is only
a partial solution. In particular, micro-loops may form on any link
joining a pair of type C routers.
Although type C routers delay their FIB update, they will however
route towards the failure during the time when the type B routers
are changing, and hence will continue to productively forward
packets provided that viable repair paths exist.
A backwards compatibility issue arises with the safe-next-hop
scheme. If a router is not capable of micro-loop control, it will
not correctly delay its FIB update. If all such routers were type
A routers this loop mitigation mechanism would work as it was
designed. Alternatively, if all such incapable routers were type C
routers,
the "covert" announcement mechanism used to trigger the tunnel
based schemes could be used to cause the A and B routers to
configure themselves, with the incapable and type C routers
delaying until they received the "real" announcement.
Unfortunately, these two approaches are mutually incompatible.
It should be noted that the classification of a router as type A, B
or C is a per-destination classification. Routers update their FIBs
in three phases. A router first updates destinations for which it
is classified as type A or type B, it then updates destinations for
which it is type C, and finally it corrects the temporary next hop
used for destinations for which it is type B.
7. Compatibility Issues
Deployment of any micro-loop control mechanism is a major change to
a network. Full consideration must be given to interoperation
between routers that are capable of micro-loop control, and those
that are not. Additionally there may be a desire to limit the
complexity of micro-loop control by choosing a method based purely
on its simplicity. Any such decision must take into account that if
a more capable scheme is needed in the future, its deployment will
be complicated by interaction with the scheme previously deployed.
8. IANA considerations
There are no IANA considerations that arise from this draft.
9. Security Considerations
All micro-loop control mechanisms raise significant security issues
which must be addressed in their detailed technical description.
10. Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed
to pertain to the implementation or use of the technology described
in this document or the extent to which any license under such
rights might or might not be available; nor does it represent that
it has made any independent effort to identify any such rights.
Information on the procedures with respect to rights in RFC
documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use
of such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository
at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
11. Full copyright statement
Copyright (C) The Internet Society (2004). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on
an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR
ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE.
12. Normative References
There are no normative references.
13. Informative References
Internet-drafts are works in progress available from
<http://www.ietf.org/internet-drafts/>
[IPFRR] Shand, M., "IP Fast-reroute Framework",
<draft-ietf-rtgwg-ipfrr-framework-01.txt>, June
2004, (work in progress).
[LDP] Andersson, L., Doolan, P., Feldman, N.,
Fredette, A. and B. Thomas, "LDP
Specification", RFC 3036,
January 2001.
[MPLS-TE] Ping Pan, et al, "Fast Reroute Extensions to
RSVP-TE for LSP Tunnels",
<draft-ietf-mpls-rsvp-lsp-fastreroute-07.txt>,
(work in progress).
[TUNNEL] Bryant, S., Shand, M., "IP Fast Reroute using
tunnels", <draft-bryant-ipfrr-tunnels-00.txt>,
May 2004 (work in progress).
[ZININ] Zinin, A., "Analysis and Minimization of
Microloops in Link-state Routing Protocols",
<draft-zinin-microloop-analysis-00.txt>,
October 2004 (work in progress).
14. Authors' Addresses
Mike Shand
Cisco Systems,
250, Longwater,
Green Park,
Reading, RG2 6GB,
United Kingdom. Email: mshand@cisco.com
Stewart Bryant
Cisco Systems,
250, Longwater,
Green Park,
Reading, RG2 6GB,
United Kingdom. Email: stbryant@cisco.com