Skip to main content

IS-IS Distributed Flooding Reduction
draft-ietf-lsr-distoptflood-07

Document Type Active Internet-Draft (lsr WG)
Authors Russ White , Shraddha Hegde , Tony Przygienda , Luay Jalil , Daniel Voyer
Last updated 2024-10-19
Replaces draft-white-lsr-distoptflood
RFC stream Internet Engineering Task Force (IETF)
Intended RFC status (None)
Formats
Additional resources Mailing list discussion
Stream WG state WG Document
Document shepherd (None)
IESG IESG state I-D Exists
Consensus boilerplate Unknown
Telechat date (None)
Responsible AD (None)
Send notices to (None)
draft-ietf-lsr-distoptflood-07
Network Working Group                                           R. White
Internet-Draft                                                    Akamai
Intended status: Standards Track                                S. Hegde
Expires: 21 April 2025                                     T. Przygienda
                                                        Juniper Networks
                                                                L. Jalil
                                                                 Verizon
                                                                D. Voyer
                                                             Bell Canada
                                                         18 October 2024

                  IS-IS Distributed Flooding Reduction
                     draft-ietf-lsr-distoptflood-07

Abstract

   In dense topologies (such as data center fabrics based on the Clos
   and butterfly though not limited to those; in fact any large topology
   or one with relatively high degree of connectivity qualifies here)
   IGP flooding mechanisms designed originally for rather sparse
   topologies can "overflood", or in other words generate too many
   identical copies of same information arriving at a given node from
   other devices.  This normally results in longer convergence times and
   higher resource utilization to process and discard the superfluous
   copies.  Flooding algorithm extensions that restrict the amount of
   flooding performed can be constructed and can reduce resource
   utilization significantly, while improving convergence performance.

   One such flooding modification (based on previous art) optimized for
   operational considerations, described further in Section 2, is
   described in this document.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

White, et al.             Expires 21 April 2025                 [Page 1]
Internet-Draft    IS-IS Distributed Flooding Reduction      October 2024

   This Internet-Draft will expire on 21 April 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  MANET-Based, Load-Balancing Flooding Extension  . . . . . . .   2
     1.1.  Empirical Evidence of Correctness and Efficiency
           Improvement . . . . . . . . . . . . . . . . . . . . . . .   3
     1.2.  Flooding Modifications  . . . . . . . . . . . . . . . . .   3
       1.2.1.  Example Network . . . . . . . . . . . . . . . . . . .   3
       1.2.2.  Optimizing Flooding . . . . . . . . . . . . . . . . .   4
       1.2.3.  Optimization Process Details  . . . . . . . . . . . .   6
       1.2.4.  Flooding Failures . . . . . . . . . . . . . . . . . .   8
       1.2.5.  Flooding Example  . . . . . . . . . . . . . . . . . .   8
       1.2.6.  A Note on Performance . . . . . . . . . . . . . . . .   9
   2.  Operational Considerations  . . . . . . . . . . . . . . . . .   9
   3.  Security Considerations . . . . . . . . . . . . . . . . . . .  11
   4.  IANA Section  . . . . . . . . . . . . . . . . . . . . . . . .  11
   5.  Contributors  . . . . . . . . . . . . . . . . . . . . . . . .  11
   6.  Reference Hash Implementation . . . . . . . . . . . . . . . .  11
   7.  Normative References  . . . . . . . . . . . . . . . . . . . .  12
   8.  Informative References  . . . . . . . . . . . . . . . . . . .  12
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  12

1.  MANET-Based, Load-Balancing Flooding Extension

   The following section describes a distributed algorithm similar to
   and based on those implemented in OSPF to support mobile ad-hoc
   networks, as described in [RFC5449],[RFC5614].  These solutions have
   been widely implemented and deployed.

White, et al.             Expires 21 April 2025                 [Page 2]
Internet-Draft    IS-IS Distributed Flooding Reduction      October 2024

1.1.  Empirical Evidence of Correctness and Efficiency Improvement

   Laboratory tests based on a well known open source codebase show that
   modifications similar to the algorithm presented here reduce flooding
   in a large scale emulated butterfly network topology significantly.
   Under unmodified flooding procedures intermediate systems receive, on
   average, 40 copies of any changed LSP fragment in a 2'500 nodes
   butterfly network.  With the changes described in this document said
   systems received, on average, two copies of any changed LSP fragment.
   In many cases, only a single copy of each changed LSP was received
   and processed per node.  In terms of performance, overall convergence
   times were cut in roughly half.  Other topologies under
   experimentation in CLOS networks using another implementation show
   similar performance and simulations of the extension indicate
   significant reductions in flooding volumes.

   An early version of mechanisms described here has been implemented in
   the FR Routing open source routing stack as part of `fabricd` daemon
   and the described modification has been implement by commercial
   vendors.

1.2.  Flooding Modifications

   This section describes detailed modifications to the IS-IS flooding
   process to reduce the full topology to a dominating connected set of
   links used for flooding.  It does at the same time balance the
   remaining flooding across all links in the topology to prevent hot-
   spots.

1.2.1.  Example Network

   Following spine and leaf fabric will be used in further description
   of the introduced modifications.

White, et al.             Expires 21 April 2025                 [Page 3]
Internet-Draft    IS-IS Distributed Flooding Reduction      October 2024

   +====+ +====+ +====+ +====+ +====+ +====+
   | 1A | | 1B | | 1C | | 1D | | 1E | | 1F | (T0)
   +====+ +====+ +====+ +====+ +====+ +====+

   +====+ +====+ +====+ +====+ +====+ +====+
   | 2A | | 2B | | 2C | | 2D | | 2E | | 2F | (T1)
   +====+ +====+ +====+ +====+ +====+ +====+

   +====+ +====+ +====+ +====+ +====+ +====+
   | 3A | | 3B | | 3C | | 3D | | 3E | | 3F | (T2)
   +====+ +====+ +====+ +====+ +====+ +====+

   +====+ +====+ +====+ +====+ +====+ +====+
   | 4A | | 4B | | 4C | | 4D | | 4E | | 4F | (T1)
   +====+ +====+ +====+ +====+ +====+ +====+

   +====+ +====+ +====+ +====+ +====+ +====+
   | 5A | | 5B | | 5C | | 5D | | 5E | | 5F | (T0)
   +====+ +====+ +====+ +====+ +====+ +====+

                                  Figure 1

   The above picture does not contain the connections between devices
   for readability purposes.  The reader should assume that each device
   in a given layer is connected to every device in the layer above it
   in a butterfly network fashion.  For instance:

   *  5A is connected to 4A, 4B, 4C, 4D, 4E, and 4F

   *  5B is connected to 4A, 4B, 4C, 4D, 4E, and 4F

   *  4A is connected to 3A, 3B, 3C, 3D, 3E, 3F, 5A, 5B, 5C, 5D, 5E, and
      5F

   *  4B is connected to 3A, 3B, 3C, 3D, 3E, 3F, 5A, 5B, 5C, 5D, 5E, and
      5F

   *  etc.

   The tiers or stages of the fabric are marked for easier reference.
   Alternate representation of this topology is a "folded Clos" with T2
   being the "top of the fabric" and T0 representing the leaves.

1.2.2.  Optimizing Flooding

   The simplest way to conceive of the solution presented here is in two
   stages:

White, et al.             Expires 21 April 2025                 [Page 4]
Internet-Draft    IS-IS Distributed Flooding Reduction      October 2024

   *  Stage 1: Forward Optimization

      -  Find the group of intermediate systems that will all flood to
         the same set of neighbors as the local IS

      -  Decide (deterministically) which subset of the intermediate
         systems within this group should re-flood any received LSPs

   *  Stage 2: Reverse Optimization

      -  Find neighbors on the shortest path towards the origin of the
         change

      -  Do not flood towards these neighbors

   The first stage is best explained through an illustration.  In the
   network above, if 5A transmits a modified Link State Protocol Data
   Unit (LSP) to 4A-4F, each of 4A-4F nodes will, in turn, flood this
   modified LSP to 3A (for instance).  With this, 3A will receive 6
   copies of the modified LSP, while only one copy is necessary for the
   intermediate systems shown to converge on the same view of the
   topology.  If 4A-4F could determine that all of them will all flood
   identical copies of the modified LSP to 3A, it would be possible for
   all of them except one to decide not to flood the changed LSP to 3A.

   The technique used in this draft to determine such flooding group is
   for each intermediate system to calculate a special SPT (shortest-
   path spanning tree) from the point of view of the transmitting
   neighbor.  As next step, by setting the metric of all links to 1 and
   truncating the SPT to two hops, the local IS can find the group of
   neighbors it will flood any changed LSP towards and the set of
   intermediate systems (not necessarily neighbors) which will also
   flood to this same set of neighbors.  If every intermediate system in
   the flooding set performs this same calculation, they will all obtain
   the same flooding group.

   Once such a flooding group is determined, the members of the flooding
   group will each (independently) choose which of the members should
   re-flood the received information.  A common hash function is used
   across a set of shared variables so each member of the group comes to
   the same conclusion as to the designated flooding nodes.  The group
   member which is in such a way `selected` to flood the changed LSP
   does so normally; the remaining group members suppress the flooding
   of the LSP initially.

   Each IS calculates the special, truncated SPT separately, and
   determines which IS should flood any changed LSPs independently based
   on a common hash function.  Because these calculations are performed

White, et al.             Expires 21 April 2025                 [Page 5]
Internet-Draft    IS-IS Distributed Flooding Reduction      October 2024

   using a shared view of the network, however (based on the common link
   state database) and such a shared hash function, each member of the
   flooding group will make the same decision under converged
   conditions.  In the transitory state of nodes having potentially
   different view of topologies the flooding may either overflood or in
   worse case not flood enough for which we introduce a 'quick-patching'
   mechanism later but ultimately will converge due to periodic CSNP
   origination per normal protocol operation.

   The second stage is simpler, consisting of a single rule: do not
   flood modified LSPs along the shortest path towards the origin of the
   modified LSP (with one notable exception).  This rule relies on the
   observation that any IS between the origin of the modified LSP and
   the local IS should receive the modified LSP from some other IS
   closer to the source of the modified LSP.  It is worth to observe
   that if all the nodes that should be designated to flood within a
   peer group are pruned by the second stage the receiving node is at
   the `tail-end` of the flooding chain and no further flooding will be
   necessary.  Also, per normal protocol procedures flooding to the node
   from which the LSP has been received will not be performed.

1.2.3.  Optimization Process Details

   This section provides normative description of the specification.
   Any node implementing this solution MUST exhibit external behavior
   that conforms to the algorithms provided.

   Each intermediate system will determine whether it should re-flood
   LSPs as described below.  When a modified LSP arrives from a
   Transmitting Neighbor (TN), the result of the following algorithm
   obtains the necessary flooding decision unless TN is originator of
   the LSP and the LSP is older than an existing copy in the LSDB (this
   exception allows for fast flushing of LSPs retained in the network on
   TN's restart):

   Step 1: Build the Two-Hop List (THL) and Remote Neighbor's List (RNL)
   of nodes:

   A)  Set all link metrics to 1

   B)  Calculate an SPT truncated to 2 hops from the perspective of TN

   C)  For each IS that is two hops away (has a metric of two in the
       truncated SPT) from TN:

       i.    If the IS is the LSP originator, skip

       ii.   If the IS is a neighbor of the LSP originator, skip

White, et al.             Expires 21 April 2025                 [Page 6]
Internet-Draft    IS-IS Distributed Flooding Reduction      October 2024

       iii.  If the IS is on the shortest path from the TN towards
             towards the originator of the modified LSP, skip

       iv.   If the IS is *not* on the shortest path from the TN towards
             the originator of the modified LSP, add it to THL

   D)  Add each IS that is one hop away from TN to the RNL

   Step 2: Sort nodes in RNL by system IDs, from the least value to the
   greatest.

   Step 3: Take the initial value of the fragment ID and shift it by 4
   bits to the right.  Calculate a number H by adding shifted fragment
   ID to the each byte of system ID, starting from the highest network
   order byte, for all bytes XOR each with the current value and rotate
   after each byte the current value to the left by 4 bits.  Section 6
   provides reference implementation.

   The shifting of the fragment ID will put 16 sequent fragments onto
   the same flooding tree to minimize re-ordering and the subsequent
   XOR'ing and rotating of the system ID will maximize the amount of
   entropy obtained for a good hash value.  RNum is the number of nodes
   in the RNL.  Consequently, set N to the H MOD of RNum (N=H MOD RNum).
   With that N will be less than the number of members of RNL.
   (footnote 1: this allows for some balancing of LSPs coming from same
   system ID).

   Step 4: Starting with the Nth member of RNL: where N is the index
   into the members in RNL, with index starting from zero (Index zero
   assigned to the IS with lowest system-id):

   A)  If THL is empty or the walk wrapped around, move to Step 5

   B)  If this member of RNL is the local calculating IS, it MUST re-
       flood the modified LSP to at least the remaining members in the
       THL it is adjacent to; move to Step 5

   C)  If this member of the RNL signals capability to or running
       another flood reduction extension or signals that it runs a
       different version of this extension move to the next member of
       RNL

   D)  Remove all members of THL connected to (adjacent to) this member
       of RNL

   E)  Move to the next member of RNL, wrapping to the beginning of RNL
       if necessary

White, et al.             Expires 21 April 2025                 [Page 7]
Internet-Draft    IS-IS Distributed Flooding Reduction      October 2024

   Note 1: This description is leaning towards clarity rather than
   optimal performance when implemented.

   Note 2: An implementation in a node MAY choose independently of
   others to provide a configurable parameter to allow for more than one
   node in RNL to re-flood, e.g. it may re-flood even if it's only the
   member that would be chosen from the RNL if a double coverage of THL
   is required.  The modifications to the algorithm are simple enough to
   not require further text.

1.2.4.  Flooding Failures

   It is possible that during initial convergence or in some failure
   modes the flooding will be incomplete due to the optimizations
   outlined.  Specifically, if a re-flooder fails, or is somehow
   disconnected from all the links across which it should be re-
   flooding, an LSP could be only partially distributed through the
   topology.  To speed up convergence under such partition failures
   (observe that periodic CSNPs will under any circumstances converge
   the topology though at a slower pace), an intermediate system which
   does not re-flood a specific LSP (or fragment) SHOULD:

   A)  Set a short, configurable timer which should be significantly
       shorter than CSNP interval used.

   B)  When the timer expires, send Partial Sequence Number Packet
       (PSNP) of all LSPs that have *not* been re-flooded during the
       timer runtime to all neighbors unless an up-to-date PSNP or CSNP
       has been already received from the neighbor.

   C)  Per normal protocol procedures process any Partial Sequence
       Number Packets (PSNPs) received that indicate that neighbors
       still have older versions of the LSP will lead to the usual
       synchronization of the databases that are out of sync due to
       optimized flooding.

   D)  If such resynchronizations above a configurable threshold are
       required (i.e.  PSNPs are sent to the neighbors and are answered
       with requests), an implementation SHOULD notify the network
       operator via the according mechanism about the condition.

1.2.5.  Flooding Example

   Assume, in the network specified, that 5A floods some modified LSP
   towards 4A-4F and we only use a single node to re-flood.  To
   determine whether 4A should flood this LSP to 3A-3F:

White, et al.             Expires 21 April 2025                 [Page 8]
Internet-Draft    IS-IS Distributed Flooding Reduction      October 2024

   *  5A is TN; 4A calculates a truncated SPT from 5A's perspective with
      all link metrics set to 1

   *  4A builds THL, which contains 3A, 3B, 3C, 3D, 3E, 3F, 5B, 5C, 5D,
      5E and 5F

   *  4A builds RNL, which contains 4A,4B,4C,4D,4E and 4F, sorting it by
      the system ID

   *  4A computes hash on the received LSP-ID to get N; assume N is 1 in
      this case

   *  Since 4A is the 1st member of RNL and there are members in THL, 4A
      must re-flood; the loop exits

1.2.6.  A Note on Performance

   The calculations described here seem complex, which might lead the
   reader to conclude that the cost of calculation is so much higher
   than the cost of flooding that this optimization is counter-
   productive.  First, The description provided here is designed for
   clarity rather than optimal calculation.  Second, many of the
   involved calculations can be easily performed in advance and stored,
   rather than being performed for each LSP occurrence and each
   neighbor.  Optimized versions of the process described here have been
   implemented, and do result in strong convergence speed gains.

2.  Operational Considerations

   The extension introduced to flooding in this document exhibits per se
   many desirable properties important for large production IS-IS
   networks.  The critical function of the IGP makes their flag day
   migration close to impossible and any significant outage, i.e. non-
   negligible blast radius, introduces a significant operational risk to
   any service relying on the relevant IGP backbone.  Such outages are
   caused often by configuration changes or issuance of unintended
   commands, especially since those networks tend to be deployed for
   tens of years under changing personnel and varying degree of
   competence.  Moreover, due to geographic and co-location realities in
   many cases the network topology changes its properties and solution
   that is resilient to such properties presents a lower risk than one
   that can misfunction on node failures in certain topological
   configurations.  Obviously it is desirable to detect easily which
   nodes are using the feature and allow for simple fixing of defects
   without disturbing the topology significantly.

White, et al.             Expires 21 April 2025                 [Page 9]
Internet-Draft    IS-IS Distributed Flooding Reduction      October 2024

   The solution proposed in this draft pays particular attention to
   those realities and operational requirements.  It is designed to be
   deployable without initial configuration, allows node by node
   introduction and removal of the feature into the network, balances
   the reduced flooding not only on a single or few trees but uses the
   information about the origin of the fragment to spread the load
   across whole topology.  All of those properties are guaranteed to
   work while asserting minimal blast radius on introduction of the
   feature and also changes of any node or link.  Deployment of this
   extension presents the same risk no matter the specific properties of
   the underlying topology.

   To simplify deployment and make the feature detectable on nodes
   deploying it a node _running_ the algorithm SHOULD advertise a Sub-
   TLV of the IS-IS Router Capability TLV-242 carrying the currently
   active version of the algorithm.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |     Type      |     Length    |    Version      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                  Figure 2

   *  Type: TBD1

   *  Length: 1

   *  Version: version of the algorithm active on the node.

   A change to the extension presented in this document that does not
   guarantee backwards compatibility and advertises this sub-TLV MUST
   reserve and use a new version number.

   Additionally, a node deploying the presented algorithm on point-to-
   point links MUST send CSNPs on such links.  This does not represent a
   dramatic change given most deployed implementations today already
   exhibit this behavior to prevent possible slow synchronization of IS-
   IS database across such links and to provide additional periodic
   consistency guarantees.

White, et al.             Expires 21 April 2025                [Page 10]
Internet-Draft    IS-IS Distributed Flooding Reduction      October 2024

3.  Security Considerations

   This document outlines flooding algorithm modification to the IS-IS
   protocol for operation most useful at large scale or in high density
   network topologies.  The extension does not present any new attack
   vectors even if nodes start to advertise a byzantine attack of
   signalling that they run the extension while still following standard
   behavior.  As always, ISIS implementations SHOULD implement IS-IS
   cryptographic authentication, as described in [RFC5304], and should
   enable other security measures in accordance to best common practices
   for the IS-IS protocol.

4.  IANA Section

   IANA is requested to set up a registry called "IGP MANET Flooding
   Extension Version" under the existing "Interior Gateway Protocol
   (IGP) Parameters" IANA registry.

   Values in this registry come from the range 0 .. 255.

   The following values are defined:

   *  0-255: Reserved with values representing version of this
      extension.

5.  Contributors

   The following people have contributed to this draft or provided
   valuable comments and are mentioned without any particular order:
   Abhishek Kumar, Nikos Triantafillis, Ivan Pepelnjak, Christian
   Franke, Hannes Gredler, Les Ginsberg, Naiming Shen, Uma Chunduri,
   Nick Russo, Tony Li and Rodny Molina.  Acee Lindem pointed out a
   particularly interesting corner case where the optimization provided
   by the algorithm should be omitted.

6.  Reference Hash Implementation

White, et al.             Expires 21 April 2025                [Page 11]
Internet-Draft    IS-IS Distributed Flooding Reduction      October 2024

      fn balancing_hash(systemid: [u8; 6], fragment_nr: u8) -> u64 {
          let h = ((fragment_nr & 0xff) >> 4) as _;

          systemid.iter().fold(h, | prev_value, byte | {
          (prev_value ^ (*byte as u64)).rotate_left(4)
          })
      }

      for (hash, expected) in [
      ( balancing_hash([1,2,3,4,5,6], 000), 19088736u64 ),
      ( balancing_hash([1,2,3,4,5,6], 015), 19088736),
      ( balancing_hash([1,2,3,4,5,7], 015), 19088752),
      ( balancing_hash([6,5,4,3,2,1], 254), 156512784),
      ( balancing_hash([6,5,4,3,2,1], 253), 156512784),
      ].iter() {
          assert_eq!(hash, expected);
      }

                                  Figure 3

7.  Normative References

8.  Informative References

   [RFC5304]  Li, T. and R. Atkinson, "IS-IS Cryptographic
              Authentication", RFC 5304, DOI 10.17487/RFC5304, October
              2008, <https://www.rfc-editor.org/info/rfc5304>.

   [RFC5449]  Baccelli, E., Jacquet, P., Nguyen, D., and T. Clausen,
              "OSPF Multipoint Relay (MPR) Extension for Ad Hoc
              Networks", RFC 5449, DOI 10.17487/RFC5449, February 2009,
              <https://www.rfc-editor.org/info/rfc5449>.

   [RFC5614]  Ogier, R. and P. Spagnolo, "Mobile Ad Hoc Network (MANET)
              Extension of OSPF Using Connected Dominating Set (CDS)
              Flooding", RFC 5614, DOI 10.17487/RFC5614, August 2009,
              <https://www.rfc-editor.org/info/rfc5614>.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017,
              <https://www.rfc-editor.org/info/rfc8126>.

Authors' Addresses

   Russ White
   Akamai
   Email: russ@riw.us

White, et al.             Expires 21 April 2025                [Page 12]
Internet-Draft    IS-IS Distributed Flooding Reduction      October 2024

   Shraddha Hegde
   Juniper Networks
   Email: shraddha@juniper.net

   Tony Przygienda
   Juniper Networks
   Email: prz@juniper.net

   Luay Jalil
   Verizon
   Email: luay.jalil@verizon.com

   Daniel Voyer
   Bell Canada
   Email: daniel.voyer@bell.ca

White, et al.             Expires 21 April 2025                [Page 13]