[Search] [pdf|bibtex] [Tracker] [Email] [Diff1] [Diff2] [Nits]

Versions: 00 01 02 03 04 rfc2992                           Informational
Internet Engineering Task Force                       Christian E. Hopps
INTERNET-DRAFT                                             Merit Network
Expires July 2000                                        7 February 2000

             Analysis of an Equal-Cost Multi-Path Algorithm

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet- Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at

   The list of Internet-Draft Shadow Directories can be accessed at

Copyright Notice

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

Expires July 2000                                              [Page 1]

Draft                 Analysis of an ECMP Algorithm        February 2000


   Equal-cost multi-path (ECMP) is a routing technique for routing
   packets along multiple paths of equal cost.  The forwarding engine
   identifies paths by next-hop.  When forwarding a packet the router
   must decide which next-hop (path) to use. This document gives an
   analysis of one method for making that decision.  The analysis
   includes the performance of the algorithm and the disruption caused
   by changes to the set of next-hops.

1.  Hash-Threshold

   One method for determining which next-hop to use when routing with
   ECMP can be called hash-threshold.  The router first selects a key by
   performing a hash (e.g., CRC16) over the packet header fields that
   identify a flow. The N next-hops have been assigned unique regions in
   the key space. The router uses the key to determine which region and
   thus which next-hop to use.

   As an example of hash-threshold, upon receiving a packet the router
   performs a CRC16 on the packet's header fields that define the flow
   (e.g., the source and destination fields of the packet), this is the
   key.  Say for this destination there are 4 next-hops to choose from.
   Each next-hop is assigned a region in 16 bit space (the key space).
   For equal usage the router may have chosen to divide it up evenly so
   each region is 65536/4 or 16k large.  The next-hop is chosen by
   determining which region contains the key (i.e., the CRC result).

2.  Analysis

   There are a few concerns when choosing an algorithm for deciding
   which next-hop to use. One is performance, the computational
   requirements to run the algorithm.  Another is disruption (i.e., the
   changing of which path a flow uses).  Balancing is a third concern;
   however, since the algorithm's balancing characteristics are directly
   related to the chosen hash function this analysis does not treat this
   concern in depth.

   For this analysis we will assume regions of equal size.  If the
   output of the hash function is uniformly distributed the distribution
   of flows amongst paths will also be uniform, and so the algorithm
   will properly implement ECMP.  One can implement non-equal-cost
   multi-path routing by using regions of unequal size; however, non-

Expires July 2000                                              [Page 2]

Draft                 Analysis of an ECMP Algorithm        February 2000

   equal-cost multi-path routing is outside the scope of this document.

2.1.  Performance

   The performance of the hash-threshold algorithm can be broken down
   into three parts: selection of regions for the next-hops, obtaining
   the key and comparing the key to the regions to decide which next-hop
   to use.

   The algorithm doesn't specify the hash function used to obtain the
   key.  Its performance in this area will be exactly the performance of
   the hash function.  It is presumed that if this calculation proves to
   be a concern it can be done in hardware parallel to other operations
   that need to complete before deciding which next-hop to use.

   Since regions are restricted to be of equal size the calculation of
   region boundaries is trivial.  Each boundary is exactly regionsize
   away from the previous boundary starting from 0 for the first region.
   As we will show, for equal sized regions, we don't need to store the
   boundary values.

   To choose the next-hop we must determine which region contains the
   key.  Because the regions are of equal size determining which region
   contains the key is a simple division operation.

                regionsize = keyspace.size / #{nexthops}
                region = key / regionsize;

   Thus the time required to find the next-hop is dependent on the way
   the next-hops are organized in memory.  The obvious use of an array
   indexed by region yields O(1).

2.2.  Disruption

   Protocols such as TCP perform better if the path they flow along does
   not change while the stream is connected.  Disruption is the
   measurement of how many flows have their paths changed due to some
   change in the router.  We measure disruption as the fraction of total
   flows whose path changes in response to some change in the router.
   This can become important if one or more of the paths is flapping.
   For a description of disruption and how it affects protocols such as

Expires July 2000                                              [Page 3]

Draft                 Analysis of an ECMP Algorithm        February 2000

   TCP see [1].

   Some algorithms such as round-robin (i.e., upon receiving a packet
   the least recently used next-hop is chosen) are disruptive regardless
   of any change in the router.  Clearly this is not the case with hash-
   threshold.  As long as the region boundaries remain unchanged the
   same next-hop will be chosen for a given flow.

   Because we have required regions to be equal in size the only reason
   for a change in region boundaries is the addition or removal of a
   next-hop.  In this case the regions must all grow or shrink to fill
   the key space.  The analysis begins with some examples of this.

             |   1   |   2   |   3   |   4   |   5   |
             |    1    |    2    |    4    |    5    |

              Figure 1. Before and after deletion of region 3

   In figure 1. region 3 has been deleted.  The remaining regions grow
   equally and shift to compensate.  In this case 1/4 of region 2 is now
   in region 1, 1/2 (2/4) of region 3 is in region 2, 1/2 of region 3 is
   in region 4 and 1/4 of region 4 is in region 5.  Since each of the
   original regions represent 1/5 of the flows, the total disruption is
   1/5*(1/4 + 1/2 + 1/2 + 1/4) or 3/10.

   Note that the disruption to flows when adding a region is equivalent
   to that of removing a region.  That is, we are considering the
   fraction of total flows that changes regions when moving from N to
   N-1 regions, and that same fraction of flows will change when moving
   from N-1 to N regions.

Expires July 2000                                              [Page 4]

Draft                 Analysis of an ECMP Algorithm        February 2000

             |   1   |   2   |   3   |   4   |   5   |
             |    1    |    2    |    3    |    5    |

              Figure 2. Before and after deletion of region 4

   In figure 2. region 4 has been deleted.  Again the remaining regions
   grow equally and shift to compensate.  1/4 of region 2 is now in
   region 1, 1/2 of region 3 is in region 2, 3/4 of region 4 is in
   region 3 and 1/4 of region 4 is in region 5.  Since each of the
   original regions represent 1/5 of the flows the, total disruption is

   To generalize, upon removing a region K the remaining N-1 regions
   grow to fill the 1/N space.  This growth is evenly divided between
   the N-1 regions and so the change in size for each region is
   1/N/(N-1) or 1/(N(N-1)).  This change in size causes non-end regions
   to move.  The first region grows and so the second region is shifted
   towards K by the change in size of the first region. 1/(N(N-1)) of
   the flows from region 2 are subsumed by the change in region 1's
   size. 2/(N(N-1)) of the flows in region 3 are subsumed by region 2.
   This is because region 2 has shifted by 1/(N(N-1)) and grown by
   1/(N(N-1)).  This continues from both ends until you reach the
   regions that bordered K.  The calculation for the number of flows
   subsumed from the Kth region into the bordering regions accounts for
   the removal of the Kth region.  Thus we have the following equation.

                           K-1              N
                           ---    i        ---  (i-K)
             disruption =  \     ---    +  \     ---
                           /   (N)(N-1)    /   (N)(N-1)
                           ---             ---
                           i=1            i=K+1

   We can factor 1/((N)(N-1)) out as it is constant.

Expires July 2000                                              [Page 5]

Draft                 Analysis of an ECMP Algorithm        February 2000

                                /  K-1         N        \
                          1     |  ---        ---       |
                     =   ---    |  \    i  +  \   (i-K) |
                       (N)(N-1) |  /          /         |
                                \  ---        ---       /
                                     1        i=K+1

   We now use the the concrete formulas for the sum of integers.  The
   first summation is (K)(K-1)/2.  For the second summation notice that
   we are summing the integers from 1 to N-K, thus it is (N-K)(N-K+1)/2.

                             (K-1)(K) + (N-K)(N-K+1)
                           = -----------------------

   Considering the summations, one can see that the least disruption is
   when K is as close to half way between 1 and N as possible.  This can
   be proven by finding the minimum of the concrete formula for K
   holding N constant.  First break apart the quantities and collect.

                            2K*K - 2K - 2NK + N*N + N
                          = -------------------------

                             K*K - K - NK      N + 1
                          = --------------  + -------
                               (N)(N-1)        2(N-1)

   Since we are minimizing for K the right side (N+1)/2(N-1) is constant
   as is the denominator (N)(N-1) so we can drop them.  To minimize we
   take the derivative.

Expires July 2000                                              [Page 6]

Draft                 Analysis of an ECMP Algorithm        February 2000

                             -- (K*K - (N+1)K)

                             = 2K - (N+1)

   Which is zero when K is (N+1)/2.

   The last thing to consider is that K must be an integer.  When N is
   odd (N+1)/2 will yield an integer, however when N is even (N+1)/2
   yields an integer + 1/2.  In the case, because of symmetry, we get
   the least disruption when K is N/2 or N/2 + 1.

   Now since the formula is quadratic with a global minimum half way
   between 1 and N the maximum possible disruption must occur when edge
   regions (1 and N) are removed.  If K is 1 or N the formula reduces to

   The minimum possible disruption is obtained by letting K=(N+1)/2.  In
   this case the formula reduces to 1/4 + 1/(4*N).  So the range of
   possible disruption is (1/4, 1/2].

   To minimize disruption we recommend adding new regions to the center
   rather than the ends.

3.  Comparison to other algorithms

   Other algorithms exist to decide which next-hop to use.  These
   algorithms all have different performance and disruptive
   characteristics.  Of these algorithms we will only consider ones that
   are not disruptive by design (i.e., if no change to the set of next-
   hops occurs the path a flow takes remains the same).  This will
   exclude round-robin and random choice.  We will look at modulo-N and
   highest random weight.

   Modulo-N is a "simpler" form of hash-threshold.  Given N next-hops
   the packet header fields which describe the flow are run through a
   hash function. A final modulo-N is applied to the output of the hash.
   This result then directly maps to one of the next-hops.  Modulo-N is
   the most disruptive of the algorithms; if a next-hop is added or
   removed the disruption is (N-1)/N.  The performance of Modulo-N is

Expires July 2000                                              [Page 7]

Draft                 Analysis of an ECMP Algorithm        February 2000

   equivalent to hash-threshold.

   Highest random weight (HRW) is a comparative method similar in some
   ways to hash-threshold with non-fixed sized regions.  For each next-
   hop, the router seeds a pseudo-random number generator with the
   packet header fields which describe the flow and the next-hop to
   obtain a weight.  The next-hop which receives the highest weight is
   selected.  The advantage with using HRW is that it has minimal
   disruption (i.e., disruption due to adding or removing a next-hop is
   always 1/N.)  The disadvantage with HRW is that the next-hop
   selection is more expensive than hash-threshold.  A description of
   HRW along with comparisons to other methods can be found in [2].
   Although not used for next-hop calculation an example usage of HRW
   can be found in [3].

   Since each of modulo-N, hash-threshold and HRW require a hash on the
   packet header fields which define a flow, we can factor the
   performance of the hash out of the comparison.  If the hash can not
   be done inexpensively (e.g., in hardware) it too must be considered
   when using any of the above methods.

   The lookup performance for hash-threshold, like modulo-N is an
   optimal O(1). HRW's lookup performance is O(N).

   Disruptive behavior is the opposite of performance.  HRW is best with
   1/N.  Hash-threshold is between 1/4 and 1/2.  Finally Modulo-N is

   If the complexity of HRW's next-hop selection process is acceptable
   we think it should be considered as an alternative to hash-threshold.
   This could be the case when, for example, per-flow state is kept and
   thus the next-hop choice is made infrequently.

   However, when HRW's next-hop selection is seen as too expensive the
   obvious choice is hash-threshold as it performs as well as modulo-N
   and is less disruptive.

4.  Security Considerations

   This document is an analysis of an algorithm used to implement an
   ECMP routing decision.  This analysis does not directly affect the
   security of the Internet Infrastructure.

Expires July 2000                                              [Page 8]

Draft                 Analysis of an ECMP Algorithm        February 2000

5.  References

[1]  Thaler, D., and Hopps, C., "Multipath Issues in Unicast and
     Multicast", Work in progress, June 1999.

[2]  Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to
     Increase Hit Rates", IEEE/ACM Transactions on Networking, February

[3]  Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S.,
     Handley, M., Jacobson, V., Liu, C., Sharma, P., and L. Wei,
     "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol
     Specification", RFC 2362, June 1998.

6.  Author's Address

     Christian E. Hopps
     Merit Network
     4251 Plymouth Road, Suite C.
     Ann Arbor, MI  48105
     Phone: +1 734 936 0291
     EMail: chopps@merit.edu

7.  Full Copyright Statement

   Copyright (C) The Internet Society 2000. All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the  purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

Expires July 2000                                              [Page 9]

Draft                 Analysis of an ECMP Algorithm        February 2000

   This document and the information contained herein is provided on an

Expires July 2000                                             [Page 10]