Internet Engineering Task Force                           Dave Thaler
INTERNET-DRAFT                                                  Merit
Expires July 1998                                     13 January 1997



               Multipath Issues in Unicast and Multicast
                  <draft-ietf-thaler-multipath-00.txt>





Status of this Memo

This document is an Internet Draft.  Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups.  Note that other groups may also distribute working
documents as Internet Drafts.

Internet Drafts are valid for a maximum of six months and may be
updated, replaced, or obsoleted by other documents at any time.  It is
inappropriate to use Internet Drafts as reference material or to cite
them other than as a "work in progress".


1.  Introduction

Various routing protocols, including OSPF [1] and ISIS, allow ''Equal-
Cost Multipath'' routing.  At least two vendors also allow equal-cost
multipath usage with RIP [2].  Using equal-cost multipath means that if
multiple equal-cost routes to the same destination exist, they can all
be discovered and used to provide load balancing among redundant paths.

The effects of multipath routing on a forwarder is that the forwarder
potentially has several next-hops for any given destination and must use
some method to choose which next-hop should be used for a given data
packet.   In this memo, we describe current practice, problems with
this, and potential solutions.











Expires July 1998                                               [Page 1]


Draft                          Multipath                    January 1997


2.  Concerns

At least two deployed router implementations allow multipath forwarding.
This is typically done via round-robin, where each packet matching a
given destination route is forwarded using the subsequent next-hop in a
round-robin fashion.  This approach does provide a form of load
balancing, but there are several problems with this approach:


Variable Path MTU
     Since each of the redundant paths may have a different MTU, this
     means that the overall path MTU can change on a packet-by-packet
     basis, negating the usefulness of path MTU discovery.


Variable Bandwidth
     Since each of the redundant paths may have a different bandwidth
     available, this may also change on a packet-by-packet basis.
     Rate-adaptive protocols such as TCP are designed to optimize their
     performance to adapt to the available bandwidth.  Varying this on a
     packet-by-packet basis causes problems with TCP's congestion
     control mechanisms, resulting in much lower throughputs.


Variable Latencies
     Since each of the redundant paths may have a different latency
     involved, having packets take separate paths can cause packets to
     always arrive out of order, increasing delivery latency and
     buffering requirements.


Debugging
     Common debugging utilities such as ping and traceroute are much
     less reliable in the presence of multiple paths and may even
     present completely wrong results.

In multicast routing, the problem with multiple paths is that loops and
duplicates are prevented by constructing a single tree to all receivers
of the same group address.  Multicast routing protocols available today
construct use shortest-path trees rooted at some point (either the
source address, or the address of another router known as a Core or
Rendezvous Point) [2].  Hence, the way they insure that duplicates will
not arise is that a given tree must use only a single next-hop towards
the root of the tree.






Expires July 1998                                               [Page 2]


Draft                          Multipath                    January 1997


3.  Requirements

All of the problems outlined in the previous section arise when packets
in the same (unicast or multicast) session are split among multiple
paths.  The natural solution is therefore to insure that packets for the
same session (or flow) always use the same path.

Two additional features are desirable:


Minimal disruption
     When multipath is used, meaning that multiple routes contribute
     valid next-hops, the chances are higher of routes being added and
     deleted from consideration than when only the "best" route is used
     (in which case metric changes in alternate routes have no effect on
     traffic paths).  Hence, it is desirable to minimize the number of
     sessions affected by the addition or deletion of another path.


Fast implementation
     The amount of additional computation required to forward a packet
     must be as small as possible.  For example, when doing round-robin,
     this computation might consist of incrementing (modulo the number
     of next-hops) a next-hop index.


4.  Solutions

We now provide two possible methods for improving the performance of
multipath and then discuss their applicability to unicast and multicast
forwarding.


Modulo-N Hash
     To select a next-hop from the list of N next-hops, the router
     performs a modulo-N hash over the IP header fields that identify a
     session.  This has the advantage of being fast, at the expense of
     (N-1)/N of all sessions changing paths whenever a next-hop is added
     or removed.


Highest Random Weight (HRW)
     The router uses a simple pseudo-random number function seeded with
     the IP header fields that identify a session, as well as a next-hop
     identifier (address or index) to assign a weight to each of the N





Expires July 1998                                               [Page 3]


Draft                          Multipath                    January 1997


     next hops. An analysis of various deterministic weight functions
     can be found in [3].  The next-hop receiving the highest weight is
     chosen as the next hop.  This has the advantage of minimizing the
     number of sessions affected by a next-hop addition or deletion, but
     is approximately N times as expensive as a modulo-N hash.

The applicability of these two alternatives depends on (at least) two
factors: whether the forwarder maintains per-flow state (or any finer
classification than "all unicast traffic" and "all multicast traffic"),
and how precious CPU is to a multipath forwarder.

If per-flow state is maintained in a multipath forwarder, then
computation of the next-hop can be done by the router at state creation
time. This entails no additional computations at packet forwarding time,
since the next-hop is precomputed.  In this case, any method can be
used, including round-robin, random, modulo-N, or HRW.  Hash functions
such as modulo-N and HRW are better if the forwarder state may be
deleted for any reason during the lifetime of a session since subsequent
next-hop computations by the router will always select the same path.
This also improves the usefulness of debugging utilities such as
traceroute.  Finally, to maximize the stability of paths (and hence the
usefulness of traceroute, etc.), we specifically recommend the use of
HRW.

If no state finer than "all packets" is maintained in the forwarder,
then using multiple next-hops requires that the next-hop be calculated
at packet arrival time.  When CPU is more precious than stability of
session paths, a simple modulo-N hash may be used.


4.1.  Unicast Forwarding

Depending on the implementation, unicast forwarding may or may keep
per-flow state.  We recommend that where forwarder implementations keep
flow state, routers should use HRW at state creation time (and next-hop
deletion time) to select next-hop, and that forwarders without per-flow
state use a modulo-N hash over the source and destination addresses.


4.2.  Multicast Forwarding

Multicast forwarding uses a cache of forwarding entries indexed by group
(or group prefix) and source (or source prefix).  This means that,
logically, a multicast forwarder always keeps per-"session" state,
although a "session" may be fairly coarse (e.g., traffic from all





Expires July 1998                                               [Page 4]


Draft                          Multipath                    January 1997


sources to the same destination), depending on the multicast routing
protocol in use. Since per-session state is kept by the forwarder, it is
recommended that the router always use HRW to select the next-hop.

Routers using explicit-joining protocols such as PIM-SM [4] should thus
use the multipath information when determining to which neighbor a join
message should be sent.  For example, when multiple next-hops exist for
a given Rendezvous Point (RP) toward which a (*,G) Join should be sent,
it is recommended that HRW be used to select the next-hop to use for
each group.

5.  References

[1]  Moy, J., "OSPF Version 2", RFC 2178, July 1997.

[2]  Semeria, C., and T. Maufer, "Introduction to IP Multicast Routing",
     draft-ietf-mboned-intro-multicast-03.txt, October 1997.

[3]  Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to
     Increase Hit Rates", IEEE/ACM Transactions on Networking, February
     1998.

[4]  Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S.,
     Handley, M., Jacobson, V., Liu, C., Sharma, P., and L. Wei,
     "Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol
     Specification", RFC 2117, June 1997.


6.  Security Considerations

Security issues are not discussed in this memo.


7.  Author's Address

     Dave Thaler
     Merit Network, Inc
     4251 Plymouth Rd., Suite C
     Ann Arbor, MI  48105-2785
     Phone: +1 313 647 4813
     EMail: thalerd@merit.net









Expires July 1998                                               [Page 5]