Internet Engineering Task Force Dave Thaler
INTERNET-DRAFT Microsoft
Expires May 1999 15 November 1998
Multipath Issues in Unicast and Multicast
<draft-thaler-multipath-01.txt>
Status of this Memo
This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute working
documents as Internet Drafts.
Internet Drafts are valid for a maximum of six months and may be
updated, replaced, or obsoleted by other documents at any time. It is
inappropriate to use Internet Drafts as reference material or to cite
them other than as a "work in progress".
1. Introduction
Various routing protocols, including OSPF [1] and ISIS, explicitly allow
"Equal-Cost Multipath" routing. Some router implementations also allow
equal-cost multipath usage with RIP and other routing protocols. Using
equal-cost multipath means that if multiple equal-cost routes to the
same destination exist, they can be discovered and used to provide load
balancing among redundant paths.
The effect of multipath routing on a forwarder is that the forwarder
potentially has several next-hops for any given destination and must use
some method to choose which next-hop should be used for a given data
packet. This memo summarizes current practices, problems, and solutions.
2. Concerns
Several router implementations allow multipath forwarding. This is
sometimes done naively via round-robin, where each packet matching a
Expires May 1999 [Page 1]
Draft Multipath Issues November 1998
given destination route is forwarded using the subsequent next-hop, in a
round-robin fashion. This does provide a form of load balancing, but
there are several problems with a round-robin approach:
Variable Path MTU
Since each of the redundant paths may have a different MTU, this
means that the overall path MTU can change on a packet-by-packet
basis, negating the usefulness of path MTU discovery.
Variable Bandwidth
Since each of the redundant paths may have a different amount of
bandwidth available, bandwidth may also change on a packet-by-
packet basis. Rate-adaptive protocols such as TCP are designed to
optimize their performance to adapt to the available bandwidth.
Varying the bandwidth on a packet-by-packet basis causes problems
with TCP's congestion control mechanisms, resulting in much lower
throughputs.
Variable Latencies
Since each of the redundant paths may have a different latency
involved, having packets take separate paths can cause packets to
always arrive out of order, increasing delivery latency and
buffering requirements.
Debugging
Common debugging utilities such as ping and traceroute are much
less reliable in the presence of multiple paths and may even
present completely wrong results.
In multicast routing, the problem with multiple paths is that multicast
routing protocols prevent loops and duplicates by constructing a single
tree to all receivers of the same group address. Multicast routing
protocols deployed today (DVMRP, PIM-DM, PIM-SM) construct shortest-path
trees rooted at either the source, or another router known as a Core or
Rendezvous Point. Hence, the way they insure that duplicates will not
arise is that a given tree must use only a single next-hop towards the
root of the tree.
Expires May 1999 [Page 2]
Draft Multipath Issues November 1998
3. Requirements
All of the problems outlined in the previous section arise when packets
in the same unicast or multicast "flow" (or session) are split among
multiple paths. The natural solution is therefore to insure that
packets for the same flow always use the same path.
Two additional features are desirable:
Minimal disruption
When multipath is used, meaning that multiple routes contribute
valid next-hops, the chances are higher of routes being added and
deleted from consideration than when only the "best" route is used
(in which case metric changes in alternate routes have no effect on
traffic paths). Hence, it is desirable to minimize the number of
active flows affected by the addition or deletion of another next-
hop.
Fast implementation
The amount of additional computation required to forward a packet
must be as small as possible. For example, when doing round-robin,
this computation might consist of incrementing (modulo the number
of next-hops) a next-hop index.
4. Solutions
We now provide two possible methods for improving the performance of
multipath and then discuss their applicability to unicast and multicast
forwarding.
Modulo-N Hash
To select a next-hop from the list of N next-hops, the router
performs a modulo-N hash over the packet header fields that
identify a flow. This has the advantage of being fast, at the
expense of (N-1)/N of all flows changing paths whenever a next-hop
is added or removed.
Highest Random Weight (HRW)
The router uses a simple pseudo-random number function seeded with
the packet header fields that identify a flow, as well as a next-
Expires May 1999 [Page 3]
Draft Multipath Issues November 1998
hop identifier (address or index), to assign a weight to each of
the N next hops. The next-hop receiving the highest weight is
chosen as the next hop. This has the advantage of minimizing the
number of flows affected by a next-hop addition or deletion, but is
approximately N times as expensive as a modulo-N hash. An analysis
of various deterministic weight functions can be found in [3].
The applicability of these two alternatives depends on (at least) two
factors: whether the forwarder maintains per-flow state, and how
precious CPU is to a multipath forwarder.
If per-flow state is maintained in a multipath forwarder, then
computation of the next-hop can be done by the router at state creation
time. This entails no additional computations at packet forwarding time,
since the next-hop is precomputed. In this case, any method can be
used, including round-robin, random, modulo-N, or HRW. Hash functions
such as modulo-N and HRW are better if the forwarder state may be
deleted for any reason during the lifetime of a flow since subsequent
next-hop computations by the router will always select the same path.
This also improves the usefulness of debugging utilities such as
traceroute. Finally, to maximize the stability of paths (and hence the
usefulness of traceroute, etc.), the use of HRW is recommended.
If per-flow state is not maintained by the forwarder, then using
multiple next-hops requires that the next-hop be calculated at packet
arrival time. When CPU is more precious than stability of flow paths, a
simple modulo-N hash over all packet header fields identifying a flow is
recommended.
4.1. Unicast Forwarding
Depending on the implementation, unicast forwarding may or may not keep
per-flow state. We recommend that where forwarder implementations keep
flow state, routers should use HRW at state creation time (and next-hop
deletion time) to select the next-hop, and that forwarders without per-
flow state use a modulo-N hash over the source and destination
addresses.
4.2. Multicast Forwarding
Today's multicast forwarding engines use a cache of forwarding entries
indexed by group (or group prefix) and source (or source prefix). This
means that today's multicast forwarder's always keep per-flow state,
Expires May 1999 [Page 4]
Draft Multipath Issues November 1998
although for some multicast routing protocols, the "flow" may be fairly
coarse (e.g., traffic from all sources to the same destination). Since
per-flow state is kept by the forwarder, it is recommended that the
router always use HRW to select the next-hop.
Routers using explicit-joining protocols such as PIM-SM [4] should thus
use the multipath information when determining to which neighbor a join
message should be sent. For example, when multiple next-hops exist for
a given Rendezvous Point (RP) toward which a (*,G) Join should be sent,
it is recommended that HRW be used to select the next-hop to use for
each group.
5. Redundant Parallel Links
A related problem occurs when multiple parallel links are used between
the same pair of routers. A common solution is to bundle the two links
together into a "super"-link when is then used for routing. For
multicast forwarding, this results in the two links being reduced to a
single next-hop (over the combined link) which can be used to prevent
duplicates. When a unicast or multicast packet is queued to the
combined link, some method, such as those discussed earlier, is still
required to determine the physical link on which to transmit the packet.
If the parallel links are identical, then most of the concerns discussed
in this document are avoided with the combined link. The exception is
packet reordering, which can still occur with round-robin, adversely
affecting TCP.
6. Security Considerations
This document discusses issues with various methods of choosing a next-
hop from among multiple valid next-hops. As such, it does not directly
impact the security of the Internet infrastructure or its applications.
7. References
[1] Moy, J., "OSPF Version 2", RFC 2178, July 1997.
[2] Semeria, C., and T. Maufer, "Introduction to IP Multicast Routing",
draft-ietf-mboned-intro-multicast-03.txt, October 1997.
[3] Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to
Increase Hit Rates", IEEE/ACM Transactions on Networking, February
Expires May 1999 [Page 5]
Draft Multipath Issues November 1998
1998.
[4] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S.,
Handley, M., Jacobson, V., Liu, C., Sharma, P., and L. Wei,
"Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol
Specification", RFC 2362, June 1998.
8. Author's Address
Dave Thaler
Microsoft
One Microsoft Way
Redmond, WA 98052
Phone: +1 425 703 8835
EMail: dthaler@microsoft.com
Expires May 1999 [Page 6]