Inter-Domain Multicast Routing (IDMR) A. J. Ballardie
INTERNET-DRAFT University College London
February 9th, 1996
Core Based Trees (CBT) Multicast Architecture
<draft-ietf-idmr-cbt-arch-03.txt>
Status of this Memo
This document is an Internet Draft. Internet Drafts are working do-
cuments of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute work-
ing documents as Internet Drafts).
Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet
Drafts as reference material or to cite them other than as a "working
draft" or "work in progress."
Please check the I-D abstract listing contained in each Internet
Draft directory to learn the current status of this or any other In-
ternet Draft.
Abstract
CBT is a new multicast architecture that builds a single, shared
delivery tree. Traditional multicast algorithms build a source-based
tree per sender subnetwork, as does DVMRP, the protocol currently de-
ployed in the Internet's multicast backbone (MBONE) [1]. The primary
advantage of the shared tree approach is that it typically offers
more favourable scaling characteristics than do existing multicast
algorithms.
The CBT protocol [2] is a network layer multicast protocol that
builds a shared delivery tree. The CBT protocol also allows for the
incorporation of security features [2, 3], and the potential to
reserve network resources as part of delivery tree set-up [14].
The sending and receiving of multicast data by hosts on a subnetwork
conforms to the traditional IP multicast service model [4]; multicast
data on a subnetwork appears no different to that of existing
schemes.
Expires August 9th, 1996 [Page 1]
INTERNET-DRAFT CBT Multicast Architecture February 1996
CBT is progressing through the IDMR working group of the IETF. The
CBT protocol is described in an accompanying document [2]. For this,
and all IDMR-related documents, see http://www.cs.ucl.ac.uk/ietf/idmr
TABLE OF CONTENTS
1. Background................................................... 3
2. Introduction................................................. 4
3. Source Based Tree Algorithms &
the Motivation for Shared Trees.............................. 5
3.1 Distance-Vector Multicast Algorithm...................... 5
3.2 Link State Multicast Algorithm........................... 6
3.3 The Motivation for Shared Trees.......................... 7
4. CBT - The New Architecture................................... 8
4.1 Design Requirements...................................... 8
4.2 Architectural Overview................................... 11
4.3 Core Selection, Placement, and Management................ 13
5. Architectural Justification:
Comparisons & Simulation Results............................. 15
5.1 Traffic Concentration.................................... 19
5.2 Shared Trees and Load Balancing.......................... 19
6. Conclusions.................................................. 19
Acknowledgements................................................ 20
APPENDIX........................................................ 21
Expires August 9th, 1996 [Page 2]
INTERNET-DRAFT CBT Multicast Architecture February 1996
1. Background
Centre based forwarding was first described in the early 1980s by
Wall in his PhD thesis on broadcast and selective broadcast. At this
time, the benefits and uses of multicast were not fully understood.
It wasn't until much later that the IP multicast address space was
defined (class D space), and Deering defined the IP multicast service
model [4]. Deering's work was pioneering in that he invented algo-
rithms that allowed hosts to arbitrarily join a multicast group (IGMP
[5]), and enable a local multicast-capable router to become part of a
receiver-initiated wide-area delivery tree. The latter consituted
algorithms that build sourced-based delivery trees, with one delivery
tree per sender subnetwork. These algorithms are documented in [4].
The multicast-capable part of the Internet (MBONE [1]) primarily
implements an a protocol instance of the distance-vector multicast
algorithm [4] called DVMRP.
Now we have several years practical experience with multicast, and a
diversity of multicast applications, from shared workspace tools, to
audio- and video-conferencing tools, with new applications emerging
all the time (e.g. distributed video gaming), some of which have
wide-ranging multicast requirements. Furthermore, the MBONE has been
constantly growing since the first public audiocast (audio multicast)
of a 1992 IETF meeting [6] took place. Then, the MBONE intercon-
nected 40 subnets in 4 different countries. Statistics suggest that
the MBONE has doubled in size over the past eight months, and as of
July 1995 comprises over 2,500 subnetworks spanning 25 countries [1].
The obvious popularity and growth of multicast means that the scaling
aspects of wide-area multicasting cannot be overlooked; some predic-
tions talk of thousands of groups being present at any one time in
the Internet. Distributed Interactive Simulation (DIS) applications
[15] have real-time requirements (in terms of join latency, group
membership, group sender populations) that far exceed the require-
ments of most applications currently in use.
Scalability is measured in terms of network state maintenance,
bandwidth efficiency, and protocol overhead. Other factors that can
affect these parameters include sender set size, and wide-area dis-
tribution of group members.
Expires August 9th, 1996 [Page 3]
INTERNET-DRAFT CBT Multicast Architecture February 1996
2. Introduction
Deering's algorithms build source-based multicast delivery trees,
that is, the delivery tree is rooted at a multicast-capable router on
the subnetwork directly attached to a sender. The IGMP protocol [5]
operates between hosts and multicast router(s) on a subnetwork, ena-
bling multicast routers to monitor group presence on attached sub-
nets, so that on receipt of a multicast packet, a multicast router
knows over which interfaces group members are located.
Multicasting on the local subnetwork does not require either the
presence of a multicast router or the implementation of a multicast
algorithm; on most shared media (e.g. Ethernet), a host, which need
not necessarily be a member, simply sends a multicast data packet,
which is received by any member hosts. For multicasts to extend
beyond the scope of the local subnetwork, the subnet must have a
multicast-capable router attached, which itself is attached (possibly
virtually) to another multicast-capable router, and so on. The col-
lection of these (virtually) connected multicast routers forms the
Internet's MBONE [1]. Each multicast router must implement the same
multicast routing protocol to ensure full multicast connectivity
(footnote). For wide-area multicasting then, multicast routers "con-
spire" to get multicast data to the group's receivers, wherever they
be located. This is the job of the multicast routing algorithm.
Different algorithms use different techniques for establishing a dis-
tribution tree. If we classify these algorithms into source-based
tree algorithms and shared tree algorithms, we'll see that the dif-
ferent classes have considerably different scaling characteristics,
and the characteristics of the resulting trees differ too, for exam-
ple, average delay. Let's look at source-based tree algorithms first.
_________________________
Domains that deploy different multicast protocols may
be interconnected using a common multicast protocol at
domain boundaries (border routers). A special protocol
interface needs to be implemented in these border
routers.
Expires August 9th, 1996 [Page 4]
INTERNET-DRAFT CBT Multicast Architecture February 1996
3. Source-Based Tree Algorithms & the Motivation for Shared Trees
The strategy we'll use for motivating (CBT) shared tree multicast is
based, in part, in explaining the characteristics of source-based
tree multicast, in particular its scalability. We then summarize the
primary motivation for shared trees in section 3.3.
Source-based tree multicast algorithms are often referred to as
"dense-mode" algorithms; they assume that the receiver population
densely populates the domain of operation, and therefore the accom-
panying overhead (in terms of state, bandwidth usage, and/or process-
ing costs) is justified. Whilst this might be true of a local
environment, it is almost never true of the wide-area environment,
especially since wide-area group membership tends to be sparsely dis-
tributed throughout the Internet. There may be "pockets" of dense-
ness, but if one views the global picture, wide-area groups tend to
be sparsely distributed.
Source-based multicast trees are either built by a distance-vector
style algorithm, which may be implemented separately from the unicast
routing algorithm (as is the case with DVMRP), or the multicast tree
may be built using the information present in the underlying unicast
routing table (as is the case with PIM-DM [7]). The other algorithm
used for building source-based trees is the link-state algorithm (a
protocol instance being M-OSPF [8]).
3.1. Distance-Vector Multicast Algorithm
The distance-vector multicast algorithm builds a multicast delivery
tree using a variant of the Reverse-Path Forwarding technique [9].
The technique basically is as follows: when a multicast router
receives a multicast data packet, if the packet arrives on the inter-
face used to reach the source of the packet, the packet is forwarded
over all outgoing interfaces, except leaf subnets (footnote) with no
members attached. Otherwise the packet is discarded. This consti-
tutes a "broadcast & prune" approach. Subsequently, outgoing inter-
faces may be "pruned"; when a data packet reaches a leaf router, if
that router has no membership registered on its directly attached
subnetworks, the router may send a prune message one hop back towards
_________________________
A "leaf" subnet is one with no downstream routers at-
tached. A "leaf" router is one which no other router
uses to reach the source.
Expires August 9th, 1996 [Page 5]
INTERNET-DRAFT CBT Multicast Architecture February 1996
the source. The receiving router then checks its leaf subnets for
group membership, and checks whether it has received a prune from all
of its downstream routers. If the checks succeed, the router itself
can send a prune upstream over the interface leading to the source.
Thus, prune messages prune the tree branches not leading to group
members. Prune messages are stored per <source, group> pair, and are
subject to timeout, after which data flows over the branch again. If
there are still no members present, the pruning process repeats
itself. State that "times out" in this way is referred to as "soft
state".
It can be inferred from the above algorithm that multicast routers
must maintain state per group per active source. This source-specific
state manifests itself in the multicast forwarding table, where, for
each active source, the outgoing interface list is dependent on the
prunes received, and on the time since they were received (because
they time out).
As we said, most wide-area groups are likely to be sparsely distri-
buted. As a result, it is fair to assume that some potentially large
number of routers in the internetwork, i.e. those not leading to
downstream receivers, are incurred the cost of storing <source,
group> state.
The "broadcast & prune" nature of the distance-vector multicast algo-
rithm, the sparseness of receivers, combined with the fact that many
wide-area links have limited bandwidth, demonstrates that such an
algorithm cannot scale to a large internetwork that supports large
numbers of groups.
3.2. Link-State Multicast Algorithm
Routers implementing a link state algorithm periodically collect
reachability information to their directly attached neighbours, then
flood this throughout the routing domain in so-called link state
update packets. Deering extended the link state algorithm for multi-
casting by having a router additionally detect group membership
changes on its incident links before flooding this information in
link state packets.
Each router then, has a complete, up-to-date image of a domain's
topology and group membership. On receiving a multicast data packet,
each router uses its membership and topology information to calculate
a shortest-path tree rooted at the sender subnetwork. Provided the
Expires August 9th, 1996 [Page 6]
INTERNET-DRAFT CBT Multicast Architecture February 1996
calculating router falls within the computed tree, it forwards the
data packet over the interfaces defined by its calculation. Hence,
multicast data packets only ever traverse routers leading to members,
either directly attached, or further downstream. That is, the
delivery tree is a true multicast tree right from the start.
However, the flooding (reliable broadcasting) of group membership
information is the predominant factor preventing the link state mul-
ticast algorithm being applicable over the wide-area. The other lim-
iting factor is the processing cost of the Dijkstra calculation to
compute the shortest-path tree for each active source.
3.3. The Motivation for Shared Trees
The algorithms described in the previous sections clearly motivate
the need for a multicast algorithm(s) that is more scalable. CBT was
designed primarily to address the topic of scalability; a shared tree
architecture offers an improvement in scalability over source tree
architectures by a factor of the number of active sources (where
source is a subnetwork aggregate). Source trees scale O(S * G), since
all sources use the same shared tree; shared trees eliminate the
source (S) scaling factor, and hence a shared tree scales O(G). The
implication of this is that applications with many active senders,
such as distributed interactive simulation applications, and distri-
buted video-gaming (where most receivers are also senders), scale
considerably better if shared trees are used.
In the table below we compare the amount of state required by CBT and
DVMRP for different group sizes with different numbers of active
sources:
Expires August 9th, 1996 [Page 7]
INTERNET-DRAFT CBT Multicast Architecture February 1996
|--------------|---------------------------------------------------|
| Number of | | | |
| groups | 10 | 100 | 1000 |
====================================================================
| Group size | | | |
| (# members) | 20 | 40 | 60 |
-------------------------------------------------------------------|
| No. of srcs | | | | | | | | | |
| per group |10% | 50% |100% |10% | 50% |100% |10% | 50% | 100% |
--------------------------------------------------------------------
| No. of DVMRP | | | | | | | | | |
| router | | | | | | | | | |
| entries | 20 | 100 | 200 |400 | 2K | 4K | 6K | 30K | 60K |
--------------------------------------------------------------------
| No. of CBT | | | |
| router | | | |
| entries | 10 | 100 | 1000 |
|------------------------------------------------------------------|
Figure 1: Comparison of DVMRP and CBT Router State
There are also additional significant bandwidth and state savings
with shared trees in contrast to source trees; firstly, the tree only
spans a group's receivers (including links/routers leading to
receivers) -- there is no cost to routers/links in other parts of the
network. Secondly, routers between a non-member sender and the
delivery tree are not incurred any cost pertaining to multicast, and
indeed, these routers need not even be multicast-capable -- packets
from non-member senders are encapsulated and unicast towards a core
on the tree.
4. CBT - The New Architecture
4.1. Design Requirements
The CBT shared tree design was geared towards several design objec-
tives:
Expires August 9th, 1996 [Page 8]
INTERNET-DRAFT CBT Multicast Architecture February 1996
+ scalability
+ routing protocol independence
+ robustness
+ interoperability
As we have mentioned, a shared multicast tree improves scalability by
an order of magnitude.
A facet of existing source tree algorithms is that they are all tied
inextricably, one way or another, to a particular routing protocol.
For example, DVMRP is based heavily on RIP, and relies for its
correct operation on some of the features of RIP (e.g. poison
reverse). Similarly, with M-OSPF.
With the multicast infrastructure fast converging on the unicast
infrastructure, which is heterogeneous in nature, the extent to which
a particular multicast algorithm can be deployed is severely limited
if deployment is dependent on the presence of a particular unicast
algorithm. Therefore, it makes good sense to separate out these
dependencies from the multicast algorithm; this is exactly what CBT
does, as does PIM [7]. The CBT protocol makes use of the underlying
unicast routing table when building its delivery tree, independent of
how the unicast table(s) is computed.
Source-based tree algorithms are clearly robust; a sender simply
sends its data, and intervening routers "conspire" to get the data
where it needs to, creating state along the way. This is the so-
called "data driven" approach -- there is no set-up protocol
involved.
It is not as easy to achieve the same degree of robustness in shared
tree algorithms, since there is network entity involved which is the
focal point of the tree, which effectively, keeps a shared tree fully
connected. This entity is just a pre-nominated router, to which all
receivers (their directly-attached routers) must explicitly join. In
actual fact, any shared tree is likely to have several such so-called
"cores", or "rendevous points (RPs)" to increase robustness.
CBT and PIM diverge in their approaches to robustness; PIM attempts
to maintain a data-driven approach, with only one RP active at any
one time. In CBT, all cores are active. PIM's desire to emulate the
robustness of source trees comes at a cost, especially in terms of
Expires August 9th, 1996 [Page 9]
INTERNET-DRAFT CBT Multicast Architecture February 1996
protocol complexity, and an overall increased bandwidth requirement
[10]. CBT, on the other hand, adopts the "hard state" approach,
whereby a tree branch is created reflecting underlying unicast at the
instant it is created; thereafter, the same tree branch does not
necessarily change to reflect underlying unicast changes. This has
positive implications for the case where a branch is created together
with a resource reservation. The reservation remains fixed unless a
neighbouring link/router fails, in which case there is no option but
to rejoin the tree by means of explicit protocol, and request the
resources again. However, the router to which a branch is rejoined
may not be able to honour the same reservation request.
CBT branches are torn down by means of explicit protocol messages,
whereas PIM branches time out. PIM incurs protocol complexity to
manage its various timers (to keep state where it is required), and
protocol "refresh" messages consume bandwidth. CBT implements a
"keepalive" mechanism between adjacent routers on a tree; these are
CBT's only periodic tree maintenance messages, and they are aggre-
gated such that there is one per link, not one per group per link.
Both protocols reduce the frequency of tree maintenance messages as
the number of messages increases.
A comparison of protocol costs/state/scalability etc. is summarised
in section 5, and is presented in detail in [10].
With regards to interoperability, the type of multicast delivery tree
interconnecting member hosts (subnets) over the wide-area is tran-
sparent to those hosts; hosts send/receive multicast data in tradi-
tional IP format, and hosts are not involved explicitly in any way
with tree set-up. This is the sole responsibility of local multicast
routers.
CBT also accommodates interoperability between "clouds" or domains
operating different multicast protocols. It is expected that intero-
perability between different multicast protocols only be relevant at
domain (or region, or "cloud") boundaries; inside a particular domain
only a single multicast protocol is expected to be present. One
method of how different trees can be "spliced" together at a domain
boundary is presented in [11].
Homogeneous clouds, as described in the previous paragraph, means
that multicast data can travel in what we call "native mode", i.e.
no encapsulation is required that keeps data, from different groups
using different multicast protocols, separate. CBT also specifies a
"CBT mode", where a particular cloud is heterogeneous, i.e. multiple
Expires August 9th, 1996 [Page 10]
INTERNET-DRAFT CBT Multicast Architecture February 1996
multicast protocols are present possibly on the same subnet; CBT mode
data is encapsulated with a CBT header to distinguish it from any
other multicast traffic, for example, traffic that is using a source
based tree. These two "modes" are described in the CBT specification
document [2].
4.2. Architectural Overview
Shared trees were first described by Wall in his investigation into
low-delay approaches to broadcast and selective broadcast [12]. Wall
concluded that delay will not be minimal, as with shortest-path
trees, but the delay can be kept within bounds that may be accept-
able. Simulations have recently been carried out to compare various
scaling characteristics of centre-based and shortest-path trees. A
summary of these simulations can be found in section 5, and the
details are presented in [10].
A CBT shared tree involves having a small set of pre-nominated cores
(routers), to which routers connected to member hosts join by means
of an explicit protocol control message. Any core is eligible for
joining, although only a single core is targetted by a join message.
On receipt of a join, the core router replies with an acknowledge-
ment, which traverses the reverse-path of the corresponding join (the
join sets up transient state along the way). As such, tree branches
are created, and parent-child relationships set up between adjacent
CBT routers on the tree, the parent being the router nearer the core.
When the ack is received, the router creates a CBT forwarding infor-
mation base (FIB) entry, listing interfaces corresponding to a par-
ticular group. Due to the way the tree is formed, each tree link is
symmetric, and the tree reflects an undirected graph.
Tree branches are made up of so-called non-core routers, which form a
shortest forward path between a member-host's directly attached
router, and the core. A router at the end of a branch is known as a
leaf router on the tree.
A diagram showing a single-core CBT tree is shown in the figure
below. Only one core is shown to demonstrate the principle.
Expires August 9th, 1996 [Page 11]
INTERNET-DRAFT CBT Multicast Architecture February 1996
b b b-----b
\ | |
\ | |
b---b b------b
/ \ / KEY....
/ \/
b X---b-----b X = Core
/ \ b = non-core router
/ \
/ \
b b------b
/ \ |
/ \ |
b b b
Figure 2: Single-Core CBT Tree
Join messages do not necessarily travel all the way to the core
before being acknowledged; if a join message hits a router on the
tree (on-tree router) on its journey towards a core, that on-tree
router terminates the join and acknowledges it. A router that is
itself in the process of joining the tree (i.e. one that has not yet
received an ack itself) can terminate and cache any subsequent
received joins, but cannot acknowledge them until its own join has
been successfully acknowledged.
For simplicity, part of the CBT protocol [2] is data driven; one of a
set of cores for a group is elected (by a group initiator) as the
PRIMARY core, all others are termed SECONDARY cores. The ordering of
the secondary cores is irrelevant, however, the primary must be uni-
form across the whole group. Whenever a secondary core is joined, it
first acks the join then proceeds to join the primary core -- the
primary core simply listens for incoming joins, it need never send a
join itself. In this way, a CBT tree becomes fully connected in
response to members joining.
The tree joining process is triggered when a subnet's CBT-capable
router receives an IGMP group membership report [5]. If more than one
CBT router is present on a subnetwork, only one router, the subnet's
designated router (DR), generates a join message. A subnet's DR is
implicitly elected (i.e. no CBT protocol involvement) as a "side-
effect" of IGMP.
Expires August 9th, 1996 [Page 12]
INTERNET-DRAFT CBT Multicast Architecture February 1996
CBT builds a loop-free shared tree. In some scenarios it is necessary
to take explicit precautions against loops, in others it is not. For
example, a loop cannot form between a router that wishes to join the
tree, if that router has no child tree branches (interfaces). There
is a potential for loops if a joining router has at least one child
attached. This scenario would constitute a rejoin. Once the rejoining
router has received an ack, it must generate a loop detection which
is sent over its parent interface. Receiving routers must forward
this packet over their parent interface only, unless it is received
by its originator, in which case a loop is present, or it is recieved
by the primary core, in which case no loop is present. In the former
case, the loop is broken by the originator sending a quit packet to
its new parent, and the latter case solicits an ack from the primary
core, confirming the absence of a loop.
Tree maintenance takes place between adjacent on-tree routers in the
form of protocol "keepalive" messages. Only one of these need be sent
per link (not per group), and this message is sent in the direction
child -> parent. The parent sends a corresponding acknowledgement.
This is how tree connectivity is monitored, and breakages/failures
detected.
When a CBT router receives a multicast data packet, it simply for-
wards it over all outgoing interfaces, as specified in its FIB entry
for the group. Protocol mechanisms help ensure that data packets
never loop.
The CBT protocol is simple and lightweight. The CBT protocol specifi-
cation is described in an separate document [2].
4.3. Core Selection, Placement, and Management
The issues of core (RP) selection, placement, and management are
still under review, although several recent advancements have been
made [13].
Until recently, it was thought that hosts would need to be explicitly
involved in discovering core addresses corresponding to a particular
group. This would require host system changes, and modifications to
applications, such that an application could request the host to
establish a <core, group> mapping for a given group, as well as some
sort of core advertisement protocol in which hosts and core routers
Expires August 9th, 1996 [Page 13]
INTERNET-DRAFT CBT Multicast Architecture February 1996
participate.
A dependency on host or application involvement with core (RP)
discovery is therefore very undesirable. Indeed, the type of multi-
cast tree to be joined should be invisible to hosts (and even appli-
cations).
A new proposal has recently emerged called Hierarchical PIM [13],
which proposes having a multi-level hierarchy of RPs (cores), but
which are known only to multicast routers; cores (RPs) remain invisi-
ble to hosts. Whilst its name suggests it is specific to PIM, its
principles are not.
Each level of hierarchy corresponds to a multicast "scope level",
with a certain multicast address range corresponding to each scope.
This therefore, requires the multicast address space to be officially
"administratively scoped"; a group initiator chooses a multicast
address based on the perceived scope of the group. On receipt of an
IGMP group membership report from a host, a local router (the
lowest-level of the core (RP) hierarchy) knows, based on the group
address, whether it has to join a core at the next level up in the
hierarchy; each candidate RP at a particular level advertises itself
within its own level (using a well-known scoped multicast address),
and to the level below. Hence, an RP should always know how to reach
an RP at the next level up in the hierarchy. A next-level RP is
chosen by using a hash function across the group address.
The lower the hierarchy level, the more densely RPs populate that
level. However, at the top level (for globally-scoped groups) it is
expected there will be sufficient RP distribution so that shared tree
paths (i.e. delay) is reasonably efficient.
It is expected that there will probably be six or seven levels of
hierarchy (local, region, country, continent, etc.), with the candi-
date RPs changing relatively infrequently (say every 24 hrs), with
the candidate RP announcements being made more frequently, e.g.
every few minutes.
The finer details of this scheme have still to be worked out, but due
to the significant advantage of being host-transparent, it is highly
likely that this RP/Core selection/placement/management scheme will
be adopted (footnote).
_________________________
This work is currently progressing under the auspices
of the IDMR working group of the IETF.
Expires August 9th, 1996 [Page 14]
INTERNET-DRAFT CBT Multicast Architecture February 1996
5. Architectural Justification: Comparisons & Simulation Results
This majority of this section summarises the results of in-depth
simulations carried out by Harris Corp., USA, investigating the per-
formance and resource cost comparisons of multicast algorithms for
distributed interactive simulation applications [10]. More pre-
cisely, the report summarises the work on the Real-time Information
Transfer & Networking (RITN) program, comparing the cost and perfor-
mance of PIM [7] and CBT [2] in a DIS environment. As we said ear-
lier, DIS applications have wide-ranging requirements. We feel it is
important to take into account a wide range of requirements so that
future applications can be accommodated with ease; all other multi-
cast architectures are tailored to the requirements of applications
in the current Internet, particularly audio and video applications.
Figure 3 shows a comparison of application requirements [10].
We also present results into the study of whether source-based trees
or shared trees are the best choice for multicasting in the RITN pro-
gram. In the study of shortest-path trees (SPTs) vs. shared trees,
PIM Dense-Mode and PIM-SM with SPTs were used as SPTs, with CBT and
PIM-SM used as shared trees. This section assumes the reader is fami-
liar with the different modes of PIM [7].
Expires August 9th, 1996 [Page 15]
INTERNET-DRAFT CBT Multicast Architecture February 1996
|--------------|----------------------------------------------------|
| | Paradigm |
| |----------------------------------------------------|
| Requirement | | audio/video | audio/video |
| | DIS | lecture dist'n | conference |
=====================================================================
| senders | many | small number | small number |
---------------------------------------------------------------------
| receivers |also senders | many recvrs | also senders |
---------------------------------------------------------------------
| no. of grps | | | |
| per appl'n | many | one or few | one or few |
---------------------------------------------------------------------
| Data tx | real time | real time | real time |
---------------------------------------------------------------------
| e2e delay | small | non-critical | moderate |
---------------------------------------------------------------------
| set up | real time | non real time | non real time |
---------------------------------------------------------------------
| join/leave | rapid for | rapid for | can be rapid for |
| dynamics | participants| receivers | participants |
---------------------------------------------------------------------
| | must be | must be scalable| |
| scalability | scalable to | to large n/ws | need scalability |
| | large n/ws &| and large nos | large n/ws |
| | large grps, | of recvrs per | |
| | with large | group | |
| | nos senders | | |
| | & recvrs per| | |
| | group | | |
---------------------------------------------------------------------
| | based upon | | |
| multicast | the DIS | rooted on src | incl participants |
| tree | virtual | and includes | and can slowly move|
| | space, with | current recvrs | over phys topology |
| | rapid mvmt | | |
| | over phys | | |
| | topology | | |
---------------------------------------------------------------------
Figure 3: Comparison of Multicast Requirements
Expires August 9th, 1996 [Page 16]
INTERNET-DRAFT CBT Multicast Architecture February 1996
The following criteria were considered in the simulations:
+ end-to-end delay
+ group join time
+ scalability (all participants are both senders & receivers)
+ bandwidth utilization
+ overhead traffic
+ protocol complexity
A brief summary of the the results of the evaluation follow. For a
detailed description of the simulations and test environments, refer
to [10].
+ End-to-end delay. It was shown that PIM-DM and PIM-SM with SPTs
deliver packets between 13% and 31% faster than CBT. PIM-SM has
about the same delay characteristics as CBT. Processing delay
was not taken into account here, and this stands in PIM's
favour, since PIM routers are likely to have much larger routing
tables, and thus, much greater search times.
+ Join time. There was very little difference between any of the
algorithms.
+ Bandwidth efficiency. It was shown that PIM-SM with shared
trees, and PIM-SM with SPTs both required only about 4% more
bandwidth in total, to deliver data to hosts. PIM-DM however, is
very bandwidth inefficient, but this improves greatly as the
network becomes densely populated with group receivers.
+ Overhead comparisons. CBT exhibited the lowest overhead percen-
tage, even less than PIM-SM with shared trees. PIM-DM was shown
to have more than double the overhead of PIM-SM with SPTs due to
the periodic broadcasting & pruning.
The Harris simulations [10] measured the average number of rout-
ing table entries required at each router for CBT, PIM-DM, PIM-
SM with SPTs, and PIM-SM with shared trees. The following param-
eters were used in the simulations:
+ N = number of active multicast groups in the network.
Expires August 9th, 1996 [Page 17]
INTERNET-DRAFT CBT Multicast Architecture February 1996
+ n = average number of senders to a group.
+ b = fraction of groups moving to source tree in PIM-SM.
+ c = average percentage of routers on a shared tree for a
group (or source tree for a particular sender).
+ s = average percentage of routers on a source-based tree for
a group (but not on shared tree).
+ d = average number of hops from a host to the RP.
+ r = total number of routers in the network.
The following results were calculated, given b = 1, c = 0.5,
s = 0.25, and d/r = 0.05.
|--------------|----------------------------------------------------|
| | Group size parameters |
| |----------------------------------------------------|
| Protocol | N = 1000 | N = 1000 | N = 20,000 | N = 20,000 |
| | n = 10 | n = 200 | n = 10 | n = 200 |
=====================================================================
| | | | | |
| CBT | 500 | 500 | 10,000 | 10,000 |
---------------------------------------------------------------------
| | | | | |
| PIM-Dense | 10,000 | 200,000 | 200,000 | 4,000,000 |
---------------------------------------------------------------------
| PIM-Sparse | | | | |
| with SPT | 8000 | 150,500 | 160,000 | 3,010,000 |
---------------------------------------------------------------------
| PIM-Sparse, | | | | |
| shared tree | 1000 | 1,500 | 20,000 | 210,000 |
---------------------------------------------------------------------
Figure 4: Comparison of Router State Requirements
+ Complexity comparisons. Protocol complexity, protocol traffic
overhead, and routing table size were considered here. CBT was
found to be considerably simpler than all other schemes, on all
counts (footnote).
_________________________
The comparisons carried out were subjective.
Expires August 9th, 1996 [Page 18]
INTERNET-DRAFT CBT Multicast Architecture February 1996
5.1. Traffic Concentration
One of the criticisms leveled at shared trees is that traffic is
aggregated onto a smaller subset of network links, than with source
tree protocols.
It has been shown that if shared trees, spanning a large number of
receivers, with some (not insignificant proportion of these being
also senders), if such a shared tree has only a small number of cores
(RPs), traffic concentration is a problem on the core incident links,
and for the core itself. The Harris simulation results [10] suggest
increasing the number of cores as group size increases, thereby
largely eliminating the problem.
5.2. Shared Trees and Load Balancing
An important question raised in the SPT vs. CBT debate is: how effec-
tively can load sharing be achieved by the different schemes? The
scope of ability for DVMRP to do load-sharing is limited primarily by
its forwarding algorithm (RPF); this allows for only single-path
routing.
With shared tree schemes however, each receiver can choose which of
the small selection of cores it wishes to join. Cores and on-tree
nodes can be configured to accept only a certain number of joins,
forcing a receiver to join via a different path. This flexibility
gives shared tree schemes the ability to achieve load balancing.
In general, spread over all groups, CBT has the ability to randomize
the group set over different trees (spanning different links around
the centre of the network), something that would not seem possible
under SPT schemes.
6. Conclusions
In comparing CBT with the other shared tree architecture, PIM, CBT
was found to be more favourable in terms of scalability, complexity,
and overhead. Other characteristics were found to be very similar.
When comparing SPTs with shared trees, we find that the end-to-end
delays of shared trees are usually acceptable, and can be improved by
strategic core placement. Routing table size is another important
factor in support of shared trees, as figures 1 and 4 clearly illus-
trate.
Expires August 9th, 1996 [Page 19]
INTERNET-DRAFT CBT Multicast Architecture February 1996
Shared trees also have more potential to load balance traffic across
different links or trees. In the absence of multi-path multicast
routing, this does not seem possible with source-based tree tech-
niques.
Acknowledgements
Special thanks goes to Paul Francis, NTT Japan, for the original
brainstorming sessions that brought about this work.
Thanks also to Bibb Cain et al. (Harris Corporation) for allowing the
publication of their simulation results, and the duplication of fig-
ures 3 and 4.
Ken Carlberg (SAIC) reviewed the text, and provided useful feedback
and suggestions.
I would also like to thank the participants of the IETF IDMR working
group meetings for their general constructive comments and sugges-
tions since the inception of CBT.
Expires August 9th, 1996 [Page 20]
INTERNET-DRAFT CBT Multicast Architecture February 1996
APPENDIX
The following formulae were used by Harris Corp. in calculating the
values in figure 4. The meaning of the formulae arguments precedes
figure 4.
+ average no. of entries in each PIM-DM router is approximated by:
T(PIM-DM) ~ N * n
+ average no. of entries a router maintains is the likelihood, c,
that the router will be on a shared tree, times the total
number, N, of shared trees:
T(CBT) = N * c
+ average no. of entries a router maintains due to each src based
tree is the average no. of hops, d, from a host to the RP,
divided by the number of routers, r, in the network:
T(PIM-SM, shared tree) = N * c + N * n * d/r
+ average no. of routing table entries for PIM-SM with some groups
setting up source-based trees:
T(PIM, SPT) = N * [B * n + 1] * c + b * n * s
For more detailed information on these formulae, refer to [10].
References
[1] MBONE, The Multicast BackbONE; M. Macedonia and D. Brutzman;
available from http://www.cs.ucl.ac.uk/mice/mbone_review.html.
[2] Core Based Trees (CBT) Multicast: Protocol Specification; A. J.
Ballardie, N. Jain, and S. Reeve;
ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-cbt-spec-04.txt.
[3] A. J. Ballardie. Scalable Multicast Key Distribution
(ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-mkd-01.{ps,txt}). Work-
ing draft, 1995.
[4] DVMRP. Described in "Multicast Routing in a Datagram Internet-
work", S. Deering, PhD Thesis, 1990. Available via anonymous ftp from:
Expires August 9th, 1996 [Page 21]
INTERNET-DRAFT CBT Multicast Architecture February 1996
gregorio.stanford.edu:vmtp/sd-thesis.ps.
[5] W. Fenner. Internet Group Management Protocol, version 2 (IGMPv2),
(draft-idmr-igmp-v2-01.txt). Working draft.
[6] First IETF Internet Audiocast; S. Casner, S. Deering; In proceed-
ings of ACM Sigcomm'92.
[7] D. Estrin et al. USC/ISI, Work in progress.
(http://netweb.usc.edu/pim/).
[8] J. Moy. Multicast Routing Extensions to OSPF. Communications of
the ACM, 37(8): 61-66, August 1994.
[9] Y.K. Dalal and R.M. Metcalfe. Reverse path forwarding of broad-
cast packets. Communications of the ACM, 21(12):1040--1048, 1978.
[10] T. Billhartz, J. Bibb Cain, E. Farrey-Goudreau, and D. Feig.
Performance and Resource Cost Comparisons of Multicast Routing Algo-
rithms for Distributed Interactive Simulation Applications; a report
prepared for NRL ****, July 1995.
[11] A. J. Ballardie. Defining a Level-1/Level-2 Interface in the
Presence of Multiple Protocols. Working draft, January 1996.
ftp://cs.ucl.ac.uk/darpa/IDMR/draft-ietf-idmr-hier-intf-00.txt
[12] D. Wall. Mechanisms for Broadcast and Selective Broadcast. PhD
thesis, Stanford University, June 1980. Technical Report #90.
[13] M. Handley, J. Crowcroft, I. Wakeman. Hierarchical Rendezvous
Point proposal, work in progress.
(http://www.cs.ucl.ac.uk/staff/M.Handley/hpim.ps) and
(ftp://cs.ucl.ac.uk/darpa/IDMR/IETF-DEC95/hpim-slides.ps).
[14] A. J. Ballardie. "A New Approach to Multicast Communication in a
Datagram Internetwork", PhD Thesis, 1995. Available via anonymous ftp
from: cs.ucl.ac.uk:darpa/IDMR/ballardie-thesis.ps.Z.
[15] D. J. Hook, J. 0. Calvin, M. K. Newton, D. A. Fuscio. "An
Approach to DIS Scalability", Eleventh DIS Workshop, Orlando, Florida,
Sept. 26th-30th, 1994, pp 94-141.
Expires August 9th, 1996 [Page 22]
INTERNET-DRAFT CBT Multicast Architecture February 1996
Author's Address:
Tony Ballardie,
Department of Computer Science,
University College London,
Gower Street,
London, WC1E 6BT,
ENGLAND, U.K.
Tel: ++44 (0)71 419 3462
e-mail: A.Ballardie@cs.ucl.ac.uk
Expires August 9th, 1996 [Page 23]