draft-ietf-idmr-igmp-sparse-00

IDMR Working Group                                     Steve Deering
INTERNET-DRAFT                                            Xerox PARC
Expires April 1994                                    Deborah Estrin
<draft-ietf-idmr-igmp-sparse-00.txt>                         USC/ISI
                                                      Dino Farinacci
                                                       cisco Systems
                                                        Van Jacobson
                                                                 LBL
                                                    October 18, 1993


    IGMP Router Extensions for Routing to Sparse Multicast-Groups


Status of this Memo

   This document is an Internet Draft.  Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress."

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.


**************************************************************************
*********************P*L*E*A*S*E*****R*E*A*D******************************

***October 18th version solves the problem of deadtimes between time when
receiver DR sets up S,G and when the associated join is processed all the
way upstream (the problem was that as soon as S,G is set up then packets
arriving on *,G would get dropped because the incoming interface does not
match). The fix is that you associate a bit with the S,G entry and when
it is cleared, packets that do not match the incoming interface are
checked against *,G before being dropped. If they match *,G, they are
forwarded accordingly. Once a router sees a packet that matches S,G (both
longest match for source and incoming interface) it sets the bit
associated with S,G and from then on data packets must match the incoming
interface for S,G or be dropped.  A DR waits to send a prune up the
RP,tree until the SPT bit for the S,G entry is set.

***DISCLAIMER: In my usual rush to get a revised version of this document
to the IDMR list, Deering, Farinacci, and Jacobson did not get the chance to
review the recent detailed  changes to the document. I will be out of
town and largely off-the-net from October 18 until IETF. Please send
comments to the list anyway and hopefully someone will be able to respond
in a timely manner. At the very least, we will collect them at IETF and
respond to them in real time.

***At the Amsterdam IETF we thought that we could eliminate S,G state on
the shared tree. After chasing our tails for a while we realized that in
order to maintain our ability to do an incoming interface check on data
packets, to detect multicast loops, we needed S,G specific state "offtree" of the
shared tree (to use CBT terminology).  However, to deal with sparse
traffic and with data packets sent before the S,G state is established we
support encapsulation of data packets in registers so that it is possible
that the S,G state might never be set up if it is not deemed necessary
(data packets are sporadic and few) and that at least data packets do not
need to be dropped (or buffered) until the S,G state is set up.

***This brings up an important point: we believe it is critical to do
incoming interface checks on all multicast data packets because of the
potentially severe consequences of looping multicast packets. Any
multicast protocol that we design should have this capability.

***The biggest open issue with respect to this scheme is the need to
introduce an aggregation mechanism for S,G state and messages. Van has
proposed something called proxy. It looks doubtful that something will
get written up by IETF, but it will be the primary agenda item
immediately after.

***The major changes to this document since the last release are that we
1. added back the S,G state upstream of the RP on the shared tree.
2. added a mechanism to disambiguate which RP is being used in the
multiple RP case (see additional check added to ESL-Join  processing
and additional check added to RP-reachability message processing).
3. Data packets travel encapsulated in register packets until the state
is established for S,G (e.g. the RPs' join propagates upstream to the
first hop router).
4. Register packets are sent unicast to the RP, which sets up S,G and sends
joins upstream towards S to set up S,G state between the source and RP.
5. All *,G entries have as their incoming interface the interface used to
reach the RP.
6. Added option for Register and RP-reachability messages to carry
source, mask information downstream so that S,G entries can be set up
with appropriate (subnet) mask information.


***Note:  The name ESL is not perfect; at least I (DE) do not think so.
This is particularly true in light of the dense mode scheme which will
really be part of the same protocol. But to avoid changing the name
repeatedly we are sticking with ESL and down the road when the
protocols themselves stabilize, we will reopen the question of names...
**************************************************************************

1. Introduction and Motivation

This document describes a mechanism for efficiently routing to sparse
multicast groups that span wide-area (and inter-domain) internets. The
implementation of our approach is based on extensions to IGMP[RFC1112]
and makes use of explicit source lists. We refer to the scheme as ESL.

The mechanism proposed here complements existing multicast routing
mechanisms such as those implemented in MOSPF and DVMRP. These traditional
multicast schemes are well suited for use within local or wide area regions
where a group is widely represented. However, when group members, and
senders to those group members, are distributed SPARSELY across a wide
area, these schemes are not efficient; data packets (in the case of DVMRP)
or membership report information (in the case of MOSPF) are sent over many
links that do NOT lead to receivers or senders, respectively. The Explicit
Source List (ESL) extensions to IGMP, proposed here, efficiently establish
multicast distribution trees to reach members of a multicast group in
regions where members are NOT densely represented.

Stated simply, when group members densely populate the internet, it is
efficient to assume that most networks or subnetworks contain members,
and to prune off the exception networks explicitly. In contrast, when
group members are sparse, it is efficient to assume that most networks or
subnetworks do NOT contain members, and to join on the exception networks
explicitly. Thus the basis of the scheme described in this document is an
explicit joining mechanism. In another document, in preparation, we will
describe a companion multicast routing protocol for dense groups that is
based  on implicit joining and explicit pruning.

The Core Based Tree (CBT) protocol supports sparse multicast groups
with a shared (center-based) tree.[CBT] In contrast, the ESL protocol
proposed here supports both center and source-specific distribution trees
in order to provide  higher quality data distribution, when needed.

In the remainder of this section we enumerate our design goals and then
present necessary background on existing Reverse Path Multicasting (RPM)
mechanisms. Section 2 summarizes basic protocol operation and limitations.
Section 3 describes the protocol in detail (including packet formats).
Section 4 addresses robustness and Sections 5 and 6 address interoperation
with networks that do not implement ESL-multicast. Section 7 briefly
compares our approach to CBT[cbt93] and addresses several open design
issues.

1.1 Design Goals

We had several design objectives in mind when designing this protocol:

1.1.1  Efficient Sparse Group Support

Our primary design goal is efficient support for sparsely distributed
wide-area multicast groups ("skinny trees"). We define a sparse group
as one in which a) the number of networks/domains with members is
significantly smaller than the number of networks/domains in the Internet,
so that traditional RPM or Link-State style multicast (e.g., MOSPF) is
inefficient; b) group members span an area that is too large/wide
to rely on scope control; and c) the internetwork spanned by the group is
not sufficiently resource rich to ignore the overhead of current schemes.
Sparse groups are not necessarily "small"; therefore we must
support dynamic groups with large numbers of receivers.


1.1.2 High Quality Data Distribution

We wish to support low-delay data distribution when needed by the
application. In particular, we avoid IMPOSING a single shared tree in which
data packets are forwarded to receivers along a common tree, independent of
their source. Source-specific trees are superior when (a) multiple sources
send data simultaneously and would experience poor service when the traffic
is all concentrated on a single shared tree, or (b) the path lengths
between sources and destinations in the shortest path trees (SPTs) are
significantly shorter than in the shared tree. In particular, to support
low-delay distribution for "continuous" media such as voice and video, it
is desirable (and perhaps essential) to avoid concentrating the traffic of
all sources onto a common distribution tree; it is preferable to spread the
traffic load and have the data travel over source-specific trees. Moreover,
for all types of traffic, if one wants to minimize path length, data
packets should travel along a shortest path distribution tree rooted in the
source; i.e., data packets are forwarded based on the {Multicast-Address,
Source} tuple (as in MOSPF).

For some applications or contexts group trees are appropriate; for example,
resource-discovery applications where large numbers of sources transmit
packets intermittently. However, shared trees should not be imposed as the
only delivery option. In the scheme presented here a shared tree is
maintained as a rendezvous mechanism for new receivers and sources;
however, in steady state, data can be delivered over a shortest path from
receiver to sender, or over the shared tree. We use the term rendezvous
points (RPs) in this document because, although similar, the role and
function of an RP differ from those of a Core router in the CBT scheme.

The protocol described here is based on Reverse Path Forwarding. All RPF
schemes construct "reverse shortest path" trees. When unicast routes are
symmetric (i.e., the shortest path from a source to receiver is the same
as the shortest path from the receiver to the source), reverse shortest
path trees will provide equivalent paths to forward shortest path trees.
However, when unicast routes are not symmetric, the reverse shortest path
may be longer than the forward shortest path. Nevertheless, the  reverse
shortest path tree delays will still be superior, in general, to the use
of a shared  tree.


1.1.3 Routing Protocol Independent

The protocol should rely on existing unicast routing functionality to
adapt to topology changes, but at the same time be independent of the
particular protocol employed. This is achievable if the multicast
protocol makes use of the unicast routing tables, independent of how
those tables are computed.

1.1.4 Interoperability

We need interoperability with traditional RPM and link-state multicast
routing, both intra- and inter-domain. For example, a single conversation
should be able to use intra-domain distribution established by IP-style
multicast (e.g., MOSPF) together with inter-domain distribution established
by ESL; likewise, some of its inter-domain distribution may be established
by ESL and some by another inter-domain multicast approach designed for
more densely distributed groups. (This sidesteps the question of when to
use which type of multicast routing approach.) However, to interoperate
with some
existing IGPs it will be necessary to impose some additional protocol or
configuration overhead.

In support of interoperation with IP multicast, AND in support of groups
with very large numbers of receivers, we also wish to maintain the logical
separation of roles between receivers and senders. We may introduce some
optimization to support senders that are also receivers (this is often
appropriate to the application), but we do not want to impose the dual
role/symmetry.


1.2. Reverse Path Forwarding

Before proceeding with a description of ESL we briefly describe existing
RPM mechanisms. This provides more detailed motivation for the extensions
proposed here, and provides background as to the existing mechanisms with
which ESL must interoperate.


1.2.1 RPM mechanisms

Reverse Path Forwarding (RPF) is a technique used to forward multicast
datagrams. A router will forward a multicast datagram, from a given source,
if it was received on the same interface that it uses to forward unicast
datagrams to that source. Hence, the reverse path is interrogated before
forwarding is performed. RPF forwards the data packet out all interfaces
except the incoming. Reverse Path Broadcasting (RPB) eliminates some of the
duplicate packets generated by RPF by only sending packets out on a subset
of the outgoing interfaces; this subset is based on a local database of
parent-child information obtained from the unicast routing protocol.
Truncated RPF or RPB detects leaf networks without members and stops
forwarding multicast packets for that group. Reverse Path Multicasting
(RPM) has mechanisms for detecting and distributing prune information
upstream of the truncated leaf networks to stop distribution of multicast
packets to parts of the network that do not have members. RPM schemes
periodically flush this prune information and revert back to RPF or RPB
behavior. In addition, explicit graft messages may be used to undo pruning
and thereby join new members into the distribution tree more quickly.
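
The basic RPF check described above can be sketched as follows. This is
an illustration only: the function name, the dictionary standing in for
the unicast routing table, and the interface names are all assumptions
and are not defined by this document or by any existing implementation.

```python
# Illustrative sketch of the plain RPF check; all names are hypothetical.

def rpf_forward(unicast_iface, interfaces, source, arrival_iface):
    """Return the interfaces on which to forward a multicast datagram.

    unicast_iface: dict mapping a source address to the interface this
                   router uses to forward unicast datagrams to that source.
    interfaces:    list of all multicast-capable interfaces on the router.
    """
    # Reverse path check: accept the datagram only if it arrived on the
    # interface that would be used to reach its source.
    if unicast_iface.get(source) != arrival_iface:
        return []  # check failed; the datagram is not forwarded
    # Plain RPF forwards out all interfaces except the incoming one.
    return [i for i in interfaces if i != arrival_iface]


routes = {"10.1.1.1": "eth0"}
ifaces = ["eth0", "eth1", "eth2"]
accepted = rpf_forward(routes, ifaces, "10.1.1.1", "eth0")
rejected = rpf_forward(routes, ifaces, "10.1.1.1", "eth1")
```

RPB, truncation, and pruning would each shrink the returned interface
list further; they are omitted from this sketch.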

1.2.2 Router to Host interaction mechanisms

Hosts inform routers which multicast groups they are members of. Routers
use this information to tell other routers that group members are (MOSPF),
or are not (DVMRP), present. IGMP is the protocol mechanism for Router to
Host interaction for IP multicasting. In particular, Routers send IGMP Host
Query packets periodically to ask hosts for group membership information.
Hosts reply with IGMP Host Report packets for each multicast group they are
a member of; however when a host hears a membership join report from
another host on the same LAN, it will suppress its join to avoid duplicates
[IGMP]. Hosts do not have to inform routers explicitly when they want to
send a multicast datagram.

ESL uses this same mechanism to identify the location of group members.
ESL will also use a router-to-router IGMP Query packet so adjacent
multicast routers can be identified. See section 5 for details.


1.2.3 Limitations of Existing Multicast Routing Mechanisms

Existing multicast routing mechanisms work efficiently when most networks
have local members and therefore the default RPM treatment of data packets
(as in DVMRP), or flooding of link state advertisements with local
membership information (as in MOSPF) is appropriate. In this context the
lack of local members is best treated as an exception by issuing a "Prune"
message. However when most networks do NOT have local members there is
significant overhead associated with these schemes. One of the major
incentives to use multicast is efficient bandwidth usage (otherwise
multicast routing support would not be needed to begin with). This is most
critical in wide area and multi-domain internetworks where resources are
not as uniformly available as in the local and campus network context.

1.2.4 Scope Control

One way of limiting the overhead of multicast is to define a maximum
number of hops that all messages will traverse. In this way the multicast
group information will only be distributed within a limited region.
This is a perfect mechanism for local groups. However, for groups that
span wider areas, the scope would have to be set so high that the
reverse path multicasting of data packets, or the flooding of membership
information,  would once again consume excessive resources.


1.3 Directly connected ESL routers

A directly connected router is one that either a) shares a physical network
with a receiver host (i.e., both are physically connected to a common
multi-access network), or b) is in the same domain as a receiver and some
other multicast routing protocol within the domain is used to distribute
multicast packets and to signal membership. A directly connected ESL router
is the last hop ESL router, from the perspective of a receiver; it is the
first hop ESL router from the perspective of a source.



2. Overview of the Explicit Source List (ESL) protocol design


In the remainder of this document we describe a multicast
routing mechanism for sparse groups which can be realized with
relatively simple extensions to IGMP[RFC1112].

2.1 ESL

We introduce two new IGMP message types to establish distribution trees
between sources and receivers (group members); routers send ESL-Join
messages upstream towards sources and routers send ESL-Register messages
downstream from sources to the RPs. An ESL-Join message contains both a
join and prune list; the former enumerates the sources from which the
downstream receivers wish to receive packets (via this router, i.e., this
upstream router) and the latter indicates the sources from which the
downstream receivers do not expect to receive packets (via this route). An
ESL-Register message identifies the group and is sent directly to each of
the RPs associated with the group. The  ESL messages are sent to unicast
addresses using raw IP.

We can summarize the operation of the ESL scheme as follows.  One or
more Rendezvous Points (RPs) are used INITIALLY to propagate data
packets from sources to receivers. An RP may be an ESL-speaking router
that is close to one of the members of the group, or it may be some
other host or router in the network.  A sparse mode group, i.e., one
that the receiver's directly connected ESL router will join using ESL,
is identified by the presence of RP address(es) associated with the
group in question. The mapping information may be configured or may be
learned through another protocol mechanism.

When sources start sending to a multicast group, the first hop
ESL-router sends an ESL-Register message to the RP(s) for that group.
When a receiver joins an ESL multicast group, its first hop ESL router
sends an ESL-Join message towards one of the RPs. If source-specific
distribution trees are desired, the first hop ESL router for each
member (receiver) eventually joins the source-rooted distribution tree
for each source by sending an ESL-Join message towards the source and
after data packets are received on the new path, it sends
an IGMP prune message toward the RP (assuming these represent
different uplinks/branches).  The state maintained in routers is the
same as the forwarding information that is currently maintained by
routers running existing IP multicast protocols such as MOSPF, i.e.,
source (S), multicast address (G), outgoing interface (oif), incoming
interface (iif). We refer to this forwarding information as the
multicast forwarding entry for (S,G).  The  ESL messages sent upstream
by receivers include an explicit list of the sources known to the
downstream receivers (thus the name).
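
As an illustration only, the forwarding information named above (S, G,
oif, iif) could be held in a record like the one below. The class name,
field names, and the use of "*" for the shared-tree entry are
assumptions made for this sketch; the document does not prescribe any
particular data structure.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

WILDCARD = "*"  # hypothetical marker for "any source" in a (*,G) entry


@dataclass
class ForwardingEntry:
    source: str                # S, or WILDCARD for a (*,G) entry
    group: str                 # G, the multicast address
    iif: Optional[str]         # incoming interface
    oifs: Set[str] = field(default_factory=set)  # outgoing interfaces


# The forwarding table is keyed by the (S,G) pair.
table = {}
entry = ForwardingEntry(source="192.0.2.7", group="224.1.2.3", iif="eth0")
entry.oifs.add("eth1")
table[(entry.source, entry.group)] = entry
```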


2.2 Design Tradeoffs

Referring back to our design objectives, we selected the ESL approach over
an alternative, source-initiated approach, where each receiver contacts
each source and each source sends out a message to the routers to install a
distribution tree from that source to all the listed receivers[ST-II]. The
source-initiated approach is less desirable because it a) requires each
source to deal with join requests from each receiver (or each receiver
aggregate such as a domain), and b) requires that member-domains be listed
explicitly. The last concern is the most significant because it means that
the source-initiated scheme's overhead increases with the number of
receivers (receiver domains) in the group.

One limitation of the ESL approach, mentioned earlier, arises when paths
are asymmetric, that is, when the unicast path from a given receiver to
the source differs from the path from the source to that receiver. Since
RPM is used, the path chosen is the one from receiver to
source. It is our opinion that this route asymmetry problem is NOT
critical. In the future, if routing protocols become more load-sensitive,
and as a result more routes are asymmetric due to asymmetric traffic
loading, we may need to rely on other aspects of the adaptive routing
service to address this problem. (Footnote: For example, if unicast routing
could provide a special QoS route whose characteristic was that it
represented the preferred path FROM the indicated destination, instead of
TO the destination, then the ESL messages used in our protocol could be
sent using that QoS, and the deficiency described here would be avoided.)

ESL avoids explicit enumeration of receivers, but does require enumeration
of sources. If there are very large numbers of sources sending to a group
but the sources' average data rates are low, then the group can be
supported with a shared tree instead which has less per-source overhead. If
shortest path trees are used then when the number of sources grows very
large, some form of aggregation or proxy mechanism will be needed; see
section 6. We selected this tradeoff because in many existing and
anticipated applications, the number of receivers is much larger than the
number of sources. And when the number of sources is very large, the
average data rate tends to be very low (e.g. resource discovery).


3. Protocol Description

Below is a description of the protocol steps and messages.

3.1 Overview

ESL-Join messages traveling up from receivers to the RP create a
RP-rooted distribution tree that is used to distribute data packets from
new sources to all receivers and from all sources to new receivers.
ESL-Register messages traveling from sources to the RP cause the RP to
send join messages upstream to the sources and thereby create
distribution paths from the sources to the RP-rooted distribution tree.
In this way, a shared tree is formed between the sources and the RP,
and between the RP and the receivers.

If shortest path, source-specific, trees are to be used, then data packets
from new sources will trigger ESL-Join messages to travel up from receivers
via their shortest paths to sources.

            |                                          |
            |** MR-1 ************ MR-2 ******** MR-3 --|
            |      .                           . *@    |-- Receiver-1-Ga
  Source-1 -|       .                         .  *@    |
            |        .                       .   *@    |
            |         .                     .    *@
            |           RP .................    MR-8
                      .                     .    *@
                     .                       .   *@
                    .                         .  *@
            |      .                           . *@    |
            |@@ MR-4 @@@@@@@@@@@@ MR-5 @@@@@@@@ MR-6 --|
            |      \                           /       |-- Receiver-2-Ga
  Source-2 -|       \                         /        |
            |        \                       /         |
            |         --------- MR-7 --------


... RP-rooted distribution tree
*** Source-1 based distribution tree
@@@ Source-2 based distribution tree
MR  Multicast Router

3.2   Receiver/Upstream messages

This section describes the sequence of messages sent  as receivers
join a group, as well as the actions taken to establish distribution
paths to the receivers.

1) Host sends IGMP-Report message identifying a particular group, G,
in response to a directly-connected Router's IGMP-Query message. From
this point on we refer to such a host as a receiver, R, (or member) of
the group G.

2) When a designated  router (DR) receives a report for a new group
G it checks to see if it has RP address(es) associated with G. The
mechanism for learning this  mapping of G to RP(s) is
somewhat orthogonal to the specification of this protocol;
however, we require some mechanism in order for the protocol to work.
At the very least this information must be manually configurable. In
addition, as discussed in Section 7, we propose the use of a new
IGMP-RP-report message that would allow hosts to inform their
directly-connected ESL routers of G,RP(s) mappings. This is important
for dynamic groups where hosts participate in special applications
to advertise and learn of multicast addresses and their associated
RP(s).

A DR will identify a new group (i.e., one for which it has
no existing multicast entries) as needing ESL support by checking if
there exists an RP mapping. If there is no RP mapping provided in IGMP
report messages, and there is no mapping provided in the appropriate
configuration file, then the router will assume that the group is NOT
to be supported with ESL. Even when a group has an associated RP, it
may be that some outgoing and incoming interfaces do not require ESL,
but are handled using a dense mode scheme such as MOSPF, DVMRP, or
Dense mode ESL. In this case the router will flag individual
interfaces as dense or sparse mode, to allow differential treatment of
different interfaces. For the sake of clarity, we will ignore these
added complexities throughout most of the protocol description.

For the remainder of this description we will also assume a single RP just
for the sake of clarity. We describe the direct extensibility to operation
with multiple RPs later in the document.

3) The DR creates a multicast forwarding cache entry for (*,G). The RP
address is included in a special record in the forwarding entry, so
that it will be included in  upstream join messages. The outgoing
interface is set to that over which  the IGMP report was received from
the new member. The incoming interface is  set to the interface used
to send unicast packets to the RP. A wildcard  (WC) bit is
associated with this entry.

The DR sets an RP-timer for this entry. The timer is reset each time  an
RP-reachability message is received for *,G (see 3.3).
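
Step 3 could be sketched as follows. This is a minimal sketch under
stated assumptions: the record layout, the function names, the two
lookup dictionaries, and the RP_TIMEOUT value are all hypothetical; the
document does not specify a timer value here.

```python
from dataclasses import dataclass, field

RP_TIMEOUT = 90.0  # seconds; placeholder value, NOT taken from this document


@dataclass
class WildcardEntry:
    group: str
    rp: str        # RP address recorded for use in upstream join messages
    iif: str       # interface used to send unicast packets to the RP
    oifs: set = field(default_factory=set)
    wc: bool = True                 # wildcard (WC) bit
    rp_timer_expiry: float = 0.0    # absolute time at which the RP-timer fires


def on_igmp_report(table, group, report_iface, rp_for_group, iface_toward, now):
    """Create or extend the (*,G) entry when the DR receives a report."""
    rp = rp_for_group.get(group)
    if rp is None:
        return None  # no RP mapping: the group is not supported with ESL
    entry = table.get(group)
    if entry is None:
        entry = WildcardEntry(group=group, rp=rp, iif=iface_toward[rp])
        entry.rp_timer_expiry = now + RP_TIMEOUT
        table[group] = entry
    # The interface on which the report arrived becomes an outgoing interface.
    entry.oifs.add(report_iface)
    return entry


def on_rp_reachability(entry, now):
    """Reset the RP-timer each time an RP-reachability message arrives."""
    entry.rp_timer_expiry = now + RP_TIMEOUT
```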


4) The router creates an ESL-Join message with the RP address in its
join list with  the WC bit  set; nothing is listed in its
prune list.  The WC bit indicates that the receiver expects to receive
packets from new sources via this path and therefore upstream routers
should create or add to *,G forwarding entries.  The WC bit also indicates
that the particular IP address is being used as an RP and that the
router with that address should send an RP-reachability message
downstream; these messages are effectively sent periodically in
response to the receipt of periodic join messages. The message is sent
as an IP packet addressed to the next hop router upstream towards the
RP; the payload contains the IGMP information Multicast-Address=G,
ESL-join={WCbit}, ESL-prune=NULL.
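
The contents of this first join message could be modeled as below. The
sketch captures only the logical fields named in the text (group, join
list with WC bit, empty prune list); the class and function names are
assumptions, and no wire format is implied.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class JoinEntry:
    address: str
    wc: bool = False   # WC bit: the address is being used as an RP


@dataclass
class EslJoin:
    group: str                                          # Multicast-Address=G
    join: List[JoinEntry] = field(default_factory=list)
    prune: List[JoinEntry] = field(default_factory=list)


def initial_join(group, rp, next_hop_toward):
    """Return (destination, message) for the join the DR sends in step 4.

    next_hop_toward: hypothetical dict mapping an address to the next hop
                     router used to reach it.
    """
    msg = EslJoin(group=group, join=[JoinEntry(rp, wc=True)], prune=[])
    # The message is addressed to the next hop router upstream towards the RP.
    return next_hop_toward[rp], msg
```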

5) Each upstream router creates or updates its multicast forwarding
entry for (*,G) when it receives an ESL-Join with the WC bit set.  The
interface on which the ESL-Join message arrived is added to the list
of outgoing interfaces for (*,G).  As a result each upstream router
between the receiver and the RP sends an ESL-Join message in which the
join list includes the RP and the WC bit.   The
messages are sent using IP addressed to the next hop router used to
reach the RP. The payload IGMP packet contains Multicast-Address=G,
ESL-join={WCbit}, ESL-prune=NULL.

The RP recognizes its own address and does not attempt to send join
messages for this entry upstream. Because the RP recognizes itself as the
RP it knows to send RP-reachability messages in response to the periodic
join messages received from downstream. In addition, the incoming
interface in the RP's *,G entry is set to null.
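
The handling in step 5 could be sketched as follows, assuming a
dictionary-based (*,G) table and symbolic action names invented for this
illustration. The sketch shows the two cases in the text: a transit
router propagates the join toward the RP, while the RP itself leaves the
incoming interface null and answers with RP-reachability messages.

```python
def on_wc_join(table, my_addresses, group, rp, arrival_iface, iface_toward):
    """Process an ESL-Join whose join list carries the RP with the WC bit set.

    Returns a symbolic action name (hypothetical) describing what is sent next.
    """
    i_am_rp = rp in my_addresses
    entry = table.get(group)
    if entry is None:
        entry = {
            "rp": rp,
            # The RP's own (*,G) entry has a null incoming interface.
            "iif": None if i_am_rp else iface_toward[rp],
            "oifs": set(),
        }
        table[group] = entry
    # The interface on which the join arrived is added to the outgoing list.
    entry["oifs"].add(arrival_iface)
    if i_am_rp:
        return "send_rp_reachability_downstream"
    return "send_join_toward_rp"
```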

6) When an ESL-router has directly-connected members that want to join the
group with shortest paths, the router notices data
packets for G that are NOT sourced by an address for which it has a
multicast forwarding entry. The router initiates a new multicast
forwarding entry for (Sn,G), clears the "SPT-bit" for that entry,  and sets
a timer for the S,G entry.

The router also triggers the generation of IGMP messages upstream. For
example, an ESL-Join message will be sent upstream to the best next
hop towards the new source, Sn, with Sn in the join list:
Multicast-Address=G, ESL-join={Sn}, ESL-prune=NULL.  The ESL-Join
message that gets sent upstream toward the RP will have Sn in the
prune list (at the point where the two upstream paths diverge) when the
SPT bit on the DR's S,G entry is set:
Multicast-Address=G, ESL-join={RP,*}, ESL-prune={Sn}.

In order to avoid missing data packets the DR should send the ESL-Join
message toward the new Sn before sending the prune message toward the RP.
The DR knows it is time to send the prune when it starts receiving new
packets from Sn on the interface used to reach Sn. Therefore Sn is not
included in the prune list sent toward the RP until the SPT bit is set for
the S,G entry.

When the Sn,G  entry is created, the outgoing interface list is copied
from *,G. In this way when a data packet from Sn arrives and matches on
this entry, all receivers will continue to receive the source's packets along
this path unless and until the receivers choose to prune themselves.
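
The ordering in step 6 (join toward Sn first, prune toward the RP only
after the SPT bit is set) could be sketched like this. The dictionary
layout and the tuple-valued return codes are assumptions made for the
illustration.

```python
def start_spt_switch(star_g, sg_table, source, iface_toward_source):
    """Create the (Sn,G) entry and return the message to send first."""
    sg_table[source] = {
        "iif": iface_toward_source,
        "oifs": set(star_g["oifs"]),  # outgoing list is copied from *,G
        "spt": False,                 # SPT bit starts cleared
    }
    # The join toward Sn is sent before any prune toward the RP.
    return ("join", source)


def on_packet_from_source(sg_table, source, arrival_iface):
    """Return ("prune", Sn) the first time Sn's data arrives on the new path."""
    sg = sg_table[source]
    if not sg["spt"] and arrival_iface == sg["iif"]:
        sg["spt"] = True
        # Only now may Sn appear in the prune list sent toward the RP.
        return ("prune", source)
    return None
```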

Note that a DR may adopt a policy of not setting up a S,G entry
(and therefore not sending an ESL-Join message toward the source)
until it has received m data packets from the source within some
interval of n seconds. This would eliminate the overhead of S,G state
upstream when small numbers of packets are sent sporadically. However,
data packets distributed in this manner may be delivered over the
suboptimal paths of the shared RP tree.

The DR may also choose to remain on the RP-distribution tree
indefinitely instead of moving to the shortest path tree.
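
The "m data packets within n seconds" policy mentioned above could be
implemented with a sliding window, as in the sketch below. The class
name and method are hypothetical, and the document leaves the choice of
m and n to the DR.

```python
from collections import deque


class SourceRateGate:
    """Sliding-window test: has a source sent m packets within n seconds?"""

    def __init__(self, m, n):
        self.m = m              # packet count threshold
        self.n = n              # window length in seconds
        self.arrivals = deque() # arrival times still inside the window

    def packet_seen(self, now):
        """Record a packet at time `now`; return True once the policy is met."""
        self.arrivals.append(now)
        # Discard arrivals that are more than n seconds old.
        while self.arrivals and now - self.arrivals[0] > self.n:
            self.arrivals.popleft()
        return len(self.arrivals) >= self.m


gate = SourceRateGate(m=3, n=10)
results = [gate.packet_seen(t) for t in (0, 20, 22, 25)]
```

A DR using such a gate would keep one instance per source and set up
the S,G entry only when packet_seen() first returns True; until then
the source's packets continue to arrive over the shared RP tree.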

7) In the steady state each router sends periodic refreshes of ESL messages
upstream to each of the next hop routers that is en route to each source,
S, for which it has a multicast forwarding entry (S,G); as well as for
the RP listed in the (*,G) entry. These messages are
sent periodically to capture state, topology, and membership changes. An
ESL message is also sent on an event-triggered basis each time a new
forwarding entry is established for some new (Sn,G) (note that some damping
function may be applied, e.g., a merge time). Optionally the ESL message
could contain only the incremental information about the new source and
only be sent to the next hop toward that source. ESL messages are not
sent reliably; lost packets will be recovered from at the next periodic
refresh time.

The  join list in an ESL-Join message sent to a neighboring router, X,
includes an address for each source, S, for which:
        1) there is a multicast forwarding entry (S,G), or S is listed as the
        RP-entry for (*,G); AND,

        2) X is the next-hop router  used to send unicast packets to S
        (or if S is a directly connected host, then include S if X is
        the DR for S's LAN), AND,

        3) the outgoing interface list in the forwarding entry is NOT null.


The prune list in an ESL-Join message sent to a neighboring router Y,
includes an address for each  source, S, for which:
        1) there is an S,G multicast forwarding entry with the SPT bit set and a
        null outgoing interface list, and Y is the next hop used to reach S
        (or, if S is a directly connected host, then include S if Y is the DR
        for S's LAN).

In addition, if Y is the next hop used to reach the RP, the prune list also
includes an address for each source S for which:
        1) there is a S,G multicast forwarding entry, the SPT bit is set,  and
        Y is not the next hop  used to reach S.
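
The join and prune list rules above can be sketched as follows. This is
only an illustration, not part of the protocol definition: the Entry
class, its field names, and the next_hop() callback are hypothetical.
The callback is assumed to return the unicast next-hop router toward an
address, or the DR for the LAN of a directly connected host.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical forwarding entry; field names are illustrative only."""
    source: str              # S, or the RP address for the (*,G) entry
    wildcard: bool = False   # True for the (*,G) entry
    spt_bit: bool = False
    oifs: set = field(default_factory=set)


def join_list(entries, next_hop, x):
    """Sources listed as joins in an ESL-Join message sent to neighbor x."""
    # An entry qualifies when x is the next hop toward S (or toward the RP
    # for the (*,G) entry) and the outgoing interface list is not null.
    return [e.source for e in entries if next_hop(e.source) == x and e.oifs]


def prune_list(entries, next_hop, y, rp):
    """Sources listed as prunes in an ESL-Join message sent to neighbor y."""
    prunes = []
    for e in entries:
        if e.wildcard or not e.spt_bit:
            continue
        toward_source = next_hop(e.source) == y
        if toward_source and not e.oifs:
            # SPT bit set, null outgoing list, y is the next hop to S.
            prunes.append(e.source)
        elif next_hop(rp) == y and not toward_source:
            # y leads to the RP but not to S: prune S off the RP tree.
            prunes.append(e.source)
    return prunes
```

For example, a router whose next hop toward the RP is Y and whose next
hop toward S1 is X would list S1 in the prune list sent to Y once the
SPT bit is set on its (S1,G) entry.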

ESL-Join messages are sent periodically and the join and
prune lists are populated as specified above. In addition four
events will trigger ESL-Join messages:
        1) receipt of an IGMP report message for a new group (i.e., one for which
        the receiving router does not have any S,G or *,G entries) will trigger an
        ESL-Join message toward the RP with the RP address and WC bit set in the
        join list, and

        2) receipt of an ESL-Join message for an S,G pair (including *,G) for
        which there is no current forwarding entry will trigger an ESL-Join
        message toward S (or RP) with S (or RP with WC bit set) in the join list.

        3) receipt of a packet on the NEW S,G entry over the appropriate incoming
        interface triggers a) setting of the SPT bit, and b) sending a prune
        message up the RP tree.

        4) when the outgoing interface list becomes null, indicating no more
        downstream receivers, a prune is sent upstream. We do not trigger prunes
        based on data packets. Data packets that arrive on the wrong incoming
        interface are silently dropped.

Note that each source address listed in an ESL may be a specific IP
address, or may indicate a subnet or a general aggregate. To support
this generality in the future each ESL entry is represented by a {mask length,
Address} pair.  The distribution of mask information is described in
Section 3.3 where reachability messages are described.
The potential for using proxy or aggregate information is described
briefly in Section 7.


8) Each router that receives an  ESL message processes it as follows:

        a) notes the interface on which the ESL-Join message arrived, call it I.

        b) if one of the Si has the WC bit set, and a *,G forwarding entry
        already exists, add I to the *,G forwarding entry and set the timer.
        If I is a new interface  in the *,G forwarding entry add I to all other
        existing Si,G forwarding  entries also, with the exception of those Si
        listed in the prune list. If the value of Si with the WC bit set is
        different from the  RP-entry listed in the existing *,G forwarding entry
        then:

                i. if Si is greater than the listed RP-entry value,
                set RP-entry to Si,

                ii. if Si is less than the listed RP-entry value,
                leave the RP-entry as is.  Do not reset the RP-entry
                timer. (These steps are taken so that in the case of
                multiple RPs, loops can be avoided in the RP-based
                shared tree. This is achieved by making sure that
                within any branch of the shared tree, routers will
                converge on using a single RP until it fails.)

        The incoming interface is set to the RPF interface to the RP in the *,G
        forwarding entries.

        c) for any Si without the WC bit set that is included in the ESL-join
        list, for which there is NO existing (Si,G) forwarding entry, the router
        initiates one. The outgoing interface is set to I, and the incoming
        interface is set to the interface used to send unicast packets to Si. IF
        the interface used to reach Si is the same as the outgoing interface being
        built (i.e., the interface on which the ESL-Join  message arrived) this
        represents an error and the join should not be processed.

        d) for any Si, included in the ESL-join list, for which there IS an
        existing (Si,G) forwarding entry, the router adds I to the
        list of outgoing interfaces, IF I is not the same as the
        existing incoming interface; If I is the same as the existing
        incoming interface, the existing incoming
        interface takes precedence and the join is dropped.

        e) for each Si, included in the ESL-prune list, for which
        there is an existing (Si,G) forwarding entry, the router
        deletes I from the list of outgoing
        interfaces. If the router has a current *,G forwarding entry,
        and if an Si,G entry also exists then the
        forwarding entry is maintained for (Si,G) even if its outgoing
        interface list is NULL. If there is no (Si,G) entry, then one
        is created with the outgoing interface list copied from *,G, and
        the interface on which the prune was received is deleted. This
        acts as a negative cache so that packets  from Si are
        not forwarded to the pruning receiver.
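
Steps c) through e) above can be sketched as follows. The sketch is
illustrative only: the table layout, the function name, and the
rpf_iface() callback (assumed to return the interface used to send
unicast packets to an address) are hypothetical. WC-bit handling from
step b) and all timers are omitted. The text does not say which incoming
interface a negative-cache entry gets, so the sketch assumes the RPF
interface toward Si.

```python
def process_join_prune(table, star_oifs, rpf_iface, arrival, joins, prunes):
    """Apply one received ESL-Join message to the (S,G) entries of a group.

    table     -- dict: source -> {"iif": interface, "oifs": set of interfaces}
    star_oifs -- outgoing interfaces of the (*,G) entry, or None if no (*,G)
    arrival   -- interface I on which the message arrived
    """
    for s in joins:
        entry = table.get(s)
        if entry is None:
            iif = rpf_iface(s)
            if iif == arrival:
                continue            # step c: error case, join not processed
            table[s] = {"iif": iif, "oifs": {arrival}}
        elif arrival != entry["iif"]:
            entry["oifs"].add(arrival)      # step d
        # else: the incoming interface takes precedence, join is dropped

    for s in prunes:
        entry = table.get(s)
        if entry is not None:
            entry["oifs"].discard(arrival)  # step e: kept even if now empty
        elif star_oifs is not None:
            # Negative cache: copy the (*,G) list, minus the pruning interface.
            table[s] = {"iif": rpf_iface(s),
                        "oifs": set(star_oifs) - {arrival}}
```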

9) A timer is maintained for each outgoing interface listed in each S,G
or *,G entry. The timer is set when the interface is added.
The timer is reset each time an ESL-join message is received on that
interface for that forwarding entry (i.e., S,G or *,G).

When a timer expires, the corresponding outgoing interface is deleted
from the outgoing interface list. When the outgoing interface list is
null a prune message is sent upstream and the entry is deleted after 3
times the refresh period (i.e., 180 seconds).
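
The per-interface timers can be sketched as follows. The class and
method names are hypothetical. The 60 second refresh period is the value
recommended in Section 4.1; using 3 times that period as the
per-interface timeout is an assumption made for this sketch.

```python
REFRESH_PERIOD = 60                 # seconds, per the Section 4.1 recommendation
HOLD_TIME = 3 * REFRESH_PERIOD      # assumed per-interface timeout (180 s)


class OifTimers:
    """Timers for the outgoing interfaces of one S,G or *,G entry."""

    def __init__(self):
        self.expiry = {}            # interface -> absolute expiry time

    def refresh(self, iface, now):
        """Set or reset the timer when an ESL-Join arrives on iface."""
        self.expiry[iface] = now + HOLD_TIME

    def expire(self, now):
        """Delete timed-out interfaces; True means the list is now null."""
        for iface in [i for i, t in self.expiry.items() if t <= now]:
            del self.expiry[iface]
        return not self.expiry
```

A True result from expire() is the point at which the router would send
a prune upstream and schedule the entry for deletion.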


3.3 Source/Downstream messages

Two types of messages are sent downstream: Registers and RP-reachability
messages.


3.3.1 Register messages

1) When a source, S, wishes to send to a multicast group, G, for the
first time, S simply sends a data packet addressed to the group.

2) When a data packet from S addressed to G arrives at the first hop ESL
designated router (DR), and the DR has no current forwarding entry for
(S,G), the router looks up the RP(s) address(es) associated with G.

The RP information may be configured or may be provided by a new
IGMP-RP-report message. If no RP information exists, then the router
assumes the group is handled as a dense group and simply sends the data
packets out all non-incoming interfaces. The RP mapping function is only
performed by the first ESL router to see the source's packets before
the *,G entry is established; i.e., the mapping is not performed by each ESL
router on a distribution tree. The RP information should be cached for
future use.

3) The router sends an ESL-Register message to the RP. The message
indicates the group for which the  source is registering, and has the WC
bit set. Mask information for the source may be included.

The original data packet is encapsulated inside the Register
packet.

The message is sent as a unicast packet to the RP; it is not processed
by the intermediate routers. If there are multiple RPs associated with
the multicast group, then the source sends a Register message to each of them.

Subsequent data packets sent to the same group will trigger the same
action until an S,G entry is set up in the first hop router in response
to a join message received from downstream.  The RP information should be
cached so that multiple lookups can be avoided for subsequent data
packets sent to the same group.


4) When a router (i.e., the RP) receives a Register message, the
router
        a) decapsulates the data packet, and forwards it according to its local
        *,G forwarding entry, and

        b) sets up an S,G forwarding entry with the outgoing interface list copied
        from the *,G outgoing interface list. The S,G entry is set up using the
        mask information, if provided, in the Register message. A
        timer is set for the S,G entry.


The S,G entry causes the RP to send an ESL-Join message for the
indicated group toward the source of the Register message. The
ESL-Join message includes the source's address and mask information;
note the source here is the source of the Register message, i.e., the
source-host's directly-connected ESL router, NOT the source host
itself.  This message is triggered and processed like any other
ESL-Join message by the intermediate routers, which either create or
augment the S,G forwarding state in exactly the same way as was
described in 3.2: the ESL-Join message's incoming interface is added
to the outgoing interface list, and the incoming interface for the
entry is set to the interface used to reach the source.

Note that an RP may adopt a policy of not setting up a S,G entry (and
therefore not sending an ESL-Join message toward the source) until it has
received m Register messages (with encapsulated data packets) from the
source within some interval of n seconds. This would eliminate the
overhead of S,G state upstream of the RP when small numbers of packets
are sent sporadically. However, data packets distributed in this manner
may be delivered on very suboptimal paths because they travel all the way
to the RP before being multicasted.
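
One possible form of the "m Register messages within n seconds" policy
is a sliding-window counter, sketched below. The class name and method
are hypothetical, and the text does not prescribe how the interval is
measured; this sketch counts arrivals whose age is strictly less than n
seconds.

```python
from collections import deque


class RegisterThreshold:
    """Sliding-window count of Register messages per (source, group)."""

    def __init__(self, m, n):
        self.m = m                  # messages required
        self.n = n                  # window length in seconds
        self.arrivals = {}          # (source, group) -> deque of arrival times

    def should_create_state(self, source, group, now):
        """Record one Register; True once m arrived within the last n seconds."""
        times = self.arrivals.setdefault((source, group), deque())
        times.append(now)
        while now - times[0] >= self.n:
            times.popleft()         # discard arrivals outside the window
        return len(times) >= self.m
```

With m = 3 and n = 10, Registers at times 0, 4 and 9 would cause the RP
to set up the S,G entry on the third message, whereas Registers at times
0, 4 and 12 would not.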

5) Once the ESL-Join messages have propagated upstream from the RP, data
packets from the source will follow the S,G distribution path state
established. The packets will travel to the receivers via the
distribution paths established by the ESL-Join messages sent upstream
from receivers toward the RP.  Multicast packets will arrive at some
receivers before reaching the RP if  the receivers and the source are
both "upstream" of the RP.

When the receivers initiate shortest-path distribution, additional
outgoing interfaces will be added to the S,G entry and the data
packets will be delivered via the shortest paths to receivers.

6)  Data packets will continue to travel from the source to the RP(s)  in
order to reach new receivers. Similarly, receivers continue to receive
some data packets via the RP tree in order to pick up new senders.
However, when source-specific tree distribution is used,  most data
packets will arrive at receivers over a shortest path
distribution tree.

7) Data packets travel from the source via the reverse shortest path
tree rooted in the source because routers between the source and the
receiver have a multicast forwarding entry for (Sn,G) whose outgoing
interface list includes all the interfaces on which the routers
received ESL-Join messages from downstream receivers.

3.3.2 RP-Reachability Messages

1) A router starts sending periodic "RP-reachability" messages
downstream when:
        (a) it receives an ESL-Join message with its own
        address AND WC-bit set in the join list, and
        (b) the incoming interface on its (*,G) entry is Null.
The first condition is to make sure that it is an RP.
The second condition is to make sure that only the "dominant" RP
will send RP-reachability messages, so the traffic can be minimized.

This obviates the need to do any kind of special configuration of RPs;
any router can be an RP since RP behavior is triggered by the protocol
itself.  A router is responsible for initiating RP-reachability messages
to downstream nodes if it has a *,G entry with a NULL incoming interface.

2) The router sends the periodic RP-reachability messages out all the
outgoing interfaces in the *,G entry. The period for this message is 90
seconds. The messages are addressed to the 224.0.0.1 class D address and the
message contents include the RP, G, and an optional list of
source and mask information.


3) When a router receives an RP-reachability message for a group G it
must compare the RP address listed in the message to the RP address
listed in the current *,G RP-entry.

If the RP listed in the message is greater than the RP listed in the *,G
RP-entry, and if the next hop used to reach the listed RP is the same as
the next hop used to reach the RP-entry, then the router replaces its
current RP-entry with the RP address from the RP-reachability message.

This is necessary to eliminate routing loops that can occur in some
instances when downstream receivers select different upstream RPs and the
RP-centered distribution trees overlap.

If the RP listed in the message is less than the RP listed in the *,G
RP-entry, OR if the RP-reachability message did not come in on the RPF
interface to the RP listed in the message, then the message is not
forwarded.

In more detail, when a router receives an RP-reachability message it
does the following; assume that router X receives an RP-reachability message
of RP1 from incoming interface I.
        1. Perform RPF check. If I is not the best next hop to RP1, drop this
        RP-reachability message.
        2. Else, If the incoming interface of (*,G) is not NULL and not I,
        drop the RP-reachability message.
        3. If the incoming interface of (*,G) is I,
        compare  RP1 with the address in RP-entry, say RP2.
        If RP1 is larger than RP2, set RP-entry to RP1 and propagate
        the RP-reachability message downstream. Otherwise, drop the
        RP-reachability message.
        4. If the incoming interface of (*,G) is NULL and WC-bit is set
        then this router is currently acting as an RP for G. In this case,
        compare  RP1 with X. If RP1 is larger than X, set RP-entry to RP1, set
        the incoming interface to the RPF interface used to reach RP1, and
        clear the WC-bit for that router. Also, propagate RP-reachability
        message downstream.
        Otherwise, if RP1 is less than X, drop the RP-reachability message.
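
The four steps above can be sketched as follows. The entry layout, the
function name, and the rpf_iface() callback are hypothetical. Two
assumptions are made: "larger" is taken to mean numerically larger as a
32-bit address, and a message that names the RP already stored in the
RP-entry is propagated, since the steps do not address that case
explicitly and periodic refreshes would otherwise never travel
downstream.

```python
from ipaddress import IPv4Address


def handle_rp_reachability(entry, my_addr, rp1, arrival, rpf_iface):
    """Return True to propagate the message downstream, False to drop it.

    entry -- (*,G) state: {"iif": interface or None, "rp": str, "wc": bool}
    """
    if rpf_iface(rp1) != arrival:
        return False                                    # step 1: RPF failure
    if entry["iif"] is not None:
        if entry["iif"] != arrival:
            return False                                # step 2
        if IPv4Address(rp1) > IPv4Address(entry["rp"]):
            entry["rp"] = rp1                           # step 3: larger RP wins
            return True
        return rp1 == entry["rp"]   # assumption: a refresh from the current RP
    if entry["wc"] and IPv4Address(rp1) > IPv4Address(my_addr):
        # Step 4: this router was acting as RP and yields to the larger RP1.
        entry.update(rp=rp1, iif=rpf_iface(rp1), wc=False)
        return True
    return False
```

Comparing parsed addresses rather than strings matters here: as text,
"10.0.0.9" sorts after "10.0.0.200", but as an address it is smaller.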

4) If there are any *,G entries the message is forwarded with
the same class D address out the outgoing interfaces from the
*,G entries.  If a downstream router does not have any *,G
entries then the packet is dropped.

When DRs with directly connected group members receive this message
they reset their RP-timers on  the RP-entry in *,G. This allows
group-members' directly-connected ESL routers to  detect when an RP
becomes unreachable and trigger a join toward an  alternate RP, if one
exists.

5) The RP-reachability message may optionally contain Source/Mask
information for (S,G) entries maintained by the RP.
This mask information is optionally obtained via Register
messages sent to the RP by sources' first hop routers. The
masking information can be  used by last-hop ESL routers to
consolidate S,G entries, and  consequently ESL-Join lists
sent upstream.

3.4 Multicast Data Packet Processing.

Data packets are processed in a similar manner to existing multicast
schemes. An incoming interface check is performed and if it fails the
packet is dropped, otherwise the packet is forwarded to all the interfaces
listed in the outgoing interface list (whose timers have not expired).
There are two exception actions that are introduced if packets are to be
delivered continuously, even during the transition from a shared to
shortest path tree. First, when a data packet matches on an S,G entry with
a cleared SPT bit, if the packet does not match the incoming interface for
that entry, then the packet is forwarded according to the *,G entry; i.e.,
it is sent to the outgoing interfaces listed in *,G IF the incoming
interface matches that of the *,G. In addition, when a data packet matches
on an S,G entry with a cleared SPT bit, AND the incoming interface of the
packet matches that of the S,G entry, then the packet is forwarded and the
SPT bit is set for that entry.
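
The forwarding decision, including the two exception actions, can be
sketched as follows. The entry layout and function name are
hypothetical, and per-interface timer checks are left out.

```python
def forward(sg, star, arrival):
    """Return the interfaces to forward a data packet on (empty set = drop).

    sg   -- matching S,G entry {"iif", "oifs", "spt"} or None
    star -- matching *,G entry {"iif", "oifs"} or None
    """
    if sg is not None:
        if arrival == sg["iif"]:
            sg["spt"] = True            # data now arrives on the S,G interface
            return set(sg["oifs"])
        if sg["spt"]:
            return set()                # SPT bit set and wrong interface: drop
        # SPT bit cleared and wrong interface: fall back to the *,G entry.
    if star is not None and arrival == star["iif"]:
        return set(star["oifs"])
    return set()
```

The fallback to *,G is what keeps packets flowing over the shared tree
until the first packet arrives on the shortest path interface.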

Data packets never trigger prunes directly. Data packets may trigger
actions which in turn trigger prunes. In particular data packets from a
new source can trigger creation of a new S,G forwarding entry. This
causes S to be included in the prune list in a triggered ESL message toward
the RP; just as it causes S to be included in the join list in a
triggered ESL message toward the source.


3.5 Packet Types

RFC 1112 specifies two types of IGMP packets for hosts and routers
to convey multicast group membership and reachability information.

An IGMP Query packet is transmitted periodically by routers to ask
hosts to report which multicast groups they are members of. An IGMP
Report packet is transmitted by hosts in response to received
Queries advertising group membership.

This document introduces new types of IGMP packets that are used
by ESL routers. The following packet format is used:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |Version| Type  |      Code     |           Checksum            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         Group Address                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

%%DE  Dino, please add Version/Type/TOS/Code that you proposed in comments

      Version
         This memo specifies version 1 of IGMP.  Version 0 is specified
         in RFC-988 and is now obsolete.

      Type
         There are four types of IGMP messages:

            1  = Host Membership Query
            2  = Host Membership Report
            3  = Router DVMRP Messages
            4  = Router ESL Messages

      Code
         Codes for specific message types. Used only by DVMRP and ESL.
         ESL codes are:

             0  = Query
             1  = Register
             2  = Join/Prune
             3  = RP-Reachable
             4  = Assert        dense-mode ESL only
             5  = Mode          dual-mode ESL only
             6  = Mode-Ack      dual-mode ESL only

      Checksum
         The checksum is the 16-bit one's complement of the one's
         complement sum of the entire IGMP message.  For computing
         the checksum, the checksum field is zeroed.

      Group Address
         In a Host Membership Query message, the group address field
         is zeroed when sent, ignored when received.

         In a Host Membership Report message, the group address field
         holds the IP host group address of the group being reported.

         In a Register, Join/Prune, Query, and RP-Reachable message,
         the group address field is zeroed when sent, ignored when
         received.
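
The fixed header and its checksum can be sketched as follows. The
function names are hypothetical. The sketch assumes that Version
occupies the high-order four bits of the first octet and Type the
low-order four bits, as drawn in the packet diagram above.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"                     # pad to a whole 16-bit word
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)    # fold the carries back in
    return ~total & 0xFFFF


def build_message(msg_type, code, group=b"\x00" * 4, payload=b"", version=1):
    """Build an IGMP message with the checksum filled in."""
    first = (version << 4) | msg_type
    # The checksum field is zero while the checksum is being computed.
    body = struct.pack("!BBH4s", first, code, 0, group) + payload
    return body[:2] + struct.pack("!H", internet_checksum(body)) + body[4:]
```

A receiver can verify a message by running internet_checksum() over the
whole message as received; the result is zero when the message is
intact.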

3.5.1 ESL-Register, ESL-Join, and Assert messages.

The Register, Join/Prune and Assert messages have additional information
appended to the fixed header:


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   Reserved    | Maddr Length  |  Addr Length  |  Num groups   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                   Multicast Group Address-1                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     Number of Join Sources    |    Number of Prune Sources    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                   Join Source Address-1                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                               .                               |
      |                               .                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                   Join Source Address-n                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                   Prune Source Address-1                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                               .                               |
      |                               .                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                   Prune Source Address-n                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                               .                               |
      |                               .                               |
      |                               .                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                   Multicast Group Address-n                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     Number of Join Sources    |    Number of Prune Sources    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                   Join Source Address-1                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                               .                               |
      |                               .                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                   Join Source Address-n                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                   Prune Source Address-1                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                               .                               |
      |                               .                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                   Prune Source Address-n                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Reserved
         Unused field, zeroed when sent, ignored when received.

      Addr Length
         The length in bytes of the encoded source addresses in the
         Join and Prune lists.

      Maddr Length
         The length in bytes of the encoded multicast addresses.

      Num Groups
          The number of multicast group sets contained in the message.

      Multicast group address
          For IP, it is a 4-byte Class D address.

      Number of Join Sources
          Number of join source addresses listed for a given group.

      Join Source Address-1 - n
          This list contains the sources that the sending router will forward
          multicast datagrams for if received on the interface this
          message is sent on. The address 0.0.0.0 indicates a join for
          all sources.

          For a Register message, the source address specifies the
          address(es) of the Rendezvous Point(s).

      Number of Prune Sources
          Number of prune source addresses listed for a given group.

      Prune Source Address-1 - n
          This list contains the sources that the sending router does
          not want to forward multicast datagrams for when received on the
          interface this message is sent on. The address 0.0.0.0
          indicates a prune for all sources.

In Router messages, all source addresses will have the following
format:


        <WC-bit><Mask Length><Address>

        <WC-bit> is a 1 bit value. If 1, packets should propagate to
        this address and a wildcard multicast entry should be built
        with interface information based on the receiving and sending
        interface for Router messages. If 0, the <Address> is a source
        address. The Address should be added to the address list
        associated with the wildcard multicast entry for the group.

        <Mask Length> is 7 bits. The value is the number of contiguous bits
        left justified used as a mask which describes the <Address>.

        <Address> is the length indicated from the "Addr Length" field
        at the beginning of the header. The <Mask Length> must be less than
        or equal to "Addr Length" * 8.
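
The encoding can be sketched for IPv4 as follows. The function names are
hypothetical, and the sketch assumes that the WC-bit is the high-order
bit of the first octet, the Mask Length fills the remaining seven bits,
and the address follows immediately.

```python
import socket


def encode_source(wc: bool, mask_len: int, addr: str) -> bytes:
    """Encode <WC-bit><Mask Length><Address> for a 4-byte IPv4 address."""
    raw = socket.inet_aton(addr)
    if not 0 <= mask_len <= len(raw) * 8:
        raise ValueError("Mask Length must not exceed Addr Length * 8")
    return bytes([(0x80 if wc else 0x00) | mask_len]) + raw


def decode_source(data: bytes):
    """Decode one encoded IPv4 source into (wc, mask_len, address)."""
    first = data[0]
    return bool(first & 0x80), first & 0x7F, socket.inet_ntoa(data[1:5])
```

Under these assumptions an RP address with the WC-bit set and a 32-bit
mask begins with the octet 0xA0.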

      A source address could be a host IP address:

      <0><32><192.1.1.17>

      A source address could be the RP's IP address:

      <1><32><131.108.13.111>

      A source address could be a subnet address:

      <0><28><192.1.1.16>

      A source address could be a general aggregate:

      <0><16><192.1.0.0>

ESL messages are always sent as unicast IP addressed packets. These
messages are sent towards the direction of the Join and Prune source
addresses. This is achieved by doing a route lookup for each source
address and IP addressing it to the next-hop router along the path to
the source. Each router along the way does this until the destination
is reached.

%df - is this still true?
Router messages may be data linked multicast when transmitted on
subnetworks that support multicast.

3.5.2 RP-reachability message

The RP-reachable packet format is as follows:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |Version| Type  |     Code      |           Checksum            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         Group Address                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          RP Address                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       Number of Entries                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Mask Length  |        Address ...                            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                      .
                                      .
                                      .


      Group Address
          Group address associated with RP.

      RP Address
          The Rendezvous Point IP address of the sender.

      Number of Entries
        The number of {Mask Length, Address} entries in the message. This
        value is 0 if no addresses are included.

      Mask Length
        Number of bits in the network mask for the corresponding address.

      Address
        4-byte IP address. The corresponding zero bits in the mask should
        be set to zero in the address.
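
The portion of the RP-reachability message that follows the Group
Address can be sketched as follows. The function name is hypothetical,
and the sketch assumes that each {Mask Length, Address} entry occupies
five consecutive octets with no padding between entries, which the
diagram above does not state explicitly.

```python
import ipaddress
import struct


def pack_rp_reachable_body(rp: str, entries) -> bytes:
    """Pack RP Address, Number of Entries and the {Mask Length, Address} list.

    entries -- iterable of (mask_len, address) pairs
    """
    entries = list(entries)
    out = ipaddress.IPv4Address(rp).packed + struct.pack("!I", len(entries))
    for mask_len, addr in entries:
        # strict=False clears the address bits that lie outside the mask.
        net = ipaddress.ip_network(f"{addr}/{mask_len}", strict=False)
        out += bytes([mask_len]) + net.network_address.packed
    return out
```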


Each RP will send RP-Reachable messages to all routers on its
distribution tree for a particular group. These messages are sent
so routers can detect that an RP is unreachable. Routers that have
attached host members for a group will process the message. On a
multi-homed stub subnetwork, the DR is responsible for processing
the message. A router that processes the message is one that
updates the RP address timer to indicate that the RP is still
alive. Other routers that are on the RP-distribution tree
propagate the message.

The RPs will address the RP-Reachable messages to 224.0.0.1.
Routers that have state for the group with respect to the RP
distribution tree will propagate the message. Otherwise, the message
is discarded.

If an RP address timer expires, the DR should attempt to
send an ESL join message toward an alternate RP provided for
that group if one is available.

3.5.3 Other message types

In a future version of this document we will specify a new
IGMP message type that will allow hosts to advertise a list of 1 to n
RP addresses associated with a particular group address.

3.5.4 Examples

Following are examples of source address encodings in Register, Join/Prune,
and RP-reachability messages:

Register scenario:
        A host (131.108.1.1) sends a multicast datagram to 224.1.1.1. The
        first hop designated router (131.108.1.2) for the attached LAN,
        sends an ESL-Register message to the RP (131.108.10.2). The multicast
        datagram is encapsulated in the Register message.

        IP Header:
            Source IP address:      131.108.1.2
            Destination IP address: 131.108.10.2
        IGMP Header:
            IGMP Type:              ESL
            IGMP Code:              Register
            Group Address:          0.0.0.0
        ESL Header:
            Maddr Length:           4
            Addr Length:            4
            Num Groups:             1
            Multicast Group:        224.1.1.1
            # Join/Prune Sources:   1/0
            WC-bit:                 0
            Mask Length:            24
            Address:                131.108.1.0
        Encapsulated datagram:
            Source IP address:      131.108.1.1
            Destination IP address: 224.1.1.1

Join scenario:
        The RP (131.108.10.2) sends an ESL-Join upstream towards the source
        (131.108.1.0) in response to the above Register message received.
        The RP's next-hop to reach 131.108.1.0 is 131.108.20.2.

        IP Header:
            Source IP address:      131.108.10.2
            Destination IP address: 131.108.20.2
        IGMP Header:
            IGMP Type:              ESL
            IGMP Code:              Join/Prune
            Group Address:          0.0.0.0
        ESL Header:
            Maddr Length:           4
            Addr Length:            4
            Num Groups:             1
            Multicast Group:        224.1.1.1
            # Join/Prune Sources:   1/0
            WC-bit:                 0
            Mask Length:            24
            Address:                131.108.1.0

RP-reachability scenario:
        Now that 131.108.10.2 knows it is an RP (because it received a
        Register message), it must send ESL RP-Reachability messages
        downstream on the RP-distribution tree for 224.1.1.1. It will
        send the following packet on all outgoing interfaces of the
        (*, 224.1.1.1) entry. Each router on the path will build a new
        IP header with its own IP address as the source IP address.

        IP Header:
            Source IP address:      131.108.10.2
            Destination IP address: 224.0.0.1
        IGMP Header:
            IGMP Type:              ESL
            IGMP Code:              RP-Reachability
            Group Address:          224.1.1.1
        ESL Header:
            RP Address:             131.108.10.2
            Number of Entries:      1
            Mask Length:            24
            Address:                131.108.1.0



4. Robustness Features

4.1 Lost ESL messages

The protocol is fairly robust to lost control messages.
If an ESL-Register message gets lost then data packets will continue to
be encapsulated in subsequent ESL-Register messages until the RP
initializes an S,G entry and the associated ESL-Join messages
propagate up to the Source.

If an ESL-Join message is lost then for the remainder of
the refresh period, packets will not be forwarded on the new path, or
will continue to be forwarded until the refresh is sent.

%%Editorial note: we are not at all fixed on these timer values...
It is recommended that ESL messages be transmitted once every
60 seconds. Information that is cached should be timed out
after 3 times the transmission period if no ESL message for the entries
has been received. When a forwarding entry has no more outgoing
interfaces it is deleted and a prune can be sent upstream (or the
router can wait until the next period when the ESL list will no longer
include the Source for the deleted entry and the state will eventually
be timed out upstream).

4.1 Multiple Rendezvous Points and RP failure scenarios

If there is one RP then there is no concern about sources and
receivers actually being able to rendezvous, but there is a
reliability issue. If there is more than one RP then each receiver
still joins to a single RP, but each source must register to EACH and
EVERY RP. In other words there are multiple RP distribution trees, and
so long as each source sends its packets to all of them, receivers
need only join to one.

When the RP fails or becomes unreachable by receivers, members who have
already joined will continue to receive packets from sources that had
previously sent to the group and for which the receivers had already
switched to the SPT (assuming the SPT is not affected by the same failure
as makes the RP unreachable). However, new members will send toward the
unreachable RP and will NOT be successfully joined to the group unless
their join packets reach existing SPTs of the sources before they reach the
RP. New sources will attempt to register and send to the RP. Their packets
will either not arrive at the RP, in which case they will only be forwarded
to receivers who are upstream of the RP with respect to the source, or
their packets will get to the RP but will not reach downstream
receivers. In the latter case, the SPT from the source to receivers
will never be set up even if the paths that make up the SPT are
available. This leads to the motivation for employing multiple RPs.

Unreachable RPs are detected using the RP reachability message. When a
*,G entry is established by a router with local members, a timer is set.
The timer is reset each time an RP reachability message is received. If this
timer expires, the router looks up an alternate RP for the group and sends a
join toward the new RP. A new *,G entry is established with the incoming
interface set to the interface used to reach the new RP. The outgoing
interface list includes only those interfaces on which IGMP Reports for
the group were received. (Other outgoing interfaces may no longer be
valid since the router in question may not be on the shortest path
between the downstream branch and the new RP. If the router is on this
shortest path as well, it will eventually receive an explicit join from
that downstream branch as the last hop routers take the same action).
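
The failover procedure above might be sketched as follows. This is an
illustration under stated assumptions: the timeout value, the
`StarGEntry` class, the `check_rp` helper and the way an alternate RP is
picked (first candidate in a list) are all hypothetical and are not
defined by this document.

```python
class StarGEntry:
    """Hypothetical *,G entry kept by a router with local members."""

    def __init__(self, rp, iif, oifs, now, timeout):
        self.rp = rp
        self.iif = iif
        self.oifs = set(oifs)
        self.timeout = timeout           # assumed value; not fixed by the text
        self.deadline = now + timeout    # timer set when the entry is established

    def rp_reachability_received(self, now):
        """Reset the timer each time an RP reachability message arrives."""
        self.deadline = now + self.timeout


def check_rp(entry, now, alternate_rps, iif_toward, igmp_member_ifs):
    """Return (entry, join_target); join_target is None while the RP is alive."""
    if now < entry.deadline:
        return entry, None
    candidates = [rp for rp in alternate_rps if rp != entry.rp]
    if not candidates:
        return entry, None
    new_rp = candidates[0]
    # New *,G entry: iif points toward the new RP, and the oif list keeps
    # only interfaces on which IGMP Reports for the group were received.
    new_entry = StarGEntry(new_rp, iif_toward(new_rp), igmp_member_ifs,
                           now, entry.timeout)
    return new_entry, new_rp


routes = {"10.0.0.1": "if0", "10.0.0.2": "if1"}   # RP -> interface (example data)
rps = ["10.0.0.1", "10.0.0.2"]
e = StarGEntry("10.0.0.1", "if0", {"lan0", "if2"}, now=0, timeout=90)
e.rp_reachability_received(now=60)                # deadline moves to t=150
same, early_target = check_rp(e, 100, rps, routes.get, {"lan0"})
new, late_target = check_rp(e, 150, rps, routes.get, {"lan0"})
```

In the example, interface `if2` led to a downstream router rather than to
local members, so it is not carried into the new entry; it would be added
again only by an explicit join from that branch.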

When multiple RPs are used, each source registers and sends data
packets towards each of the RPs, but Receivers only join toward a
single RP.  If one of the RPs fails, receivers that joined to that RP
will stop receiving RP-reachability messages and will start sending
joins to one of the alternative RPs. Sources do not need to take
special action. When an RP is unreachable it will not receive the
source's Register messages and therefore will not respond with joins,
and so the outgoing interfaces in *,G pointing toward the unreachable
RP will time out without any explicit action on the part of the
source.

Because each receiver's directly connected router selects an RP
independently, it is possible for routers on the same part of the
distribution tree to specify different RPs while both are still
available. This can lead to looping in some topologies. To avoid looping,
RP address information carried in ESL-Join and RP-reachability messages is
examined to converge to a common RP (the larger numbered RP dominates).
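
As a small illustration of the tie-breaking rule, the sketch below
assumes that "larger numbered" means the numerically larger IP address;
the text does not state the comparison precisely, and the function name
is hypothetical.

```python
import ipaddress


def dominant_rp(current_rp, advertised_rp):
    """Return the RP a router should converge on (assumed: larger address wins)."""
    a = ipaddress.IPv4Address(current_rp)
    b = ipaddress.IPv4Address(advertised_rp)
    return str(max(a, b))


chosen = dominant_rp("131.108.1.9", "131.108.1.10")
```

The comparison is done on the numeric address rather than on the dotted
string, since a plain string comparison would rank "131.108.1.9" above
"131.108.1.10".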


4.2 Unicast routing changes


When unicast routing changes, an RPF check is done and all affected expected
incoming interfaces are updated. If the new incoming interface appears in
the outgoing interface list, it is deleted from the outgoing list. The
previous incoming interface may be added to the outgoing interface list by
a subsequent join from downstream. Joins received on the current incoming
interface are ignored. Joins received on new interfaces or existing
outgoing interfaces are not ignored. Other outgoing interfaces are left as
is until they are explicitly pruned by downstream routers or are timed out
due to lack of appropriate join messages.

The ESL-router must send an ESL-Join message out its new interface to
inform upstream routers that it expects multicast datagrams over the
interface. It must send an ESL-Prune message out the old interface, if
the link is operational, to inform upstream routers that this part of
the distribution tree is going away.
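
The update rules in the two paragraphs above can be sketched as follows.
The dictionary layout and function names are assumptions made for
illustration only; the caller is expected to check whether the old link
is operational before actually sending the prune.

```python
def on_route_change(entry, new_iif):
    """Apply an RPF change to entry {'iif': str, 'oifs': set}.

    Returns (join_interface, prune_interface), or (None, None) if the
    expected incoming interface did not change.
    """
    old_iif = entry["iif"]
    if new_iif == old_iif:
        return None, None
    entry["iif"] = new_iif
    entry["oifs"].discard(new_iif)    # the new iif must not also be an oif
    return new_iif, old_iif           # join out the new, prune out the old


def on_join(entry, interface):
    """Process a join from downstream; return False if it is ignored."""
    if interface == entry["iif"]:
        return False                  # joins on the current iif are ignored
    entry["oifs"].add(interface)      # new or existing oif: accepted
    return True


# Example modeled on R1 in the diagrams of this section (interface names made up).
r1 = {"iif": "to_source", "oifs": {"lan", "to_r2", "to_r3"}}
join_if, prune_if = on_route_change(r1, "to_r3")
ignored = not on_join(r1, "to_r3")
accepted = on_join(r1, "to_source")   # old iif re-added by a later join
```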

If the unicast route becomes unreachable, all multicast entries for (S,Gi)
should be modified to have null outgoing interface lists. However, the
entries should not be deleted immediately; keeping them causes periodic
prunes to be sent and multicast packets to be discarded. The entry
should be kept alive for the remainder of the timeout lifetime.  This
helps to eliminate transient multicast forwarding loops. If the
unicast route has a new next-hop interface, the (S,Gi) entries must be
updated.


The following diagram shows how a multicast forwarding loop can be
avoided. Assume all LANs have members for a given group. The arrows
represent the expected interface each router will receive multicast
datagrams on, as well as the outgoing interfaces with respect to Source.

     ----------      ----------
         |               |
         ^               ^
       +----+         +----+
       | R1 | >-----> | R2 |
       +----+         +----+
        ^  v            |
        |  |            |
        |  |            |      |
        |  |          +----+   |
        |  +--------> | R3 | >-|
        |             +----+   |
        |               |      |
        |               |
       Source   --------+

If the path to Source changes, R1 and R3 may converge on the new path
before R2, e.g.,  R1 uses its link to R3 as its expected incoming
interface and R3 uses its new shortest path link to Source. In this
state,

     ----------      ----------
         |               |
         ^               ^
       +----+         +----+
       | R1 | >-----> | R2 |
       +----+         +----+
        |  ^            |
        |  |            |
        |  |            |      |
        |  |          +----+   |
        |  +--------- | R3 | >-|
        |             +----+   |
        |               ^      |
        |               |
       Source   --------+

R1 and R3 were informed about a topology change for Source and changed
their incoming interfaces. Both R1 and R3 send joins up their new
incoming interfaces. R1 also deleted its outgoing interface to
R3 because this interface is used as an incoming interface. R2,
however, has not been informed about the topology change.

If R1 received a multicast datagram on its old expected interface, it
would silently drop it. This would happen if upstream routers from R1
to the Source had old routing information. If upstream routers have
converged on a new path, all datagrams will enter this part of the
network through R3, and it would forward appropriately to its LAN and
to R1, which expects it. R1 would forward to its LAN as well as R2. R2,
using an out-of-date expected incoming interface, would also forward the
packet. Once R2 is informed of the topology change, it will change its
expected incoming interface to R3 and will send a prune to R1 and a
join to R3. The final state would look like:

     ----------      ----------
         |               |
         ^               ^
       +----+         +----+
       | R1 | ------- | R2 |
       +----+         +----+
        |  ^            ^
        |  |            |
        |  |            ^      |
        |  |          +----+   |
        |  +--------< | R3 | >-|
        |             +----+   |
        |               ^      |
        |               |
       Source   --------+


More generally, if unicast routing changes and the router in question has
not converged, then one of two situations exists. In the first, one or more
of the existing outgoing interfaces may no longer reach any receivers. In
this case data packets are forwarded until they reach a router that has
converged and finds that the incoming interface for the packet is not the
expected one, in which case the packet will be dropped. The cost of this
transient condition is the continued sending of data packets down links
that do not lead to receivers; this can occur for the duration of a
refresh period.

The second situation occurs when data packets begin to arrive over an
incoming interface other than the one listed in the corresponding S,G
entry. This occurs when upstream has converged and the router in question
has not. In this case, data packets will be dropped instead of delivered
for a time less than or equal to the convergence time + refresh period.




5. ESL Routers on multi-access subnetworks

There are several multiaccess subnetwork configurations that require
special consideration.

5.1 Designated Routers

    +----+         +----+
    | R1 |         | R2 |
    +----+         +----+
       |              |
  -------------------------
     |       |       |
    H1a     H2a     H3b        (Hxy - Host x is a member to group y)

When there are multiple ESL routers on a multi-access network, a
single router must be responsible for the following actions:

    o Soliciting group membership from hosts and sending ESL-Join messages.
    o Sending Register messages on behalf of a connected
      source host when it sends a multicast packet.
    o Forwarding multicast packets onto the multi-access network.

This is done with a simple Designated Router (DR) election.
Neighboring routers send ESL Query packets to each other.  The
packets are sent to 224.0.0.2. The system with the largest IP address
assumes the role of DR. The default transmission interval is 30 seconds. A
router should consider the DR unreachable when it does not receive a
Query within 3 times the transmission interval. There will be one DR that
supports all groups per multi-access network.  The DR sends periodic
IGMP Host Query packets to 224.0.0.1 soliciting hosts to respond. The
DR sends multicast packets using the data link address that is mapped from
the IP Class D multicast address.
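
A minimal sketch of the election follows. The 30 second interval and the
factor of 3 come from the text; the function name, the neighbor table
layout and the example addresses are assumptions for illustration.

```python
import ipaddress

QUERY_INTERVAL = 30                      # seconds, default from the text
NEIGHBOR_TIMEOUT = 3 * QUERY_INTERVAL    # neighbor considered unreachable after this


def elect_dr(own_address, last_query_time, now):
    """Return the address of the DR as seen by this router.

    last_query_time maps each neighbor address to the time its most
    recent ESL Query was heard on this network.
    """
    live = [addr for addr, heard in last_query_time.items()
            if now - heard < NEIGHBOR_TIMEOUT]
    # The largest IP address among the live routers (including this one) wins.
    return max(live + [own_address], key=ipaddress.IPv4Address)


neighbors = {"10.1.1.9": 0, "10.1.1.5": 50}
dr_at_60 = elect_dr("10.1.1.2", neighbors, now=60)     # both neighbors alive
dr_at_100 = elect_dr("10.1.1.2", neighbors, now=100)   # 10.1.1.9 timed out
dr_at_200 = elect_dr("10.1.1.2", neighbors, now=200)   # no live neighbors
```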

5.2 Multiaccess subnetwork as a transit network

The following diagram shows the case where a multi-access network is
used as a transit network.


       |              |
       |              |
    +----+         +----+          |
    | R1 |         | R2 |          |
    +----+         +----+          |
       |              |            | Downstream to group members
       |              v            |
  -------------------------        |
         |         |               |
         v         v               |
       +----+    +----+            V
       | R3 |    | R4 |
       +----+    +----+
         |         |
         v         v
         |         |


When a LAN is used as a transit network among routers, it is required
that a single router forward multicast packets to downstream routers.
This router is known as the Router-DR. A single Router-DR is chosen
by the receipt of ESL messages unicast by each of the downstream
routers. All routers that use the LAN as their incoming interface for
multicast packets from a particular source will expect them from a single
Router-DR. This router is the one they use for sending unicast packets to
the source. All routers will select the same router. In the case of
equal-cost unicast paths, the next hop with the largest IP address is used.
The Router-DR forwards multicast packets using the data link address that
is mapped from the IP Class D multicast address.
The multicast data packets will be seen by all other routers connected
to the LAN. For each router, if it has an entry for S,G or *,G and the
LAN is the indicated incoming interface then the router will forward the
packet. If there is no such entry or if the incoming interface is not the
LAN then the packet will be silently dropped.
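
The per-router check described above might look like the following
sketch. The table layout (a dictionary keyed by (source, group), with
"*" standing for the wildcard source) and the function names are
assumptions, not part of the protocol.

```python
def lookup(table, source, group):
    """Longest match: prefer the S,G entry, fall back to *,G."""
    entry = table.get((source, group))
    if entry is None:
        entry = table.get(("*", group))
    return entry


def forward_from(table, source, group, arrival_if):
    """Return the interfaces to forward on; an empty list means silent drop."""
    entry = lookup(table, source, group)
    if entry is None or entry["iif"] != arrival_if:
        return []
    return sorted(entry["oifs"] - {arrival_if})


table = {("*", "Ga"): {"iif": "lan1", "oifs": {"if2", "if3"}}}
forwarded = forward_from(table, "S", "Ga", "lan1")
wrong_iif = forward_from(table, "S", "Ga", "if2")
no_entry = forward_from(table, "S", "Gb", "lan1")
```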



In the above diagram, both R3 and R4 have downstream group members. They
will send their ESL-Join messages towards the RP, and therefore select
either R1 or R2 to send the message to. Assume R2 is the shorter path to
the RP. Later, when multicast datagrams travel from the RP, they will
come through R2 only, avoiding duplicates on the LAN. The same procedure
takes place when the source-based distribution tree is built. Whatever
router on the LAN is chosen is also responsible for delivering multicast
packets to host members, if any are present.


5.3 Parallel routers

The following diagram illustrates the behavior of routers that are in
parallel and the interaction of DR and RP routers.

                   S
                   |
  ------------------------------------- LAN1
       |              |          |
    +----+         +----+      +----+ DR
    | R1 | RP      | R2 |      | R3 |
    +----+         +----+ DR   +----+
       |              |          |
  ------------------------------------- LAN2
                |        |
               H1a      H2a



Assume R1 is the RP for Ga, R3 is DR for LAN1, R2 the DR for LAN2, and
that LAN1 is the preferred path among routers to reach the RP.
If the receivers of Ga join first, R2 will send an ESL-Join to R1, the
RP, out LAN1. R2 builds a multicast entry for (*,Ga) with incoming
interface LAN1 and outgoing interface LAN2.
R1 receives the ESL-Join message and builds a multicast entry for
(*,Ga) with incoming interface set to {} (since it is the RP) and
outgoing interface set to LAN1.

When S sends a multicast datagram, the DR for LAN1, R3, will encapsulate
the data packet in a Register message and send it to R1, the RP, out LAN1.

The RP, R1, decapsulates the data packet and forwards it
onto LAN1, as indicated in the outgoing interface list of R1's *,Ga
entry.

The RP, R1, then processes the Register part of the message and sets
up an S,Ga entry with LAN1 as the incoming interface and a null
outgoing interface list; the outgoing interface list is copied from
*,Ga but LAN1 is NOT included in the S,Ga outgoing interface list because
LAN1 is the incoming interface.

R1 triggers a join toward S. When R1 does a lookup on S it finds that
S is on a directly connected LAN and sends the join to the DR for that
LAN, i.e., R3.

When the DR, R3, receives the join it builds an S,Ga entry with LAN1
as the incoming interface. Since there is no *,Ga entry, the outgoing
interface list is set to null.

Subsequent data packets from S for Ga that arrive from LAN1 will
be silently dropped by R1 and R3 since the S,Ga entries have null
outgoing interface lists.

For this example we assume that shortest path trees are desired. In
this case, when R2 receives the multicast datagram from S and finds
that the longest match is on *,Ga (i.e., there is no S,Ga entry in
R2), R2 creates an S,Ga entry, sets the incoming interface to LAN1
(since that is the interface used to send packets to S), and copies
the outgoing interface list from the existing *,Ga entry (in this case
LAN2).  R2 forwards this and subsequent multicast datagrams for S,Ga
onto LAN2.

When R2 creates S,Ga, it also triggers an ESL-Join message
to the next hop to S; in this case S is directly connected so
the ESL-Join is sent to the DR for that LAN, R3.  R3 receives the join
but does not add LAN1 to its outgoing interface list for S,Ga because
LAN1 is the incoming interface for S,Ga.
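
The way S,Ga entries are derived in this walkthrough can be summarized by
a short sketch. The function name and dictionary layout are assumptions;
the three calls reproduce the entries that R2, R1 and R3 build in the
example above.

```python
def create_sg(star_g, iif_toward_source):
    """Build an S,G entry from an optional *,G entry.

    The outgoing interface list is copied from *,G (or starts empty when
    there is no *,G entry), and the incoming interface is never included.
    """
    oifs = set(star_g["oifs"]) if star_g else set()
    oifs.discard(iif_toward_source)
    return {"iif": iif_toward_source, "oifs": oifs}


r2_sg = create_sg({"iif": "LAN1", "oifs": {"LAN2"}}, "LAN1")   # R2: forwards to LAN2
r1_sg = create_sg({"iif": None, "oifs": {"LAN1"}}, "LAN1")     # R1 (RP): null oif list
r3_sg = create_sg(None, "LAN1")                                # R3: no *,Ga entry
```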



5.4 Leaf-Router Prunes

LAN connected routers must also detect when there are no more
downstream routers. The following protocol is used: when a router
whose incoming interface is the LAN has all of its outgoing interfaces go
to null, the router multicasts a prune message for S,G onto the LAN. All
other routers hear this prune, and any router that has the LAN as
its incoming interface for the same S,G and has a non-null outgoing
interface list sends a join message onto the LAN to override the prune.
The join should go to the single upstream router that is the right
previous hop to the source or
RP; however, at the same time we want others to hear the join so that
they suppress their own joins. For this reason the join is data link
multicast, with the IP address set to the
upstream router.
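
The prune-override behavior can be sketched as below. All names are
hypothetical, and the data link multicast address used for the join is
passed in as a parameter because this section does not say which
address is used.

```python
def should_send_prune(entry, lan):
    """A router whose iif is the LAN prunes when it has no oifs left."""
    return entry["iif"] == lan and not entry["oifs"]


def should_override_prune(entry, lan):
    """A router hearing the prune overrides it if it still needs the traffic."""
    return entry is not None and entry["iif"] == lan and bool(entry["oifs"])


def build_override_join(upstream_router_ip, link_multicast_addr):
    """Join heard by every router on the LAN but IP-addressed to one router."""
    return {"type": "join",
            "dl_dst": link_multicast_addr,   # assumed parameter, see lead-in
            "ip_dst": upstream_router_ip}


leaf = {"iif": "lan", "oifs": set()}
busy = {"iif": "lan", "oifs": {"if1"}}
leaf_prunes = should_send_prune(leaf, "lan")
busy_overrides = should_override_prune(busy, "lan")
leaf_overrides = should_override_prune(leaf, "lan")
join = build_override_join("10.1.1.9", "01:00:5e:00:00:02")
```

A router that hears such a join from a peer would suppress its own join,
which is why the message is multicast at the data link layer.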


6. Interoperation with non-ESL networks/regions

A network or collection of networks should be able to choose whether
to use ESL or traditional multicast to join a distribution tree,
depending on the density of the membership in that region.
If the density is high then there is no need to carry ESL
messages and state overhead within the region; it is more
efficient to use RPM or flood membership reports since in general
most links will be on a path from some source to some destination and the
overhead for these traditional IP multicast mechanisms is not a
function of the number of sources.

In addition, we wish to interoperate with networks that do not have
hosts and routers modified to generate and interpret ESL-Join
messages.

The basic problem of splicing these "IP clouds" onto ESL trees is
identifying which border router for the IP cloud should be the entry
point for data packets from a particular source, and therefore which
sources individual border routers should put in their join and prune
lists.
This is analogous to the LAN case when there is more than one router
serving it. The designated router is the one that takes responsibility
for serving the members on the LAN.

If the Border Routers (BRs) are running IBGP then they have the
information necessary to determine which BR should include a particular
host in its join list. Similarly, if the BRs are running OSPF then the
information can be computed. However, if the domain is running DVMRP
or some other scheme, there may need to be some additional mechanism
employed in the BRs.  This is an open issue still to be resolved in
order to achieve maximum interoperation with existing networks.

An additional problem arises when interoperating with a non-ESL cloud.
Namely, when a receiver decides to join a group inside of a cloud in
which there are no other members, the BRs of that cloud must
be notified in order to trigger sending of an ESL-Join message.
In the case of MOSPF, new group membership is advertised to backbone
routers but not necessarily to all BRs. In the case of DVMRP and most
other distance vector IGPs, membership is not advertised at all.
Therefore, in both cases, some additional mechanism is needed.

We can solve this problem in a manner similar to the multi-access LAN
case.

        1. Two internal (to the cloud) multicast groups are created:
        Multicast-Reporters (MR) and All-ESL-BRs.

        2a. If the cloud runs MOSPF then one (or a small number for
        reliability) of the backbone routers joins the MR group.
        2b. If the cloud runs DVMRP, then ALL internal routers that
        have the potential of being DRs for a network must join the MR
        group (i.e., any router that will process an IGMP report).

        3. All BRs that speak ESL join the All-ESL-BRs group AND the MR group.

        4. Members of All-ESL-BRs do a Designated BR election among themselves.

        5. The resulting DBR sends an IGMP-query to the MR group.

        6. Members of MR respond by sending IGMP-report messages to the
        MR group. Members of MR listen to these reports and suppress
        sending reports for groups that have been reported by other routers.

As a result, all ESL-BRs hear of all groups for which internal members
exist. Based on this information, and information obtained from IBGP or
OSPF, the BRs can determine which of them should send an ESL-Join
message to the RP for each group for which there is a local member.
Note that DBRs are source, group specific.

We will describe two scenarios to illustrate the interoperability
issue: one case where the source of a multicast datagram is in the
non-ESL cloud and receivers in a group are outside of the cloud, and
one case where the source is outside of the cloud and receivers are
inside the cloud.

           ---------------
          /               \
         /             BR1 \  -----...-------- RP
        |                   |                 /
        |   S               |                /
        |                   |               /
        |                   |              .
         \             BR2 / -------------.
          \  BR3          /              .
           ---------------              /
              |                        /
              +---------...------------

S sends a multicast packet that gets to all border routers. Protocols
such as MOSPF and DVMRP will cause the multicast packet to hit all
border routers. If all border routers know of each other, the one with
the shortest path to S is elected the Border DR. In case of a tie, the
router with the largest IP address becomes Border DR. The Border DR sends
the Register message to the RP. All others discard the multicast packet.
The Border DR sends the multicast packets along the path to the RP after
join messages for S,G propagate back to the DBR to establish S,G state.
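
The Border DR rule above (shortest path to S, ties broken by largest IP
address) can be written as a short sketch. The function name, the use of
a simple numeric metric and the example addresses are assumptions; how
the border routers learn each other's metrics is outside this sketch.

```python
import ipaddress


def elect_border_dr(metric_to_source):
    """Pick the Border DR from a map of border router address -> path metric."""
    return min(
        metric_to_source,
        # Lowest metric first; among equals the largest address wins.
        key=lambda br: (metric_to_source[br], -int(ipaddress.IPv4Address(br))),
    )


tie = elect_border_dr({"192.0.2.1": 5, "192.0.2.7": 3, "192.0.2.9": 3})
clear = elect_border_dr({"192.0.2.1": 2, "192.0.2.7": 3, "192.0.2.9": 3})
```

Only the elected router sends the Register message to the RP; the other
border routers discard the packet.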

In the receiver-in-the-cloud case:

           ---------------
          /               \
         /             BR1 \  -----...-------- RP
        |   H1a             |                 /
        |                   |                /
        |     H2a           |               /
        |                   |              .
         \             BR2 / -------------.
          \  BR3          /              .
           ---------------              /
              |                        /
              +---------...------------


Border Routers need to know which one sends ESL-Join messages to the RP
for which groups. If MOSPF is running, all border routers know of each
other and can determine which one is closer to the RP. The closer one
sends the ESL-Join message. Similarly, IBGP provides the BRs
with the information to determine whether each particular BR has the
preferred route to the RP.

Similarly, once a data packet from a new source arrives at the BR, it
must determine which border router is closest to that source. If the BR
itself is the closest, it
forwards the packet internally to the multicast group, sets up the
source-specific forwarding entry, and sends an ESL-Join message toward
the source.  Otherwise, it encapsulates the packet and unicasts it to
the correct border router, and that border router takes the same action.

In summary, border routers need to learn about each other and their
respective routes to RPs or sources using one of the following:
    o OSPF or IS-IS
    o IBGP or IIDRP
    o A configured mesh of tunnels, with unicast routing running over
      the tunnels.




7. Design Issues

7.1 Comparison with Core Based Tree

CBT was proposed to address similar scaling problems; however, it
differs in several respects. Some differences are functional and some
are engineering tradeoffs.

7.1.1 Tree Types

The first major issue is that CBT imposes a
single shared tree for each multicast group. We justified our desire
to avoid this scenario earlier.  CBT must rely on more "cores" in
order to obtain efficient distribution paths. This means that the
core(s) must be selected carefully to avoid excessively high delay
distribution paths. Even if the core is placed optimally, there is
still the significant issue for continuous media types of
concentrating all traffic onto a common data distribution tree.

In ESL, if an application does not want shortest path tree
distribution then a host does not have to add all new sources to its
ESL. This fact must also be signaled to the routers so that they also
operate in shared-tree mode.
This will cause the RP-based tree to continue to be used as the
distribution tree. In that way an application can choose a group tree
instead of a shortest path tree. In fact, first hop routers can make
this decision independently, and a host could even choose differently
for different sources. However, if RP-based distribution is maintained in
any case then the choice of RPs is more critical than when RPs are
used only as a transition path to shortest path trees.

7.1.2 Group Specific State

There are also protocol engineering differences between the two.  One
of these issues is a tradeoff between requiring group specific state
on the routers in between sources and the RP, vs. carrying an option
in all DATA packets sent to the group. In CBT, data packets travel from
the source to the CBT with an option attached. This allows the
packets to be sent initially towards the core, by non-CBT routers, and
then to be routed along the CBT once they hit a router that is on the
core tree. In ESL we have chosen to use a Register packet and
establish explicit S,G forwarding entries so that data packets need not
require as much processing.


7.1.3 Soft state vs. explicit reliability mechanism


CBT uses explicit hop by hop mechanisms to achieve reliable delivery
of control messages. ESL uses periodic refreshes as its primary means
of reliability.  This approach reduces the complexity of the protocol
and covers a wide range of protocol and network failures in a single
simple mechanism. On the other hand, it can introduce additional
message protocol overhead.

7.1.4 Effect on Host Service Model

CBT requires that hosts be modified to participate in the CBT protocol.
ESL proposes to make use of optional new IGMP Report messages that
include a list of zero to n RPs; however, hosts do not otherwise have to
participate directly in the ESL protocol.

7.1.5 Incoming interface check on all multicast data packets

If multicast data packets loop, the result can be severe; unlike unicast
packets, multicast packets fan out each time they loop. Therefore we
assert that all multicast data packets should be subject to an incoming
interface check comparable to the one performed by DVMRP and MOSPF. In
order to do this check, *,G state can only be used downstream of the RP.
As a consequence, in any particular router on the shared tree, a specific
S,G entry must be maintained for sources that are upstream of the RP
relative to that router.


7.2 Selecting and Identifying RPs

An RP for a particular multicast group can be any IP-addressable
entity in the internet.  However, it is most efficient and convenient
for the RP to be the directly-connected ESL router of the members of
the group. If an RP has local members of the group then there is no
wasted overhead associated with sources continually sending their data
packets to the RP, since the packets need to be delivered there anyway
for delivery to those members.

Nevertheless, we need not be overly concerned with placement of the RPs
when shortest path trees are used because the RP will
not remain on the distribution path for most receivers, unless it happens
to be centrally located. Obviously, pathological cases should be avoided,
such as putting the RP on the other end of a very narrow link whose
capacity is exceeded by the data rate of the sources. The RP address can
be configured or can be dynamically discovered by mapping from the
multicast address, query of a directory service, or from information
obtained via new ESL-RP-Report messages. The mapping of G to RP addresses
should be cached.

While the mapping of multicast addresses to RP addresses is an open
issue in the long term, in
the short term we will implement two mechanisms. The first approach is
to simply manually configure the mapping. The second approach is to
allow hosts (both sources and receivers) to inform routers of the
mapping using a new ESL-RP-Report message. The latter approach is
needed to support dynamic groups that hosts advertise and discover by
participating in a special application, e.g., the session directory
(sd) tool developed by V. Jacobson.  Advertising hosts will advertise
RP addresses along with the multicast address and other hosts that
wish to send to or join the group will send an ESL-RP-Report message
with the RP address(es) in response to IGMP Queries.

The DNS is not a general solution because it is not appropriate for
advertising dynamic information quickly as is needed for dynamic
multicast groups. In the future if the DNS is used for multicast address
advertisement, RP addresses can be advertised along with them.


7.4 Separating receiver and sender roles

We chose to continue with the design philosophy of IP multicast for
two reasons. The first is that in order to interoperate seamlessly
with IP multicast we needed to maintain the separation between
receivers and senders. The second is that the separation allows us
to build a protocol that has less overhead per receiver by introducing
more overhead per source.

While some applications might like to have explicit information about
all receivers in a group, the aggregation mechanisms proposed for very
large groups would interfere with the utility of this information
anyway; i.e., explicit receiver information would only tell which
domains were receiving the packets, not which hosts within those
domains. It seems that some other mechanism is needed if an
application really wants to enforce access control on the multicast
group. This is a subject for further study.

In many applications the source should be a receiver as well in order
to obtain feedback and facilitate debugging [Van]. For this reason
we might add an optimization whereby an IGMP Register message that is
appropriately flagged would be interpreted and processed as both an
IGMP Register and an IGMP join message.

7.5 State overhead

State overhead is of considerable concern given the large number of
multicast groups that will exist and the large number of potential
sources that do exist.
The ESL protocol described here entails the following state.

1. On the RP downstream tree (RP and routers downstream of RP) there is
*,G state for each Group, and negative or positive cache information
for each Si,G when SPTs are used.

2. On the SPTs there is Si,G state for the subset of Si's whose SPTs pass
through that particular router.

3. On the upstream RP Tree (between source and RP, what CBT calls offtree),
there is also Si,G state for each source whose shortest path to the RP
passes through the particular router.

In the periphery the number of sources with SPTs through a router is not so
large. The number of groups may still be large but is still not as large as
in the center of the network.

However, a very large number of sources' SPTs
pass through "central routers" and a very large number of groups have
distribution trees that pass through the central routers as well.
Source specific state is unavoidable if you want SPTs. If you do not need
SPTs or do not need all of the tree to be SPT, use *,G instead.
We should investigate a situation in which periphery routers switch to
their SPT interfaces but central routers stick with the *,G RP tree
entry. We need to answer two questions: What kind/quality of trees do we
end up with? And what is the implication for traffic concentration in
the center of the network, even given the greater aggregate bandwidth
found there?

There remains the open issue of aggregation across groups as well. Scott
Brim has proposed some mechanisms for dense mode operation.
CBT does avoid group specific state on the routers that lie between
sources and the shared tree for that group by employing and processing
an option in all data packets sent to sparse multicast groups.
For now, we wish to avoid interfering with data packet processing and pay
with state. But we must do further studies to determine how many
groups can be supported before the shared-tree mechanism or cross-group
aggregation is mandated.


7.6 Aggregation of information in ESL

There are several motivations for aggregating source information beyond
the subnet level supported in the current specification; the
most important are ESL message size and the amount of memory used for
forwarding entries.

One possibility is to use the highest level aggregate available for an
address when setting up the multicast forwarding entry. This is
optimal with respect to forwarding entry space. It is also optimal
with respect to ESL message size. However, ESL messages
will carry very coarse information, and when the messages arrive at
routers closer to the source(s) where more specific routes exist there
will be a large fanout and ESL messages will travel toward all members
of the aggregate, which would be inefficient in many cases.

If ESL is being used for inter-domain routing, and routers are
able to map from IP address to domain identifier, then one possibility
is to use the domain level aggregate for a source in ESL messages (AS
numbers or RDIs). Then the ESL message will travel to the BR(s) of
the domain and the BRs can use the internal multicast protocol's
mechanism for propagating the join within the domain (e.g. send
appropriate LSA in MOSPF or register a "local member" and do not prune
in the case of RPF).  However, this approach requires that it is both
possible and efficient to map from IP to domain address when processing
data packets, as well as control packets.

%%Editorial Note: the following is a very gross high level description
%%of Van's scheme. It is just a placeholder for the definitive paragraph
%%that I will eventually extract from him.
Another possibility is to use proxies as suggested by V. Jacobson.
In this case within ESL clouds, ESL messages need only refer to proxies
for sources outside the cloud. In this scheme BRs would join an ESL
tree externally and inject themselves as sources internally. When data
packets arrived, the data packet would be forwarded into the cloud and
routers would see a new source. They would then need to determine
which is the entry BR for the particular source and forward the packet
on the multicast tree associated with that BR. The router could cache
a forwarding entry for the new source in order to avoid repeating this
step on each data packet. To create efficient multicast distribution
trees that do not generate duplicate packets, this scheme requires that
internal routers be able to map from an IP address to the entry BR used
by that IP address as a source. If such a mechanism is not available,
approximations may be employed that map packets based on the previous
hop router. This technique is currently being developed and would be
deployable as an addition to the current protocol without affecting the
protocol specification per se.
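The lookup-and-cache step performed by an internal router can be
sketched as follows. This is an illustration only: the prefix-to-BR
table, the class name and the BR labels are hypothetical, and the sketch
assumes that some mechanism (not defined here) supplies the mapping from
external prefixes to entry BRs.

```python
import ipaddress

class InternalRouter:
    """Hypothetical internal router inside an ESL cloud."""

    def __init__(self, br_by_prefix):
        # Assumed table: external prefix -> entry border router (BR).
        self.br_by_prefix = {
            ipaddress.ip_network(p): br for p, br in br_by_prefix.items()
        }
        self.cache = {}    # (source, group) -> entry BR
        self.lookups = 0   # counts full table lookups, for illustration

    def entry_br(self, source, group):
        """Return the entry BR whose tree should carry this packet."""
        key = (source, group)
        if key not in self.cache:
            self.lookups += 1
            addr = ipaddress.ip_address(source)
            matches = [n for n in self.br_by_prefix if addr in n]
            if not matches:
                return None  # no mapping known; behaviour left open
            best = max(matches, key=lambda n: n.prefixlen)
            self.cache[key] = self.br_by_prefix[best]
        return self.cache[key]

router = InternalRouter({"128.9.0.0/16": "BR1", "0.0.0.0/0": "BR2"})
first = router.entry_br("128.9.1.1", "224.2.0.1")   # full lookup
again = router.entry_br("128.9.1.1", "224.2.0.1")   # served from cache
other = router.entry_br("13.1.1.1", "224.2.0.1")    # falls to default
```

The cache plays the role of the forwarding entry that the router keeps
for a new source, so the mapping is done once per source rather than
once per data packet.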

In the absence of aggregation or proxy techniques, when the number of
sources reaches some threshold value (to be determined), receivers
could compromise the quality of the distribution tree in exchange for
accommodating large numbers of unaggregatable sources. In particular,
receivers could continue to receive packets over the group tree
instead of moving them off to a shortest path tree. For example,
receivers could send a wildcard IGMP to an RP to maintain distribution
of all sources' packets to that multicast address via the RP. While
this would result in a suboptimal distribution tree, it would avoid
explicit enumeration of sources. Alternatively, the receiver could
send a wildcard with explicit sources listed in the prune portion of
the list. This would allow the receiver to get shortest path delivery
from a subset of the sources.
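A minimal sketch of the receiver-side choice described above follows.
The dictionary layout is a hypothetical stand-in and not the ESL message
format; the threshold value is likewise an assumption, since the text
leaves it to be determined.

```python
WILDCARD = "*"  # hypothetical marker meaning "all sources"

def build_receiver_request(group, rp, sources, threshold, spt_sources=()):
    """Decide between explicit source joins and a wildcard via the RP."""
    sources = sorted(set(sources))
    if len(sources) <= threshold:
        # Few enough sources: enumerate them explicitly.
        return {"group": group, "toward": "sources",
                "join": sources, "prune": []}
    # Too many sources: stay on the shared tree via the RP, listing in
    # the prune portion only the subset wanted over shortest paths.
    return {"group": group, "toward": rp, "join": [WILDCARD],
            "prune": sorted(set(spt_sources) & set(sources))}

few = build_receiver_request("224.2.0.1", "RP1",
                             ["10.0.0.1", "10.0.0.2"], threshold=4)
many = build_receiver_request("224.2.0.1", "RP1",
                              [f"10.0.0.{i}" for i in range(1, 6)],
                              threshold=4, spt_sources=["10.0.0.3"])
```

In the second case the request carries one wildcard and one pruned
source instead of five explicit entries.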


%%DE added
One problem with leaving selection of shared vs. shortest path trees to
the receivers is that the burden of excessive S,G entries will most
likely be in the center of the network far away from receivers. In this
case routers should be able to act unilaterally to decline requested
establishment of new S,G entries. If a router does not process a join
then the downstream receivers will not receive packets over the shortest
path. Assuming a strategy is used whereby receivers do not prune the
shared tree until packets arrive on the shortest path tree, then
receivers will simply remain on the shared tree until more state becomes
available on the shortest path tree.
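The unilateral behaviour described above might look like the following
sketch. The class, the fixed entry limit and the return convention are
assumptions made for illustration; the text does not specify how a
router decides that it has too much state.

```python
class Router:
    """Hypothetical router that caps the number of S,G entries."""

    def __init__(self, max_sg_entries):
        self.max_sg_entries = max_sg_entries
        self.sg_entries = set()

    def process_join(self, source, group):
        """Install S,G state if room remains; otherwise decline."""
        key = (source, group)
        if key in self.sg_entries:
            return True    # refresh of existing state
        if len(self.sg_entries) >= self.max_sg_entries:
            return False   # decline: join is not processed
        self.sg_entries.add(key)
        return True

def receiver_tree(packet_seen_on_spt):
    """Receivers prune the shared tree only after SPT packets arrive."""
    return "shortest-path" if packet_seen_on_spt else "shared"

core = Router(max_sg_entries=1)
accepted = core.process_join("10.0.0.1", "224.2.0.1")
declined = core.process_join("10.0.0.2", "224.2.0.1")
```

A receiver whose join was declined sees no packets on the shortest path
tree, so `receiver_tree(False)` keeps it on the shared tree.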


7.7 Interaction with policy based routing

ESL messages and data packets will travel over paths that reflect policy,
so long as the policy does not preclude them, to the same extent that
unicast routing does. In addition, in the future we will construct a
special ESL message type that embeds a Source Demand Routing Protocol
(SDRP) route and thereby causes the ESL message and the multicast
forwarding state to be on an alternative distribution tree branch.

To obtain policy sensitive distribution of multicast packets we need to
consider the paths chosen for forwarding ESL-Join and Register messages.

If the path to reach the RP or some source is indicated as providing the
appropriate QOS and as being symmetric, then ESL routers can determine
that if they forward joins upstream, the data packets will be allowed to
travel downstream.

This implies that BGP/IDRP should carry two QOS flags: a symmetry flag
and a multicast willing flag. The former, if set, indicates that each AD
hop has local route selection policies that allow data to flow in either
direction. The latter flag indicates that each AD hop on the path has a
local transit policy that allows multicast packets.
NOTE: there are two types of symmetry. The first indicates that it is not
in violation of transit policies to allow data to flow in both
directions; so even if route selection is not symmetric, mcast forwarding
entries that point along the reverse route do not violate policy. The
second type of symmetry indicates that packets are in fact routed
symmetrically, i.e., if R1 forwards packets from S destined for D out
over an interface to R2, then the route is truly symmetric if R2 forwards
packets from D to S over its interface to R1. For ESL we only need the
former information.


If the generic route computed by hop-by-hop routing does not have the
symmetry and mcast bits set, but there is an SDRP route that does, then
the ESL message should be sent with an embedded SDRP route. This option
needs to be added to ESL join messages. Its absence will indicate
forwarding according to the router's unicast routing tables. Its
presence will indicate forwarding according to the SDRP route. This
implies that SDRP should also carry symmetry and mcast QOS bits AND that
ESL should carry an optional SDRP route inside of it.
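The selection rule above can be sketched as follows. The Route type,
its field names and the dictionary returned are hypothetical; in
particular, the case where neither route carries both bits is not
covered by the text, and returning None for it is an assumption of this
sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Route:
    hops: Tuple[str, ...]
    symmetric: bool       # assumed "symmetry" QOS bit
    mcast_willing: bool   # assumed "multicast willing" QOS bit

    def usable(self):
        return self.symmetric and self.mcast_willing

def choose_join_forwarding(hop_by_hop: Route, sdrp: Optional[Route]):
    """Pick how an ESL join is forwarded toward the RP or source."""
    if hop_by_hop.usable():
        # No option present: forward per the unicast routing tables.
        return {"sdrp_route": None}
    if sdrp is not None and sdrp.usable():
        # Option present: forward along the embedded SDRP route.
        return {"sdrp_route": list(sdrp.hops)}
    return None  # neither route qualifies; outcome left open here

generic = Route(("AD1", "AD2"), symmetric=False, mcast_willing=True)
alternate = Route(("AD1", "AD3", "AD2"), symmetric=True, mcast_willing=True)
join = choose_join_forwarding(generic, alternate)
```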



7.8 Interaction with Receiver Initiated reservation setup such as RSVP

Once the SP distribution tree has been established, RSVP reservation
messages follow the reverse of the senders' path messages, and the
senders' path messages will travel according to the state that ESL
installs. However, one wants to avoid switching reservation-oriented
routes, so the receiver could initially receive all packets via the RP
distribution tree and, after some delay, send ESL messages to establish
the SP tree and then establish reservations over that tree. The
source's path message would travel first via the RP path; to avoid
setting up a reservation on the RP path, the receiver would send its
IGMP message BEFORE it sends out its reservation message, and wait for
another path message to travel over the new SP tree.

In summary, we expect that this receiver initiated routing is well
suited to receiver initiated reservations since if a reservation is
blocked the previous router or the receiver can select an alternative
reverse path to the particular source(s). This is also a subject for
future work that will affect the use of the protocol, and not the
protocol itself.



7.9 Dense Mode

We can use similar IGMP extensions to support a mode of multicasting
that is more efficient than ESL-sparse for forwarding to receivers that
densely populate a region. Clouds might run this form of ESL internally
as their internal multicast mechanism, or it could support dense
inter-domain groups.

In this model routers run RPF (forward out all interfaces, except the
incoming, if a packet arrived on the outgoing interface used to get to
the source). Directly connected routers run IGMP query and report, and
when they have no members for a group and receive packets for it, they
send IGMP prune messages that consist of ESL messages with Prune lists
only. Similarly, when a router gets a packet on an interface that is NOT
its outgoing interface to the source, that router sends an ESL message
with prune information only. ESL records prune information and
propagates it upwards if entire downstream branches prune themselves.
Periodically the prune information is timed out, the packets are sent
again, and downstream routers must resend the prune messages.
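The per-packet decision in this model can be sketched as below. The
function and its arguments are hypothetical; the sketch assumes that the
set of pruned interfaces already includes both interfaces on which
prunes were received and leaf interfaces with no IGMP members, and it
leaves out the periodic timeout of prune state.

```python
def rpf_forward(in_iface, rpf_iface, ifaces, pruned):
    """Return (interfaces to forward on, whether to send a prune back
    out the arriving interface) for one multicast data packet.

    rpf_iface is the outgoing interface this router uses to reach the
    packet's source.
    """
    if in_iface != rpf_iface:
        # Arrived on the wrong interface: drop and prune that branch.
        return [], True
    out = [i for i in ifaces if i != in_iface and i not in pruned]
    # If every downstream branch is pruned, propagate the prune upward.
    return out, not out

ifaces = ["e0", "e1", "e2"]
normal = rpf_forward("e0", "e0", ifaces, pruned={"e2"})
wrong = rpf_forward("e1", "e0", ifaces, pruned=set())
all_pruned = rpf_forward("e0", "e0", ifaces, pruned={"e1", "e2"})
```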

A draft specification of dense mode ESL is available from Dino Farinacci.

Running dense mode internally with ESL sparse mode outside has all
the same problems as running DVMRP internally: the routers need to run
BGP or tunnels to identify appropriate BRs, and a mechanism must be
added for alerting BRs to new group members.





7.10 Open Issues

The open issues associated with ESL are:

1. Aggregation of source lists via use of proxies.

2. Discovering RP addresses: new IGMP-Report-RP message.

3. Aggregating group specific state along the shared tree.

4. Dense to Sparse to Dense transition issues; see dense mode document.

5. Deciding when to switch from shared to shortest path trees.




Acknowledgments

Tony Ballardie, Scott Brim, Jon Crowcroft, Paul Francis, Ching-Gung
(Charley) Liu, Liming Wei and Lixia Zhang provided detailed comments on
previous drafts. The authors of CBT and membership of the IDMR WG provided
many of the motivating ideas for this work and useful feedback on design
details.