IDMR Working Group                                          D. Thaler
Internet Engineering Task Force                           U. Michigan
INTERNET-DRAFT                                              D. Estrin
October 30, 1997                                              USC/ISI
Expires April 1998                                           D. Meyer
                                                            U. Oregon
                                                              Editors



               Border Gateway Multicast Protocol (BGMP):
                         Protocol Specification
                      <draft-ietf-idmr-gum-01.txt>





Status of this Memo

This document is an Internet Draft.  Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups.  Note that other groups may also distribute working
documents as Internet Drafts.

Internet Drafts are valid for a maximum of six months and may be
updated, replaced, or obsoleted by other documents at any time.  It is
inappropriate to use Internet Drafts as reference material or to cite
them other than as a "work in progress".


Abstract

This document describes BGMP, a protocol for inter-domain multicast
routing.  BGMP builds shared trees for active multicast groups, and
allows receiver domains to build source-specific, inter-domain
distribution branches where needed.  Building upon concepts from CBT
and PIM-SM, BGMP requires that each multicast group be associated with
a single root (referred to in BGMP as the root domain).  BGMP
assumes that at any point in time, different ranges of the class D space
are associated (e.g., with MASC [MASC]) with different domains.  Each of
these domains then becomes the root of the shared domain-trees for all
groups in its range.  Multicast participants will generally receive
better multicast service if the session initiator's address allocator
selects addresses from its own domain's part of the space, thereby
causing the root domain to be local to at least one of the session
participants.


1.  Acknowledgements

   In addition to the authors, the following individuals have
   contributed to the design of BGMP: Cengiz Alaettinoglu, Tony
   Ballardie, Steve Casner, Steve Deering, Dino Farinacci, Bill Fenner,
   Mark Handley, Ahmed Helmy, Van Jacobson, and Satish Kumar.

   This document is the product of the IETF IDMR Working Group with Dave
   Thaler, Deborah Estrin, and David Meyer as editors.


2.  Purpose

   It has been suggested that inter-domain multicast is better supported
   with a rendezvous mechanism whereby members receive sources' data
   packets without any sort of global broadcast (e.g., DVMRP and PIM-DM
   broadcast initial data packets, and MOSPF broadcasts membership
   information).  CBT [CBT] and PIM-SM [PIMSM] use a shared group-tree,
   which all members join and through which they hear from all sources
   (and which non-members do not join, and thereby hear from no sources).

   This document describes BGMP, a protocol for inter-domain multicast
   routing.  BGMP builds shared trees for active multicast groups, and
   allows domains to build source-specific, inter-domain distribution
   branches where needed.  Building upon concepts from CBT and PIM-SM,
   BGMP requires that each global multicast group be associated with a
   single root.  However, in BGMP, the root is an entire exchange or
   domain, rather than a single router.

   BGMP assumes that ranges of the class D space have been associated
   (e.g., with MASC [MASC]) with selected domains. Each such domain then
   becomes the root of the shared domain-trees for all groups in its
   range.  An address allocator will generally achieve better
   distribution trees if it takes its multicast addresses from its own
   domain's part of the space, thereby causing the root domain to be
   local.


3.  Terminology

This document uses the following technical terms:

Domain:
     A set of one or more contiguous links and zero or more routers
     surrounded by one or more multicast border routers. Note that this
     loose definition of domain also applies to an external link between
     two domains, as well as to an exchange.

Root Domain:
     When constructing a shared tree of domains for some group, one
     domain will be the "root" of the tree.  The root domain receives
     data from each sender to the group, and functions as a rendezvous
     domain toward which member domains can send inter-domain joins, and
     to which sender domains can send data.

Multicast RIB:
     The Routing Information Base, or routing table, used to calculate
     the "next-hop" towards a particular address for multicast traffic.

Multicast IGP (M-IGP):
     A generic term for any multicast routing protocol used for tree
     construction within a domain.  Typical examples of M-IGPs are:
     DVMRP, PIM-DM, PIM-SM, CBT, and MOSPF.

EGP: A generic term for the interdomain unicast routing protocol in use.
     Typically, this will be some version of BGP which can support a
     Multicast RIB, such as BGP4+ [MBGP], containing both unicast and
     multicast address prefixes.

Component:
     The portion of a border router associated with (and logically
     inside) a particular domain that runs the multicast IGP (M-IGP) for
     that domain, if any.  Each border router thus has zero or more
     components inside routing domains. In addition, each border router
     with external links that do not fall inside any routing domain will
     have an inter-domain component that runs BGMP.

External peer:
     A border router in another multicast AS (autonomous system, as used
     in BGP), to which an eBGP session is open.

Internal peer:
     Another border router of the same multicast AS.  A border router
     either speaks iBGP ("internal" BGP) directly to internal peers in a
     full mesh, or indirectly through a route reflector [REFLECT].

Next-hop peer:
     The next-hop peer towards a given IP address is the next EGP router
     on the path to the given address, according to multicast RIB routes
     in the EGP's routing table (e.g., in BGP4+, routes whose Subsequent
     Address Family Identifier field indicates that the route is valid
     for multicast traffic).

Target:
     Either an EGP peer, or an M-IGP component on the same router.

Tree State Table:
     This is a table of (S-prefix,G-prefix) entries (including (*,G-
     prefix) entries) that have been explicitly joined by a set of
     targets.  Each entry has, in addition to the source and group
     addresses and masks, a list of targets that have explicitly
     requested data (on behalf of directly connected hosts or on behalf
     of downstream routers).  (S,G) entries also have an "SPT" bit.
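
     As a non-normative illustration, one possible in-memory
     representation of a tree state table entry is sketched below in
     Python; the class and field names are assumptions of this sketch,
     not part of the protocol.

        from dataclasses import dataclass, field

        @dataclass
        class TreeStateEntry:
            source_prefix: str    # "*" for a (*,G-prefix) entry
            group_prefix: str     # e.g. "224.2.0.0/16"
            # Targets that explicitly requested data: EGP peers and/or
            # M-IGP components of this border router.
            targets: set = field(default_factory=set)
            spt: bool = False     # meaningful only for (S,G) entries
            # (Entry timers are omitted from this sketch.)

        # Example: a (*,G-prefix) entry joined by an external peer and
        # by an M-IGP component.
        entry = TreeStateEntry("*", "224.2.0.0/16",
                               {"peer-192.0.2.1", "component-pim-sm"})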


4.  Protocol Overview

   BGMP maintains group-prefix state in response to messages from BGMP
   peers and notifications from M-IGP components. Group-shared trees are
   rooted at the domain advertising the group prefix covering those
   groups.  When a receiver joins a specific group address, the border
   router towards the root domain generates a group-specific Join
   message, which is then forwarded Border-Router-by-Border-Router
   towards the root domain (see Figure 1).  Note that BGMP Join and
   Prune messages are sent over TCP connections between BGMP peers, and
   hence BGMP protocol state is refreshed by the TCP keep-alives.

   BGMP routers build group-specific bidirectional forwarding state as
   they process the BGMP Join messages. Bidirectional forwarding state
   means that packets received from any target are forwarded to all
   other targets in the target list without any RPF checks.  No group-
   specific state or traffic exists in parts of the network where there
   are no members of that group.

   BGMP routers build source-specific unidirectional forwarding state
   only where it is needed to be compatible with source-specific M-IGP
   distribution trees.  For example, a transit domain that uses DVMRP,
   PIM-DM, or PIM-SM as its M-IGP may need to inject multicast packets
   from different sources via different border routers (to be compatible
   with the M-IGP RPF checks).  Therefore, the BGMP router that is
   responsible for injecting a particular source's packets may build a
   source-specific BGMP branch if it is not already receiving that
   source's packets via the shared tree (see Transit_1 in Figure 1, for
   Src_A).  Note, however, that a stub domain that has only a single ISP
   connection will receive all multicast data packets through the single
   BGMP router to which all RPF checks point, and therefore that BGMP
   router need never build external source-specific distribution paths
   (see Rcvr_Stub_7 in Figure 1).

                    Root_Domain
                     [BR61]--------------------------\
                        |                            |
                     [BR32]                         [BR41]
                    Transit_3                     Transit_4
                     [BR31]                      [BR42] [BR43]
                        |                          |      |
                     [BR22]                      [BR52] [BR53]
                    Transit_2                     Transit_5
                     [BR21]                         [BR51]
                        |                            |
                     [BR12]                         [BR61]
                    Transit_1[BR11]----------[BR62]Stub_6
                     [BR13]                        (Src_A)
                        |                          (Rcvr_D)
              -------------------
              |                 |
           [BR71]              [BR81]
          Rcvr_Stub_7       Src_only_Stub_8
          (Rcvr_C)             (Src_B)

   Figure 1: Example inter-domain topology. [BRXY] represents a BGMP border
   router.  Transit_X is a transit domain network.  *_Stub_X is a stub
   domain network.


   Data packets are forwarded based on a combination of BGMP and M-IGP
   rules. The router forwards to a set of targets according to a
   matching (S,G) BGMP tree state entry if it exists. If not found, the
   router checks for a matching (*,G) BGMP tree state entry. If neither
   is found, then the packet is sent natively to the next-hop EGP peer
   for G, according to the Multicast RIB (for example, in the case of a
   non-member sender such as Src_B in Figure 1). If a matching entry was
   found, the packet is forwarded to all other targets in the target
   list. In this way BGMP trees forward data in a bidirectional manner.
   If a target is an M-IGP component then forwarding is subject to the
   rules of that M-IGP protocol.


4.1.  Design Rationale

   Several other protocols, or protocol proposals, build shared trees
   within domains [CBT, HPIM, PIM-SM].  The design choices made for BGMP
   result from our focus on Inter-Domain multicast in particular. The
   design choices made by CBT and PIM-SM are better suited to the wide-
   area intra-domain case.  There are three major differences between
   BGMP and other shared-tree protocols:

   (1) Unidirectional vs. Bidirectional trees

   Bidirectional trees (using bidirectional forwarding state as
   described above) minimize third-party dependence, which is essential
   in the inter-domain context. For example, in Figure 1, stub domains 7
   and 8 would like to exchange multicast packets without being
   dependent on the quality of connectivity of the root domain.
   However, unidirectional shared trees (i.e., those using RPF checks)
   have more aggressive loop prevention and share the same processing
   rules as source-specific entries, which are inherently unidirectional.

   The lack of third-party dependence concerns in the INTRA-domain case
   reduces the incentive to employ bidirectional trees.  BGMP supports
   bidirectional trees because it has to, and because it can without
   excessive cost.

   (2) Source-specific distribution trees/branches

   In a departure from other shared tree protocols, source-specific BGMP
   state is built ONLY where (a) it IS needed to pull the multicast
   traffic down to a BGMP router that has source-specific (S,G) state,
   and (b) that router is NOT already on the shared tree (i.e., has no
   (*,G) state). We build these source specific branches because most
   M-IGP protocols in use today build source-specific distribution trees
   and would suffer unnecessary overhead if they were not able to import
   packets from high data-rate sources via the border router that matches
   the domain's source-specific RPF checks (e.g., BR11 in Figure 1, for
   data from Src_A).  Moreover, some cases in which bidirectional
   shared-tree distribution paths are significantly longer than source-
   specific tree distribution paths will benefit from these source-
   specific shortcuts.

   However, we do not build source-specific inter-domain trees because
   (a) inter-domain connectivity is generally less rich than intra-
   domain connectivity, so shared distribution trees should have more
   acceptable path length and traffic concentration properties in the
   inter-domain context than in the intra-domain case, and (b) by
   having the shared tree state always take precedence over source-
   specific tree state, we avoid ambiguities that can otherwise arise.

   In summary, BGMP trees are, in a sense, a hybrid between CBT and
   PIM-SM trees.

   (3) Method of choosing root of group shared tree

   The choice of a group's shared-tree-root has implications for
   performance and policy.  In the intra-domain case it can be assumed
   that all potential shared-tree roots (RPs/Cores) within the domain
   are equally suited to be the root for a group that is initiated
   within that domain.  In the INTER-domain case, there is far more
   opportunity for unacceptably poor locality, and unacceptable
   administrative ownership, of a group's shared-tree root.  Therefore,
   in the intra-domain case, other protocols treat all candidate roots
   (RPs or Cores)
   as equivalent and emphasize load sharing and stability to maximize
   performance.  In the Inter-Domain case, all roots are not equivalent,
   and we adopt an approach whereby a group's root domain is not random
   and is subject to administrative and performance input.


5.  Protocol Details

   In this section, we describe the detailed protocol that border
   routers perform.  We assume that each border router conforms to the
   component-based model described in [INTEROP].


5.1.  Interaction with the EGP

   A fundamental requirement imposed by BGMP on the design of an EGP is
   that it be able to carry multicast prefixes.  For example, a multi-
   protocol BGP (MBGP) must be able to carry a multicast prefix in the
   Unicast Network Layer Reachability Information (NLRI) field of the
   UPDATE message (i.e., either an IPv4 class D prefix or an IPv6 prefix
   with high-order octet equal to FF [IPv6MAA]). This capability is
   required by BGMP in the implementation of bi-directional trees; BGMP
   must be able to forward data and control packets to the next hop
   towards either a unicast source S or a multicast group G (see section
   5.2). It is also required that the path attributes defined in
   [RFC1771] have the same semantics whether they accompany unicast
   or multicast NLRI.

   Note that BGP4+ [MBGP] can be easily extended to satisfy the
   requirement described above. [MBGP] defines the optional transitive
   attributes Multiprotocol Reachable NLRI (MP_REACH_NLRI) and
   Multiprotocol Unreachable NLRI (MP_UNREACH_NLRI) to carry sets of
   reachable or unreachable destinations, and the appropriate next hop
   in the case of MP_REACH_NLRI.  These attributes contain an Address
   Family Identifier field [RFC1700] which indicates the type of NLRI
   carried in the attribute. In addition, the attribute carries another
   field, the Subsequent Address Family Identifier, or SAFI, which can
   be used to provide additional information about the type of NLRI. For
   example, SAFI value two indicates that the NLRI is valid for
   multicast forwarding.  BGMP's requirement can be satisfied by
   allowing the NLRI field of the MP_REACH_NLRI (or MP_UNREACH_NLRI) to
   carry a multicast prefix in the Prefix field of the NLRI encoding.
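
   As a non-normative illustration of the above, the fragment below
   sorts NLRI learned via MP_REACH_NLRI into a unicast or multicast RIB
   based on its SAFI; the SAFI values are those of [MBGP], and the
   helper name is an assumption of this sketch.

      SAFI_UNICAST, SAFI_MULTICAST = 1, 2

      def rib_for_nlri(safi):
          """Name the RIB that should hold NLRI with this SAFI."""
          if safi == SAFI_MULTICAST:
              return "multicast-rib"   # used for BGMP next-hop lookups
          if safi == SAFI_UNICAST:
              return "unicast-rib"
          raise ValueError("unsupported SAFI: %d" % safi)

      # Example: a class D prefix carried with SAFI 2 populates the
      # Multicast RIB, where BGMP looks up next hops towards G.
      assert rib_for_nlri(SAFI_MULTICAST) == "multicast-rib"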

   Finally, while not required for correct BGMP operation, the design of
   an EGP should also provide a mechanism that allows discrimination
   between NLRI that is to be used for unicast forwarding and NLRI to be
   used for multicast forwarding. This property is required to support
   multicast-specific policy.  As mentioned above, BGP4+ as specified in
   [MBGP] has this capability.


5.2.  Multicast Data Packet Processing

   For BGMP rules to be applied, an incoming packet must first be
   "accepted":

   o  If the packet was received from an external peer, the packet is
      accepted.

   o  If the packet arrived on an interface owned by an M-IGP, the M-IGP
      component determines whether the packet should be accepted or
      dropped according to its rules.  If the packet is accepted, the
      packet is forwarded (or not forwarded) out any other interfaces
      owned by the same component, as specified by the M-IGP.

   If the packet is accepted, then the router checks the tree state
   table for a matching (S,G) entry.  If one is found, but the packet
   was not received from the next hop target towards S (if the entry's
   SPT bit is True), or was not received from the next hop target
   towards G (if the entry's SPT bit is False), then the packet is
   dropped and no further actions are taken.  If no (S,G) entry was
   found, the router then checks for a matching (*,G) entry.

   If neither is found, then the packet is forwarded towards the next-
   hop peer for G, according to the Multicast RIB.  If a matching entry
   was found, the packet is forwarded to all other targets in the target
   list.

   Forwarding to a target which is an M-IGP component means that the
   packet is forwarded out any interfaces owned by that component
   according to that component's multicast forwarding rules.
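
   The following fragment is a minimal, non-normative sketch of the
   forwarding rules above.  It operates on plain dictionaries; the
   tree_state layout and the next_hop_target helper are assumptions of
   this sketch rather than part of BGMP.

      # tree_state maps (source, group) or ("*", group) to
      # {"targets": set, "spt": bool}; next_hop_target(addr) names the
      # next-hop EGP peer or M-IGP component towards addr according to
      # the Multicast RIB.
      def forwarding_targets(src, grp, arrival_target, accepted,
                             tree_state, next_hop_target):
          if not accepted:              # failed the acceptance rules
              return []
          entry = tree_state.get((src, grp))
          if entry is not None:
              # (S,G) entries also check where the packet arrived from.
              expected = next_hop_target(src if entry["spt"] else grp)
              if arrival_target != expected:
                  return []             # drop; no further actions
          else:
              entry = tree_state.get(("*", grp))
          if entry is None:
              # No tree state: send natively towards the next hop for G.
              return [next_hop_target(grp)]
          # Bidirectional forwarding: all targets except the arrival one.
          return [t for t in entry["targets"] if t != arrival_target]

      # Example: a (*,G) entry with three targets forwards to the other
      # two when a packet arrives from an M-IGP component.
      state = {("*", "224.2.2.2"):
               {"targets": {"BR12", "BR13", "component-1"}, "spt": False}}
      print(sorted(forwarding_targets("10.0.0.1", "224.2.2.2",
                                      "component-1", True, state,
                                      lambda addr: "BR12")))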


5.3.  BGMP processing of Join and Prune messages and notifications

5.3.1.  Receiving Joins

   When the BGMP component receives a (*,G) or (S,G) Join alert from
   another component, or a BGMP (S,G) or (*,G) Join message from a peer
   (either internal or external), it searches the tree state table for a
   matching entry.  If an entry is found, and that peer is already
   listed in the target list, then the entry's timer is restarted and no
   further actions are taken.

   Otherwise, if no (*,G) or (S,G) entry was found, one is created.  In
   the case of a (*,G), the target list is initialized to contain the
   next-hop peer towards G, if it is an external peer. If the peer is
   internal, the target list is initialized to contain the M-IGP
   component owning the next-hop interface.  If there is no next-hop
   peer (because G is inside the domain), then the target list is
   initialized to contain the next-hop component.  If an (S,G) entry
   exists for the same G for which the (*,G) Join is being processed,
   and the next-hop peers toward S and G are different, the BGMP router
   must first send an (S,G) Prune message toward the source and clear the
   SPT bit on the (S,G) entry, before activating the (*,G) entry.

   The component or peer from which the Join was received is then added
   to the target list.  The router then looks up S or G in the Multicast
   RIB to find the next-hop EGP peer.  If the target list, not including
   the next-hop target towards G for a (*,G) entry, becomes non-null as
   a result, the next-hop EGP peer must be notified as follows:

   a) If the next-hop peer towards G (for a (*,G) entry) is an external
      peer, a BGMP (*,G) Join message is unicast to the external peer.
      If the next-hop peer towards S (for an (S,G) entry) is an external
      peer, and the router does NOT have any active (*,G) state for that
      group address G, a BGMP (S,G) Join message is unicast to the
      external peer.  A BGMP (S,G) Join message is never sent to an
      external peer by a router that also contains active (*,G) state
      for the same group.  If the next-hop peer towards S (for an (S,G)
      entry) is an external peer and the router DOES have active (*,G)
      state for that group G, the SPT bit is always set to False.

   b) If the next-hop peer is an internal peer, a BGMP (*,G) Join
      message (for a (*,G) entry) or (S,G) Join message (for an (S,G)
      entry) is unicast to the internal peer.  In addition, a (*,G) or
      (S,G) Join alert is sent to the M-IGP component owning the next-
      hop interface.

   c) If there is no next-hop peer, a (*,G) or (S,G) Join alert is sent
      to the M-IGP component owning the next-hop interface.
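
   As a non-normative summary of cases (a) through (c) above, the
   sketch below decides how a newly non-null target list is propagated
   towards the next hop; the Peer type and the message labels are
   assumptions of this sketch.

      from collections import namedtuple
      Peer = namedtuple("Peer", ["name", "is_external"])

      # next_hop is the next-hop peer towards G (for a (*,G) entry) or
      # towards S (for an (S,G) entry), or None if inside the domain.
      def propagate_join(entry_kind, next_hop, has_star_g_state):
          if next_hop is None:
              # (c) No next-hop peer: alert the M-IGP component owning
              # the next-hop interface.
              return [("join-alert", "migp-component")]
          if next_hop.is_external:
              if entry_kind == "S,G" and has_star_g_state:
                  # Active (*,G) state suppresses external (S,G) Joins.
                  return []
              return [("bgmp-join", next_hop.name)]        # case (a)
          # (b) Internal peer: BGMP Join to the peer, plus a Join alert
          # to the M-IGP component owning the next-hop interface.
          return [("bgmp-join", next_hop.name),
                  ("join-alert", "migp-component")]

      # Example: an (S,G) Join is not sent to the external next-hop
      # peer when (*,G) state is already active for the group.
      print(propagate_join("S,G", Peer("BR41", True), True))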


5.3.2.  Receiving Prune Notifications

   When the BGMP component receives a (*,G) or (S,G) Prune alert from
   another component, or a BGMP (*,G) or (S,G) Prune message from a peer
   (either internal or external), it searches the tree state table for a
   matching entry.  If no (S,G) entry was found for an (S,G) Prune, but
   (*,G) state exists, an (S,G) entry is created, with the target list
   copied from the (*,G) entry.  If no matching entry exists, or if the
   component or peer is not listed in the target list, no further
   actions are taken.

   Otherwise, the component or peer is removed from the target list. If
   the target list becomes null as a result, the next-hop peer towards G
   (for a (*,G) entry), or towards S (for an (S,G) entry if and only if
   the BGMP router does NOT have any corresponding (*,G) entry), must be
   notified as follows.

   a) If the peer is an external peer, a BGMP (*,G) or (S,G) Prune
      message is unicast to it.

   b) If the next-hop peer is an internal peer, a BGMP (*,G) or (S,G)
      Prune message is unicast to the internal peer.  In addition, a
      (*,G) or (S,G) Prune alert is sent to the M-IGP component owning
      the next-hop interface.

   c) If there is no next-hop peer, a (*,G) or (S,G) Prune alert is sent
      to the M-IGP component owning the next-hop interface.
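
   One detail above worth illustrating is the creation of (S,G) state
   from existing (*,G) state when an (S,G) Prune arrives.  The
   non-normative sketch below uses the same dictionary layout as the
   earlier sketches.

      # Find (or create) the entry an (S,G) Prune applies to.  If only
      # (*,G) state exists, a new (S,G) entry is created with a copy of
      # the (*,G) entry's target list.
      def entry_for_sg_prune(tree_state, S, G):
          entry = tree_state.get((S, G))
          if entry is None and ("*", G) in tree_state:
              entry = {"targets": set(tree_state[("*", G)]["targets"]),
                       "spt": False}
              tree_state[(S, G)] = entry
          return entry    # None means the Prune is silently ignored

      # Example: the new (S,G) entry inherits targets BR21, component-1.
      state = {("*", "224.1.1.1"):
               {"targets": {"BR21", "component-1"}, "spt": False}}
      print(entry_for_sg_prune(state, "10.0.0.1", "224.1.1.1"))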



5.3.3.  Receiving Route Change Notifications

   When a border router receives a route for a new prefix in the
   multicast RIB, or an existing route for a prefix is withdrawn, a route
   change notification for that prefix must be sent to the BGMP
   component.  In addition, when the next hop peer (according to the
   multicast RIB) changes, a route change notification for that prefix
   must be sent to the BGMP component.

   In addition, an internal route for each class-D prefix associated
   with the domain (if any) MUST be injected into the multicast RIB in
   the EGP by the domain's border routers.


   When a route for a new group prefix is learned, or an existing route
   for a group prefix is withdrawn, or the next-hop peer for a group
   prefix changes, a BGMP router updates all affected (*,G) target
   lists.

   When an existing route for a source prefix is withdrawn, or the
   next-hop peer for a source prefix changes, a BGMP router updates all
   affected (S,G) target lists.
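
   As a rough, non-normative illustration (assuming the next-hop target
   is the member of each affected (*,G) target list being updated), a
   next-hop change for a group prefix might be applied as sketched
   below; the BGMP Joins and Prunes that accompany such a switch (see
   Section 8.1) are not shown.

      def on_group_route_change(tree_state, covered_by_prefix,
                                old_target, new_target):
          # Swap the next-hop target in every affected (*,G) entry.
          for (s, g), entry in tree_state.items():
              if s == "*" and covered_by_prefix(g):
                  entry["targets"].discard(old_target)
                  if new_target is not None:
                      entry["targets"].add(new_target)

      # Example: the (*,G) entry's next-hop target moves from BR32 to BR41.
      state = {("*", "224.2.2.2"): {"targets": {"BR32"}, "spt": False}}
      on_group_route_change(state, lambda g: g.startswith("224.2."),
                            "BR32", "BR41")
      print(state)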


5.4.  Interaction with M-IGP components

   When an M-IGP component on a border router first learns that there
   are internally-reached members for a group G (whose scope is larger
   than a single domain), a (*,G) Join alert is sent to the BGMP
   component. Similarly, when an M-IGP component on a border router
   learns that there are no longer internally-reached members for a
   group G (whose scope is larger than a single domain), a (*,G) Prune
   alert is sent to the BGMP component.

   At any time, any M-IGP domain MAY decide to join a source-specific
   branch for some external source S and group G.  When the M-IGP
   component in the border router that is the next-hop router for a
   particular source S learns that a receiver wishes to receive data
   from S on a source-specific path, an (S,G) Join alert is sent to the
   BGMP component.  When it is learned that such receivers no longer
   exist, an (S,G) Prune alert is sent to the BGMP component.  Recall
   that the BGMP component will generate external source-specific Joins
   only where the source-specific branch does not coincide with the
   shared distribution tree for that group.

   Finally, we will require that the border router that is the next-hop
   internal peer for a particular address S or G be able to forward data
   for a matching tree state table entry to all members within the
   domain.  This requirement has implications for specific M-IGPs as
   follows.


5.4.1.  Interaction with DVMRP and PIM-DM

   DVMRP and PIM-DM are both "flood and prune" protocols in which every
   data packet must pass an RPF check against the packet's source
   address, or be dropped. If the border router receiving packets from
   an external source is the only BR to inject the route for the source
   into the domain, then there are no problems.  For example, this will
   always be true for stub domains with a single border router (see
   Figure 1). Otherwise, the border router receiving packets externally
   is responsible for encapsulating the data to any other border routers
   that must inject the data into the domain for RPF checks to succeed.

   When an intended border router injector for a source receives
   encapsulated packets from another border router in its domain, it
   should create source-specific (S,G) BGMP state.  Note that the border
   router may be configured to do this on a data-rate-triggered basis so
   that the state is not created for very low data rate/intermittent
   sources. If source-specific state is created then its incoming
   interface points to the virtual encapsulation interface from the
   border router that forwarded the packet, and it has an SPT flag that
   is initialized to be False.

   When the (S,G) BGMP state is created, the BGMP component will in turn
   send a BGMP (S,G) Join message to the next-hop external peer towards
   S if there is no (*,G) state for that same group, G. The (S,G) BGMP
   state will have the SPT bit set to False if (*,G) BGMP state is
   present.

   When the first data packet from S arrives from the external peer and
   matches on the BGMP (S,G) state, and IF there is no (*,G) state, the
   router sets the SPT flag to True, resets the incoming interface to
   point to the external peer, and sends a BGMP (S,G) Prune message to
   the border router that was encapsulating the packets (e.g., in Figure
   1, BR11 sends the (Src_A,G) Prune to BR12). When the border router
   with (*,G) state receives the prune for (S,G), it should delete that
   border router from its list of targets or outgoing interfaces.
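
   The injector behavior described above (creating (S,G) state when
   encapsulated data arrives, and switching to the external path when
   native data arrives) can be sketched non-normatively as follows; the
   state layout and helper names are assumptions carried over from the
   earlier sketches.

      def on_encapsulated_packet(state, S, G, encap_br, send):
          # Data arriving encapsulated from another border router may
          # (e.g., if data-rate triggered) create (S,G) state with
          # SPT = False and the encapsulating router as incoming target.
          if (S, G) not in state:
              state[(S, G)] = {"iif": ("encap", encap_br), "spt": False}
              if ("*", G) not in state:
                  send("bgmp-join", (S, G), "next-hop-peer-towards-S")

      def on_first_external_packet(state, S, G, external_peer, send):
          entry = state.get((S, G))
          if entry is None or ("*", G) in state or entry["spt"]:
              return
          encap_br = entry["iif"][1]     # who has been encapsulating
          entry["spt"] = True
          entry["iif"] = ("peer", external_peer)
          # e.g., in Figure 1, BR11 sends the (Src_A,G) Prune to BR12.
          send("bgmp-prune", (S, G), encap_br)

      # Example: the join towards S is followed, once native data
      # arrives, by a prune towards the encapsulating border router.
      sent = []
      state = {}
      on_encapsulated_packet(state, "Src_A", "G", "BR12",
                             lambda *msg: sent.append(msg))
      on_first_external_packet(state, "Src_A", "G", "ext-peer",
                               lambda *msg: sent.append(msg))
      print(sent)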

   PIM-DM and DVMRP present an additional problem, i.e., no protocol
   mechanism exists for joining and pruning entire groups; only joins
   and prunes for individual sources are available. We therefore require
   that some form of Domain-Wide Reports (DWRs) [DWR] be available
   within such domains.  Such messages provide the ability to join and
   prune an entire group across the domain. One simple heuristic to
   approximate DWRs is to assume that if there are any internally-
   reached members, then at least one of them is a sender. With this
   heuristic, the presence of any M-IGP (S,G) state for internally-
   reached sources can be used instead.  Sending a data packet to a
   group is then equivalent to sending a DWR for the group.


5.4.2.  Interaction with CBT

   CBT builds bidirectional shared trees but must address two points of
   compatibility with BGMP.  First, CBT is currently not specified to
   accommodate more than one border router injecting a packet.
   Therefore, if a CBT domain does have multiple external connections,
   the M-IGP components of the border routers are responsible for
   ensuring that only one of them will inject data from any given
   source.  This mechanism is provided in [CBTDM].

   Second, CBT cannot process source-specific Joins or Prunes.  Two
   options thus exist for each CBT domain:

   Option A:
     The CBT component interprets an (S,G) Join alert as if it were a
     (*,G) Join alert, as described in [INTEROP].  That is, if it is not
     already on the core-tree for G, then it sends a CBT (*,G) JOIN-
     REQUEST message towards the core for G.  Similarly, when the CBT
     component receives an (S,G) Prune alert, and the child interface
     list for a group is NULL, then it sends a (*,G) QUIT_NOTIFICATION
     towards the core for G.  This option has the disadvantage of
     pulling all data for the group G down to the CBT domain when no
     members exist.

   Option B:
     The CBT domain does not propagate any source routes (i.e., non-
     class D routes) to its external peers for the Multicast RIB
     unless it is known that no other path exists to that prefix (e.g.,
     routes for prefixes internal to the domain or in a singly-homed
      customer's domain may be propagated).  This ensures that source-
     specific joins are never received unless the source's data already
     passes through the domain on the shared tree, in which case the
     (S,G) Join need not be propagated anyway.  BGMP border routers will
     only send source-specific Joins or Prunes to an external peer if
     that external peer advertises source-prefixes in the EGP.  If a
     BGMP-CBT border router does receive an (S,G) Join or Prune, that
     border router should ignore the message.


5.4.3.  Interaction with MOSPF

   As with CBT, MOSPF cannot process source-specific Joins or Prunes,
   and the same two options are available.  Therefore, an MOSPF domain
   may either:

   Option A:
     send a Group-Membership-LSA for all of G in response to an (S,G)
     Join alert, and "prematurely age" it out (when no other downstream
     members exist) in response to an (S,G) Prune alert, OR

   Option B:
     not propagate any source routes (i.e., non-class D routes) to its
     external peers for the Multicast RIB unless it is known that no
     other path exists to that prefix (e.g., routes for prefixes
     internal to the domain or in a singly-homed customer's domain may
     be propagated).


5.4.4.  Interaction with PIM-SM

   Protocols such as PIM-SM build unidirectional shared and source-
   specific trees.  As with DVMRP and PIM-DM, every data packet must
   pass an RPF check against some group-specific or source-specific
   address.


   The fewest encapsulations/decapsulations will be done when the
   intra-domain tree is rooted at the next-hop internal peer towards G
   (which becomes the RP), since in general that router will receive the
   most packets from external sources.  To achieve this, each BGMP
   border router to a PIM-SM domain should send Candidate-RP-
   Advertisements within the domain for those groups for which it is the
   shared-domain tree ingress router. When the border router that is the
   RP for a group G receives an external data packet, it forwards the
   packet according to the M-IGP (i.e., PIM-SM) shared-tree outgoing
   interface list.

   Other border routers will receive data packets from external sources
   that are farther down the bidirectional tree of domains. When a
   border router that is not the RP receives an external packet for
   which it does not have a source-specific entry, the border router
   treats it like a local source by creating (S,G) state with a Register
   flag set, based on normal PIM-SM rules; the border router then
   encapsulates the data packets in PIM-SM Registers and unicasts them
   to the RP for the group.  As explained above, the RP for the inter-
   domain group will be one of the other border routers of the domain.

   If a source's data rate is high enough, DRs within the PIM-SM domain
   may switch to the shortest path tree.  If the shortest path to an
   external source is via the group's ingress router for the shared
   tree, the new (S,G) state in the BGMP border router will not cause
   BGMP (S,G) Joins because that border router will already have (*,G)
   state.  If, however, the shortest path to an external source is via
   some other border router, that border router will create (S,G) BGMP
   state in response to the M-IGP (S,G) Join alert. In this case,
   because there is no local (*,G) state to suppress it, the border
   router will send a BGMP (S,G) Join to the next-hop external peer
   towards S, in order to pull the data down directly.  (See BR11 in
   Figure 1.) As in normal PIM-SM operation, those PIM-SM routers that
   have (*,G) and (S,G) state pointing to different incoming interfaces
   will prune that source off the shared tree.  Therefore, all internal
   interfaces may be eventually pruned off the internal shared tree.


6.  Interaction with address allocation

6.1.  Requirements for BGMP components

   Each border router must be able to determine (e.g., from MASC [MASC])
   which class-D prefixes (if any) belong to each domain in which a
   component resides.

   Periodically, within each domain that has one or more class-D
   prefixes, the router multicasts a Prefix-Announcement message
   containing those prefixes to the domain-scoped ALL-PA-RECEIVERS
   group.


6.2.  Interaction with Address Allocators

   Each address allocator SHOULD join the domain-scoped ALL-PA-RECEIVERS
   group, and SHOULD allocate addresses from the prefix(es) announced to
   this group.


7.  Transition Strategy

   There have been significant barriers to multicast deployment in
   Internet backbones.  While many of the problems with the current
   DVMRP backbone (MBONE) have been documented in [ISSUES], most of
   these problems require longer term engineering solutions. However,
   there is much that can be done with existing technologies to enable
   deployment and put in place an architecture that will enable a smooth
   transition to the next generation of inter-domain multicast routing
   protocols (i.e., BGMP).  This section proposes a near-term transition
   strategy and architecture that is designed to be simple and risk-
   neutral, and to provide a smooth, incremental transition path to BGMP.
   In addition, the transition architecture provides for improved
   convergence properties, some initial policy control, and the
   opportunity for providers to run either native or tunneled multicast
   backbones and exchanges.

   The transition strategy proposed here is to initially use BGP4+
   [MBGP] to provide the desired convergence and policy control
   properties, and PIM-DM for multicast data forwarding.  Once this
   architecture is in place, backbones and exchanges can incrementally
   transition to BGMP and domains running other M-IGPs may be
   incorporated more fully.

   Since the current MBone uses a broadcast-and-prune backbone running
   DVMRP, BGMP may view the entire MBone as a single multi-homed stub
   domain (with a new AS number).  The members-are-senders heuristic can
   then be used initially to provide membership notifications within
   this stub domain.

   A BGMP backbone can then be formed by designating a neutral PIM-DM
   domain (say, a particular exchange) as the initial BGMP backbone.
   This domain is then associated with the group prefix 224/4 which is
   injected into the Multicast RIB by all BGP4+/BGMP border routers on
   that exchange.

   Any domain which meets the following constraints may then transition
   from a normal MBone-connected domain to one running BGMP:

(1)  Must peer with another BGMP domain and participate in MBGP to
     propagate routes in the Multicast RIB.

(2)  Must establish an internal (to the MBone AS) EGP (e.g., iBGP) peer
     relationship with other border routers of the MBone "stub" domain,
     as is done with unicast routing.  We expect this to eventually
     involve the use of one or more route reflectors [REFLECT] inside
     the MBone domain.

(3)  If the transition will partition the MBone "stub" domain, then it
     must be ensured that the MBone domain will be administratively
     split into multiple domains, each with a different multicast AS
     number.


7.1.  Preventing transit through the MBone stub

   We desire that two ASes that are mutually reachable through BGMP use
   paths that do not pass through the MBone stub domain.  This is
   illustrated in Figure 2, where the MBone stub is AS 5, which is
   multi-homed to both AS 3 and AS 4.  Paths between sources and
   destinations that have already transitioned to BGP4+/BGMP should not
   use AS 5 as transit unless no other path exists.

           ----------------------\   /----------------------------
                                 |   |
           DVMRP         /----\  |   |  /----\  IGP/iBGP
           ..............| BR |+++++++++| BR |-----------
                         \----/  | E |  \----/
                            +    | B |     +          AS 3
           MBone            +    | G |     +
                            +    | P \-----+----------------------
           AS 5        iBGP +    |         + eBGP
                            +    |   /-----+----------------------
                            +    |   |     +
                            +    |   |     +
           DVMRP         /----\  |   |  /----\  IGP/iBGP
           ..............| BR |+++++++++| BR |-----------
                         \----/  |   |  \----/
                                 |   |                AS 4
                                 |   |
           ----------------------/   \----------------------------

              Figure 2: Preventing Transit through MBone Stub


   This requirement is easily met using standard BGP policy
   mechanisms. The MBone border routers should prefer EGP routes to
   DVMRP routes, since DVMRP cannot tag routes as being external.  Thus,
   external routes may appear in the DVMRP routing table, but will not
   be imported into the EGP since they will be overridden by iBGP
   routes.

   Other EGP routers should prefer routes whose ASpath does not contain
   the well-known MBone AS number.  This will ensure that the route
   through the MBone stub is not used unless no other path exists.  For
   safety, routes whose ASpath begins with the MBone AS should receive
   the worst preference.


8.  Packet Formats

   WARNING: These formats are preliminary and may change as a result of
   adding features such as capability negotiation.

   BGMP only uses one type of message, in which join and prune
   information is sent.  Since BGMP messages are sent over TCP, only
   state changes are included.  The TCP keep-alive mechanism thus serves
   as an explicit state refresh mechanism; when the TCP connection goes
   down, all related state should be flushed.

   The message format below allows compact encoding of (*,G-prefix)
   Joins and Prunes (12 bytes per group, for IPv4), while allowing the
   flexibility needed to do (S,G) Joins and Prunes towards sources as
   well as on the shared tree.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |   Reserved    |X|R|M| AddrLen | Addr Family   | Encoding Type |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        Group-Address                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          Group-Mask                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        Source-Entry-1                       ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +
    |                        Source-Entry-2                       ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +
    |                              ...                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +
    |                        Source-Entry-n                       ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +

   Reserved       Transmitted as 0, ignored upon receipt.  This field
                  is reserved for future additions, such as version
                  and message type fields, should they become
                  necessary.
   X              Reserved bit.  Transmitted as 0, ignored upon
                  receipt.
   R              Root-Domain-tree bit.  If set, the sender desires to
                  be (or continue to be) part of the shared tree
                  through the peer, and any source entries are (S,G)
                  joins and prunes on the shared tree.  If clear, the
                  sender does not desire to be part of the shared tree
                  through the peer, and any source entries are (S,G)
                  joins and prunes towards sources.
   M              More-sources bit.  If set, then source entries exist
                  for this group.
   AddrLen        Length, in bytes, of the Group-Address field.
   AddrFamily     Address family (see below) of the group address.
   Encoding Type  The type of encoding used within a specific Address
                  Family. The value `0' is reserved for this field,
                  and represents the native encoding of the Address
                  Family.
   Group-Address  The multicast group address to be joined or pruned.
   Group-Mask     The mask associated with the group address.  The
                  length of this field should be identical to the
                  length of the address field.

   Each Source-Entry has the following format:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |   MaskLen     |X|I|M| AddrLen | Addr Family   | Encoding Type |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        Source-Address                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Masklen         Length, in bits, of the mask to apply to the
                   address.
   X               Reserved bit.  Transmitted as 0, ignored upon
                   receipt.
   I               Inclusion bit.  When set, the source entry
                    indicates an addition, or join.  When clear, the
                   source entry indicates a removal, or prune.
   M               More-sources bit.  If set, then more source entries
                   follow for the same group.
   AddrLen         Length, in bytes, of the Source-Address field.
   AddrFamily      Address family (see below) of the source address.
   Encoding Type   The type of encoding used within a specific Address
                   Family.  The value `0' is reserved for this field,
                   and represents the native encoding of the Address
                   Family.
   Source-Address  Unicast source address to be joined or pruned.


8.1.  Encoding examples

   R Group     : I Source | Description
   -----------------------+---------------------------------------------
   1 G/mask               | (*,G-prefix) join
   0 G/mask               | (*,G-prefix) prune
   0 G/ffffffff: 1 S/32   | (S,G) Join towards S.  This is also used to
                          | switch from a (*,G) Join to an (S,G) Join,
                          | such as when the next hop peer towards G
                           | changes, but it is advantageous to continue
                          | receiving S's data from the peer.
   0 G/ffffffff: 0 S/32   | (S,G) Prune towards S
   1 G/ffffffff: 0 S/32   | (S,G) Prune towards root-domain.  This is
                          | also used to send an initial (*,G) join with
                          | S pruned, at the same time (such as when the
                          | next hop peer towards G changes after S has
                          | already been pruned off).
   1 G/ffffffff: 1 S/32   | (S,G) Join cancelling prune towards root-
                          | domain.
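
   To make the encodings above concrete, the following non-normative
   fragment packs the group header and one source entry for IPv4,
   assuming Addr Family 1 (IPv4, per [RFC1700]) and Encoding Type 0; it
   is a sketch of the layout shown above, not a reference
   implementation.

      import socket, struct

      def group_header(group, mask, r_bit, more_sources):
          # Reserved | X,R,M,AddrLen(=4) | AddrFamily(=1) | EncType(=0)
          flags = (r_bit << 6) | (more_sources << 5) | 4
          return (struct.pack("!BBBB", 0, flags, 1, 0)
                  + socket.inet_aton(group) + socket.inet_aton(mask))

      def source_entry(source, masklen, include, more_sources):
          # MaskLen | X,I,M,AddrLen(=4) | AddrFamily(=1) | EncType(=0)
          flags = (include << 6) | (more_sources << 5) | 4
          return (struct.pack("!BBBB", masklen, flags, 1, 0)
                  + socket.inet_aton(source))

      # Example: the "(S,G) Join towards S" row of the table above
      # (R=0, group mask ff.ff.ff.ff, one included source, MaskLen 32).
      msg = (group_header("224.2.2.2", "255.255.255.255", 0, 1)
             + source_entry("192.0.2.7", 32, 1, 0))
      print(len(msg), msg.hex())    # 12-byte group part + 8-byte source
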

9.  References

[MBGP]
     Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol
     Extensions for BGP-4", draft-ietf-idr-bgp4-multiprotocol-01.txt,
     September 1997.

[CBT]
     Ballardie, A. J., "Core Based Trees (CBT) Multicast: Architectural
     Overview and Specification", University College London, November
     1994.

[CBTDM]
     Ballardie, A., "Core Based Tree (CBT) Multicast Border Router
     Specification" draft-ietf-idmr-cbt-br-spec-00.txt, October 1997.

[DVMRP]
     Pusateri, T., "Distance Vector Multicast Routing Protocol", draft-
     ietf-idmr-dvmrp-v3-05.txt, October 1997.

[DWR]
     Fenner, W., "Domain-Wide Reports", Work in progress.

[INTEROP]
     Thaler, D., "Interoperability Rules for Multicast Routing
     Protocols", draft-thaler-multicast-interop-01.txt, March 1997.

[IPv6MAA]
     Hinden, R., and S. Deering, "IPv6 Multicast Address Assignments",
     draft-ietf-ipngwg-multicast-assgn-04.txt, July 1997.

[ISSUES]
     Meyer, D., "Some Issues for an Inter-domain Multicast Routing
     Protocol", draft-ietf-mboned-imrp-some-issues-02.txt, June 1997.

[MASC]
     Estrin, D., Handley, M., and D. Thaler, "Multicast-Address-Set
     advertisement and Claim mechanism", Work in Progress, June 1997.

[MOSPF]
     Moy, J., "Multicast Extensions to OSPF", RFC 1584, Proteon, March
     1994.

[PIMDM]
     Estrin, et al., "Protocol Independent Multicast-Dense Mode (PIM-
     DM): Protocol Specification", draft-ietf-idmr-pim-dm-spec-05.txt,
     May 1997.

[PIMSM]
     Estrin, et al., "Protocol Independent Multicast-Sparse Mode (PIM-
     SM): Protocol Specification", RFC 2117, June 1997.

[REFLECT]
     Bates, T., and R. Chandra, "BGP Route Reflection: An alternative to
     full mesh IBGP", RFC 1966, June 1996.

[RFC1700]
     Reynolds, J., and J. Postel, "ASSIGNED NUMBERS", RFC 1700, October
     1994.

[RFC1771]
     Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC
     1771, March 1995.


10.  Security Considerations

Security issues are not discussed in this memo.


11.  Authors' Addresses

     Dave Thaler
     Department of Electrical Engineering and Computer Science
     University of Michigan
     1301 Beal Ave.
     Ann Arbor, MI 48109-2122
     Phone: +1 313 763 5243
     EMail: thalerd@eecs.umich.edu

     Deborah Estrin
     Computer Science Dept./ISI
     University of Southern California
     Los Angeles, CA 90089
     Email: estrin@usc.edu

     David Meyer
     University of Oregon
     1225 Kincaid St.
     Eugene, OR 97403
     Phone: (541) 346-1747
     EMail: meyer@antc.uoregon.edu



Table of Contents


1 Acknowledgements
2 Purpose
3 Terminology
4 Protocol Overview
4.1 Design Rationale
5 Protocol Details
5.1 Interaction with the EGP
5.2 Multicast Data Packet Processing
5.3 BGMP processing of Join and Prune messages and notifications
5.3.1 Receiving Joins
5.3.2 Receiving Prune Notifications
5.3.3 Receiving Route Change Notifications
5.4 Interaction with M-IGP components
5.4.1 Interaction with DVMRP and PIM-DM
5.4.2 Interaction with CBT
5.4.3 Interaction with MOSPF
5.4.4 Interaction with PIM-SM
6 Interaction with address allocation
6.1 Requirements for BGMP components
6.2 Interaction with Address Allocators
7 Transition Strategy
7.1 Preventing transit through the MBone stub
8 Packet Formats
8.1 Encoding examples
9 References
10 Security Considerations
11 Authors' Addresses