IDMR Working Group D. Thaler
Internet Engineering Task Force U. Michigan
INTERNET-DRAFT D. Estrin
March 12, 1998 USC/ISI
Expires September 1998 D. Meyer
U. Oregon
Editors
Border Gateway Multicast Protocol (BGMP):
Protocol Specification
<draft-ietf-idmr-gum-02.txt>
Status of this Memo
This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute working
documents as Internet Drafts.
Internet Drafts are valid for a maximum of six months and may be
updated, replaced, or obsoleted by other documents at any time. It is
inappropriate to use Internet Drafts as reference material or to cite
them other than as a "work in progress".
Abstract
This document describes BGMP, a protocol for inter-domain multicast
routing. BGMP builds shared trees for active multicast groups, and
allows receiver domains to build source-specific, inter-domain,
distribution branches where needed. Building upon concepts from CBT and
PIM-SM, BGMP requires that each multicast group be associated with a
single root (in BGMP it is referred to as the root domain). BGMP
assumes that at any point in time, different ranges of the class D space
are associated (e.g., with MASC [MASC]) with different domains. Each of
these domains then becomes the root of the shared domain-trees for all
groups in its range. Multicast participants will generally receive
better multicast service if the session initiator's address allocator
selects addresses from its own domain's part of the space, thereby
Draft BGMP March 1998
causing the root domain to be local to at least one of the session
participants.
1. Acknowledgements
In addition to the editors, the following individuals have
contributed to the design of BGMP: Cengiz Alaettinoglu, Tony
Ballardie, Steve Casner, Steve Deering, Dino Farinacci, Bill Fenner,
Mark Handley, Ahmed Helmy, Van Jacobson, and Satish Kumar.
This document is the product of the IETF IDMR Working Group with Dave
Thaler, Deborah Estrin, and David Meyer as editors.
2. Purpose
It has been suggested that inter-domain multicast is better supported
with a rendezvous mechanism whereby members receive sources' data
packets without any sort of global broadcast (e.g., DVMRP and PIM-DM
broadcast initial data packets and MOSPF broadcasts membership
information). CBT [CBT] and PIM-SM [PIMSM] use a shared group-tree,
to which all members join and thereby hear from all sources (and to
which non-members do not join and thereby hear from no sources).
This document describes BGMP, a protocol for inter-domain multicast
routing. BGMP builds shared trees for active multicast groups, and
allows domains to build source-specific, inter-domain, distribution
branches where needed. Building upon concepts from CBT and PIM-SM,
BGMP requires that each global multicast group be associated with a
single root. However, in BGMP, the root is an entire exchange or
domain, rather than a single router.
BGMP assumes that ranges of the class D space have been associated
(e.g., with MASC [MASC]) with selected domains. Each such domain then
becomes the root of the shared domain-trees for all groups in its
range. An address allocator will generally achieve better
distribution trees if it takes its multicast addresses from its own
domain's part of the space, thereby causing the root domain to be
local.
BGMP uses TCP as its transport protocol. This eliminates the need to
implement message fragmentation, retransmission, acknowledgement, and
sequencing. BGMP uses TCP port [XXX insert IANA-assigned port number
here] for establishing its connections. This port is distinct from
Expires September 1998 [Page 2]
BGP's port to provide protocol independence, and to facilitate
distinguishing between protocol packets (e.g., by packet classifiers,
diagnostic utilities, etc.).
Two BGMP peers form a TCP connection between one another, and
exchange messages to open and confirm the connection parameters.
They then send incremental Join/Prune Updates as group memberships
change. BGMP does not require periodic refresh of individual
entries. KeepAlive messages are sent periodically to ensure the
liveness of the connection. Notification messages are sent in
response to errors or special conditions. If a connection encounters
an error condition, a notification message is sent and the connection
is closed.
3. Terminology
This document uses the following technical terms:
Domain:
A set of one or more contiguous links and zero or more routers
surrounded by one or more multicast border routers. Note that this
loose definition of domain also applies to an external link between
two domains, as well as an exchange.
Root Domain:
When constructing a shared tree of domains for some group, one
domain will be the "root" of the tree. The root domain receives
data from each sender to the group, and functions as a rendezvous
domain toward which member domains can send inter-domain joins, and
to which sender domains can send data.
Multicast RIB:
The Routing Information Base, or routing table, used to calculate
the "next-hop" towards a particular address for multicast traffic.
Multicast IGP (M-IGP):
A generic term for any multicast routing protocol used for tree
construction within a domain. Typical examples of M-IGPs are:
DVMRP, PIM-DM, PIM-SM, CBT, and MOSPF.
EGP:
A generic term for the interdomain unicast routing protocol in use.
Typically, this will be some version of BGP which can support a
Multicast RIB, such as BGP4+ [MBGP], containing both unicast and
multicast address prefixes.
Component:
The portion of a border router associated with (and logically
inside) a particular domain that runs the multicast IGP (M-IGP) for
that domain, if any. Each border router thus has zero or more
components inside routing domains. In addition, each border router
with external links that do not fall inside any routing domain will
have an inter-domain component that runs BGMP.
External peer:
A border router in another multicast AS (autonomous system, as used
in BGP), to which a BGMP TCP-connection is open. Assuming BGP4+ is
being used, a separate "eBGP" TCP-connection will also be open to
the same peer.
Internal peer:
Another border router of the same multicast AS. A border router
either speaks iBGP ("internal" BGP) directly to internal peers in a
full mesh, or indirectly through a route reflector [REFLECT]. A
border router need not establish BGMP TCP-connections to internal
peers.
Next-hop peer:
The next-hop peer towards a given IP address is the next EGP router
on the path to the given address, according to multicast RIB routes
in the EGP's routing table (e.g., in BGP4+, routes whose Subsequent
Address Family Identifier field indicates that the route is valid
for multicast traffic).
Target:
Either an EGP peer, or an M-IGP component on the same router.
Tree State Table:
This is a table of (S-prefix,G-prefix) entries (including (*,G-
prefix) entries) that have been explicitly joined by a set of
targets. Each entry has, in addition to the source and group
addresses and masks, a list of targets that have explicitly
requested data (on behalf of directly connected hosts or on behalf
of downstream routers). (S,G) entries also have an "SPT" bit.
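
As an illustration only, the Tree State Table defined above could be
represented as follows. This is a minimal sketch, not part of the
protocol: the class and field names are hypothetical, targets are
modelled as plain strings, and a source of None stands for "*".

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Network
from typing import Dict, Optional, Set, Tuple

# A target is either an EGP peer or an M-IGP component on the same router.
# Plain strings stand in for both kinds here (an assumption of this sketch).
Target = str
Key = Tuple[Optional[IPv4Network], IPv4Network]


@dataclass
class TreeStateEntry:
    group: IPv4Network                       # G-prefix (address and mask)
    source: Optional[IPv4Network] = None     # S-prefix; None represents "*"
    targets: Set[Target] = field(default_factory=set)
    spt: bool = False                        # meaningful for (S,G) entries only


class TreeStateTable:
    """Maps (S-prefix, G-prefix) to the targets that explicitly joined."""

    def __init__(self) -> None:
        self.entries: Dict[Key, TreeStateEntry] = {}

    def add_target(self, source: Optional[IPv4Network],
                   group: IPv4Network, target: Target) -> TreeStateEntry:
        # Create the entry on first use, then record the requesting target.
        key = (source, group)
        if key not in self.entries:
            self.entries[key] = TreeStateEntry(group=group, source=source)
        self.entries[key].targets.add(target)
        return self.entries[key]
```

A (*,G-prefix) entry and an (S-prefix,G-prefix) entry for the same group
are distinct keys, so they can coexist in the table.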
4. Protocol Overview
BGMP maintains group-prefix state in response to messages from BGMP
peers and notifications from M-IGP components. Group-shared trees are
rooted at the domain advertising the group prefix covering those
groups. When a receiver joins a specific group address, the border
router towards the root domain generates a group-specific Join
message, which is then forwarded Border-Router-by-Border-Router
towards the root domain (see Figure 1). BGMP Join and Prune messages
are sent over TCP connections between BGMP peers, and BGMP protocol
state is refreshed by KEEPALIVE messages periodically sent over TCP.
BGMP routers build group-specific bidirectional forwarding state as
they process the BGMP Join messages. Bidirectional forwarding state
means that packets received from any target are forwarded to all
other targets in the target list without any RPF checks. No group-
specific state or traffic exists in parts of the network where there
are no members of that group.
BGMP routers build source-specific unidirectional forwarding state
only where it is needed to be compatible with source-specific M-IGP
distribution trees. For example, a transit domain that uses DVMRP,
PIM-DM, or PIM-SM as its M-IGP, may need to inject multicast packets
from different sources via different border routers (to be compatible
with the M-IGP RPF checks). Therefore, the BGMP router that is
responsible for injecting a particular source's packets MAY build a
source-specific BGMP branch if it is not already receiving that
source's packets via the shared tree (see Transit_1 in Figure 1, for
Src_A). Note however, that a stub domain that has only a single ISP
connection will receive all multicast data packets through the single
BGMP router to which all RPF checks point; and therefore that BGMP
router need never build external source-specific distribution paths
(see Rcvr_Stub_7 in Figure 1).
             Root_Domain
               [BR91]--------------------------\
                  |                            |
               [BR32]                       [BR41]
              Transit_3                    Transit_4
               [BR31]                   [BR42]   [BR43]
                  |                        |        |
               [BR22]                   [BR52]   [BR53]
              Transit_2                    Transit_5
               [BR21]                       [BR51]
                  |                            |
               [BR12]                       [BR61]
              Transit_1[BR11]----------[BR62]Stub_6
               [BR13]                        (Src_A)
                  |                          (Rcvr_D)
         -------------------
         |                 |
       [BR71]            [BR81]
    Rcvr_Stub_7     Src_only_Stub_8
     (Rcvr_C)           (Src_B)
Figure 1: Example inter-domain topology. [BRXY] represents a BGMP border
router. Transit_X is a transit domain network. *_Stub_X is a stub
domain network.
Data packets are forwarded based on a combination of BGMP and M-IGP
rules. The router forwards to a set of targets according to a
matching (S,G) BGMP tree state entry if it exists. If not found, the
router checks for a matching (*,G) BGMP tree state entry. If neither
is found, then the packet is sent natively to the next-hop EGP peer
for G, according to the Multicast RIB (for example, in the case of a
non-member sender such as Src_B in Figure 1). If a matching entry was
found, the packet is forwarded to all other targets in the target
list. In this way BGMP trees forward data in a bidirectional manner.
If a target is an M-IGP component then forwarding is subject to the
rules of that M-IGP protocol.
4.1. Design Rationale
Several other protocols, or protocol proposals, build shared trees
within domains [CBT, HPIM, PIM-SM]. The design choices made for BGMP
result from our focus on Inter-Domain multicast in particular. The
design choices made by CBT and PIM-SM are better suited to the wide-
area intra-domain case. There are three major differences between
BGMP and other shared-tree protocols:
(1) Unidirectional vs. Bidirectional trees
Bidirectional trees (using bidirectional forwarding state as
described above) minimize third party dependence which is essential
in the inter-domain context. For example, in Figure 1, stub domains 7
and 8 would like to exchange multicast packets without being
dependent on the quality of connectivity of the root domain.
However, unidirectional shared trees (i.e., those using RPF checks)
have more aggressive loop prevention and share the same processing
rules as source-specific entries which are inherently unidirectional.
The lack of third party dependence concerns in the INTRA domain case
reduces the incentive to employ bidirectional trees. BGMP supports
bidirectional trees because it has to, and because it can without
excessive cost.
(2) Source-specific distribution trees/branches
In a departure from other shared tree protocols, source-specific BGMP
state is built ONLY where (a) it is needed to pull the multicast
traffic down to a BGMP router that has source-specific (S,G) state,
and (b) that router is NOT already on the shared tree (i.e., has no
(*,G) state), and (c) that router does not want to receive packets
via encapsulation from a router which is on the shared tree. We
allow such source-specific branches because most M-IGP protocols in
use today build source-specific distribution trees and would suffer
unnecessary overhead if they were not able to import packets from
high datarate sources via the border router that matches the domain's
source-specific RPF checks (e.g., BR11 in Figure 1, for data from
Src_A). Moreover, some cases in which bidirectional-shared tree
distribution paths are significantly longer than source-specific tree
distribution paths, will benefit from these source-specific short
cuts.
However, we do not build source-specific inter-domain trees in
general because (a) inter-domain connectivity is generally less rich
than intra-domain connectivity, so shared distribution trees should
have more acceptable path length and traffic concentration properties
in the inter-domain context, than in the intra-domain case, and (b)
by having the shared tree state always take precedence over source-
specific tree state, we avoid ambiguities that can otherwise arise.
In summary, BGMP trees are, in a sense, a hybrid between CBT and
PIM-SM trees.
(3) Method of choosing root of group shared tree
The choice of a group's shared-tree-root has implications for
performance and policy. In the intra-domain case it can be assumed
that all potential shared-tree roots (RPs/Cores) within the domain
are equally suited to be the root for a group that is initiated
within that domain. In the INTER-domain case, there is far more
opportunity for unacceptably poor locality and administrative
ownership of a group's shared-tree root. Therefore in the intra-
domain case, other protocols treat all candidate roots (RPs or Cores)
as equivalent and emphasize load sharing and stability to maximize
performance. In the Inter-Domain case, all roots are not equivalent,
and we adopt an approach whereby a group's root domain is not random
but is subject to administrative and performance input.
5. Protocol Details
In this section, we describe the detailed protocol that border
routers perform. We assume that each border router conforms to the
component-based model described in [INTEROP].
5.1. Interaction with the EGP
A fundamental requirement imposed by BGMP on the design of an EGP is
that it be able to carry multicast prefixes. For example, a multi-
protocol BGP (MBGP) must be able to carry a multicast prefix in the
Unicast Network Layer Reachability Information (NLRI) field of the
UPDATE message (i.e., either an IPv4 class D prefix or an IPv6 prefix
with high-order octet equal to FF [IPv6MAA]). This capability is
required by BGMP in the implementation of bi-directional trees; BGMP
must be able to forward data and control packets to the next hop
towards either a unicast source S or a multicast group G (see section
5.2). It is also required that the path attributes defined in
[RFC1771] have the same semantics whether they accompany unicast
or multicast NLRI.
BGP4+ [MBGP] satisfies the requirement described above. [MBGP]
defines the optional non-transitive attributes Multiprotocol Reachable
NLRI (MP_REACH_NLRI) and Multiprotocol Unreachable NLRI
(MP_UNREACH_NLRI) to carry sets of reachable or unreachable
destinations, and the appropriate next hop in the case of
MP_REACH_NLRI. These attributes contain an Address Family
Identifier (AFI) field [RFC1700] which indicates
the type of NLRI carried in the attribute. In addition, the attribute
carries another field, the Subsequent Address Family Identifier, or
SAFI, which can be used to provide additional information about the
type of NLRI. For example, SAFI value two indicates that the NLRI is
valid for multicast forwarding. BGMP's requirement can be satisfied
by allowing the NLRI field of the MP_REACH_NLRI (or MP_UNREACH_NLRI)
to carry a multicast prefix in the Prefix field of the NLRI encoding.
Finally, while not required for correct BGMP operation, the design of
an EGP should also provide a mechanism that allows discrimination
between NLRI that is to be used for unicast forwarding and NLRI to be
used for multicast forwarding. This property is required to support
multicast-specific policy. As mentioned above, BGP4+ specified in
[MBGP] has this capability.
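
The following sketch illustrates how a Multicast RIB might be derived
from SAFI-tagged routes and consulted for the next-hop peer. It is an
illustration under stated assumptions, not a BGP4+ implementation:
routes are given as hypothetical (prefix, SAFI, peer) tuples, the peer
names are examples, and SAFI values 1 (unicast) and 3 (both) are taken
from [MBGP] rather than from this document.

```python
from ipaddress import ip_address, ip_network

SAFI_UNICAST = 1      # NLRI used for unicast forwarding
SAFI_MULTICAST = 2    # NLRI used for multicast forwarding
SAFI_BOTH = 3         # NLRI used for both unicast and multicast forwarding


def build_multicast_rib(routes):
    """Keep only routes whose SAFI marks them valid for multicast traffic."""
    return {
        ip_network(prefix): peer
        for prefix, safi, peer in routes
        if safi in (SAFI_MULTICAST, SAFI_BOTH)
    }


def next_hop_peer(mrib, address):
    """Longest-prefix match; works for a source S or a group G alike."""
    addr = ip_address(address)
    matches = [net for net in mrib if addr in net]
    if not matches:
        return None
    return mrib[max(matches, key=lambda net: net.prefixlen)]


routes = [
    ("224.2.0.0/16", SAFI_MULTICAST, "BR32"),  # class D prefix of a root domain
    ("10.1.0.0/16", SAFI_MULTICAST, "BR62"),   # source prefix, multicast route
    ("10.1.0.0/16", SAFI_UNICAST, "BR21"),     # same prefix, unicast-only route
    ("10.2.0.0/16", SAFI_BOTH, "BR21"),
]
mrib = build_multicast_rib(routes)
```

Because group prefixes and source prefixes live in the same table, a
single lookup routine serves both Joins toward G and Joins toward S.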
5.2. Multicast Data Packet Processing
For BGMP rules to be applied, an incoming packet must first be
"accepted":
o If the packet was received from an external peer, the packet is
accepted.
o If the packet arrived on an interface owned by an M-IGP, the M-IGP
component determines whether the packet should be accepted or
dropped according to its rules. If the packet is accepted, the
packet is forwarded (or not forwarded) out any other interfaces
owned by the same component, as specified by the M-IGP.
If the packet is accepted, then the router checks the tree state
table for a matching (S,G) entry. If one is found, but the packet
was not received from the next hop target towards S (if the entry's
SPT bit is True), or was not received from the next hop target
towards G (if the entry's SPT bit is False) then the packet is
dropped and no further actions are taken. If no (S,G) entry was
found, the router then checks for a matching (*,G) entry.
If neither is found, then the packet is forwarded towards the next-
hop peer for G, according to the Multicast RIB. If a matching entry
was found, the packet is forwarded to all other targets in the target
list.
Forwarding to a target which is an M-IGP component means that the
packet is forwarded out any interfaces owned by that component
according to that component's multicast forwarding rules.
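
The lookup order described in this section can be sketched as follows
for a packet that has already been accepted. This is a simplified
illustration: entries are keyed by exact addresses rather than
prefixes, and the function name, the table layout, and the next_hop
callback are assumptions of the sketch.

```python
def forward(src, grp, from_target, sg_table, star_g_table, next_hop):
    """Return the targets an accepted packet (src, grp) is forwarded to.

    sg_table maps (S, G) -> {"targets": set, "spt": bool}.
    star_g_table maps G -> {"targets": set}.
    next_hop(addr) returns the next-hop target toward addr.
    """
    entry = sg_table.get((src, grp))
    if entry is not None:
        # (S,G) state: the packet must arrive from the next hop toward S
        # when the SPT bit is set, or toward G when it is clear.
        expected = next_hop(src if entry["spt"] else grp)
        if from_target != expected:
            return []                      # dropped, no further action
    else:
        entry = star_g_table.get(grp)
    if entry is None:
        # No tree state at all: send toward the root domain for G.
        return [next_hop(grp)]
    # Bidirectional forwarding: every target except the one it came from.
    return sorted(t for t in entry["targets"] if t != from_target)
```

Forwarding to a target that is an M-IGP component is then subject to
that component's own rules, which the sketch does not model.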
5.3. BGMP processing of Join and Prune messages and notifications
5.3.1. Receiving Joins
When the BGMP component receives a (*,G) or (S,G) Join alert from
another component, or a BGMP (S,G) or (*,G) Join message from an
external peer, it searches the tree state table for a matching entry.
If an entry is found, and that peer is already listed in the target
list, then no further actions are taken.
Otherwise, if no (*,G) or (S,G) entry was found, one is created. In
the case of a (*,G), the target list is initialized to contain the
next-hop peer towards G, if it is an external peer. If the peer is
internal, the target list is initialized to contain the M-IGP
component owning the next-hop interface. If there is no next-hop
peer (because G is inside the domain), then the target list is
initialized to contain the next-hop component. If an (S,G) entry
exists for the same G for which the (*,G) Join is being processed,
and the next-hop peers toward S and G are different, the BGMP router
must first send a (S,G) Prune message toward the source and clear the
SPT bit on the (S,G) entry, before activating the (*,G) entry.
The target from which the Join was received is then added to the
target list. The router then looks up S or G in the Multicast RIB to
find the next-hop EGP peer. If the target list, not including the
next-hop target towards G for a (*,G) entry, becomes non-null as a
result, the next-hop EGP peer must be notified as follows:
a) If the next-hop peer towards G (for a (*,G) entry) is an external
peer, a BGMP (*,G) Join message is unicast to the external peer.
If the next-hop peer towards S (for an (S,G) entry) is an external
peer, and the router does NOT have any active (*,G) state for that
group address G, a BGMP (S,G) Join message is unicast to the
external peer. A BGMP (S,G) Join message is never sent to an
external peer by a router that also contains active (*,G) state
for the same group. If the next-hop peer towards S (for an (S,G)
entry) is an external peer and the router DOES have active (*,G)
state for that group G, the SPT bit is always set to False.
b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Join
alert is sent to the M-IGP component owning the next-hop
interface.
c) If there is no next-hop peer, a (*,G) or (S,G) Join alert is sent
to the M-IGP component owning the next-hop interface.
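
The Join processing rules above can be sketched as follows. This is a
simplified illustration, not a complete implementation: entries are
keyed by exact addresses, a source of None stands for "*", BGMP Join
messages and Join alerts are both represented by the same hypothetical
(kind, source, group, to) tuple, and next_hop(addr) is assumed to
return the next-hop target (external peer or M-IGP component) toward
addr.

```python
def receive_join(table, source, group, from_target, next_hop):
    """Process a Join for (source, group) and return the messages to send.

    table maps (source, group) -> {"targets": set, "spt": bool}.
    """
    out = []
    key = (source, group)
    entry = table.get(key)
    if entry is not None and from_target in entry["targets"]:
        return out                                   # already listed
    if entry is None:
        entry = table[key] = {"targets": set(), "spt": False}
        if source is None:
            # A new (*,G) entry starts with the next-hop target toward G.
            entry["targets"].add(next_hop(group))
            # (S,G) branches leaving via a different next hop are pruned
            # and their SPT bit cleared before (*,G) becomes active.
            for (s, g), sg in table.items():
                if s is not None and g == group and next_hop(s) != next_hop(group):
                    out.append(("Prune", s, g, next_hop(s)))
                    sg["spt"] = False
    upstream = next_hop(group) if source is None else None
    was_empty = not (entry["targets"] - {upstream})
    entry["targets"].add(from_target)
    if was_empty:
        if source is None:
            out.append(("Join", None, group, next_hop(group)))
        elif (None, group) not in table:
            out.append(("Join", source, group, next_hop(source)))
        # With (*,G) state present, no (S,G) Join is sent and the SPT
        # bit remains False.
    return out
```

Only the first downstream target for an entry triggers a Join toward
the next hop; later Joins for the same entry just extend the target
list.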
5.3.2. Receiving Prune Notifications
When the BGMP component receives a (*,G) or (S,G) Prune alert from
another component, or a BGMP (*,G) or (S,G) Prune message from an
external peer, it searches the tree state table for a matching entry.
If no (S,G) entry was found for an (S,G) Prune, but (*,G) state
exists, an (S,G) entry is created, with the target list copied from
the (*,G) entry. If no matching entry exists, or if the component or
peer is not listed in the target list, no further actions are taken.
Otherwise, the component or peer is removed from the target list. If
the target list becomes null as a result, the next-hop peer towards G
(for a (*,G) entry), or towards S (for an (S,G) entry if and only if
the BGMP router does NOT have any corresponding (*,G) entry), must be
notified as follows.
a) If the peer is an external peer, a BGMP (*,G) or (S,G) Prune
message is unicast to it.
b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Prune
alert is sent to the M-IGP component owning the next-hop
interface.
c) If there is no next-hop peer, a (*,G) or (S,G) Prune alert is sent
to the M-IGP component owning the next-hop interface.
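
The Prune processing rules above can be sketched in the same style.
This is a simplified illustration under stated assumptions: entries are
keyed by exact addresses, a source of None stands for "*", Prune
messages and Prune alerts share one hypothetical tuple form, and the
next-hop target itself is not counted when testing whether a target
list has become null (mirroring the rule given for Joins).

```python
def receive_prune(table, source, group, from_target, next_hop):
    """Process a Prune for (source, group) and return the messages to send.

    table maps (source, group) -> {"targets": set, "spt": bool}.
    next_hop(addr) returns the next-hop target toward addr.
    """
    out = []
    key = (source, group)
    star = table.get((None, group))
    if key not in table and source is not None and star is not None:
        # (S,G) Prune with only (*,G) state: create the (S,G) entry with
        # a copy of the (*,G) target list.
        table[key] = {"targets": set(star["targets"]), "spt": False}
    entry = table.get(key)
    if entry is None or from_target not in entry["targets"]:
        return out                                  # nothing to do
    entry["targets"].discard(from_target)
    upstream = next_hop(group) if source is None else next_hop(source)
    if not (entry["targets"] - {upstream}):
        # (S,G) Prunes propagate only when no (*,G) entry exists.
        if source is None or star is None:
            out.append(("Prune", source, group, upstream))
    return out
```

An (S,G) Prune received on the shared tree therefore removes one target
for that source only, leaving the (*,G) target list unchanged.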
5.3.3. Receiving Route Change Notifications
When a border router receives a route for a new prefix in the
multicast RIB, or an existing route for a prefix is withdrawn, a route
change notification for that prefix must be sent to the BGMP
component. In addition, when the next hop peer (according to the
multicast RIB) changes, a route change notification for that prefix
must be sent to the BGMP component.
In addition, an internal route for each class-D prefix associated
with the domain (if any) MUST be injected into the multicast RIB in
the EGP by the domain's border routers.
When a route for a new group prefix is learned, or an existing route
for a group prefix is withdrawn, or the next-hop peer for a group
prefix changes, a BGMP router updates all affected (*,G) target
lists. The router sends a (*,G) Join to the new next-hop target, and
a (*,G) Prune to the old next-hop target, as appropriate.
When an existing route for a source prefix is withdrawn, or the
next-hop peer for a source prefix changes, a BGMP router updates all
affected (S,G) target lists. The router sends a (S,G) Join to the
new next-hop target, and a (S,G) Prune to the old next-hop target, as
appropriate.
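
As a sketch of the (*,G) case, a next-hop change might be handled as
shown below. The interpretation that "updating the target list" means
replacing the old next-hop target with the new one is an assumption of
this example, as are the function name and the tuple form used for
messages. A value of None models a withdrawn or previously unknown
route.

```python
def next_hop_changed(table, group, old_target, new_target):
    """Update the (*,G) entry for group after its next-hop target changes.

    table maps (source, group) -> {"targets": set, "spt": bool}, with a
    source of None standing for "*". Returns the messages to send.
    """
    entry = table.get((None, group))
    if entry is None or old_target == new_target:
        return []                       # no state affected
    out = []
    entry["targets"].discard(old_target)
    if new_target is not None:
        entry["targets"].add(new_target)
        out.append(("Join", None, group, new_target))
    if old_target is not None:
        out.append(("Prune", None, group, old_target))
    return out
```

The same pattern would apply to (S,G) entries when the next-hop peer
toward a source prefix changes.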
5.4. Interaction with M-IGP components
When an M-IGP component on a border router first learns that there
are internally-reached members for a group G (whose scope is larger
than that domain), a (*,G) Join alert is sent to the BGMP component.
Similarly, when an M-IGP component on a border router learns that
there are no longer internally-reached members for a group G (whose
scope is larger than a single domain), a (*,G) Prune alert is sent to
the BGMP component.
At any time, any M-IGP domain MAY decide to join a source-specific
branch for some external source S and group G. When the M-IGP
component in the border router that is the next-hop router for a
particular source S learns that a receiver wishes to receive data
from S on a source-specific path, an (S,G) Join alert is sent to the
BGMP component. When it is learned that such receivers no longer
exist, an (S,G) Prune alert is sent to the BGMP component. Recall
that the BGMP component will generate external source-specific Joins
only where the source-specific branch does not coincide with the
shared distribution tree for that group.
Finally, we will require that the border router that is the next-hop
internal peer for a particular address S or G be able to forward data
for a matching tree state table entry to all members within the
domain. This requirement has implications on specific M-IGPs as
follows.
5.4.1. Interaction with DVMRP and PIM-DM
DVMRP and PIM-DM are both "broadcast and prune" protocols in which
every data packet must pass an RPF check against the packet's source
address, or be dropped. If the border router receiving packets from
an external source is the only BR to inject the route for the source
into the domain, then there are no problems. For example, this will
always be true for stub domains with a single border router (see
Figure 1). Otherwise, the border router receiving packets externally
is responsible for encapsulating the data to any other border routers
that must inject the data into the domain for RPF checks to succeed.
Although peering sessions to internal peers are normally not
required, in this situation, BGMP TCP-connections must exist between
such internal peers, and the "virtual" interfaces used for
encapsulation are owned by BGMP.
When an intended border router injector for a source receives
encapsulated packets from another border router in its domain, it
should create source-specific (S,G) BGMP state. Note that the border
router may be configured to do this on a data-rate triggered basis so
that the state is not created for very low data-rate/intermittent
sources. If source-specific state is created, then its incoming
interface points to the virtual encapsulation interface from the
border router that forwarded the packet, and it has an SPT flag that
is initialized to be False.
When the (S,G) BGMP state is created, the BGMP component will in turn
send a BGMP (S,G) Join message to the next-hop external peer towards
S if there is no (*,G) state for that same group, G. The (S,G) BGMP
state will have the SPT bit set to False if (*,G) BGMP state is
present.
When the first data packet from S arrives from the external peer and
matches on the BGMP (S,G) state, and IF there is no (*,G) state, the
router sets the SPT flag to True, resets the incoming interface to
point to the external peer, and sends a BGMP (S,G) Prune message to
the border router that was encapsulating the packets (e.g., in Figure
1, BR11 sends the (Src_A,G) Prune to BR12). When the border router
with (*,G) state receives the prune for (S,G), it then deletes that
border router from its list of targets.
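
The life cycle of the source-specific state described above can be
sketched as a small state holder. This is an illustration only: the
class and attribute names are hypothetical, and messages are simply
recorded in a list instead of being sent over a BGMP connection.

```python
class SourceBranch:
    """(S,G) state created when encapsulated packets for S first arrive."""

    def __init__(self, encap_router, external_peer, has_star_g):
        self.encap_router = encap_router     # BR that encapsulated the data
        self.external_peer = external_peer   # next-hop external peer toward S
        self.has_star_g = has_star_g         # is (*,G) state present?
        self.iif = encap_router              # virtual encapsulation interface
        self.spt = False                     # SPT flag starts out False
        self.sent = []
        if not has_star_g:
            # Pull the source's data down directly from the external peer.
            self.sent.append(("Join", external_peer))

    def on_native_packet(self, from_peer):
        """Handle a data packet from S arriving natively from from_peer."""
        if (from_peer == self.external_peer
                and not self.spt and not self.has_star_g):
            self.spt = True
            self.iif = self.external_peer
            # Stop the encapsulated copy from the other border router.
            self.sent.append(("Prune", self.encap_router))


# Figure 1 example: BR11 receives Src_A's data encapsulated from BR12.
branch = SourceBranch(encap_router="BR12", external_peer="BR62",
                      has_star_g=False)
branch.on_native_packet("BR62")
```

In the Figure 1 example, the recorded messages are the (S,G) Join sent
to BR62 followed by the (S,G) Prune sent to BR12.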
PIM-DM and DVMRP present an additional problem, i.e., no protocol
mechanism exists for joining and pruning entire groups; only joins
and prunes for individual sources are available. We therefore require
that some form of Domain-Wide Reports (DWRs) [DWR] are available
within such domains. Such messages provide the ability to join and
prune an entire group across the domain. One simple heuristic to
approximate DWRs is to assume that if there are any internally-
reached members, then at least one of them is a sender. With this
heuristic, the presence of any M-IGP (S,G) state for internally-
reached sources can be used instead. Sending a data packet to a
group is then equivalent to sending a DWR for the group.
5.4.2. Interaction with PIM-SM
Protocols such as PIM-SM build unidirectional shared and source-
specific trees. As with DVMRP and PIM-DM, every data packet must
pass an RPF check against some group-specific or source-specific
address.
The fewest encapsulations/decapsulations will be done when the
intra-domain tree is rooted at the next-hop internal peer towards G
(which becomes the RP), since in general that router will receive the
most packets from external sources. To achieve this, each BGMP
border router to a PIM-SM domain should send Candidate-RP-
Advertisements within the domain for those groups for which it is the
shared-domain tree ingress router. When the border router that is the
RP for a group G receives an external data packet, it forwards the
packet according to the M-IGP (i.e., PIM-SM) shared-tree outgoing
interface list.
Other border routers will receive data packets from external sources
that are farther down the bidirectional tree of domains. When a
border router that is not the RP receives an external packet for
which it does not have a source-specific entry, the border router
treats it like a local source by creating (S,G) state with a Register
flag set, based on normal PIM-SM rules; the border router then
encapsulates the data packets in PIM-SM Registers and unicasts them
to the RP for the group. As explained above, the RP for the inter-
domain group will be one of the other border routers of the domain.
If a source's data rate is high enough, DRs within the PIM-SM domain
may switch to the shortest path tree. If the shortest path to an
external source is via the group's ingress router for the shared
tree, the new (S,G) state in the BGMP border router will not cause
BGMP (S,G) Joins because that border router will already have (*,G)
state. If however, the shortest path to an external source is via
some other border router, that border router will create (S,G) BGMP
state in response to the M-IGP (S,G) Join alert. In this case,
because there is no local (*,G) state to suppress it, the border
router will send a BGMP (S,G) Join to the next-hop external peer
towards S, in order to pull the data down directly. (See BR11 in
Figure 1.) As in normal PIM-SM operation, those PIM-SM routers that
have (*,G) and (S,G) state pointing to different incoming interfaces
will prune that source off the shared tree. Therefore, all internal
interfaces may be eventually pruned off the internal shared tree.
5.4.3. Interaction with CBT
CBT builds bidirectional shared trees but must address two points of
compatibility with BGMP. First, CBT cannot accommodate more than
one border router injecting a packet. Therefore, if a CBT domain
does have multiple external connections, the M-IGP components of the
border routers are responsible for ensuring that only one of them
will inject data from any given source. This mechanism is provided
in [CBTDM].
Second, CBT cannot process source-specific Joins or Prunes. Two
options thus exist for each CBT domain:
Option A:
The CBT component interprets a (S,G) Join alert as if it were an
(*,G) Join alert, as described in [INTEROP]. That is, if it is not
already on the core-tree for G, then it sends a CBT (*,G)
JOIN_REQUEST message towards the core for G. Similarly, when the CBT
component receives an (S,G) Prune alert, and the child interface
list for a group is NULL, then it sends a (*,G) QUIT_NOTIFICATION
towards the core for G. This option has the disadvantage of
pulling all data for the group G down to the CBT domain when no
members exist.
Option B:
The CBT domain does not propagate any source routes (i.e., non-
class D routes) to their external peers for the Multicast RIB
unless it is known that no other path exists to that prefix (e.g.,
routes for prefixes internal to the domain or in a singly-homed
customer's domain may be propagated). This ensures that source-
specific joins are never received unless the source's data already
passes through the domain on the shared tree, in which case the
(S,G) Join need not be propagated anyway. BGMP border routers will
only send source-specific Joins or Prunes to an external peer if
that external peer advertises source-prefixes in the EGP. If a
BGMP-CBT border router does receive an (S,G) Join or Prune, that
border router should ignore the message.
To minimize en/de-capsulations, CBTv2 BR's may follow the same
scheme as described under PIM-SM above, in which Candidate-Core
advertisements are sent for those groups for which it is the
shared-tree ingress router.
5.4.4. Interaction with MOSPF
As with CBT, MOSPF cannot process source-specific Joins or Prunes,
and the same two options are available. Therefore, an MOSPF domain
may either:
Option A:
send a Group-Membership-LSA for all of G in response to a (S,G)
Join alert, and "prematurely age" it out (when no other downstream
members exist) in response to an (S,G) Prune alert, OR
Option B:
not propagate any source routes (i.e., non-class D routes) to its
external peers for the Multicast RIB unless it is known that no
other path exists to that prefix (e.g., routes for prefixes
internal to the domain or in a singly-homed customer's domain may
be propagated).
6. Interaction with address allocation
6.1. Requirements for BGMP components
Each border router must be able to determine (e.g., from MASC [MASC])
which class-D prefixes (if any) belong to each domain in which a
component resides.
7. Transition Strategy
There have been significant barriers to multicast deployment in
Internet backbones. While many of the problems with the current
DVMRP backbone (MBONE) have been documented in [ISSUES], most of
these problems require longer term engineering solutions. However,
there is much that can be done with existing technologies to enable
deployment and put in place an architecture that will enable a smooth
transition to the next generation of inter-domain multicast routing
protocols (i.e., BGMP). This section proposes a near-term transition
strategy and architecture that is designed to be simple, risk-
neutral, and provide a smooth, incremental transition path to BGMP.
In addition, the transition architecture provides for improved
convergence properties, some initial policy control, and the
opportunity for providers to run either native or tunneled multicast
backbones and exchanges.
The transition strategy proposed here is to initially use BGP4+
[MBGP] to provide the desired convergence and policy control
properties, and PIM-DM for multicast data forwarding. Once this
architecture is in place, backbones and exchanges can incrementally
transition to BGMP and domains running other M-IGPs may be
incorporated more fully.
Since the current MBone uses a broadcast-and-prune backbone running
DVMRP, BGMP may view the entire MBone as a single multi-homed stub
domain (with a new AS number). The members-are-senders heuristic can
then be used initially to provide membership notifications within
this stub domain.
A BGMP backbone can then be formed by designating one or more neutral
PIM-DM domains (say, exchanges) as initial BGMP backbones. Each
exchange is then associated with a group prefix which is injected
into the Multicast RIB by all BGP4+/BGMP border routers on that
exchange.
Any domain which meets the following constraints may then transition
from a normal MBone-connected domain to one running BGMP:
(1) Must peer with another BGMP domain and participate in M-BGP to
propagate routes in the Multicast RIB.
(2) Must establish an internal (to the MBone AS) EGP (e.g., iBGP) peer
relationship with other border routers of the MBone "stub" domain,
as is done with unicast routing. We expect this to eventually
involve the use of one or more route reflectors [REFLECT] inside
the MBone domain.
(3) If the transition will partition the MBone "stub" domain, then it
must be ensured that the MBone domain will be administratively
split into multiple domains, each with a different multicast AS
number.
7.1. Preventing transit through the MBone stub
We desire that two ASes which are mutually reachable through BGMP use
paths which do not pass through the MBone stub domain. This is
illustrated in Figure 2, where the MBone stub is AS 5, which is
multi-homed to both AS 3 and AS 4. Paths between sources and
destinations which have already transitioned to BGP4+/BGMP should not
use AS 5 as transit unless no other path exists.
 ----------------------\     /----------------------------
                       |     |
       DVMRP    /----\ |     | /----\  IGP/iBGP
  ..............| BR |+++++++++| BR |-----------
                \----/ |  E  | \----/
                   +   |  B  |     +      AS 3
    MBone          +   |  G  |     +
                   +   |  P  \-----+----------------------
    AS 5      iBGP +   |           + eBGP
                   +   |     /-----+----------------------
                   +   |     |     +
                   +   |     |     +
       DVMRP    /----\ |     | /----\  IGP/iBGP
  ..............| BR |+++++++++| BR |-----------
                \----/ |     | \----/
                       |     |            AS 4
                       |     |
 ----------------------/     \----------------------------
Figure 2: Preventing Transit through MBone Stub
This requirement is easily met using standard BGP policy
mechanisms. The MBone border routers should prefer EGP routes to
DVMRP routes, since DVMRP cannot tag routes as being external. Thus,
external routes may appear in the DVMRP routing table, but will not
be imported into the EGP since they will be overridden by iBGP
routes.
Other EGP routers should prefer routes whose ASpath does not contain
the well-known MBone AS number. This will ensure that the route
through the MBone stub is not used unless no other path exists. For
safety, routes whose ASpath begins with the MBone AS should receive
the worst preference.
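As a non-normative illustration, the preference rule above can be
sketched as a ranking function. The function name, the three-level
ranking values, and the use of AS 5 (taken from Figure 2) as the MBone
AS number are assumptions made for this example only.

```python
def mbone_preference(as_path, mbone_as):
    """Rank a route by its ASpath; a lower value is more preferred.

    0: the ASpath does not contain the MBone AS
    1: the ASpath contains the MBone AS, but does not begin with it
    2: the ASpath begins with the MBone AS (worst preference)
    """
    if mbone_as not in as_path:
        return 0
    if as_path[0] == mbone_as:
        return 2
    return 1


# Hypothetical candidate ASpaths for one prefix, with AS 5 as the MBone stub.
candidates = [[5, 7], [3, 5, 9], [3, 9]]
ranked = sorted(candidates, key=lambda path: mbone_preference(path, 5))
```

With this ranking, a path through the MBone stub is selected only when
no candidate avoids it.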
8. Message Formats
This section describes message formats used by BGMP.
Messages are sent over a reliable transport protocol connection. A
message is processed only after it is entirely received. The maximum
message size is 4096 octets. All implementations are required to
support this maximum message size.
All fields labelled "Reserved" below must be transmitted as 0, and
ignored upon receipt.
8.1. Message Header Format
Each message has a fixed-size (4-byte) header. There may or may not
be a data portion following the header, depending on the message
type. The layout of these fields is shown below:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |     Type      |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Length:
This 2-octet unsigned integer indicates the total length of the
message, including the header, in octets. Thus, e.g., it allows
one to locate in the transport-level stream the start of the next
message. The value of the Length field must always be at least 4
and no greater than 4096, and may be further constrained, depending
on the message type. No "padding" of extra data after the message
is allowed, so the Length field must have the smallest value
required given the rest of the message.
Type:
This 1-octet unsigned integer indicates the type code of the
message. The following type codes are defined:
1 - OPEN
2 - UPDATE
3 - NOTIFICATION
4 - KEEPALIVE
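As a non-normative illustration, the 4-octet header can be packed and
parsed as sketched below. Network byte order is an assumption (this
section does not state the byte order), and the function and constant
names are hypothetical.

```python
import struct

HEADER_LEN = 4           # fixed header size, in octets
MAX_MESSAGE_LEN = 4096   # maximum message size, in octets
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def pack_header(msg_type, body=b""):
    """Prefix body with Length (2 octets), Type (1 octet), Reserved (1 octet)."""
    length = HEADER_LEN + len(body)
    if length > MAX_MESSAGE_LEN:
        raise ValueError("message exceeds 4096 octets")
    # Reserved is transmitted as 0.
    return struct.pack("!HBB", length, msg_type, 0) + body


def unpack_header(data):
    """Return (length, type) from the first four octets; Reserved is ignored."""
    length, msg_type, _reserved = struct.unpack("!HBB", data[:HEADER_LEN])
    return length, msg_type
```

Because Length covers the whole message, a receiver can locate the
start of the next message in the stream by advancing Length octets
from the start of the current one.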
8.2. OPEN Message Format
After a transport protocol connection is established, the first
message sent by each side is an OPEN message. If the OPEN message is
acceptable, a KEEPALIVE message confirming the OPEN is sent back.
Once the OPEN is confirmed, UPDATE, KEEPALIVE, and NOTIFICATION
messages may be exchanged.
In addition to the fixed-size BGMP header, the OPEN message contains
the following fields:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Version    |   Reserved    |           Hold Time           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        BGMP Identifier                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                     (Optional Parameters)                     |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Version:
This 1-octet unsigned integer indicates the protocol version number
of the message. The current BGMP version number is 1.
Hold Time:
This 2-octet unsigned integer indicates the number of seconds that
the sender proposes for the value of the Hold Timer. Upon receipt
of an OPEN message, a BGMP speaker MUST calculate the value of the
Hold Timer by using the smaller of its configured Hold Time and the
Hold Time received in the OPEN message. The Hold Time MUST be
either zero or at least three seconds. An implementation may
reject connections on the basis of the Hold Time. The calculated
value indicates the maximum number of seconds that may elapse
between the receipt of successive KEEPALIVE and/or UPDATE messages
by the sender.
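As a non-normative illustration, the Hold Timer calculation can be
sketched as follows. The function name is hypothetical, and raising an
exception stands in for rejecting the connection.

```python
def negotiate_hold_time(configured, received):
    """Return the Hold Timer value, in seconds, for a session.

    Both values must be zero or at least three seconds; the result is
    the smaller of the locally configured value and the value received
    in the peer's OPEN message.
    """
    for value in (configured, received):
        if value in (1, 2):
            raise ValueError("Hold Time must be zero or at least 3 seconds")
    return min(configured, received)
```

Note that if either side proposes zero, the calculated value is zero.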
BGMP Identifier:
This 4-octet unsigned integer indicates the BGMP Identifier of the
sender. A given BGMP speaker sets the value of its BGMP Identifier
to a globally-unique value assigned to that BGMP speaker (e.g., an
IPv4 address). The value of the BGMP Identifier is determined on
startup and is the same for every BGMP session opened.
Optional Parameters:
This field may contain a list of optional parameters, where each
parameter is encoded as a <Parameter Type, Parameter Length,
Parameter Value> triplet. The combined length of all optional
parameters can be derived from the Length field in the message
header.
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
|  Parm. Type   | Parm. Length  |  Parameter Value (variable)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
Parameter Type is a one octet field that unambiguously identifies
individual parameters. Parameter Length is a one octet field that
contains the length of the Parameter Value field in octets.
Parameter Value is a variable length field that is interpreted
according to the value of the Parameter Type field.
This document defines the following Optional Parameters:
a) Authentication Information (Parameter Type 1):
This optional parameter may be used to authenticate a BGMP peer.
The Parameter Value field contains a 1-octet Authentication Code
followed by a variable length Authentication Data.
0 1 2 3 4 5 6 7 8
+-+-+-+-+-+-+-+-+
|  Auth. Code   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                     |
|              Authentication Data                    |
|                                                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Authentication Code:
This 1-octet unsigned integer indicates the authentication
mechanism being used. Whenever an authentication mechanism is
specified for use within BGMP, three things must be included in
the specification:
- the value of the Authentication Code which indicates use of the
mechanism,
- the form and meaning of the Authentication Data, and
- the algorithm for computing values of Marker fields.
Note that a separate authentication mechanism may be used in
establishing the transport level connection.
Authentication Data:
The form and meaning of this variable-length field depend on the
Authentication Code.
The minimum length of the OPEN message is 12 octets (including
message header).
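As a non-normative illustration, an OPEN message can be built and
parsed as sketched below. The field widths follow the OPEN diagram
(Version 1 octet, Reserved 1 octet, Hold Time 2 octets, BGMP
Identifier 4 octets, after the 4-octet message header). Network byte
order and the function names are assumptions of this example.

```python
import struct

BGMP_VERSION = 1
OPEN = 1  # message type code


def pack_open(hold_time, bgmp_id, params=()):
    """Build an OPEN message; params is a sequence of (type, value) pairs."""
    body = struct.pack("!BBHI", BGMP_VERSION, 0, hold_time, bgmp_id)
    for ptype, value in params:
        if len(value) > 255:
            raise ValueError("Parameter Length is a one-octet field")
        # Each optional parameter is <Type, Length, Value>.
        body += struct.pack("!BB", ptype, len(value)) + value
    return struct.pack("!HBB", 4 + len(body), OPEN, 0) + body


def unpack_open(msg):
    """Return (version, hold_time, bgmp_id, params) from an OPEN message."""
    length, _msg_type, _reserved = struct.unpack("!HBB", msg[:4])
    version, _res, hold_time, bgmp_id = struct.unpack("!BBHI", msg[4:12])
    params, offset = [], 12
    # The optional parameters occupy the rest of the message.
    while offset < length:
        ptype, plen = struct.unpack("!BB", msg[offset:offset + 2])
        params.append((ptype, msg[offset + 2:offset + 2 + plen]))
        offset += 2 + plen
    return version, hold_time, bgmp_id, params
```

For example, pack_open(90, 0x0A000001, [(1, b"\x00secret")]) carries a
single Authentication Information parameter whose first value octet is
the Authentication Code; the code value 0 is a placeholder, not an
assigned mechanism.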
8.3. UPDATE Message Format
UPDATE messages are used to transfer Join/Prune information between
BGMP peers. The UPDATE message always includes the fixed-size BGMP
header, and one or more attributes as described below.
The message format below allows compact encoding of (*,G) Joins and
Prunes, while allowing the flexibility needed to do other updates
such as (S,G) Joins and Prunes towards sources as well as on the
shared tree. In the discussion below, an Encoded-Address-Prefix is
of the form:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+
|EnTyp| AddrFam |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Address (variable length)                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Mask (variable length)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
EnTyp:
0 - All 1's Mask. The Mask field is 0 bytes long.
1 - Mask length included. The Mask field is 4 bytes long, and
contains the mask length, in bits.
2 - Full Mask included. The Mask field is the same length
as the Address field, and contains the full bitmask.
AddrFam:
The IANA-assigned address family number of the encoded prefix.
These include (among others):
Number    Description
------    -----------
   1      IP (IP version 4)
   2      IPv6 (IP version 6)
Address:
The address associated with the given prefix to be encoded. The
length is determined based on the Address Family.
Mask:
The mask associated with the given prefix. The format (or absence)
of this field is determined by the EnTyp field.
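As a non-normative illustration, a prefix can be encoded as sketched
below. The sketch assumes that EnTyp occupies the three high-order
bits and AddrFam the five low-order bits of the first octet (as the
diagram suggests), that the Address field is 4 octets for family 1 and
16 octets for family 2, and that multi-octet values use network byte
order. The function name is hypothetical, and only EnTyp values 0 and
1 are produced.

```python
import ipaddress
import struct


def encode_prefix(prefix):
    """Encode a textual prefix such as "224.0.0.0/4" as an
    Encoded-Address-Prefix."""
    net = ipaddress.ip_network(prefix)
    family = 1 if net.version == 4 else 2
    address = net.network_address.packed
    if net.prefixlen == net.max_prefixlen:
        # EnTyp 0: all 1's mask, so the Mask field is absent.
        en_typ, mask = 0, b""
    else:
        # EnTyp 1: 4-byte Mask field holding the mask length in bits.
        en_typ, mask = 1, struct.pack("!I", net.prefixlen)
    return bytes([(en_typ << 5) | family]) + address + mask
```

A single group address thus takes 5 octets in IPv4, and a group prefix
takes 9 octets.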
Each attribute is of the form:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |     Type      |  Data ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
All attributes are 4-byte aligned.
Length:
The Length is the length of the entire attribute, including the
length, type, and data fields. If other attributes are nested
within the data field, the length includes the size of all such
nested attributes.
Type:
Types 128-255 are reserved for "optional" attributes. If a
required attribute is unrecognized, a NOTIFICATION will be sent and
the connection will be closed. Unrecognized optional attributes
are simply ignored.
0 - JOIN
1 - PRUNE
2 - GROUP
3 - SOURCE
a) JOIN (Type Code 0)
The JOIN attribute indicates that all GROUP or SOURCE attributes
nested immediately within the JOIN attribute should be joined.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |    Type=0     |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Nested Attributes ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
No JOIN or PRUNE attributes may be immediately nested within a JOIN
attribute.
b) PRUNE (Type Code 1)
The PRUNE attribute indicates that all GROUP or SOURCE attributes
nested immediately within the PRUNE attribute should be pruned.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |    Type=1     |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Nested Attributes ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
No JOIN or PRUNE attributes may be immediately nested within a PRUNE
attribute.
c) GROUP (Type Code 2)
The GROUP attribute identifies a given group-prefix. In addition,
any attributes nested immediately within the GROUP attribute also
apply to the given group-prefix.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |    Type=2     |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
|                                                               |
|                    Encoded-Address-Prefix                     |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Nested Attributes (optional) ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
No GROUP or SOURCE attributes may be immediately nested within a
GROUP attribute.
AddrFam Address family (see above) of the group address.
EncType The type of encoding used within a specific Address
Family. The value `0' is reserved for this field,
and represents the native encoding of the Address
Family.
Encoded-Address-Prefix
The multicast group prefix to be joined or pruned,
in the format described above.
d) SOURCE (Type Code 3):
The SOURCE attribute identifies a given source-prefix. In
addition, any attributes nested immediately within the SOURCE
attribute also apply to the given source-prefix.
The SOURCE attribute has the following format:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |    Type=3     |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
|                                                               |
|                    Encoded-Address-Prefix                     |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Nested Attributes (optional) ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
AddrFam Address family of the source address.
EncType The type of encoding used within a specific Address
Family. The value `0' is reserved for this field,
and represents the native encoding of the Address
Family.
Encoded-Address-Prefix
The Source-prefix in the format described above.
Nested Attributes
No GROUP or SOURCE attributes may be immediately
nested within a SOURCE attribute.
8.4. Encoding examples
Below are enumerated examples of how various updates are built using
nested attributes, where A ( B ) denotes that attribute B is nested
within attribute A.
(*,G-prefix) Join: JOIN ( GROUP )
(*,G-prefix) Prune: PRUNE ( GROUP )
(S,G) Join towards S : GROUP ( JOIN ( SOURCE ) )
(S,G) Join cancelling prune towards G: GROUP ( JOIN ( SOURCE ) )
(S,G) Prune towards S: GROUP ( PRUNE ( SOURCE ) )
(S,G) Prune towards G: GROUP ( PRUNE ( SOURCE ) )
Switch from (*,G) to (S,G): PRUNE ( GROUP ( JOIN ( SOURCE ) ) )
Switch from (S,G) to (*,G): JOIN ( GROUP )
Initial (*,G) Join with S pruned: JOIN ( GROUP ( PRUNE ( SOURCE ) ) )
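As a non-normative illustration, the nesting rules from Section 8.3
can be checked against these examples as sketched below. Representing
an attribute as a (name, children) pair and the function name are
assumptions of this example; prefixes and wire encoding are omitted.

```python
ACTIONS = {"JOIN", "PRUNE"}
TARGETS = {"GROUP", "SOURCE"}


def valid_nesting(attr):
    """Check the immediate-nesting rules for an attribute tree.

    No JOIN or PRUNE may be nested immediately within a JOIN or PRUNE,
    and no GROUP or SOURCE immediately within a GROUP or SOURCE.
    """
    name, children = attr
    forbidden = ACTIONS if name in ACTIONS else TARGETS
    for child in children:
        if child[0] in forbidden or not valid_nesting(child):
            return False
    return True


# Switch from (*,G) to (S,G): PRUNE ( GROUP ( JOIN ( SOURCE ) ) )
switch = ("PRUNE", [("GROUP", [("JOIN", [("SOURCE", [])])])])
# Initial (*,G) Join with S pruned: JOIN ( GROUP ( PRUNE ( SOURCE ) ) )
initial = ("JOIN", [("GROUP", [("PRUNE", [("SOURCE", [])])])])
```

Both example trees alternate between an action (JOIN or PRUNE) and a
target (GROUP or SOURCE) at each level, which is what the rules
require.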
8.5. KEEPALIVE Message Format
BGMP does not use any transport protocol-based keep-alive mechanism
to determine if peers are reachable. Instead, KEEPALIVE messages are
exchanged between peers often enough as not to cause the Hold Timer
to expire. A reasonable maximum time between the last KEEPALIVE or
UPDATE message sent, and the time at which a KEEPALIVE message is
sent, would be one third of the Hold Time interval. KEEPALIVE
messages MUST NOT be sent more frequently than one per second. An
implementation MAY adjust the rate at which it sends KEEPALIVE
messages as a function of the Hold Time interval.
If the negotiated Hold Time interval is zero, then periodic KEEPALIVE
messages MUST NOT be sent.
A KEEPALIVE message consists of only a message header, and has a
length of 4 octets.
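As a non-normative illustration, the suggested KEEPALIVE spacing can
be computed as sketched below. The function name is hypothetical, and
returning None to mean "send no periodic KEEPALIVEs" is a convention
of this example.

```python
def keepalive_interval(hold_time):
    """Return the seconds between KEEPALIVE messages, or None if none
    are to be sent.

    Uses one third of the negotiated Hold Time, but never less than one
    second, since KEEPALIVEs must not be sent more than once per second.
    """
    if hold_time == 0:
        return None
    return max(1.0, hold_time / 3)
```

For a negotiated Hold Time of 90 seconds this gives 30 seconds; for
the minimum non-zero Hold Time of 3 seconds it gives 1 second.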
8.6. NOTIFICATION Message Format
A NOTIFICATION message is sent when an error condition is detected.
The BGMP connection is closed immediately after sending it.
In addition to the fixed-size BGMP header, the NOTIFICATION message
contains the following fields:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Error code   | Error subcode |             Data              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Error Code:
This 1-octet unsigned integer indicates the type of
NOTIFICATION. The following Error Codes have been defined:
Error Code    Symbolic Name                 Reference
    1         Message Header Error          Section 9.1
    2         OPEN Message Error            Section 9.2
    3         UPDATE Message Error          Section 9.3
    4         Hold Timer Expired            Section 9.5
    5         Finite State Machine Error    Section 9.6
    6         Cease                         Section 9.7
Error subcode:
This 1-octet unsigned integer provides more specific
information about the nature of the reported error. Each Error
Code may have one or more Error Subcodes associated with it.
If no appropriate Error Subcode is defined, then a zero
(Unspecific) value is used for the Error Subcode field.
Message Header Error subcodes:
2 - Bad Message Length.
3 - Bad Message Type.
OPEN Message Error subcodes:
1 - Unsupported Version Number
4 - Unsupported Optional Parameter
5 - Authentication Failure
6 - Unacceptable Hold Time
UPDATE Message Error subcodes:
1 - Malformed Attribute List
2 - Unrecognized Well-known Attribute
5 - Attribute Length Error
10 - Invalid Prefix Field
Data:
This variable-length field is used to diagnose the reason for the
NOTIFICATION. The contents of the Data field depend upon the
Error Code and Error Subcode. See Section 9 below for more
details.
Note that the length of the Data field can be determined from the
message Length field by the formula:
Message Length = 6 + Data Length
The minimum length of the NOTIFICATION message is 6 octets
(including message header).
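As a non-normative illustration, a NOTIFICATION message can be built
and parsed as sketched below, using the relation Message Length = 6 +
Data Length. Network byte order and the function names are assumptions
of this example.

```python
import struct

NOTIFICATION = 3  # message type code


def pack_notification(error_code, error_subcode=0, data=b""):
    """Build a NOTIFICATION: 4-octet header, Error code, Error subcode, Data."""
    length = 6 + len(data)
    return struct.pack("!HBBBB", length, NOTIFICATION, 0,
                       error_code, error_subcode) + data


def unpack_notification(msg):
    """Return (error_code, error_subcode, data) from a NOTIFICATION."""
    length, _msg_type, _reserved, code, subcode = struct.unpack("!HBBBB", msg[:6])
    # Data Length = Message Length - 6
    return code, subcode, msg[6:length]
```

For example, pack_notification(6) yields the 6-octet Cease
notification, with a zero Error Subcode and an empty Data field.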
9. BGMP Error Handling
This section describes actions to be taken when errors are detected
while processing BGMP messages. BGMP Error Handling is similar to
that of BGP [BGP].
When any of the conditions described here are detected, a
NOTIFICATION message with the indicated Error Code, Error Subcode,
and Data fields is sent, and the BGMP connection is closed. If no
Error Subcode is specified, then a zero must be used.
The phrase "the BGMP connection is closed" means that the transport
protocol connection has been closed and that all resources for that
BGMP connection have been deallocated. The remote peer is removed
from the target list of all tree state entries.
Unless specified explicitly, the Data field of the NOTIFICATION
message that is sent to indicate an error is empty.
9.1. Message Header error handling
All errors detected while processing the Message Header are indicated
by sending the NOTIFICATION message with Error Code Message Header
Error. The Error Subcode elaborates on the specific nature of the
error.
If the Length field of the message header is less than 4 or greater
than 4096, or if the Length field of an OPEN message is less than
the minimum length of the OPEN message, or if the Length field of an
UPDATE message is less than the minimum length of the UPDATE message,
or if the Length field of a KEEPALIVE message is not equal to 4, then
the Error Subcode is set to Bad Message Length. The Data field
contains the erroneous Length field.
If the Type field of the message header is not recognized, then the
Error Subcode is set to Bad Message Type. The Data field contains
the erroneous Type field.
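As a non-normative illustration, the checks that do not depend on a
per-type minimum length can be sketched as follows. The function name
is hypothetical; the OPEN and UPDATE minimum-length checks are left
out, and the order in which the checks are applied is a choice of this
example.

```python
MESSAGE_HEADER_ERROR = 1   # Error Code
BAD_MESSAGE_LENGTH = 2     # Error Subcode
BAD_MESSAGE_TYPE = 3       # Error Subcode
KNOWN_TYPES = {1, 2, 3, 4}  # OPEN, UPDATE, NOTIFICATION, KEEPALIVE
KEEPALIVE = 4


def check_header(length, msg_type):
    """Return (error_code, error_subcode) for a bad header, else None."""
    if not 4 <= length <= 4096:
        return (MESSAGE_HEADER_ERROR, BAD_MESSAGE_LENGTH)
    if msg_type not in KNOWN_TYPES:
        return (MESSAGE_HEADER_ERROR, BAD_MESSAGE_TYPE)
    if msg_type == KEEPALIVE and length != 4:
        return (MESSAGE_HEADER_ERROR, BAD_MESSAGE_LENGTH)
    return None
```

A caller would place the erroneous Length or Type field in the Data
field of the resulting NOTIFICATION.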
9.2. OPEN message error handling
All errors detected while processing the OPEN message are indicated
by sending the NOTIFICATION message with Error Code OPEN Message
Error. The Error Subcode elaborates on the specific nature of the
error.
If the version number contained in the Version field of the received
OPEN message is not supported, then the Error Subcode is set to
Unsupported Version Number. The Data field is a 2-octet unsigned
integer, which indicates the largest locally supported version number
less than the version the remote BGMP peer bid (as indicated in the
received OPEN message).
If the Hold Time field of the OPEN message is unacceptable, then the
Error Subcode MUST be set to Unacceptable Hold Time. An
implementation MUST reject Hold Time values of one or two seconds.
An implementation MAY reject any proposed Hold Time. An
implementation which accepts a Hold Time MUST use the negotiated
value for the Hold Time.
If one of the Optional Parameters in the OPEN message is not
recognized, then the Error Subcode is set to Unsupported Optional
Parameter.
If the OPEN message carries Authentication Information (as an
Optional Parameter), then the corresponding authentication procedure
is invoked. If the authentication procedure (based on Authentication
Code and Authentication Data) fails, then the Error Subcode is set to
Authentication Failure.
9.3. UPDATE message error handling
All errors detected while processing the UPDATE message are indicated
by sending the NOTIFICATION message with Error Code UPDATE Message
Error. The error subcode elaborates on the specific nature of the
error.
If any recognized attribute has Attribute Length that conflicts with
the expected length (based on the attribute type code), then the
Error Subcode is set to Attribute Length Error. The Data field
contains the erroneous attribute (type, length and value).
If the Encoded-Address-Prefix field in some attribute is
syntactically incorrect, then the Error Subcode is set to Invalid
Prefix Field.
If any other error is encountered when processing attributes (such
as invalid nestings), then the Error Subcode is set to Malformed
Attribute List, and the problematic attribute is included in the Data
field.
9.4. NOTIFICATION message error handling
If a peer sends a NOTIFICATION message, and there is an error in that
message, there is unfortunately no means of reporting this error via
a subsequent NOTIFICATION message. Any such error, such as an
unrecognized Error Code or Error Subcode, should be noticed, logged
locally, and brought to the attention of the administration of the
peer. The means to do this, however, lies outside the scope of this
document.
9.5. Hold Timer Expired error handling
If a system does not receive successive KEEPALIVE and/or UPDATE
and/or NOTIFICATION messages within the period specified in the Hold
Time field of the OPEN message, then the NOTIFICATION message with
Hold Timer Expired Error Code must be sent and the BGMP connection
closed.
9.6. Finite State Machine error handling
Any error detected by the BGMP Finite State Machine (e.g., receipt of
an unexpected event) is indicated by sending the NOTIFICATION message
with Error Code Finite State Machine Error.
9.7. Cease
In the absence of any fatal errors (that are indicated in this section),
a BGMP peer may choose at any given time to close its BGMP connection
by sending the NOTIFICATION message with Error Code Cease. However,
the Cease NOTIFICATION message must not be used when a fatal error
indicated by this section does exist.
9.8. Connection collision detection
If a pair of BGMP speakers try simultaneously to establish a TCP
connection to each other, then two parallel connections between this
pair of speakers might well be formed. We refer to this situation as
connection collision. Clearly, one of these connections must be
closed.
Based on the value of the BGMP Identifier a convention is established
for detecting which BGMP connection is to be preserved when a
collision does occur. The convention is to compare the BGMP
Identifiers of the peers involved in the collision and to retain only
the connection initiated by the BGMP speaker with the higher-valued
BGMP Identifier.
Upon receipt of an OPEN message, the local system must examine all of
its connections that are in the OpenConfirm state. A BGMP speaker
may also examine connections in an OpenSent state if it knows the
BGMP Identifier of the peer by means outside of the protocol. If
among these connections there is a connection to a remote BGMP
speaker whose BGMP Identifier equals the one in the OPEN message,
then the local system performs the following collision resolution
procedure:
1. The BGMP Identifier of the local system is compared to the BGMP
Identifier of the remote system (as specified in the OPEN message).
2. If the value of the local BGMP Identifier is less than the remote
one, the local system closes the BGMP connection that already exists
(the one that is already in the OpenConfirm state), and accepts the
BGMP connection initiated by the remote system.
3. Otherwise, the local system closes the newly created BGMP
connection (the one associated with the newly received OPEN message),
and continues to use the existing one (the one that is already in the
OpenConfirm state).
Comparing BGMP Identifiers is done by treating them as (4-octet long)
unsigned integers.
A connection collision with an existing BGMP connection that is in
the Established state causes unconditional closing of the newly
created connection. Note that a connection collision cannot be
detected with connections that are in the Idle, Connect, or Active
states.
Closing the BGMP connection (that results from the collision
resolution procedure) is accomplished by sending the NOTIFICATION
message with the Error Code Cease.
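As a non-normative illustration, the collision resolution procedure
can be sketched as a single decision function. The function name, the
string state names, and the "existing"/"new" return values are
conventions of this example.

```python
def connection_to_close(local_id, remote_id, existing_state):
    """Decide which of two colliding connections is closed.

    local_id and remote_id are BGMP Identifiers compared as 4-octet
    unsigned integers. Returns "existing" or "new".
    """
    if existing_state == "Established":
        # An Established connection always wins over a new one.
        return "new"
    if local_id < remote_id:
        # Keep the connection initiated by the higher-valued remote peer.
        return "existing"
    return "new"
```

The connection selected for closing is then terminated by sending a
NOTIFICATION with Error Code Cease.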
10. BGMP Version Negotiation
BGMP speakers may negotiate the version of the protocol by making
multiple attempts to open a BGMP connection, starting with the
highest version number each supports. If an open attempt fails with
an Error Code OPEN Message Error, and an Error Subcode Unsupported
Version Number, then the BGMP speaker has available the version
number it tried, the version number its peer tried, the version
number passed by its peer in the NOTIFICATION message, and the
version numbers that it supports. If the two peers do support one or
more common versions, then this will allow them to rapidly determine
the highest common version. In order to support BGMP version
negotiation, future versions of BGMP must retain the format of the
OPEN and NOTIFICATION messages.
11. BGMP Finite State Machine
This section specifies BGMP operation in terms of a Finite State
Machine (FSM). Following is a brief summary and overview of BGMP
operations by state as determined by this FSM.
Initially BGMP is in the Idle state.
Idle state:
In this state BGMP refuses all incoming BGMP connections. No
resources are allocated to the peer. In response to the Start
event (initiated by either system or operator) the local system
initializes all BGMP resources, starts the ConnectRetry timer,
initiates a transport connection to the other BGMP peer, while
listening for a connection that may be initiated by the remote BGMP
peer, and changes its state to Connect. The exact value of the
ConnectRetry timer is a local matter, but should be sufficiently
large to allow TCP initialization.
If a BGMP speaker detects an error, it shuts down the connection
and changes its state to Idle. Getting out of the Idle state
requires generation of the Start event. If such an event is
generated automatically, then persistent BGMP errors may result in
persistent flapping of the speaker. To avoid such a condition it
is recommended that Start events should not be generated
immediately for a peer that was previously transitioned to Idle
due to an error. For a peer that was previously transitioned to
Idle due to an error, the time between consecutive generation of
Start events, if such events are generated automatically, shall
exponentially increase. The value of the initial timer shall be 60
seconds. The time shall be doubled for each consecutive retry.
Any other event received in the Idle state is ignored.
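As a non-normative illustration, the delay before an automatically
generated Start event can be sketched as follows. The function name is
hypothetical, and returning zero when no error has occurred is an
assumption of this example.

```python
def start_delay(consecutive_errors):
    """Return the seconds to wait before the next automatic Start event.

    The first retry after an error waits 60 seconds, and the wait is
    doubled for each further consecutive retry.
    """
    if consecutive_errors <= 0:
        return 0
    return 60 * 2 ** (consecutive_errors - 1)
```

The first three retries therefore wait 60, 120, and 240 seconds.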
Connect state:
In this state BGMP is waiting for the transport protocol
connection to be completed.
If the transport protocol connection succeeds, the local system
clears the ConnectRetry timer, completes initialization, sends an
OPEN message to its peer, and changes its state to OpenSent. If
the transport protocol connection fails (e.g., retransmission
timeout), the local system restarts the ConnectRetry timer,
continues to listen for a connection that may be initiated by the
remote BGMP peer, and changes its state to Active.
In response to the ConnectRetry timer expired event, the local
system restarts the ConnectRetry timer, initiates a transport
connection to the other BGMP peer, continues to listen for a
connection that may be initiated by the remote BGMP peer, and
stays in the Connect state.
The Start event is ignored in the Connect state.
In response to any other event (initiated by either system or
operator), the local system releases all BGMP resources associated
with this connection and changes its state to Idle.
Active state:
In this state BGMP is trying to acquire a peer by initiating a
transport protocol connection.
If the transport protocol connection succeeds, the local system
clears the ConnectRetry timer, completes initialization, sends an
OPEN message to its peer, sets its Hold Timer to a large value,
and changes its state to OpenSent. A Hold Timer value of 4
minutes is suggested.
In response to the ConnectRetry timer expired event, the local
system restarts the ConnectRetry timer, initiates a transport
connection to the other BGMP peer, continues to listen for a
connection that may be initiated by the remote BGMP peer, and
changes its state to Connect.
If the local system detects that a remote peer is trying to
establish a BGMP connection to it, and the IP address of the remote
peer is not an expected one, the local system restarts the
ConnectRetry timer, rejects the attempted connection, continues to
listen for a connection that may be initiated by the remote BGMP
peer, and stays in the Active state.
The Start event is ignored in the Active state.
In response to any other event (initiated by either system or
operator), the local system releases all BGMP resources associated
with this connection and changes its state to Idle.
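The Connect and Active transitions described above can be sketched
as a lookup table. This is only an illustration: the event and
action names are hypothetical labels for the prose, not identifiers
defined by this specification, and only the rules stated in the text
above are modelled.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECT = auto()
    ACTIVE = auto()
    OPEN_SENT = auto()


class Event(Enum):
    # Hypothetical event names; the text describes events only in prose.
    START = auto()
    TRANSPORT_OPEN = auto()          # transport connection succeeded
    TRANSPORT_OPEN_FAILED = auto()   # e.g., retransmission timeout
    CONNECT_RETRY_EXPIRED = auto()
    UNEXPECTED_PEER = auto()         # connection attempt from an unexpected address


# The text suggests a "large" initial Hold Timer of 4 minutes.
SUGGESTED_INITIAL_HOLD_SECONDS = 4 * 60

# (state, event) -> (next state, actions to perform, in order)
TRANSITIONS = {
    (State.CONNECT, Event.TRANSPORT_OPEN): (
        State.OPEN_SENT,
        ("clear_connect_retry", "complete_init", "send_open"),
    ),
    (State.CONNECT, Event.TRANSPORT_OPEN_FAILED): (
        State.ACTIVE,
        ("restart_connect_retry", "listen"),
    ),
    (State.CONNECT, Event.CONNECT_RETRY_EXPIRED): (
        State.CONNECT,
        ("restart_connect_retry", "initiate_connection", "listen"),
    ),
    (State.CONNECT, Event.START): (State.CONNECT, ()),  # ignored
    (State.ACTIVE, Event.TRANSPORT_OPEN): (
        State.OPEN_SENT,
        ("clear_connect_retry", "complete_init", "send_open",
         "set_hold_timer_large"),
    ),
    (State.ACTIVE, Event.CONNECT_RETRY_EXPIRED): (
        State.CONNECT,
        ("restart_connect_retry", "initiate_connection", "listen"),
    ),
    (State.ACTIVE, Event.UNEXPECTED_PEER): (
        State.ACTIVE,
        ("restart_connect_retry", "reject_connection", "listen"),
    ),
    (State.ACTIVE, Event.START): (State.ACTIVE, ()),  # ignored
}


def step(state, event):
    """Return (next_state, actions) for an event in Connect or Active."""
    if state not in (State.CONNECT, State.ACTIVE):
        raise ValueError("only the Connect and Active states are modelled")
    # Any event not listed releases all resources and returns to Idle.
    return TRANSITIONS.get((state, event), (State.IDLE, ("release_resources",)))
```

Keeping the "any other event" rule as the table's default means an
event that was overlooked falls back to Idle rather than leaving the
connection in an undefined state.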
OpenSent state:
In this state BGMP waits for an OPEN message from its peer. When
an OPEN message is received, all fields are checked for
correctness. If the BGMP message header checking or OPEN message
checking detects an error (see Sections 9.1 and 9.2), or a
connection collision (see Section 9.8), the local system sends a
NOTIFICATION message and changes its state to Idle.
If there are no errors in the OPEN message, BGMP sends a KEEPALIVE
message and sets a KeepAlive timer. The Hold Timer, which was
originally set to a large value (see above), is replaced with the
negotiated Hold Time value (see Section 8.2). If the negotiated
Hold Time value is zero, then the Hold Timer and the KeepAlive
timer are not started. If the value of the Autonomous System
field is the same as the local Autonomous System number, then the
connection is an "internal" connection; otherwise, it is
"external". (This will affect UPDATE processing as described
below.) Finally, the state is changed to OpenConfirm.
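As an illustration of the processing of an error-free OPEN message,
the following sketch applies the steps above to a small session
record. The Session class, its field names, and the on_valid_open
function are assumptions made for this example. The negotiated Hold
Time is taken as an input because its computation is defined
elsewhere in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    # Hypothetical per-connection record; not defined by the specification.
    local_as: int
    state: str = "OpenSent"
    hold_timer: Optional[int] = 240   # large initial value (4 minutes suggested)
    keepalive_timer_running: bool = False
    internal: Optional[bool] = None   # unknown until an OPEN is accepted


def on_valid_open(session, peer_as, negotiated_hold_time):
    """Apply the OpenSent rules for an OPEN message that passed all checks.

    Returns the list of messages to send.
    """
    to_send = ["KEEPALIVE"]
    if negotiated_hold_time == 0:
        # Neither the Hold Timer nor the KeepAlive timer is started.
        session.hold_timer = None
        session.keepalive_timer_running = False
    else:
        # The large initial value is replaced by the negotiated value.
        session.hold_timer = negotiated_hold_time
        session.keepalive_timer_running = True
    # Same Autonomous System number means an "internal" connection.
    session.internal = (peer_as == session.local_as)
    session.state = "OpenConfirm"
    return to_send
```

For example, a session with local_as 65001 that accepts an OPEN from
a peer in Autonomous System 65001 with a negotiated Hold Time of 90
ends in OpenConfirm as an internal connection with a 90 second Hold
Timer.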
If a disconnect notification is received from the underlying
transport protocol, the local system closes the BGMP connection,
restarts the ConnectRetry timer, continues to listen for a
connection that may be initiated by the remote BGMP peer, and goes
into the Active state.
If the Hold Timer expires, the local system sends a NOTIFICATION
message with Error Code Hold Timer Expired and changes its state
to Idle.
In response to the Stop event (initiated by either system or
operator), the local system sends a NOTIFICATION message with
Error Code Cease and changes its state to Idle.
The Start event is ignored in the OpenSent state.
In response to any other event, the local system sends a
NOTIFICATION message with Error Code Finite State Machine Error
and changes its state to Idle.
Whenever BGMP changes its state from OpenSent to Idle, it closes
the BGMP (and transport-level) connection and releases all
resources associated with that connection.
OpenConfirm state:
In this state BGMP waits for a KEEPALIVE or NOTIFICATION message.
If the local system receives a KEEPALIVE message, it changes its
state to Established.
If the Hold Timer expires before a KEEPALIVE message is received,
the local system sends a NOTIFICATION message with Error Code Hold
Timer Expired and changes its state to Idle.
If the local system receives a NOTIFICATION message, it changes
its state to Idle.
If the KeepAlive timer expires, the local system sends a KEEPALIVE
message and restarts its KeepAlive timer.
If a disconnect notification is received from the underlying
transport protocol, the local system changes its state to Idle.
In response to the Stop event (initiated by either system or
operator), the local system sends a NOTIFICATION message with
Error Code Cease and changes its state to Idle.
The Start event is ignored in the OpenConfirm state.
In response to any other event, the local system sends a
NOTIFICATION message with Error Code Finite State Machine Error
and changes its state to Idle.
Whenever BGMP changes its state from OpenConfirm to Idle, it
closes the BGMP (and transport-level) connection and releases all
resources associated with that connection.
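The event handling in the OpenSent and OpenConfirm states, apart
from receipt of an OPEN message, can be summarized in one table.
The sketch below is a reading aid under stated assumptions: the
event strings and the handle function are hypothetical names, and
receipt of an OPEN message in OpenSent is deliberately not modelled
because its outcome depends on the error checks.

```python
IDLE = "Idle"
ACTIVE = "Active"
OPEN_SENT = "OpenSent"
OPEN_CONFIRM = "OpenConfirm"
ESTABLISHED = "Established"

# (state, event) -> (next state, Error Code of the NOTIFICATION to send, or None)
# Event names are hypothetical labels for the events described in the text.
TABLE = {
    (OPEN_SENT, "transport_disconnect"): (ACTIVE, None),  # also restarts ConnectRetry
    (OPEN_SENT, "hold_timer_expired"): (IDLE, "Hold Timer Expired"),
    (OPEN_SENT, "stop"): (IDLE, "Cease"),
    (OPEN_SENT, "start"): (OPEN_SENT, None),              # ignored
    (OPEN_CONFIRM, "keepalive_received"): (ESTABLISHED, None),
    (OPEN_CONFIRM, "notification_received"): (IDLE, None),
    (OPEN_CONFIRM, "hold_timer_expired"): (IDLE, "Hold Timer Expired"),
    (OPEN_CONFIRM, "keepalive_timer_expired"): (OPEN_CONFIRM, None),  # sends KEEPALIVE
    (OPEN_CONFIRM, "transport_disconnect"): (IDLE, None),
    (OPEN_CONFIRM, "stop"): (IDLE, "Cease"),
    (OPEN_CONFIRM, "start"): (OPEN_CONFIRM, None),        # ignored
}


def handle(state, event):
    """Return (next_state, notification_error_code, release_all).

    release_all is True whenever the state changes to Idle, since the
    connection is then closed and all of its resources are released.
    """
    if state not in (OPEN_SENT, OPEN_CONFIRM):
        raise ValueError("only OpenSent and OpenConfirm are modelled")
    # Any unlisted event is a Finite State Machine Error.
    next_state, error_code = TABLE.get(
        (state, event), (IDLE, "Finite State Machine Error"))
    return next_state, error_code, next_state == IDLE
```

Returning the release flag alongside the next state keeps the
"whenever BGMP changes its state to Idle" cleanup rule in one place
instead of repeating it for every transition.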
Established state:
In the Established state BGMP can exchange UPDATE, NOTIFICATION,
and KEEPALIVE messages with its peer.
If the local system receives an UPDATE or KEEPALIVE message, it
restarts its Hold Timer, if the negotiated Hold Time value is
non-zero.
If the local system receives a NOTIFICATION message, it changes
its state to Idle.
If the local system receives an UPDATE message and the UPDATE
message error handling procedure (see Section 9.3) detects an
error, the local system sends a NOTIFICATION message and changes
its state to Idle.
If a disconnect notification is received from the underlying
transport protocol, the local system changes its state to Idle.
If the Hold Timer expires, the local system sends a NOTIFICATION
message with Error Code Hold Timer Expired and changes its state
to Idle.
If the KeepAlive timer expires, the local system sends a KEEPALIVE
message and restarts its KeepAlive timer.
Each time the local system sends a KEEPALIVE or UPDATE message, it
restarts its KeepAlive timer, unless the negotiated Hold Time
value is zero.
In response to the Stop event (initiated by either system or
operator), the local system sends a NOTIFICATION message with
Error Code Cease and changes its state to Idle.
The Start event is ignored in the Established state.
In response to any other event, the local system sends a
NOTIFICATION message with Error Code Finite State Machine Error
and changes its state to Idle.
Whenever BGMP changes its state from Established to Idle, it
closes the BGMP (and transport-level) connection, releases all
resources associated with that connection, and deletes all routes
derived from that connection.
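The timer rules of the Established state can be sketched with
explicit deadlines. This is a minimal illustration, not a
definitive implementation: the class and method names are
hypothetical, time is passed in by the caller in seconds, and the
KeepAlive interval is a caller-supplied assumption since its value
is not given in this section.

```python
class EstablishedTimers:
    """Hold Timer and KeepAlive timer bookkeeping for one connection."""

    def __init__(self, hold_time, keepalive_interval):
        self.hold_time = hold_time                    # negotiated Hold Time
        self.keepalive_interval = keepalive_interval  # assumed, caller-chosen
        self.hold_deadline = None
        self.keepalive_deadline = None

    def start(self, now):
        # With a negotiated Hold Time of zero, neither timer runs.
        if self.hold_time != 0:
            self.hold_deadline = now + self.hold_time
            self.keepalive_deadline = now + self.keepalive_interval

    def on_receive(self, msg_type, now):
        # Receiving an UPDATE or KEEPALIVE restarts the Hold Timer.
        if msg_type in ("UPDATE", "KEEPALIVE") and self.hold_time != 0:
            self.hold_deadline = now + self.hold_time

    def on_send(self, msg_type, now):
        # Sending a KEEPALIVE or UPDATE restarts the KeepAlive timer.
        if msg_type in ("UPDATE", "KEEPALIVE") and self.hold_time != 0:
            self.keepalive_deadline = now + self.keepalive_interval

    def poll(self, now):
        """Return the action that is due at time `now`, or None."""
        if self.hold_deadline is not None and now >= self.hold_deadline:
            # Caller sends NOTIFICATION (Hold Timer Expired) and goes to Idle.
            return "hold_timer_expired"
        if self.keepalive_deadline is not None and now >= self.keepalive_deadline:
            # Caller sends a KEEPALIVE, then calls on_send() to restart the timer.
            return "send_keepalive"
        return None
```

Checking the Hold Timer first in poll() ensures that a connection
whose peer has gone silent is torn down rather than kept alive by
the local KEEPALIVE schedule.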
12. Security Considerations
Security issues are not discussed in this memo.
13. Authors' Addresses
Dave Thaler
Department of Electrical Engineering and Computer Science
University of Michigan
1301 Beal Ave.
Ann Arbor, MI 48109-2122
Phone: +1 313 763 5243
EMail: thalerd@eecs.umich.edu
Deborah Estrin
Computer Science Dept./ISI
University of Southern California
Los Angeles, CA 90089
EMail: estrin@usc.edu
David Meyer
University of Oregon
1225 Kincaid St.
Eugene, OR 97403
Phone: (541) 346-1747
EMail: meyer@antc.uoregon.edu
14. References
[BGP]
Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC
1771, March 1995.
[MBGP]
Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol
Extensions for BGP-4", RFC 2283, February 1998.
[CBT]
Ballardie, A. J., "Core Based Trees (CBT) Multicast: Architectural
Overview and Specification", University College London, November
1994.
[CBTDM]
Ballardie, A., "Core Based Tree (CBT) Multicast Border Router
Specification", draft-ietf-idmr-cbt-br-spec-00.txt, October 1997.
[DVMRP]
Pusateri, T., "Distance Vector Multicast Routing Protocol", draft-
ietf-idmr-dvmrp-v3-05.txt, October 1997.
[DWR]
Fenner, W., "Domain-Wide Reports", Work in progress.
[INTEROP]
Thaler, D., "Interoperability Rules for Multicast Routing
Protocols", draft-thaler-multicast-interop-01.txt, March 1997.
[IPv6MAA]
Hinden, R., and S. Deering, "IPv6 Multicast Address Assignments",
draft-ietf-ipngwg-multicast-assgn-04.txt, July 1997.
[ISSUES]
Meyer, D., "Some Issues for an Inter-domain Multicast Routing
Protocol", draft-ietf-mboned-imrp-some-issues-02.txt, June 1997.
[MASC]
Estrin, D., Handley, M., and D. Thaler, "Multicast-Address-Set
advertisement and Claim mechanism", Work in Progress, June 1997.
[MOSPF]
Moy, J., "Multicast Extensions to OSPF", RFC 1584, Proteon, March
1994.
[PIMDM]
Estrin, et al., "Protocol Independent Multicast-Dense Mode (PIM-
DM): Protocol Specification", draft-ietf-idmr-pim-dm-spec-05.txt,
May 1997.
[PIMSM]
Estrin, et al., "Protocol Independent Multicast-Sparse Mode (PIM-
SM): Protocol Specification", RFC 2117, June 1997.
[REFLECT]
Bates, T., and R. Chandra, "BGP Route Reflection: An alternative to
full mesh IBGP", RFC 1966, June 1996.
[RFC1700]
Reynolds, J., and J. Postel, "Assigned Numbers", RFC 1700, October
1994.
[RFC1771]
Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC
1771, March 1995.
Table of Contents
1 Acknowledgements ................................................ 2
2 Purpose ......................................................... 2
3 Terminology ..................................................... 3
4 Protocol Overview ............................................... 5
4.1 Design Rationale .............................................. 6
5 Protocol Details ................................................ 8
5.1 Interaction with the EGP ...................................... 8
5.2 Multicast Data Packet Processing .............................. 9
5.3 BGMP processing of Join and Prune messages and notifications
.............................................................. 9
5.3.1 Receiving Joins ............................................. 9
5.3.2 Receiving Prune Notifications ............................... 10
5.3.3 Receiving Route Change Notifications ........................ 11
5.4 Interaction with M-IGP components ............................. 12
5.4.1 Interaction with DVMRP and PIM-DM ........................... 12
5.4.2 Interaction with PIM-SM ..................................... 14
5.4.3 Interaction with CBT ........................................ 15
5.4.4 Interaction with MOSPF ...................................... 16
6 Interaction with address allocation ............................. 16
6.1 Requirements for BGMP components .............................. 16
7 Transition Strategy ............................................. 16
7.1 Preventing transit through the MBone stub ..................... 18
8 Message Formats ................................................. 19
8.1 Message Header Format ......................................... 19
8.2 OPEN Message Format ........................................... 20
8.3 UPDATE Message Format ......................................... 22
8.4 Encoding examples ............................................. 26
8.5 KEEPALIVE Message Format ...................................... 26
8.6 NOTIFICATION Message Format ................................... 27
9 BGMP Error Handling ............................................. 28
9.1 Message Header error handling ................................. 29
9.2 OPEN message error handling ................................... 29
9.3 UPDATE message error handling ................................. 30
9.4 NOTIFICATION message error handling ........................... 30
9.5 Hold Timer Expired error handling ............................. 31
9.6 Finite State Machine error handling ........................... 31
9.7 Cease ......................................................... 31
9.8 Connection collision detection ................................ 31
10 BGMP Version Negotiation ....................................... 32
11 BGMP Finite State machine ...................................... 33
12 Security Considerations ........................................ 38
13 Authors' Addresses ............................................. 38
14 References ..................................................... 38