- 1 -



Network Working Group                                         S. Brim
Request for Comments: DRAFT                        Cornell University
                                                           Y. Rekhter
                                T.J. Watson Research Center, IBM Corp.
                                                          October 1992


                 IP Multicast Communications Using BGP



Status of this Memo

   This document reflects the current status of recommendations for
   supporting inter-domain multicast packet forwarding using BGP.  This
   RFC specifies an IAB standards track protocol for the Internet
   community, and requests discussion and suggestions for improvements.
   Please refer to the current edition of the "IAB Official Protocol
   Standards" for the standardization state and status of this protocol.
   Distribution of this document is unlimited.

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress".


Abstract


   This document, a major revision of the previous version, reflects the
   current status of recommendations for supporting inter-domain
   multicast packet forwarding using BGP.  Research is underway on other
   methods for inter-domain multicasting, but only what can be done
   today is considered here.









Expiration Date March 1993                                      [Page 1]


1 Introduction


   Most communication in the Internet today is unicasting, where there
   is a single specific destination for every packet.  On local area
   networks broadcasting, in which the destination of a packet is every
   node on the network, is also common.  Multicasting is like
   broadcasting in that it supports multiple recipients for a single
   packet, but packets are intended for a specific group.  As examples,
   in a local area network environment multicasting is currently often
   used for communication between processors of loosely coupled systems,
   or for communication between routers or bridges.  In these cases the
   sender wants to reach the members of a special group, but not every
   node on the network.  Broadcasting is in fact a special case of
   multicasting in which the special group is all nodes.

   Multicasting over wide areas, rather than only on local area
   networks, is an important capability whose development has lagged
   far behind the need for it.  Until recently there has been only one
   IP multicasting implementation that could be used on more than a
   single local area network [1], [9], but with it multicast packets had
   to be encapsulated in order to send them across autonomous system
   boundaries.  Work is now in progress to develop support for wide-area
   multicasting using standard protocols and to make that support an
   integrated part of the Internet protocol suite.  Part of that work is
   being done under the auspices of the IETF BGP Working Group,
   including this document which explores how multicasting can be
   supported between autonomous systems using the Border Gateway
   Protocol (BGP) [5], [7], [8].

   The best introductory reference on multicast forwarding is by Deering
   [2].  Readers are strongly encouraged to read that work, some of its
   references, and the BGP RFCs before this document, since this
   document assumes a thorough understanding of BGP and only summarizes
   parts of Deering's presentation.

   We speak in terms of multicast groups.  In IP multicasting, multicast
   groups are known by their addresses, which are in the range from
   224.0.0.0 to 239.255.255.255.  Each group has a unique address.  Each
   member of a group is known by one or more multicast addresses in
   addition to one or more unicast addresses.  RFC 1112 [1] defines
   methods for mapping between IP multicast group addresses and the
   link-layer (level 2) address spaces of Ethernet, 802.3, all
   point-to-point protocols, and protocols with broadcast but no
   multicast capability (e.g. LocalTalk).  Mappings are also defined for
   FDDI [4] and SMDS [6].
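
   As an illustration, the following Python sketch checks the group
   address range given above and applies the Ethernet mapping defined in
   RFC 1112 [1].  The function names are hypothetical, and the FDDI and
   SMDS mappings are not shown.

```python
import ipaddress

# The IPv4 multicast range given above: 224.0.0.0 through 239.255.255.255.
MULTICAST_RANGE = ipaddress.IPv4Network("224.0.0.0/4")


def is_multicast_group(addr: str) -> bool:
    """Return True if addr is an IPv4 multicast group address."""
    return ipaddress.IPv4Address(addr) in MULTICAST_RANGE


def group_to_ethernet(addr: str) -> str:
    """Map a group address to an Ethernet multicast address (RFC 1112):
    the low-order 23 bits of the group address are placed in the
    low-order 23 bits of the Ethernet address 01-00-5E-00-00-00."""
    if not is_multicast_group(addr):
        raise ValueError(f"{addr} is not an IP multicast group address")
    low23 = int(ipaddress.IPv4Address(addr)) & 0x7FFFFF
    mac = 0x01005E000000 | low23
    # Emit the six octets, most significant first.
    return "-".join(f"{(mac >> shift) & 0xFF:02X}" for shift in range(40, -1, -8))
```

   For example, 224.0.0.1 and 224.128.0.1 both map to 01-00-5E-00-00-01:
   only 23 of the 28 group-identifying bits are carried into the
   Ethernet address, so 32 groups share each Ethernet address and hosts
   still have to filter on the IP destination address.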

   The general issues in wide-area multicasting are:


       - Discovering where packets should be sent.  The destination
         address for a multicast packet refers to a multicast group,
         whose members and locations change over time.  On a local area
         network all destinations hear the same packet, and there is no
         need for forwarding.  When a multicast packet must be forwarded
         to the members of a group not on the same local area network,
         we need mechanisms by which the members can make themselves
         known to the routers and the routers can ensure that every
         member of a group receives at least one and preferably only one
         copy of a packet addressed to that group.

       - Establishing efficient routing paths for multicast packets.  A
         packet with multiple recipients must be replicated and sent on
         multiple links, but copies of the packet should travel over as
         few links as possible.  We need routing protocols defined for
         use both within and between autonomous systems.

   A major issue which has received increasing attention recently is how
   well inter-domain multicast routing can scale, given that we are now
   thinking in terms of an Internet which should be able to support a
   billion domains.

   This document assumes that hosts will inform routers of their
   membership in multicast groups, probably via the Internet Group
   Management Protocol [1].  It explores three remaining problems: how
   routers communicate where multicast packets should be sent, how
   packets are propagated efficiently between autonomous systems, and
   how intra-autonomous system and inter-autonomous system routing
   protocols interact in supporting multicasting.

2 Reverse Path Forwarding


   The only approach to wide-area routing of multicast packets that has
   been implemented so far uses "reverse path forwarding" (RPF) and is
   described in RFC 1075 [9] and in [2].  This approach would fit well
   in the BGP environment, offering low overhead and excellent
   interaction with IGPs, and the existing implementation is already
   directly applicable to BGP.  However, it may not allow the level of
   administrative control over routing paths to which some network
   administrators have become accustomed (see Section 4.1).

   In every approach to forwarding multicast packets the problem faced
   by a particular router is to determine its position in the paths by
   which multicast packets from a particular source should be forwarded.
   A router needs to (1) determine whether to accept a particular
   multicast packet or to discard it, based on its originating source
   and immediate previous hop, and (2) once a packet has been accepted,
   decide which of its peers to forward it to, if any.

   If a "source tree" defines the paths by which an end system sends
   unicast packets to all other end systems, then a "sink tree" defines
   how an end system is reached by unicast packets from all others.  The
   goal in unicast routing is to make a destination reachable by packets
   from all sources; the goal in multicast routing is to ensure that a
   packet from a single source reaches multiple destinations.  A set of
   paths that solves the first problem can therefore be used to solve
   the second if it is used in the reverse direction.  The basic reverse
   path forwarding approach relies on the fact that the propagation of
   unicast IP routing information already forms a sink tree for every
   destination network: the graph of how unicast packets flow to that
   network from all others.  Thus when multicast packets need to be sent
   from a source on that network to multiple destinations, a broadcast
   tree rooted at that source has already been formed, and this approach
   simply arranges for the multicast packets to flow along certain
   branches of that tree, in the direction opposite to that of the
   unicast packets.
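
   The reversal can be sketched in Python, assuming each router's
   unicast next hop toward one source network is already known, as the
   unicast routing protocol would provide.  The topology and the
   function names are hypothetical.

```python
from collections import defaultdict, deque


def reverse_sink_tree(next_hop: dict[str, str], source: str) -> dict[str, list[str]]:
    """Invert a sink tree into the broadcast tree rooted at `source`.

    next_hop[n] is router n's unicast next hop toward the source's network.
    Reversing each edge gives, for every router, the neighbors to which it
    forwards multicast packets originating at `source`.
    """
    children: dict[str, list[str]] = defaultdict(list)
    for node, hop in next_hop.items():
        if node != source:
            children[hop].append(node)
    return {parent: sorted(kids) for parent, kids in children.items()}


def deliver(tree: dict[str, list[str]], source: str) -> list[str]:
    """Routers in the order they receive a packet flooded from `source`."""
    received, queue = [], deque([source])
    while queue:
        for child in tree.get(queue.popleft(), []):
            received.append(child)
            queue.append(child)
    return received


# Hypothetical topology: each router's unicast next hop toward source S.
NEXT_HOP_TO_S = {"A": "S", "B": "S", "C": "A", "D": "A", "E": "B"}
TREE = reverse_sink_tree(NEXT_HOP_TO_S, "S")
```

   Here TREE is {"S": ["A", "B"], "A": ["C", "D"], "B": ["E"]}, and
   every router receives exactly one copy of a packet flooded from S.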

   This procedure establishes paths for efficient broadcast, but network
   bandwidth is still wasted by sending multicast packets along all
   branches of the sink tree even when there are no nodes on those
   branches interested in receiving them.  Further mechanisms can be
   defined to dynamically ensure that multicast packets are only sent to
   those peers which are on paths leading to members of the destination
   multicast group, for example via the "prune" and "graft" messages
   described in RFC 1075 [9].  A router sends a prune message to tell a
   BGP peer to stop sending it packets that originate from a particular
   source and are addressed to a particular multicast group.  A graft
   message cancels that directive.
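
   The net effect of prunes can be sketched as a computation over a
   small hypothetical broadcast tree rooted at source S: a branch is
   kept only if it leads to a group member.  This centralized Python
   function is an illustration of the state the distributed prune
   exchange converges to, not the RFC 1075 message procedure; a graft
   corresponds to a member reappearing below a pruned branch.

```python
def prune_tree(tree: dict[str, list[str]], members: set[str], root: str) -> dict[str, list[str]]:
    """Return the part of a broadcast tree that leads to group members.

    tree maps each router to the neighbors it forwards to; members is the
    set of routers with directly attached group members.
    """
    kept: dict[str, list[str]] = {}

    def wants_packets(node: str) -> bool:
        # Evaluate children first: a child with no members below it would
        # send a prune upstream, so its parent stops forwarding to it.
        live = [child for child in tree.get(node, []) if wants_packets(child)]
        if live:
            kept[node] = live
        return node in members or bool(live)

    wants_packets(root)
    return kept


# Hypothetical broadcast tree rooted at source S.
TREE = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E"]}
```

   With a single member behind D, only the branch S, A, D survives;
   with no members at all, every branch is pruned.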

   Prune messages can be cached and timed out by the receiver, and
   repeated as necessary by the sender.  Depending on memory
   constraints and multicast activity in the Internet, a border router
   can maintain a table of the interfaces on which packets from a
   particular source to a particular multicast group should, and should
   not, be forwarded.
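
   A minimal Python sketch of such a table, assuming prunes are keyed by
   {source, group} and interface and carry a fixed lifetime chosen by
   the receiver.  The class name, method names, and the injectable clock
   are hypothetical and serve only to make expiry testable.

```python
import time
from typing import Callable


class PruneTable:
    """Cached prune state at one router, keyed by {source, group}.

    Entries expire after `lifetime` seconds unless the downstream neighbor
    repeats its prune; a graft removes an entry immediately.
    """

    def __init__(self, lifetime: float, clock: Callable[[], float] = time.monotonic):
        self.lifetime = lifetime
        self.clock = clock
        self._expiry: dict[tuple[str, str], dict[str, float]] = {}

    def prune(self, source: str, group: str, iface: str) -> None:
        entries = self._expiry.setdefault((source, group), {})
        entries[iface] = self.clock() + self.lifetime

    def graft(self, source: str, group: str, iface: str) -> None:
        self._expiry.get((source, group), {}).pop(iface, None)

    def out_interfaces(self, source: str, group: str, downstream: list[str]) -> list[str]:
        """Downstream interfaces on which a packet for {source, group} is sent."""
        now = self.clock()
        entries = self._expiry.get((source, group), {})
        for iface in [i for i, expires in entries.items() if expires <= now]:
            del entries[iface]  # timed out: forwarding resumes until re-pruned
        return [i for i in downstream if i not in entries]
```

   Timing out entries bounds the state a router keeps to currently
   active {source, group} pairs, at the cost of occasional repeated
   prunes from downstream neighbors.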

3 BGP and Reverse Path Forwarding


   The BGP protocol itself does not have to be changed to support
   inter-domain multicasting, but implementation of IGMP "prune" and
   "graft" messages by the BGP speaker is required.

   Functionally, in reverse path forwarding, if a border router which
   receives a multicast packet receives it on the link by which it would
   send a unicast packet to the originator of that multicast packet,
   then it will propagate that multicast packet to the other BGP peers
   which are using it to reach the originator and which have not sent a
   "prune" message for that {originator, group} combination.  In all
   current implementations of the BGP protocol, a border router has an
   implicit confirmation of whether its external peers are using routes
   that it has offered to them through the "echo" inherent in the BGP
   update messages (as strongly encouraged in the BGP-4 Internet Draft [8]).
   This combined with prune messages can efficiently limit propagation
   of multicast packets to only those branches that want them.
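
   The two decisions a border router makes can be sketched in Python as
   follows.  This is a simplified model under stated assumptions: routes
   are keyed by prefix strings, the set of peers that reach a prefix
   through this router is assumed to have been learned from the BGP
   update "echo", and the class and method names are hypothetical
   rather than part of BGP or RFC 1075.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BorderRouter:
    # Unicast routes: source prefix -> BGP peer used as next hop toward it.
    routes: dict[str, str]
    # Prefix -> peers assumed (from the BGP "echo") to reach it through us.
    downstream: dict[str, set[str]]
    # Peer -> {(source, group)} pairs for which that peer sent a prune.
    prunes: dict[str, set[tuple[str, str]]] = field(default_factory=dict)

    def best_prefix(self, source: str) -> Optional[str]:
        """Longest-prefix match of source against the unicast routes."""
        addr = ipaddress.ip_address(source)
        matches = [p for p in self.routes if addr in ipaddress.ip_network(p)]
        return max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen, default=None)

    def forward(self, source: str, group: str, arrived_from: str) -> list[str]:
        """Peers to which a multicast packet from source to group is sent."""
        prefix = self.best_prefix(source)
        # (1) RPF check: accept only packets arriving from the peer we would
        # use for unicast traffic back to the source; discard the rest.
        if prefix is None or self.routes[prefix] != arrived_from:
            return []
        # (2) Forward to peers that reach the source via us and have not pruned.
        return sorted(
            peer
            for peer in self.downstream.get(prefix, set())
            if (source, group) not in self.prunes.get(peer, set())
        )
```

   A packet that fails the RPF check yields an empty list, which models
   the router discarding it.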

   However, without some extra features it is impossible for border
   routers to exchange prune and graft information across an autonomous
   system.  A border router can use information obtained through
   examining LOC_PREF attributes and/or other means to detect if it is
   its own AS's exit point for sending unicast packets to a particular
   multicast source.  If it is not, then the border router would never
   propagate multicast packets from that source into its AS or across
   its AS to others.  However, if it is the AS's unicast exit point for
   a particular source, then without any way to gather further
   information it will have to forward multicast packets across its AS
   to all other AS border gateways, since (in reverse path forwarding)
   it has no way of knowing if there is a group member beyond one of the
   other border gateways or not.
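
   A simplified Python sketch of this decision, assuming each border
   router knows the LOC_PREF value its AS assigned to each border
   router's route toward the source's prefix.  Real BGP route selection
   has further tie-breaking steps that are not modeled here, and the
   function names are hypothetical.

```python
def exit_point(loc_pref: dict[str, int]) -> str:
    """Pick the AS's unicast exit point toward a source prefix.

    loc_pref maps each border router that offers a route to the prefix to
    the LOC_PREF value the AS assigned that route.  The highest preference
    wins; ties go to the lowest router name (a simplifying assumption).
    """
    return min(loc_pref, key=lambda router: (-loc_pref[router], router))


def forward_across_as(me: str, loc_pref: dict[str, int], border_routers: list[str]) -> list[str]:
    """Border routers to which `me` forwards multicast packets from the source."""
    if exit_point(loc_pref) != me:
        # Not the exit point: never propagate packets from this source into the AS.
        return []
    # Without further information, every other border router must get a copy.
    return [r for r in border_routers if r != me]
```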

   There are two reasonable solutions to this problem.  The
   first, which is recommended here, is to define new IGMP prune and
   graft messages.  Prunes and grafts were originally meant to be
   messages from a router to one of its immediate neighbors, telling the
   neighbor whether it has recipients downstream from it with respect to
   a particular multicast {source, group} combination.  To solve the
   above problem we can create new IGMP prune and graft messages which
   would be advisory -- these messages would be sent, for example, from
   one AS border router to another, telling it that the originator has
   no recipients downstream from it with respect to a particular
   multicast source that the AS is reaching through the recipient border
   router.
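
   A hypothetical Python sketch of the state an exit-point router might
   keep for the advisory prunes and grafts recommended above.  This
   document does not define message formats, so the method names simply
   stand in for receiving an advisory prune or graft from another border
   router of the same AS.

```python
from collections import defaultdict


class ExitRouter:
    """Hypothetical state at an AS's unicast exit point for some sources.

    Other border routers of the same AS send advisory prunes and grafts
    for a {source, group} pair; the exit point then forwards across the AS
    only to border routers that still have downstream recipients.
    """

    def __init__(self, name: str, border_routers: list[str]):
        self.name = name
        self.border_routers = border_routers
        self.pruned_by: dict[tuple[str, str], set[str]] = defaultdict(set)

    def advisory_prune(self, sender: str, source: str, group: str) -> None:
        self.pruned_by[(source, group)].add(sender)

    def advisory_graft(self, sender: str, source: str, group: str) -> None:
        self.pruned_by[(source, group)].discard(sender)

    def targets(self, source: str, group: str) -> list[str]:
        """Border routers that receive packets from source to group."""
        pruned = self.pruned_by.get((source, group), set())
        return [r for r in self.border_routers if r != self.name and r not in pruned]
```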

   Another possibility would be to require the establishment of
   multicast "tunnels", as used by mrouted [9], between multicast-
   capable border routers.  The tunnels would be used for sending
   encapsulated IGMP prunes and grafts between the border routers,
   bypassing the AS's internal routing.  If tunnels are used, it would
   be best to have the multicast data packets carried in the tunnels as
   well -- one copy of a packet would be multicast into the AS if there
   were group members in the AS, and other copies would be encapsulated
   and sent directly to the other border routers that had not sent a
   prune for that {source, group} combination.  The tunnels could be
   automatically created when the BGP connection is created.
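
   The tunnel alternative can be sketched in Python under the following
   assumptions: one tunnel per internal BGP connection, created and
   removed with that connection, and encapsulated prunes recorded per
   {source, group}.  The class and method names are hypothetical, and
   the encapsulation itself is not modeled.

```python
class TunnelMesh:
    """Hypothetical tunnel bookkeeping at one multicast-capable border router."""

    def __init__(self) -> None:
        self.tunnels: set[str] = set()
        self.pruned: dict[tuple[str, str], set[str]] = {}

    def bgp_session_up(self, peer: str) -> None:
        self.tunnels.add(peer)  # tunnel created along with the BGP connection

    def bgp_session_down(self, peer: str) -> None:
        self.tunnels.discard(peer)

    def tunnel_prune(self, peer: str, source: str, group: str) -> None:
        """Record an encapsulated prune received from peer over its tunnel."""
        self.pruned.setdefault((source, group), set()).add(peer)

    def copies(self, source: str, group: str, members_in_as: bool) -> list[tuple[str, str]]:
        """One native copy into the AS if it has members, plus one encapsulated
        copy for each tunnel peer that has not pruned {source, group}."""
        out = [("native", "local AS")] if members_in_as else []
        skip = self.pruned.get((source, group), set())
        out += [("encapsulated", peer) for peer in sorted(self.tunnels - skip)]
        return out
```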

   Neither of these solutions would require changes to BGP, but both
   would couple multicast routing to knowledge of BGP routing
   information in the border routers.  While the proposed solutions are
   similar to each other, the first has the advantage of not
   requiring the establishment of multicast "tunnels", thus simplifying
   the operation of the protocol.

4 Potential Problems with Reverse Path Forwarding


4.1 Asymmetric routes


   As long as the path by which one node reaches another is the exact
   reverse of how the other node reaches the first (symmetric routes),
   unicast and RPF-based multicast packets will flow along the same
   paths.  However, the Internet supports, and frequently has,
   asymmetric routes between ASs.  Network administrators currently set
   policies for how they want their networks to reach others, but, since
   in reverse path forwarding multicast packets flow according to how a
   node is reached, not according to how it reaches others, if routes
   are not symmetrical the behavior of the multicast packets will be
   controlled in the opposite way of what the network managers intended
   when they set up the controls for unicast traffic.
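
   A small Python sketch of the asymmetry, using a hypothetical four-AS
   topology described by per-AS next-hop tables: X's policy sends
   unicast traffic for Y through transit AS P, while Y reaches X
   through Q.  Under RPF, multicast packets from X therefore reach Y
   through Q, the path Y chose, not the one X chose.

```python
def unicast_path(next_hop: dict[str, dict[str, str]], src: str, dst: str) -> list[str]:
    """Follow per-AS next-hop tables (next_hop[node][dst]) from src to dst."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop[path[-1]][dst])
        if len(path) > len(next_hop):
            raise ValueError("routing loop")
    return path


def rpf_multicast_path(next_hop: dict[str, dict[str, str]], src: str, dst: str) -> list[str]:
    """Under RPF, packets from src reach dst along dst's unicast path to src, reversed."""
    return unicast_path(next_hop, dst, src)[::-1]


# Hypothetical topology: X prefers transit P to reach Y, but Y prefers Q to reach X.
NEXT_HOP = {
    "X": {"Y": "P"},
    "Y": {"X": "Q"},
    "P": {"X": "X", "Y": "Y"},
    "Q": {"X": "X", "Y": "Y"},
}
```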

   Discussions in IETF meetings suggest that while most network managers
   would not mind if multicast packets flowed from their ASs along the
   paths which others use to send unicast packets to them, there are
   some who would like to retain more control of how multicast packets
   flow through the Internet.  There are ways to add source-based
   control, but they all add significant overhead either to protocol
   traffic or to network administration.  The cost to everyone seems to
   outweigh the benefit gained by a few.  We can probably set up a
   mechanism similar to that in the "unified" routing scheme [3], where
   most traffic is handled by simple, low-overhead routing, and more
   complex routing is used only in the small number of cases that
   require it.

4.2 Incremental Implementation


   One consideration is how easy it will be to get from the current
   Internet to one that mostly supports multicasting (getting to an
   Internet which fully supports multicasting is not a reasonable goal).
   Since in reverse path forwarding multicast routing depends directly
   on unicast routing, incremental implementation in the Internet might
   be awkward.  The network itself offers no way to detect which
   routers support multicast routing, and thus no way to know whether
   multicast packets can get between any two points.  Tunnels can easily
   be set up, as described in RFC 1075 [9], to connect islands of
   multicast-supporting routers, but again, with RPF there is no way of
   knowing when these (relatively inefficient) tunnels should be in
   place and when they are no longer necessary without frequent dialog
   between network operators.

   Once again this is not an extreme difficulty, and network
   administrators are careful enough that they will probably be aware of
   their tunnel topology and their neighbors' activities, and able to
   control them effectively.

   Another proposal, which has been called "multicast fireworks" because
   of the way multicast packets would "explode", essentially says that
   one should not require multicast forwarding to ever be completely
   deployed Internet-wide, that the Internet will be in a hybrid state,
   with some tunnels connecting multicast-capable ASs, for a very long
   time, perhaps forever.

4.3 Scaling


   Many people have valid concerns about the capability of any multicast
   routing algorithm to scale to support 10^9 autonomous systems.  In
   the case of reverse path forwarding, some people wonder about the
   overhead involved in not propagating multicast group member locations
   in the first place, and instead discovering them by sending data
   packets everywhere and using prune responses to clean up the
   forwarding tree after the fact.  Under traditional RPF, at least one
   packet for every multicast group with global scope is periodically
   sent to every part of the world, regardless of whether there are
   group members there or not.  Since a node does not need to be a
   member of a group in order to send messages to that group, the
   alternative of propagating membership information (instead of using
   the "probe" data packets) would require propagating membership
   information for each group everywhere, to any node that might want
   to send to that
   group.  Propagating knowledge of group membership would require at
   least one packet for each member-containing network to be sent to
   every leaf of the Internet, each time that member-containing network
   transitioned between having zero and at least one member.  On the
   other hand using data packets and prune messages would require one
   packet to be sent to every constituent of the Internet for the entire
   group, as opposed to sending one for each member-containing network.
   More data packets would be sent periodically, but the frequency would
   depend on the times specified in the prune messages. It is expected
   that these times will be long and that routers will use graft
   messages as necessary.  Since a graft message will only be sent if
   data packets for a particular group are desired, graft messages add
   only incrementally to the traffic the data itself generates and are
   not significant as overhead.  Thus, independent of the topology of
   the Internet, it is always cheaper to use the prune/graft approach
   than it is to propagate membership information.  Finally, prune
   messages need not apply to just the particular group and source for
   the packet that triggers them.  It can be shown that if the source
   and group fields in the prune message are prefix-based, and prunes
   are sent that cover all unwanted groups and sources, essentially in
   anticipation of future data "probe" packets, very few of these probe
   packets will ever be sent.
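
   A minimal Python sketch of prefix-based prune matching, assuming a
   prune carries exactly one source prefix and one group prefix.  The
   prune representation and the names used are hypothetical; the point
   is that a single entry suppresses every future probe packet it
   covers.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixPrune:
    """A prune whose source and group fields are prefixes, not single addresses."""
    sources: ipaddress.IPv4Network
    groups: ipaddress.IPv4Network

    @classmethod
    def parse(cls, sources: str, groups: str) -> "PrefixPrune":
        return cls(ipaddress.IPv4Network(sources), ipaddress.IPv4Network(groups))

    def covers(self, source: str, group: str) -> bool:
        return (ipaddress.IPv4Address(source) in self.sources
                and ipaddress.IPv4Address(group) in self.groups)


def suppressed(prunes: list[PrefixPrune], source: str, group: str) -> bool:
    """True if any prefix prune covers this {source, group} pair."""
    return any(p.covers(source, group) for p in prunes)
```

   A prune for sources 0.0.0.0/0 and groups 224.0.0.0/4 would cover
   every {source, group} pair, which is the extreme case of pruning in
   anticipation of future probes.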

   Since reverse path forwarding works with whatever address prefixes
   are in the route information base at any BGP node, and keeps only
   cached information about active multicast sources and groups, the
   amount of stored information required will continue to scale well as
   the Internet grows.

   There is a draft document based on Tony Ballardie's ongoing thesis
   work, in which he and others propose "core-based trees": the members
   of a group form a tree built around a well-known set of "core" nodes,
   and senders of packets to that group need know nothing about the
   membership; they simply send their packets toward the core.  When
   the packets reach the tree formed by the members, they begin
   following all branches of the tree from that point.
   This scheme seems to have great potential, in that it doesn't flood
   the Internet with either membership notifications or "probe" data
   packets, and thus it should scale well.  Policies can be applied to
   some degree and traffic will flow from a source toward the tree
   basically according to the source's preferences.  However, there is a
   chance that it might have significant overhead in maintaining trees,
   since participants must be sure that a particular "core" node is
   functioning, and adapt rapidly if it is not.  The detailed mechanisms
   of actually making the scheme work robustly are still being explored
   and are a subject of future research.
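
   A toy Python model of the core-based tree idea, assuming a single
   core router K, unicast next hops toward it, and a tree formed as the
   union of the members' paths to the core.  This is a hypothetical
   illustration only; the proposal's join procedure, its use of a set
   of cores, and core-failure handling are not modeled.

```python
from collections import defaultdict


def path_to_core(next_hop: dict[str, str], start: str, core: str) -> list[str]:
    """Routers visited from start to the core, following unicast next hops."""
    path = [start]
    while path[-1] != core:
        path.append(next_hop[path[-1]])
    return path


def build_tree(next_hop: dict[str, str], core: str, members: set[str]) -> set[str]:
    """Each member joins toward the core; the tree is the union of those paths."""
    on_tree = {core}
    for member in members:
        on_tree.update(path_to_core(next_hop, member, core))
    return on_tree


def send(next_hop: dict[str, str], core: str, on_tree: set[str], sender: str) -> tuple[list[str], str]:
    """The sender knows nothing about membership: its packet travels toward
    the core until it reaches the first on-tree router."""
    path = path_to_core(next_hop, sender, core)
    for i, router in enumerate(path):
        if router in on_tree:
            return path[: i + 1], router
    return path, core  # not reached: the core itself is always on the tree


def spread(next_hop: dict[str, str], on_tree: set[str], entry: str) -> set[str]:
    """From the entry router, the packet follows every branch of the tree."""
    adjacent: dict[str, set[str]] = defaultdict(set)
    for router in on_tree:
        if router in next_hop:  # every on-tree router except the core
            adjacent[router].add(next_hop[router])
            adjacent[next_hop[router]].add(router)
    reached, stack = {entry}, [entry]
    while stack:
        for neighbor in adjacent[stack.pop()] - reached:
            reached.add(neighbor)
            stack.append(neighbor)
    return reached


# Hypothetical topology: next hops toward core K; M1 and M2 are members.
NEXT_HOP_TO_CORE = {"M1": "R1", "R1": "K", "M2": "R2", "R2": "K", "S": "R3", "R3": "R1"}
ON_TREE = build_tree(NEXT_HOP_TO_CORE, "K", {"M1", "M2"})
```

   In this example a packet from S travels S, R3, R1, meets the tree at
   R1, and from there reaches both members without S or R3 ever
   learning about the group's membership.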

5 Acknowledgments


   The development of some of the ideas presented in this document was
   supported by the Defense Advanced Research Projects Agency through
   grant NAG 2-593 from the NASA Ames Research Center.  This work would
   not have been possible without the help of the IETF BGP Working
   Group, John Moy, and Steve Deering.

References


   [1] S. Deering, "Host extensions for IP multicasting", RFC 1112,
   Network Information Center, Aug. 1989.

   [2] S. Deering, "Multicast Routing in a Datagram Internetwork", PhD
   thesis, Electrical Engineering Dept., Stanford University, Dec.
   1991.

   [3] D. Estrin, Y. Rekhter, and S. Hotz, "A Unified Approach to
   Inter-Domain Routing", RFC 1322, Network Information Center, May
   1992.

   [4] D. Katz, "A Proposed Standard for the Transmission of IP Datagrams
   over FDDI Networks", RFC 1188, Network Information Center, Oct. 1990.

   [5] K. Lougheed and Y. Rekhter, "A Border Gateway Protocol 3 (BGP-3)",
   RFC 1267, Network Information Center, Oct. 1991.

   [6] D. Piscitello and J. Lawrence, "A Specification of the Transmission
   of IP Datagrams Over SMDS", RFC 1209, Network Information Center,
   Mar. 1991.

   [7] Y. Rekhter and P. Gross, "Applications of the Border Gateway
   Protocol in the Internet", RFC 1268, Network Information Center, Oct.
   1991.

   [8] Y. Rekhter and T. Li, "A Border Gateway Protocol 4 (BGP-4)",
   Internet Draft, Network Information Center, June 1992.

   [9] D. Waitzman, C. Partridge, and S. Deering, "Distance vector
   multicast routing protocol", RFC 1075, Network Information Center,
   Nov. 1988.

Security Considerations


   Security issues are not discussed in this memo.

Authors' Addresses


   Scott W. Brim
   Cornell Information Technologies
   143 Caldwell Hall
   Cornell University
   Ithaca, NY 14853
   USA

   Phone: +1-607-255-5510
   EMail: Scott_Brim@cornell.edu


   Yakov Rekhter
   T.J. Watson Research Center, IBM Corporation
   P.O. Box 218
   Yorktown Heights, NY 10598

   Phone: +1-914-945-3896
   EMail: yakov@watson.ibm.com
