Mbone Deployment Working Group          Hugh LaMaster
INTERNET-DRAFT                          Steve Shultz
Category: Informational                 NASA ARC/NREN
draft-ietf-mboned-mix-00.txt            John Meylor
Operations and Management Area          David Meyer
Internet Engineering Task Force         Cisco Systems
                                        12 November 1998
                                        Expires May 1999

Multicast-Friendly Internet Exchange (MIX)
<draft-ietf-mboned-mix-00.txt>

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

This document describes an architecture for a Multicast-friendly Internet eXchange (MIX), and the actual implementation at the NASA Ames Research Center Federal Internet eXchange (FIX-West, or FIX). The MIX has three objectives: native IP multicast routing, scalable interdomain policy-based route exchange, and to allow a variety of

LaMaster, et al. [Page 1]
<draft-ietf-mboned-mix-01.txt> November 1998

IGP protocols and topologies for intra-domain use. In support of these objectives, the MIX architecture defines the following components: a peer-peer routing protocol, a method for multicast forwarding, a method for exchanging information about active sources, and a medium which provides native multicast. This document describes the protocols and configurations necessary to provide a current, working multicast-friendly internet exchange, or MIX.

This memo is a product of the MBONE Deployment Working Group (MBONED) in the Operations and Management Area of the Internet Engineering Task Force. Submit comments to <mboned@ns.uoregon.edu> or the authors.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Acknowledgments

Thanks to the NASA HPCC program for supporting the NREN staff portion of this project; thanks to William P. Jones of the NASA ARC Gateway Facility for making the gateway facility available for housing this project.

1. Introduction

The MIX objective was to use current technology to implement a scalable, high-performance, efficient, native IP multicast architecture. Past experience at ARC, on NASA WANs, and at FIX-West had shown that mrouted/DVMRP "Mbone" tunnels were an inefficient way of routing multicast through an exchange point. Specifically, at FIX-West, the large number of tunnels often resulted in unicast traffic loads on
the FIX FDDI that were 10 times the underlying multicast load. In addition, some WANs had multiple tunnels criss-crossing the same physical links, resulting in wasted WAN bandwidth. The separate workstation and router infrastructure for the "Mbone" tunnels also created numerous problems. Maintenance of Unix system and tunnel configurations was often ad hoc, because some of the network operators lacked the necessary expertise, and the hardware and software configuration and performance of the tunnel infrastructure was often out of step with the underlying router-based unicast structure. In addition, use of a single, shared, distance-vector IGP in the inter-domain space led to instability.

The goal was therefore to implement a new multicast internet exchange from the ground up, using current technology to significantly improve performance, efficiency, and reliability.

Four elements were identified as necessary for the MIX architecture to meet its objectives: a peer-peer routing protocol, a method for multicast forwarding, a method for exchanging information about distant sources and groups, and a non-switched broadcast medium.

NASA Ames Research Center hosts the Federal Internet eXchange (FIX-West, or "the FIX") as well as the Ames Internet eXchange (AIX), which is connected at high speed to the MAE-West and shares the same address space as the MAE-West. These facilities are co-located at the Ames Telecommunications Gateway Facility, which was felt to be an excellent location to test the viability of the native multicast technologies. Choices were made for each element, and the MIX was implemented adjacent to the existing NASA ARC FIX gateway facility, giving easy access from the existing FIX routers.

At the time of writing, there are eight direct participants in the MIX, peering and exchanging routes and multicast traffic natively, and the performance and reliability have already far exceeded the tunneled infrastructure the MIX replaced.

2. Requirements and Technology

In order to meet the objectives for this multicast exchange, all
peering partners had to agree mutually to standardize on the following four elements. These are:

- the protocol to be used for multicast route exchange
- the method for performing multicast forwarding
- the method for identifying active sources
- the physical medium for the multicast exchange

The elements chosen to implement the MIX were BGP4+ (also known as "MBGP") for routing and route exchange [BGP4+], PIM-DM and PIM-SM for multicast forwarding on the exchange, dense-mode flooding and the MSDP protocol for information on sources and groups, and FDDI for the multicast medium.

2.1 Routing

Two of the objectives of the MIX were to provide an EGP for scalable interdomain policy-based route exchange, and to allow a variety of IGP protocols and topologies for intra-domain use.

As with unicast interdomain routing, BGP could be used as the EGP to exchange routes for multicast. However, the unicast and multicast routing paths and policies would then have to be completely congruent. In practice, this is sometimes not the case. It is possible, however, to take advantage of the extensions in BGP4+ to deal with these policy and path incongruencies.

BGP4+ [BGP4+] defines extensions to (unicast) BGP that allow the existing BGP machinery, with its scalability, policy control, and route stability features, to be applied consistently to both unicast and multicast routes. BGP4+ allows routes to be marked "unicast forwarding", "multicast forwarding", or "both unicast and multicast forwarding". In this way, BGP4+ supports different multicast and unicast forwarding paths and policies. This removes the dependency on unicast-only routing.

The ability of BGP4+ to support separate paths and policies for multicast is important for meeting the objectives of the exchange in various ways. It allows for a participant's multicast routing policy to be independent of its established unicast routing policy.
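As an illustration of the route marking described above, the following is a minimal Python sketch, not taken from any router implementation, of how a BGP4+ speaker might keep separate unicast and multicast views of the same prefix. The Subsequent Address Family Identifier values 1, 2, and 3 are those assigned in RFC 2283 for unicast forwarding, multicast forwarding, and both. The Route class, the build_ribs function, and all addresses are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

# SAFI values assigned in RFC 2283 [BGP4+].
SAFI_UNICAST = 1
SAFI_MULTICAST = 2
SAFI_BOTH = 3


@dataclass(frozen=True)
class Route:
    """Hypothetical, simplified BGP4+ route: prefix, next hop, and SAFI."""
    prefix: str
    next_hop: str
    safi: int


def build_ribs(routes):
    """Split received routes into a unicast table and a multicast table.

    A route marked SAFI_BOTH is installed in both tables; any other route
    is installed only in the table it is marked for.
    """
    unicast, multicast = {}, {}
    for r in routes:
        if r.safi in (SAFI_UNICAST, SAFI_BOTH):
            unicast[r.prefix] = r.next_hop
        if r.safi in (SAFI_MULTICAST, SAFI_BOTH):
            multicast[r.prefix] = r.next_hop
    return unicast, multicast


# The same prefix is reached through different next hops for unicast and
# multicast (documentation addresses, not real MIX addresses).
received = [
    Route("192.0.2.0/24", "203.0.113.1", SAFI_UNICAST),
    Route("192.0.2.0/24", "203.0.113.129", SAFI_MULTICAST),
    Route("198.51.100.0/24", "203.0.113.2", SAFI_BOTH),
]
unicast_rib, multicast_rib = build_ribs(received)
print(unicast_rib["192.0.2.0/24"], multicast_rib["192.0.2.0/24"])
```

Keeping two tables keyed by the same prefix is one simple way to model incongruent paths; a participant whose unicast and multicast policies coincide could mark every route with SAFI 3 and the two tables would be identical.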
This independence lets the exchange support providers migrating to BGP4+ as an IDMR (inter-domain multicast routing) protocol, because routes previously exchanged via DVMRP can still be exchanged even though those routes would not meet the existing unicast routing policy. It allows for
different policy in the interim. For example, routes may be exchanged for BGP4+ multicast forwarding even though they would not be permitted under existing unicast routing policy. BGP4+ also provides for the possibility that even after full migration is complete, a separate multicast routing policy can be applied.

The exchange architecture imposes no requirements on the IGP or the multicast forwarding protocol or topology used internal to an AS.

2.2 Multicast Forwarding

The first requirement for the multicast forwarding protocol is that it be able to use routes exchanged via BGP4+. For this reason, PIM was selected.

For the MIX, PIM-Dense-Mode (PIM-DM) was selected initially as the mutually agreed upon multicast forwarding process. By flooding data using PIM-DM, it was possible to provide information about active sources to PIM-SM RP's co-located on the MIX. Migration to PIM-Sparse-Mode (PIM-SM) with MSDP is underway.

The use of PIM on a shared LAN has certain consequences. It is necessary for all MIX participants to agree on certain configuration conventions affecting PIM forwarding on multi-access LANs. In particular, it is necessary to establish a standard protocol "metric preference" (also known as "distance" or process "precedence") to be used by all peers, because the PIM Assert process [PIM-SM] uses the metric preference as the mechanism by which the multicast forwarder is chosen. If any party does not follow the convention, there may be black holes, in which a route appears to be valid but traffic does not flow, or multicast loops, which can have deleterious consequences.

For the MIX, a standard set of metric preferences is applied to the BGP4+ routes as the convention for the PIM forwarding mechanism.

2.3 Active Sources

There are two current methods for distributing information about active sources to participating AS's.
The AS's may be dense-mode regions, or they may contain PIM-SM RP's. One method is to use dense-mode to flood data packets to dense-mode regions and to sparse-mode RP's co-located on the exchange. The second method is to
use a protocol that allows each AS to share information about the sources contained within it.

For the MIX, it was decided to use dense-mode, with all participating sparse-mode peers co-locating their RP's on the router directly connected to the MIX.

Dense-mode, including PIM-DM and (mrouted-based) DVMRP, uses data flooding to propagate information about active source-group or <S,G> pairs throughout the global multicast routing world. Unwanted sources are pruned back, and are periodically re-flooded in order to fully refresh forwarding state in mrouters. This is a simple and very reliable method of propagating information on source-group pairs, but the effectiveness of dense-mode depends upon reliable pruning, and flooding traffic to propagate <S,G> information over WANs does not scale well.

Recently, a new protocol, MSDP [MSDP], has been proposed that, when combined with PIM-SM, will allow independent AS's to share information about distant sources and groups without flooding. Instead of flooding all data, only <S,G> information is flooded, and then only to systems, such as PIM-SM RP's, which require the information. MSDP allows each AS to choose its own mode, sparse or dense, and also to run its own sparse-mode region independent of all other sparse-mode regions.

MSDP has now been deployed on many of the MIX routers, and some MIX-connected AS's are now running sparse-mode internally. This deployment is ongoing, and is not yet complete.

2.4 Medium

The objective for the MIX medium was to provide support for native multicast among multiple peering partners. There exist a number of unresolved issues regarding use of layer-2 switched media at interexchange points and, until these issues are resolved, running native multicast on such media is problematic. Fortunately, BGP4+ permits unicast and multicast to be carried on different media, permitting a multicast medium to be used independently of the unicast medium.
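To make the case for a native multicast medium concrete, the following sketch compares how many copies of one multicast packet a sending router places on a shared medium when delivery is through unicast tunnels versus native multicast. It assumes, purely for illustration, that the sender has exactly one tunnel to each receiving router on the medium. The function names are hypothetical and the numbers are not measurements from FIX-West.

```python
def copies_with_tunnels(num_receivers: int) -> int:
    """Copies placed on the shared medium when every receiving router is
    reached through its own unicast tunnel (assumed: one tunnel each)."""
    return num_receivers


def copies_native(num_receivers: int) -> int:
    """Copies placed on a broadcast medium with native multicast: the
    sender transmits once and the medium delivers to every neighbor."""
    return 1 if num_receivers > 0 else 0


def load_ratio(num_receivers: int) -> float:
    """Ratio of tunneled load to native load for the same packet stream."""
    native = copies_native(num_receivers)
    if native == 0:
        raise ValueError("at least one receiver is required")
    return copies_with_tunnels(num_receivers) / native


for n in (1, 4, 7, 10):
    print(f"{n} receivers: tunnels={copies_with_tunnels(n)} "
          f"native={copies_native(n)} ratio={load_ratio(n):.0f}x")
```

Under this assumption the tunneled load grows linearly with the number of receiving routers while the native load stays constant, which is consistent in kind with the tenfold overhead reported in Section 1, although the actual tunnel layout at FIX-West is not described in this document.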
An FDDI concentrator was selected to provide the native multicast exchange medium. It was router-efficient, because it permitted the medium to do the multicast packet replication, with a single copy
from a router being replicated to all neighbors. Using a simple broadcast medium eliminates the complexity of using a switch for multicast. FDDI was also considered operationally convenient by most of the participants.

Unicast traffic continues to be routed over the existing unicast exchange media.

3. The NASA Ames Research Center Multicast-Friendly Internet Exchange

The Ames Multicast-friendly Internet eXchange, or MIX, began with the first beta-test trials in March 1998, and became operational, exchanging BGP4+ routes externally and using BGP4+ between multiple AS's, in May 1998. NREN implemented BGP4+ and internal BGP4+ and began trial external peerings in the same time frame, evolving from the first trials to full deployment by October.

As of October 1998, there were eight AS's peering using BGP4+ and actively exchanging multicast on the MIX FDDI. One of the AS's, AS10888, represents a multi-router virtual BGP4+ backbone, and a router within AS10888 has been located on the MIX by NREN as a gateway router.

The physical and logical topologies are as follows:

                AS10888---R----"MBone"
                          |
   MIX                    |
   multicast_exchange
                  ----------------
                   /            \
                  /              \
    bgp4+_peer---R                R---bgp4+_peer
                  \              /
                   \            /
                  ---------------
   FIX
   unicast_exchanges

AS10888 acts as a transit AS to connect other multicast-friendly exchanges to the NASA ARC MIX. It also acts as a gateway between the DVMRP-based "Mbone" and the BGP4+ area.

4. Topology, Architecture, and Special Considerations
BGP4+

-PIM Asserts and Metric preference

The PIM Assert mechanism requires that all routing protocols "compete" to see which router is allowed to forward onto the shared medium. To first order, the protocol metric preference is used to determine the forwarder. All MIX peers must coordinate routing protocol parameters so that one router does not inadvertently win PIM asserts over a neighbor which has a functional path. This requires that BGP4+ routes have preference over routes from other protocols, such as BGP, OSPF, and DVMRP. In particular, it was necessary to standardize protocol metric preferences, and give BGP4+ routes the lowest, preferred, dynamic routing protocol metric preferences. For this reason, the standard set of BGP4+ metric preferences was chosen to be less than any other dynamic unicast routing protocol metric preferences. Any MIX routers which are using DVMRP must use a DVMRP metric preference higher than the BGP4+ metric preferences, rather than the value of 0 that many people have previously used as the DVMRP metric preference.

-Default

One transitional requirement is the necessity to have routes to "Mbone" sources, that is, sources within the global DVMRP routing region. Currently, the mechanism used is to have a single router in AS10888 on the MIX originate MBGP default to all external peers.

DVMRP routing

-DVMRP route redistribution

At present, all BGP4+ routes tagged with a particular community are redistributed at the MIX into DVMRP within AS10888. This is to provide DVMRP region users access to sources originating within AS's that are being routed via BGP4+ exclusively. Unless a particular community string is set, it is assumed that redistribution is not desired. In the reverse direction, instead of sending DVMRP routes into BGP4+, BGP4+ default is originated from the intermediary router. In addition, local, stub-region DVMRP routes are redistributed into BGP4+ internally by several of the peers.
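The community-based redistribution rule above can be sketched as a simple filter. This is an illustrative model only: the community value 10888:100, the Bgp4PlusRoute class, and the function name are hypothetical, since this document does not state which community is used at the MIX.

```python
from dataclasses import dataclass, field

# Hypothetical community meaning "redistribute into DVMRP"; the actual
# value used at the MIX is not given in this document.
REDISTRIBUTE_TO_DVMRP = "10888:100"


@dataclass
class Bgp4PlusRoute:
    """Simplified BGP4+ multicast route with its attached communities."""
    prefix: str
    communities: set = field(default_factory=set)


def routes_for_dvmrp(routes, community=REDISTRIBUTE_TO_DVMRP):
    """Return the prefixes to redistribute into DVMRP.

    Routes that do not carry the community are left out, matching the
    rule that redistribution is assumed to be unwanted unless the
    community is set.
    """
    return [r.prefix for r in routes if community in r.communities]


learned = [
    Bgp4PlusRoute("192.0.2.0/24", {"10888:100"}),   # tagged: redistribute
    Bgp4PlusRoute("198.51.100.0/24", {"64500:1"}),  # other community only
    Bgp4PlusRoute("203.0.113.0/24"),                # no communities at all
]
print(routes_for_dvmrp(learned))
```

Making the untagged case the default keeps an unconfigured peer from leaking its routes into the DVMRP region by accident.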
As long as the regions remain stub regions, there is no danger, but the possibility of a backdoor into the Mbone presents an ever-present
threat of loops unless care is taken to redistribute only the routes which are known to be owned within the AS.

5. Conclusions and Recommendations

-Provide support for native multicast
-Use BGP4+ as a method of exchanging routes for inter-domain multicast
-Use PIM-DM, or PIM-SM with MSDP
-Concurrent use of BGP4+ and DVMRP for inter-domain routing is not recommended. It is strongly recommended to use BGP4+ for inter-domain route exchange.

6. Security Considerations

There are no security considerations unique to the multicast exchange.

7. References

[DVMRP] T. Pusateri, "Distance Vector Multicast Routing Protocol", <draft-ietf-idmr-dvmrp-v3-07.txt>, August 1998.

[BGP4+] T. Bates, R. Chandra, D. Katz, Y. Rekhter, "Multiprotocol Extensions for BGP-4", RFC 2283, February 1998.

[BGP4+2] T. Bates, R. Chandra, D. Katz, Y. Rekhter, "Multiprotocol Extensions for BGP-4", Internet Draft, <draft-ietf-idr-bgp4-multiprotocol-v2-01.txt>, August 1998.

[PIM-SM] D. Estrin, D. Farinacci, A. Helmy, D. Thaler, S. Deering, M. Handley, V. Jacobson, C. Liu, P. Sharma, L. Wei, "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification", RFC 2362, June 1998.

[PIM-DM] S. Deering, D. Estrin, D. Farinacci, V. Jacobson, A. Helmy, D. Meyer, L. Wei, "Protocol Independent Multicast Version 2 Dense Mode Specification", Internet Draft, <draft-ietf-pim-v2-dm-01.txt>, November 1998.

[MSDP] D. Farinacci, Y. Rekhter, P. Lothberg, H. Kilmer, J. Hall,
"Multicast Source Discovery Protocol (MSDP)", <draft-farinacci-msdp-00.txt>, June 1998.

Authors' Addresses

Hugh LaMaster
Steve Shultz
NASA Ames Research Center
Mail Stop 233-21
Moffett Field, CA 94035-1000
email: hlamaster@arc.nasa.gov
       shultz@arc.nasa.gov

David Meyer
John Meylor
Cisco Systems
San Jose, CA
email: dmm@cisco.com
       jmeylor@cisco.com

8. Full Copyright Statement

Copyright (C) The Internet Society (1998). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Table of Contents

1 Introduction
2 Requirements and Technology
3 The NASA Ames MIX
4 Topology, Architecture, and Special Considerations
5 Conclusions and Recommendations
6 Security Considerations
7 References
8 Full Copyright Statement