Network Working Group                          Rahul Aggarwal (Editor)
Internet Draft                                 Juniper Networks
Expiration Date: April 2005                    Thomas Morin
                                               France Telecom
                                               Luyuan Fang
                                               AT&T

                  Multicast in BGP/MPLS VPNs and VPLS

              draft-raggarwa-l3vpn-mvpn-vpls-mcast-01.txt

Status of this Memo

   By submitting this Internet-Draft, we certify that any applicable
   patent or IPR claims of which we are aware have been disclosed, and
   any of which we become aware will be disclosed, in accordance with
   RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract

   This document describes a solution framework for overcoming the
   limitations of existing Multicast VPN (MVPN) and VPLS multicast
   solutions.  It describes procedures for enhancing the scalability of
   multicast for BGP/MPLS VPNs. It also describes procedures for VPLS
   multicast that utilize multicast trees in the service provider (SP)
   network.  The procedures described here reduce the overhead of PIM
   neighbor relationships that a PE router needs to maintain for
   BGP/MPLS VPNs. They also reduce the state, and the overhead of
   maintaining that state, in the SP network by removing the need to
   maintain at least one dedicated multicast tree per VPN in the SP
   network.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1. Contributors

   Rahul Aggarwal
   Yakov Rekhter
   Juniper Networks
   Thomas Morin
   France Telecom
   Luyuan Fang
   AT&T
   Anil Lohiya
   Tom Pusateri
   Lenny Giuliano
   Chaitanya Kodeboniya
   Juniper Networks

2. Terminology

   This document uses terminology described in [MVPN-PIM], [VPLS-BGP]
   and [VPLS-LDP].

3. Introduction

   [MVPN-PIM] describes the minimal set of procedures that are required
   to build multi-vendor interoperable implementations of multicast for
   BGP/MPLS VPNs. However, the solution described in [MVPN-PIM] has
   undesirable scaling properties. [ROSEN] describes additional
   procedures for multicast in BGP/MPLS VPNs, and these too have
   undesirable scaling properties.

   [VPLS-BGP] and [VPLS-LDP] describe a solution for VPLS multicast that
   relies on ingress replication. This solution has certain limitations
   for some VPLS multicast traffic profiles.

   This document describes a solution framework to overcome the
   limitations of existing MVPN [MVPN-PIM, ROSEN] solutions. It also
   extends VPLS multicast to provide a solution that can utilize
   multicast trees in the SP network.

4. Existing Scalability Issues in BGP/MPLS MVPNs

   The solution described in [MVPN-PIM] and [ROSEN] has three
   fundamental scalability issues.

4.1. PIM Neighbor Adjacencies Overhead

   The solution for unicast in BGP/MPLS VPNs [2547] requires a PE to
   maintain at most one BGP peering with every other PE in the network
   that is participating in BGP/MPLS VPNs. The use of Route Reflectors
   further reduces the number of BGP adjacencies maintained by a PE.

   On the other hand, for multicast in BGP/MPLS VPNs [MVPN-PIM, ROSEN],
   a PE has to maintain, for a particular MVPN, PIM neighbor adjacencies
   with every other PE that has a site in that MVPN. Thus, for a given
   PE-PE pair, multiple PIM adjacencies are required, one per MVPN that
   the PEs have in common. This implies that the number of PIM neighbor
   adjacencies that a PE has to maintain is equal to the product of the
   number of MVPNs the PE belongs to and the average number of sites in
   each of these MVPNs.

   For each such PIM neighbor adjacency the PE has to send and receive
   PIM Hello packets that are transmitted periodically, at a default
   interval of 30 seconds. For example, on a PE router with 1000 VPNs
   and 100 sites per VPN, a scenario that is not uncommon in L3VPN
   deployments today, the PE router would have to maintain 100,000 PIM
   neighbors. With a default Hello interval of 30 seconds, this results
   in an average of 3,333 Hellos per second.
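
   As a non-normative illustration, the following Python fragment
   computes the Hello processing load for the example above. The
   numbers are the ones used in this section and carry no normative
   meaning.

      # Hypothetical sizing example: PIM Hello load on a PE.
      num_mvpns = 1000           # MVPNs configured on the PE
      sites_per_mvpn = 100       # average remote sites (PEs) per MVPN
      hello_interval = 30.0      # default PIM Hello interval, seconds

      pim_neighbors = num_mvpns * sites_per_mvpn          # 100,000
      hellos_per_sec = pim_neighbors / hello_interval     # ~3,333
      print(pim_neighbors, round(hellos_per_sec))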

   It is highly desirable to reduce the overhead due to PIM adjacencies
   that a PE router needs to maintain in support of multicast with
   BGP/MPLS VPNs.

4.2. Periodic PIM Join/Prune Messages

   PIM [PIM-SM] is a soft state protocol. It requires PIM Join/Prune
   messages to be transmitted periodically. A PE has to propagate
   customer Join/Prune messages (C-Join/C-Prune), received from a CE
   attached to a multicast VRF on the PE, to other PEs. Each PE
   participating in MVPNs has to periodically refresh these PIM C-
   Join/C-Prune messages. It is desirable to reduce the overhead of
   these periodic PIM control messages. The overhead of PIM C-Join
   messages increases when PIM Join suppression is disabled, and there
   is a need to disable PIM Join suppression, as described in section
   6.5.2. This further justifies the need to reduce the overhead of
   periodic PIM C-Join messages.

4.3. State in the SP Core

   Unicast in BGP/MPLS VPNs [2547] requires no per-VPN state in the SP
   core. The core maintains state only for PE-to-PE transport tunnels.
   VPN routing information is maintained only by the PEs participating
   in the VPN service.

   On the other hand, [MVPN-PIM] specifies a solution that requires the
   SP core to maintain per-MVPN state. This is because an RP-rooted
   shared tree is set up with PIM-SM, by default, in the SP core for
   each MVPN. Based on configuration, receiver PEs may also switch to a
   source-rooted tree for a particular MVPN, which further increases
   the number of multicast trees in the SP core. [ROSEN] specifies the
   use of PIM-SSM for setting up SP multicast trees. The use of PIM-SSM
   instead of PIM-SM increases the amount of per-MVPN state maintained
   in the SP core. Use of the Data MDT as specified in [ROSEN] further
   increases the overhead resulting from this state.

   It is desirable to remove the need to maintain per-MVPN state in the
   SP core.

5. Existing Limitations of VPLS Multicast

   The VPLS multicast solutions described in [VPLS-BGP] and [VPLS-LDP]
   rely on ingress replication: the ingress PE replicates the multicast
   packet for each egress PE and sends it to that egress PE using a
   unicast tunnel.

   This is a reasonable model when the bandwidth of the multicast
   traffic is low and/or the number of replications performed, on
   average, on each outgoing interface for a particular customer VPLS
   multicast packet is small. If this is not the case, it is desirable
   to utilize multicast trees in the SP core to transmit VPLS multicast
   packets. Note that unicast packets that are flooded to each of the
   egress PEs, before the ingress PE performs learning for those
   unicast packets, will still use ingress replication.

   By appropriate IGMP or PIM snooping it is possible for the ingress
   PE to send a packet only to the egress PEs that have receivers for
   that traffic, rather than to all the PEs in the VPLS instance. While
   PIM/IGMP snooping avoids the situation where an IP multicast packet
   is sent to PEs with no receivers, there is a cost for this
   optimization. Namely, a PE has to maintain (S,G) state for all the
   (S,G)s of all the VPLSs present on the PE. Furthermore, PIM snooping
   has to be done not only on the CE-PE interfaces but on Pseudo-Wire
   (PW) interfaces as well, which in turn introduces a non-negligible
   overhead on the PE. It is desirable to reduce this overhead when
   IGMP/PIM snooping is used.

6. MVPN Solution Framework

   This section describes the framework for the MVPN solution. This
   framework makes it possible to overcome the existing scalability
   limitations described in section 4.

6.1. PIM Neighbor Maintenance using BGP

   This document proposes the use of BGP for discovering and maintaining
   PIM neighbors in a given MVPN. All the PE routers advertise their
   MVPN membership, i.e., the VRFs configured for multicast, to other
   PE routers using BGP. This allows each PE router in the SP network
   to have a complete view of the MVPN membership of other PE routers.
   A PE that belongs to an MVPN considers all the other PEs that
   advertise membership in that MVPN to be PIM neighbors for that MVPN.
   The neighbor is considered up as long as the BGP advertisement is
   not withdrawn. However, the PE does not have to perform PIM neighbor
   adjacency management, as PIM neighbor discovery is performed using
   BGP. This eliminates the PIM Hello processing required for
   maintaining the PIM neighbors.
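
   The following Python sketch is a non-normative illustration of this
   idea, assuming a simple in-memory membership table. Names such as
   mvpn_members are illustrative and not part of this specification.

      # Deriving PIM neighbors from BGP MVPN membership advertisements
      # instead of from PIM Hellos.
      mvpn_members = {}    # MVPN id -> set of PE addresses

      def bgp_advertise(mvpn, pe_addr):
          # A PE advertised its membership in an MVPN via BGP.
          mvpn_members.setdefault(mvpn, set()).add(pe_addr)

      def bgp_withdraw(mvpn, pe_addr):
          # The advertisement was withdrawn; the neighbor goes down.
          mvpn_members.get(mvpn, set()).discard(pe_addr)

      def pim_neighbors(mvpn, local_pe):
          # The PIM neighbors for an MVPN are all other PEs that have
          # advertised membership; no Hello adjacency is required.
          return mvpn_members.get(mvpn, set()) - {local_pe}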

6.2. PIM Refresh Reduction

   As described in section 4.2, PIM is a soft state protocol. To
   eliminate the need to periodically refresh PIM control messages,
   there is a need to build a refresh reduction mechanism into PIM. The
   detailed procedures for this will be specified later.

6.3. Separation of Customer Control Messages and Data Traffic

   BGP/MPLS VPN unicast [2547] maintains a separation between the
   exchange of customer routing information and the transmission of
   customer data, i.e., VPN unicast traffic. VPN routing information is
   exchanged using BGP, while VPN data traffic is encapsulated in PE-
   to-PE tunnels. This makes the exchange of VPN routing information
   agnostic of the unicast tunneling technology and, in turn, provides
   the flexibility to support various tunneling technologies without
   impacting the procedures for exchange of VPN routing information.

   [MVPN-PIM], on the other hand, uses Multicast Domain (MD) tunnels
   for sending both C-Join messages and C-Data traffic. This creates an
   undesirable dependency between the exchange of customer control
   information and the multicast transport technology.

   Procedures described in section 6.1 make the discovery and
   maintenance of PIM neighbors independent of the multicast transport
   technology in the SP network. The other piece is the exchange of
   customer multicast control information. This document proposes that a
   PE use a PE-to-PE tunnel to send the customer multicast control
   information to the upstream PE that is the PIM neighbor. The C-Join
   packets are encapsulated in an MPLS label before being encapsulated
   in the PE-to-PE tunnel. This label specifies the context of the C-
   Join, i.e., the MVPN the C-Join is intended for. Section 9 specifies
   how this label is learned. The destination address of the C-Join is
   still the ALL-PIM-ROUTERS multicast group address. Thus a C-Join
   packet is tunneled to the PE that is the PIM neighbor for that
   packet. A beneficial side effect of this is that C-Join suppression
   is disabled. As described in section 6.5.2, it is desirable to
   disable C-Join suppression.
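
   A minimal, non-normative sketch of this encapsulation order is shown
   below, assuming byte-string packets and an abstract tunnel object
   with a send() method; all names are illustrative.

      import struct

      def mpls_label_entry(label, ttl=255):
          # 20-bit label, 3-bit TC (0), bottom-of-stack bit, 8-bit TTL.
          return struct.pack("!I", (label << 12) | (1 << 8) | ttl)

      def send_c_join(c_join_packet, mvpn_context_label, tunnel):
          # The destination of c_join_packet remains ALL-PIM-ROUTERS;
          # the label scopes it to the intended multicast VRF, and the
          # PE-to-PE tunnel carries it to the upstream PE.
          tunnel.send(mpls_label_entry(mvpn_context_label) +
                      c_join_packet)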

6.4. Transport of Customer Multicast Data Packets

   This document describes two mechanisms to transport customer
   multicast data packets over the SP network. One is ingress
   replication and the other is the use of multicast trees in the SP
   network.

6.4.1. Ingress Replication

   In this mechanism the ingress PE replicates a customer multicast
   data packet of a particular group and sends a copy to each egress PE
   that is on the path to a receiver of that group. The packet is sent
   to an egress PE using a unicast tunnel. This has the advantage of
   operational simplicity, as the SP network doesn't need to run a
   multicast routing protocol. It also has the advantage of minimizing
   state in the SP network. With C-Join suppression disabled, it has
   the further advantage of sending the traffic only to the PEs that
   have receivers for that traffic. This is a reasonable model when the
   bandwidth of the multicast traffic is low and/or the number of
   replications performed by the ingress PE on each outgoing interface
   for a particular customer multicast data packet is small.

6.4.2. Multicast Trees in the SP Network

   This mechanism uses multicast trees in the SP network for
   transporting customer multicast data packets. MD trees described in
   [MVPN-PIM] are an example of such multicast trees. The use of
   multicast trees in the SP network can be beneficial when the
   bandwidth of the multicast traffic is high or when it is desirable
   to optimize the number of copies of a multicast packet transmitted
   by the ingress. This comes at the cost of the operational overhead
   of building multicast trees in the SP core, and of the state they
   require there. This document places no restrictions on the protocols
   used to build SP multicast trees.

6.5. Sharing a Single SP Multicast Tree across Multiple MVPNs

   This document describes procedures for sharing a single SP multicast
   tree across multiple MVPNs.

6.5.1. Aggregate Default Trees

   An Aggregate Default Tree is an SP multicast tree that can be shared
   across multiple MVPNs and is set up by discovering the egress PEs,
   i.e., the leaves of the tree, using BGP.

   PIM neighbor discovery and maintenance using BGP allows a PE or an RP
   to learn the MVPN membership information of other PEs. This in turn
   allows the creation of one or more Aggregate Default Trees where each
   Aggregate tree is mapped to one or more MVPNs. The leaves of the
   Aggregate Default Tree are determined by the PEs that belong to all
   the MVPNs that are mapped onto the Aggregate Default Tree. Aggregate
   Default Trees remove the need to maintain per MVPN state in the SP
   core as a single SP multicast tree can be used across multiple VPNs.
   As a result, they effectively reduce the number of trees in the
   service provider core and the signaling overhead associated with
   maintaining these trees.  The MVPNs mapped onto an Aggregate Default
   Tree may not necessarily have exactly the same sites. Aggregation
   methodology is discussed further in section 10.

   An Aggregate Default Tree can be either a source tree or a shared
   tree. A source tree is used to carry traffic only for the multicast
   VRFs that exist locally on the root of the tree, i.e., for which the
   root has local CEs. A shared tree, on the other hand, can be used to
   carry traffic belonging to VRFs that exist on other PEs as well. For
   example, an RP-based PIM-SM Aggregate tree would be a shared tree.

   Note that, like the default MDTs described in [MVPN-PIM], Aggregate
   MDTs may result in a multicast data packet for a particular group
   being delivered to PE routers that do not have receivers for that
   multicast group. This is discussed further in section 10.

6.5.2. Aggregate Data Trees

   An Aggregate Data Tree is an SP multicast tree that can be shared
   across multiple MVPNs and is set up by discovering the egress PEs,
   i.e., the leaves of the tree, using C-Join messages. The reason for
   having Aggregate Data Trees is to provide a PE the ability to create
   separate SP multicast trees for high bandwidth multicast groups.
   This allows traffic for these multicast groups to reach only those
   PE routers that have receivers in these groups, and avoids flooding
   other PE routers in the MVPN. More than one such multicast group can
   be mapped onto the same SP multicast tree, and the multicast groups
   that are mapped to this SP multicast tree may also belong to
   different MVPNs.

   The setting up of Aggregate Data Trees requires the ingress PE to
   know all the other PEs that have receivers for multicast groups that
   are mapped onto the Aggregate Data Trees. This is learned from the C-
   Joins received by the ingress PE. It requires that C-Join suppression
   be disabled. The procedures used for C-Join propagation as described
   in section 6.3 ensure that Join suppression is not enabled.

   Once Join suppression is disabled, Aggregate Data Tree creation can
   be triggered on criteria other than bandwidth alone. For instance,
   there could be a "pseudo wasted bandwidth" criterion: switching to
   an Aggregate Data Tree would be done if the bandwidth multiplied by
   the number of uninterested PEs (PEs that are receiving the stream
   but have no receivers) is above a specified threshold. The
   motivation is that many sparsely subscribed low-bandwidth groups may
   together waste much bandwidth, while using an Aggregate Data MDT for
   a high bandwidth multicast stream for which all PEs have receivers
   is of no use either.
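
   A non-normative sketch of such a trigger is shown below; the
   function name and the threshold are illustrative only.

      def should_switch_to_data_tree(bandwidth_bps, interested_pes,
                                     default_tree_pes, threshold_bps):
          # "Pseudo wasted bandwidth": bandwidth delivered to PEs on
          # the default tree that have no receivers for the group.
          uninterested = len(default_tree_pes - interested_pes)
          return bandwidth_bps * uninterested > threshold_bps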

   Note that [ROSEN] describes a limited solution for building Data MDTs
   where a Data MDT cannot be shared across different VPNs.

6.5.3. Setting up Aggregate Default Trees and Aggregate Data Trees

   This document does not place any restrictions on the multicast
   technology used to setup Aggregate Default Trees or Aggregate Data
   Trees.

   When PIM is used to set up multicast trees in the SP core, an
   Aggregate Default Tree is termed an "Aggregate MDT" and an Aggregate
   Data Tree is termed an "Aggregate Data MDT". The Aggregate MDT may
   be a shared tree, rooted at the RP, or a shortest path tree. An
   Aggregate Data MDT is rooted at the PE that is connected to the
   multicast traffic source. The root of the Aggregate MDT or the
   Aggregate Data MDT has to advertise the P-Group address it chooses
   for the MDT to the PEs that are the leaves of the MDT. These PEs can
   then join the MDT. The announcement of this address is done as part
   of the discovery procedures described in section 6.5.6.

6.5.4. Demultiplexing Aggregate Default Tree and Aggregate Data Tree
   Multicast Traffic

   Aggregate Default Trees and Aggregate Data Trees require a mechanism
   for the egress PEs to demultiplex the multicast traffic received
   over the tree, because traffic belonging to multiple MVPNs can be
   carried over the same tree. Hence there is a need to identify the
   MVPN a packet belongs to. This is done by using an inner label that
   corresponds to the multicast VRF for which the packet is intended.
   The ingress PE uses this label as the inner label while
   encapsulating a customer multicast data packet. Each of the egress
   PEs must be able to associate this inner label with the same MVPN
   and use it to demultiplex the traffic received over the Aggregate
   Default Tree or the Aggregate Data Tree. If downstream label
   assignment were used, this would require all the egress PEs in the
   MVPN to agree on a common label for the MVPN.

   We propose a solution that uses upstream label assignment by the
   ingress PE.  Hence the inner label is allocated by the ingress PE.
   Each egress PE has a separate label space for every Aggregate Default
   Tree or Aggregate Data Tree for which the egress PE is a leaf node.
   The egress PEs create a forwarding entry for the inner VPN label,
   allocated by the ingress PE, in this label space. Hence, when an
   egress PE receives a packet over an Aggregate Default Tree (or an
   Aggregate Data Tree), the Aggregate Default Tree Identifier (or
   Aggregate Data Tree Identifier) specifies the label space in which
   to perform the inner label lookup. An implementation may create a
   logical interface corresponding to an Aggregate Default Tree (or an
   Aggregate Data
   Tree). In that case the label space used to look up the inner label
   is an interface-based label space, where the interface corresponds
   to the tree.

   When Aggregate MDTs (or Aggregate Data MDTs) are used, the root PE
   source address and the Aggregate MDT (or Aggregate Data MDT) P-group
   address identify the MDT interface. The label space corresponding to
   the MDT interface is the label space in which to perform the inner
   label lookup. A lookup in this label space identifies the multicast
   VRF in which the customer multicast lookup needs to be done.
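
   The two-level lookup can be illustrated with the following
   non-normative Python sketch; the table names are illustrative.

      # Upstream-assigned inner labels are looked up in a label space
      # scoped to the tree on which the packet arrived.
      label_spaces = {}   # tree identifier -> {inner label -> VRF/VSI}

      def install_mapping(tree_id, inner_label, vrf):
          # Programmed from the ingress PE's advertisement (section 8).
          label_spaces.setdefault(tree_id, {})[inner_label] = vrf

      def demultiplex(tree_id, inner_label):
          # tree_id (e.g. <root PE source, P-group> for an MDT) selects
          # the label space; the inner label selects the VRF or VSI.
          return label_spaces[tree_id][inner_label]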

   If the Aggregate Default Tree or Aggregate Data Tree uses MPLS
   encapsulation, the outer MPLS label and the incoming interface
   provide the label space of the label beneath it. This assumes that
   penultimate hop popping is disabled. The outer label and incoming
   interface effectively identify the Aggregate Default Tree or
   Aggregate Data Tree.

   The ingress PE informs the egress PEs about the inner label as part
   of the discovery procedures described in section 6.1.

6.5.5. Encapsulation of the Aggregate Default Tree and Aggregate Data
   Tree

   An Aggregate Default Tree or an Aggregate Data Tree may use an
   IP/GRE encapsulation or an MPLS encapsulation. The protocol type in
   the IP/GRE header in the former case, and the protocol type in the
   data link header in the latter, need further explanation. This will
   be provided in the next version of this document.

6.5.6. Aggregate Default Tree and Aggregate Data Tree Discovery

   Once a PE sets up an Aggregate Default Tree or an Aggregate Data
   Tree, it needs to announce the customer multicast groups being
   mapped to this tree to the other PEs in the network. This procedure
   is referred to as Aggregate Default Tree or Aggregate Data Tree
   discovery. For an Aggregate Default Tree, this discovery implies
   announcing all the MVPNs mapped to the Aggregate Default Tree. The
   inner label allocated by the ingress PE for each MVPN is included,
   as is the Aggregate Default Tree Identifier. For an Aggregate Data
   Tree, this discovery implies announcing all the specific <C-Source,
   C-Group> entries mapped to the tree, along with the Aggregate Data
   Tree Identifier. The inner label allocated for each <C-Source,
   C-Group> is included.

   The egress PE creates a logical interface corresponding to the
   Aggregate Default Tree or the Aggregate Data Tree identifier. This
   interface is the RPF interface for all the <C-Source, C-Group>
   entries mapped to that tree.  An Aggregate Default Tree by definition
   maps to all the <C-Source, C-Group> entries belonging to all the
   MVPNs associated with the Aggregate Default Tree. An Aggregate Data
   Tree maps to the specific <C-Source, C-Group> entries associated
   with it.

   When PIM is used to set up SP multicast trees, the egress PE also
   joins the P-Group Address corresponding to the Aggregate MDT or the
   Aggregate Data MDT. This results in the setup of the PIM SP tree.

6.5.7. Switching to Aggregate Data Trees

   Aggregate Data Trees provide a PE the ability to create separate SP
   multicast trees for certain <C-S, C-G> entries. The source PE that
   originates the Aggregate Data Tree and the egress PEs have to switch
   to using the Aggregate Data Tree for the <C-S, C-G> entries that are
   mapped to it.

   Once a source PE decides to set up an Aggregate Data Tree, it
   announces the mapping of the <C-S, C-G> entries that are mapped to
   the tree to the other PEs. Depending on the SP multicast technology
   used, this announcement may be done before or after setting up the
   Aggregate Data Tree. After the egress PEs receive the announcement,
   they set up their forwarding path to receive traffic on the
   Aggregate Data Tree if they have one or more receivers interested in
   the <C-S, C-G> entries mapped to the tree. This implies setting up
   the demultiplexing forwarding entries based on the inner label, as
   described earlier. It also involves changing the RPF interface for
   the relevant <C-S, C-G> entries to the Aggregate Data Tree
   interface. The egress PEs may perform this switch to the Aggregate
   Data Tree as soon as the advertisement from the ingress PE is
   received, or wait for a preconfigured timer before doing so.

   A source PE may use one of two approaches to decide when to start
   transmitting data on the Aggregate Data Tree. In the first approach,
   once the source PE sets up the Aggregate Data Tree, it starts
   sending multicast packets for <C-S, C-G> entries mapped to the tree
   on both that tree and the Aggregate Default Tree. After a
   preconfigured timer expires, the PE stops sending multicast packets
   for <C-S, C-G> entries mapped to the Aggregate Data Tree on the
   Aggregate Default Tree. In the second approach, a certain
   preconfigured delay after advertising the <C-S, C-G> entries mapped
   to an Aggregate Data Tree, the source PE begins to send traffic on
   the Aggregate Data Tree. At this point it stops sending traffic for
   the <C-S, C-G> entries that are mapped to the Aggregate Data Tree on
   the Aggregate Default Tree.
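
   The first approach can be sketched, non-normatively, as follows; the
   timer handling and the flow object are illustrative.

      import threading

      def switch_to_data_tree(flow, default_tree, data_tree,
                              overlap_seconds):
          # Send on both trees for a preconfigured interval, then stop
          # using the Aggregate Default Tree for this flow.
          flow.trees = [default_tree, data_tree]
          def stop_default():
              flow.trees = [data_tree]
          threading.Timer(overlap_seconds, stop_default).start()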

7. VPLS Multicast

   This document proposes the use of SP multicast trees for VPLS
   multicast. This gives an SP an option when ingress replication, as
   described in [VPLS-BGP] and [VPLS-LDP], is not the best fit for the
   customer multicast traffic profile.

   Aggregate Default Trees and Aggregate Data Trees described in
   section 6 can be used as SP multicast trees for VPLS multicast. No
   restriction is placed on the protocols used for building SP
   Aggregate Default Trees for VPLS. VPLS auto-discovery, as described
   in [VPLS-BGP], or another VPLS auto-discovery mechanism, enables a
   PE to learn the VPLS membership of other PEs. This is used by the
   root of the Aggregate Default Tree to map VPLS instances onto the
   Aggregate Default Tree. The leaves of the Aggregate Data Trees are
   learnt using C-Join/Prune messages that are received from other PEs,
   as described in the next section.

7.1. Propagating VPLS C-Join/Prunes

   PEs participating in VPLS need to learn the <C-S, C-G> information
   for two reasons:
      1. With ingress replication, this allows a PE to send the IP
   multicast packet for a <C-S, C-G> only to the other PEs in the VPLS
   instance that have receivers interested in that particular <C-S,
   C-G>. This eliminates flooding.

      2. It allows the construction of Aggregate Data Trees.

   There are two components for a PE to learn the <C-S, C-G> information
   in a VPLS:

      1. Learning the <C-S, C-G> information from the locally homed
   VSIs.
      2. Learning the <C-S, C-G> information from the remote VSIs.

7.2. IGMP/PIM Snooping

   In order to learn the <C-S, C-G> information from the locally homed
   VSIs, a PE needs to implement IGMP/PIM snooping. This is because,
   unlike MVPN, there is no PIM adjacency between the locally homed CEs
   and the PE. IGMP/PIM snooping has to be used to build the database
   of C-Joins that are being sent by the customer for a particular VSI.
   This also requires a PE to create an IGMP/PIM instance per VSI for
   which IGMP/PIM snooping is used. This instance is analogous to the
   multicast VRF PIM instance that is created for MVPNs.
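
   A non-normative sketch of such a per-VSI database is shown below;
   the structure and names are illustrative.

      # Per-VSI database of snooped <C-S, C-G> state, analogous to the
      # per-MVPN PIM instance.
      snoop_db = {}   # VSI id -> {(c_source, c_group) -> set of ports}

      def snooped_join(vsi, c_source, c_group, port):
          entries = snoop_db.setdefault(vsi, {})
          entries.setdefault((c_source, c_group), set()).add(port)

      def snooped_prune(vsi, c_source, c_group, port):
          ports = snoop_db.get(vsi, {}).get((c_source, c_group))
          if ports:
              ports.discard(port)
              if not ports:
                  del snoop_db[vsi][(c_source, c_group)]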

   It is conceivable that IGMP/PIM snooping can be used to learn <C-S,
   C-G> information from remote VSIs by snooping VPLS traffic received
   over the SP backbone. However, IGMP/PIM snooping is computationally
   expensive. Furthermore, the periodic nature of PIM Join/Prune
   messages implies that snooping PIM messages places an even greater
   processing burden on a PE. Hence, to learn <C-S, C-G> information
   from remote VSIs, this document proposes the use of reliable
   protocol machinery to transport <C-S, C-G> Joins over the SP
   infrastructure. This is described in the next section.

7.3. C-Join/Prune Propagation in the SP

   C-Join/Prune messages for <C-S, C-G>, coming from a customer and
   snooped by a PE, have to be propagated to the remote PE that can
   reach C-S. One way to do this is to forward the C-Join/Prune as a
   multicast data packet and let the egress PEs perform IGMP/PIM
   snooping over the pseudo-wire. However, PIM is a soft state protocol
   and periodically retransmits C-Join/Prune messages, which places a
   large burden on a PE snooping PIM messages. It is not possible to
   eliminate this overhead for messages snooped on the customer facing
   interfaces. However, it is possible to alleviate this overhead on
   the SP facing interfaces. This is done by converting snooped PIM C-
   Join/Prune messages to reliable protocol messages over the SP
   network.

   For each VSI, each PE maintains a database of the IGMP/PIM <C-S,
   C-G> entries that are snooped locally and learnt from remote PEs.

   Compared to MVPNs, there is an additional challenge in propagating
   snooped PIM C-Join/Prune messages over the SP network for VPLS. Even
   if the ingress PE wishes to propagate the C-Join/Prune only to the
   upstream PE that has reachability to C-S, this upstream PE is not
   known, because the local PE doesn't have a route to reach C-S. This
   is unlike MVPNs, where the route to reach C-S is known from the
   unicast VPN routing table. This implies that the C-Join/Prune
   message has to be sent to all the PEs in the VPLS. This document
   proposes two possible solutions for achieving this, one of which
   will eventually be picked.

   1. Using PIM

   This is similar to the propagation of PIM C-Join/Prune messages for
   MVPN described earlier in this document. PIM neighbor discovery and
   maintenance is based on the VPLS membership information learnt as
   part of VPLS auto-discovery. VPLS auto-discovery allows a particular
   PE to learn which of the other PEs belong to a particular VPLS
   instance. Each of these PEs can be treated as a neighbor for PIM
   procedures while sending PIM C-Join/Prune messages to other PEs. The
   neighbor is considered up as long as the VPLS auto-
   discovery mechanism does not withdraw the neighbor membership in the
   VPLS instance. This is analogous to the MVPN membership discovery
   used to maintain PIM neighbors for MVPNs.

   The C-Join/Prune messages are sent to all the PEs in the VPLS, and
   PIM Join suppression is disabled. PIM refresh reduction mechanisms
   are used. To send a C-Join/Prune message to a particular remote PE,
   the message is encapsulated in the PW used to reach that PE, for the
   VPLS that the C-Join/Prune message belongs to.

   2. Using BGP

   The use of PIM for propagating VPLS C-Join/Prune information may
   have scalability limitations. This is because, even with PIM refresh
   reduction mechanisms in place, PIM does not provide optimized
   transport when there is one sender and multiple receivers. BGP
   provides such transport through its route-reflector machinery. One
   option is therefore to propagate the C-Join/Prune information using
   BGP, with the BGP mechanisms described in section 8.

8. BGP Advertisements

   The procedures described in this document use BGP for MVPN
   membership discovery, for Aggregate Default Tree discovery, and for
   Aggregate Data Tree discovery. This section first describes the
   information that needs to be propagated in BGP to achieve the
   functional requirements. It then describes a suggested encoding.

8.1. Information Elements

8.1.1. MVPN Membership Discovery

   MVPN membership discovery requires advertising the following
   information, for a particular MVPN, in BGP:
      1. The address of the PE advertising the MVPN. This address is
   used as the PIM neighbor address by other PEs. Whether this address
   is shared by all MVPNs or is different for each MVPN needs further
   discussion in the WG.
      2. The label allocated by the PE for receiving the control
   traffic, for that MVPN, from remote PEs. This MPLS label is used by
   other PEs to send PIM C-Join/Prune messages to this PE, and it
   identifies the multicast VRF for which the C-Join/Prune is intended.
   When ingress replication is used, this label must also be present
   for sending customer multicast traffic.

   When a PE distributes this information via BGP, it must include a
   Route Target Extended Communities attribute. This RT must be an
   "Import RT" [2547] of each VRF in the MVPN. The BGP distribution
   procedures used by [2547] will then ensure that the advertised
   information gets associated with the right VRFs.

8.1.2. Aggregate Default Tree Discovery

   The root of an Aggregate Default Tree maps one or more MVPNs or VPLS
   instances to the Aggregate Default Tree and announces this mapping
   in BGP. Along with the MVPNs or VPLS instances that are mapped to
   the Aggregate Default Tree, the Aggregate Default Tree Identifier is
   also advertised in BGP.

   The following information is required in BGP to advertise the MVPN
   or VPLS instance that is mapped to the Aggregate Default Tree:
      1. The address of the router that is the root of the Aggregate
   Default Tree.
      2. The inner label allocated by the Aggregate Default Tree root
   for the MVPN or the VPLS instance. The usage of this label is
   described in section 6.5.4.

   When a PE distributes this information via BGP, it must include the
   following:
      1. An identifier of the Aggregate Default Tree.
      2. A Route Target Extended Communities attribute. This is used as
   described in the previous sub-section.

8.1.3. Aggregate Data Tree Discovery

   The root of an Aggregate Data Tree maps one or more <C-Source, C-
   Group> entries to the tree. These entries are advertised in BGP
   along with the Aggregate Data Tree Identifier to which these entries
   are mapped.

   The following information is required in BGP to advertise the <C-
   Source, C-Group> entries that are mapped to the Aggregate Data Tree:
      1. The RD corresponding to the multicast enabled VRF or the VPLS
   instance.  This is required to uniquely identify the <C-Source, C-
   Group> as the addresses could overlap between different MVPNs or VPLS
   instances.
      2. The inner label allocated by the Aggregate Data Tree root for
   the <C-Source, C-Group>. The usage of this label is described in
   section 6.5.4.
      3. The C-Source address. This address can be a prefix in order to
   allow a range of C-Source addresses to be mapped to the Aggregate
   Data Tree.
      4. The C-Group address. This address can be a prefix in order to
   allow a range of C-Group addresses to be mapped to the Aggregate
   Data Tree.

   When a PE distributes this information via BGP, it must include the
   following:
      1. An identifier of the Aggregate Data Tree.
      2. A Route Target Extended Communities attribute. This is used as
   described in section 8.1.1.

8.1.4. Using BGP for Propagating VPLS C-Joins/Prunes

   Section 7.3 describes PIM and BGP as possible options for propagating
   VPLS C-Join/Prune information. This section describes the information
   elements needed if BGP were to be used to propagate the VPLS C-
   Join/Prune information in the SP network.

   The following information is required in BGP to advertise the VPLS
   <C-Source, C-Group>:
      1. The RD corresponding to the VPLS instance. This is required to
   uniquely identify the <C-Source, C-Group> as the addresses could
   overlap between different VPLS instances.
      2. The C-Source address. This can be a prefix.
      3. The C-Group address. This can be a prefix.

   When a PE distributes this information via BGP, it must include a
   Route Target Extended Communities attribute. This is used as
   described in section 8.1.1.

8.1.5. Aggregate Default Tree/Data Tree Identifier

   Aggregate Default Tree and Aggregate Data Tree advertisements carry
   the Tree Identifier. The following information elements are needed
   in this identifier:
       1. Whether this is a shared Aggregate Default Tree or not.
       2. The type of the tree. For example, the tree may use PIM-SM or
   PIM-SSM.
       3. The identifier of the tree. For trees set up using PIM, the
   identifier is an (S, G) value.

8.2. Suggested Encoding

   This section describes a suggested BGP encoding for carrying the
   information elements described above. This encoding needs further
   discussion.

   A new Subsequent Address Family Identifier (SAFI), called the MVPN
   SAFI, is proposed. Following is the format of the NLRI associated
   with this SAFI:

             +---------------------------------+
             |   Length (2 octets)             |
             +---------------------------------+
             |   MPLS Labels (variable)        |
             +---------------------------------+
             |    RD   (8 octets)              |
             +---------------------------------+
             |    Multicast Source Length      |
             +---------------------------------+
             |Multicast Source  (Variable)     |
             +---------------------------------+
             |Multicast Group   (Variable)     |
             +---------------------------------+

   For MVPN membership discovery, the information elements are encoded
   in the NLRI. The RD and Multicast Group are set to 0.
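
   The following non-normative Python sketch builds this NLRI under
   several assumptions the draft leaves open: a single MPLS label
   encoded in 3 octets, IPv4 addresses, a 1-octet source length in
   bits, and a 2-octet total length in octets.

      import socket, struct

      def mvpn_nlri(label, rd, source_ip, group_ip):
          # 3-octet label stack entry: label(20) | TC(3) | S(1).
          lbl = struct.pack("!I", (label << 4) | 1)[1:]
          src = socket.inet_aton(source_ip)
          grp = socket.inet_aton(group_ip)   # all zeros when unused
          body = lbl + rd + bytes([len(src) * 8]) + src + grp
          return struct.pack("!H", len(body)) + body

      # Membership discovery: RD and group zero, source = PE address.
      nlri = mvpn_nlri(100001, b"\x00" * 8, "192.0.2.1", "0.0.0.0")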

   For Aggregate Default Tree discovery, the information elements for
   the VPN/VPLS instances that are mapped to the Aggregate Default Tree
   are encoded as an NLRI. The RD and Multicast Group are set to 0.
   This advertisement also carries a new attribute to identify the
   Aggregate Default Tree. The root must ensure that the label
   advertised is different for MVPN membership discovery and Aggregate
   Default Tree discovery, to guarantee the uniqueness of the NLRIs.

   The address of the PE is required in the above NLRIs to maintain
   uniqueness of the NLRI. Since this address is carried in the NLRI the
   BGP next-hop address in the NEXT_HOP attribute or the
   MP_REACH_ATTRIBUTE must be set to zero by the sender and ignored by
   the receiver.

   For Aggregate Data Tree discovery, the information elements for the
   <C-S, C-G> entries that are mapped to the tree are encoded in an
   NLRI and are set using the information elements described in section
   8.1.3. The address of the Aggregate Data Tree root router is carried
   in the BGP next-hop address of the MP_REACH_ATTRIBUTE.

   For VPLS C-Join/Prune propagation, the information elements are
   encoded in an NLRI. The address of the router originating the C-
   Join/Prunes is carried in the BGP next-hop address of the
   MP_REACH_ATTRIBUTE.

   A new optional transitive attribute called the
   Multicast_Tree_Attribute is defined to signal the Aggregate Default
   Tree or the Aggregate Data Tree.  Following is the format of this
   attribute:

             +---------------------------------+
             |S|   Reserved   |   Tree Type    |
             +---------------------------------+
             |   Tree Identifier               |
             |          .                      |
             |          .                      |
             +---------------------------------+

   The S bit is set if the tree is a shared Aggregate Default Tree. The
   Tree Type identifies the SP multicast technology used to establish
   the tree and determines the semantics of the Tree Identifier.
   Currently two Tree Types are defined:
     1. PIM-SSM MDT
     2. PIM-SM MDT

   When the type is set to PIM-SM MDT or PIM-SSM MDT, the tree
   identifier contains a PIM <P-Source, P-Multicast Group> address.

   Hence the MP_REACH attribute identifies the set of VPN customers'
   multicast trees, the Multicast_Tree_Attribute identifies a
   particular SP tree (i.e., an Aggregate Default Tree or an Aggregate
   Data Tree), and the advertisement of both in a single BGP Update
   creates a binding between the SP tree and the set of VPN customers'
   trees.
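
   A non-normative sketch of building the attribute value follows,
   assuming a 1-bit S flag with 7 reserved bits, a 1-octet Tree Type,
   and IPv4 <P-Source, P-Group> addresses as the Tree Identifier; the
   type code points are illustrative.

      import socket

      PIM_SSM_MDT, PIM_SM_MDT = 1, 2   # illustrative code points

      def tree_attribute(shared, tree_type, p_source, p_group):
          flags = 0x80 if shared else 0x00   # S bit: shared tree
          return (bytes([flags, tree_type]) +
                  socket.inet_aton(p_source) +
                  socket.inet_aton(p_group))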

9. MVPN Neighbor Discovery and Maintenance

   The BGP information described in section 8 is used for MVPN neighbor
   discovery and maintenance. Each PE advertises its multicast VPN
   membership information using BGP. When a PE distributes this
   information via BGP, it must include a Route Target Extended
   Communities attribute. This RT must be an "Import RT" [2547] of each
   VRF in the MVPN. The BGP distribution procedures used by [2547] will
   then ensure that each PE learns of the other PEs in the MVPN, and
   that this information gets associated with the right VRFs. This
   allows the MVPN PIM instance on a PE to discover all the PIM
   neighbors in that MVPN.

   The advertisement of the BGP information described above by a PE
   implies that the PIM module on that PE that deals with the MVPN
   corresponding to the BGP information is fully functional. When such
   a module becomes non-functional (for whatever reason), the PE MUST
   withdraw the advertisement.

   The neighbor discovery described here is applicable only to BGP/MPLS
   VPNs, and is not applicable to VPLS. This is because VPLS already has
   a membership discovery mechanism.

9.1. PIM Hello Options

   PIM Hellos allow PIM neighbors to exchange various optional
   capabilities.  The use of BGP for discovering and maintaining PIM
   neighbors may imply that some of these optional capabilities need to
   be supported in the BGP based discovery procedures. Exchanging these
   capabilities via BGP will be described if and when the need to
   support them arises.

10. Aggregation Methodology

   In general, the heuristics used to decide which MVPNs/VPLS instances
   or <C-S, C-G> entries to aggregate are implementation dependent. It
   is also conceivable that offline tools can be used for this purpose.
   This section discusses some tradeoffs with respect to aggregation.

   The "congruency" of aggregation is defined by the amount of overlap
   in the leaves of the client trees that are aggregated on a SP tree.
   For Aggregate Default Trees the congruency depends on the overlap in
   the membership of the MVPNs/VPLSs that are aggregated on the
   Aggregate Default Tree. If there is complete overlap aggregation is
   perfectly congruent. As the overlap between the MVPNs/VPLSs that are
   aggregated reduces, the congruency reduces.
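
   One possible (non-normative) measure of congruency is the fraction
   of the SP tree's leaves that every aggregated client tree actually
   needs, as sketched below.

      def congruency(client_leaf_sets):
          # The SP tree must reach the union of the client leaves.
          sp_leaves = set().union(*client_leaf_sets)
          common = set.intersection(*client_leaf_sets)
          return len(common) / len(sp_leaves)  # 1.0 = fully congruent

      # Two MVPNs with identical membership aggregate perfectly:
      assert congruency([{"PE1", "PE2"}, {"PE1", "PE2"}]) == 1.0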

   If aggregation is done such that it is not perfectly congruent, a PE
   may receive traffic for MVPNs or VPLSs to which it doesn't belong.
   As the amount of multicast traffic in these unwanted MVPNs or VPLSs
   increases, aggregation becomes less optimal with respect to
   delivered traffic. Hence there is a tradeoff between reducing state
   and delivering unwanted traffic.

   An implementation should provide knobs to control the congruency of
   aggregation. This will allow a service provider to deploy aggregation
   depending on the MVPN/VPLS membership and traffic profiles in its
   network. If different PEs or RPs are setting up Aggregate Default
   Trees, this will also allow a service provider to engineer the
   maximum number of unwanted MVPNs/VPLSs for which a particular PE may
   receive traffic.

   The state/bandwidth optimality trade-off can be further improved by
   having a versatile many-to-many association between client trees and
   provider trees. Thus an MVPN/VPLS can be mapped to multiple Aggregate
   Trees. The mechanisms for achieving this are for further study. Also
   it may be possible to use both ingress replication and an Aggregate
   Tree for a particular MVPN/VPLS. Mechanisms for achieving this are
   also for further study.

11. Aggregate MDT

   This section describes how PIM can be used for establishing SP
   Aggregate Trees. Such trees are termed Aggregate MDTs.

   An Aggregate MDT can be established using PIM-SSM or PIM-SM. The
   applicability of PIM-Bidir to Aggregate MDTs is for further study.

   The root of the Aggregate MDT advertises the MVPNs or VPLS instances
   mapped to the Aggregate MDT, along with the Aggregate MDT
   Identifier, to the egress PEs that belong to these instances. The MD
   group address associated with the Aggregate MDT is assigned by the
   router that creates the Aggregate MDT. This address, along with the
   source address of the router, forms the Aggregate MDT Identifier. If
   the MDT is shared, it must be advertised as such in the Aggregate
   MDT Identifier advertised in BGP. The MVPNs or VPLS instances are
   advertised using the BGP procedures described in section 8. The BGP
   advertisement also encodes the upstream label assigned by the
   Aggregate MDT root for each MVPN or VPLS instance.

   This information allows the egress PE to associate an Aggregate MDT
   with one or more MVPNs or VPLS instances. The Aggregate MDT
   Identifier identifies the label space in which to look up the inner
   label. The inner label identifies the VRF or VSI in which to do the
   multicast lookup after a packet is received from the Aggregate MDT.
   The Aggregate MDT interface is used for the multicast RPF check for
   the customer packet. On receipt of this information, each egress PE
   joins the Aggregate MDT using PIM-SM or PIM-SSM. This results in the
   creation of an Aggregate MDT that can be shared by multiple MVPNs or
   VPLS instances.

11.1. PIM-SSM Source Aggregate MDTs

   When PIM-SSM is used to establish a source-based Aggregate Default
   MDT, the root of the Aggregate Default Tree sets the tree type in
   the Tree Identifier advertised in BGP to PIM-SSM. On receipt of the
   Aggregate MDT to MVPN mapping information, each egress PE joins the
   Aggregate MDT using PIM-SSM towards the root of the Aggregate MDT.
   The root of the Aggregate MDT is advertised in the Aggregate Default
   Tree Identifier. This results in the setup of the Aggregate MDT in
   the SP network. The Aggregate MDT is used to carry traffic belonging
   only to multicast VRFs or VSIs that are locally homed on the root of
   the Aggregate MDT.

11.2. PIM-SM Shared Aggregate MDTs

   PIM-SM shared MDTs are set up by the RP. The RP participates in the
   BGP procedures for MVPN discovery. It MUST learn the MVPN membership
   information of other PEs. It MUST also announce any MVPNs for which
   it has locally configured VRFs. It can thus set up Aggregate MDTs.
   While setting up an Aggregate MDT, it sets the tree type in the
   Aggregate Tree Identifier to PIM-SM. The Tree Identifier is also
   announced as shared. On receipt of the Aggregate MDT to MVPN mapping
   information, each egress PE joins the Aggregate MDT using PIM-SM
   towards the RP. An egress PE MUST NOT switch to the source tree once
   it starts to receive traffic on the shared Aggregate MDT.

   A PE that receives a customer packet from a VRF or VSI, for which
   there exists a shared Aggregate MDT, encapsulates the packet in an
   outer IP header. The destination address is set to the corresponding
   Aggregate MDT address and the source address is set to the P-source
   address of the PE. The packet is then encapsulated in a PIM Register
   message and sent to the RP. The RP decapsulates the packet and then
   forwards it along the Aggregate MDT. The RP MUST NOT switch to the
   source tree on receiving PIM Register messages.

   If there are multiple RPs, the Aggregate MD addresses between the RPs
   will have to be partitioned by the SP using configuration. To achieve
   RP load balancing, a particular RP SHOULD be configured with one set
   of MVPNs or VSIs and another RP SHOULD be configured with a disjoint
   set of MVPNs or VSIs.

11.3. PIM-SSM Shared Aggregate MDTs

   It is possible to set up shared Aggregate MDTs using PIM-SSM. The
   motivation is to leverage some of the simplicity of PIM-SSM and at
   the same time retain the ability to set up shared Aggregate MDTs,
   which PIM-SM allows.

   The root of the Aggregate Default Tree sets the tree type in the
   Aggregate Default Tree Identifier to PIM-SSM. The Tree Identifier is
   announced as shared. On receipt of the Aggregate MDT to MVPN mapping
   information, each egress PE joins the Aggregate MDT using PIM-SSM
   towards the root of the Aggregate MDT. The root of the Aggregate MDT
   is advertised in the Aggregate Default Tree Identifier. This results
   in the setup of the Aggregate MDT in the SP network.

   A PE that receives a customer packet from a VRF or VSI, for which
   there exists a shared Aggregate MDT set up by PIM-SSM, encapsulates
   the packet in a tunnel and sends it to the PIM-SSM root. The tunnel
   can be an MPLS LSP or an IP/GRE tunnel. The root then forwards the
   packet along the Aggregate MDT.

   There can be multiple routers in a network that set up PIM-SSM
   shared Aggregate MDTs. To achieve load balancing between them, a
   particular root SHOULD be configured with one set of MVPNs or VSIs
   and another SHOULD be configured with a disjoint set of MVPNs or
   VSIs.

12. Aggregate Data MDT

   An Aggregate Data MDT is created by an ingress PE using PIM-SSM. It
   is created for one or more customer multicast groups that the PE
   wishes to move to a dedicated SP tree. These groups may belong to
   different MVPNs or VPLS instances. It may be desirable that the set
   of PEs that have receivers belonging to these groups be exactly the
   same; however, the procedures for setting up Aggregate Data MDTs do
   not require this. The mapping of an Aggregate Data MDT Identifier to
   <C-Source, C-Group> entries requires a source PE to know the PE
   routers that have receivers in these groups. For MVPN this is
   learned from the C-Join information. For VPLS this is described in
   section 7.

   The MD group address associated with the Aggregate Data MDT is
   assigned by the router that creates the Aggregate Data MDT. This
   address, along with the source address of the router, forms the
   Aggregate Data MDT Identifier. The Aggregate Data MDT Identifier is
   advertised in BGP. This identifier MUST NOT be advertised as shared.
   The mapping of the Aggregate Data MDT Identifier to the <C-Source,
   C-Group> entries is advertised by the ingress PE to the egress PEs
   using the procedures described in section 8. A single BGP Update may
   carry multiple <C-Source, C-Group> addresses as long as they all
   belong to the same VPN. The BGP advertisement also encodes the
   upstream label assigned by the Aggregate Data MDT root for each
   <C-S, C-G> entry.

   This information allows the egress PE to associate an Aggregate Data
   MDT with one or more <C-Source, C-Group> entries. On receipt of this
   information, each egress PE can join the Aggregate Data MDT. This
   results in the setup of the Aggregate Data MDT in the SP network.
   The inner label is used to identify the VRF or VSI in which to do
   the multicast lookup after a packet is received from the Aggregate
   Data MDT. It is also needed for the multicast RPF check for MVPNs.

13. Data Forwarding

   The following diagram shows the progression of a packet as it enters
   and leaves the SP network when an Aggregate MDT or an Aggregate Data
   MDT is being used for multiple MVPNs or multiple VPLS instances.
   MPLS-in-GRE [MPLS-IP] encapsulation is used to encapsulate the
   customer multicast packets.

      Packets received        Packets in transit      Packets forwarded
      at ingress PE           in the service          by egress PEs
                              provider network

                              +---------------+
                              |  P-IP Header  |
                              +---------------+
                              |      GRE      |
                              +---------------+
                              | VPN Label     |
      ++=============++       ++=============++       ++=============++
      || C-IP Header ||       || C-IP Header ||       || C-IP Header ||
      ++=============++ >>>>> ++=============++ >>>>> ++=============++
      || C-Payload   ||       || C-Payload   ||       || C-Payload   ||
      ++=============++       ++=============++       ++=============++

   The P-IP header contains the Aggregate MDT (or Aggregate Data MDT)
   P-group address as the destination address and the root PE address
   as the source address. The receiver PE does a lookup on the P-IP
   header and determines the MPLS forwarding table in which to look up
   the inner MPLS label. This table is specific to the Aggregate MDT
   (or Aggregate Data MDT) label space. The inner label is unique
   within the context of the root of the MDT (as it is assigned by the
   root of the MDT, without any coordination with any other nodes);
   thus it is not unique across multiple roots. So, to unambiguously
   identify a particular VPN, one has to know the label and the context
   within which that label is unique. The context is provided by the
   P-IP header.

   The P-IP header and the GRE header are stripped. The lookup of the
   resulting VPN MPLS label determines the VRF or the VSI in which the
   receiver PE needs to do the C-multicast data packet lookup. It then
   strips the inner MPLS label and sends the packet to the VRF/VSI for
   multicast data forwarding.
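
   The decapsulation path can be sketched, non-normatively, as follows.
   The parsing assumes an IPv4 P-IP header without options and a
   4-octet base GRE header; label_spaces and the VRF object's
   multicast_lookup() method are illustrative.

      import struct

      def egress_forward(packet, label_spaces):
          p_src = packet[12:16]   # root PE address (P-IP source)
          p_grp = packet[16:20]   # MDT P-group (P-IP destination)
          mpls = packet[24:28]    # after 20B IP and 4B GRE headers
          inner_label = struct.unpack("!I", mpls)[0] >> 12
          # <P-Source, P-Group> selects the per-MDT label space; the
          # upstream-assigned inner label selects the VRF or VSI.
          vrf = label_spaces[(p_src, p_grp)][inner_label]
          vrf.multicast_lookup(packet[28:])   # the C-IP packet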

14. Security Considerations

   Security considerations discussed in [2547], [MVPN-PIM], [VPLS-BGP]
   and [VPLS-LDP] apply to this document.

15. Acknowledgments

   We would like to thank Yuji Kamite and Eric Rosen for their comments.

16. Normative References

   [PIM-SM]  "Protocol Independent Multicast - Sparse Mode (PIM-SM)",
   Fenner, Handley, Holbrook, Kouvelas, October 2003, draft-ietf-pim-
   sm-v2-new-08.txt

   [2547] "BGP/MPLS VPNs", Rosen, Rekhter, et. al., September 2003,
   draft-ietf-l3vpn-rfc2547bis-01.txt

   [RFC2119] S. Bradner, "Key words for use in RFCs to Indicate
   Requirement Levels", RFC 2119, March 1997.

   [RFC3107] Y. Rekhter, E. Rosen, "Carrying Label Information in
   BGP-4", RFC3107.

   [VPLS-BGP] K. Kompella, Y. Rekhter, "Virtual Private LAN Service",
   draft-ietf-l2vpn-vpls-bgp-02.txt

   [VPLS-LDP] M. Lasserre, V. Kompella, "Virtual Private LAN Services
   over MPLS", draft-ietf-l2vpn-vpls-ldp-03.txt

   [MPLS-IP] T. Worster, Y. Rekhter, E. Rosen, "Encapsulating MPLS in IP
   or Generic Routing Encapsulation (GRE)", draft-ietf-mpls-in-ip-or-
   gre-08.txt

17. Informative References

   [MVPN-PIM] R. Aggarwal, A. Lohiya, T. Pusateri, Y. Rekhter, "Base
   Specification for Multicast in MPLS/BGP VPNs", draft-raggarwa-
   l3vpn-2547-mvpn-00.txt

   [ROSEN] E. Rosen, Y. Cai, I. Wijnands, "Multicast in MPLS/BGP IP
   VPNs", draft-rosen-vpn-mcast-07.txt

18. Author Information

18.1. Editor Information

   Rahul Aggarwal
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: rahul@juniper.net

18.2. Contributor Information

   Yakov Rekhter
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: yakov@juniper.net

   Thomas Morin
   France Telecom R & D
   2, avenue Pierre-Marzin
   22307 Lannion Cedex
   France
   Email: thomas.morin@francetelecom.com

   Luyuan Fang
   AT&T
   200 Laurel Avenue, Room C2-3B35
   Middletown, NJ 07748
   Phone: 732-420-1921
   Email: luyuanfang@att.com

   Anil Lohiya
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: alohiya@juniper.net

   Tom Pusateri
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: pusateri@juniper.net

   Lenny Giuliano
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: lenny@juniper.net

   Chaitanya Kodeboniya
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: ck@juniper.net


19. Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at ietf-
   ipr@ietf.org.

20. Full Copyright Statement

   Copyright (C) The Internet Society (2004). This document is subject
   to the rights, licenses and restrictions contained in BCP 78 and
   except as set forth therein, the authors retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

21. Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.