NVO3 working group A. Ghanwani
Internet Draft Dell
Intended status: Standards Track L. Dunbar
Expires: June 2015 Huawei
V. Bannai
Paypal
R. Krishnan
Brocade
December 8, 2014
Framework for Supporting Application-Specific Multicast in NVO3
draft-ghanwani-nvo3-app-mcast-framework-01
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on June 8, 2015.
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Abstract
This document discusses a framework for supporting application-
specific multicast traffic, i.e., multicast/broadcast traffic other
than that related to ARP/ND, in networks that use Network
Virtualization using Overlays over Layer 3 (NVO3). It describes
various mechanisms that can be used for delivering such traffic in
networks that use NVO3 and the considerations for each.
Table of Contents
1. Introduction
2. Conventions used in this document
3. Multicast mechanisms in networks that use NVO3
   3.1. No multicast support
   3.2. Replication at the source NVE
   3.3. Replication at a multicast service node (MSN)
      3.3.1. Egress NVEs Maintaining (S-TID, G) state
      3.3.2. Multicast-Agnostic NVEs
   3.4. IP multicast in the underlay
   3.5. Multicast group membership management
      3.5.1. When a Layer 2 network is attached to an NVE
      3.5.2. When a Layer 3 network is attached to an NVE
   3.6. Other schemes
4. Simultaneous use of more than one mechanism
5. Summary
6. Security Considerations
7. IANA Considerations
8. References
   8.1. Normative References
   8.2. Informative References
9. Acknowledgments
1. Introduction
Network virtualization using Overlays over Layer 3 (NVO3) is a
technology that is used to address issues that arise in building
large, multitenant data centers that make extensive use of server
virtualization [PS].
This document discusses a framework for supporting application-
specific multicast traffic, i.e., multicast/broadcast traffic other
than that related to ARP/ND, in networks that use Network
Virtualization using Overlays over Layer 3 (NVO3). It describes
various mechanisms that can be used for delivering such traffic in
networks that use NVO3 and the considerations for each.
Application-specific multicast traffic, whether Source-Specific
Multicast (SSM) or Any-Source Multicast (ASM), has the following
characteristics:

1. Receiver hosts in multicast sites join multicast groups the way
   they do today, using IGMP (IPv4) or MLD (IPv6). The multicast
   sources and listeners are in the Tenant System address domain.

2. The list of multicast listeners for each multicast group is not
   known in advance. Therefore, the NVA cannot obtain the list of
   participants for each multicast group ahead of time.

3. The underlay network in an NVO3 environment may not support IP
   multicast protocols such as PIM.
The reader is assumed to be familiar with the terminology as defined
in the NVO3 Framework document [FW] and NVO3 Architecture document
[NVO3-ARCH].
2. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
In this document, these words will appear with that interpretation
only when in ALL CAPS. Lowercase uses of these words are not to be
interpreted as carrying RFC 2119 significance.
ASM: Any-Source Multicast. A multicast model that places no
     restriction on the sources of a group: any host may send to a
     group, and any receiving host may itself become a transmission
     source, regardless of the location of the end-user computers or
     VMs.

Application-specific multicast: Source-Specific Multicast (SSM),
     Any-Source Multicast (ASM), or other multicast traffic that is
     not derived from the ARP/ND protocols.
MSN: Multicast Service Node
SSM: Source-Specific Multicast. A method of delivering multicast
     packets in which the only packets delivered to a receiver are
     those originating from a source address explicitly requested by
     the receiver. By so limiting the sources, SSM reduces demands
     on the network and improves security. SSM requires the receiver
     to specify the source address and explicitly excludes the use
     of (*, G) joins for all multicast groups [RFC3376]; this is
     possible only with IGMPv3 for IPv4 and MLDv2 for IPv6.
S-TID: The source address of a multicast group in the Tenant System
     address domain.
3. Multicast mechanisms in networks that use NVO3
In NVO3 environments, traffic between NVEs is transported using a
tunnel encapsulation such as VXLAN [VXLAN], NVGRE [NVGRE], STT
[STT], etc.
Besides the need to support the Address Resolution Protocol (ARP)
and Neighbor Discovery (ND), there are several applications that
require the support of multicast and/or broadcast in data centers
[DC-MC]. With NVO3, there are many possible ways that multicast may
be handled in such networks. We discuss some of the attributes of
the following four methods, but other methods are also possible.
1. No multicast support.
2. Replication at the source NVE.
3. Replication at a multicast service node.
4. IP multicast in the underlay.
These mechanisms are briefly mentioned in the NVO3 Framework [FW]
and NVO3 Architecture [NVO3-ARCH] documents. This document provides
more detail on the operation of each of these mechanisms and
discusses the issues and tradeoffs of each.
3.1. No multicast support
In this scenario, there is no support whatsoever for multicast
traffic when using the overlay. This can only work if the following
conditions are met:
1. All of the traffic is unicast. In other words, there is no
   application-specific multicast traffic in the network, and the
   only multicast/broadcast traffic is that generated by ARP/ND and
   the flooding of frames with an unknown destination MAC address.
2. A Network Virtualization Authority (NVA) is used by the NVEs to
   determine the mapping of a target's MAC/IP address to its egress
   NVE. In other words, there is no data plane learning, and ARP/ND
   address resolution requests issued by the VMs must be resolved by
   the NVE to which they are attached.
With this approach, certain multicast/broadcast applications such as
DHCP can be supported by use of a helper function in the NVE.
The main issues that need to be addressed with this mechanism are
the handling of hosts for which no mapping yet exists in the NVA and
of hosts that participate in application-specific multicast. These
issues can be particularly challenging if such end systems are
reachable through more than one NVE.
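As a concrete illustration, the following is a minimal sketch of
NVA-based address resolution at an NVE. The NVA interface, record
layout, and function names are assumptions made for illustration
only; they are not defined by this document.

   # Hypothetical sketch of NVA-based ARP/ND resolution at an NVE.
   class NVA:
       """Stores (VN, tenant IP) -> (tenant MAC, egress NVE IP)."""
       def __init__(self):
           self._map = {}

       def register(self, vn_id, tenant_ip, tenant_mac, egress_nve_ip):
           self._map[(vn_id, tenant_ip)] = (tenant_mac, egress_nve_ip)

       def lookup(self, vn_id, tenant_ip):
           return self._map.get((vn_id, tenant_ip))  # None if unknown

   def answer_arp_request(nva, vn_id, target_ip):
       """The NVE answers a tenant ARP request locally, with no
       flooding in the overlay."""
       entry = nva.lookup(vn_id, target_ip)
       if entry is None:
           # The problematic case noted above: no mapping exists,
           # and without multicast support the request cannot be
           # flooded to other NVEs.
           return None
       tenant_mac, _egress_nve_ip = entry
       return tenant_mac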
3.2. Replication at the source NVE
With this method, the overlay attempts to provide a multicast
service without requiring any specific support from the underlay
other than a unicast service. A multicast or broadcast transmission
is achieved by replicating the packet at the source NVE and sending
one copy to each destination NVE to which the multicast packet must
be delivered.
For this mechanism to work, the source NVE must know, a priori, the
IP addresses of all destination NVEs that need to receive the
packet. For example, for a specific multicast group, the source NVE
must know the IP addresses of all the remote NVEs where there are
members of the tenant subnet and multicast group in question.
In addition, the NVE may need to support an IGMP/MLD snooping
function, i.e., to listen in on the IGMP/MLD conversations between
hosts and routers. By listening to these conversations, the NVEs
can maintain a map of which hosts need which IP multicast streams.
For some environments, it may be necessary to prevent hosts that are
not members of a multicast group from receiving that group's
traffic.
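The following is an illustrative sketch of per-NVE snooping state.
The data structures and method names are assumptions for
illustration; actual implementations will differ.

   # Hypothetical per-NVE IGMP/MLD snooping state: which local
   # ports have members for each (VN, group).
   from collections import defaultdict

   class SnoopingTable:
       def __init__(self):
           # (vn_id, group) -> set of local ports with members
           self.members = defaultdict(set)

       def on_report(self, vn_id, group, port):    # IGMP/MLD join
           self.members[(vn_id, group)].add(port)

       def on_leave(self, vn_id, group, port):     # IGMP/MLD leave
           self.members[(vn_id, group)].discard(port)

       def local_receivers(self, vn_id, group):
           # Ports that should receive traffic for this group.
           return self.members.get((vn_id, group), set())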
This approach can add some complexity at the ingress NVE when
members of a multicast group are outside of the NVO3 domain, since
those NVEs may have to interface with external multicast protocols.
Another drawback of this method is that multiple copies of the same
packet will traverse any common links along the paths to the
destination NVEs. If, for example, a tenant subnet is spread across
50 NVEs, the packet would have to be replicated 50 times at the
source NVE. This also creates an issue for the forwarding
performance of the NVE, especially if it is implemented in software.
In addition, when it is necessary to prevent hosts attached to an
NVE that are not members of an application-specific multicast group
from receiving that group's traffic, the NVE needs to maintain the
multicast group membership.
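The following minimal sketch shows head-end replication at the
source NVE. The frame representation and the source of the egress
NVE list are assumptions for illustration; in practice the
encapsulation would be VXLAN, NVGRE, STT, etc., and the list would
come from the NVA or from snooped state.

   # Hypothetical head-end replication at the source NVE.
   def encapsulate(packet, src_nve_ip, dst_nve_ip):
       # Placeholder for a unicast tunnel encapsulation such as
       # VXLAN or NVGRE; header details are out of scope here.
       return {"outer_src": src_nve_ip, "outer_dst": dst_nve_ip,
               "inner": packet}

   def replicate_at_source(packet, src_nve_ip, egress_nve_ips, send):
       # One unicast copy per egress NVE; the cost grows linearly
       # with the number of egress NVEs (e.g., 50 copies for a
       # subnet spread across 50 NVEs).
       for dst_nve_ip in egress_nve_ips:
           send(encapsulate(packet, src_nve_ip, dst_nve_ip))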
Note that this method is similar to the approach used in VPLS [VPLS]
prior to extensive support for MPLS multicast [MPLS-MC]. If one
maps an MPLS PE to an NVO3 NVE, there are some similarities between
MPLS VPNs and the NVO3 overlay. However, there are some key
differences:

- The attachment of clients to VPN PEs is somewhat static, whereas
  in a DC that allows VMs to migrate anywhere, the attachment of VMs
  to NVEs can change.

- The number of PEs to which one VPN client is attached in an MPLS
  VPN environment is normally smaller than the number of NVEs to
  which a DC client's VMs are attached.
  When a VPN client has multiple multicast groups, Multicast VPN
  [RFC6513] combines all of the client's multicast groups into one
  single multicast group in the MPLS (or VPN) core. The end result
  is that all messages from any of the multicast groups belonging to
  one VPN client will reach all the PE nodes of that client. When
  the client has only a handful of PEs, not much bandwidth is wasted
  in the core.
  In a DC environment, a typical vSwitch may support only 10-20 VMs.
  In the worst case, a subnet with 200 VMs may therefore be spread
  across 200 vSwitches, with one VM of the subnet per vSwitch.
Using "MPLS VPN multicast approach" will have to create a
Multicast group in the core for this client network to reach 200
NVEs. If only small percentage of this client's VMs participate in
application specific multicast, a great number of NVEs will
receive multicast traffic that should not be forwarded to their
attached VMs.
Therefore, the Multicast VPN solution may not scale in a DC
environment with dynamic attachment of virtual networks to NVEs and
a greater number of NVEs per virtual network.
3.3. Replication at a multicast service node (MSN)
With this method, all multicast packets would be sent using a
unicast tunnel encapsulation to a multicast service node. The
multicast service node, in turn, would create multiple copies of the
packet and would deliver a copy, using a unicast tunnel
encapsulation, to each of the NVEs that are part of the multicast
group for which the packet is intended.
This mechanism is similar to that used by the ATM Forum's LAN
Emulation (LANE) specification [LANE].
The following are possible ways for a Multicast Service Node (MSN)
to obtain the membership information for each multicast group:
- The MSN can exchange information with the TSs' multicast routers;

- The MSN can retrieve or query the multicast (S, G) state
  information from the NVEs that snoop or process IGMP/MLD messages
  to maintain the current (S-TID, G) state (see Section 3.3.1); or

- For NVEs that do not process or snoop IGMP/MLD messages (usually
  server-based NVEs), those NVEs can steer the IGMP/MLD messages to
  the MSN via a special encapsulation, e.g., with the outer
  destination address set to the MSN (see Section 3.3.2 for
  details).
Unlike the method described in Section 3.2, there is no performance
impact at the ingress NVE, nor are there any issues with multiple
copies of the same packet from the source NVE to the multicast
service node. However, there remain issues with multiple copies of
the same packet on links that are common to the paths from the
multicast service node to each of the egress NVEs. Additional
issues that are introduced with this method include the availability
of the multicast service node, methods to scale the services offered
by the multicast service node, and the sub-optimality of the
delivery paths.
Finally, the IP address of the source NVE must be preserved in
packet copies created at the multicast service node if data plane
learning is in use. This could create problems if IP source address
reverse path forwarding (RPF) checks are in use.
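The following sketch shows replication at the MSN, reusing the frame
representation from the sketch in Section 3.2; the structure is
assumed for illustration. Note that the outer source address of
each copy is kept as that of the original ingress NVE, per the
preceding paragraph.

   # Hypothetical replication at the MSN.
   def msn_replicate(tunneled_pkt, egress_nve_ips, send):
       inner = tunneled_pkt["inner"]
       # Preserve the original source NVE's address so that data
       # plane learning at egress NVEs still works; note the RPF
       # caveat discussed above.
       orig_src_nve = tunneled_pkt["outer_src"]
       for dst_nve_ip in egress_nve_ips:
           send({"outer_src": orig_src_nve,
                 "outer_dst": dst_nve_ip,
                 "inner": inner})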
3.3.1. Egress NVEs Maintaining (S-TID, G) state
Most, if not all, switches and routers support IGMP/MLD snooping.
Therefore, we can assume that switch/router-based NVEs, i.e., the
NVE function added to existing switches/routers, can maintain
(S-TID, G) state by snooping the IGMP/MLD query and report messages
of the attached TSs.

Those NVEs, which are the egress NVEs for the multicast groups, can
send an (S-TID, G) update to the Multicast Service Node (MSN)
whenever there is a change.
3.3.2. Multicast-Agnostic NVEs
This is a scheme to enable TS multicast in an NVO3 environment
without the NVE doing any extra work for IGMP/MLD messages. This
approach is applicable to server-based NVEs that do not support
IGMP/MLD. For NVEs that terminate IGMP or MLD, see Section 3.3.1.
"Multicast Service Node" can send VN scoped broadcast "IGMP query"
that are encapsulated in overlay header. The overlay header has the
following property:
Inner Source Addr  = Multicast Service Node
Inner Dest Addr    = IGMP multicast address
Outer Source Addr  = the NVE module embedded in (or attached to) the
                     MSN
Outer Dest Addr    = (unicast) each NVE that participates in the VN
The NVEs process these frames using the same procedure as for other
data frames, i.e., they decapsulate them and forward them to the
attached TSs.
NVEs obtain the mapping of the MSN's inner address to the MSN-NVE
address from the NVA, just as they obtain the mappings of TSs to
their attached NVEs from the NVA.
If the NVEs perform real-time learning of inner-outer address
mappings, the NVEs will have learned the following mapping:

Inner: the MSN address ==> Outer: the MSN-NVE address
When the replies (IGMP reports) from the targeted TSs arrive at the
NVEs, the normal encapsulation performed by the NVEs will set the
outer destination address to the MSN-NVE address. The MSN-NVE
performs the normal NVE function, i.e., it decapsulates the outer
header and sends the inner data frame, i.e., the IGMP/MLD report, to
the MSN.
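The following sketch illustrates the encoding of the MSN-originated
query and the processing at a multicast-agnostic NVE. All field and
function names are assumptions for illustration.

   # Hypothetical MSN-originated IGMP query, one unicast copy per
   # NVE participating in the VN.
   def build_msn_queries(msn_addr, msn_nve_addr, nve_addrs,
                         igmp_mcast_addr):
       return [{"outer_src": msn_nve_addr,  # NVE module at the MSN
                "outer_dst": nve_addr,      # unicast to each NVE
                "inner_src": msn_addr,      # the MSN itself
                "inner_dst": igmp_mcast_addr,
                "payload": "IGMP membership query"}
               for nve_addr in nve_addrs]

   def nve_receive(frame, learned_map, deliver_to_ts):
       # Normal data path only: no IGMP-specific processing.
       if learned_map is not None:  # data plane learning in use
           # Learn: inner MSN address -> outer MSN-NVE address, so
           # that IGMP reports are later tunneled to the MSN-NVE.
           learned_map[frame["inner_src"]] = frame["outer_src"]
       deliver_to_ts(frame["inner_dst"], frame["payload"])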
3.4. IP multicast in the underlay
In this method, the underlay supports IP multicast and the ingress
NVE encapsulates the packet with the appropriate IP multicast
address in the tunnel encapsulation header for delivery to the
desired set of NVEs. The protocol in the underlay could be any
variant of Protocol Independent Multicast (PIM) or a protocol-
dependent multicast mechanism such as [ISIS-Multicast].
If an NVE connects to its attached TSs via an IP network, then the
NVE needs to support the interworking between the Tenant Networks'
multicast protocols and the underlay multicast protocols.
If an NVE connects to its attached TSs via a Layer 2 network, there
are multiple ways for NVEs to support application-specific
multicast:

- The NVE supports only the basic IGMP/MLD snooping function,
  leaving the TSs' routers to handle application-specific multicast.
  This scheme does not utilize the underlay's IP multicast
  protocols.

- The NVE acts as a pseudo multicast router for the directly
  attached VMs and supports the proper mapping of IGMP/MLD messages
  to the messages needed by the underlay IP multicast protocols.
With this method, there are none of the issues associated with the
method described in Section 3.2.
With PIM Sparse Mode (PIM-SM), the number of flows required would be
(n*g), where n is the number of source NVEs that source packets for
the group, and g is the number of groups. Bidirectional PIM (BIDIR-
PIM) would offer better scalability with the number of flows
required being g.
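As a worked example of this state comparison (the numbers are
assumed for illustration): with n = 50 source NVEs and g = 10
groups, PIM-SM requires n*g = 500 (S,G) flows in the underlay,
whereas BIDIR-PIM requires only g = 10.

   # Underlay multicast state: PIM-SM vs. BIDIR-PIM.
   n, g = 50, 10                  # source NVEs, groups (assumed)
   pim_sm_flows = n * g           # one (S,G) tree per source, group
   bidir_pim_flows = g            # one shared tree per group
   assert (pim_sm_flows, bidir_pim_flows) == (500, 10)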
In the absence of any additional mechanism (e.g., using an NVA for
address resolution), optimal delivery would require a separate
underlay group for each tenant, plus a separate group for each
multicast address used within a tenant. An additional consideration
is that only the lower 23 bits of the IP multicast address
(regardless of whether IPv4 or IPv6 is in use) are mapped to the
outer MAC address, so if there is equipment that prunes multicasts
at Layer 2, there will be some aliasing. Finally, a mechanism to
efficiently provision such addresses for each group would be
required.
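For the IPv4 case, the standard group-to-MAC mapping (defined in
RFC 1112) is sketched below to make the aliasing concrete: only the
low-order 23 bits of the group address survive the mapping, so 32
IPv4 group addresses share each MAC address.

   # Standard IPv4 multicast-address-to-MAC mapping (RFC 1112).
   import ipaddress

   def ipv4_mcast_to_mac(group):
       # Keep only the low-order 23 bits of the group address.
       low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF
       return "01:00:5e:%02x:%02x:%02x" % (
           (low23 >> 16) & 0xFF, (low23 >> 8) & 0xFF, low23 & 0xFF)

   # Aliasing example: both groups map to 01:00:5e:01:01:01.
   assert ipv4_mcast_to_mac("224.1.1.1") == ipv4_mcast_to_mac("225.1.1.1")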
There are additional optimizations that are possible, but they come
with their own restrictions. For example, a set of tenants may be
restricted to some subset of NVEs, and they could all share the same
outer IP multicast group address. This, however, introduces a
problem of suboptimal delivery: even if a particular tenant within
the group of tenants does not have a presence on one of the NVEs
that another tenant does, the former's multicast packets would still
be delivered to that NVE. It also introduces an additional network
management burden to optimize which tenants should be part of the
same tenant group (based on the NVEs they share), which somewhat
dilutes the value proposition of NVO3, namely to completely decouple
the overlay from the physical network design, allowing complete
freedom of VM placement anywhere within the data center.
3.5. Multicast group membership management
3.5.1. When a Layer 2 network is attached to an NVE
The multicast receiver detection and receiver-site registration
described by [LISP-MULTICAST] can be used by NVEs for multicast
group membership management.
However, there may be vSwitches and/or low-cost Top-of-Rack (ToR)
switches in a data center network that do not support multicast
protocols such as IGMP/MLD. Such NVEs may simply process multicast
data frames in the same way they process unicast data frames.
If an NVE's inner-outer address mapping maps the multicast addresses
and the destination addresses of IGMP/MLD report messages directly
to the "Multicast Server", then all the multicast data frames and
IGMP/MLD report messages received by the NVE from the attached TSs
will be sent to the "Multicast Server" without the NVE doing
anything extra.
If an NVE gets its inner-outer address mapping from the NVA, then
the "Multicast Server" has to register the following address
mappings with the NVA:
o Inner Address: multicast address <-> Outer Address: Multicast
Server Address
o Inner Address: IGMP/MLD report address <-> Outer Address:
Multicast Server Address
If an NVE gets its inner-outer address mapping by observing the data
packets traversing it, the "Multicast Server" needs to send IGMP/MLD
queries with:
o The Inner "Source Address" = "Multicast Server"
o The Outer "Source Address" = "Multicast Server"
o The Inner "Destination Address" = "IGMP/MLD broadcast/multicast
address"
o The Outer "Destination Address" = "NVE address"
o Message body=IGMP/MLD query
The NVEs can then establish the needed inner-outer address mapping
between the "Multicast Server" and the destination address of the
IGMP/MLD reports.
The "Multicast Server" also needs to send empty message to all the
NVEs with
o The Inner "Source Address" = "Client Specific Multicast Address"
o The Outer "Source Address" = "Multicast Server"
o The Inner "Destination Address" = "NVE address"
o The Outer "Destination Address" = "NVE address"
o Message body=empty
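The following sketch shows the NVA registrations described above,
reusing the assumed NVA interface from the sketch in Section 3.1;
the group and report-address lists are hypothetical inputs.

   # Hypothetical registration of the "Multicast Server" with the
   # NVA as the outer destination for tenant multicast traffic.
   def register_multicast_server(nva, vn_id, mcast_server_ip,
                                 tenant_groups, report_addrs):
       for group in tenant_groups:
           # Inner: multicast address -> Outer: Multicast Server.
           nva.register(vn_id, group, None, mcast_server_ip)
       for addr in report_addrs:
           # Inner: IGMP/MLD report address -> Outer: Multicast
           # Server.
           nva.register(vn_id, addr, None, mcast_server_ip)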
3.5.2. When a Layer 3 network is attached to an NVE
When the network attached to an NVE is a Layer 3 network that
supports PIM, it is essential for the NVE to establish an
inner-outer address mapping that maps the destinations of PIM
protocol messages to the underlay's multicast service node.
3.6. Other schemes
There are other mechanisms that attempt to combine some of the
advantages of the above methods by offering multiple replication
points, each with a limited degree of replication [EDGE-REP]. Such
schemes offer a trade-off between the
amount of replication at an intermediate node (router) versus
performing all of the replication at the source NVE or all of the
replication at a multicast service node.
4. Simultaneous use of more than one mechanism
While the mechanisms discussed in the previous section have been
presented individually, it is possible for implementations to rely
on more than one of them. For example, the method of Section 3.1
could be used for minimizing ARP/ND, while at the same time,
multicast applications may be supported by one, or a combination of,
the other methods. For small multicast groups, the methods of
source NVE replication or the use of a multicast service node may be
attractive, while for larger multicast groups, the use of multicast
in the underlay may be preferable.
5. Summary
This document has identified various mechanisms for supporting
application specific multicast in networks that use NVO3. It
highlights the basics of each mechanism and some of the issues with
them. As solutions are developed, the protocols will need to
consider the use of these mechanisms, and coexistence among them may
be a consideration. It also highlights some of the requirements for
supporting multicast applications in an NVO3 network.
6. Security Considerations
This document does not introduce any new security considerations
beyond those that may be present in proposed solutions.
7. IANA Considerations
This document requires no IANA actions. RFC Editor: Please remove
this section before publication.
8. References
8.1. Normative References
[PS] Narten, T., et al., "Problem Statement: Overlays for Network
Virtualization", work in progress, July 2013.

[FW] Lasserre, M., et al., "Framework for DC Network
Virtualization", work in progress, January 2014.
[LISP-MULTICAST] Moreno, V. and D. Farinacci, "Signal-Free LISP
Multicast", work in progress, February 2014.
[NVO3-ARCH] Narten, T., et al., "An Architecture for Overlay
Networks (NVO3)", work in progress, February 2014.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3376] Cain, B., et al., "Internet Group Management Protocol,
Version 3", RFC 3376, October 2002.

[RFC6513] Rosen, E., et al., "Multicast in MPLS/BGP IP VPNs",
RFC 6513, February 2012.
8.2. Informative References
[VXLAN] Mahalingam, M. et al., "VXLAN: A framework for overlaying
virtualized Layer 2 networks over Layer 3 networks," work
in progress.
[NVGRE] Sridharan, M. et al., "NVGRE: Network virtualization
using Generic Routing Encapsulation," work in progress.
[STT] Davie, B. and Gross J., "A stateless transport tunneling
protocol for network virtualization," work in progress.
[DC-MC] McBride, M. and H. Lui, "Multicast in the Data Center
Overview", work in progress.
[ISIS-Multicast] Yong, L., et al., "ISIS Protocol Extension for
Building Distribution Trees", work in progress, October 2013.
[VPLS] Lasserre, M., and Kompella, V. (Eds), "Virtual Private
LAN Service (VPLS) using Label Distribution Protocol (LDP)
signaling," RFC 4762, January 2007.
[MPLS-MC] Aggarwal, R. et al., "Multicast in VPLS," work in
progress.
[LANE] "LAN emulation over ATM," The ATM Forum, af-lane-
0021.000, January 1995.
[EDGE-REP] Marques P. et al., "Edge multicast replication for BGP
IP VPNs," work in progress, June 2012.
9. Acknowledgments
This document was prepared using 2-Word-v2.0.template.dot.
Authors' Addresses
Anoop Ghanwani
Dell
Email: anoop@alumni.duke.edu
Linda Dunbar
Huawei Technologies
5340 Legacy Drive, Suite 1750
Plano, TX 75024, USA
Phone: (469) 277 5840
Email: ldunbar@huawei.com
Vinay Bannai
Paypal
Email: vbannai@paypal.com
Ram Krishnan
Brocade
Email: ramk@brocade.com