Virtual Subnet: A Scalable Data Center Interconnection Solution
draft-xu-virtual-subnet-06

The information below is for an old version of the document
Document Type Active Internet-Draft (individual)
Author Xiaohu Xu 
Last updated 2011-08-27 (latest revision 2011-04-11)
Network Working Group                                             X. Xu
Internet Draft                                      Huawei Technologies
Category: Standards Track
Expires: October 2012                                   August 27, 2011 
                                                                                
                                      
      Virtual Subnet: A Scalable Data Center Interconnection Solution 
                                      
                        draft-xu-virtual-subnet-06 

Status of this Memo 

   This Internet-Draft is submitted to IETF in full conformance with 
   the provisions of BCP 78 and BCP 79. 

   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups. Note that 
   other groups may also distribute working documents as Internet-
   Drafts. 

   Internet-Drafts are draft documents valid for a maximum of six 
   months and may be updated, replaced, or obsoleted by other documents 
   at any time. It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 

   The list of current Internet-Drafts can be accessed at   
   http://www.ietf.org/ietf/1id-abstracts.txt. 

   The list of Internet-Draft Shadow Directories can be accessed at   
   http://www.ietf.org/shadow.html. 

   This Internet-Draft will expire on October 27, 2011. 

Copyright Notice 

   Copyright (c) 2009 IETF Trust and the persons identified as the    
   document authors.  All rights reserved. 

   This document is subject to BCP 78 and the IETF Trust's Legal    
   Provisions Relating to IETF Documents 
   (http://trustee.ietf.org/license-info) in effect on the date of    
   publication of this document. Please review these documents 
   carefully, as they describe your rights and restrictions with 
   respect to this document.  

    

 
 
 
Xu                    Expires October 27, 2012                [Page 1] 


Internet-Draft               Virtual Subnet                 August 2011 
 
Abstract 

   This document proposes a host-route-based, IP-only L2VPN solution
   called Virtual Subnet, which reuses BGP/MPLS IP VPN [RFC4364] and
   ARP proxy [RFC925][RFC1027] technologies. Virtual Subnet provides a
   much more scalable approach to interconnecting geographically
   dispersed data centers.

Conventions used in this document 

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
   document are to be interpreted as described in RFC-2119 [RFC2119]. 

Table of Contents 

    
   1. Introduction
   2. Terminology
   3. Solution Description
      3.1. Unicast
         3.1.1. Intra-subnet Unicast
         3.1.2. Inter-subnet Unicast
      3.2. Multicast/Broadcast
      3.3. CE Host Discovery
      3.4. CE Multi-homing
      3.5. CE Host Mobility
      3.6. ARP Proxy
   4. Comparison with VPLS
   5. Future work
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgements
   9. References
      9.1. Normative References
      9.2. Informative References
   Authors' Addresses

 
 
 
    
1. Introduction 

   To exploit the service agility offered by current virtual machine
   (VM) technology to the full, cloud data center operators are
   demanding solutions for VM mobility across geographically dispersed
   data centers. In this challenging environment, a solution that
   enables fast, reliable, high-capacity and highly scalable data
   center interconnection is essential. Virtual Private LAN Service
   (VPLS) [RFC4761][RFC4762] appears to be an available technology for
   this purpose. However, the scaling issues (e.g., ARP broadcast
   storms and unknown unicast flooding) that exist within a large
   Layer 2 Ethernet bridged network would badly degrade network
   performance when such a flat Layer 2 network is extended across
   multiple data centers.

   This document describes a host-route-based, IP-only L2VPN solution
   called Virtual Subnet (VS), which reuses BGP/MPLS IP VPN [RFC4364]
   and ARP proxy [RFC925][RFC1027] technologies. VS provides a much
   more scalable approach to interconnecting geographically dispersed
   data centers. In contrast with existing VPLS solutions, VS greatly
   alleviates the impact of broadcast storms on network performance:
   the ARP broadcast and unknown unicast flooding domain of an IP
   subnet that has been extended across the MPLS/IP backbone, which
   would otherwise be a single domain, is partitioned into multiple
   isolated parts, one per data center location. VS also provides
   several benefits that VPLS does not. First, the MAC table capacity
   pressure on the large number of CE switches within data centers is
   greatly reduced. Second, active-active data center exit capability
   can be achieved easily, even where path symmetry is required.
   Finally, the ARP table pressure on data center exit gateways can be
   reduced by several orders of magnitude.

   Note that non-IP traffic would not be supported in VS since VS just 
   provides an IP-only L2VPN service.  

2. Terminology 

   This memo makes use of the terms defined in [RFC4364], [MVPN], 
   [RFC2236] and [RFC2131].  

 
 
 
3. Solution Description 

3.1. Unicast 

3.1.1. Intra-subnet Unicast

   As shown in Figure 1, CE hosts dispersed across different VPN sites
   of a given IP-only L2VPN instance are actually within a single IP
   subnet (e.g., 10.0.0.0/8). PE routers automatically discover their
   locally connected CE hosts by means such as ARP learning or ICMP
   ping, and create host routes for those hosts accordingly. These
   host routes are distributed among PE routers using the existing
   BGP/MPLS IP VPN signaling. In addition, each PE router is
   configured with a null route for its configured VPN subnet, so that
   packets destined for nonexistent hosts within that subnet are not
   mistakenly forwarded according to the default route. Meanwhile, ARP
   proxy is enabled on the VRF interfaces of each PE router. Thus,
   upon receiving from a local CE host an ARP request for a known
   remote CE host, the ingress PE router returns its own MAC address
   as a response.

                          +--------------------+ 
    +-----------------+   |                    |   +------------------+ 
    |VPN_A:10.0.0.0/8 |   |                    |   |VPN_A:10.0.0.0/8  | 
    |                 |   |                    |   |                  | 
    |    +------+    ++---+-+                +-+---++    +------+     | 
    |    |Host A+----+ PE-1 |                | PE-2 +----+Host B|     | 
    |    +------+    ++-+-+-+                +-+-+-++    +------+     | 
    |   10.1.1.1/8    | | |  IP/MPLS Backbone  | | |    10.1.1.2/8    | 
    +-----------------+ | |                    | | +------------------+ 
                        | +--------------------+ |  
                        |                        |   
                        |                        | 
                        V                        V 
    +-------+------------+--------+     +-------+------------+--------+ 
    |VRF ID |Destination |Next Hop|     |VRF ID |Destination |Next Hop| 
    +-------+------------+--------+     +-------+------------+--------+ 
    | VPN_A |10.1.1.1/32 |  Local |     | VPN_A |10.1.1.2/32 |  Local | 
    +-------+------------+--------+     +-------+------------+--------+ 
    | VPN_A |10.1.1.2/32 |  PE-2  |     | VPN_A |10.1.1.1/32 |  PE-1  | 
    +-------+------------+--------+     +-------+------------+--------+ 
    | VPN_A |10.0.0.0/8  |  NULL  |     | VPN_A |10.0.0.0/8  |  NULL  | 
    +-------+------------+--------+     +-------+------------+--------+ 

                      Figure 1: Intra-subnet Unicast 

 
 
 
   Assume host A sends an ARP request for host B before communicating
   with host B. Upon receipt of this ARP request, the ingress PE
   router (PE-1) looks up the associated VRF table to find the host
   route for host B. If the route is found and was learnt from a
   remote PE router, PE-1, acting as an ARP proxy, returns its own MAC
   address in response to the ARP request. Otherwise, PE-1 does not
   respond to the ARP request. On receiving the ARP reply from PE-1,
   host A sends an IP packet destined for host B, using as the
   destination MAC address the MAC address of PE-1 learnt through the
   above ARP resolution. Once this packet arrives at PE-1, PE-1
   tunnels it towards the egress PE router (i.e., PE-2), which in turn
   forwards the packet to the destination CE host (i.e., host B).
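
   The forwarding decision at PE-1 can be illustrated with a minimal
   Python sketch. It assumes the VRF table of Figure 1 is held as a
   simple list and searched by longest-prefix match; the function
   names (lookup, forward) and the returned strings are hypothetical
   and not part of this document.

```python
import ipaddress

# PE-1's VRF table for VPN_A, copied from Figure 1.
# "Local" = directly attached host, "NULL" = discard (null) route.
VRF_PE1 = [
    ("10.1.1.1/32", "Local"),
    ("10.1.1.2/32", "PE-2"),
    ("10.0.0.0/8", "NULL"),
]

def lookup(vrf, destination):
    """Return the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in vrf:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

def forward(vrf, destination):
    """Describe what the PE router does with a packet for `destination`."""
    next_hop = lookup(vrf, destination)
    if next_hop is None or next_hop == "NULL":
        return "drop"
    if next_hop == "Local":
        return "deliver to local CE host"
    return "tunnel to " + next_hop

print(forward(VRF_PE1, "10.1.1.2"))   # host B, learnt from PE-2
print(forward(VRF_PE1, "10.9.9.9"))   # no host route: hits the null route
```

   In this sketch a packet for a nonexistent host inside 10.0.0.0/8 is
   dropped by the null route rather than following any less specific
   route.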

3.1.2. Inter-subnet Unicast

   As shown in Figure 2, for a CE host (e.g., host A) to communicate 
   with other hosts outside its own subnet, a PE router (e.g., PE-2) 
   which is connected to a CE gateway router (e.g., GW) would be 
   configured with a default route with the next-hop pointing to that 
   CE gateway router, and this default route would be distributed to 
   other PE routers.  

                          +--------------------+ 
    +-----------------+   |                    |   +-------------+ 
    |VPN_A:10.0.0.0/8 |   |                    |   |VPN_A:       | 
    |                 |   |                    |   |10.0.0.0/8   | 
    |    +------+    ++---+-+                +-+---++       +----+--+ 
    |    |Host A+----+ PE-1 |                | PE-2 +-------+   GW  | 
    |    +------+    ++-+-+-+                +-+-+-++       +----+--+ 
    |   10.1.1.1/8    | | |  IP/MPLS Backbone  | | |10.1.1.2/8   | 
    +-----------------+ | |                    | | +-------------+ 
                        | +--------------------+ |  
                        |                        |   
                        |                        | 
                        V                        V 
    +-------+------------+--------+     +-------+------------+--------+ 
    |VRF ID |Destination |Next Hop|     |VRF ID |Destination |Next Hop| 
    +-------+------------+--------+     +-------+------------+--------+ 
    | VPN_A |10.1.1.1/32 |  Local |     | VPN_A |10.1.1.2/32 |  Local | 
    +-------+------------+--------+     +-------+------------+--------+ 
    | VPN_A |10.1.1.2/32 |  PE-2  |     | VPN_A |10.1.1.1/32 |  PE-1  | 
    +-------+------------+--------+     +-------+------------+--------+ 
    | VPN_A |10.0.0.0/8  |  NULL  |     | VPN_A |10.0.0.0/8  |  NULL  | 
    +-------+------------+--------+     +-------+------------+--------+ 
    | VPN_A |0.0.0.0/0   |  PE-2  |     | VPN_A |0.0.0.0/0   |   GW   | 
    +-------+------------+--------+     +-------+------------+--------+ 

 
 
 
                      Figure 2: Inter-subnet Unicast 

   Now host A sends an ARP request for its default gateway (i.e., GW)
   before communicating with a destination host outside its subnet.
   Upon receiving this ARP request, PE-1, acting as an ARP proxy,
   returns its own MAC address as a response, following the rules
   described in Section 3.1.1. Host A then sends an IP packet for that
   destination host with PE-1's MAC address as the destination MAC
   address. Upon receiving this packet, PE-1 tunnels it towards PE-2
   according to the default route learnt from PE-2. PE-2 in turn
   forwards the packet to GW according to its configured default
   route.
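
   The two-hop path described above can be traced with a short Python
   sketch over the VRF tables of Figure 2. The destination 192.0.2.1
   is an assumed example of an address outside the VPN subnet, and the
   helper names (next_hop, trace) are hypothetical.

```python
import ipaddress

# VRF tables for VPN_A, copied from Figure 2.
VRFS = {
    "PE-1": {"10.1.1.1/32": "Local", "10.1.1.2/32": "PE-2",
             "10.0.0.0/8": "NULL", "0.0.0.0/0": "PE-2"},
    "PE-2": {"10.1.1.2/32": "Local", "10.1.1.1/32": "PE-1",
             "10.0.0.0/8": "NULL", "0.0.0.0/0": "GW"},
}

def next_hop(pe, destination):
    """Longest-prefix match in the VRF of the given PE router."""
    addr = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(p) for p in VRFS[pe]
               if addr in ipaddress.ip_network(p)]
    # The default route matches every IPv4 address, so `matches` is
    # never empty for these tables.
    best = max(matches, key=lambda n: n.prefixlen)
    return VRFS[pe][str(best)]

def trace(ingress_pe, destination):
    """Follow a packet from the ingress PE until it leaves the PE routers."""
    path = [ingress_pe]
    while path[-1] in VRFS:
        path.append(next_hop(path[-1], destination))
    return path

print(trace("PE-1", "192.0.2.1"))   # outside the subnet: default route
print(trace("PE-1", "10.9.9.9"))    # nonexistent host inside the subnet
```

   Note that the null route for 10.0.0.0/8 is more specific than the
   default route, so only traffic for destinations outside the VPN
   subnet reaches GW.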

   For CE gateway router redundancy, more than one CE gateway router
   could be connected to a given VPN subnet. In this case, the Virtual
   Router Redundancy Protocol (VRRP) [RFC2338] could optionally be
   enabled among these CE gateway routers, so that only the PE router
   connected to the VRRP master is entitled to announce a default
   route. To achieve that goal, the next-hop of the default route
   SHOULD be set to the corresponding Virtual Router IP address, and
   the default route SHOULD NOT be deemed valid unless there is a
   directly connected host route for the next-hop address. Because
   only the VRRP master is entitled to respond to ARP requests for the
   Virtual Router IP address and to broadcast gratuitous ARP requests
   or replies on behalf of the Virtual Router, only the PE router
   connected to the VRRP master can have an ARP entry for the Virtual
   Router IP address, and therefore a directly connected host route
   for it. In this way, packets destined for the outside of a given
   VPN subnet are sent precisely to the current VRRP master.

   Alternatively, PE routers could intercept the VRRP messages
   received from their locally connected CE routers and prevent them
   from being flooded across the MPLS/IP backbone. As a result, each
   CE router will act as a VRRP master, and therefore each PE router
   connected to a CE router will announce a default route. In this
   way, inbound and outbound traffic of the VPN subnet is balanced
   across multiple CE gateway routers, and route optimization for
   that traffic is achieved at the same time.
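
   The validity rule for the default route can be sketched in Python
   as follows. The Virtual Router IP address 10.1.1.254, the third PE
   router and the function names are assumptions made only for this
   illustration.

```python
def default_route_is_valid(virtual_router_ip, connected_host_routes):
    """The default route is usable only if its next hop (the Virtual
    Router IP address) has a directly connected host route."""
    return virtual_router_ip in connected_host_routes

def pes_announcing_default(virtual_router_ip, host_routes_by_pe):
    """Return the PE routers that would announce the default route."""
    return sorted(pe for pe, routes in host_routes_by_pe.items()
                  if default_route_is_valid(virtual_router_ip, routes))

# Hypothetical state: only PE-2 has learnt an ARP entry (and hence a
# directly connected host route) for the Virtual Router IP address,
# because the VRRP master sits behind PE-2.
HOST_ROUTES = {
    "PE-1": {"10.1.1.1"},
    "PE-2": {"10.1.1.2", "10.1.1.254"},
    "PE-3": {"10.1.1.3"},
}
print(pes_announcing_default("10.1.1.254", HOST_ROUTES))
```

   If PE routers instead intercept VRRP messages so that every CE
   gateway router becomes a master, each of those PE routers would
   hold a host route for the Virtual Router IP address and the same
   check would let all of them announce the default route.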

3.2. Multicast/Broadcast 

   The MVPN technology [MVPN], in particular the Protocol Independent
   Multicast (PIM) tree option with some extensions, could be reused
   here to support IP multicast and broadcast between CE hosts of the
   same VPN instance. For example, PE routers attached to a given VPN
   join a default provider multicast distribution tree dedicated to
   that VPN. Ingress PE routers, upon receiving customer multicast or
   broadcast traffic from their local CE hosts, tunnel such traffic
   towards the remote PE routers of the same VPN over the
   corresponding default provider multicast distribution tree. Egress
   PE routers, upon receiving customer multicast or broadcast traffic
   over a provider multicast distribution tree, forward it via the
   corresponding VRF interfaces.

   More details about how to support multicast and broadcast in VS will 
   be explored in a later version of this document. 

3.3. CE Host Discovery

   When receiving an ARP request or reply from a local CE host, a PE
   router SHOULD cache or update the corresponding ARP entry for that
   CE host. In addition, a PE router SHOULD periodically send ARP
   requests (preferably unicast) to the local CE hosts it has
   discovered, so as to keep their ARP entries fresh.

   To ensure that a PE router discovers all of its locally connected
   CE hosts in time, the PE router SHOULD perform an IP or ARP scan of
   its attached VPN site at least once after rebooting. One possible
   option is to use ICMP echo for host discovery. For example, a PE
   router could send an ICMP echo request to an IP broadcast address
   (e.g., 10.255.255.255); every CE host receiving that request would
   respond with an ICMP echo reply, which carries its IP and MAC
   addresses. The PE router could thus discover all of its local CE
   hosts by inspecting the received ICMP echo replies. If the PE
   router is unable to process that many replies in a short period of
   time, the subnet could be partitioned into multiple segments, with
   host discovery performed for each segment in turn.
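
   One way to derive such segments is sketched below using Python's
   standard ipaddress module. The segment size (/16) is an arbitrary
   assumption, and how each segment is actually probed (e.g., by ARP
   or unicast ICMP echo) is left open here.

```python
import ipaddress

def scan_segments(subnet, segment_prefixlen):
    """Split `subnet` into equally sized segments to be scanned in turn."""
    network = ipaddress.ip_network(subnet)
    return list(network.subnets(new_prefix=segment_prefixlen))

segments = scan_segments("10.0.0.0/8", 16)
for segment in segments[:3]:
    # First and last address of the range probed in this round.
    print(segment, segment[0], segment[-1])
print(len(segments), "rounds in total")
```

   A smaller segment size lowers the burst of replies per round at the
   cost of more rounds.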

3.4. CE Multi-homing

   For PE router redundancy, a VPN site could be connected to more
   than one PE router. In this case, VRRP SHOULD run among these PE
   routers; only the PE router that is the VRRP master responds to ARP
   requests from local CE hosts, and it MUST use the Virtual Router
   MAC address in any ARP packet it sends. To achieve active-active
   multi-homing for traffic inbound to a multi-homed VPN site, the PE
   routers that are VRRP backups could also perform the host discovery
   function and advertise host routes for local CE hosts accordingly.
   Note that this does not contravene the VRRP specification
   [RFC2338].

 
 
 
3.5. CE Host Mobility

   Once a CE host moves from one VPN site to another, it will usually 
   send out a gratuitous ARP request or reply when attaching to a new 
   VPN site. The PE router attached to the new VPN site will create a 
   CE host route upon receiving that gratuitous ARP message and then 
   advertise it to remote PE routers.  

   When the PE router attached to the old VPN site receives, from a
   remote PE router, a host route announcement for one of its local CE
   hosts, it SHOULD immediately send an ARP request or ICMP echo
   request to that CE host to determine whether the host is still
   locally connected. If no reply is received within a given period of
   time, the PE router deletes the ARP entry for that CE host and
   withdraws the corresponding host route. Meanwhile, the PE router
   broadcasts a gratuitous ARP on behalf of that CE host, with the
   sender hardware address field set to its own MAC address. As a
   result, the ARP entries for that CE host cached on the other local
   CE hosts of the old VPN site are refreshed in a timely manner.
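
   The behaviour of the PE router at the old VPN site can be modelled
   with the following Python sketch. The class, its attributes and the
   injected probe callable are hypothetical; the probe stands in for
   the ARP request or ICMP echo and its timeout, and the MAC addresses
   are example values.

```python
class OldSitePE:
    """Hypothetical model of the PE router at the site a host moved from."""

    def __init__(self, mac, local_hosts, probe):
        self.mac = mac                  # this PE router's own MAC address
        self.arp = dict(local_hosts)    # host IP -> host MAC (local CE hosts)
        self.probe = probe              # callable(ip) -> True if host replied
        self.withdrawn = []             # host routes withdrawn from BGP
        self.gratuitous_arps = []       # (ip, sender MAC) broadcast locally

    def on_remote_host_route(self, ip):
        """Handle a host route for `ip` announced by a remote PE router."""
        if ip not in self.arp:
            return "not local"
        if self.probe(ip):              # ARP request or ICMP echo answered
            return "still local"
        del self.arp[ip]                    # delete the stale ARP entry
        self.withdrawn.append(ip + "/32")   # withdraw the local host route
        # Gratuitous ARP on behalf of the host, carrying the PE's own MAC.
        self.gratuitous_arps.append((ip, self.mac))
        return "moved"

pe = OldSitePE("00:00:5e:00:53:01",
               {"10.1.1.1": "00:00:5e:00:53:aa"},
               probe=lambda ip: False)      # the host no longer answers
print(pe.on_remote_host_route("10.1.1.1"))
print(pe.withdrawn, pe.gratuitous_arps)
```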

3.6. ARP Proxy

   A PE router, acting as an ARP proxy, SHOULD respond only to ARP
   requests for CE hosts that are attached to other PE routers. In
   other words, the PE router SHOULD NOT respond to ARP requests for
   its local CE hosts or for nonexistent CE hosts.

   When VRRP is configured on multiple PE routers which are attached to 
   a given VPN site for redundancy purpose, only the PE router which is 
   the VRRP master is entitled to perform the ARP proxy function.  
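
   These rules can be summarised as a small decision function. The
   sketch below is in Python and assumes the host routes of a VRF are
   available as a mapping from host IP address to route origin; the
   function name and the mapping layout are hypothetical.

```python
def should_proxy_arp(target_ip, vrf_host_routes, is_vrrp_master=True):
    """Decide whether a PE router answers an ARP request with its own MAC.

    vrf_host_routes maps a host IP address to "Local" or to the name
    of the remote PE router the host route was learnt from.
    """
    if not is_vrrp_master:      # only the VRRP master acts as ARP proxy
        return False
    origin = vrf_host_routes.get(target_ip)
    if origin is None:          # nonexistent host: stay silent
        return False
    return origin != "Local"    # local hosts answer for themselves

# Host routes of PE-1 as in Figure 1.
PE1_HOSTS = {"10.1.1.1": "Local", "10.1.1.2": "PE-2"}

print(should_proxy_arp("10.1.1.2", PE1_HOSTS))   # remote host
print(should_proxy_arp("10.1.1.1", PE1_HOSTS))   # local host
print(should_proxy_arp("10.9.9.9", PE1_HOSTS))   # nonexistent host
```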

4. Comparison with VPLS 

   Since VPLS simply extends a LAN across multiple sites and operates
   as an Ethernet bridge, most scaling issues (e.g., ARP broadcast
   storms and unknown unicast flooding) that exist within a large
   Ethernet bridged network are not addressed by VPLS. VS partitions
   the ARP broadcast and unknown unicast flooding domain of a given
   subnet that has been extended across the MPLS/IP backbone, which
   would otherwise be a single domain, into multiple isolated parts,
   and thereby greatly alleviates the impact of broadcast storms on
   network performance. For example, ARP broadcast traffic is confined
   to a single VPN site. Similarly, unknown unicast traffic is not
   flooded across the MPLS/IP backbone.

 
 
 
   As for the MAC table capacity requirement on CE switches, CE
   switches in VPLS have to learn the MAC addresses of both local and
   remote CE hosts. In contrast, thanks to ARP proxy, CE switches in
   VS need only learn the MAC addresses of local CE hosts and local PE
   routers.

   Active-active DC exit is a highly desirable capability when
   considering route/path optimization for traffic to and from
   networks outside the geographically dispersed data centers (e.g.,
   the Internet). Normally, each DC site is connected to a default
   gateway (i.e., a DC exit router) that is responsible for forwarding
   traffic to and from the outside. However, since these default
   gateways are within a single subnet when a Layer 2 DCI is used,
   normally only one default gateway router (acting as VRRP master) is
   allowed to forward traffic to and from the outside. This is
   obviously not optimal from the perspective of WAN bandwidth
   utilization. An active-active VRRP approach has been proposed for
   this case, so that traffic destined for the outside can be
   forwarded by the local DC exit gateway. This works when path
   symmetry is not required. However, in most cases where firewall or
   NAT devices are deployed at the DC exits, path symmetry is a must,
   and active-active VRRP is then no longer usable. In contrast, if VS
   is used as the DCI solution, the source IP addresses of incoming
   traffic from the Internet could be NATed on the DC exit gateway
   through which the traffic enters a DC. Note that the DC exit
   gateways of geographically dispersed DCs are configured with
   different, non-overlapping IP address pools for source NAT. In
   addition, each DC exit gateway advertises the routes for its NAT
   address pool to the PE routers of the VS to which it is connected.
   Thus, when outgoing traffic destined for the Internet arrives at
   its local PE router, that PE router forwards the traffic according
   to the matching route for these address pools. In this way,
   active-active DC exit can be achieved easily even where path
   symmetry is required.
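
   The role of the NAT address pools can be illustrated with a Python
   sketch. The gateway names, pool prefixes and function names below
   are assumptions chosen for the example; the sketch only checks that
   the pools do not overlap and shows how a reply addressed to a NATed
   source address maps back to the gateway that performed the NAT.

```python
import ipaddress
from itertools import combinations

# Hypothetical source-NAT pools, one per DC exit gateway, together with
# the PE router each gateway advertises its pool to.
NAT_POOLS = {
    "GW-1": ("PE-1", ipaddress.ip_network("172.16.1.0/24")),
    "GW-2": ("PE-2", ipaddress.ip_network("172.16.2.0/24")),
}

def pools_are_disjoint(pools):
    """Check the requirement that NAT pools of different DCs never overlap."""
    nets = [net for _, net in pools.values()]
    return not any(a.overlaps(b) for a, b in combinations(nets, 2))

def return_gateway(pools, reply_destination):
    """Pick the exit gateway for a reply sent to a NATed source address."""
    addr = ipaddress.ip_address(reply_destination)
    for gateway, (_, net) in pools.items():
        if addr in net:
            return gateway
    return None

print(pools_are_disjoint(NAT_POOLS))
print(return_gateway(NAT_POOLS, "172.16.2.9"))   # entered via GW-2
```

   Because a reply always matches the pool of the gateway through which
   the request entered, the return path passes the same firewall or NAT
   device as the inbound path.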

   Another obvious advantage of VS over VPLS as a DCI solution is that
   it reduces the ARP table size on DC exit gateways by several orders
   of magnitude. Assume there are millions of CE hosts within a single
   VLAN/subnet. If VPLS is used as the DCI solution, DC exit gateways
   have to hold millions of ARP entries, one for each of these CE
   hosts. In contrast, with VS as the DCI solution, DC exit gateways
   are directly connected to the PE routers of the VS, which act as
   ARP proxies, so the MAC addresses in the ARP entries for CE hosts
   on a DC exit gateway are identical (i.e., the PE router's MAC
   address). These millions of ARP entries can therefore be aggregated
   into one entry (e.g., 10.0.0.0/8 -> the PE router's MAC address).
   That is to say, the exact-match algorithm for ARP cache lookup is
   replaced by a longest-match algorithm. Of course, there is no free
   lunch. The side effect of this change is that DC exit gateways may
   send packets destined for nonexistent CE hosts to their connected
   PE routers of the VS. Fortunately, once such packets arrive at a PE
   router, that PE router drops them directly, since no host route
   matches them and they fall to the null route configured for the VPN
   subnet (see Section 3.1.1).
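
   A longest-match ARP cache of this kind could look like the
   following Python sketch. The class name, its methods, the MAC
   addresses and the additional /32 entry are hypothetical and serve
   only to show an aggregated entry coexisting with a more specific
   one.

```python
import ipaddress

class PrefixArpCache:
    """ARP cache keyed by prefix and searched by longest match (a sketch)."""

    def __init__(self):
        self.entries = {}               # ip_network -> MAC address

    def add(self, prefix, mac):
        self.entries[ipaddress.ip_network(prefix)] = mac

    def resolve(self, ip):
        """Return the MAC of the longest matching prefix, or None."""
        addr = ipaddress.ip_address(ip)
        matches = [net for net in self.entries if addr in net]
        if not matches:
            return None
        return self.entries[max(matches, key=lambda n: n.prefixlen)]

cache = PrefixArpCache()
cache.add("10.0.0.0/8", "00:00:5e:00:53:01")      # the PE router's MAC
cache.add("10.1.1.254/32", "00:00:5e:00:53:fe")   # assumed directly attached device

print(cache.resolve("10.200.3.4"))    # any CE host: aggregated entry
print(cache.resolve("10.1.1.254"))    # more specific entry wins
print(len(cache.entries), "entries instead of one per CE host")
```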

    

5. Future work 

   How to support IPv6 CE hosts in VS is for future study. 

6. Security Considerations 

   TBD. 

7. IANA Considerations 

   There is no requirement for IANA.  

8. Acknowledgements 

   Thanks to Dino Farinacci, Himanshu Shah, Nabil Bitar and Giles Heron 
   for their valuable comments on this document. 

9. References 

9.1. Normative References 

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate               
             Requirement Levels", BCP 14, RFC 2119, March 1997. 

9.2. Informative References 

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

   [MVPN]    Rosen, E. and R. Aggarwal, "Multicast in MPLS/BGP IP VPNs",
             draft-ietf-l3vpn-2547bis-mcast-10.txt (work in progress),
             January 2010.

   [MVPN-BGP] Aggarwal, R., Rosen, E., Morin, T., Rekhter, Y., and C.
             Kodeboniya, "BGP Encodings for Multicast in MPLS/BGP IP
             VPNs", draft-ietf-l3vpn-2547bis-mcast-bgp-08.txt (work in
             progress), September 2009.

 
 
 
   [RFC826]  Plummer, D., "An Ethernet Address Resolution Protocol: Or
             Converting Network Protocol Addresses to 48.bit Ethernet
             Address for Transmission on Ethernet Hardware", RFC 826,
             November 1982.

   [RFC925]  Postel, J., "Multi-LAN Address Resolution", RFC 925,
             October 1984.

   [RFC1027] Carl-Mitchell, S. and J. Quarterman, "Using ARP to
             Implement Transparent Subnet Gateways", RFC 1027, October
             1987.

   [RFC2338] Knight, S., et al., "Virtual Router Redundancy Protocol",
             RFC 2338, April 1998.

   [RFC2236] Fenner, W., "Internet Group Management Protocol, Version
             2", RFC 2236, November 1997.

   [RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service
             (VPLS) Using BGP for Auto-Discovery and Signaling", RFC
             4761, January 2007.

   [RFC4762] Lasserre, M. and V. Kompella, "Virtual Private LAN Service
             (VPLS) Using Label Distribution Protocol (LDP) Signaling",
             RFC 4762, January 2007.

Authors' Addresses 

   Xiaohu Xu 
   Huawei Technologies, 
   No.3 Xinxi Rd., Shang-Di Information Industry Base,  
   Hai-Dian District, Beijing 100085, P.R. China 
   Phone: +86 10 82882573 
   Email: xuxh@huawei.com   
