Virtual Subnet: A Host Route based Subnet Extension Solution
draft-xu-virtual-subnet-08

Document Type Active Internet-Draft (individual)
Authors Xiaohu Xu  , Susan Hares  , Fan Yongbing 
Last updated 2012-07-04
Network working group                                             X. Xu  
Internet Draft                                                 S. Hares         
Category: Informational                             Huawei Technologies 
                                                                 Y. Fan 
                                                          China Telecom 
Expires: January 2013                                      July 5, 2012 
                                                                                
                                      
        Virtual Subnet: A Host Route based Subnet Extension Solution 
                                      
                        draft-xu-virtual-subnet-08 

Status of this Memo 

   This Internet-Draft is submitted to IETF in full conformance with 
   the provisions of BCP 78 and BCP 79. 

   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups. Note that 
   other groups may also distribute working documents as Internet-
   Drafts. 

   Internet-Drafts are draft documents valid for a maximum of six 
   months and may be updated, replaced, or obsoleted by other documents 
   at any time. It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 

   The list of current Internet-Drafts can be accessed at   
   http://www.ietf.org/ietf/1id-abstracts.txt. 

   The list of Internet-Draft Shadow Directories can be accessed at   
   http://www.ietf.org/shadow.html. 

   This Internet-Draft will expire on January 5, 2013.

Copyright Notice 

   Copyright (c) 2009 IETF Trust and the persons identified as the    
   document authors.  All rights reserved. 

   This document is subject to BCP 78 and the IETF Trust's Legal    
   Provisions Relating to IETF Documents 
   (http://trustee.ietf.org/license-info) in effect on the date of    
   publication of this document. Please review these documents 
   carefully, as they describe your rights and restrictions with 
   respect to this document.  

 
 
 
Xu, et al.             Expires January 5, 2013                [Page 1] 


Internet-Draft               Virtual Subnet                   July 2012 
 
    

Abstract 

   This document describes a host route based subnet extension solution 
   referred to as Virtual Subnet, which mainly reuses existing BGP/MPLS 
   IP VPN [RFC4364] and ARP proxy [RFC925][RFC1027] technologies. 
   Virtual Subnet provides a scalable approach for interconnecting 
   geographically dispersed cloud data centers. 

Conventions used in this document 

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
   document are to be interpreted as described in RFC-2119 [RFC2119]. 

Table of Contents 

    
   1. Introduction ................................................ 3 
   2. Terminology ................................................. 5 
   3. Solution Description......................................... 5 
      3.1. Unicast ................................................ 5 
         3.1.1. Intra-subnet Unicast .............................. 5 
         3.1.2. Inter-subnet Unicast .............................. 6 
      3.2. Multicast/Broadcast .................................... 9 
      3.3. CE Host Discovery ..................................... 10 
      3.4. ARP Proxy ............................................. 10 
      3.5. CE Host Mobility ...................................... 10 
      3.6. Forwarding Table Scalability .......................... 11 
         3.6.1. MAC Table Reduction on Data Center Switches ...... 11 
         3.6.2. FIB Reduction on PE Routers ...................... 11 
         3.6.3. RIB Reduction on PE Routers ...................... 13 
      3.7. ARP Table Scalability on Default Gateways ............. 14 
      3.8. ARP/Unknown Unicast Flood Avoidance ................... 15 
      3.9. Active-active Multi-homing ............................ 15 
      3.10. Path Optimization .................................... 15 
   4. Future Work ................................................ 15 
   5. Security Considerations .................................... 16 
   6. IANA Considerations ........................................ 16 
   7. Acknowledgements ........................................... 16 
   8. References ................................................. 16 
      8.1. Normative References .................................. 16 
      8.2. Informative References ................................ 16 
   Authors' Addresses ............................................ 17 

 
 
1. Introduction 

   For business continuity purposes, Virtual Machine (VM) migration
   across data centers is commonly used in situations such as data
   center maintenance, data center migration, data center
   consolidation, data center expansion, and data center disaster
   avoidance. IP renumbering of servers (i.e., VMs) after the migration
   is usually complex and costly, and would therefore prolong the
   business downtime during the migration. To allow the seamless
   migration of a VM from one data center to another without IP
   renumbering, the subnet on which the VM resides needs to be
   extended across these data centers.

   In Infrastructure-as-a-Service (IaaS) cloud data center
   environments, to achieve subnet extension across multiple data
   centers in a scalable way, the following requirements SHOULD be
   considered for any data center interconnect solution:

   1) VPN Instance Scalability

      In a modern cloud data center environment, thousands or even tens 
      of thousands of tenants could be hosted over a shared network 
      infrastructure. For security and performance isolation 
      considerations, these tenants need to be isolated from one 
      another. Hence, the data center interconnect solution SHOULD be 
      capable of providing a large enough VPN space for tenant 
      isolation.  

   2) Forwarding Table Scalability  

      With the development of virtualization technologies, a single
      cloud data center containing millions of VMs is not uncommon
      today. This number already poses a big challenge for data
      center switches, especially core/aggregation switches, from
      the perspective of forwarding table scalability. If multiple
      data centers of such scale were interconnected at Layer 2, this
      challenge would become even worse. Hence, an ideal data center
      interconnect solution SHOULD prevent the forwarding table size of
      data center switches from growing in multiples as the number of
      data centers to be interconnected increases. Furthermore, if any
      kind of L2VPN or L3VPN technology is used for interconnecting
      data centers, the scale of forwarding tables on PE routers SHOULD
      be taken into consideration as well.

   3) ARP Table Scalability on Default Gateways 

 
 
      [NARTEN-ARMD] notes that the ARP tables maintained by data center 
      default gateways in cloud data centers can raise both scalability 
      and security issues. Therefore, an ideal data center interconnect 
      solution SHOULD prevent the ARP table size from growing by 
      multiples as the number of data centers to be connected increases. 

   4) ARP/Unknown Unicast Flood Suppression or Avoidance  

      It is well known that flooding ARP broadcast and unknown unicast
      traffic within a large Layer 2 network degrades the performance
      of both networks and hosts. When multiple data centers, each
      containing millions of VMs, are interconnected across the Wide
      Area Network (WAN) at Layer 2, the impact of such flooding
      becomes even worse. As such, it becomes increasingly desirable to
      suppress or even avoid the flooding of ARP broadcast and unknown
      unicast traffic across data centers, so as to avoid the
      unnecessary consumption of network bandwidth resources and
      server CPU resources.

   5) Active-active Multi-homing 

      In order to utilize the bandwidth of all available paths between 
      the data center and the transport network in addition to 
      providing resilient connectivity between them, active-active 
      multi-homing is increasingly advocated by data center operators 
      as a replacement for the traditional active-standby multi-homing
      approach.   

   6) Path Optimization 

      A subnet usually indicates a location in the network. However,
      when a subnet has been extended across multiple geographically
      dispersed data center locations, the location semantics of that
      subnet are no longer retained. As a result, traffic from a cloud
      user (i.e., a VPN user) that is destined for a given server
      located at one data center location of such an extended subnet
      may first arrive at another data center location according to
      the subnet route, and then be forwarded to the location where
      the server is actually located. This suboptimal routing
      unnecessarily consumes the bandwidth resources which are
      intended for data center interconnection. Furthermore, in the
      case where the traditional VPLS technology [RFC4761, RFC4762] is
      used for data center interconnect and the default gateways of
      different data center locations are configured within the same
      virtual router redundancy group, the returning traffic from that
      server to the cloud user may be forwarded at Layer 2 to a default
      gateway at one of the remote data center locations, rather than
      the one at the local data center location. This suboptimal
      routing would also unnecessarily consume the bandwidth resources
      which are intended for data center interconnect.

   This document describes a host route based subnet extension solution
   referred to as Virtual Subnet (VS), which can meet all of the
   requirements of cloud data center interconnect described above.
   Since VS mainly reuses existing technologies, including BGP/MPLS IP
   VPN [RFC4364] and ARP proxy [RFC925][RFC1027], it allows service
   providers who offer IaaS cloud services to the public to
   interconnect their geographically dispersed data centers in a much
   more scalable way. More importantly, they can accomplish this
   interconnection on the basis of their existing BGP/MPLS IP VPN
   infrastructures and their years of experience in operating and
   provisioning BGP/MPLS IP VPN services.

   Please note that VS is targeted at scenarios where the traffic
   across data centers is routable IP traffic. In such scenarios, data
   center operators who are implementing data center interconnect can
   benefit from the advantages that are specific to such a host route
   based subnet extension solution, such as MAC table reduction on
   data center switches, ARP table reduction on data center default
   gateways, path optimization for inter-subnet traffic, and so on.

2. Terminology 

   This memo makes use of the terms defined in [RFC4364], [RFC2338],
   [MVPN], and [VA-AUTO].

3. Solution Description 

3.1. Unicast 

   3.1.1. Intra-subnet Unicast 

   As shown in Figure 1, two CE hosts (i.e., Host A and Host B) which
   are configured within the same subnet (i.e., 1.1.1.0/24) are located
   in two different data centers (i.e., DC West and DC East,
   respectively). The PE routers (i.e., PE-1 and PE-2) which are used
   for interconnecting these two data centers create host routes for
   their local CE hosts and then redistribute these routes into BGP.
   Meanwhile, ARP proxy is enabled on the VRF attachment circuits of
   these PE routers.

    

 
 
                          +--------------------+  
    +-----------------+   |                    |   +-----------------+  
    |VPN_A:1.1.1.1/24 |   |                    |   |VPN_A:1.1.1.1/24 |  
    |              \  |   |                    |   |  /              |  
    |    +------+   \++---+-+                +-+---++/   +------+    |  
    |    |Host A+----+ PE-1 |                | PE-2 +----+Host B|    |  
    |    +------+\   ++-+-+-+                +-+-+-++   /+------+    |  
    |     1.1.1.2/24  | | |                    | | |  1.1.1.3/24     |  
    |                 | | |                    | | |                 | 
    |     DC West     | | |  IP/MPLS Backbone  | | |     DC East     | 
    +-----------------+ | |                    | | +-----------------+  
                        | +--------------------+ |   
                        |                        |    
VRF_A :                 V                VRF_A : V                   
+------------+---------+--------+        +------------+---------+--------+  
|   Prefix   | Nexthop |Protocol|        |   Prefix   | Nexthop |Protocol|  
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.1/32 |127.0.0.1| Direct |        | 1.1.1.1/32 |127.0.0.1| Direct |  
+------------+---------+--------+        +------------+---------+--------+ 
| 1.1.1.2/32 | 1.1.1.2 | Direct |        | 1.1.1.2/32 |   PE-1  |  IBGP  |  
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.3/32 |   PE-2  |  IBGP  |        | 1.1.1.3/32 | 1.1.1.3 | Direct |  
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.0/24 | 1.1.1.1 | Direct |        | 1.1.1.0/24 | 1.1.1.1 | Direct |  
+------------+---------+--------+        +------------+---------+--------+ 
                  Figure 1: Intra-subnet Unicast Example 

   Now assume host A sends an ARP request for host B before
   communicating with host B. Upon receiving the ARP request, PE-1,
   acting as an ARP proxy, returns its own MAC address as a response.
   Host A then sends IP packets for host B to PE-1. Following the
   normal L3VPN forwarding procedure, PE-1 tunnels such packets towards
   PE-2, which in turn forwards them to host B. In this way, host A and
   host B can communicate with each other as if they were located
   within the same subnet or Local Area Network (LAN). In fact, such a
   subnet is a virtual subnet which is emulated by using host routes,
   rather than a real subnet.
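
   As an illustration only, the lookup that PE-1 performs on the VRF_A
   table of Figure 1 can be sketched in Python as a longest-prefix
   match. The table contents are copied from Figure 1; the function
   name lookup() and the tuple layout are assumptions made for this
   sketch and are not part of the solution itself.

```python
import ipaddress

# VRF_A on PE-1, copied from Figure 1: (prefix, next hop, protocol).
VRF_A_PE1 = [
    ("1.1.1.1/32", "127.0.0.1", "Direct"),
    ("1.1.1.2/32", "1.1.1.2", "Direct"),
    ("1.1.1.3/32", "PE-2", "IBGP"),
    ("1.1.1.0/24", "1.1.1.1", "Direct"),
]


def lookup(vrf, destination):
    """Return (network, next hop, protocol) of the longest match, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop, protocol in vrf:
        network = ipaddress.ip_network(prefix)
        if addr not in network:
            continue
        # Prefer the most specific (longest) matching prefix.
        if best is None or network.prefixlen > best[0].prefixlen:
            best = (network, next_hop, protocol)
    return best


# Host A (1.1.1.2) sends a packet for host B (1.1.1.3) to PE-1.
route = lookup(VRF_A_PE1, "1.1.1.3")
print(route[1], route[2])  # PE-2 IBGP
```

   For destination 1.1.1.3, the /32 host route learnt via IBGP is more
   specific than the directly connected /24, so PE-1 tunnels the
   packet towards PE-2 instead of treating host B as locally attached.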

   3.1.2. Inter-subnet Unicast 

   As shown in Figure 2, only one data center (i.e., DC East) is
   deployed with a default gateway (i.e., GW). PE-2, which is connected
   to GW, would either be configured with or learn from GW a default
   route whose next hop points to GW, and this route is distributed to
   other PE routers (i.e., PE-1) as per normal [RFC4364] operation.
   Assume host A sends an ARP request for its default gateway (i.e.,
   1.1.1.4) prior to communicating with a destination host outside of
   its subnet (i.e., 1.1.1.0/24). Upon receiving this ARP request,
   PE-1, acting as an ARP proxy, returns its own MAC address as a
   response. Host A then sends a packet for the destination host to
   PE-1. PE-1 forwards this packet towards PE-2 according to the
   default route learnt from PE-2, and PE-2 in turn forwards the packet
   to GW according to its own default route. In contrast, if host B
   sends an ARP request for its default gateway (i.e., 1.1.1.4) prior
   to communicating with a destination host outside of its subnet, it
   will receive an ARP response from GW. As such, the packet destined
   for the destination host will be forwarded directly to GW. Note that
   since the outgoing interface of the best-match route for the target
   host (i.e., 1.1.1.4) is the same as the one over which the ARP
   request arrived, PE-2 would not respond to this ARP request.

                          +--------------------+  
    +-----------------+   |                    |   +-----------------+  
    |VPN_A:1.1.1.1/24 |   |                    |   |VPN_A:1.1.1.1/24 |  
    |              \  |   |                    |   |  /              |  
    |  +------+     \++---+-+                +-+---++/     +------+  |  
    |  |Host A+------+ PE-1 |                | PE-2 +-+----+Host B|  |  
    |  +------+\     ++-+-+-+                +-+-+-++ |   /+------+  |  
    |   1.1.1.2/24    | | |                    | | |  | 1.1.1.3/24   | 
    |   GW=1.1.1.4    | | |                    | | |  | GW=1.1.1.4   | 
    |                 | | |                    | | |  |    +------+  | 
    |                 | | |                    | | |  +----+  GW  +--| 
    |                 | | |                    | | |      /+------+  | 
    |                 | | |                    | | |    1.1.1.4/24   | 
    |                 | | |                    | | |                 | 
    |     DC West     | | |  IP/MPLS Backbone  | | |      DC East    | 
    +-----------------+ | |                    | | +-----------------+  
                        | +--------------------+ |   
                        |                        |    
VRF_A :                 V                VRF_A : V                   
+------------+---------+--------+        +------------+---------+--------+  
|   Prefix   | Nexthop |Protocol|        |   Prefix   | Nexthop |Protocol|  
+------------+---------+--------+        +------------+---------+--------+ 
| 1.1.1.1/32 |127.0.0.1| Direct |        | 1.1.1.1/32 |127.0.0.1| Direct |  
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.2/32 | 1.1.1.2 | Direct |        | 1.1.1.2/32 |  PE-1   |  IBGP  |  
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.3/32 |   PE-2  |  IBGP  |        | 1.1.1.3/32 | 1.1.1.3 | Direct |  
+------------+---------+--------+        +------------+---------+--------+ 
| 1.1.1.4/32 |   PE-2  |  IBGP  |        | 1.1.1.4/32 | 1.1.1.4 | Direct |  
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.0/24 | 1.1.1.1 | Direct |        | 1.1.1.0/24 | 1.1.1.1 | Direct |  
+------------+---------+--------+        +------------+---------+--------+ 
| 0.0.0.0/0  |   PE-2  |  IBGP  |        | 0.0.0.0/0  | 1.1.1.4 | Static |  
+------------+---------+--------+        +------------+---------+--------+ 
                Figure 2: Inter-subnet Unicast Example (1) 
 
 
   As shown in Figure 3, in the case where each data center is
   deployed with a default gateway, CE hosts will get ARP responses
   from their local default gateways, rather than from their local PE
   routers, when sending ARP requests for their default gateways.

                          +--------------------+  
    +-----------------+   |                    |   +-----------------+  
    |VPN_A:1.1.1.1/24 |   |                    |   |VPN_A:1.1.1.1/24 |  
    |              \  |   |                    |   |  /              |  
    |  +------+     \++---+-+                +-+---++/     +------+  |  
    |  |Host A+----+-+ PE-1 |                | PE-2 +-+----+Host B|  |  
    |  +------+\   | ++-+-+-+                +-+-+-++ |   /+------+  |  
    |   1.1.1.2/24 |  | | |                    | | |  | 1.1.1.3/24   | 
    |   GW=1.1.1.4 |  | | |                    | | |  | GW=1.1.1.4   | 
    |  +------+    |  | | |                    | | |  |    +------+  | 
    |--+ GW-1 +----+  | | |                    | | |  +----+ GW-2 +--| 
    |  +------+\      | | |                    | | |      /+------+  | 
    |   1.1.1.4/24    | | |                    | | |    1.1.1.4/24   | 
    |                 | | |                    | | |                 | 
    |     DC West     | | |  IP/MPLS Backbone  | | |      DC East    | 
    +-----------------+ | |                    | | +-----------------+  
                        | +--------------------+ |   
                        |                        |    
VRF_A :                 V                VRF_A : V                   
+------------+---------+--------+        +------------+---------+--------+  
|   Prefix   | Nexthop |Protocol|        |   Prefix   | Nexthop |Protocol|  
+------------+---------+--------+        +------------+---------+--------+ 
| 1.1.1.1/32 |127.0.0.1| Direct |        | 1.1.1.1/32 |127.0.0.1| Direct |  
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.2/32 | 1.1.1.2 | Direct |        | 1.1.1.2/32 |  PE-1   |  IBGP  |  
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.3/32 |   PE-2  |  IBGP  |        | 1.1.1.3/32 | 1.1.1.3 | Direct |  
+------------+---------+--------+        +------------+---------+--------+ 
| 1.1.1.4/32 | 1.1.1.4 | Direct |        | 1.1.1.4/32 | 1.1.1.4 | Direct |  
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.0/24 | 1.1.1.1 | Direct |        | 1.1.1.0/24 | 1.1.1.1 | Direct |  
+------------+---------+--------+        +------------+---------+--------+ 
| 0.0.0.0/0  | 1.1.1.4 | Static |        | 0.0.0.0/0  | 1.1.1.4 | Static |  
+------------+---------+--------+        +------------+---------+--------+ 
                Figure 3: Inter-subnet Unicast Example (2) 

   Alternatively, as shown in Figure 4, PE routers themselves could be 
   directly configured as the default gateways of their locally 
   connected CE hosts as long as these PE routers have routes for the 
   outside networks. 

    

 
 
                                 +------+  
                          +------+ PE-3 +------+ 
    +-----------------+   |      +------+      |   +-----------------+  
    |VPN_A:1.1.1.1/24 |   |                    |   |VPN_A:1.1.1.1/24 |  
    |              \  |   |                    |   |  /              |  
    |  +------+     \++---+-+                +-+---++/     +------+  |  
    |  |Host A+------+ PE-1 |                | PE-2 +------+Host B|  |  
    |  +------+\     ++-+-+-+                +-+-+-++     /+------+  |  
    |   1.1.1.2/24    | | |                    | | |    1.1.1.3/24   | 
    |   GW=1.1.1.1    | | |                    | | |    GW=1.1.1.1   | 
    |                 | | |                    | | |                 | 
    |     DC West     | | |  IP/MPLS Backbone  | | |      DC East    | 
    +-----------------+ | |                    | | +-----------------+  
                        | +--------------------+ |   
                        |                        |    
VRF_A :                 V                VRF_A : V                   
+------------+---------+--------+        +------------+---------+--------+  
|   Prefix   | Nexthop |Protocol|        |   Prefix   | Nexthop |Protocol|  
+------------+---------+--------+        +------------+---------+--------+ 
| 1.1.1.1/32 |127.0.0.1| Direct |        | 1.1.1.1/32 |127.0.0.1| Direct |  
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.2/32 | 1.1.1.2 | Direct |        | 1.1.1.2/32 |  PE-1   |  IBGP  |  
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.3/32 |   PE-2  |  IBGP  |        | 1.1.1.3/32 | 1.1.1.3 | Direct |  
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.0/24 | 1.1.1.1 | Direct |        | 1.1.1.0/24 | 1.1.1.1 | Direct |  
+------------+---------+--------+        +------------+---------+--------+ 
| 0.0.0.0/0  |   PE-3  |  IBGP  |        | 0.0.0.0/0  |   PE-3  |  IBGP  |  
+------------+---------+--------+        +------------+---------+--------+ 
                Figure 4: Inter-subnet Unicast Example (3) 

3.2. Multicast/Broadcast 

   To support IP multicast and broadcast between CE hosts of the same
   virtual subnet, the MVPN technology [MVPN] could be directly reused.
   For example, PE routers attached to a given VPN join a default
   provider multicast distribution tree which is dedicated to that VPN.
   Ingress PE routers, upon receiving multicast or broadcast packets
   from their local CE hosts, forward them towards remote PE routers
   through the corresponding default provider multicast distribution
   tree.

   More details about how to support multicast and broadcast in VS will 
   be explored in a later version of this document. 

 
 
3.3. CE Host Discovery

   PE routers MUST be able to discover their local CE hosts in a timely
   manner, especially after rebooting, and keep the list of local CE
   hosts up to date so as to ensure the availability of the host route
   information. PE routers could accomplish local CE host discovery by
   traditional host discovery means such as ARP scan and/or ICMP scan.
   Furthermore, the Link Layer Discovery Protocol (LLDP) described in
   [802.1AB], the VSI Discovery and Configuration Protocol (VDP)
   described in [802.1Qbg], or even interaction with the data center
   orchestration system could also be considered as a means of local
   CE host discovery.

   More details about local CE host discovery in VS will be explored in 
   a later version of this document. 

3.4. ARP Proxy

   Acting as an ARP proxy, a PE router SHOULD respond to an ARP
   request only if there is a route for the target host in the
   associated VRF and the outgoing interface of that route is
   different from the one over which the ARP request arrived.
   Otherwise, the PE router would not respond.
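
   The ARP proxy rule above can be sketched as follows. This is a
   minimal illustration, not an implementation: the VRF contents
   loosely follow PE-2 in Figure 2 (the default route is omitted for
   brevity), and the interface names "ac0" and "core" as well as the
   function names are hypothetical.

```python
import ipaddress

# Simplified VRF_A on PE-2, loosely following Figure 2. Each entry is
# (prefix, outgoing interface). The interface names are hypothetical:
# "ac0" is the VRF attachment circuit, "core" faces the backbone.
VRF_A_PE2 = [
    ("1.1.1.2/32", "core"),  # host A, learnt from PE-1
    ("1.1.1.3/32", "ac0"),   # host B, locally attached
    ("1.1.1.4/32", "ac0"),   # GW, locally attached
    ("1.1.1.0/24", "ac0"),   # directly connected subnet
]


def best_route(vrf, target_ip):
    """Return (network, interface) of the longest matching route, or None."""
    addr = ipaddress.ip_address(target_ip)
    matches = [
        (ipaddress.ip_network(prefix), interface)
        for prefix, interface in vrf
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)


def should_proxy_arp(vrf, target_ip, arrival_interface):
    """Reply only if a route exists and it points out of an interface
    other than the one on which the ARP request arrived."""
    route = best_route(vrf, target_ip)
    if route is None:
        return False
    return route[1] != arrival_interface


# Host B asks for host A: the best route points to the backbone, so reply.
print(should_proxy_arp(VRF_A_PE2, "1.1.1.2", "ac0"))  # True
# Host B asks for GW: the best route points back out of ac0, so stay silent.
print(should_proxy_arp(VRF_A_PE2, "1.1.1.4", "ac0"))  # False
```

   The second call reproduces the case noted in Section 3.1.2, where
   PE-2 does not answer host B's ARP request for the local gateway.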

   In the scenario where a given VPN site (i.e., a data center) is
   multi-homed to more than one PE router via an Ethernet switch or an
   Ethernet network, VRRP is usually enabled on these PE routers for
   router redundancy purposes. In this case, only the PE router which
   has been elected as the VRRP master is entitled to perform the ARP
   proxy function, and furthermore it SHOULD respond with the virtual
   MAC address, rather than its physical MAC address.

3.5. CE Host Mobility

   After moving from one VPN site to another, a CE host (e.g., a VM)
   will send a gratuitous ARP packet. Upon receiving that packet, the
   PE router attached to the new site will create a host route for that
   CE host and then advertise it to remote PE routers. The PE router to
   which that CE host was previously attached, upon learning such a
   route, would immediately check whether that CE host is still
   connected to it by some means (e.g., ARP PING and/or ICMP PING). If
   not, the PE router would withdraw the corresponding host route which
   it advertised before. Meanwhile, the PE router would broadcast a
   gratuitous ARP packet on behalf of that CE host. As such, the ARP
   entry of that CE host which was cached on any local CE host would be
   updated accordingly.
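
   The sequence of steps above can be sketched as a small simulation.
   The class PE, the function handle_move(), and the still_attached
   callback (standing in for an ARP or ICMP probe) are hypothetical
   names introduced only for this illustration; BGP signalling is
   reduced to strings describing each action.

```python
class PE:
    """Minimal PE router state: the CE hosts it originates host routes for."""

    def __init__(self, name, local_hosts=()):
        self.name = name
        self.local_hosts = set(local_hosts)


def handle_move(host, old_pe, new_pe, still_attached):
    """Apply the mobility steps and return the actions taken, in order.

    still_attached(pe, host) is a hypothetical probe that returns True
    if the host still answers on the given PE's attachment circuit.
    """
    actions = []
    # New PE: receives the gratuitous ARP, creates and advertises a host route.
    new_pe.local_hosts.add(host)
    actions.append(f"{new_pe.name}: advertise {host}/32")
    # Old PE: learns the new route and checks whether the host is still local.
    if host in old_pe.local_hosts and not still_attached(old_pe, host):
        old_pe.local_hosts.discard(host)
        actions.append(f"{old_pe.name}: withdraw {host}/32")
        # Refresh the ARP caches of the CE hosts that remain at the old site.
        actions.append(f"{old_pe.name}: broadcast gratuitous ARP for {host}")
    return actions


pe1 = PE("PE-1", {"1.1.1.2"})
pe2 = PE("PE-2", {"1.1.1.3"})
# Host A (1.1.1.2) moves from DC West to DC East and no longer answers PE-1.
for action in handle_move("1.1.1.2", pe1, pe2, lambda pe, host: False):
    print(action)
```

   If the probe succeeds instead, the old PE router keeps its host
   route and only the advertisement by the new PE router takes place.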

 
 
3.6. Forwarding Table Scalability

   3.6.1. MAC Table Reduction on Data Center Switches 

   In VS, the MAC learning domain associated with a given virtual
   subnet which has been extended across multiple data centers is
   partitioned into segments, and each of the segments is confined
   within a single data center. Therefore, data center switches only
   need to learn local MAC addresses, rather than learning both local
   and remote MAC addresses as required in the case where the
   traditional VPLS technology [RFC4761, RFC4762] is used for data
   center interconnect.
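
   A back-of-the-envelope comparison of the two cases is sketched
   below. The host counts are made-up example values, the function
   name mac_entries() is hypothetical, and the MAC addresses of the PE
   routers and gateways themselves are ignored for simplicity.

```python
def mac_entries(local_hosts, remote_hosts, extension):
    """Worst-case number of host MAC addresses a data center switch
    may have to learn for an extended subnet."""
    if extension == "layer2":
        # Layer 2 extension (e.g., VPLS): local and remote MACs are learnt.
        return local_hosts + remote_hosts
    if extension == "virtual-subnet":
        # VS: MAC learning is confined to the local data center.
        return local_hosts
    raise ValueError(f"unknown extension type: {extension}")


# Example values only: 100,000 local hosts, 200,000 hosts at remote sites.
local, remote = 100_000, 200_000
print(mac_entries(local, remote, "layer2"))          # 300000
print(mac_entries(local, remote, "virtual-subnet"))  # 100000
```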

   3.6.2. FIB Reduction on PE Routers 

                                 +------+  
                          +------+RR/APR+------+ 
    +-----------------+   |      +------+      |   +-----------------+  
    |VPN_A:1.1.1.1/24 |   |                    |   |VPN_A:1.1.1.1/24 |  
    |              \  |   |                    |   |  /              |  
    |  +------+     \++---+-+                +-+---++/     +------+  |  
    |  |Host A+------+ PE-1 |                | PE-2 +------+Host B|  |  
    |  +------+\     ++-+-+-+                +-+-+-++     /+------+  |  
    |   1.1.1.2/24    | | |                    | | |    1.1.1.3/24   | 
    |                 | | |                    | | |                 | 
    |     DC West     | | |  IP/MPLS Backbone  | | |      DC East    | 
    +-----------------+ | |                    | | +-----------------+  
                        | +--------------------+ |   
                        |                        |    
VRF_A :                 V                VRF_A : V                   
+------------+---------+--------+------+ +------------+---------+--------+------+  
|   Prefix   | Nexthop |Protocol|In_FIB| |   Prefix   | Nexthop |Protocol|In_FIB|  
+------------+---------+--------+------+ +------------+---------+--------+------+ 
| 1.1.1.1/32 |127.0.0.1| Direct |  Yes | | 1.1.1.1/32 |127.0.0.1| Direct |  Yes |  
+------------+---------+--------+------+ +------------+---------+--------+------+  
| 1.1.1.2/32 | 1.1.1.2 | Direct |  Yes | | 1.1.1.2/32 |  PE-1   |  IBGP  |  No  | 
+------------+---------+--------+------+ +------------+---------+--------+------+  
| 1.1.1.3/32 |   PE-2  |  IBGP  |  No  | | 1.1.1.3/32 | 1.1.1.3 | Direct |  Yes |  
+------------+---------+--------+------+ +------------+---------+--------+------+  
| 1.1.1.0/25 |    RR   |  IBGP  |  Yes | | 1.1.1.0/25 |    RR   |  IBGP  |  Yes |  
+------------+---------+--------+------+ +------------+---------+--------+------+ 
|1.1.1.128/25|    RR   |  IBGP  |  Yes | |1.1.1.128/25|    RR   |  IBGP  |  Yes |   
+------------+---------+--------+------+ +------------+---------+--------+------+ 
| 1.1.1.0/24 | 1.1.1.1 | Direct |  Yes | | 1.1.1.0/24 | 1.1.1.1 | Direct |  Yes |   
+------------+---------+--------+------+ +------------+---------+--------+------+ 
                      Figure 5: FIB Reduction Example 
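
   As a rough illustration of how the In_FIB column of Figure 5 affects
   forwarding on PE-1, the following Python sketch performs a
   longest-prefix match over only those routes that are installed in
   the FIB. The table is copied from Figure 5; the function name
   fib_lookup() and the tuple layout are assumptions of this sketch.

```python
import ipaddress

# VRF_A on PE-1, copied from Figure 5: (prefix, next hop, installed in FIB).
RIB_PE1 = [
    ("1.1.1.1/32", "127.0.0.1", True),
    ("1.1.1.2/32", "1.1.1.2", True),
    ("1.1.1.3/32", "PE-2", False),   # remote host route, kept out of the FIB
    ("1.1.1.0/25", "RR", True),
    ("1.1.1.128/25", "RR", True),
    ("1.1.1.0/24", "1.1.1.1", True),
]


def fib_lookup(rib, destination):
    """Longest-prefix match restricted to routes installed in the FIB.

    Returns the next hop of the best match, or None if nothing matches.
    """
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop, in_fib in rib:
        network = ipaddress.ip_network(prefix)
        if not in_fib or addr not in network:
            continue
        if best is None or network.prefixlen > best[0].prefixlen:
            best = (network, next_hop)
    return None if best is None else best[1]


print(fib_lookup(RIB_PE1, "1.1.1.3"))  # RR
print(fib_lookup(RIB_PE1, "1.1.1.2"))  # 1.1.1.2
```

   Because the host route for 1.1.1.3 is not installed in the FIB,
   traffic from PE-1 to host B follows the covering 1.1.1.0/25 route
   towards the RR, while the locally attached host A is still reached
   through its own host route.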

 
 
   To reduce the FIB size of PE routers, the Virtual Aggregation (VA)
   [VA-AUTO] technology can be used here. Taking the VPN instance A
   shown in Figure 5 as an example, the procedures of FIB reduction are
   as follows:

   1) Multiple more-specific prefixes (e.g., 1.1.1.0/25 and
      1.1.1.128/25) that together cover the prefix of the virtual
      subnet (i.e., 1.1.1.0/24) are configured as Virtual Prefixes
      (VPs), and a Route Reflector (RR) is configured as the
      Aggregation Point Router (APR) for these VPs. PE routers, as RR
      clients, advertise host routes for their local CE hosts to the
      RR. The RR, as an APR, installs those host routes into its FIB
      and attaches the "can-suppress" tag to them before reflecting
      them to its clients. VA-aware clients do not install host routes
      carrying that tag into their FIBs, since they are not APRs for
      those host routes. In addition, the RR, as an APR, advertises
      the corresponding VP routes to all of its clients, and the
      VA-aware clients install these VP routes into their FIBs. Upon
      receiving a packet from a local CE host, if no matching host
      route is found, the ingress PE router forwards the packet to the
      RR according to one of the VP routes learnt from the RR; the RR
      in turn forwards the packet to the egress PE router according to
      the host route learnt from that egress PE router. In short, the
      FIB size of PE routers can be greatly reduced at the cost of
      path stretch. Note that when the RR is not available for
      forwarding L3VPN traffic between PE routers, the APR function
      could be performed by a PE router other than the RR, as long as
      that PE router has installed all host routes belonging to the
      virtual subnet into its FIB. In this case, the RR only needs to
      attach the "can-suppress" tag to the host routes learnt from its
      clients before reflecting them to the other clients.
      Furthermore, PE routers themselves could attach the
      "can-suppress" tag to the host routes for their local CE hosts
      before distributing them to remote peers.

   2) When a local CE host sends an ARP request for a remote CE host,
      the ingress PE router receiving the request immediately installs
      the host route for that remote CE host into its FIB, provided
      that such a host route exists in the RIB and has not yet been
      installed into the FIB. Subsequent packets destined for that
      remote CE host are then forwarded directly to the egress PE
      router. Note that FIB entries for remote host routes would
      expire if they have not been used for forwarding packets for a
      certain period of time.
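
   The two steps above can be illustrated with a minimal Python
   sketch. It is not part of [VA-AUTO] or of any implementation; the
   PeRouter class, its method names and the boolean can_suppress flag
   are assumptions made for illustration only, and FIB entry
   expiration is not modelled. Routes tagged "can-suppress" stay in
   the RIB only, and an ARP request for a remote CE host triggers the
   installation of the matching host route into the FIB.

```python
import ipaddress


class PeRouter:
    """Toy VA-aware PE router with a separate RIB and FIB (illustrative)."""

    def __init__(self):
        self.rib = {}  # prefix -> (next_hop, can_suppress)
        self.fib = {}  # prefix -> next_hop

    def learn(self, prefix, next_hop, can_suppress=False):
        """Store a route in the RIB; tagged routes stay out of the FIB."""
        net = ipaddress.ip_network(prefix)
        self.rib[net] = (next_hop, can_suppress)
        if not can_suppress:
            self.fib[net] = next_hop

    def on_arp_request(self, target_ip):
        """Step 2: install the suppressed host route for the ARP target."""
        host = ipaddress.ip_network(f"{target_ip}/32")
        if host in self.rib and host not in self.fib:
            self.fib[host] = self.rib[host][0]

    def lookup(self, dst_ip):
        """Longest-prefix match over the FIB; returns a next hop or None."""
        addr = ipaddress.ip_address(dst_ip)
        matches = [net for net in self.fib if addr in net]
        if not matches:
            return None
        return self.fib[max(matches, key=lambda net: net.prefixlen)]


# Routes of PE-1 in Figure 5 (the directly connected /24 is omitted).
pe1 = PeRouter()
pe1.learn("1.1.1.2/32", "1.1.1.2")                   # local CE host
pe1.learn("1.1.1.3/32", "PE-2", can_suppress=True)   # tagged by the RR
pe1.learn("1.1.1.0/25", "RR")                        # VP routes from the APR
pe1.learn("1.1.1.128/25", "RR")

before = pe1.lookup("1.1.1.3")   # "RR": stretched path via the APR
pe1.on_arp_request("1.1.1.3")
after = pe1.lookup("1.1.1.3")    # "PE-2": direct path to the egress PE
```

   In this sketch the first lookup follows the VP route towards the
   RR, and the lookup after the ARP request follows the host route
   towards PE-2, which mirrors the trade-off between FIB size and
   path stretch described above.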

 
 
   3.6.3. RIB Reduction on PE Routers 

                                 +------+  
                          +------+  RR  +------+ 
    +-----------------+   |      +------+      |   +-----------------+  
    |VPN_A:1.1.1.1/24 |   |                    |   |VPN_A:1.1.1.1/24 |  
    |              \  |   |                    |   |  /              |  
    |  +------+     \++---+-+                +-+---++/     +------+  |  
    |  |Host A+------+ PE-1 |                | PE-2 +------+Host B|  |  
    |  +------+\     ++-+-+-+                +-+-+-++     /+------+  |  
    |   1.1.1.2/24    | | |                    | | |    1.1.1.3/24   | 
    |                 | | |                    | | |                 | 
    |     DC West     | | |  IP/MPLS Backbone  | | |      DC East    | 
    +-----------------+ | |                    | | +-----------------+  
                        | +--------------------+ |   
                        |                        |    
VRF_A :                 V                VRF_A : V                   
+------------+---------+--------+        +------------+---------+--------+  
|   Prefix   | Nexthop |Protocol|        |   Prefix   | Nexthop |Protocol|  
+------------+---------+--------+        +------------+---------+--------+ 
| 1.1.1.1/32 |127.0.0.1| Direct |        | 1.1.1.1/32 |127.0.0.1| Direct |  
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.2/32 | 1.1.1.2 | Direct |        | 1.1.1.3/32 | 1.1.1.3 | Direct |  
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.0/25 |    RR   |  IBGP  |        | 1.1.1.0/25 |    RR   |  IBGP  |  
+------------+---------+--------+        +------------+---------+--------+ 
|1.1.1.128/25|    RR   |  IBGP  |        |1.1.1.128/25|    RR   |  IBGP  | 
+------------+---------+--------+        +------------+---------+--------+  
| 1.1.1.0/24 | 1.1.1.1 | Direct |        | 1.1.1.0/24 | 1.1.1.1 | Direct |  
+------------+---------+--------+        +------------+---------+--------+  
                      Figure 6: RIB Reduction Example 

   To reduce the RIB size of PE routers, the BGP Outbound Route
   Filtering (ORF) mechanism is used to realize on-demand route
   announcement. Taking VPN instance A shown in Figure 6 as an
   example, the RIB reduction procedure is as follows:

   1) PE routers, as RR clients, advertise host routes for their local
      CE hosts to an RR. By default, the RR does not reflect these
      host routes unless it receives explicit ORF requests for them
      from its clients. The RR is configured with routes for
      more-specific subnets (e.g., 1.1.1.0/25 and 1.1.1.128/25) that
      together cover the virtual subnet (i.e., 1.1.1.0/24), with their
      next hops pointing to Null0, and advertises these routes to its
      clients via BGP. Upon receiving a packet from a local CE host,
      if no matching host route is found, the ingress PE router
      forwards the packet to the RR according to one of the subnet
      routes learnt from the RR; the RR in turn forwards the packet to
      the egress PE router according to the host route learnt from
      that egress PE router. In short, the RIB size of PE routers can
      be greatly reduced at the cost of path stretch. As with the
      approach described in Section 3.6.2, when the RR is not
      available for forwarding L3VPN traffic between PE routers, a PE
      router other than the RR could advertise the more-specific
      subnet routes, as long as that PE router has installed all host
      routes belonging to that virtual subnet into its FIB.

   2) When a local CE host sends an ARP request for a remote CE host,
      the ingress PE router receiving the request requests the
      corresponding host route from its RR by using ORF (e.g., a group
      ORF containing Route-Target (RT) and prefix information) if
      there is no host route for that CE host in its RIB yet. Once the
      host route for the remote CE host is learnt from the RR,
      subsequent packets destined for that CE host are forwarded
      directly to the egress PE router. Note that RIB entries for
      remote host routes could expire if they have not been used for
      forwarding packets for a certain period of time. When the
      expiration time of a given RIB entry is approaching, the PE
      router notifies its RR, by sending an ORF message, to withdraw
      the corresponding host route. Upon receiving the corresponding
      withdraw message from its RR, the PE router deletes that host
      route from its RIB.
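
   The on-demand behaviour described in the two steps above can be
   sketched in Python as follows. This is an illustration under
   simplifying assumptions, not a BGP ORF implementation: the
   RouteReflector and PeRouter classes, the orf_request() method and
   the 300-second idle timeout are all hypothetical, the subnet routes
   advertised by the RR are reduced to a fallback next hop of "RR",
   and the ORF withdraw exchange is collapsed into a local deletion.

```python
import ipaddress


class RouteReflector:
    """Toy RR that holds all host routes but reflects them only on request."""

    def __init__(self):
        self.host_routes = {}  # host prefix -> egress PE

    def advertise(self, prefix, egress_pe):
        """A client PE advertises a host route for a local CE host."""
        self.host_routes[ipaddress.ip_network(prefix)] = egress_pe

    def orf_request(self, prefix):
        """Return the egress PE for the requested host route, or None."""
        return self.host_routes.get(ipaddress.ip_network(prefix))


class PeRouter:
    """Toy PE router that pulls remote host routes on demand."""

    def __init__(self, rr, idle_timeout=300.0):
        self.rr = rr
        self.idle_timeout = idle_timeout  # assumed value, in seconds
        self.rib = {}  # host prefix -> (egress PE, last-used timestamp)

    def on_arp_request(self, target_ip, now):
        """Step 2: request the host route from the RR if the RIB lacks it."""
        host = ipaddress.ip_network(f"{target_ip}/32")
        if host not in self.rib:
            egress = self.rr.orf_request(str(host))
            if egress is not None:
                self.rib[host] = (egress, now)

    def next_hop(self, dst_ip, now):
        """Use the host route if present, else follow the subnet route."""
        host = ipaddress.ip_network(f"{dst_ip}/32")
        if host in self.rib:
            egress, _ = self.rib[host]
            self.rib[host] = (egress, now)  # refresh the idle timer
            return egress
        return "RR"

    def expire(self, now):
        """Delete host routes that have been idle longer than the timeout."""
        idle = [h for h, (_, used) in self.rib.items()
                if now - used > self.idle_timeout]
        for host in idle:
            del self.rib[host]


rr = RouteReflector()
rr.advertise("1.1.1.3/32", "PE-2")
pe1 = PeRouter(rr)

first = pe1.next_hop("1.1.1.3", now=0.0)     # no host route yet
pe1.on_arp_request("1.1.1.3", now=1.0)
second = pe1.next_hop("1.1.1.3", now=2.0)    # host route learnt on demand
pe1.expire(now=1000.0)
third = pe1.next_hop("1.1.1.3", now=1000.0)  # host route has expired
```

   In this sketch the first and third lookups fall back to the RR,
   while the second one, made after the ARP-triggered request, goes
   directly to PE-2.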

   3.7. ARP Table Scalability on Default Gateways 

   In the case where data center default gateway functions are
   implemented on PE routers of the VS as shown in Figure 4, the ARP
   table on each PE router only needs to contain ARP entries for local
   CE hosts. The ARP table size therefore does not grow as the number
   of connected data centers increases.

   Alternatively, dedicated default gateways may be directly connected
   to PE routers of the VS as shown in Figure 3. In that case, due to
   the use of ARP proxy on PE routers, all remote CE hosts of a given
   virtual subnet share the same MAC address (i.e., the MAC address of
   the local PE router) from the point of view of the default
   gateways. Therefore, the ARP entries of those remote CE hosts could
   be aggregated into one ARP entry (i.e., 1.1.1.0/24 -> the MAC
   address of the PE router). Accordingly, default gateways are
   required to use a longest-match algorithm for ARP cache lookup
   instead of the existing exact-match algorithm. In this way, the
   ARP table size of DC gateways can be greatly reduced as well.
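
   A longest-match ARP cache lookup of the kind required above might
   look like the following Python sketch. The arp_lookup() function,
   the table layout and the MAC addresses are hypothetical and are
   used only to illustrate how one aggregated entry can stand in for
   all remote CE hosts while exact entries for local CE hosts still
   take precedence.

```python
import ipaddress


def arp_lookup(arp_table, ip):
    """Return the MAC of the longest matching ARP entry, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, mac in arp_table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, mac)
    return best[1] if best else None


# Hypothetical MAC addresses, chosen only for illustration.
arp_table = {
    "1.1.1.2/32": "00:00:5e:00:53:02",  # local CE host, exact entry
    "1.1.1.0/24": "00:00:5e:00:53:fe",  # aggregated entry: local PE router
}

local_mac = arp_lookup(arp_table, "1.1.1.2")   # matches the /32 entry
remote_mac = arp_lookup(arp_table, "1.1.1.3")  # falls back to the /24 entry
```

   With this table, a lookup for the local CE host returns its own
   MAC address, whereas a lookup for any remote CE host on the
   virtual subnet returns the MAC address of the PE router.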

 
 
   3.8. ARP/Unknown Unicast Flood Avoidance 

   In VS, the flooding domain associated with a given virtual subnet
   that has been extended across multiple data centers is partitioned
   into segments, each of which is confined within a single data
   center. Therefore, the performance impact on networks and servers
   caused by the flooding of ARP broadcast and unknown unicast traffic
   is alleviated.

   3.9. Active-active Multi-homing 

   For PE router redundancy, a VPN site could be multi-homed to more
   than one PE router. In this case, VRRP [RFC2338] SHOULD be enabled
   on these PE routers, and only the PE router that has been elected
   as the VRRP master performs the ARP proxy function. However, all
   PE routers, whether VRRP master or VRRP backup, are allowed to
   advertise host routes for their local CE hosts. Hence, from the
   perspective of remote PE routers, there are multiple host routes
   for a given CE host located within that multi-homed site. In other
   words, active-active multi-homing is available for the inbound
   traffic of a given multi-homed site.

   3.10. Path Optimization 

   Taking the scenario shown in Figure 4 as an example, to optimize
   the forwarding path for traffic between enterprise sites (e.g.,
   cloud users) and cloud data centers, the PE routers located at
   cloud data centers (i.e., PE-1 and PE-2), which also act as data
   center default gateways, could propagate host routes for their
   local CE hosts to the remote PE routers attached to enterprise
   sites (i.e., PE-3). As a result, traffic from enterprise sites to
   a given server on a virtual subnet that has been extended across
   data centers is forwarded directly to the data center in which
   that server is actually located, since the traffic is now
   forwarded on the basis of the host route for that server rather
   than the subnet route. Furthermore, for traffic from the cloud
   data centers to enterprise sites, each PE router acting as a
   default gateway forwards the traffic received from its local CE
   hosts directly to the remote PE routers (i.e., PE-3) according to
   the best-match route in the corresponding VRF. As a result, the
   traffic from data centers to enterprise sites is forwarded along
   the optimal path without consuming data center interconnect
   bandwidth.
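
   The effect of propagating host routes to PE-3 can be shown with a
   small Python sketch of a best-match lookup. The best_match()
   function and both VRF tables are assumptions made for
   illustration; the host addresses are borrowed from Figures 5 and
   6, and the subnet route is assumed to point at PE-1.

```python
import ipaddress


def best_match(vrf, dst_ip):
    """Return the next hop of the longest matching prefix in the VRF."""
    addr = ipaddress.ip_address(dst_ip)
    matching = []
    for prefix, next_hop in vrf.items():
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matching.append((net.prefixlen, next_hop))
    if not matching:
        return None
    return max(matching, key=lambda item: item[0])[1]


# VRF on PE-3 with only the subnet route (assumed to point at PE-1).
subnet_only = {"1.1.1.0/24": "PE-1"}

# VRF on PE-3 after PE-1 and PE-2 propagate their host routes.
with_host_routes = {
    "1.1.1.0/24": "PE-1",
    "1.1.1.2/32": "PE-1",
    "1.1.1.3/32": "PE-2",
}

via_subnet = best_match(subnet_only, "1.1.1.3")     # detours through PE-1
via_host = best_match(with_host_routes, "1.1.1.3")  # goes straight to PE-2
```

   With the subnet route alone, traffic for the server at 1.1.1.3 is
   sent to PE-1 and would then have to cross the data center
   interconnect; with the host routes present, the same lookup
   selects PE-2 directly.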

4. Future Work 

   How to support IPv6 CE hosts in VS is for future study. 

 
 
5. Security Considerations 

   TBD. 

6. IANA Considerations 

   There is no requirement for IANA.  

7. Acknowledgements 

   Thanks to Dino Farinacci, Himanshu Shah, Nabil Bitar, Giles Heron, 
   Ronald Bonica, Monique Morrow and Christian Jacquenet for their 
   valuable comments and suggestions on this document. 

8. References 

8.1. Normative References 

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate               
             Requirement Levels", BCP 14, RFC 2119, March 1997. 

8.2. Informative References 

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

   [MVPN] Rosen, E. and R. Aggarwal, "Multicast in MPLS/BGP IP VPNs",
             draft-ietf-l3vpn-2547bis-mcast-10.txt, Work in Progress,
             January 2010.

   [VA-AUTO] Francis, P., Xu, X., Ballani, H., Jen, D., Raszuk, R., and         
             L. Zhang, "Auto-Configuration in Virtual Aggregation", 
             draft-ietf-grow-va-auto-05.txt, Work in Progress, December 
             2011.  

   [RFC925] Postel, J., "Multi-LAN Address Resolution", RFC 925, USC
             Information Sciences Institute, October 1984.

   [RFC1027] Carl-Mitchell, S. and J. Quarterman, "Using ARP to
             Implement Transparent Subnet Gateways", RFC 1027, October
             1987.

   [RFC2338] Knight, S., et al., "Virtual Router Redundancy Protocol",          
             RFC 2338, April 1998. 

 
 
   [RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service          
             (VPLS) Using BGP for Auto-Discovery and Signaling", RFC            
             4761, January 2007. 

   [RFC4762] Lasserre, M. and V. Kompella, "Virtual Private LAN Service         
             (VPLS) Using Label Distribution Protocol (LDP) Signaling",         
             RFC 4762, January 2007. 

   [802.1AB] IEEE Standard 802.1AB-2009, "Station and Media Access 
             Control Connectivity Discovery", September 17, 2009.     

   [802.1Qbg] IEEE Draft Standard P802.1Qbg/D2.0, "Virtual Bridged 
             Local Area Networks -Amendment XX: Edge Virtual Bridging", 
             Work in Progress, December 1, 2011. 

   [NARTEN-ARMD] Narten, T., Karir, M., and I. Foo, "Problem Statement 
             for ARMD", draft-ietf-armd-problem-statement-01.txt, Work 
             in Progress, February 2012. 

Authors' Addresses 

   Xiaohu Xu 
   Huawei Technologies, 
   Beijing, China. 
   Phone: +86 10 60610041 
   Email: xuxiaohu@huawei.com 
    
   Susan Hares 
   Huawei Technologies (FutureWei group) 
   2330 Central Expressway 
   Santa Clara, CA 95050 
   Phone: +1-734-604-0332 
   Email: Susan.Hares@huawei.com 

   Yongbing Fan
   Guangzhou Institute, China Telecom
   Guangzhou, China.
   Phone: +86 20 38639121
   Email: fanyb@gsta.com
