Network Working Group G. Chen
Internet-Draft H. Deng
Intended status: Informational B. Zhou
Expires: January 7, 2010 CMCC, Inc.
M. Xu
D. Huo
Y. Cao
Tsinghua University
July 6, 2009
An Incremental Deployable Mapping Service for Scalable Routing
Architecture
draft-chen-lisp-er-mo-00
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on January 7, 2010.
Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document.
Chen, et al. Expires January 7, 2010 [Page 1]
Internet-Draft ER+MO July 2009
Abstract
This document describes a mechanism for providing a mapping service
for LISP-like architectures. The mapping service comprises an EID
Router (ER) mechanism and a supplementary DHT Mapping Overlay (MO).
The ER mechanism tunnels packets for which the ITR has no cached
mapping, while the DHT MO supplies specific mappings that reduce the
tunneling cost. The mechanism can be deployed flexibly by ISPs
because it costs little and can be rolled out incrementally.
Table of Contents
1. Introduction
2. Definition of Terms
3. Overview
4. When an ITR meets packets
5. Utilization of BGP advertising
5.1. Automatic Mapping obtainment and storage
5.2. Mapping propagation by BGP
6. EID Router mechanism
6.1. Address aggregation policy
6.2. EID Router
6.3. When an ER meets packets
7. Supplementary DHT Mapping Overlay (MO)
7.1. Mapping Node (MN) and Mapping Server (MS)
7.2. MNID Assignment and K-bucket Table
7.3. LOOKUP Process
7.4. Security Consideration of Mapping Storage
7.5. Self-adaptive Capability
7.6. Dynamic Adjustment of K value and m value
7.7. Mapping Storing and Exchanging in Multi-homing Scenario
8. Incremental Deployment
9. Acknowledgements
10. Security Considerations
11. IANA Considerations
12. References
12.1. Normative References
12.2. Informative References
Authors' Addresses
1. Introduction
LISP [I-D.farinacci-lisp] is an architecture for scalable routing.
It defines two address spaces: Routing Locators (RLOCs) and Endpoint
Identifiers (EIDs). LISP uses EIDs as lookup keys into a new
EID-to-RLOC mapping database, and several mapping services have been
built on this basis, such as [I-D.fuller-lisp-alt] and
[I-D.meyer-lisp-cons]. In these mapping services, different kinds of
overlays are designed and built as databases that store mapping
information and answer mapping queries.
The problem they share is that a packet for which the current ITR has
no cached mapping must either wait for the reply to a mapping lookup
query or simply be dropped by the ITR.
One solution to this problem is to send the data packet itself into
the Mapping Overlay (MO) instead of a lookup query, so that the MO
forwards the packet to the final ETR attached to the site in which
the destination EID resides. However, a packet that travels through
the MO usually suffers long latency.
In this draft we describe a mechanism that aims:
o to eliminate all forwarding entries to distant customer ASes in P
routers;
o to eliminate, in the border routers, the forwarding entries to
distant customer ASes that are not behind those border routers;
o to be deployed incrementally;
o to help reduce the number of tunnels.
2. Definition of Terms
Mapping: EID-to-RLOC mapping.
EID Router (ER): a newly introduced router which keeps entries to all
EIDs.
EID aggregated prefix: an aggregated prefix which covers some EID
blocks.
EID+RLOC aggregated prefix: an aggregated prefix which covers EID
blocks and RLOCs.
Mapping Node (MN): an entity used for storing a mapping. Each MN
holds and can only hold one mapping, and each mapping is related
to only one MN.
Master Mapping Node (MMN): a Mapping Node chosen as the
representative among redundant MNs. It is in charge of initiating
mapping queries and exchanging mappings.
Mapping Server (MS): a server that physically stores mappings. Each
MS can hold more than one Mapping Node.
Mapping Overlay (MO): a DHT overlay based on the Kademlia protocol,
designed for storing the distributed mapping information. Only one
MO exists among the ISPs in the Internet.
3. Overview
To achieve the four aims in Section 1, the mechanism described in
this draft comprises the following two parts:
o an EID Router (ER) mechanism for tunneling non-cached packets, and
o a DHT Mapping Overlay (MO) as a supplement, which provides specific
mappings to reduce tunneling cost.
The EID Router mechanism is designed for the first three aims, and
the DHT MO is designed for the last aim.
In the EID Router mechanism, the default route is set, manually or
automatically, to an ER (each AS has at least one ER). This
eliminates all forwarding entries to distant customer ASes in P
routers, and part of the forwarding entries (those to distant
customer ASes not behind the border routers) in the border routers.
The Border Gateway Protocol [RFC4271] as deployed today is used to
propagate mappings through the existing BGP speaking system. The
main reason for reusing the existing BGP speaking system is to keep
the deployment backward compatible, so that incremental deployment
can be achieved.
The DHT Mapping Overlay helps reduce the number of tunnels that
result from deploying the ER mechanism. It is optional for ISPs and
requires only a small investment.
4. When an ITR meets packets
When an ITR receives a packet originating from a customer site, it
first checks whether a copy of the mapping exists in its cache.
If the mapping exists, the ITR encapsulates the packet in a LISP
header, using the RLOC extracted from the mapping as the outer
destination address and one of the ITR's own RLOCs as the outer
source address.
Otherwise, on a cache miss (i.e., no relevant copy of the mapping
exists in the ITR), two concurrent events occur:
o Data Plane Traffic: the packet simply follows a default route,
preset manually or automatically, to an ER in the current AS.
Since the ER knows the whole global mapping information, it can
forward every packet to the right ETR by encapsulating the packet
in a LISP header with the ITR's RLOC as the outer source address
and the ETR's RLOC as the outer destination address.
o Control Plane Traffic: the ITR sends a Mapping Query to its
default Mapping Server (MS) in the AS. A mapping LOOKUP process
(detailed in Section 7) is then launched in the Mapping Overlay
(MO) by the Master Mapping Node (MMN) of the ITR. Finally, a copy
of the queried mapping is returned to the ITR that initiated the
Mapping Query and is cached for a period of time.
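The cache-hit and cache-miss handling described in this section can
be sketched as follows. This is only an illustration under stated
assumptions, not part of the mechanism itself: the class name Itr, its
methods, and the tuple-shaped action records are hypothetical, the
map cache is modelled as a plain dictionary keyed by EID prefix, and
cache expiry is left out.

```python
import ipaddress


class Itr:
    """Hypothetical ITR model: a map cache plus a default ER and MS."""

    def __init__(self, rloc, er_addr, ms_addr):
        self.rloc = rloc          # one of this ITR's own RLOCs
        self.er_addr = er_addr    # default-route target on a cache miss
        self.ms_addr = ms_addr    # default Mapping Server in this AS
        self.cache = {}           # ip_network (EID prefix) -> RLOC string

    def lookup(self, dst):
        """Longest-prefix match of dst against the cached EID prefixes."""
        addr = ipaddress.ip_address(dst)
        hits = [p for p in self.cache if addr in p]
        if not hits:
            return None
        return self.cache[max(hits, key=lambda p: p.prefixlen)]

    def handle_packet(self, dst):
        """Return the actions taken for a packet destined to dst."""
        rloc = self.lookup(dst)
        if rloc is not None:
            # Cache hit: outer source is our RLOC, outer destination
            # is the RLOC taken from the mapping.
            return [("encapsulate", self.rloc, rloc)]
        # Cache miss: the two events are concurrent in the text; they
        # are listed one after the other here only for simplicity.
        return [
            ("forward_to_er", self.er_addr),        # data plane
            ("mapping_query", self.ms_addr, dst),   # control plane
        ]

    def on_mapping_reply(self, eid_prefix, rloc):
        """Cache a mapping returned by the MO (expiry is omitted)."""
        self.cache[ipaddress.ip_network(eid_prefix)] = rloc
```

In this sketch the first packet to a destination takes the ER path
and triggers a Mapping Query; once on_mapping_reply has been called,
later packets to the same EID prefix are encapsulated by the ITR
directly.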
5. Utilization of BGP advertising
BGP is an inter-Autonomous System routing protocol. The primary
function of a BGP speaking system is to exchange network reachability
information with other BGP systems. This network reachability
information includes information on the list of ASes that
reachability information traverses. This information is sufficient
for constructing a graph of AS connectivity for this reachability,
and it is also essential for constructing the mappings from EIDs
onto RLOCs automatically. Moreover, incremental deployment requires
that ASes which have deployed the new mechanism work alongside those
which have not, so it is necessary to apply BGP to the mapping
service.
BGP has two functions in the mapping service: to obtain the
mappings automatically, and to propagate mappings to ERs in other
ASes.
5.1. Automatic Mapping obtainment and storage
When a customer AS advertises a BGP UPDATE message to a provider AS
to which it is homed (whether single-homed or multi-homed) and which
has deployed the DHT mapping server described in Section 7, the
provider AS sets or updates the relevant mapping information
according to the advertised route to the customer AS. The announced
prefix is treated as the EID in the mapping <EID, RLOC>, and the
address of the ETR which directly receives the BGP announcement from
the customer AS is chosen as the RLOC.
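As a rough illustration of this rule, the sketch below derives
<EID, RLOC> mappings from the prefixes carried in one UPDATE. It
assumes the UPDATE has already been parsed into a list of prefix
strings; the names Mapping, mapping_from_update, and apply_update are
hypothetical, and the mapping table is modelled as a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    eid_prefix: str   # prefix announced by the customer AS
    rloc: str         # ETR that directly received the announcement


def mapping_from_update(announced_prefixes, receiving_etr_addr):
    """Build one <EID, RLOC> mapping per announced prefix."""
    return [Mapping(p, receiving_etr_addr) for p in announced_prefixes]


def apply_update(table, announced_prefixes, receiving_etr_addr):
    """Set or update the mappings held by the provider AS.

    table maps an EID prefix to its RLOC; an existing entry for the
    same prefix is overwritten, which models the "update" case.
    """
    for m in mapping_from_update(announced_prefixes, receiving_etr_addr):
        table[m.eid_prefix] = m.rloc
    return table
```

A first announcement of 162.137.2.0/24 received by the ETR
134.121.3.56 creates the entry; a later announcement of the same
prefix received by a different ETR replaces its RLOC.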
This mapping can be stored in both an MN (Mapping Node) and an ER
(EID Router) concurrently. In the former case, one mapping
corresponds to one MN and vice versa, as described in Section 7. In
the latter case, the mapping is not only stored in the ER of the
current provider AS, but also propagated to distant provider ASes by
BGP advertisements and stored in the ERs of those ASes.
5.2. Mapping propagation by BGP
BGP speakers work as they do today, with the addition that mapping
information is attached to BGP UPDATE messages. Each BGP speaker on
the route SHALL keep the mappings in their original form (i.e., the
mappings stay untouched during propagation), except when it
aggregates some prefixes into one. A new mapping SHOULD be formed
when such aggregation occurs, in which case both the EID and the
RLOC in the mapping <EID, RLOC> are updated: the EID is set to the
new aggregated EID block, which covers more prefixes, while the RLOC
is set to the address of either the ER (if an ER is deployed) or the
border router (if no ER is deployed) in the current AS.
Note that since aggregation is permitted during mapping
propagation, the number of mappings stored on the ERs would be far
smaller than the number of mappings stored in the MO.
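The rule for forming a new mapping on aggregation can be sketched as
below. The function name aggregate_mappings and its parameters are
hypothetical. The sketch also assumes that the speaker aggregates
whenever adjacent prefixes can be merged; in practice, whether to
aggregate is a policy decision of the AS.

```python
import ipaddress


def aggregate_mappings(mappings, er_addr=None, border_addr=None):
    """Return the mappings to propagate after optional aggregation.

    mappings is a list of (eid_prefix, rloc) string pairs learned
    from downstream.
    """
    original = {ipaddress.ip_network(eid): rloc for eid, rloc in mappings}
    # RLOC of a newly formed mapping: the ER if one is deployed,
    # otherwise the border router of the current AS.
    new_rloc = er_addr if er_addr is not None else border_addr
    result = []
    for net in ipaddress.collapse_addresses(original.keys()):
        if net in original:
            # Not aggregated: the mapping is passed on untouched.
            result.append((str(net), original[net]))
        else:
            # Aggregated: both EID and RLOC are rewritten.
            result.append((str(net), new_rloc))
    return result
```

With two adjacent /24 mappings and one unrelated /24 mapping as
input, the two adjacent ones are replaced by a single /23 mapping
that points at the ER, while the unrelated mapping keeps its original
RLOC.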
6. EID Router mechanism
6.1. Address aggregation policy
All addresses from edge customer ASes can be seen as EIDs. EID
prefixes can be aggregated into an EID aggregated prefix. Moreover,
we allow EIDs to be aggregated with RLOCs into an EID+RLOC
aggregated prefix.
For example, suppose two EID blocks 166.111.8/24 and 166.111.9/24
belong to two customer ASes homed to a provider AS whose RLOCs range
from 166.111.10/24 to 166.111.11/24. The provider AS can aggregate
either to an EID aggregated prefix 166.111.8/23 or to an EID+RLOC
aggregated prefix 166.111.8/22.
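The arithmetic of this example can be checked with Python's standard
ipaddress module. The variable names are illustrative only, and the
prefixes are written in full dotted-quad form because the module does
not accept the abbreviated 166.111.8/24 notation.

```python
import ipaddress

eid_blocks = [
    ipaddress.ip_network("166.111.8.0/24"),
    ipaddress.ip_network("166.111.9.0/24"),
]
rloc_blocks = [
    ipaddress.ip_network("166.111.10.0/24"),
    ipaddress.ip_network("166.111.11.0/24"),
]

# Merging only the EID blocks gives the EID aggregated prefix.
eid_aggregate = list(ipaddress.collapse_addresses(eid_blocks))

# Merging the EID and RLOC blocks together gives the EID+RLOC
# aggregated prefix.
eid_rloc_aggregate = list(
    ipaddress.collapse_addresses(eid_blocks + rloc_blocks)
)

print(eid_aggregate)       # [IPv4Network('166.111.8.0/23')]
print(eid_rloc_aggregate)  # [IPv4Network('166.111.8.0/22')]
```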
6.2. EID Router
An EID Router is no different from a legacy router, except that a
special configuration is applied. It is configured to act as an eBGP
speaker, and only loads the forwarding entries to all EIDs, including
EID aggregated prefixes. Note that the EID+RLOC aggregated prefixes
do not have to be loaded in EID Routers, since the RLOCs in the
EID+RLOC aggregated prefixes are supposed to be reachable (i.e.,
forwarding entries to these prefixes should be preserved in the P
routers).
So the ideal situation becomes:
o the EID Routers load the forwarding entries to all EIDs including
EID aggregated prefixes,
o the P routers load the forwarding entries to all RLOCs and all
EID+RLOC aggregated prefixes, and
o the border routers load the forwarding entries to all EIDs, RLOCs
and EID+RLOC aggregated prefixes of the distant ASes behind the
border routers.
As a result of deploying the EID Router mechanism, the FIB
(Forwarding Information Base) size of P routers and border routers
is reduced.
6.3. When an ER meets packets
When an ER receives a packet, it matches the destination address
against the entries in its forwarding table (which can be seen as the
mapping table). Since the ER holds the whole mapping table (from its
point of view), the packet can be encapsulated in a LISP header and
sent out. The tunnel end point may be one of the following four
kinds of routers:
o the border router of the peering AS on the path to the
destination, in which case aggregation occurred in this peering AS
or this peering AS did not pass the mapping information to the
current AS.
o the border router of a non-peering AS on the path to the
destination, in which case aggregation occurred in this non-peering
AS.
o the EID Router of a distant AS (either peering or non-peering) on
the path to the destination, in which case the downstream AS did
not pass the mapping information to this distant AS, so the ER in
this distant AS created a new mapping (with the ER's RLOC set in
the mapping).
o the destination ETR, in which case the original mapping was kept
unchanged during propagation.
7. Supplementary DHT Mapping Overlay (MO)
The DHT Mapping Overlay (MO) is based on Kademlia, a highly efficient
Distributed Hash Table (DHT) overlay protocol for Peer-to-Peer
networks, which uses XOR as the metric to measure distance. In the
MO, it is adapted to meet the following requirements:
o the MO should be scalable;
o the MO should provide good redundancy;
o the MO should be self-adaptive to mapping addition and failure;
o the MO should be flexible in balancing performance and overhead;
o the MO should support the multi-homing scenario.
The benefit of deploying the MO is that it provides specific
mappings, since it does not aggregate prefixes (i.e., mappings stored
in the MO are of the finest granularity: each mapping refers to one
relation between a customer AS and one of its provider sites).
Because of the large number of such fine-grained mappings, the MO
must be scalable and redundant, so a DHT is chosen as the means of
distributing the mappings.
7.1. Mapping Node (MN) and Mapping Server (MS)
Each mapping is stored on one Mapping Node (MN) in the Mapping
Overlay (MO), and each Mapping Server (MS) can accommodate more than
one MN. For example, suppose an ISP serves 5 customer ASes labeled
a, b, c, d, e, whose corresponding EIDs are v, w, x, y, z
respectively. These five EID prefixes are mapped one-to-one, forming
five MNs that physically reside on one or multiple MSes
administered by the ISP.
7.2. MNID Assignment and K-bucket Table
In the MO, each MN is assigned a 160-bit ID (MNID). The DHT MO uses
the numerically highest IP address reserved in the customer AS as
the MNID. For example, assume a customer AS with the prefix
162.137.2/24 is mapped to the RLOC 134.121.3.56. The lower 32 bits
of the MNID of the corresponding Master Mapping Node (MMN) are
0xA28902FE (i.e., 162.137.2.254), and the remaining 128 bits are all
0. The mapping will be stored on this MMN and on several (at least
one) other MNs whose MNIDs are closest to the MNID 0xA28902FE.
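The MNID in this example can be reproduced with a few lines of
Python. The function name mnid_for_prefix is hypothetical. The sketch
assumes that the "highest" address means the highest host address,
one below the broadcast address, because that is what yields
162.137.2.254 for a /24.

```python
import ipaddress


def mnid_for_prefix(prefix):
    """Return the 160-bit MNID (as an int) for an IPv4 customer prefix.

    Assumption: the MNID is the highest host address of the prefix,
    i.e. the address just below the broadcast address.
    """
    net = ipaddress.ip_network(prefix)
    highest_host = int(net.broadcast_address) - 1
    # The upper 128 bits are zero, so the MNID equals the 32-bit value.
    return highest_host


mnid = mnid_for_prefix("162.137.2.0/24")
# 160 bits are 40 hexadecimal digits: 32 zeros followed by a28902fe.
print(f"{mnid:040x}")
```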
Each MN manages its own K-bucket table, which keeps the addresses
(RLOCs) and some other information (e.g., MNIDs) of other nodes. The
table of a node N consists of 160 rows, in which the i-th row (0 <= i
< 160) holds information (i.e., RLOCs and IDs) about nodes whose
distance from node N lies in the range [2^i, 2^(i+1)). Because this
range grows with i, the number of nodes kept in the i-th row is
limited to at most K.
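The row selection follows directly from the XOR metric: a distance d
falls into row i exactly when i is the position of the highest set
bit of d. The sketch below shows this under stated assumptions. The
names bucket_row and KBucketTable are hypothetical, k=20 is the
default commonly used with Kademlia rather than a value given in this
draft, and the policy for replacing entries in a full row is left
out.

```python
def xor_distance(a, b):
    """Kademlia distance between two MNIDs given as integers."""
    return a ^ b


def bucket_row(own_id, other_id):
    """Row i such that 2**i <= distance < 2**(i + 1)."""
    d = xor_distance(own_id, other_id)
    if d == 0:
        raise ValueError("a node does not store itself in its table")
    return d.bit_length() - 1


class KBucketTable:
    """Hypothetical K-bucket table with one list per row."""

    def __init__(self, own_id, k=20, id_bits=160):
        self.own_id = own_id
        self.k = k
        self.rows = [[] for _ in range(id_bits)]

    def add(self, mnid, rloc):
        """Insert (mnid, rloc); return False if the row is full."""
        row = self.rows[bucket_row(self.own_id, mnid)]
        if any(entry[0] == mnid for entry in row):
            return True          # already known
        if len(row) >= self.k:
            return False         # row holds K entries already
        row.append((mnid, rloc))
        return True
```

For example, seen from a node with MNID 0, the MNID 0xA28902FE has
its highest set bit at position 31 and is therefore kept in row 31.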
7.3. LOOKUP Process
The LOOKUP process calls FIND_MAP with the MNID of the destination
MN as its parameter. In the following, A is the MN that initiates
the LOOKUP and B is the MNID of the destination MN. The FIND_MAP
procedure is as follows:
1. MN A calculates the distance D from A to B (D = A XOR B);
2. MN A fetches m MNs from the row of its K-bucket table that
corresponds to D and queries them (i.e., calls FIND_MAP on each
of these m MNs);
3. MN A sets a timer for the reply from each MN on which it called
FIND_MAP. If the timer expires, A deletes the information of the
corresponding MN from its K-bucket table.
4. Each MN that receives a FIND_MAP call checks whether it is one of
the MNs closest to B. If so, it returns the mapping to MN A;
otherwise, as in steps 1 and 2, it calculates the distance D,
fetches m closer MNs, and returns them to MN A.
5. MN A continues to send FIND_MAP calls to the returned MNs until
the mapping is returned or the K closest MNs have been found
(which means no such mapping exists).
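The five steps above can be illustrated with a small in-memory
simulation. It is a sketch under several assumptions and not a
protocol implementation: the names Node, find_map, and lookup are
hypothetical; each K-bucket table is flattened into a plain set of
known MNIDs; an MN is treated as "one of the closest MNs" when it
actually holds the mapping; a timed-out MN is modelled as an MNID
that is missing from the network dictionary; and k=3, m=2 are
arbitrary illustrative values.

```python
def closest(ids, target, count):
    """The count IDs from ids nearest to target under the XOR metric."""
    return sorted(ids, key=lambda n: n ^ target)[:count]


class Node:
    def __init__(self, mnid):
        self.mnid = mnid
        self.known = set()   # flattened stand-in for the K-bucket table
        self.store = {}      # target MNID -> mapping

    def find_map(self, target, m):
        """Step 4: return the mapping if held, else m closer MNs."""
        if target in self.store:
            return ("mapping", self.store[target])
        return ("nodes", closest(self.known, target, m))


def lookup(start, target, network, k=3, m=2):
    """Steps 1-5 as run by MN start; network maps MNID -> Node."""
    queried = set()
    candidates = set(start.known)
    while True:
        # Steps 1-2: pick the m nearest MNs that were not queried yet.
        batch = closest(candidates - queried, target, m)
        if not batch:
            return None
        for mnid in batch:
            queried.add(mnid)
            node = network.get(mnid)
            if node is None:
                # Step 3: no reply before the timer expired, so the
                # MN is removed from the table.
                candidates.discard(mnid)
                start.known.discard(mnid)
                continue
            kind, payload = node.find_map(target, m)
            if kind == "mapping":
                return payload
            candidates.update(p for p in payload if p != start.mnid)
        # Step 5: stop once the k closest known MNs were all queried.
        if set(closest(candidates, target, k)) <= queried:
            return None
```

In a chain where each MN only knows the next closer one, lookup
walks the chain hop by hop until it reaches the MN that stores the
mapping, and returns None once the closest known MNs have all been
asked without success.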
7.4. Security Consideration of Mapping Storage
In native Kademlia, any MN can initiate a STORE call to put a <key,
value> pair on the K closest nodes. Because this could cause
security problems, for instance a malicious MN storing a wrong
mapping on other MNs, a mapping can initially be stored only on one
or more MNs (one of which is chosen as the MMN) that are under the
supervision of the ISP which actually controls this mapping. Only
the MMN is authorized to call STORE. After the MO has been running
for hours, MNs in other autonomous systems may hold cached copies of
the mapping.
7.5. Self-adaptive Capability
Compared to non-DHT mapping systems, the DHT MO adapts better to MN
failures and to MNs joining dynamically.
Assume an ISP deploys multiple MSes for the address block of a
customer AS in one or multiple provider ASes it administers. When
some of the MNs go down, as long as at least one MN is healthy, the
mapping service is provided normally without manual configuration.
Even if all of them are temporarily down, the mapping information
cached on other MNs remains available for a period of time (the
cache updating period).
When a new customer site connects to an ISP, a new mapping must be
added to the MO. This requires adding a new MN u to the MO and
putting the mapping in MN u. First, an existing MN w in the MO must
be known, and w is put into u's K-bucket table. Then u performs a
LOOKUP process with its own MNID as the parameter. As a result, the
K-bucket table of MN u is built up, and other MNs update their
K-bucket tables as well during the LOOKUP process.
7.6. Dynamic Adjustment of K value and m value
After a LOOKUP, if the time it took is greater than a threshold t
(manually configured by the ISP), which implies that the LOOKUP took
too long, K is increased by 1. At the same time, if 2m < K then m is
set to 2m; otherwise m is increased by 1. Consequently, more queries
will be sent to MNs during subsequent LOOKUP processes. If the time
of the LOOKUP is no greater than t, the K value and the m value stay
unchanged.
When congestion occurs in an AS, the K value and the m value are both
decreased by 1 to reduce the number of updates used to keep in touch
with other MNs.
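The adjustment rules can be written down as two small functions. The
function names are hypothetical, and two points are assumptions
because the text leaves them open: m is compared against the already
increased K, and neither value is allowed to drop below 1 on
congestion.

```python
def adjust_after_lookup(k, m, lookup_time, threshold):
    """Return the new (k, m) after one LOOKUP that took lookup_time."""
    if lookup_time > threshold:
        k += 1
        # Assumption: the comparison uses the increased K.
        m = 2 * m if 2 * m < k else m + 1
    return k, m


def adjust_on_congestion(k, m, floor=1):
    """Decrease both values by 1; the floor of 1 is an assumption."""
    return max(floor, k - 1), max(floor, m - 1)
```

Starting from K = 20 and m = 3, one slow LOOKUP gives K = 21 and
m = 6. A later congestion event brings these down to K = 20 and
m = 5.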
7.7. Mapping Storing and Exchanging in Multi-homing Scenario
Consider a scenario in which a customer site connects to more than
one ISP, which is called multi-homing. When a new MMN x puts a new
mapping in the mapping system, another MMN y with the same MNID will
be found in the MO. Unlike in the native Kademlia protocol, no "ID
Collision Error" occurs. Instead, x tells y the new mapping and at
the same time obtains the mapping information that already exists.
As a result, x and y both know all the mapping information on how to
reach the customer AS. In addition, x and y probe each other
periodically to ensure availability.
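The exchange between x and y amounts to merging the RLOCs that each
MMN holds for the same EID prefix. A minimal sketch follows; the
class name Mmn and its fields are hypothetical, and the periodic
probing is omitted.

```python
class Mmn:
    """Hypothetical Master Mapping Node for one customer EID prefix."""

    def __init__(self, mnid, eid_prefix, rloc):
        self.mnid = mnid
        self.eid_prefix = eid_prefix
        self.rlocs = {rloc}   # RLOCs known for the EID prefix

    def exchange(self, other):
        """Share mappings with another MMN that has the same MNID."""
        if self.mnid != other.mnid:
            raise ValueError("MMNs exchange mappings only for one MNID")
        merged = self.rlocs | other.rlocs
        # Each MMN gets its own copy of the merged set.
        self.rlocs = set(merged)
        other.rlocs = set(merged)
```

After x.exchange(y), both MMNs list the RLOCs of both providers, so
a query answered by either of them returns every known way to reach
the customer AS.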
8. Incremental Deployment
This mechanism is practical for incremental deployment, since no
major changes are introduced on existing routers. Instead of
deploying a mandatory third-party infrastructure over the current
Internet, an ISP only puts one or more MSes in its domain and
configures them to join the MO if it wants to benefit from the DHT
MO.
An ISP could start by deploying an ER in its domain. This reduces
the number of entries in the other routers in the domain, but
lengthens the intra-domain route. It is up to each ISP to decide
whether to tolerate this path stretch in exchange for a smaller FIB
(Forwarding Information Base).
As time goes by, suppose more and more ISPs have deployed ERs. Some
of them may then deploy the DHT MO to benefit from specific mappings
(which decrease the number of tunnels needed in each data
transmission) by simply putting MSes in their ASes and letting them
join the MO automatically as described in Section 7.
Note that, unlike other mechanisms, no new dedicated devices are
required to support backward compatibility.
9. Acknowledgements
10. Security Considerations
The ERs can apply any existing security mechanism for BGP to enhance
security. For the DHT MO, existing authentication methods for DHTs
(especially for Kademlia) can be adapted to enhance its security.
Further security enhancements that support the mechanism in this
draft are expected to be designed in the future.
11. IANA Considerations
12. References
12.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
Protocol 4 (BGP-4)", RFC 4271, January 2006.
12.2. Informative References
[I-D.farinacci-lisp]
Farinacci, D., Fuller, V., Meyer, D., and D. Lewis,
"Locator/ID Separation Protocol (LISP)",
draft-farinacci-lisp-12 (work in progress), March 2009.
[I-D.fuller-lisp-alt]
Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "LISP
Alternative Topology (LISP+ALT)", draft-fuller-lisp-alt-05
(work in progress), February 2009.
[I-D.meyer-lisp-cons]
Brim, S., "LISP-CONS: A Content distribution Overlay
Network Service for LISP", draft-meyer-lisp-cons-04 (work
in progress), April 2008.
Authors' Addresses
Gang Chen
CMCC, Inc.
53A, Xibianmennei Ave.,
Xuanwu District
Beijing 100053
P.R.China
Phone: +86-10-1391-071-0674
Email: phdgang@gmail.com
Hui Deng
CMCC, Inc.
53A, Xibianmennei Ave.,
Xuanwu District
Beijing 100053
P.R.China
Phone: +86-10-1391-075-0201
Email: denghui02@gmail.com
Bo Zhou
CMCC, Inc.
53A, Xibianmennei Ave.,
Xuanwu District
Beijing 100053
P.R.China
Phone: +86-10-1381-194-8723
Email: zhouboyj@chinamobile.com
Mingwei Xu
Tsinghua University
Department of Computer Science, Tsinghua University
Beijing 100084
P.R.China
Phone: +86-10-6278-5822
Email: xmw@csnet1.cs.tsinghua.edu.cn
Dong Huo
Tsinghua University
Department of Computer Science, Tsinghua University
Beijing 100084
P.R.China
Phone: +86-10-6278-5822
Email: dhuo.thu@gmail.com
Yu Cao
Tsinghua University
Department of Computer Science, Tsinghua University
Beijing 100084
P.R.China
Phone: +86-10-6278-5822
Email: cyanalyst@126.com