Network Working Group                                             D. Jen
Internet-Draft                                                 M. Meisel
Intended status: Informational                                 D. Massey
Expires: May 21, 2008                                            L. Wang
                                                                B. Zhang
                                                                L. Zhang
                                                       November 18, 2007


                APT: A Practical Transit Mapping Service
                          draft-jen-apt-01.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 21, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   The size of the global routing table is a rapidly growing problem.
   Several solutions have been proposed.  These solutions commonly
   divide the Internet into two address spaces, one for determining the
   delivery location, and one to use during transit.  Packets destined
   for delivery addresses are tunneled through the default-free zone



Jen, et al.               Expires May 21, 2008                  [Page 1]


Internet-Draft               Transit Mapping               November 2007


   (DFZ), which uses only transit addresses.  For this process to work,
   there must be a mapping service that can supply an appropriate
   destination transit address for any given delivery address.  We
   present a design for such a mapping service.  We adhere to a "do no
   harm" design philosophy: maintain all desirable features of the
   current architecture without negatively affecting its security or
   reliability.  Our design aims to minimize delay and prevent loss in
   packet encapsulation, minimize the number of modifications to
   existing hardware, minimize the number of new devices, and keep the
   level of control traffic manageable.


Table of Contents

   1.  Requirements Notation  . . . . . . . . . . . . . . . . . . . .  4
   2.  Problem Statement  . . . . . . . . . . . . . . . . . . . . . .  4
   3.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  5
   4.  APT Overview and Requirements  . . . . . . . . . . . . . . . .  6
   5.  The Mapping Service  . . . . . . . . . . . . . . . . . . . . .  8
     5.1.  A Mapping Example  . . . . . . . . . . . . . . . . . . . .  9
   6.  Multihoming Support  . . . . . . . . . . . . . . . . . . . . . 11
     6.1.  Using Alternate ETRs During Failures . . . . . . . . . . . 12
       6.1.1.  Handling Taddr Prefix Failures . . . . . . . . . . . . 12
       6.1.2.  Handling Single-ETR Failures . . . . . . . . . . . . . 13
       6.1.3.  Handling TR-to-DN Link Failures  . . . . . . . . . . . 13
   7.  Exchanging MapSets Between TNs . . . . . . . . . . . . . . . . 14
     7.1.  MapSet Dissemination via DM-BGP  . . . . . . . . . . . . . 14
     7.2.  Regular MapSet Refresh . . . . . . . . . . . . . . . . . . 15
   8.  Security and Reliability . . . . . . . . . . . . . . . . . . . 15
     8.1.  Authenticating the Originator of Mapping Updates . . . . . 15
     8.2.  Detecting MapSet Misconfigurations . . . . . . . . . . . . 16
     8.3.  APT Control Messages . . . . . . . . . . . . . . . . . . . 17
   9.  Scalability through Recursion  . . . . . . . . . . . . . . . . 17
   10. Mapping Announcements  . . . . . . . . . . . . . . . . . . . . 18
   11. APT Header and Control Messages  . . . . . . . . . . . . . . . 19
     11.1. APT Header Fields  . . . . . . . . . . . . . . . . . . . . 19
     11.2. Cache Add Messages . . . . . . . . . . . . . . . . . . . . 20
     11.3. Cache Drop Messages  . . . . . . . . . . . . . . . . . . . 20
     11.4. ETR Unreachable Messages . . . . . . . . . . . . . . . . . 20
     11.5. DN Unreachable Messages  . . . . . . . . . . . . . . . . . 21
     11.6. The ETR-to-DN Link Failure Message Type  . . . . . . . . . 21
   12. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 21
   13. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 21
   14. Security Considerations  . . . . . . . . . . . . . . . . . . . 21
   15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 21
     15.1. Normative References . . . . . . . . . . . . . . . . . . . 21
     15.2. Informative References . . . . . . . . . . . . . . . . . . 22
   Appendix A.  Open Issues . . . . . . . . . . . . . . . . . . . . . 22
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 23
   Intellectual Property and Copyright Statements . . . . . . . . . . 25


1.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].


2.  Problem Statement

   The unexpected, explosive growth of the Internet is placing an
   ever-greater strain on its infrastructure.  This problem has been
   well documented in [RAWS][AddrAlloc].  Several solutions have been
   proposed to address it [CRIO][EFIT][EFIT-ID][LISP][SixOne], the
   majority of which involve separating the Internet into two parts,
   one for determining the delivery location, and one to use during
   transit.  Routers in transit space would only need to know how to
   route to transit prefixes, which are stable and conducive to
   topological aggregation.  When a packet is sent from source delivery
   address A to destination delivery address B, A's provider-edge router
   (the ingress tunnel router, or "ITR", as defined in [LISP])
   encapsulates the packet and sends it through transit space to B's
   provider-edge router (the egress tunnel router, or "ETR").  B's ETR
   decapsulates the packet and forwards it to the appropriate recipient,
   B.

   When encapsulating a packet, A's ITR must somehow determine B's ETR's
   transit address and include it in the outer header.  In general, any
   ITR must be able to map any given delivery address to a corresponding
   ETR transit address for proper tunneling through transit space.  This
   illustrates the need for a mapping service that can provide this
   address.  The design details of this mapping service will play a
   large part in determining the effectiveness of any proposed
   implementation of a delivery/transit address space separation.  The
   mapping service also presents a new opportunity to enhance the
   services currently offered by the Internet, which is further reason
   to carefully consider how this service should be implemented.  Should
   mapping information be distributed via a push or a pull model?  What
   additional information, if any, should be distributed along with the
   mapping information?  Can we satisfy the mapping requirement without
   impacting packet delivery quality?

   Our answers to these questions are rooted in a "do no harm" design
   philosophy: improve routing scalability without sacrificing any
   desirable features in the current architecture or negatively
   affecting its security and reliability.  To this end, we present APT,
   a practical transit mapping service designed with the following goals
   in mind.

   o  Minimize delay and prevent loss in packet encapsulation.

   o  Minimize the number of devices that need to be modified to support
      APT.

   o  Minimize the number of devices that will require additional
      resources or complexity.

   o  Keep the design modular so that the method used to propagate
      mapping information is independent from the method used to
      retrieve mapping information for tunneling.


3.  Terminology

   Transit Network (TN) - An AS whose business is to provide packet
   transport services for its customers.  Transit networks provide
   packet forwarding services for delivery networks (see definition
   below).  As a rule of thumb, if the AS appears in the middle of any
   ASPATH in a BGP route today, it is considered a transit network.

   Delivery Network (DN) - A network that is a source or destination of
   IP packets but does not forward packets between TNs or other delivery
   networks.

   Transit Space - The IP address space used by transit networks.  We
   will also use the term "transit space" to refer to the topological
   area of the Internet where transit addresses are routable.

   Delivery Space - The set of all IP address spaces used by delivery
   networks.  We will also use the term "delivery space" to refer to the
   topological area of the Internet outside of transit space -- that is,
   where only delivery addresses are routable.

   Transit Address (Taddr) - A Taddr is an address in transit space.

   Delivery Address (Daddr) - A Daddr is an address in delivery space.

   Default Mapper - A new device required by APT.  Each transit network
   MUST have at least one default mapper.  A default mapper maintains a
   complete mapping table.  In other words, given any Daddr, default
   mappers can return a corresponding Taddr.  To support the growing
   trend towards multihoming, the mappings stored in default mappers
   will map a Daddr prefix to a non-empty SET of destination Taddrs, all
   of which are expected to have a direct connection to the DN.

   Tunnel Router (TR) - All edge routers in a TN will become TRs.  Like
   ITRs and ETRs in LISP [LISP], TRs provide the encapsulation and
   decapsulation services required for tunneling packets through transit
   space.  A TR has both ITR and ETR functionality, meaning that any TR
   can perform both encapsulation and decapsulation of packets.  To
   properly encapsulate any given packet, TRs can query the default
   mappers for mapping information.  TRs also cache commonly used
   MapRecs locally.  Note that TR cache entries are NOT identical to the
   mappings stored at default mappers (see the definitions of "MapSet"
   and "MapRec" below).

   APT Node - A general term referring to any device type introduced by
   APT.  This includes both default mappers and TRs.

   MapSet - A MapSet contains a Daddr prefix and a non-empty SET of ETR
   Taddrs associated with the prefix.  MapSets also include related
   information such as priority rankings for each of the ETRs in the
   set.  Default mappers store MapSets.

   MapRec - A MapRec contains a Daddr prefix and any SINGLE ETR Taddr
   associated with that prefix.  Any MapRec is a subset of the complete
   MapSet for its Daddr prefix.  TRs store MapRecs along with an
   associated TTL.  A MapRec is removed from a TR's cache once its TTL
   expires.
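
   The relationship between MapSets and MapRecs can be sketched as a
   pair of record types.  This is only an illustration: the class and
   field names, the use of strings for addresses, and the TTL unit are
   assumptions made here, not part of the APT specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EtrEntry:
    taddr: str      # transit address of one ETR
    priority: int   # ranking within the MapSet
    tbr: int = 0    # time before retry (Section 6.1); 0 means usable


@dataclass
class MapSet:
    """Stored by default mappers: a Daddr prefix and a non-empty ETR set."""
    daddr_prefix: str
    etrs: List[EtrEntry]

    def __post_init__(self) -> None:
        if not self.etrs:
            raise ValueError("a MapSet needs at least one ETR Taddr")


@dataclass
class MapRec:
    """Cached by TRs: a Daddr prefix, a single ETR Taddr, and a TTL."""
    daddr_prefix: str
    etr_taddr: str
    ttl: int  # seconds until the TR drops the entry (unit is assumed)


def maprec_for(mapset: MapSet, taddr: str, ttl: int) -> MapRec:
    """Build the MapRec for one ETR; the ETR must belong to the MapSet."""
    if all(e.taddr != taddr for e in mapset.etrs):
        raise ValueError(f"{taddr} is not an ETR for {mapset.daddr_prefix}")
    return MapRec(mapset.daddr_prefix, taddr, ttl)
```

   The membership check in maprec_for() reflects the statement that any
   MapRec is a subset of the complete MapSet for its Daddr prefix.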


4.  APT Overview and Requirements

   This section is a comprehensive overview of the devices and protocols
   introduced by APT.  For explanations and justifications, see the
   corresponding referenced sections.

   Default Mapper Requirements (see Section 5)

   o  Default mappers must have enough storage space to store the full,
      global mapping table and associated metadata.

   o  Every destination Taddr in a MapSet MUST have an associated time
      before retry (TBR, see Section 6.1).

   o  Default mappers MUST keep track of the Taddrs of the TRs they
      serve.

   o  Default mappers MUST examine the destination Taddr of incoming
      packets for addresses other than their own.

   TR Requirements (see Section 5)

   o  TRs MUST keep a small cache to hold recently-used MapRecs and
      their TTLs.

   o  TRs MUST have a default route to their default mapper.

   o  TRs MUST be able to encapsulate and decapsulate IP-in-UDP packets
      with an APT header (see Section 11).

   Failover for Multihomed DNs (see Section 6.1)

   o  When a Taddr prefix is withdrawn via BGP (see Section 6.1.1)

      *  ITRs forward packets destined for unroutable Taddrs to their
         default mapper.

      *  The default mapper forwards the packet to an alternate ETR if
         one is available.

      *  The default mapper sends a Cache Add Message to the originating
         ITR.

   o  When a TR becomes unreachable (see Section 6.1.2)

      *  Packets destined for the TR are intercepted by its default
         mapper.

      *  The default mapper sets the TBR for the appropriate MapRec.

      *  The default mapper forwards TR-addressed packets to an
         alternate ETR if one is available.

      *  The default mapper sends an ETR Unreachable Message to the
         ITR's default mapper.

      *  The default mapper broadcasts a Cache Drop Message to its TRs.

      *  The ITR's default mapper sets the TBR for the appropriate
         MapRec.

      *  The ITR's default mapper broadcasts a Cache Drop Message to its
         TRs.

   o  When a DN becomes unreachable from its TR (see Section 6.1.3)

      *  The TR forwards packets destined for the DN to its default
         mapper, setting the APT message type to ETR-to-DN Link Failure
         (see Section 11.1).

      *  The default mapper sets the TBR for the appropriate MapRec.

      *  The default mapper forwards the packet to an alternate ETR if
         one is available.

      *  The default mapper sends a DN Unreachable Message to the ITR's
         default mapper.

      *  The default mapper broadcasts a Cache Drop Message to its TRs.

      *  The ITR's default mapper sets the TBR for the appropriate
         MapRec.

      *  The ITR's default mapper broadcasts a Cache Drop Message to its
         TRs.

   Mapping Dissemination

   o  Default mappers MUST sign updates with their TN's private key.

   o  Default mappers MUST verify the signature before processing or
      forwarding MapSet updates (see Section 8).

   o  Default mappers MUST NOT remove or alter the signature when
      forwarding the update.

   o  Default mappers MUST cryptographically sign control messages that
      may need to travel between ASes.

   o  Default mappers MUST speak DM-BGP and peer with other default
      mappers (see Section 7.1).

      *  DM-BGP is a separate instance of standard BGP that runs on a
         different TCP port.

      *  Only default mappers speak DM-BGP.

      *  DM-BGP updates carry mapping updates in a new attribute type.


5.  The Mapping Service

   TRs serve as the gateway between delivery and transit space.  When a
   TR receives a packet from a DN that needs to be routed through
   transit space, it maps the packet's destination Daddr to an
   appropriate destination Taddr (the mapping lookup details are
   presented below).  The TR then encapsulates the packet in a UDP
   datagram that carries an APT header followed by the original layer-3
   packet (see Section 11).  The packet can then be
   routed through transit space.

   To minimize the latency introduced by encapsulation, APT seeks to
   store mapping information as close to the ITR as possible.  However,
   the global mapping table is likely to grow very large over time.  To
   avoid undue memory requirements for ITRs while still keeping mapping
   information within reach, we introduce the concept of default
   mappers.

   A TR does not need to store the entire global mapping table.
   Instead, it queries a default mapper for mapping information and
   caches recently used MapRecs.

   Default mappers are the only devices in the network that need to
   store the complete global mapping table.  As we will see in the
   following example, TRs only make use of default mappers in the event
   of a cache miss.  This means that, given sufficiently sized caches at
   the TRs, network latency will not heavily depend upon default mapper
   performance.  Note that each TN need only have a single default
   mapper, but may choose to deploy more to avoid a single point of
   failure and to enhance overall performance.  In the latter case, a TN
   MAY choose to use anycast to reach one of the default mappers or use
   multicast to reach all of them.

5.1.  A Mapping Example

   Below is a simple topology for illustration.  A and B are DNs, each
   addressable via a single Daddr prefix; TN1 and TN2 are TNs; ITR1,
   ETR1, and ETR2 are TRs; any node labeled "X" is a router; and M1 and
   M2 are default mappers.  A portion of the mapping table for M1 is
   shown.

        ___                               ___
       / A \                             / B \_________
       \___/                             \___/        |   Delivery Space
- - - - -|- - - - - - - - - - - - - - - - -| - - - - -|- - - - - - - - -
      .--+---.                          .--+---.      |    Transit Space
   __-| ITR1 |-__                    __-| ETR1 |-__   |
  /   '------'  .`--.            .--'.  '------'  .`--+--.
 |   T   ____   | X |------------| X | T   ____   | ETR2 |
 |   N  | M1 |  '-;-'            '-:-' N  | M2 |  '-;----'
  \  1  '-/\-'   /                  \  2  '----'   /
   \_____/  \___/                    \____________/
 _______/    \___________________
|    DN    |  Taddr   | Priority |
|----------|----------|----------|
|   ...    |   ...    |    ...   |
|----------|----------|----------|
|    B     |   ETR1   |    10    |
|          |   ETR2   |    20    |
|----------|----------|----------|
|   ...    |   ...    |    ...   |
'--------------------------------'

                                 Figure 1

   In this section, we illustrate how TRs and default mappers interact
   within a TN to properly tunnel packets through transit space.

   In Figure 1, a node in network A sends a packet to a Daddr in network
   B. When this packet arrives at ITR1, ITR1 looks up the destination
   Daddr in its MapRec cache.  If a matching prefix is present in its
   cache, ITR1 simply encapsulates the packet with the corresponding
   destination Taddr and sends it across transit space.  If a matching
   prefix is not present, ITR1 will send the packet through its default
   mapper, M1.  It does this by encapsulating the packet with the
   (possibly anycast) address for its default mapper(s) as the
   destination Taddr.

   This packet will arrive at M1, the only default mapper in TN1.  When
   M1 receives the packet, it decapsulates the packet and examines the
   destination Daddr.  Since default mappers store the full, global
   mapping table, a default mapper will always be able to encapsulate
   the packet with a valid destination Taddr.  All packets encapsulated
   by a default mapper MUST contain the default mapper's Taddr as the
   source address.

   In addition to forwarding the packet to an appropriate TR (ETR1, in
   this case), M1 also treats the incoming packet as an implicit request
   from ITR1 for mapping information.  M1 responds to ITR1 with a Cache
   Add Message (see Section 11.2) containing a MapRec that maps B to
   ETR1.  This allows ITR1 to add this MapRec to its cache so that ITR1
   can tunnel further packets destined for B directly to ETR1.  The
   MapRec also has an associated time to live (TTL) that is set by M1.
   The TTL ensures that ITR1 will occasionally re-request this mapping
   information from M1.  At this time, if the mapping information has
   changed in any way since ITR1's prior request, M1 can respond with an
   updated MapRec.  Without this TTL, ITR1's cached information may
   become stale over time.
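
   The ITR side of this exchange can be sketched as a small cache with
   expiry.  The sketch below is illustrative only: the class and method
   names are hypothetical, longest-prefix matching among cached
   prefixes is an assumption (the text only requires "a matching
   prefix"), and all addresses come from documentation ranges.

```python
import ipaddress
import time


class TunnelRouter:
    """Toy ITR: a MapRec cache plus a default route to the mapper."""

    def __init__(self, mapper_taddr, clock=time.monotonic):
        self.mapper_taddr = mapper_taddr  # possibly an anycast address
        self.clock = clock
        self.cache = {}  # ip_network -> (etr_taddr, expiry time)

    def cache_add(self, prefix, etr_taddr, ttl):
        """Install a MapRec received in a Cache Add Message."""
        net = ipaddress.ip_network(prefix)
        self.cache[net] = (etr_taddr, self.clock() + ttl)

    def outer_destination(self, dst_daddr):
        """Return the Taddr to place in the outer header."""
        now = self.clock()
        # Drop MapRecs whose TTL has expired.
        self.cache = {n: v for n, v in self.cache.items() if v[1] > now}
        addr = ipaddress.ip_address(dst_daddr)
        matches = [n for n in self.cache if addr in n]
        if matches:
            best = max(matches, key=lambda n: n.prefixlen)
            return self.cache[best][0]
        # Cache miss: tunnel the packet to the default mapper instead.
        return self.mapper_taddr
```

   On a miss the packet itself goes to the default mapper, so no packet
   is queued or dropped while the ITR waits for the Cache Add Message.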


6.  Multihoming Support

   In the example above, the observant reader may have noted that B is
   multihomed.  That is, B can be reached through both ETR1 and ETR2.
   Multihoming provides B with both enhanced reliability in case of a
   connectivity failure and the flexibility to distribute incoming
   traffic across different tunnel endpoints.

   In accordance with our design goals, all of the logic for selecting a
   tunnel endpoint for a multihomed DN is contained within default
   mappers.  Default mappers store full MapSets containing the addresses
   of all ETRs for a given Daddr prefix, while TRs only store a single
   MapRec per Daddr prefix.  When a TR requests a MapRec for a
   multihomed DN, it is up to the default mapper to decide which one to
   return.

   Many DNs will want to have some control over which tunnel endpoint is
   used for incoming traffic.  Therefore, each MapRec in a MapSet has an
   associated priority value, which is made available to all default
   mappers throughout the transit space (see Section 7).  The value is
   treated as a ranking: an ETR with a lower priority value is
   preferred over one with a higher value.

   At the same time, a sending TN may have its own preference regarding
   which of the ETRs to use for a given Daddr prefix.  Default mappers
   can use a combination of locally configured routing policies and
   MapSet priority information to choose from the set of valid ETR
   addresses.  Going back to Figure 1, assume that ITR1 does not have a
   MapRec for B in its cache.  When A addresses a packet to B, ITR1 will
   send the packet to M1.  If M1 has no preference between ETR1 and
   ETR2, it will examine the priority values in B's MapSet and select
   ETR1, B's most preferred ETR.  M1 forwards the packet to ETR1 and
   returns the corresponding MapRec to ITR1, which stores the MapRec in
   its cache.

   In the case of a priority value tie, the default mapper can break the
   tie by picking the ETR to which it has the shortest path.  If some
   ETRs are tied in terms of both lowest priority value and shortest
   path, the default mapper is free to break the tie arbitrarily.  The
   address of the selected ETR will be used as the destination address
   when encapsulating the packet.
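
   When the default mapper has no local preference, the selection rule
   above reduces to a two-level comparison.  The following sketch
   assumes a hypothetical path_cost table giving the mapper's path
   length to each ETR; locally configured policy is deliberately left
   out.

```python
from typing import Dict, List, NamedTuple


class Etr(NamedTuple):
    taddr: str
    priority: int  # lower value is preferred


def select_etr(etrs: List[Etr], path_cost: Dict[str, int]) -> Etr:
    """Pick the lowest priority value, then the shortest path.

    ETRs missing from path_cost are treated as infinitely far away.
    Any remaining tie is broken arbitrarily; min() keeps the first.
    """
    if not etrs:
        raise ValueError("a MapSet is never empty")
    return min(
        etrs,
        key=lambda e: (e.priority, path_cost.get(e.taddr, float("inf"))),
    )
```

   With the MapSet for B from Figure 1 (ETR1 at priority 10, ETR2 at
   priority 20), select_etr() returns ETR1 regardless of path cost.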

   We envision that DNs will be able to manipulate their incoming
   traffic load by setting appropriate priority values in their MapSet.
   A DN that wants load balancing can assign the same priority value to
   all of its MapRecs.  A DN that wants one TN as its primary provider
   and another only as a backup can assign a higher priority value
   (that is, a lower preference) to its ETR at the backup provider.

6.1.  Using Alternate ETRs During Failures

   When a network failure has rendered an ETR unable to perform its
   duties, an affected multihomed DN will expect its traffic to be
   temporarily routed through an alternate ETR.  There are three general
   types of failures that would require an ITR to use an alternate ETR:
   (1) an ITR may have discovered via BGP that it can no longer reach
   the Taddr prefix containing the address of the intended ETR, (2) the
   ETR itself may go down or lose connectivity, and (3) the link between
   a DN and its TR may be down, a new problem introduced by the
   tunneling architecture.  This section will explain how each type of
   failure is handled, using Figure 1 as a reference.  We assume that,
   at the time of failure, all TNs are using ETR1 to reach B.

   To assist in handling these failures, default mappers store a time
   before retry (TBR) for each MapRec.  Normally, the TBR for each
   MapRec is set to zero, indicating that it is usable.  Any MapRec with
   a non-zero TBR value is considered invalid.  We will refer to the
   action of setting a MapRec's TBR to a non-zero value as "invalidating
   a MapRec."  MapRecs that map to unroutable destinations are also
   considered invalid.  So long as a MapRec is invalid, default mappers
   will not use this entry as a destination address or include it in
   mapping responses.  The role of the TBR in handling failures will
   become clear in the explanations below.
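
   The validity rule can be expressed as a simple predicate.  In this
   sketch the names are hypothetical, the set of routable Taddrs stands
   in for a lookup in the mapper's routing table, and the way a TBR
   returns to zero is not modeled because this section does not
   specify it.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Entry:
    taddr: str
    priority: int
    tbr: int = 0  # time before retry; 0 means usable


def invalidate(entry: Entry, retry_after: int) -> None:
    """Invalidate a MapRec by giving it a non-zero TBR."""
    if retry_after <= 0:
        raise ValueError("an invalid MapRec has a non-zero TBR")
    entry.tbr = retry_after


def is_valid(entry: Entry, routable: Set[str]) -> bool:
    """Valid means: TBR is zero and the destination Taddr is routable."""
    return entry.tbr == 0 and entry.taddr in routable


def usable_entries(entries: List[Entry], routable: Set[str]) -> List[Entry]:
    """The only entries a mapper may tunnel to or return in a reply."""
    return [e for e in entries if is_valid(e, routable)]
```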

6.1.1.  Handling Taddr Prefix Failures

   For failures of type (1), ITR1 has no route to ETR1.  Assume a host
   in network A attempts to send a packet to a host in network B. If
   ITR1 does not have a MapRec for B in its cache, it will forward the
   packet to M1 (see Section 5.1).  If ITR1 does have a MapRec for B in
   its cache, it will see that it has no route to ETR1, and forward the
   packet to its default mapper, M1.  M1 will also see that it has no
   route to ETR1, and thus select the next-most-preferred ETR for B,
   ETR2.  If it has a route to ETR2, it sends the packet with ETR2 as
   the destination Taddr and replies to ITR1 with the corresponding
   MapRec.  M1 can assign a relatively short TTL to the MapRec in its
   response.  Once this TTL expires, ITR1 will forward the next packet
   for B to the default mapper, which will respond with the most-
   preferred MapRec that is routable at that time.  This allows ITRs to
   quickly revert to using ETR1 once it becomes reachable again.
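
   The default mapper's choice in this case can be sketched as follows.
   The two TTL constants are hypothetical values, and issuing the short
   TTL exactly when a more-preferred ETR had to be skipped is one
   possible reading of "relatively short"; the text does not prescribe
   either.

```python
from typing import List, Optional, Set, Tuple

NORMAL_TTL = 600  # seconds; hypothetical value
SHORT_TTL = 30    # seconds; hypothetical value


def reply_for(
    etrs: List[Tuple[str, int]], routable: Set[str]
) -> Optional[Tuple[str, int]]:
    """Return (ETR Taddr, TTL) for the Cache Add reply, or None.

    etrs holds (taddr, priority) pairs from one MapSet; routable is
    the set of Taddrs the mapper currently has a route to.
    """
    reachable = [e for e in etrs if e[0] in routable]
    if not reachable:
        return None  # no alternate ETR is available
    taddr, priority = min(reachable, key=lambda e: e[1])
    best_priority = min(p for _, p in etrs)
    # A fallback ETR gets a short TTL so that ITRs re-query soon and
    # can revert once the preferred ETR is routable again.
    ttl = NORMAL_TTL if priority == best_priority else SHORT_TTL
    return taddr, ttl
```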

6.1.2.  Handling Single-ETR Failures

   In the second case, the Taddr prefix containing ETR1 is still
   routable from ITR1, but ETR1 has failed or is otherwise unreachable.
   Since this failure is confined to TN2, all routers in TN2 should be
   able to detect that ETR1 is unreachable via TN2's IGP.  In order to
   prepare for this situation, M2 announces a very high-cost link to all
   of the TRs it serves (in this case, ETR1 and ETR2) via IGP.  When
   ETR1 fails, since the normal IGP path to ETR1 will no longer be
   valid, all packets addressed to ETR1 will be forwarded to M2 instead.

   When M2 receives a data packet addressed to one of the TRs it serves
   (ETR1, in this case), it will assume the TR is unreachable,
   invalidate the corresponding MapRec, and broadcast a Cache Drop
   Message (see Section 11.3) to all of the TRs it serves.  Using the
   default mapper address in the APT header (see Section 11), it will
   also reply to the sender's default mapper (in this case, M1) with an
   ETR Unreachable Message (see Section 11.4).  M1 can then also
   invalidate the corresponding MapRec and broadcast a Cache Drop
   Message to its TRs.

   In order to minimize packet losses, M2 should not simply drop data
   packets addressed to ETR1.  Instead, M2 should attempt to reroute the
   packet to an alternate ETR, even if that ETR is in a different TN.
   It can do this by simply decapsulating the packet, looking up the
   MapSet for the Daddr prefix, and re-encapsulating the packet with a
   valid ETR as the destination Taddr according to the normal ETR-
   selection guidelines.

6.1.3.  Handling TR-to-DN Link Failures

   The final case involves a failure of the link connecting ETR1 to B.
   When ETR1 discovers it cannot reach B, it will send packets destined
   for B to its default mapper, M2, setting the APT message type to ETR-
   to-DN Link Failure (see Section 11.6) when encapsulating the packet.
   M2 will see that the packet's APT message type is ETR-to-DN Link
   Failure and handle it in the same way as failure type (2)
   (see Section 6.1.2), except that the message it sends to M1 will be a
   DN Unreachable Message (see Section 11.5) instead of an ETR
   Unreachable Message.

   DN Unreachable and ETR Unreachable Messages are handled the same way.
   However, we have kept them as separate notification types in order to
   allow for divergent behavior in the future.
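
   The steps M2 takes in Sections 6.1.2 and 6.1.3 can be sketched as one
   handler that returns the actions to perform.  The action tuples are
   illustrative and are not the wire formats of Section 11; the TBR
   value and all names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Etr:
    taddr: str
    priority: int
    tbr: int = 0


RETRY_AFTER = 300  # hypothetical TBR value


def handle_failure(
    mapset: List[Etr],
    failed_taddr: str,
    served_trs: List[str],
    sender_mapper: str,
    dn_link_failure: bool = False,
) -> List[tuple]:
    """Return the actions a default mapper takes for a failed ETR."""
    # 1. Invalidate the MapRec that points at the failed ETR.
    for etr in mapset:
        if etr.taddr == failed_taddr:
            etr.tbr = RETRY_AFTER
    # 2. Broadcast a Cache Drop Message to every TR this mapper serves.
    actions = [("cache_drop", tr, failed_taddr) for tr in served_trs]
    # 3. Notify the sender's default mapper (address from APT header).
    kind = "dn_unreachable" if dn_link_failure else "etr_unreachable"
    actions.append((kind, sender_mapper, failed_taddr))
    # 4. Re-encapsulate toward an alternate ETR rather than dropping.
    valid = [e for e in mapset if e.tbr == 0]
    if valid:
        alternate = min(valid, key=lambda e: e.priority)
        actions.append(("forward", alternate.taddr))
    return actions
```

   The only difference between the two failure types in this sketch is
   the notification sent to the sender's default mapper.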


7.  Exchanging MapSets Between TNs

   To avoid introducing latency or packet loss when encapsulating
   packets, the default mappers must have all MapSets available locally.
   In order for default mappers to store a full, global mapping table,
   there must be some way for them to receive MapSets from other TNs.
   However, only default mappers should receive MapSets.  In this
   section, we propose a method for MapSet dissemination.  The APT
   design in general does not depend on this particular method; it only
   requires that SOME method exists for secure, up-to-date, lightweight
   MapSet dissemination.

7.1.  MapSet Dissemination via DM-BGP

   MapSet dissemination can be accomplished using a separate BGP
   instance that is only run between default mappers.  We refer to this
   new BGP instance as 'DM-BGP'.  As a protocol, DM-BGP is identical to
   BGP, but it serves a different purpose.  DM-BGP is used to
   disseminate MapSets, not as a reachability protocol.  It is simply
   run on a different TCP port and is only used by default mappers so as
   not to affect the RIB-In of other nodes.

   When a default mapper wishes to distribute its TN's mapping
   information to other default mappers, it sends out a DM-BGP update
   with the mapping information included as an optional, transitive BGP
   attribute with a new type.  The NLRI included MUST be a prefix that
   uniquely identifies the source TN.  When other default mappers
   receive DM-BGP updates, they store this information in their MapSet
   tables, replacing any existing MapSets.  BGP policy knobs can still
   be tuned as desired by each TN.  Upon receiving mapping updates, TNs
   can choose whether to forward the update to each of their peers, so
   long as their actions are in accordance with the BGP protocol.

   A default mapper may receive the same mapping update more than once.
   This will occur when there is more than one DM-BGP path from the
   source default mapper's TN to the receiving default mapper's TN.
   Along with the mapping information, the new attribute should include
   a sequence number to allow receivers to detect duplicate mapping
   updates.  Default mappers MUST regularly announce MapSets to the rest
   of the network for all of the DNs to which their TN connects.  As a
   precaution, however, these DM-BGP updates should be infrequent and
   rate-limited.
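
   As an illustration only, a receiving default mapper could use the
   sequence number to discard duplicate or stale copies as sketched
   below.  The class name SeqTracker, the use of the NLRI prefix string
   as the key, and the assumption that sequence numbers increase
   monotonically per source TN without wrapping are ours and are not
   specified by this document.

```python
class SeqTracker:
    """Remembers the highest sequence number seen per source TN (hypothetical helper)."""

    def __init__(self):
        # source TN prefix (the NLRI) -> highest sequence number seen so far
        self._latest = {}

    def is_new(self, source_tn: str, seq: int) -> bool:
        """Return True and record seq if the update is newer than any seen so far."""
        last = self._latest.get(source_tn)
        if last is not None and seq <= last:
            # Duplicate or stale copy, e.g. one that arrived over a second DM-BGP path.
            return False
        self._latest[source_tn] = seq
        return True
```

   A receiver would call is_new() once per incoming update and skip
   further processing when it returns False.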

7.2.  Regular MapSet Refresh

   Regardless of the protocol used to disseminate MapSets, MapSets are
   not transient data.  In order for default mappers to prevent their
   MapSet tables from strictly increasing in size without bound, they
   must be able to remove stale MapSets.  For this reason, each MapSet
   entry MUST contain a time to live (TTL).  A default mapper MAY remove
   a MapSet from its table at any time after this TTL has expired.  In
   order to avoid premature removal from the global mapping table,
   default mappers MUST (1) regularly re-announce all MapSets for DNs
   they connect to and (2) set the TTL for each MapSet to no less than
   three times their refresh interval.
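
   The two rules above can be sketched as follows.  This is a minimal
   sketch under our own assumptions: the table layout (a dictionary from
   Daddr prefix to a pair of storage time and TTL, both in hours) and
   the function names are hypothetical, and the sketch removes expired
   entries eagerly even though a default mapper is only permitted, not
   required, to do so.

```python
REFRESH_FACTOR = 3  # Section 7.2: TTL is at least three refresh intervals


def min_ttl_hours(refresh_interval_hours: int) -> int:
    """Smallest TTL a default mapper may announce for its refresh interval."""
    return REFRESH_FACTOR * refresh_interval_hours


def purge_expired(table: dict, now_hours: float) -> list:
    """Remove entries whose TTL has elapsed and return the removed prefixes.

    `table` maps a Daddr prefix string to (stored_at_hours, ttl_hours).
    """
    expired = [p for p, (stored, ttl) in table.items() if now_hours - stored > ttl]
    for p in expired:
        del table[p]
    return expired
```

   With a factor of three, a MapSet survives the loss of two consecutive
   refresh announcements before it becomes eligible for removal.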


8.  Security and Reliability

   Using DM-BGP to distribute mapping announcements guarantees that they
   are only accepted from manually configured DM-BGP peers.  This
   ensures that mapping updates are no less secure than routing updates
   are today.  However, mapping updates have the potential to cause far
   more damage; with no security measures in place, a mapping update
   could direct ALL traffic for an entire Daddr prefix to an arbitrary
   Taddr.  APT strives to prevent attacks and misconfigurations from
   having adverse effects outside of the TN in which they occur.
   Therefore, mapping updates will require some level of security.

8.1.  Authenticating the Originator of Mapping Updates

   Our first step towards authenticating mapping updates is to
   authenticate an update's originator.  For this reason, each default
   mapper MUST cryptographically sign the mapping data in any update it
   originates.  All default mappers within a single TN SHOULD use the
   same key pair, but default mappers in different TNs MUST use
   different key pairs.  When a default mapper receives a mapping
   update, it MUST verify this signature before processing or forwarding
   the update.  Default mappers MUST NOT remove or alter this signature
   when forwarding the update.

   Clearly, this scheme can only work if there is a secure way to
   distribute all public keys to all default mappers.  This should be a
   relatively straightforward problem to solve.  We describe one simple,
   appropriate method for secure key distribution in a network of
   manually configured peers in a separate document (forthcoming).

8.2.  Detecting MapSet Misconfigurations

   Though the scheme outlined in Section 8.1 allows for secure
   authentication of the originator of a mapping update, it does not
   guarantee the correctness of the data.  Since DM-BGP peerings are
   manually configured and therefore form a relatively closed network,
   misconfigurations are far more likely than attacks to be the cause of
   inaccurate mapping data.

   The types of misconfigurations that could potentially be harmful are
   those that result in one TN accidentally interfering with the MapSet
   for a DN that it is not connected to.  This can happen whenever a
   provider accidentally announces a MapSet for the wrong Daddr prefix.
   These types of accidental conflicts fall into three categories: (1) a
   TN announces a MapSet for the wrong Daddr prefix when that prefix
   already has a MapSet in the global mapping table, (2) a TN announces
   a MapSet for a Daddr prefix that subsumes a longer Daddr prefix that
   already has a MapSet, and (3) a TN announces a MapSet for a Daddr
   prefix that is a subset of a shorter Daddr prefix that already has a
   MapSet.
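
   The three categories can be told apart mechanically by comparing the
   announced prefix with the prefixes already in the mapping table.  The
   sketch below uses Python's standard ipaddress module; the function
   name and the representation of the table as a list of prefix strings
   are assumptions made for illustration.  Note that it only detects
   overlap; it cannot decide whether an announcement is actually wrong.

```python
import ipaddress


def classify_conflict(new_prefix: str, existing_prefixes: list) -> set:
    """Return the set of conflict categories (1, 2, 3) the new prefix triggers."""
    new = ipaddress.ip_network(new_prefix)
    categories = set()
    for text in existing_prefixes:
        old = ipaddress.ip_network(text)
        if old.version != new.version:
            continue  # IPv4 and IPv6 prefixes cannot overlap
        if new == old:
            categories.add(1)  # the same prefix already has a MapSet
        elif new.supernet_of(old):
            categories.add(2)  # new prefix subsumes a longer existing prefix
        elif new.subnet_of(old):
            categories.add(3)  # new prefix lies inside a shorter existing prefix
    return categories
```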

   The first category of conflicts is the only one that we intend to
   actively prevent.  Clearly, the DN that owns a particular Daddr
   prefix should be the ultimate authority for its mapping information.
   However, DNs do not announce their MapSet to the network directly,
   but rather through the TNs they connect to.  In order to ensure a
   mapping update for a Daddr prefix is approved by its rightful owner,
   we must first include some sort of prefix owner identification in
   each MapSet.  To this end, we introduce a DN key field into each
   mapping.  This field SHOULD contain a cryptographically valid public
   key, but it is not currently used as such.  When a default mapper
   receives a new MapSet that would replace an existing one, it only
   needs to ensure that the DN key has not changed.  (This scheme is
   similar in spirit to the way that OpenSSH uses its 'known_hosts'
   file.)  Note that DN keys are different from the keys used by default
   mappers to authenticate DM-BGP updates.
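
   A minimal sketch of this key-continuity check follows.  It assumes,
   for illustration, that a single DN key is recorded per Daddr prefix
   and that the table is a dictionary from prefix to a pair of DN key
   and MapSet; the function name accept_mapset is hypothetical.

```python
def accept_mapset(table: dict, prefix: str, dn_key: bytes, mapset: object) -> bool:
    """Store the MapSet unless it would change the DN key already on record.

    `table` maps a Daddr prefix to a (dn_key, mapset) pair.
    """
    known = table.get(prefix)
    if known is not None and known[0] != dn_key:
        # The DN key changed: keep the old entry and report a likely misconfiguration.
        return False
    table[prefix] = (dn_key, mapset)  # first sighting, or same key as before
    return True
```

   As with 'known_hosts', the first key seen for a prefix is trusted,
   and any later change of key needs attention from an administrator.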

   For the other two categories, it is less clear that such an
   announcement is the result of a misconfiguration.  It is possible,
   for example, that the owner of a /16 Daddr prefix has resold some of
   the /24 prefixes it contains to other DNs.  In such a case, only the
   administrators will know if the announcement is valid.  It is for
   this reason that (in the spirit of PHAS [PHAS]) we do not attempt to
   prevent such changes, but only detect and notify interested parties.
   Since legitimate MapSet changes are infrequent, notifying interested
   parties of MapSet changes via e-mail is a perfectly viable option.
   These notifications could also prove useful in debugging the mapping
   service, or a particular TN's configuration.

8.3.  APT Control Messages

   APT never requires Cache Drop and Cache Add Messages to traverse AS
   boundaries.  Any such message that does traverse an AS boundary must
   be an error or an attack.  Therefore, TRs MUST ignore Cache Drop and
   Cache Add Messages with a source Taddr outside of their TN.  Since
   ISPs already generally drop packets from an external source when they
   contain a local source address, this simple policy should be
   sufficient to prevent TR cache poisoning, whether accidental or
   intentional.
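
   The source check a TR applies before honoring a Cache Add or Cache
   Drop Message might look like the sketch below.  It assumes that the
   TR is configured with the list of Taddr prefixes belonging to its own
   TN; that configuration and the function name are hypothetical.

```python
import ipaddress


def from_own_tn(source_taddr: str, tn_prefixes: list) -> bool:
    """True if the source Taddr falls inside one of this TN's own prefixes."""
    addr = ipaddress.ip_address(source_taddr)
    return any(addr in ipaddress.ip_network(p) for p in tn_prefixes)
```

   A TR would silently discard any Cache Add or Cache Drop Message for
   which from_own_tn() returns False.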

   Since any APT control message that may need to travel between ASes
   can also affect traffic flow, such control messages MUST be
   cryptographically signed.  This currently includes ETR Unreachable
   Messages (see Section 11.4) and DN Unreachable Messages (see
   Section 11.5).  Recall that the infrastructure required to generate
   and verify cryptographic signatures is already required for mapping
   update dissemination (see Section 8.1).  When a default mapper
   receives such a control message, it MAY choose to verify this
   signature.


9.  Scalability through Recursion

   It is conceivable that the global mapping table could eventually grow
   large enough that it would no longer be possible to store it in a
   single default mapper.  Theoretically, the global mapping table could
   grow to contain a separate MapSet for every Daddr prefix.  In the
   case of IPv6 prefixes, the total number of MapSets would be on the
   order of 10^18, far more than we can expect to be able to store on a
   single device.  If the global mapping table were to approach such
   gargantuan proportions, APT could simply be applied recursively.

   In the recursive case, the terms "transit" and "delivery" are only
   meaningful relative to a particular depth of recursion, or number of
   times the packet has been encapsulated.  We will refer to the non-
   recursive deployment of APT as the global level (G).  What we have up
   until now referred to as delivery space and transit space are in fact
   G delivery space and G transit space.

   At one level of recursion, G transit space is split into two address
   spaces: recursion depth 1 (R1) delivery space and R1 transit space.
   R1 delivery space is just another name for G transit space.  Which
   name is used will depend on the context.  R1 transit space can be
   further split into two R2 spaces, and so on.  Using this terminology,
   all protocols and concepts in APT can be understood to apply
   generally at any level of recursion.

   This figure shows the layout of a packet while being tunneled at an
   APT recursion depth of two.

   ________________________________________
   |           R2 transit header          |
   |--------------------------------------|
   | R2 delivery a.k.a. R1 transit header |
   |--------------------------------------|
   | R1 delivery a.k.a. G transit header  |
   |--------------------------------------|
   |           G delivery header          |
   |--------------------------------------|
   |                                      |
   |               payload                |
   |                                      |
   |______________________________________|

                                 Figure 2
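
   The naming pattern in Figure 2 generalizes to any recursion depth.
   The following sketch, whose function name is our own, lists the
   header labels from outermost to innermost for a given depth.

```python
def header_stack(depth: int) -> list:
    """List header labels from outermost to innermost at the given recursion depth."""
    if depth < 0:
        raise ValueError("depth must be non-negative")

    def level(n: int) -> str:
        # Depth 0 is the global level G; deeper levels are R1, R2, ...
        return "G" if n == 0 else f"R{n}"

    stack = [f"{level(depth)} transit"]
    for n in range(depth, 0, -1):
        # The delivery space at depth n is the transit space at depth n - 1.
        stack.append(f"{level(n)} delivery a.k.a. {level(n - 1)} transit")
    stack.append("G delivery")
    return stack
```

   For a depth of two this yields the four headers shown in Figure 2;
   for a depth of zero it yields only the G transit and G delivery
   headers of the non-recursive case.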


10.  Mapping Announcements

   Each mapping announcement has the following fields:

   o  Address Type - This field specifies the type of Daddrs used in the
      announcement.  All Daddr prefixes in a single mapping announcement
      MUST be of the same address type.  Currently, this is expected to
      be either IPv4 or IPv6, but other address types are also allowed.

   o  Total Length - This field specifies the total number of bytes used
      by all MapSets in the announcement.  Each mapping announcement can
      contain MapSets for multiple prefixes, each with multiple MapRecs.

   o  Sequence Number - This field reflects the freshness of an update.
      Default mappers can avoid processing updates with old sequence
      numbers.

   o  Signature - The message MUST be cryptographically signed using the
      private key of the originating default mapper (see Section 8.1).

   These fields are followed by one or more MapSets.  Each MapSet in the
   announcement is described by the following fields:

   o  Daddr Prefix - This is the Daddr prefix for the MapSet.

   o  Time To Live (TTL) - This is the amount of time in hours that this
      MapSet should persist in default mappers before being considered
      obsolete and erased.  This value MUST be set to at least three
      times the sender's regular refresh interval.  The TTL is specified
      in hours to prevent misconfigurations from causing excessive
      mapping updates.

   o  ETR Count - This is the total number of ETRs that the
      corresponding Daddr prefix maps to.

   o  Each ETR in a MapSet is described by the following fields:

      *  Taddr - The address of this ETR.

      *  Priority - Priorities are arbitrary integers that only have
         meaning in reference to each other.  Taddrs with lower priority
         values are preferred.

      *  DN Public Key - This public key SHOULD uniquely identify the DN
         that owns this MapSet.  It can be used to help identify
         configuration errors, and possibly for authoritative,
         cryptographic authentication of MapSet data in the future.
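
   The per-MapSet fields above can be modeled in memory as sketched
   below.  This is not a wire format: the document does not define field
   sizes or encodings, and the class names, attribute names, and the
   preferred() helper are our own.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MapRec:
    taddr: str            # address of the ETR
    priority: int         # lower value means more preferred
    dn_public_key: bytes  # identifies the DN that owns the MapSet


@dataclass
class MapSet:
    daddr_prefix: str
    ttl_hours: int
    etrs: List[MapRec] = field(default_factory=list)

    @property
    def etr_count(self) -> int:
        # The ETR Count field is derived from the list rather than stored.
        return len(self.etrs)

    def preferred(self) -> List[MapRec]:
        """Return the ETRs that share the lowest (most preferred) priority value."""
        if not self.etrs:
            return []
        best = min(rec.priority for rec in self.etrs)
        return [rec for rec in self.etrs if rec.priority == best]
```

   Because priorities only have meaning relative to each other,
   preferred() compares values within one MapSet and never against a
   fixed threshold.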


11.  APT Header and Control Messages

   Delivery space packets are encapsulated with a UDP header by an ITR.
   The UDP header should specify a well-known port reserved for APT, and
   the UDP payload MUST begin with an APT header.  For regular data, a
   layer-3 header immediately follows the APT header.  For other message
   types, we describe the fields that follow below.

11.1.  APT Header Fields

   The APT header contains the following fields:

   o  Version - The version of APT that should be used to interpret the
      header information.

   o  Tag - Extra field reserved for future use.

   o  Type - Determines the type of message being sent.  Appropriate
      values are as follows:

         0: Regular Data

         1: Cache Add (Section 11.2)

         2: Cache Drop (Section 11.3)

         3: ETR Unreachable (Section 11.4)

         4: DN Unreachable (Section 11.5)

         5: ETR-to-DN link failure (Section 11.6)

   o  Default Mapper Taddr - The address of the default mapper for the
      ITR that generated this header.  This is the Taddr where any
      failure notifications from the destination TN will be sent.  If
      this header was generated by a default mapper, this field SHOULD
      contain the same address as the source address in the
      encapsulating IP header.
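
   For implementers, the Type values can be captured in an enumeration,
   together with the handling rules from Section 8.3.  The identifier
   names below are our own; only the numeric values come from this
   document.

```python
from enum import IntEnum


class AptType(IntEnum):
    REGULAR_DATA = 0
    CACHE_ADD = 1
    CACHE_DROP = 2
    ETR_UNREACHABLE = 3
    DN_UNREACHABLE = 4
    ETR_TO_DN_LINK_FAILURE = 5


# Types that Section 8.3 says never need to cross an AS boundary.
INTRA_TN_ONLY = frozenset({AptType.CACHE_ADD, AptType.CACHE_DROP})

# Types that Section 8.3 requires to be cryptographically signed.
SIGNED = frozenset({AptType.ETR_UNREACHABLE, AptType.DN_UNREACHABLE})
```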

11.2.  Cache Add Messages

   Cache Add Messages are only sent by default mappers to TRs within
   their own TNs, most notably in response to data packets.  When a TR
   receives a Cache Add Message, it simply adds the enclosed MapRec to
   its cache, replacing any existing cache entry.

   o  Daddr Prefix - This is the Daddr prefix portion of the MapRec to
      be added to the receiving TR's cache.

   o  ETR Taddr - This is the Taddr portion of the MapRec to be added to
      the receiving TR's cache.  It is the address of the ETR that can
      reach the Daddr prefix in the previous field.

   o  TTL - The TTL specifies the amount of time in seconds before the
      added cache entry should expire.  Expired cache entries should be
      deleted from the TR's cache.

11.3.  Cache Drop Messages

   Cache Drop Messages are only sent by default mappers to TRs within
   their own TNs.  When a TR receives a Cache Drop Message, it simply
   removes the cache entry corresponding to the enclosed Daddr prefix
   from its cache, if such an entry exists.

   o  Daddr Prefix - This is the Daddr prefix of the MapRec to be
      dropped.
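
   The effect of Cache Add and Cache Drop Messages on a TR's cache can
   be sketched as follows.  The class and method names are hypothetical,
   and for brevity the lookup is an exact match on the Daddr prefix
   rather than a longest-prefix match on a destination address.

```python
class TrCache:
    """Per-TR mapping cache keyed by Daddr prefix (illustrative only)."""

    def __init__(self):
        self._entries = {}  # Daddr prefix -> (ETR Taddr, expiry time in seconds)

    def cache_add(self, daddr_prefix: str, etr_taddr: str, ttl_s: float, now: float) -> None:
        # Replaces any existing entry for the prefix (Section 11.2).
        self._entries[daddr_prefix] = (etr_taddr, now + ttl_s)

    def cache_drop(self, daddr_prefix: str) -> None:
        # Does nothing if no entry exists for the prefix (Section 11.3).
        self._entries.pop(daddr_prefix, None)

    def lookup(self, daddr_prefix: str, now: float):
        """Return the cached ETR Taddr, or None if absent or expired."""
        entry = self._entries.get(daddr_prefix)
        if entry is None:
            return None
        taddr, expires = entry
        if now >= expires:
            del self._entries[daddr_prefix]  # expired entries are deleted
            return None
        return taddr
```

   The caller supplies the current time in seconds, which keeps the
   sketch deterministic and matches the unit of the Cache Add TTL.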

11.4.  ETR Unreachable Messages

   ETR Unreachable Messages are sent by default mappers to other default
   mappers to notify them of failures.

   o  Transit Address - This is the Taddr of the ETR that cannot be
      reached.

   o  Signature - The message MUST be cryptographically signed using the
      private key of the sending default mapper (see Section 8.3).

11.5.  DN Unreachable Messages

   DN Unreachable Messages are sent by default mappers to other default
   mappers to notify them of failures.

   o  Daddr Prefix - This is the Daddr prefix of the DN that cannot be
      reached.

   o  Signature - The message MUST be cryptographically signed using the
      private key of the sending default mapper (see Section 8.3).

11.6.  The ETR-to-DN Link Failure Message Type

   This message type is used by an ETR for two purposes: (1) to indicate
   to its default mapper that its direct link to the DN for the enclosed
   data packet is down, and (2) to preserve that data packet so that the
   ETR's default mapper might deliver it to the DN by way of a different
   ETR.


12.  Incremental Deployment

   Incremental deployment methods and incentives for APT will be
   discussed in a separate draft (forthcoming).


13.  IANA Considerations

   This memo includes no request to IANA.


14.  Security Considerations

   Security considerations for APT are discussed in Section 8.


15.  References

15.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

15.2.  Informative References

   [AddrAlloc]
              Meng, X., Xu, Z., Zhang, B., Huston, G., Lu, S., and L.
              Zhang, "IPv4 Address Allocation and BGP Routing Table
              Evolution", ACM SIGCOMM Computer Communication Review
              (CCR) special issue on Internet Vital Statistics, Volume
              35, Issue 1, p71-80.

   [CRIO]     Zhang, X., Francis, P., Wang, J., and K. Yoshida, "Scaling
              IP Routing with the Core Router-Integrated Overlay", Proc.
              International Conference on Network Protocols,
              November 2005.

   [EFIT]     Massey, D., Wang, L., Zhang, B., and L. Zhang, "A Scalable
              Routing System Design for Future Internet", SIGCOMM IPv6
              Workshop, August 2007.

   [EFIT-ID]  Massey, D., Wang, L., Zhang, B., and L. Zhang, "A Proposal
              for Scalable Internet Routing and Addressing", Internet
              Draft, http://www.ietf.org/internet-drafts/
              draft-wang-ietf-efit-00.txt, February 2007.

   [LISP]     Farinacci, D., Fuller, V., Oran, D., and D. Meyer,
              "Locator/ID Separation Protocol (LISP)", Internet Draft,
              http://www.ietf.org/internet-drafts/
              draft-farinacci-lisp-05.txt, 2007.

   [PHAS]     Lad, M., Massey, D., Pei, D., Wu, Y., Zhang, B., and L.
              Zhang, "PHAS: A Prefix Hijack Alert System", USENIX
              Security.

   [RAWS]     Meyer, D., Zhang, L., and K. Fall, "Report from the IAB
              Workshop on Routing and Addressing", Internet Draft,
              http://www.ietf.org/internet-drafts/
              draft-iab-raws-report-02.txt, 2007.

   [SixOne]   Vogt, C., "Six/One: A Solution for Routing and Addressing
              in IPv6", Internet Draft, http://www.ietf.org/
              internet-drafts/draft-vogt-rrg-six-one-00.txt.


Appendix A.  Open Issues

   MapSets contain a priority field for each ETR, but this does not
   allow for uneven distribution of traffic across ETRs with the same
   priority, e.g. a 75/25 split.  To provide a mechanism for DNs to
   request such traffic distributions, we should also include a weight
   field for each ETR.

   If a TN sends out inaccurate mapping announcements, other TNs can
   identify and respond to the misbehaving source TN.  However, there
   are no preventative security measures in place.  Are detection and
   response enough of a security measure?

   We are considering automating customer-DN-to-provider-TN mapping
   updates.  Under our current design, whenever a DN needs to update its
   mapping information (it may add, remove, or change providers, or
   change its priority values), the DN must contact its provider TNs
   offline and request that they announce the updated mapping
   information.  It is then up to the provider TNs to update the mapping
   information.  As we have seen with DNS updates, human involvement
   introduces the possibility of human error and delay.  We hope to
   provide DNs with an automated way to manage their mapping
   information.

   Is it too much to ask ISPs to change all of their PE routers into
   TRs?  We suspect that TR implementation should involve only software
   changes.  Existing router hardware can do everything required by a
   TR.  Thus, we suspect the cost should be reasonable.


Authors' Addresses

   Dan Jen

   Email: jenster@cs.ucla.edu


   Michael Meisel

   Email: meisel@cs.ucla.edu


   Dan Massey

   Email: massey@cs.colostate.edu


   Lan Wang

   Email: lanwang@memphis.edu


   Beichuan Zhang

   Email: bzhang@cs.arizona.edu


   Lixia Zhang

   Email: lixia@cs.ucla.edu


Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).