draft-odell-8+8-00

Network Working Group                                        Mike O'Dell
Internet-Draft                                        UUNET Technologies
                                                  1996/10/22 05:58:54GMT
                                                    Expire in six months

          8+8 - An Alternate Addressing Architecture for IPv6

                        <draft-odell-8+8-00.txt>

1. Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

2. Abstract

   This document presents an alternative addressing architecture for
   IPv6 which controls global routing growth with very aggressive
   topological aggregation. It also includes support for scalable
   multihoming as a distinguished service while freeing sites and
   service resellers from the tyranny of CIDR-based aggregation by
   providing transparent rehoming of both.

3. Introduction

   IP version 6 represents a significant advancement in the technology
   of the Internet.  It provides large addresses, many sorely-needed
   functional capabilities, and was intended to be a platform for the
   further evolution of the Global Internet.  Unfortunately, when IPv6
   was created, Route Scaling, which has become the most significant
   problem for the continued growth of the Internet, was not widely
   understood to be the forcing function we now know it to be.  Because
   of that, the current IPv6 addressing proposal fails to provide an
   operationally-scalable scheme for aggressive topological aggregation
   and the continued scaling of the routing architecture.




O'Dell v2.21                                                    [Page 1]


Internet-Draft                8+8 for IPv6        1996/10/22 05:58:54GMT


   The current IPv6 addressing proposals continue to rely almost
   entirely upon CIDR-style aggregation for route growth control. Unlike
   IPv4, in IPv6 this mechanism is coupled with support for easier
   network renumbering which may make so-called "provider-based
   addressing" a bit more palatable.

   In general, the current IPv6 addressing model is inadequate for
   several reasons.  CIDR-style aggregation breaks down in the face of
   the accelerating growth of multi-homed sites (leaf sites or regional
   networks).  Renumbering to accomplish simple topological rehoming
   (e.g., changing ISPs) is a problem whose magnitude will only grow
   over time. It will always be difficult to explain this to customers,
   increasingly so with decreasing customer sophistication.  While the
   large IPv6 addresses provide for a huge increase in the number of end
   systems which can be accommodated, they also portend a huge increase
   in the number of routes required to reach them. Even if CIDR
   aggregation continues at current levels, this presents a serious
   problem because of the scaling behavior of the global route
   computations.

   This document presents a new proposal for using the 16 byte IPv6
   address which mitigates the route scaling problem and with it a
   number of collateral issues.  This model provides for aggressive
   topological aggregation while controlling the complexity of flat-
   routed regions.  It uses and supports the dynamic address assignment
   machinery in IPv6, but makes the exact role of that machinery a local
   decision with understandable costs and benefits rather than a
   mandatory mechanism for simple rehoming situations.

   The model also identifies the special work done by the global
   Internet infrastructure to support multihomed sites, isolating it
   into a specific mechanism which is then traceable to and incurred by
   only those sites wishing to use this capability.  This then makes it
   possible for sites to make informed cost-benefit decisions about
   multihoming.

4. Central Concepts

   The addressing model proposed here is called "8+8" to distinguish it
   from the existing proposals which are called "Flat-16" in this
   document. The first central concept in 8+8 is simple:

        The 16 byte IPv6 address is split into two 8-byte objects stored
        in the existing 16-byte container.

   The lower 8 bytes (least significant) form the "End System
   Designator," or ESD.  The upper 8 bytes (most significant) are called
   the "Routing Goop", or RG.  The ESD designates a computer system and



O'Dell v2.21                                                    [Page 2]


Internet-Draft                8+8 for IPv6        1996/10/22 05:58:54GMT


   the RG encodes information about its attachment to the global
   Internet topology.
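
   As an illustration of the split, the following minimal Python sketch
   separates a 16-byte address into its RG and ESD halves and joins
   them again.  The function names and the sample address are
   assumptions made for this example only; they are not part of the
   proposal.

```python
import ipaddress

def split_8plus8(addr):
    """Return (routing_goop, esd), each as 8 bytes in network order."""
    packed = ipaddress.IPv6Address(addr).packed  # always 16 bytes
    return packed[:8], packed[8:]

def join_8plus8(rg, esd):
    """Reassemble a 16-byte address from an 8-byte RG and an 8-byte ESD."""
    if len(rg) != 8 or len(esd) != 8:
        raise ValueError("RG and ESD must each be exactly 8 bytes")
    return ipaddress.IPv6Address(rg + esd)

# Hypothetical address, chosen only for illustration.
rg, esd = split_8plus8("2001:db8:1:2:0:5e:10:1")
```

   The upper 8 bytes come back as the RG and the lower 8 bytes as the
   ESD; joining the two halves reproduces the original address.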

   As with other schemes distinguishing location from identity, the 8+8
   model requires modifying the upper level protocols to consider only
   the ESD when performing pseudo-header operations meant to identify
   the end system as opposed to its location in the topology. A few
   important examples: the TCP checksum pseudo-header would use only the
   ESDs instead of the Flat-16 addresses;  TCP associations would be
   identified by ESD/Port instead of Flat-16/Port; IPSEC Authentication
   and ESP header calculations would only consider the ESD and not the
   RG of the address. Together these allow session-scale state like TCP
   connections to survive global topology changes without special
   considerations in the transport protocol.
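
   A minimal sketch of the ESD/Port identification described above
   follows.  It assumes a connection table keyed on a tuple; the
   helper names, ports, and addresses are hypothetical and serve only
   to show that a change of RG leaves the key unchanged.

```python
import ipaddress

def esd_of(addr):
    """Lower 8 bytes of the address: the End System Designator."""
    return ipaddress.IPv6Address(addr).packed[8:]

def connection_key(local_addr, local_port, remote_addr, remote_port):
    """Identify a TCP association by ESD/Port pairs, ignoring the RG."""
    return (esd_of(local_addr), local_port, esd_of(remote_addr), remote_port)

# The local Site is rehomed: only the upper 8 bytes (the RG) change.
before = connection_key("2001:db8:aaaa:1:0:5e:10:1", 1025,
                        "2001:db8:cccc:3:0:5e:20:2", 23)
after = connection_key("2001:db8:bbbb:7:0:5e:10:1", 1025,
                       "2001:db8:cccc:3:0:5e:20:2", 23)
same_association = before == after
```

   Because the key never includes the RG, a lookup made after the
   rehoming finds the same association as one made before it.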

   Note: this proposal does not affect the IPv6 multicast, loopback, or
   link-local address formats or usage. It is probably necessary to
   create a new version of the "IPv6 site-local prefix" which uses an
   ESD as the lower 8 bytes and would be used for within-site sessions
   (in the existing IPv6 sense) and for originating external traffic.


   The second central concept is:

        Formalize the distinction between "Public Topology" and "Private
        Topology".

   "Public Topology" is structure which must be understood by a number
   of other organizations, especially and specifically transit networks,
   for constructing global Internet connectivity.  "Private Topology" is
   structure which is of no particular interest outside the containing
   organization.  In particular, general transit service is provided by
   networks exposed in the Public Topology; networks composed of only
   Private Topology cannot provide general transit service to the Global
   Internet.

   In the current IPv4 Internet, the distinction between Public and
   Private Topology exists as a side-effect but it is not used to any
   significant advantage beyond that which arises naturally from CIDR-
   style aggregation.  A current example of private topology is the
   subnet structure used by the topology within a site as applied to the
   CIDR block for the entire site.  No one else outside the site
   particularly cares about the internal structure of the site so there
   is no real need to carry any routing information about it other than
   the CIDR block describing it as a whole.

   The 8+8 model elevates this observation to a major architectural
   component providing an explicit notion of a "Site".  A "Site" is the
   simplest unit of attachment to the Global Internet and is also the
   unit of Private Topology.  Within a Site, the ESD of a system is
   sufficient for reaching it across the Private Topology as well as
   globally identifying the system outside the confines of the Site.
   This site-internal reachability can be accomplished by either flat-
   routing on the ESD within a Site (whether this is called "LAN
   Switching" or something else is irrelevant), or by using a structured
   ESD within the site.  Both of these solutions are supported by the
   structure of the ESD and each has identifiable and understandable
   costs and benefits.  These will be discussed at length later.

   The "Public Topology" is the transit infrastructure which carries
   traffic from one Site to another.  It is composed of the various
   carrier, reseller, and regional networks which we know today.  The
   Routing Goop portion of an 8+8 address is a locator which encodes
   information about the way a Site (containing Private Topology) is
   connected to the Public Topology of the transit networks.  As will be
   explained later, Routing Goop compactly encodes topology information
   with very high degrees of aggregation while still affording the
   opportunity to carry local detail for optimizing regional routes
   without sacrificing global aggregation.  Again, this will be
   discussed later.

   The third central concept is:

        Dynamic insertion of Routing Goop into source addresses by Site
        Boundary Routers when a packet leaves a Site and enters the
        Public Topology.

   This is one of the most radical parts of this proposal and was not
   included in earlier versions of this document, but discussions with
   various people convinced the author that it solves a sufficiently
   compelling number of problems with one simple mechanism that it was
   adopted.  It too will be discussed later.
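
   The insertion step can be sketched as below, under the assumption
   that the boundary router simply overwrites the upper 8 bytes of the
   source address and leaves the ESD alone.  The function name, the
   site-local style source address, and the RG value are hypothetical.

```python
import ipaddress

def insert_routing_goop(src_addr, site_rg):
    """Overwrite the upper 8 bytes of a source address with the Site's RG."""
    if len(site_rg) != 8:
        raise ValueError("Routing Goop must be exactly 8 bytes")
    esd = ipaddress.IPv6Address(src_addr).packed[8:]  # ESD is left untouched
    return ipaddress.IPv6Address(site_rg + esd)

# Hypothetical values: a site-local style source and an RG learned from
# the current upstream attachment.
site_rg = bytes.fromhex("20010db8aaaa0001")
outbound = insert_routing_goop("fec0::5e:10:1", site_rg)
```

   In this sketch the end system never needs to know its own RG; only
   the boundary router does.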

5. The Structure of End System Designators - the ESD

   End System Designators designate every computer system in the 8+8
   Internet regardless of whether it is a host, router, or other network
   element.  While a given system can have more than one ESD, each ESD
   is globally unique.  This is critical for their utility to the
   upper-level protocols.  This uniqueness can be induced several ways
   as will be seen.

   An interesting question is whether an ESD identifies a system,
   possibly as in the XNS architecture, or an interface, as in the
   existing IPv4 and IPv6 architecture.  The answer is that an ESD
   designates an interface on a computer system and that interface can
   be either physical or virtual.

   When processing an 8+8 address, a computer system need only examine
   the ESD portion of the address to determine whether a packet is
   destined for that system.

   There are circumstances when it is quite useful to have "an address"
   for a computer system which is independent of any particular physical
   interface on that system. It has become commonplace in IPv4 practice
   to use a distinguished virtual interface to provide a system with
   such an "interface independent identity".  This provides the same
   architectural utility as XNS while still allowing the flexibility of
   the IPv4 "addressed interface" model. We chose to retain the
   successful IPv4/IPv6 model.

   NOTE: We specifically avoid being pedantic about exactly what
   constitutes an "interface" and a "computer system" as the
   malleability of those notions in IPv4 has proven manifestly useful in
   practice.

   To summarize the ESD uniqueness characteristics:

           (1) an ESD is globally unique
           (2) an ESD designates an "interface" on "a computer system"
           (3) an interface may have more than one ESD
               (current IPv6 already requires implementations to support
               multiple addresses per interface)
           (4) an ESD may not necessarily designate a particular
               physical computer (Neighbor Discovery continues to provide
               a level of virtual address translation and great
               cleverness can be contained therein)

   The following describes the 8 bytes of the currently-defined ESD
   structures.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Private Topology Partition  |M| top 16 bits of Identity Token |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    3               4                   5                   6
    2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              bottom 32 bits of Identity Token                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Bits 0-14:  15-bit Private Topology Partition (PTP)
                       Provides for 32768 distinct partitions in the
                       Private Topology
           Bit  15:    Identity Token Mode Indicator
                       0 => 48-bit Identity Token
                       1 => Mode in upper bits of Identity Token
           Bits 16-63: 48-bit Identity Token


   Identity Tokens are formed as follows:

           Mode 0 ESDs: (Bit 15: 0)
                   Identity Token is 48-bits of IEEE MAC Address
                   Bits 16-63: IEEE 48-bit MAC Address

           Mode 1 ESDs: (Bits 15-18: 1001)
                   Identity Token is a 45-bit "IETF NodeID" integer;
                   NodeIDs are assigned densely starting with 1.
                   Bits 19-63: IETF NodeID

           Mode 2 ESDs: (Bits 15-18: 1010)
                   Identity Token is a 32-bit officially-assigned public
                   IPv4 address (i.e., NOT an RFC-1918 private-use
                   address), zero padded
                   Bits 19-31: must be zero
                   Bits 32-63: valid IPv4 Address

           Mode 3 through Mode 7 ESDs (Bits 15-18: 1011 - 1111)
                   RESERVED
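
   The three defined modes can be sketched in Python as follows.  This
   is an illustration only: it treats the ESD as a 64-bit integer with
   bit 0 as the most significant bit, as in the diagram above, and all
   function and constant names are assumptions of the example.

```python
import ipaddress

PTP_SHIFT = 49   # bits 0-14 (bit 0 = most significant) are the top 15 bits
TAG_SHIFT = 45   # bits 15-18 sit directly above the low 45 bits
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def _ptp_field(ptp):
    if not 0 <= ptp < (1 << 15):
        raise ValueError("PTP must fit in 15 bits")
    return ptp << PTP_SHIFT

def esd_mode0(ptp, mac):
    """Mode 0: bit 15 is 0, bits 16-63 carry a 48-bit IEEE MAC address."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return _ptp_field(ptp) | int.from_bytes(mac, "big")

def esd_mode1(ptp, node_id):
    """Mode 1: bits 15-18 are 1001, bits 19-63 carry the IETF NodeID."""
    if not 1 <= node_id < (1 << 45):
        raise ValueError("NodeID must be in 1 .. 2**45 - 1")
    return _ptp_field(ptp) | (0b1001 << TAG_SHIFT) | node_id

def esd_mode2(ptp, ipv4):
    """Mode 2: bits 15-18 are 1010, bits 19-31 zero, bits 32-63 IPv4."""
    addr = ipaddress.IPv4Address(ipv4)
    if any(addr in net for net in RFC1918):
        raise ValueError("RFC 1918 addresses are not globally unique")
    return _ptp_field(ptp) | (0b1010 << TAG_SHIFT) | int(addr)

def decode_esd(esd):
    """Return (ptp, mode, identity_token) for a 64-bit ESD integer."""
    ptp = esd >> PTP_SHIFT
    if not (esd >> 48) & 1:                      # bit 15 clear: Mode 0
        return ptp, 0, esd & ((1 << 48) - 1)
    tag = (esd >> TAG_SHIFT) & 0xF               # bits 15-18
    if tag == 0b1001:
        return ptp, 1, esd & ((1 << 45) - 1)
    if tag == 0b1010:
        return ptp, 2, esd & 0xFFFFFFFF
    raise ValueError("reserved ESD mode")
```

   The RFC 1918 check in esd_mode2 mirrors the requirement that a
   Mode-2 Identity Token be a public IPv4 address; a real
   implementation might need a broader test for non-unique addresses.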

   For interfaces with IEEE-assigned 48-bit MAC addresses, a Mode-0 ESD
   is the most natural ESD for that particular interface.  On the other
   hand, a point-to-point interface with no other naturally-occurring
   MAC address could be labeled using a Mode-1 ESD.  Mode-2 ESDs provide
   for exploiting an already widely-deployed identifier space for easing
   the transition to 8+8.  Links with MAC addresses larger than 6 bytes
   can use Mode-2 ESDs and IPv6 dynamic configuration support with
   Neighbor Discovery.

   The IETF NodeID in the Mode-1 ESD is a 45-bit unsigned integer which
   starts at one (1) and is incremented, assigning the numbers as
   densely as possible.  There is no particular need to delegate on bit
   boundaries as powers-of-2 don't matter.  The numbers merely must be
   assigned uniquely to requesters.  We leave the actual assignment
   strategy and any potential delegation to the purview of the IANA.

   A few comments on "global uniqueness" are in order because in
   previous discussions, some people seem to think that unless
   "uniqueness" can be accomplished with absolute and complete
   mathematical perfection any scheme using the concept is unworkable.
   This is complete and utter nonsense and is rendered patently false by
   multiple counter-examples:

   IEEE MAC addresses are globally unique by nature of the delegation
   process where they are assigned to interfaces by the manufacturers.
   Both XNS and IPX rely on this uniqueness and it works very well in
   practice.  IETF-NodeID values will be globally unique by nature of
   the same kind of assignment mechanism.  IPv4 addresses must be
   globally unique for the Internet to function, and it does, mostly, by
   nature of exactly the same kind of assignment mechanism.

   Yes, it is true that sometimes accidents happen and an IPv4 prefix is
   misconfigured and it can be troublesome to track down.  But the
   problem is quite manageable.  Moreover, even with its extreme rarity,
   it is much more common than two Ethernet interfaces having the same
   MAC address.  The author believes that the IEEE MAC address
   assignment machinery coupled with the job the manufacturers do is the
   closest approximation to "global uniqueness" which any significant
   human enterprise can achieve, and it is more than adequate to the
   task at hand.  The IETF NodeIDs will be assigned at least as well as
   IPv4 addresses, and IPv4 seems to work well enough for the Global
   Internet to function with incredibly few problems arising from this
   particular source.


6. The Structure of a Site

   The 8+8 global routing architecture ultimately views a Site as a leaf
   of the topology and doesn't concern itself with the interior of this
   private topology.  However, the internal topology of a Site is
   extremely important to the management and operation of the Site so
   the ESD structure provides for a rich set of organizational
   alternatives with different cost-benefit tradeoffs.




O'Dell v2.21                                                    [Page 7]


Internet-Draft                8+8 for IPv6        1996/10/22 05:58:54GMT


   ESDs are globally unique but can also carry internal structure. The
   global uniqueness is provided by the Identity Token while the
   internal structure is carried in the Private Topology Partition.  The
   ESD structure provides for 32768 distinct Private Topology Partitions
   (PTPs) within a Site.  This is the equivalent of EVERY Site having
   been assigned a CIDR block of 128 Class-B addresses subnetted down to
   a Class-C.  The difference is that in an ESD, the subnet population
   is limited strictly by the link-level (LAN) technology and not by the
   254-host limit of the Class-C subnet.  This allows an extremely rich
   topology to be contained within a Site without it exporting
   complexity into the global routing structure which must then be
   concealed by tricks like CIDR aggregation.
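
   The Class-B comparison follows from simple arithmetic, worked here
   as a short Python check.  The variable names are illustrative only.

```python
PTP_BITS = 15                          # width of the Private Topology Partition
partitions = 2 ** PTP_BITS             # distinct PTP values per Site
subnets_per_class_b = 2 ** (24 - 16)   # Class-C-sized subnets in one Class-B
equivalent_class_b_blocks = partitions // subnets_per_class_b
```

   A 15-bit PTP yields 32768 partitions, and a Class-B subnetted to
   Class-C size yields 256 subnets, so the PTP space matches 128 such
   Class-B blocks.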

   Of course, an organization is not constrained to being structured as
   a single Site.  The trade-off is that the inter-Site topology must
   then be part of the Public Topology. While the individual Sites
   retain considerable independence in topological structure and
   attachment to the Global Internet, they must be aware of changes
   between the constituent Sites and that rehoming of constituent Sites
   will potentially impact long-running sessions. That is the cost of
   exploiting the routing machinery available to the Public Topology.

   Given the flexibility available for organizing a Site, it is
   worthwhile to examine a few examples.  Note that none of these
   organizational approaches is exclusive.  A large Site might well mix
   these approaches to good effect and indeed the goal is to provide the
   designer of private Site topology with a broad spectrum of design
   alternatives.

   The simplest structure to imagine is a Site using all Mode-0 ESDs
   with all the systems connected in a single Private Topology Partition
   (i.e., all the ESDs carry the same PTP value which is assigned by the
   local network administration).  Given the sophistication of current
   LAN-switching technology, a Site like this could be both large and
   internally complex, but the complexity is absorbed into the LAN
   infrastructure and it appears to be only one partition from the 8+8
   Private Topology view.  This structure has one very significant
   advantage:  rehoming a system within this structure will not change
   the ESD, and TCP sessions (for example) will survive arbitrary changes
   in the private topology.  This works, of course, because the single
   PTP is a virtual topology with the real topology hidden by the LAN
   Switching machinery.

   The second Site model is like the one just described, except it would
   have multiple PTPs with routing carrying traffic between the
   segments.  This is very close to the common IPv4 structure of a CIDR
   block being subnetted to assign a prefix to each PTP.  This approach
   has the advantage of familiarity, but it has the disadvantage that
   long-lived TCP connections don't necessarily survive arbitrary
   changes to the private topology.  The existing IPv6 dynamic address
   assignment machinery will serve to make such internal changes much
   less painful than with IPv4, however.  One point worth noting,
   though, is that even with multiple PTPs routed within a Site, a
   "Private Topology Partition" need not correspond to a "physical" LAN
   cable.  The PTP values could be used to label larger organizational
   structures like "Engineering" or "Finance".  This could reduce the
   likelihood that common internal topology changes break long-lived
   connections.

   The third Site model uses Mode-2 ESDs based on existing IPv4 address
   assignments.  In this case, all the IPv4 Identity Tokens could be
   placed in a single PTP and then routed internally on the IPv4 address
   in the lowest 4 bytes of the Identity Token.  This has the advantage
   of significant familiarity, but also can induce externally-visible
   changes if ESDs must be reassigned because of private topology
   requirements. Again, it must be emphasized that the IPv4 addresses
   used in a Mode-2 ESD must be an officially-registered, public-use
   IPv4 address and NOT an RFC-1918 private-use address.  Using an RFC-
   1918 private-use address violates the global uniqueness properties
   required of an ESD.

   In all of the multi-segment cases, a Mode-1 ESD could be used to
   designate any point-to-point link endpoint, the loopback addresses in
   routers, or any other IP-accessible network elements which don't
   naturally have an IEEE MAC address for forming a Mode-0 ESD.  And in all
   of the cases, Mode-1 ESDs could be used universally, although it is
   more appropriate to use Mode-0 whenever possible; no sense wasting
   Identity Tokens when it isn't necessary.

   In all of the cases where the real topology is not completely
   virtualized by the LAN technology, there will be "Internal
   Renumbering" events caused by moving systems between infrastructure
   segments (PTPs).  This will have the effect of killing long-running
   off-Site connections unless provisions are made to allow the systems
   to carry the previous ESDs as synonyms for a while.  Given that most
   significant topology moves involve powering off the end system in
   question, this is hardly a hardship.  However, the powerful
   renumbering support already developed for IPv6 can make those other
   moves considerably less disruptive.

   But most importantly, external rehoming of a Site to the global
   infrastructure can be made completely transparent in almost every
   case.

7. The Structure of Routing Goop

   Routing Goop, or "RG" is the upper 8 bytes of an 8+8 address.  This
   somewhat non-technical term was chosen because all the other
   alternatives seem to have various degrees of conceptual baggage which
   would be as much work to neutralize as the new notions are to explain
   in the first place.

   Fundamentally, RG is a Locator.  It encodes the topological
   connectivity of the Site containing the computer system identified by
   the ESD in the lower 8 bytes.  In the case of a singly-homed Site,
   rehoming to a new attachment to the Public Topology will change ONLY
   the RG in full 8+8 addresses for computer systems at that Site.  One
   example of such a rehoming would be a change of the Site's Internet
   Service Provider.  This change-over can be made essentially
   completely transparent to users both inside and outside the Site,
   although it does involve a practical limit on the transition duration
   relating to how long the departing ISP is willing to extend
   transitional courtesies.  During a changeover, though, all new
   connections will be initiated via the new ISP connection.

   This brings up the deep structure of the topology information carried
   in RG and how it is encoded.  More specifically, RG is a hierarchical
   locator which can be viewed as a rooted path-expression of flat-
   routed regions which are tangent.  Each element in the path-
   expression contains only enough detail to negotiate the flat-routed
   region.

   It has been observed before that the graph of the Global Internet is
   not obviously a hierarchy so how can this work?

   We start with the observation that every connected graph has at least
   one labeling which forms a spanning tree covering the nodes. The
   hierarchy is induced by a labeling function which partitions the
   global graph into regions and recursively into subregions.  This
   function is only globally visible at the top-level where an initial
   partitioning of the graph is used to form the first level of what
   will become the hierarchy.  Within each partition there is a local
   sub-partition function which assigns labels, and we proceed
   recursively. The nested recursions directly induce the hierarchy.

   This decomposition of the Global Internet produces a recursive graph
   where each level is composed of a set of subgraphs which are
   explicitly connected (i.e., explicitly routed between the subgraphs)
   while the structure within each subgraph is assumed to be flat-routed
   (at least as seen at that level).
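
   The recursive labeling can be pictured with a small sketch.  It
   assumes the partitioning is already given as nested dictionaries,
   where each key is a label local to its region and each leaf is a
   Site; the structure, labels, and Site names are hypothetical.

```python
def label_paths(region, prefix=()):
    """Yield (site_name, path) for every leaf under a nested partition."""
    for label, child in region.items():
        path = prefix + (label,)
        if isinstance(child, dict):       # a subregion: recurse into it
            yield from label_paths(child, path)
        else:                             # a leaf: child is the Site's name
            yield child, path

# Two top-level partitions, each subdivided by its own local labeling.
topology = {
    1: {1: {1: "site-a", 2: "site-b"}, 2: {1: "site-c"}},
    2: {1: {1: "site-d"}},
}
paths = dict(label_paths(topology))
```

   Each resulting path is a rooted path-expression: labels are only
   meaningful within the region that assigned them, and the nesting
   alone produces the hierarchy.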

   From an abstract viewpoint, a hierarchical partitioning can be
   induced with an arbitrary choice of labeling function (as long as the
   function produces the minimally-required partitioning). However, we
   desire the partitions to have several important properties which
   affect the choice of labeling function.

   The general goal is to produce a global labeling which represents the
   topology as compactly as possible, yet allows rich connectivity while
   bounding the complexity of the discrete regions which are flat-
   routed.

   The top level objects in the 8+8 graph hierarchy are called "Large
   Structures".  These are objects chosen for their ability to naturally
   represent significant topological aggregation of substructure (not
   geographical, political, or geometric).  The number of Large
   Structures is explicitly limited to bound the complexity at the top
   level of the aggregation graph.

   Within Large Structures, the (sub-)partition function is a trade-off
   between the flat-routing complexity within a region and minimizing
   total depth of the substructure.  This is driven by the internal
   topology of a Large Structure and the choices in different Large
   Structures will not necessarily be the same. This is why Routing Goop
   only has one hard bit boundary; Large Structures are free to
   internally subdivide as they choose. They are only required to
   encapsulate a significant portion of the Public Topology.

   One obvious candidate for Large Structures is large networks which
   already represent considerable aggregation based on existing CIDR
   deployment.  Another good candidate might be "Exchange Points".  The
   8+8 model can accommodate both of these simultaneously, allowing
   IPv6-style "Network-anchored Prefixes" and "Exchange-anchored
   Prefixes" like those proposed by some to coexist and be subsumed into
   a unified notion of "Aggregator-anchored Prefixes."  Of course, these
   aren't prefixes strictly in the IPv4 CIDR sense, but the left-
   anchored substrings of the Routing Goop are intuitively quite
   similar.

   Large Structures are assigned a Large Structure Identifier, known as
   an LSID.  The total number of LSIDs is intentionally limited as we
   assume the paths between Large Structures are only flat-routed.

   Two consenting Large Structures remain free to share a tangency below
   the top level and exchange routes so as to provide for improved
   routing between the two of them (formalizing cut-throughs in the
   natural hierarchy).  The goal is to provide for manageable complexity
   of the ultimate default-free zone (the top level of the global
   hierarchy) while allowing for controlled circumvention of the natural
   hierarchical paths.

   Bit-level structure of Routing Goop:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | xxx | 13 Bits of LSID         |      Upper 16 bits of Goop    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    3               4                   5                   6
    2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              Bottom 32 bits of Routing Goop                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   NOTE: The Routing Goop structure above assumes that the 8+8 proposal
   is designated by a 3-bit type of IPv6 address.  If an 8+8 address is
   identified by two upper bits, the LSID would expand to 14 bits.  If
   identified by one bit, the LSID would stay at 14 bits and the Upper
   16 bits of Goop would expand to 17 bits.
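
   Under the 3-bit type assumption, the layout can be sketched as
   follows.  The names are illustrative, and the type value used in
   the example is hypothetical since this document leaves the "xxx"
   bits unspecified.

```python
TYPE_BITS, LSID_BITS, GOOP_BITS = 3, 13, 48   # 3 + 13 + 48 = 64 bits

def pack_rg(addr_type, lsid, goop):
    """Build the 8-byte RG from its type, LSID, and remaining goop."""
    if not 0 <= addr_type < (1 << TYPE_BITS):
        raise ValueError("type must fit in 3 bits")
    if not 0 <= lsid < (1 << LSID_BITS):
        raise ValueError("LSID must fit in 13 bits")
    if not 0 <= goop < (1 << GOOP_BITS):
        raise ValueError("goop must fit in 48 bits")
    value = (addr_type << 61) | (lsid << GOOP_BITS) | goop
    return value.to_bytes(8, "big")

def unpack_rg(rg):
    """Return (addr_type, lsid, goop) from an 8-byte RG."""
    value = int.from_bytes(rg, "big")
    return (value >> 61,
            (value >> GOOP_BITS) & ((1 << LSID_BITS) - 1),
            value & ((1 << GOOP_BITS) - 1))

# Hypothetical type value 0b010 and LSID 7.
example_rg = pack_rg(0b010, 7, 0xAABB00000000)
```

   Only the boundary after the LSID is fixed; the 48 bits of goop are
   treated here as an opaque field because each Large Structure
   subdivides them as it sees fit.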

   Routing between two interior points of two Large Structures is always
   possible based solely on the LSID. This provides a "forwarding
   strategy of last resort" for a router running "default-free".  From
   one point of view, the LSID partitions the Global Internet into a set
   of regions such that an interior router only need carry a "per-LSID
   default" pointing at an appropriate boundary router which knows how
   to handle traffic bound outside the containing Large Structure for
   a point in the other Large Structure.
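
   One way to picture the "per-LSID default" is the lookup sketched
   below.  It assumes a router holds a small table of more-specific RG
   prefixes plus one default per LSID, and that the LSID occupies the
   13 bits after a 3-bit type as in the diagram above.  The table
   formats, hop names, and values are hypothetical.

```python
def lsid_of(rg):
    """Extract the 13-bit LSID, assuming a 3-bit type field precedes it."""
    return int.from_bytes(rg[:2], "big") & 0x1FFF

def next_hop(rg, specific_routes, lsid_defaults):
    """Longest-prefix match on the RG, else the per-LSID default.

    specific_routes maps (prefix, length) to a hop, where prefix is a
    left-aligned 64-bit integer and length is 1..64 bits.
    """
    value = int.from_bytes(rg, "big")
    best_len, best_hop = 0, None
    for (prefix, length), hop in specific_routes.items():
        shift = 64 - length
        if length > best_len and (value >> shift) == (prefix >> shift):
            best_len, best_hop = length, hop
    if best_hop is not None:
        return best_hop
    return lsid_defaults[lsid_of(rg)]   # forwarding strategy of last resort

specific_routes = {(0x4007AABB00000000, 32): "interior-1"}
lsid_defaults = {7: "border-a", 9: "border-b"}
hop = next_hop(bytes.fromhex("4009000000000001"),
               specific_routes, lsid_defaults)
```

   A destination in another Large Structure matches no specific route
   and is handed to the boundary router recorded for its LSID.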

   If two Large Structures share a tangency somewhere below the top
   level, then some interior routers of both Large Structures will share
   routes to exploit the tangency for optimizing paths.  How this cut-
   through information is distributed within the two Large Structures is
   not revealed elsewhere in the global topology. The exact "shape" of
   the optimization region is controlled by the decisions about which
   routes to advertise across the cut-through.  These decisions are made
   by the collaborators and the optimized region need not be symmetric
   with respect to the cut-through.  The size of the optimization area
   is controlled by how far routes learned via the cut-through are
   propagated within the sub-graphs tangent via the cut-through. Again,
   this is a matter of engineering choices made by the collaborators
   operating the cut-through.

   We note that while the LSID is intuitively similar to the Autonomous
   System Number currently used in IPv4 policy-based routing machinery,
   the LSID is quite distinct from the AS number and the two identifiers
   play very different roles.  AS Numbers will continue to be used for
   policy routing information exchange and will remain distinct.



8. The "Flow" of Routing Goop

   It is intuitively useful to think about Routing Goop as "flowing
   downhill" through the hierarchy from the topmost Large Structures,
   through the intermediate levels of the Public Topology, and
   ultimately down to the Site.  As the RG propagates downward, the
   prefix extends to the right, just like in IPv4 CIDR, with each
   extension navigating the nested flat-routed subgraphs, eventually
   terminating at the Site, which then descends invisibly into the
   Private Topology of that Site.

   The nested flat-routed areas correspond to transit subnetworks of the
   Large Structure.  One very important example of such subnets is the
   "reseller" or "wholesale transit customer" of a Large Structure.
   (Note that whether the Large Structure is a network or an exchange
   point doesn't matter.)  The reseller network provides transit for
   Sites, so must be part of the Public Topology and appears as a
   substring within the Routing Goop, usually the right-most extension
   unless the reseller has further reseller customers.  In that case,
   the next level reseller will have his own extension to record his
   place in the Public Topology and to provide for navigating through it
   as well.
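
   The rightward growth of the prefix can be pictured with a small
   sketch.  Prefixes are written as strings of 0 and 1 purely for
   readability, and every bit pattern and length shown is made up for
   the example; the proposal does not fix them here.

```python
def extend(prefix: str, extension: str) -> str:
    """Append one level's extension to the right of an RG prefix."""
    if set(prefix + extension) - {"0", "1"}:
        raise ValueError("expected only the characters 0 and 1")
    return prefix + extension


large_structure = "0010000000000101"        # hypothetical top-level prefix
reseller = extend(large_structure, "0011")  # the reseller's substring
site = extend(reseller, "10110")            # right-most extension: the Site

# Each level's prefix covers everything beneath it, as with CIDR.
print(site.startswith(reseller), reseller.startswith(large_structure))
```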

   The overall picture can now be drawn as a forest of trees
   distributing Routing Goop down to the Sites, with each tree being a
   Large Structure and the Large Structures connected arbitrarily at the
   top level. This structure will be mirrored by the actual machinery
   for distributing Routing Goop to the Sites as will be discussed a bit
   later, but this mental image of the prefixes "flowing" from the
   anchoring Large Structures is critical to understanding fundamental
   self-organizing abilities in the 8+8 model.

   While the 8+8 machinery is intended to be adequate for almost
   completely automated self-organization with respect to the
   construction and propagation of Routing Goop on an Internet-wide
   basis, we proceed for now closely following current practice
   (admitting manual configuration of certain information like Routing
   Goop) because of the additional complexity of the self-organization
   functions.  Initial deployment following current practice would not
   preclude eventual deployment of a fully self-organizing Global
   Internet.

9. The Distribution of Routing Goop

   There are two cases to consider for how Routing Goop gets
   distributed: source addresses and destination addresses.  In both
   cases RG is part of the address, one way or another, so we show how a
   full 16-byte address with the right RG gets created in these two
   cases.

9.1 RG for Source Addresses

   The RG of a source address is almost always the site-local prefix.
   If the destination address is not within the Site, the packet will
   leave the Site via one of possibly several Site Boundary Routers.
   The Site Boundary Router inserts the correct RG in the source address
   based on the path the destination should use to return a packet to
   the sender.  Except in very unusual circumstances this will be the RG
   which corresponds to the attachment path of the Site Boundary Router
   to the Global Internet.

   If the Site is Multihomed via just one Site Boundary Router, then the
   router is free to apply whatever local policy suits. It simply must
   fill in a valid RG path which leads back to a Site Boundary Router
   for that Site.  If the Site is Multihomed via more than one Site
   Boundary Router, which router the packet leaves by is purely local
   policy and which RG gets applied is likewise local policy.
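
   A minimal sketch of the insertion step, assuming the RG occupies the
   upper 8 bytes of the 16-byte address and the ESD the lower 8.  The
   all-zero site-local prefix and the other byte values are
   placeholders chosen for the example, not values defined by this
   proposal.

```python
def insert_source_rg(src_addr: bytes, rg: bytes) -> bytes:
    """Overwrite the upper 8 bytes (the RG) of a 16-byte source address,
    leaving the lower 8 bytes (the ESD) untouched."""
    if len(src_addr) != 16 or len(rg) != 8:
        raise ValueError("expected a 16-byte address and 8 bytes of RG")
    return rg + src_addr[8:]


site_local_rg = bytes(8)                      # placeholder site-local prefix
esd = bytes.fromhex("0123456789abcdef")       # hypothetical ESD
exit_rg = bytes.fromhex("2005000300b00000")   # hypothetical RG for this exit
outbound = insert_source_rg(site_local_rg + esd, exit_rg)
print(outbound.hex())
```

   Which exit_rg is chosen is exactly the local policy decision
   described above; the rewrite itself is the same in every case.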

   The dynamic insertion of RG upon Site exit accomplishes a number of
   things.

   (1) It means that for most purposes, a computer system at a Site need
   not concern itself with exit topology policy matters which can be
   particularly tricky in Multihomed Sites.

   (2) It means that computer systems are essentially not impacted at
   all by topological rehoming of the Site.

   (3) It means that more complex Multihoming scenarios with multiple
   Site Boundary Routers each with multiple connections to the Global
   Internet can execute arbitrarily complex path recovery policy without
   concern for how it might impact a computer system doing source
   address selection.

   (4) It means that Mobile IP is dramatically simplified over the
   current model, but we postpone that discussion to another day.

   (5) It means that while a computer system might forge the ESD in a
   source address, it CANNOT forge the point of injection into the
   Public Topology.  This is not strong authentication down to the
   particular computer system, but it is probably a strong deterrent to
   certain obnoxious activities due to the dramatically improved
   traceability.  We also note that the first-hop attachment router in
   the Public Topology is free to insert or override the RG if somehow
   an errant packet escapes a Site without it, thereby enforcing
   traceability. Of course, the Public first-hop router could always just
   drop a packet carrying inappropriate source RG as well. But to make
   it very clear, we put the burden of inserting correct RG in exiting
   source addresses squarely and solely on the Site and the Site Border
   Router. Any other location of the task has bad performance scaling.
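
   The two options open to the first-hop router in item (5), override
   or drop, might look like the following sketch.  The function name
   and the drop_on_mismatch switch are hypothetical; the RG is again
   assumed to be the upper 8 bytes of the address.

```python
def police_source_rg(src_addr, attachment_rg, drop_on_mismatch=False):
    """Return the source address to forward, or None if the packet
    should be dropped."""
    if src_addr[:8] == attachment_rg:
        return src_addr                      # the Site inserted correct RG
    if drop_on_mismatch:
        return None                          # policy: drop inappropriate RG
    return attachment_rg + src_addr[8:]      # policy: override, keep the ESD


good_rg = bytes.fromhex("2005000300b00000")  # hypothetical attachment RG
esd = bytes.fromhex("0123456789abcdef")      # hypothetical ESD
errant = bytes(8) + esd                      # packet that escaped without RG
print(police_source_rg(errant, good_rg).hex())
```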

   This simple mechanism solves a number of problems and actually
   simplifies the operation and deployment of this architecture so is
   well worth the implications it has for Site Border Routers.

   The Site Border Router gets the necessary RG from the first-hop
   attachment router in the Public Topology.  Alternately, as an initial
   mechanism the RG could be statically configured, but the real goal is
   completely automated propagation down the tree so that an entire
   complex subtree can be rehomed without human intervention or service
   disruption.

9.2 RG for Destination Addresses

   Currently, an IPv6 address lookup for a DNS name returns the
   information in a "AAAA" record which is the full 16 bytes of the IPv6
   address.

   The 8+8 design proposes synthesizing the 16 bytes of information in a
   query response from two different sources: an "AA" record and an "RG"
   record.  The "AA" record carries the 8-byte ESD for the DNS name in
   question and the "RG" record carries 8 bytes of the appropriate
   Routing Goop.

   One interesting question is how the AA record gets paired with an RG
   record in a given nameserver.  One simpleminded implementation would
   be to pair an RG record with a zone, but that has the problem of
   requiring all the systems in that zone to use the same Routing Goop
   and hence be in the same Site.

   A better scheme is to carry an "RG Name" in the "AA" record which
   would allow a nameserver to concatenate an arbitrary RG prefix to the
   ESD producing the full 16 byte response.  The "RG Name" would be a
   full DNS name which could be recursively translated (and the result
   cached).  Structured as an "upward delegation" with an appropriate
   Time-to-Live, a Site could import the Routing Goop information from
   their service provider completely automatically.  This capability
   will be used to great advantage in the discussion of rehoming which
   follows. [The interaction between the RG TTL and the zone TTL is an
   issue to be explored further.]
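
   The "RG Name" indirection can be sketched with two small tables
   standing in for a nameserver's data.  The record layout, the example
   names, and the treatment of an upward delegation as a simple alias
   are all simplifying assumptions made for illustration; caching and
   TTL handling are omitted.

```python
def resolve_rg(rg_name, rg_records, max_hops=8):
    """Follow RG Name delegations until 8 bytes of Routing Goop are found."""
    for _ in range(max_hops):
        value = rg_records[rg_name]
        if isinstance(value, bytes):
            return value
        rg_name = value                  # delegated upward; keep following
    raise LookupError("RG delegation chain too long")


def lookup(name, aa_records, rg_records):
    """Synthesize the full 16-byte answer: RG prefix followed by the ESD."""
    esd, rg_name = aa_records[name]
    return resolve_rg(rg_name, rg_records) + esd


aa = {"host.site.example": (bytes.fromhex("0123456789abcdef"),
                            "rg.site.example")}
rg = {"rg.site.example": "rg.isp.example",   # upward delegation
      "rg.isp.example": bytes.fromhex("2005000300b00000")}
print(lookup("host.site.example", aa, rg).hex())
```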

   Alternately, one special case for an RG record could be a delegation
   to a Site Border Router which could supply the correct RG
   automatically, at least in single-homed cases, and possibly in
   multihomed cases.

   The result of this structure is that individual zone entries for
   individual nodes (AA records) do NOT change when a Site rehomes.  The
   only thing which changes (logically) is the RG information which is
   composed with the node's AA record to produce a full 16-byte response.
   This means the general Dynamic DNS machinery is NOT required to
   support Site rehoming.

   It also gives rise to significant potential for "smart nameservers"
   which examine the source address of a query to provide a more
   topologically appropriate translation for a given DNS query.  This
   isn't perfect, but it is much more detail than current nameservers
   have available without processing a full BGP routing table to
   ascertain IPv4 prefix/AS correspondence.
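
   One way a "smart nameserver" might choose among several RGs for a
   multihomed Site is sketched below.  Using the number of leading bits
   shared with the querier's RG as the measure of topological closeness
   is an assumption made for this example; the proposal does not
   specify a selection rule.

```python
def common_prefix_bits(a: bytes, b: bytes) -> int:
    """Count the leading bits that two equal-length byte strings share."""
    count = 0
    for x, y in zip(a, b):
        diff = x ^ y
        if diff == 0:
            count += 8
            continue
        count += 8 - diff.bit_length()   # equal leading bits in this byte
        break
    return count


def pick_rg(query_src_rg, candidate_rgs):
    """Prefer the candidate RG sharing the longest prefix with the querier."""
    return max(candidate_rgs,
               key=lambda rg: common_prefix_bits(query_src_rg, rg))


querier = bytes.fromhex("2005000100000000")      # hypothetical values
via_isp_a = bytes.fromhex("2005000300b00000")
via_isp_b = bytes.fromhex("2009000700c00000")
print(pick_rg(querier, [via_isp_a, via_isp_b]).hex())
```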

10. Rehoming A Site

   When a Site changes its point of attachment to the Global Internet,
   it is said to "rehome". One of the significant criticisms of IPv4
   CIDR and IPv6 "Provider-based Addressing" is the requirement to
   "renumber" a Site when it rehomes.  One of the explicit goals of the
   8+8 architecture is to eliminate, or at least mitigate, the impact of
   this.

   It is important to reiterate the notion that the Routing Goop of an
   8+8 address is not just a Locator, but that it encodes a PATH from
   the top level of the global hierarchy down to the Site.  Changing
   that path is what makes Rehoming and Multihoming essentially
   equivalent operations.  We proceed with the simple case first.

   When a Site wishes to rehome, it must establish a new attachment
   point to the Global Internet, and hence establish a new access path.
   Then it must start using that new path before the old path is
   removed.  The procedure is as follows:

   A Site establishes a connection with a new ISP and it becomes able to
   carry the traffic.  At that point, the Site alters the upward
   delegation of the DNS RG records.  Henceforth, all new connections
   made with the new translations will follow the new path to the Site.
   The new connection path is then made the preferred exit path and
   source addresses in packets exiting the Site immediately start being
   marked with the new return path.  The old connection should be
   maintained for some administratively determined grace period to allow
   DNS timeouts to transition new sessions to the new path and for
   long-running sessions to terminate.
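
   The length of the grace period follows from simple timing
   arithmetic, sketched here.  Times are in seconds, the function names
   are hypothetical, and the model assumes that a remote resolver keeps
   a cached RG record for exactly its TTL.

```python
def rg_for_new_session(cached_at, ttl, cutover, now):
    """Return "old" or "new": the RG a resolver holding a cached record
    would hand to a session that starts at time `now`."""
    if cached_at < cutover and now < cached_at + ttl:
        return "old"                 # stale cache entry is still valid
    return "new"


def minimum_grace_period(ttl, longest_session):
    """Worst case: the record was cached just before the cutover and a
    session started just before that cache entry expired."""
    return ttl + longest_session


print(rg_for_new_session(cached_at=90, ttl=60, cutover=100, now=120))
print(minimum_grace_period(ttl=3600, longest_session=7200))
```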

   At first blush, it might appear that when the exit path for the Site
   switches over to the new path and the Site Border Router starts
   marking packets with the new RG, the return path for long-running
   sessions would automatically switch over to the new path.  Alas, this
   is not so because a long-running session will be using a destination
   address containing the old RG acquired when the session first
   started.

   Consideration was given to providing some kind of "path redirect"
   which would allow the other end to deal with "flying cutovers" of a
   running session, but the security implications of this mechanism are
   too far-reaching to consider as part of initial deployment.  If at
   some later point it becomes clear how to accomplish this safely, then
   it could be added downstream. But the complexity, security risks, and
   the magnitude of the added value do not make it worthwhile at
   present, although the author would love to be convinced otherwise.

   Alternately, the Site could request a "Rehoming Courtesy" from their
   old ISP which would effectively make it a multihomed Site for some
   period of time.  After multihoming was established, the old
   connection could be taken down and the long-running sessions would
   continue to survive as long as the Site was multi-homed by way of the
   Rehoming Courtesy.

   Note that at no time did the rehoming affect anything internal to the
   Site's Private Topology.  The only change was the attachment to the
   Public Topology and the Routing Goop which records that attachment
   location.

11. Multihoming a Site

   One of the curiosities of IPv4 is that the network does a lot more
   work for a multihomed site but it is very hard to pin it down so that
   the instigator of the efforts can compensate the workers.

   In the 8+8 model, multihoming is an explicit service which is
   performed for a Site by the agents of the Public Topology which
   provide the access for the Site.  This mechanism can be made more
   sophisticated, but the notion is most readily explained by
   considering a Site which is dual homed to two different ISPs and
   hence has two distinct access paths represented by two distinct blobs
   of Routing Goop.

   The Site is attached to each ISP via some link and we postulate some
   kind of keep-alive protocol which determines when reachability to the
   Site's border router is lost. The ISP routers serving the dual-homed
   Site are identified to each other (via static configuration
   information in the simplest case or a dynamic protocol in the more
   general case), and when a link to the Site is lost, the ISP router
   anchoring the dead link simply tunnels any traffic destined for the
   Site via the other ISP router.
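
   The failover behavior can be sketched as follows.  The class and its
   fields are hypothetical; link_up stands for whatever the postulated
   keep-alive protocol reports, and the tunneled flag is an added
   safeguard so that traffic is not bounced back and forth when both
   links are down.

```python
class IspRouter:
    def __init__(self, name):
        self.name = name
        self.link_up = True      # set by the postulated keep-alive protocol
        self.peer = None         # the other ISP router serving the Site

    def deliver(self, packet, tunneled=False):
        if self.link_up:
            return f"{self.name}: delivered {packet} to Site"
        if self.peer is not None and not tunneled:
            # Link to the Site is dead: tunnel via the other ISP router.
            return self.peer.deliver(packet, tunneled=True)
        return f"{self.name}: dropped {packet}"


isp_a, isp_b = IspRouter("ISP-A"), IspRouter("ISP-B")
isp_a.peer, isp_b.peer = isp_b, isp_a        # static configuration
isp_a.link_up = False
print(isp_a.deliver("pkt-1"))
```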

   This approach clearly requires coordination between the two serving
   ISPs. This is not a new constraint - multihoming already requires
   considerable coordination between the Site and its providers.  Of
   course, creating a protocol for dynamically creating a "homing group"
   is probably a very worthwhile investment but it is not absolutely
   necessary at the outset.

   It should be obvious now that the "Rehoming Courtesy" in the previous
   section is simply doing the router-pair coordination with the new ISP
   for some period of time.

12. Rehoming a Reseller

   Rehoming a Reseller is a slightly more general case of rehoming a
   Site, primarily characterized by more lead time, a longer grace
   period, and some necessary coordination with customer Sites to ensure
   that the Routing Goop propagates correctly.

   The Reseller will establish a new connection which will not only
   result in a new path for the Reseller's topology, but for that of his
   customer Sites. When the Reseller alters his upward delegation of
   Routing Goop, it will ripple downward to his customer Sites by nature
   of their upward delegations.  The downward ripple of Routing Goop via
   the upward delegations should cause the Site zone TTLs to be reduced
   appropriately to ensure caches expire well within the dual-homed
   transition grace period for the Reseller.
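
   The downward ripple can be illustrated with a toy delegation tree in
   which each node derives its RG from its parent's.  The class, the
   bit-string representation, and the extension values are assumptions
   made for the example; TTLs and caching are not modeled.

```python
class Node:
    def __init__(self, extension, parent=None):
        self.extension = extension   # this level's bits of Routing Goop
        self.parent = parent         # upward delegation

    def rg(self):
        upstream = self.parent.rg() if self.parent else ""
        return upstream + self.extension


old_isp, new_isp = Node("0001"), Node("0110")
reseller = Node("11", parent=old_isp)
sites = [Node("00", parent=reseller), Node("01", parent=reseller)]

before = [s.rg() for s in sites]
reseller.parent = new_isp            # the Reseller alters one delegation
after = [s.rg() for s in sites]
print(before, after)
```

   Only the Reseller's own upward delegation was edited; the customer
   Sites' records were not touched, yet every derived RG changed.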

   This essentially rehomes all the Reseller's customer Sites all at the
   same time the Reseller's infrastructure is rehoming and should be
   completely transparent except for long-lived sessions which do not
   terminate by the end of the grace period.

13. Multihoming a Reseller

   There are two parts to multihoming a Reseller - one part similar to
   the Multihomed Site case above, and one part which is quite
   different.

   For this discussion, assume a Reseller which is dual-homed and hence
   has two different Routing Goop prefixes (remember that each path to
   the top level of the hierarchy has a distinct prefix). The reseller
   can solicit multihomed tunneling services from his two access point
   routers to provide alternate path service just like a multihomed
   Site.  Why traffic is coming to any particular router, though, is
   influenced entirely by what routes are advertised out that particular
   connection via BGP5 (or IDRP).  This is rather different from the
   multihomed Site case where the ESD is the object of interest and the
   RG simply gets the traffic to the Site boundary.

   The question arises, however, as to which prefix gets used for
   extending downward to his customer Sites.  The answer in the simplest
   case is to pick one and use it, making the Sites "natural" in the
   chosen prefix.  The alternate prefix can, of course, be advertised
   out the alternate path if desired.  But this work can be ascribed to
   the instigator and the superior attachment points can charge for this
   service.  (This is somewhat akin to charging for routes, but only
   routes which create a discontinuity in the routing space.)

14. A Comment on NAT Boxes

   Discussions of this proposal raised the question of what it means for
   Network Address Translation (NAT) boxes.  On the one hand, the 8+8
   model allows a NAT box to modify the Routing Goop during forwarding
   without impeding end-to-end TCP checksums which only rely upon the
   ESDs.  On the other hand, it isn't very clear what purpose a NAT
   box would have given the 8+8 model.
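
   The checksum property can be demonstrated with a short sketch.  The
   pseudo-header layout used here (source ESD, destination ESD, a
   32-bit length, three zero bytes, and a next-header byte) is a
   hypothetical stand-in; only the idea that the RG bytes are left out
   of the sum is taken from the text above.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum_esd_only(src, dst, segment, next_header=6):
    """Internet-style checksum over a hypothetical pseudo-header that
    includes only the ESD (lower 8 bytes) of each 16-byte address."""
    pseudo = src[8:] + dst[8:] + struct.pack("!IxxxB", len(segment),
                                             next_header)
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


esd_s = bytes.fromhex("0123456789abcdef")    # hypothetical ESDs and RGs
esd_d = bytes.fromhex("fedcba9876543210")
rg_old = bytes.fromhex("2005000300b00000")
rg_new = bytes.fromhex("2009000700c00000")
seg = b"example segment"

# Rewriting the RG in flight leaves the checksum unchanged.
assert (checksum_esd_only(rg_old + esd_s, rg_old + esd_d, seg)
        == checksum_esd_only(rg_new + esd_s, rg_old + esd_d, seg))
```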

   Typically a NAT box is cited as a way to have private topology within
   a site (note lower case) which is then attached to the Public
   Topology via the NAT box without revealing anything about that
   private topology.  The basic structure of the 8+8 model accomplishes
   exactly this goal - providing genuine Private Topology within local
   purview while providing independence of attachment point to the
   Public Topology.  The broad conclusion is that pure NAT boxes don't
   have much of a future given the 8+8 model.  More general application
   gateways performing firewall functions or "intranet bridges"
   providing crypto-tunnels between the protected interior of two Sites,
   however, are altogether another matter.

15. General Comments

   While some of 8+8 is something of a radical departure from IPv6 as we
   currently know it, in general it relies deeply on all the IPv6
   underpinnings which contribute so much to the attractiveness of IPv6:
   Neighbor Discovery, all the dynamic configuration machinery designed
   to make renumbering palatable even using "provider-based addressing",
   and the flexibility of the "salami headers" which make tunneling and
   security attractive.  The general forwarding operations based on
   longest-match-under-prefix-mask and the policy-based routing
   machinery of BGP5/IDRP are also simply assumed.  All of these will
   need a tweak or two based on this proposal and it is beyond this
   author to do all the analysis required to identify every such tweak
   needed, so it will be up to the community to analyze this proposal
   and if embraced, look at all the related machinery which is touched
   in some subtle manner.

   This document has presented both an outline and the deep ideas behind
   an 8+8 proposal, and the author believes it has addressed the "hard
   problems" to the point it can convince the reader of the viability,
   and indeed the merits of this approach.  The routing scaling problems
   going forward require the kind of flexibility afforded by this
   approach.  Once the 8+8 partitioning of the address is accomplished,
   we are freed to tinker with the routing and forwarding machinery in
   ways which cannot be achieved nearly as readily with a monolithic
   16-byte address.

16. Closing Comments

   This document presents a model which has been under construction by
   the author since before Fall of 1995, at least.  Conversations with a
   great many people have contributed to the design presented in this
   document. A skeletal version of this proposal first appeared in some
   email from Dave Clark of MIT who planted the seed and provided the
   monicker "8+8".  A great many others have contributed ideas and
   observations, all of which went into the stew pot for the synthesis
   contained here.  While it is impossible to mention all of them, a few
   deserve special mention as having provided comments on drafts or as
   having otherwise significantly influenced the thinking herein:
   Vadim Antonov, Ran Atkinson, Scott Bradner, Brian Carpenter, Noel
   Chiappa, Steve Deering, Sean Doran, Joel Halpern, Christian Huitema,
   Tony Li, Peter Lothberg, Louis Mamakos, Radia Perlman, Yakov Rekhter,
   Paul Traina. And a special thanks to all those folks in the IPng
   working groups who contributed to the foundation which is IPv6.


17. Security Considerations

   Almost certainly lots of them.

18. Author's Address

   Mike O'Dell
   UUNET Technologies, Inc.
   3060 Williams Drive
   Fairfax, VA 22031
   voice: 703-206-5890
   fax:   703-206-5471
   email: mo@uu.net





