Internet Draft -- Expires Nov. 20, 1992
PRELIMINARY DRAFT:
Pip: The `P' Internet Protocol
Paul F. Tsuchiya
Bellcore
tsuchiya@thumper.bellcore.com
May 19, 1992
Status
This document is an Internet Draft. Internet Drafts are working documents
of the Internet Engineering Task Force (IETF), its Areas, and its Working
Groups. Note that other groups may also distribute working documents as
Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six months.
Internet Drafts may be updated, replaced, or obsoleted by other documents
at any time. It is not appropriate to use Internet Drafts as reference
material or to cite them other than as a "working draft" or "work in
progress."
Please check the I-D abstract listing contained in each Internet Draft
directory to learn the current status of this or any other Internet Draft.
Disclaimer:
This text version does not contain the figures from the postscript
version. As such, it is missing information essential to the
paper, and so it is strongly suggested that the postscript version
be read.
1.0 Purpose of this draft
Pip is an internet protocol that scales, encodes policy, and is high speed. The
purpose of this draft is to explain the basic concepts behind Pip so that
people can start thinking about potential pitfalls. I am proposing Pip as an
alternative to the two "medium term" proposals that emerged from the
Road (Routing and Addressing) group to deal with the dual IP problems
of scaling and address depletion. Because this proposal, which represents
new ideas, is competing with old (and therefore well thought-out) ideas, I
wish to circulate it (and get the process started) as quickly as possible,
albeit in not as complete a form as I would like. I expect to have a
complete proposal by the beginning of September. There will be a plenary
presentation and a BOF covering this material at the Boston meeting of
IETF.
2.0 Pip General
Pip has the following features:
1. Pip carries multiple address types in a common format. As such, it is
beneficial for transition from one address to another, and for future
evolution (of routing techniques as well as of addressing schemes).
2. The Pip address is completely general (multiple levels of hierarchy,
expands to any number of systems).
3. The Pip address is compact: it grows with the number of systems.
4. The Pip address efficiently encodes policy (source-based) routes, both
in "long form" (explicit path) and "short form" (path identifier).
5. Because the Pip address can be a path identifier (multi-layer if
desired, like the ATM VCI/VPI), Pip can be used in a connection-oriented
fashion (this paper only briefly touches on mechanisms for
controlling connections).
6. The Pip address includes multicasting (potentially substantially more
sophisticated than what is available with IP multicast numbers; for
instance, hierarchical multicast).
7. Pip efficiently encodes QOS (Quality-of-Service) information.
8. The routing table lookup with Pip is well-bounded (by the depth of
the address hierarchy).
9. Pip accommodates "multiple defaults" routing from (multi-homed)
stub domains.
10. Pip allows intra-domain routing and hosts to operate with no notion
of the "inter-domain" parts of their address, if desired. This is
equivalent to current IP hosts and intra-domain routers not needing to know
their own network number.
11. Pip accommodates tunneling across transit domains.
12. By virtue of 8 and 9, Pip accommodates separation of interior and
exterior routing.
13. Pip simplifies handling mobile systems (by having flat network layer
identifiers).
In short, Pip is a "next generation" protocol, intended to allow the internet
to evolve over the foreseeable future.
One of the design philosophies behind Pip is that it encodes all "routing"
information (what is traditionally spread over the address and QOS fields)
in a single structure (the Routing Directive). The rules for parsing the
structure are simple on one hand, but provide a rich set of routing
functions. Therefore, it is possible to build a single forwarding engine that
will accommodate many different types of routing styles, including
traditional hierarchical addresses, policy, source route, and virtual circuit.
This way, the forwarding engine can be built in hardware and can remain
constant even while internet routing evolves.
Another design philosophy behind Pip is that it delays the definition of
how an internet packet should be composed and interpreted. The meaning of
addresses and QOS information is dynamically determined by
information in Directory Services, distributed protocols such as routing
protocols, and MIBs, rather than in a protocol specification. Current
internet protocols have continuously been moving towards this
philosophy, but with header formats that are not conducive to late
semantic definition. Pip facilitates late semantic definition of the internet
protocol header. This on one hand makes it easier to evolve the internet
incrementally, but requires that all systems (hosts, routers, and directory
servers) be a little smarter, and that algorithms be a little more complex.
This, in a nutshell, is the trade-off being made by Pip.
3.0 Transition Approach
Like IP, Pip by itself is nothing more than a header format and some rules
about how to forward the header. It is nothing without routing and
addressing and related algorithms behind it. But since Pip can encode the
semantics of existing internet headers (addresses, QOS, etc.), it can take
advantage of existing routing protocols and addressing schemes. This is
one of the main virtues of the proposal to move to CLNP [OSI2], namely that it
takes advantage of an existing body of work. However, Pip will allow us
to move forward into advanced features that CLNP will not handle, while
still allowing us to take advantage of existing work (although not as easily
as moving to CLNP will).
Since Pip can encode backbone-oriented "addresses" that are
semantically equivalent to NSAP addresses, transition to Pip will be
almost identical to the transition to CLNP already described by Callon
[Ref]. Once most of IP has disappeared (and therefore scaling and address
depletion are no longer concerns), we can evolve advanced features into
the internet (policy, mobility, flow control) without having to change the
internet protocol. (Of course not having to change the internet protocol
doesn't mean not having to change routers. But not having to change the
internet protocol is still better than having to change it, especially because
it facilitates piece-wise evolution).
In the following sections, I show how Pip works outside of the context of
interoperation with existing addressing and routing schemes.
4.0 Pip Header Structure
Figure 1 shows the Pip header structure. The Pip header has 5 parts not
found (at least in this form) in current internet protocols. They are the
Handling Directive (HD), the Tunnel, the Logical Router (LR), the
Routing Hints (RH), and the IDs. While these parts are fundamental to
Pip, the details of their layout, and the layout of other fields, are open to
change.
The IDs field contains flat (non-hierarchical) values that do nothing more
than identify the source and destination of a Pip packet. The Routing
Directive (RD), which consists of the Tunnel, the LR and the RH,
contains routing information. Either the Tunnel or the RH is used, but
not both. The RH holds routing information such as (hierarchical)
addressing, source-route (including policy), and virtual circuit
information. The Tunnel simply marks entry and exit points of a domain,
and is used to temporarily override the RH. The LR holds route-affecting
QOS information (such as routing metrics), plus various information
needed to make the RH operate properly. The HD holds non-route-affecting
QOS information, such as queueing directives, congestion
avoidance and control, and priority.
This packet structure better represents internet protocol functions than
traditional internet protocols. For instance, traditional internet protocols
combine the functions of identification and routing into the address fields.
Doing this generally limits the flexibility of the protocol. For instance,
host mobility is harder when the address combines these two functions.
Traditional internet protocols also split the routing function over multiple
fields (the address and the QOS fields). While this doesn't necessarily
limit functionality, it generally complicates the routing table lookup
function, or more accurately, it generally results in router
implementations that ignore the QOS fields, thus making it harder to add
QOS routing to the existing infrastructure.
Traditional internet protocols must use self-encapsulation in order to
tunnel through groups of routers. Pip has a specific field for this purpose,
thus eliminating the overhead of replicating the entire header.
No Pip header checksum is shown in Figure 1. I am undecided as to
whether or not one is necessary, particularly since the HD, Hop Count,
Tunnel, and RH fields will commonly change values from router to router.
In fact, of the first 5+ (32-bit) words, only the first word will potentially
not be modified. No fragmentation/reassembly fields are shown. I am
strongly inclined to leave these out, and just depend on dynamic
MaxPDU discovery to handle this. Finally, no version number field is
shown. Protocol identification (at the previous layer) can serve this
function.
The following sections cover the various parts of the Pip header in detail.
4.1 Boring Parts
The "boring" parts of the Pip header are the ID Type field (4 bits), the
Options Length field (4 bits), the Total Length field (24 bits), the
Protocol field (8 bits), and the Hop Count field (8 bits).
The ID Type describes the length and type of the Source and Destination
IDs. The IDs can be 0, 4, 6, or 8 octets each (the actual types, which are
not so boring, are described in the separate section on IDs below). The
Options Length field gives the number of 32-bit options that come after
the RD. The Total Length field gives the total length of the Pip packet,
including the Pip header, in octets. The maximum size Pip packet is 2^24 =
16,777,216 octets. This is substantially larger than the corresponding
fields in IP or CLNP, both of which allow for maximum packet sizes of
65536 octets. These fields comprise the first 32-bit word.
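As a rough illustration of the first 32-bit word, the following Python
sketch packs and unpacks the three fields. Since Figure 1 is not available
in this text version, the bit order within the word (ID Type in the most
significant bits, then Options Length, then Total Length) is an assumption
taken from the order in which the fields are listed. The function names are
hypothetical.

```python
import struct


def build_first_word(id_type, options_length, total_length):
    """Pack ID Type (4 bits), Options Length (4 bits), Total Length (24 bits).

    Assumption: fields are laid out most-significant-first in listing order.
    """
    if not 0 <= id_type <= 0xF:
        raise ValueError("ID Type is a 4-bit field")
    if not 0 <= options_length <= 0xF:
        raise ValueError("Options Length is a 4-bit field")
    if not 0 <= total_length <= 0xFFFFFF:
        raise ValueError("Total Length is a 24-bit field")
    word = (id_type << 28) | (options_length << 24) | total_length
    return struct.pack("!I", word)  # network byte order


def parse_first_word(header):
    """Split the first 32-bit word of a Pip header into its three fields."""
    (word,) = struct.unpack("!I", header[:4])
    return {
        "id_type": (word >> 28) & 0xF,
        "options_length": (word >> 24) & 0xF,  # counted in 32-bit options
        "total_length": word & 0xFFFFFF,       # octets, header included
    }
```

For instance, parse_first_word(build_first_word(5, 2, 1500)) returns the
same three values that were packed.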
The Protocol field indicates the higher layer protocol, and is equivalent to
the IP Protocol field. The Hop Count field counts down the number of
hops before the packet should be dropped. It is the same size as the
corresponding fields in IP or CLNP, allowing for 256 hops. The Hops
field falls on a 32-bit (and 64-bit) boundary, making it convenient to
modify.
4.2 Tunnel and Routing Directive (RD)
The RD is the most novel and powerful aspect of Pip. The RD is general,
compact, and fast. It is general in that it can accommodate any address
type and any routing algorithm type, including source-based routing. It is
compact in that it encodes hierarchical addresses efficiently. And, it is fast
because 1) the number of steps required for the forwarding function is
small, even in the worst case, and 2) the same steps are used for
forwarding all types of routing, so an efficient and general forwarding
engine can be built.
The RD is composed of three parts, the Tunnel, the Logical Router (LR),
and the Routing Hints (RH).
Because a router can be playing multiple roles, Pip models a router as
multiple "Logical Routers". For instance, a router may be operating at
multiple levels of the hierarchy, may be participating in multiple routing
algorithms, including multicast, may be operating with multiple routing
metrics, and so on. While the function of logical routers is for most
purposes a feature, it is required to make the RH mechanism work
properly, as is described below.
The basic algorithm for finding a route is to 1) determine the forwarding
table index, 2) determine which forwarding table to use (that is, which
logical router is active for this packet), 3) index directly into the
forwarding table (no search technique such as hashing or tree search is
necessary) and retrieve the routing information, 4) modify the RD for the
next-hop router. This is explained in more detail below (see Section
4.2.3).
Tunnel
The 32-bit Tunnel is composed of two 16-bit fields, the Source Exit ID
(SEI) and the Destination Exit ID (DEI). The DEI comes after the SEI,
and so falls on the least significant bits of a word boundary.
When the DEI is 0, then the Tunnel is ignored and the RH is used to route
the packet. Otherwise, the RH is ignored and the Tunnel is used.
The purpose of the Tunnel is as follows. Consider two routers, X and Y,
both of which understand the RH (at the level at which the RH is
operating). Between X and Y are a series of routers that do not understand
the RH (at that level). Assume that a Pip packet (with a NULL Tunnel)
arrives at X and should be routed to Y. In order to get the packet to Y, X
fills the DEI field with a value that is understood by the intermediate
routers to mean "route to Y". X fills the SEI field with a value that is
understood by the intermediate routers to mean "route to X". The purpose
of the SEI field is to handle the case where a return packet (an error packet
or control packet of some sort) needs to be sent (either to X or to the
original source host). When Y receives the packet, it recognizes the
Tunnel as terminating at itself, writes the Tunnel field to 0, and forwards
based on the RH.
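The Tunnel layout and the X-to-Y behavior described above can be sketched
roughly as follows. The function names are hypothetical; placing the SEI in
the upper 16 bits follows from the statement that the DEI falls on the least
significant bits of the word.

```python
def pack_tunnel(sei, dei):
    """Build the 32-bit Tunnel: SEI in the upper half, DEI in the lower half."""
    if not (0 <= sei <= 0xFFFF and 0 <= dei <= 0xFFFF):
        raise ValueError("SEI and DEI are 16-bit values")
    return (sei << 16) | dei


def unpack_tunnel(tunnel):
    """Return (SEI, DEI) from a 32-bit Tunnel value."""
    return (tunnel >> 16) & 0xFFFF, tunnel & 0xFFFF


def tunnel_in_use(tunnel):
    """A DEI of 0 means the Tunnel is ignored and the RH is used."""
    return (tunnel & 0xFFFF) != 0


def exit_tunnel(tunnel, my_exit_id):
    """At router Y: clear the Tunnel if it terminates here, else leave it."""
    _, dei = unpack_tunnel(tunnel)
    return 0 if dei == my_exit_id else tunnel
```

Here router X would call pack_tunnel with its own exit ID as the SEI and
Y's exit ID as the DEI, and Y would call exit_tunnel before forwarding on
the RH.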
Tunneling is traditionally useful for preventing external routing
information from being required internally. It is also used by the ISIS
routing protocol for repairing area partitions. Pip tunneling can be used
for both of these purposes. Because of the way "addresses" (called RH
Numbers in Pip) are assigned in Pip, however, tunneling turns out to be
necessary just to make Pip work.
There are no nested tunnels in Pip (that is, tunnels cannot have tunnels).
While nested tunnels could be of some use, it seems that the usefulness of
tunneling diminishes with the number of nested levels. By having only
one level of tunneling, the packet format is simplified (and the size kept
small). To make nested tunneling work, it would be necessary to either
modify the size of the packet en route (to add and delete tunnels), or for
the originating host to put in enough Tunnel fields for the deepest nesting.
The former case is difficult because it requires changing the packet size,
which doesn't work for instance with (cut-through) ATM switching. The
latter requires extra complexity and overhead in informing the originating
host how many Tunnel fields to include in the packet. For these reasons, I
have chosen to limit tunneling to one level.
4.2.1 Logical Router (LR)
As described above, the LR field indicates which of multiple forwarding
tables should be used when routing a packet. The many uses of the LR
will become clear throughout the coming examples.
Note that in theory one can always use different indexing values, rather
than different forwarding tables, as a means of distinguishing logical
routers. This, however, couples "addressing" (RH numbering) between
different logical domains, thus generally complicating things. For
instance, one could use different RH values to indicate different QOSs
(cost, delay, etc.), but that would require that each system have an RH
Number indicating cost, another indicating delay, and so on. So, unless
such coupling is convenient, it is best to decouple RH numbering using
the LR field.
Even though the LR field can be treated as a flat field by a router, the
individual bits have specific meaning. My goal is that most or all of the
bits' meaning be determined dynamically (via system management or the
routing protocol or some other distributed protocol), and not be specified
in a standards document. This allows for the maximum flexibility in
evolving the protocol (adding new features, purging old ones). For
instance, upon booting, a host should, as part of its configuration process,
contact a local router and learn the meaning of each bit of the LR field. A
network debugger, even, could query attached routers for these
definitions, so that meaningful information could be logged and
displayed.
The following bits are likely to be required:
1. Level. This indicates what level of a hierarchical RH Number is being
routed on at a given time. This use of the LR field is only necessary if
hierarchical RH Numbers are being used.
2. Multicast. If multicast is used, at least one bit may be needed to
indicate whether the packet should be multicast or unicast. If several
multicast algorithms are in use, multiple bits may be needed.
3. Route-affecting QOS. This would be any QOS type that influences
the route chosen, such as cost or high-bandwidth. Note that QOS need
not be route-affecting. For instance, a QOS type of low delay might
only influence how packets are queued (given priority in the queue),
but not influence how they are routed. In this case, the HD would
have certain bits set aside for "low delay" (actually, priority
queueing), but the LR would not. In other cases, a given QOS might affect
both routing and handling.
4.2.2 Routing Hints (RH)
The RH is the most interesting and novel aspect of Pip. It holds what is
normally thought of as the "address" in a traditional internet header. It can
also hold many other kinds of routing information, such as policy
information.
The RH consists of the RH Descriptor and the Routing Hint Fields (RHF,
see Figure 2). The RH Descriptor tells how to interpret the RHFs. The
RHFs are a series of fields, listed in the order that they will be required by
the routers in the path from source to destination. This should not be taken
to assume that the RHFs necessarily specify a source route, in some
conventional sense of the term. Most normally, the RHFs will simply
contain a hierarchical source and destination RH Number, where each
RHF denotes one level of the hierarchical RH Number. This and other
uses of the RHFs (such as virtual circuit or path identifiers, true source
routes, and Sirpent- or Paris-style source routes) are given later.
Each pair of RHFs is separated by an RHF Relator (RHFR). The RHFR
is a two-bit field that shows the relationship between the field before it
and the field after. It has three values, up, down, and none. If down, the
previous RHF is hierarchically above the subsequent RHF. If up, the
previous RHF is hierarchically below the subsequent RHF. If none, the
two RHFs are not hierarchically related.
The RH Descriptor and RH are parsed as follows. The 6-bit RHF Offset
field determines which RHF is currently active. The RHF Length field
indicates the size of each RHF (all of which are the same length). The
RHF sizes represented by each RHF Length value are given in the
following table:
After this is a series of 1 or more RHFs. Where the actual values needed
in the RHFs vary greatly (some small, some large), this structure will
result in a larger RH than seems necessary. I don't know how to shrink
each RHF to its smallest size and still make the header parsing simple
(and therefore fast).
After the RHFs comes enough padding to make the RD fall on a 32-bit
word boundary.
The combined 10-bit RHF Offset/RHF Length, then, is used to isolate the
current RHF that a router should be routing on. A typical implementation
on a common CPU/RAM processor would be to use the full 10 bits as a
direct index into an array of size 1024, each entry of which contains data
on how to isolate the current field. For instance, if RHF Offset = 3 and
RHF Length = 8 (meaning each RHF/RHFR is 14 bits long), the data
would instruct the processor to fetch the first (32-bit) word of the RH,
shift left 10, mask with 0x00003c00, fetch the second word, shift right 22,
mask with 0x000003ff, and OR the two results. In this example, the RHF/
RHFR straddled 32-bit word boundaries, and so two fetches were needed.
(The RHF Relator should also be saved off at this time to be used later.)
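The shift-and-mask sequence in this example can be written out as a short
Python sketch. The constants are the ones given above. The helper
pack_three_fields is hypothetical and exists only to build test input: it
assumes that the 14-bit RHF/RHFR slots are packed most-significant-first
starting at the top of the first RH word, which is one layout under which
the third slot straddles the word boundary as described. The text does not
say how the 14 bits divide between RHF and RHFR, so the sketch returns them
together.

```python
def isolate_rhf(word0, word1):
    """Isolate a 14-bit RHF/RHFR that straddles two 32-bit RH words.

    The low 4 bits of the first word become the top 4 bits of the result;
    the high 10 bits of the second word become the bottom 10 bits.
    """
    high = (word0 << 10) & 0x00003C00
    low = (word1 >> 22) & 0x000003FF
    return high | low


def pack_three_fields(f0, f1, f2):
    """Pack three 14-bit slots MSB-first into two 32-bit words (assumed layout)."""
    bits = (f0 << 50) | (f1 << 36) | (f2 << 22)  # 42 bits used, 22 bits unused
    return (bits >> 32) & 0xFFFFFFFF, bits & 0xFFFFFFFF
```

With this layout, isolate_rhf applied to the two words produced by
pack_three_fields recovers the third slot.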
Once the RHF is isolated, it is used as a direct index into a forwarding
table. The forwarding table can be well populated because (as is discussed
later in this paper) the RHF values are chosen not based on how many
things might have to be encoded at a given level of the hierarchy, but on
how many things are actually encoded at a given level. In other words,
the "address" that is ultimately carried in packets is, unlike current
internet protocol addresses, well-utilized.
In addition to the information in the forwarding table described above, the
forwarding table entry must also indicate whether the RHF Offset needs
to be decremented. The RHF Offset is usually decremented when a packet
crosses a hierarchical boundary. For instance, if the packet was being
forwarded based on the equivalent of "network number" through a
backbone, the router bordering the indicated network would decrement
the RHF Offset so that the next router (the router in the indicated
network) would automatically look at the "subnet number" field. Often a
single router is acting at two or more levels of the hierarchy, for instance a
level 2 router in the ISIS routing protocol. In this case, the forwarding
table entry and RHFR would indicate that, instead of routing the packet to
another router, the next RHF should also be examined (and, another
forwarding table used). It would be unusual to find a router operating at
more than three levels of the hierarchy. Further, address hierarchies are
shallow. Telephone numbers in the USA have only 4 levels of hierarchy
(including the international code). Therefore, the number of iterations of
this search is well-bounded.
Note that this "field indexing" style of lookup is not just a cute
optimization. Pip derives most of its routing flexibility from it, and
wouldn't be general without it.
4.2.3 Forwarding Algorithm
This section describes the algorithm for forwarding a packet, based on the
contents of the Tunnel and the RD (see Figure 3). For expository reasons,
the unicast algorithm is defined, followed by the modifications needed for
multicast. The same algorithm is used no matter what kind of routing
algorithm is being used (hierarchical, policy, source, virtual circuit).
Getting the appropriate behavior, according to the routing algorithm used,
requires configuring the tables shown in Figure 3 correctly.
1. If the Tunnel Field is not 0, index into the Tunnel Table using the
value in the Tunnel Field, and go to step 2. Otherwise (the Tunnel Field
is 0), index into the Logical Router Table (LR Table) with the value in
the LR Field, and go to step 3.
2. If the Information column contains forwarding info, then modify the
Tunnel Field value according to the instructions in the Information
column, and forward the packet. Otherwise, if it contains a pointer to
the LR Table, set the Tunnel Field to 0 and go to step 1. Otherwise, if
it contains a pointer to a forwarding table, then go to step 4.
3. If the Information column contains forwarding info, then modify the
LR Field and Tunnel Field values according to the instructions in the
Information column, and forward the packet accordingly. Otherwise,
if it contains a pointer to another forwarding table, then go to step 4.
4. Using the RH Descriptor (RHF Offset/RHF Length), isolate the
correct RHF and RHFR. Using the RHF, index into the correct forwarding
table (determined by the pointer in the previous step). If the
Information column contains forwarding info, then modify the RHF Offset
field, the value of the isolated RHF, the Tunnel Field, and the LR
Field value according to the instructions in the Information column,
and forward the packet accordingly. Otherwise, if it contains a pointer
to another forwarding table, modify the isolated RHF field value
according to the instructions in the Information column, and repeat step
4 (using the new forwarding table).
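The four steps above can be sketched in Python as follows. This is an
illustration under stated assumptions, not a definitive implementation: the
class names (Packet, Forward, ToLR, ToFT), the representation of tables as
dictionaries, and the "updates" dictionary standing in for the instructions
in the Information column are all hypothetical. The sketch also treats the
RHFs as an already-isolated list indexed from 0, and lets a pointer entry
carry an RHF Offset change so that the next RHF is examined when routing
down the hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    tunnel: int      # Tunnel Field (0 means no tunnel)
    lr: int          # LR Field
    rhf_offset: int  # index of the active RHF (0-based here, an assumption)
    rhfs: list       # RHF values, already isolated from the RH


@dataclass
class Forward:       # "forwarding info" in the Information column
    next_hop: str
    updates: dict = field(default_factory=dict)


@dataclass
class ToLR:          # pointer from the Tunnel Table to the LR Table
    pass


@dataclass
class ToFT:          # pointer to a forwarding table
    table: str
    updates: dict = field(default_factory=dict)


def apply_updates(pkt, updates):
    """Apply the rewrite instructions stored with a table entry."""
    if "rhf" in updates:  # rewrite the isolated RHF before moving the offset
        pkt.rhfs[pkt.rhf_offset] = updates["rhf"]
    for name in ("rhf_offset", "tunnel", "lr"):
        if name in updates:
            setattr(pkt, name, updates[name])


def forward(pkt, tunnel_table, lr_table, forwarding_tables):
    """Return the next hop for pkt, rewriting its RD fields in place."""
    # Step 1: choose the Tunnel Table or the LR Table.
    if pkt.tunnel != 0:
        entry = tunnel_table[pkt.tunnel]
        # Step 2: a pointer to the LR Table means the tunnel ends here.
        if isinstance(entry, ToLR):
            pkt.tunnel = 0
            entry = lr_table[pkt.lr]
    else:
        entry = lr_table[pkt.lr]
    # Step 4: follow pointers from one forwarding table to the next.
    while isinstance(entry, ToFT):
        apply_updates(pkt, entry.updates)
        rhf = pkt.rhfs[pkt.rhf_offset]
        entry = forwarding_tables[entry.table][rhf]
    if not isinstance(entry, Forward):
        raise ValueError("table entry holds no forwarding info")
    # Steps 2, 3, and 4: forwarding info found, so rewrite and forward.
    apply_updates(pkt, entry.updates)
    return entry.next_hop
```

A missing table index raises KeyError in this sketch; a real router would
instead generate one of the error messages listed in Section 4.6.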
If tunneling is being used, and the router receiving the Pip packet is not
the last router of the tunnel, then the router will find the forwarding
information in the Tunnel Table, and not index any other tables. If the
router is the last router of the tunnel, and the Tunnel Field has not been set
to zero by the previous router, then the router will find a pointer in the
Tunnel Table, and forward according to the RH.
If tunneling is not being used, the router receiving the packet will
normally find a pointer in the Logical Router Table. When a router finds a
pointer in a forwarding table (thus pointing it to another forwarding
table), it is normally the result of "routing down the hierarchy". That is,
the router is operating at multiple levels of the hierarchy, and is parsing
the hierarchical RH Number.
Section 5 gives examples of the algorithm described above.
Multicast Algorithm
For multicast, the tables in Figure 3 are modified such that the
Information column in each table contains a set of information blocks,
each one being a pointer or forwarding info. When there are multiple
forwarding info blocks (either in the same table entry, or by virtue of
multiple pointers reaching multiple tables), then multiple packets are
transmitted. Each packet may have the Tunnel or RD fields modified
differently, so each information block contains these instructions.
4.3 Handling Directive (HD)
The HD is something of a catch-all field for any packet handling
mechanisms that don't influence the route taken by a packet. Typical
handling types would be queueing directives, such as priority queueing,
security directives, such as encryption, and so on.
The meaning of the specific bits is meant to be handled in the same way as
the LR; that is, the meaning of the bits is defined dynamically through
system management or configuration protocols, not through hard-coded
definition in a standards document.
Each domain autonomously determines what meaning is assigned to each
bit. When different domains use different bits for the same purpose, the
value of the HD must be modified when a packet crosses domain borders
so that the next domain may correctly interpret the meaning of the HD.
The border router determines the proper translation via protocol exchange
with the neighboring domain or via system management.
Packing all of the handling bits together makes possible an implementation
style in which the HD is used as a direct index into RAM, thus
retrieving the appropriate handling mechanisms and values.
This paper does not further discuss the HD. Most notably, it does not
discuss how a dynamic routing protocol would propagate HD
information.
4.4 IDs
When an ID is present, it alone is used to identify the source and
destination hosts. However, IDs can be mapped to the associated RH, so
that the RH implies a certain ID. The ID therefore need not be carried in
most packets. This works as follows. When a packet is first sent from a
source host X to a destination host Y, the ID is included. The destination
host Y, upon receiving the packet, associates the source ID with the
"Source RH Number". These are the RHFs that describe the "source
address" of the source host (see example 1). When Y returns a packet to
X, it writes X's ID in the destination ID field, and X's Source RH Number
in the RH (as the Destination RH Number). This indicates to X that Y has
recorded the mapping between X's source RHs and X's ID, and
subsequent packets from X that contain the same source RH need not
include the ID field.
If the host is mobile, and changes RH Numbers while communicating
with another host, then it includes the ID when it uses a new RH Number.
This lets the destination host associate another Source RH Number with
the ID, so that subsequent packets can again leave the ID off. An out-of-
band message can be used to de-associate no-longer-valid RH Numbers.
(If both hosts are mobile, then some kind of third party server will be
necessary, so that current RH Numbers can be determined, in case both
hosts get new RH Numbers simultaneously.) If the hosts get new RH
Numbers often, then the ID can simply be included in every packet.
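The association between Source RH Numbers and IDs that a receiving host
keeps can be sketched as a small cache. The class IdCache and its method
names are hypothetical, and representing an RH Number as a tuple of RHF
values is an assumption made for the illustration.

```python
class IdCache:
    """Receiver-side map from Source RH Number to source ID (a sketch)."""

    def __init__(self):
        self._by_rh = {}

    def receive(self, source_rh, source_id=None):
        """Return the ID of the sender of a received packet, or None.

        When the packet carries an ID, it alone identifies the sender and
        the association with the Source RH Number is recorded. Otherwise
        the ID is implied by a previously recorded Source RH Number.
        """
        key = tuple(source_rh)
        if source_id is not None:
            self._by_rh[key] = source_id
            return source_id
        return self._by_rh.get(key)

    def forget(self, source_rh):
        """De-associate an RH Number that is no longer valid."""
        self._by_rh.pop(tuple(source_rh), None)
```

A mobile host that moves simply sends its ID with the new RH Number, which
adds a second association for the same ID; a None result from receive
corresponds to the case where the sender would need to be told to include
its ID.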
The ID Type field is interpreted as follows. The first two bits indicate the
type (and length) of the source ID, and the second two bits indicate the
type of the destination ID. The meanings of the four values are: 0 = no IDs;
1 = 32-bit IP number; 2 = 48-bit IEEE 802 number; 3 = 64-bit number.
The 64-bit number can have multiple interpretations, including X.121
number, E.164 number, and so on. While the ID field never influences
routing, the IP-type ID can be used during transition from IP to Pip to
determine how to fill in parts of the RD as the packet traverses the
internet.
The ID field is padded out to a 32-bit boundary. It may make sense to pad
out to a 64-bit boundary, given the introduction of 64-bit word processors.
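A minimal Python sketch of decoding the ID Type field follows. It assumes
that the "first two bits" are the two most significant bits of the 4-bit
field, and that padding is applied to the combined source and destination
IDs rather than to each ID separately; neither detail is stated in the
text. The names ID_TYPES, decode_id_type, and id_field_length are
hypothetical.

```python
# Value of each two-bit type: (description, length in octets).
ID_TYPES = {
    0: ("no ID", 0),
    1: ("32-bit IP number", 4),
    2: ("48-bit IEEE 802 number", 6),
    3: ("64-bit number", 8),
}


def decode_id_type(id_type):
    """Return ((src description, src octets), (dst description, dst octets))."""
    src = (id_type >> 2) & 0x3  # assumption: source type is the upper two bits
    dst = id_type & 0x3
    return ID_TYPES[src], ID_TYPES[dst]


def id_field_length(id_type):
    """Length in octets of the ID field, padded to a 32-bit boundary."""
    (_, src_len), (_, dst_len) = decode_id_type(id_type)
    raw = src_len + dst_len
    return (raw + 3) // 4 * 4
```

For example, an ID Type of binary 0110 describes a 4-octet source ID and a
6-octet destination ID, which the sketch pads from 10 to 12 octets.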
4.5 Options
No options are defined at this time. In the future there might be options to
establish virtual paths in lieu of policy routes, reserve bandwidth, manage
mobile hosts, manage multicast lists, or whatever. In general, I would
assume that, if options are present, the packet leaves the normal
forwarding code (or hardware) path for special (and slower) processing.
Options are not further discussed in this paper.
4.6 Messages
Pip requires the following "ICMP"-type messages:
Use/don't use tunneling message
Incorrect RH message (usually means not enough levels of RH
Number given)
Max PDU exceeded notification
Received ID incorrect (used to flush old RH Number from sending
host)
Normal redirect
Tunnel redirect
ARP
The use of these messages is explained by the following examples.
5.0 Examples
Following are descriptions of how various routing and addressing styles
are used with Pip. These will further explain the use of the RD.
5.1 Example 1: IP-style Hierarchical RH Numbers (Addresses)
The examples in this section are primarily for the purpose of introducing
the various concepts of Pip, particularly the RD. None of these examples
gives the complete algorithm, but they get successively more complex and
complete. Later examples (Examples 2 and on) will be complete.
Consider the network of Figure 4. The RH Numbers shown correspond to
IP-style addressing.
The Pip analogue to existing IP and CLNP addressing styles is
hierarchical RH Numbers. When plain hierarchical RH Numbers (plain
means with no QOS or policy information) are used, the RHFs (and
RHFRs) are structured as shown in Figure 5. The first group of RHFs are
called the "Source RHFs". These are separated by "up" RHFRs, and are
roughly equivalent to the source address in a traditional IP packet. The
second (and last) group of RHFs are called the "Destination RHFs".
These are separated by "down" RHFRs, and are roughly equivalent to the
destination address in a traditional IP packet.
The Source RHFs are listed in order of lowest level of the hierarchy first.
That is, this field will come in on the wire first. The Destination RHFs are
listed in order of highest level of the hierarchy first. Note that this is the
order in which the fields (specifically the Destination RHFs in this case)
will be used by routers. The RHFR between the source and destination
RH Number indicates "none".
5.1.1 Example 1.1: No tunneling, no default routing.
Assume that no tunneling is needed, and that default routing is not being
used. In other words, the forwarding tables of the routers within the
network have network numbers for other networks. The Tunnel Table for
router x consists of one entry, indicating that all non-zero tunnel values
are invalid. If a Pip packet with a non-zero Tunnel was received, the
"Don't use tunneling" message would be sent to the sender.
The LR table for router x is as follows:
LR table = [ <LR.level=3, use FT3> <LR.level=2, use FT2>
<LR.level=1, ambiguous> ]
For these examples, the only information in the LR Table is that
concerning the hierarchical level at which the packet is operating. Since
the bits denoting this do not necessarily need to be in the least significant
positions of the LR Field, the "LR.level=X" notation implies the index
into the LR table.
The reason the LR.level=1 is ambiguous is that router x is attached to two
level 1 areas (subnets), and therefore wouldn't know which level 1 table
(FT1a or FT1b) to use. As seen from x's forwarding tables below, FT2
must first be indexed to determine whether FT1a or FT1b should be used.
The forwarding tables for router x are as follows:
These tables are simplified in that they do not show, for pedagogical
reasons, information relating to the RHF Relators. This will be shown in
later examples.
Example 1.1a: From 2.2.1 to 2.2.2
First consider a packet from 2.2.1 to 2.2.2. Host 2.2.1 would initially
make a directory service query and get back an RH Number in the
following form: <level 3 = 2; level 2 = 2; level 1 = 2>. By comparing its
own RH Number with that for the destination, 2.2.1 would conclude that
they share the same level 3 and level 2 (that is, are in the same network
and subnet). 2.2.1 would then compose the following RD:
RD = < Tunnel = 0; LR.level = 1; RHF Offset = 2; RH = 1 (none) 2 >,
where "LR.level" indicates the bits in the LR field indicating the
hierarchical level, and "RH = 1 (none) 2" means that the first RHF is
value 1, the second RHF is value 2, and the RHFR between them is
"none".
The source knows to set Tunnel = 0 because of a local parameter
indicating that tunneling is not in effect. Normally, a host will assume that
tunneling is not in effect unless told otherwise (either by a configuration
message or by a "Don't use tunneling" error message).
The source host initially sets LR.level = 1 because that is the highest
uncommon level between source and dest (and therefore a level at which
routing must take place). The RH contains the level 1 value from the
source (1) followed by the level 1 value from the destination (2). Because
the host is setting the LR.level to 1, the host doesn't have to include any
RH Number components higher than that in the RH. Since neither value is
hierarchically above the other, the RHFR is set to "none". Finally, the
RHF Offset is set to point to the beginning of the Destination RHF of the
RH (value 2). In all examples, the RHF being pointed to by the RHF
Offset will be printed in bold type.
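The host-side rule just described can be sketched in Python. This is an
illustration only, not part of Pip: the function names, the dictionary
layout of the RD, and the min_level parameter (which models a host that has
learned that its router is ambiguous below some level) are assumptions made
for the sketch. It also assumes that source and destination RH Numbers have
the same number of levels.

```python
def compose_rd(src, dst, tunnel=0, min_level=1):
    """Build an RD for plain hierarchical RH Numbers.

    src and dst are RH Numbers written highest level first, e.g. (2, 2, 1).
    """
    if len(src) != len(dst):
        raise ValueError("sketch assumes equal hierarchy depth")
    depth = len(src)
    if min_level > depth:
        raise ValueError("min_level exceeds hierarchy depth")
    differing = [i for i in range(depth) if src[i] != dst[i]]
    if not differing:
        raise ValueError("source and destination are identical")
    # Highest level at which the two RH Numbers differ (level 1 is lowest).
    level = max(depth - differing[0], min_level)
    start = depth - level
    src_rhfs = list(reversed(src[start:]))  # Source RHFs, lowest level first
    dst_rhfs = list(dst[start:])            # Destination RHFs, highest first
    rhfrs = ["up"] * (level - 1) + ["none"] + ["down"] * (level - 1)
    return {
        "tunnel": tunnel,
        "lr_level": level,
        "rhf_offset": level + 1,  # 1-based position of first Destination RHF
        "rhfs": src_rhfs + dst_rhfs,
        "rhfrs": rhfrs,           # rhfrs[i] sits between rhfs[i] and rhfs[i+1]
    }


def format_rh(rd):
    """Render the RH in the notation used in the examples."""
    parts = [str(rd["rhfs"][0])]
    for relator, value in zip(rd["rhfrs"], rd["rhfs"][1:]):
        parts.append(f"({relator}) {value}")
    return " ".join(parts)
```

For host 2.2.1 sending to 2.2.2, compose_rd((2, 2, 1), (2, 2, 2)) yields
LR.level = 1, RHF Offset = 2, and an RH of "1 (none) 2", matching the RD
shown above.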
If the host knew that strict subnet-per-LAN IP-style RH Numbering were
being used, it could deduce that the destination host is on the same LAN
as itself, and ARP for the destination. But assuming that the source host
doesn't know this, the source host would send the packet to its "default"
router, which is x.
When router x receives the packet, it goes into the LR table with
LR.level=1, and determines that the LR is ambiguous in this case. It
therefore sends an "LR ambiguous" message to the host. The host would
label router x as being ambiguous at level 1, so that future packets (even
to different destinations) would start at level 2. Normally, a configuration
message from router x (as part of router discovery) would have prevented
the need for the error message.
The host composes another RH, this time with level 2 included:
RD = < Tunnel = 0; LR.level = 2; RHF Offset = 3; RH = 1 (up) 2
(none) 2 (down) 2>.
Now, the bottom two levels of the source RH Number (2.1) occupy the
first two RHFs (but in reverse order), and the bottom two levels of the
destination RH Number (2.2) occupy the last two RHFs.
When router x receives this packet, it would index the LR table with
LR.level=2, and determine that forwarding table FT2 should be used.
Using the RHF Offset, router x would isolate the third RHF (value 2)
from the RH. Router x would index 2 into forwarding table FT2, and
retrieve a result indicating that it needs to move to level 1, using
forwarding table FT1a. Router x would increment RHF Offset, isolate the
fourth RHF (value 2) from the RH, use this as an index into FT1a, and
determine that the destination is on subnet 2.2. It would then use an ARP
function to discover the LAN RH Number of 2.2.2.
Router x would also redirect host 2.2.1. After the redirect, packets from
2.2.1 would go directly to 2.2.2, and would use an RH with only level 1.
To form a return packet, 2.2.2 would reverse the order of the RHFs, and
calculate the values of LR.level and RHF Offset similarly to the way that
2.2.1 calculated them. As such, 2.2.2 would copy the level of the
incoming packet into the return packet.
Note that the RH for level 1 packets (after the redirect) would only be 1
word long. Only putting as much of the RH Number in the RH as needed
is one reason that Pip is compact. Since most traffic is local, most packets
will be able to take advantage of this particular optimization.
Example 1.1b: From 2.2.1 to 2.1.3
For a packet from 2.2.1 to 2.1.3, the directory service query would return
<level 3 = 2; level 2 = 1; level 1 = 3>. By comparing its own RH Number
with that for the destination, 2.2.1 would conclude that they share the
same level 3 (network), but not the same level 2 or level 1. 2.2.1 would
then compose the following RD:
RD = < Tunnel = 0; LR.level = 2; RHF Offset = 3; RH = 1 (up) 2
(none) 1 (down) 3>.
The bottom two levels of the source RH Number (2.1) occupy the first
two RHFs (but in reverse order), and the bottom two levels of the
destination RH Number (1.3) occupy the last two RHFs. When router x
receives this packet, it would parse the packet as described above, go into
FT2 with index 1, then go into FT1b with index 3, and route the packet to
subnet 2.1.
Example 1.1c: From 2.2.1 to 1.5.11
For a packet from 2.2.1 to 1.5.11 (a host in Net 1), host 2.2.1 would
determine that there is no common level, and so would form an RD
starting at level 3:
RD = <Tunnel = 0; LR.level = 3; RHF Offset = 4; RH = 1 (up) 2 (up)
2 (none) 1 (down) 5 (down) 11>.
The full three levels of the source address (2.2.1) occupy the first three
RHFs (but in reverse order), and the full three levels of the destination
address (1.5.11) occupy the last three RHFs. When router x receives this
packet, it would go to forwarding table FT3 (based on the LR.level of 3)
with an index of 1, and forward the packet to router z without
incrementing the RHF Offset or changing the LR.level.
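Router x's handling of Examples 1.1a through 1.1c can be sketched as
follows. Because the forwarding tables are not reproduced in this text
version, the table contents below are reconstructed from the prose and
should be read as assumptions, as should the Python names and the dictionary
layout of the RD.

```python
# LR table for router x: LR.level -> forwarding table, or "ambiguous".
LR_TABLE = {3: "FT3", 2: "FT2", 1: "ambiguous"}

# Each entry is either ("table", name), meaning move down one level and
# consult that table, or ("hop", where), meaning forward the packet there.
# Only the entries mentioned in the examples are filled in.
FORWARDING_TABLES = {
    "FT3": {1: ("hop", "router z")},
    "FT2": {2: ("table", "FT1a"), 1: ("table", "FT1b")},
    "FT1a": {2: ("hop", "subnet 2.2")},
    "FT1b": {3: ("hop", "subnet 2.1")},
}


def forward(rd):
    """Return where router x sends the packet, or an error message name."""
    if rd["tunnel"] != 0:
        return "error: Don't use tunneling"
    table = LR_TABLE[rd["lr_level"]]
    if table == "ambiguous":
        return "error: LR ambiguous"
    offset = rd["rhf_offset"]  # 1-based, as in the examples
    while True:
        rhf = rd["rhfs"][offset - 1]
        kind, target = FORWARDING_TABLES[table][rhf]
        if kind == "hop":
            return target
        # Move to the next lower level: next table, next RHF.
        table = target
        offset += 1
```

With the level 2 RD of Example 1.1a (RHFs 1, 2, 2, 2 and RHF Offset 3), the
sketch walks FT2 and then FT1a and returns "subnet 2.2". The level 1 RD
returns the "LR ambiguous" error instead.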
5.1.2 Example 1.2: With default routing, no tunneling
In the previous examples, the level 3 table (FT3), at least in the IP case,
would be very large, because it must hold all active network numbers.
One way to reduce forwarding table size in general is to use default
routing. With current IP networks, default routing works best if there is
only one exit point: with only one path out of a private network, default
routing doesn't degrade the quality of paths found. If
default routing to multiple exits is used, then sometimes a non-optimal
exit point can be chosen.
With Pip, tunneling would normally be used to handle default routing
with multiple exits. For pedagogical purposes, we give an example here
where default routing is used without tunneling (again from the network
of Figure 4). The level 1 and 2 forwarding tables for router x (FT1a,
FT1b, and FT2) are the same as for Example 1.1. The forwarding table for
level 3 (FT3), however, has a single entry of:
FT3 (level 3, tunnel=0) = [ *, y, 3 ],
where * means all possible index values, y means next hop router y, and 3
means the transmitted packet should operate at level 3 (LR.level = 3, RHF
Offset = unchanged).
Assume the same host pair as Example 1.1c above (2.2.1 to 1.5.11). Host
2.2.1 would form the same RD as shown in Example 1.1c. Upon
receiving this packet, router x would not even need to isolate the RHF,
because it knows that all packets at level 3 are routed to y. Assuming that
y defaults level 3 packets to Backbone 1, the packet would take a longer
path than necessary.
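A default entry of this kind can be modeled as a wildcard key in the
forwarding table. The sketch below is illustrative only. The dictionary
layout is an assumption, and so is the rule that an explicit entry, if one
existed, would take precedence over the wildcard.

```python
WILDCARD = "*"

# FT3 (level 3, tunnel=0) = [ *, y, 3 ]
FT3 = {WILDCARD: {"next_hop": "y", "lr_level": 3}}


def lookup(table, index):
    """Return the entry for index, falling back to the wildcard entry."""
    if index in table:
        return table[index]
    return table[WILDCARD]
```

Every level 3 index, including the destination network 1 of Example 1.1c,
resolves to next hop y with the packet left at level 3.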
5.1.3 Example 1.3: With default routing and tunneling
Now, we consider the case where tunneling is in use. The level 1 and 2
forwarding tables (FT1a, FT1b, and FT2) for router x are the same as in
the first example. There is no level 3 forwarding table. The Tunnel Table
(TT) is shown below:
Note that there is a new column in the table (the 4th column). This is the
value the Tunnel field gets written to upon transmission. Note that the
Tunnel Table is small (just two entries, one for each exit point). Router x's
LR table is modified as follows (to indicate the lack of a level 3
forwarding table):
LR table = [ <LR.level=3; error (send "Use tunneling" message)>
<LR.level=2; use FT2>
<LR.level=1; ambiguous> ]
The Tunnel Table and level 3 Forwarding Table for router y are as
follows:
Example 1.3a: From 2.2.1 to 1.5.11, host fails to use tunnel
Normally hosts would be configured to use or not use tunnels as
appropriate (via some router-to-host configuration protocol). Assume for
this example though that host 2.2.1 has somehow not been informed to
use tunnels for inter-domain (level 3) traffic.
Host 2.2.1 would generate an RD as shown in Example 1.1c. When router
x receives this packet, it goes to the LR Table entry for LR.level=3. This
results in the error shown. Router x sends an error message to 2.2.1
indicating that it must use tunneling for level 3 traffic.
Example 1.3b: From 2.2.1 to 1.5.11, host uses tunnel value
Now assume that either because of proper configuration or the error
message of the previous example, host 2.2.1 knows to use a tunnel for
level 3 traffic. Now, host 2.2.1 generates the following RD:
RD = <Tunnel = 1; LR.level = 3; RHF Offset = 4; RH = 1 (up) 2 (up)
2 (none) 1 (down) 5 (down) 11>.
In general, a host will know which Tunnel values are valid, via a
configuration message. Barring this, it probably makes sense to have a
convention where, lacking better information, a host simply chooses
value 1. The routing algorithm could treat this value to mean "route to
closest exit point", so that a single exit point doesn't get overloaded with
default-tunneled packets.
In this example, host 2.2.1 arbitrarily picks a Tunnel value of 1. Upon
receiving this packet, router x indexes into TT by 1 (the Tunnel value),
and forwards the packet to router y with no changes in the RD. When y
receives the packet, it indexes 1 into its Tunnel Table TT. The resulting
entry indicates that the appropriate exit point has been reached (which is y
for Tunnel value 1), and that the level 3 (inter-domain) forwarding table
FT3 should be consulted. (Alternatively, router x could have written the
Tunnel Field to 0 upon transmission to y. In this case, y would go directly
to the RH).
For this, router y isolates the appropriate RHF in the RH, which is the 4th
RHF (destination network number), value 1. The first entry in FT3
reveals that the appropriate exit point is actually z. Therefore, y puts z's
tunnel value (2) in the Tunnel field and forwards the packet to z. Router y
also sends a "Tunnel Redirect" message to 2.2.1, indicating that for this
particular level 3 value (network number 1), the appropriate tunnel value
is 2. As a result, subsequent packets from 2.2.1 to 1.*.* (where "*" means
"anything") will go via z.
Discussion
The "Tunnel Redirect" described in Example 1.3b, combined with use of
the Tunnel Field, is what makes multiple defaults routing work. With
multiple defaults routing, the host's relationship with the exit border
routers is analogous to a host's relationship with its directly connected
(next-hop) routers. In the latter case, the connected router sends a
conventional redirect to the host to get it to use an alternate router attached
to the same network. In the former case, the Tunnel Redirect serves the
same purpose with respect to an alternate border router attached to the
same stub domain. This is a powerful technique useful for isolating the
internal stub routing from external routing.
A few more comments about router y's level 3 forwarding tables are called
for. Note first that if router y receives an RD with a tunnel of 2 (FT3a,
second entry), it will forward that packet onto z. This would be necessary,
for instance, if a host on subnet 2.4 tunneled a packet to z.
If a packet is tunneled to y destined for network 3, y would write the
tunnel to 0 (assuming that it didn't subsequently have to tunnel through
backbone 1), and forward the packet onto Backbone 1 (FT3b, third entry).
As with router x, router y should never receive an RD at level 3 with a
NULL tunnel (except from a mis-configured host). When router y
receives a packet from Backbone 1, the RD should indicate level 2, as y's
neighbor router in Backbone 1 would know to decrement the LR.level
(and increment the RHF Offset) before forwarding a packet to y.
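Router y's behavior in Example 1.3b and in the comments above can be
sketched as follows. The Tunnel Table and level 3 table for router y are not
reproduced in this text version, so the entries below are reconstructed from
the prose and are assumptions. The names and the message tuple format are
likewise invented for the illustration.

```python
# Router y's Tunnel Table: Tunnel value -> "exit-here" or next hop router.
TT_Y = {1: "exit-here", 2: "z"}

# Router y's level 3 table: destination network -> (next hop, Tunnel value
# written on transmission). Only networks named in the text are listed.
FT3_Y = {
    1: ("z", 2),           # best exit for Network 1 is z (Tunnel value 2)
    3: ("Backbone 1", 0),  # leave the stub; Tunnel is written to 0
}


def router_y(rd, src_host):
    """Return (next_hop, outgoing RD, messages to send)."""
    out = dict(rd)
    messages = []
    action = TT_Y[rd["tunnel"]]
    if action != "exit-here":
        # Packet is tunneled to another exit point; pass it along unchanged.
        return action, out, messages
    network = rd["rhfs"][rd["rhf_offset"] - 1]
    next_hop, new_tunnel = FT3_Y[network]
    out["tunnel"] = new_tunnel
    if new_tunnel not in (0, rd["tunnel"]):
        # A different exit point is better: tell the host for next time.
        messages.append(("Tunnel Redirect", src_host, network, new_tunnel))
    return next_hop, out, messages
```

For the RD of Example 1.3b (Tunnel = 1, destination network 1), the sketch
forwards to z with Tunnel = 2 and emits one Tunnel Redirect telling 2.2.1 to
use Tunnel value 2 for network 1.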
5.1.4 Example 1.4: Using tunneling for policy
This example shows how tunneling can be used as a limited policy
mechanism. Later examples will show how full policy information can be
encoded in the RD.
For this example, assume that x's and y's level 3 forwarding tables are as
shown in example 1.3, and that z's level 3 forwarding tables are structured
similarly to y's, except that z uses Backbone 2 to get to Network 1, uses y
to get to Network 3, and uses Backbone 2 to get to Network 4. Therefore,
there are two ways to get to Network 4, either via Backbone 1 (via y), or
via Backbone 2 (via z).
Assume that Host 2.2.1 has a packet to send to a host on Network 4. If the
host uses a tunnel value of 1, then the packet will travel via Backbone 1. If
the host uses a tunnel value of 2, then the packet will travel via Backbone
2. In this manner, the tunnel value acts as a policy mechanism.
Although it is not the best method for getting policy, note that, with the
topology of Figure 4, it could be possible for Host 2.2.1 to choose
between Backbone 1 and 2 even for sending packets to Networks 1 or 3.
This could be done, for instance, by modifying y's and z's routing tables
so that they didn't send tunnel redirects, but instead blindly forwarded the
packet onto their connected backbones. (This is assuming that Network 2
does not advertise itself as a transit network, and therefore packets would
not be routed back to 2, thus causing a loop.)
A variation on this would be to define a bit in the LR to mean "force
indicated tunnel", so that if this bit was off, the border routers (y or z)
would pick the best path, but if this bit were on, it would override the
router's better judgement and force the packet directly onto the backbone
as described in the last paragraph. As with all host-initiated policy
mechanisms, this requires that the host (or policy server) be
knowledgeable about the route it is choosing.
5.2 Example 2: Backbone-oriented Hierarchical RH Numbers
It is well-known that IP-style addresses do not scale well. NSAP
addresses (at least as defined by RFC 1237 [CGC]) scale better because
the addresses are rooted at the backbones.
Figure 6 shows an example topology and backbone-oriented RH Numbers
for use with this and subsequent examples. Each backbone has its own
number, which is advertised in routing updates to all other backbones.
(Hierarchically grouped backbones, for instance, where all backbones in a
country are given the same RH Number prefix, are possible, but are not
shown in Figure 6.) Note that stub network X has two levels of hierarchy
internally, while stub Y only has one.
One of the outstanding problems with the address assignment technique
of RFC 1237 is how to handle stub networks that are attached to more
than one backbone. One solution is to have multiple RH Numbers, one
per attached backbone. This type of solution can be used for Pip. For
instance, stub X (and its hosts) is shown to have two RH Number prefixes
(1.14 and 26.81), one reflecting its attachment to A and the other its
attachment to D. The negative aspects of the multiple addresses solution
are not as bad with Pip as with CLNP. Indeed, with Pip, hosts can be
completely isolated from inter-domain RH Numbering conventions.
One reason that multiple RH Number prefixes are easier with Pip is the
simple fact that "inter-domain" levels of the RH Number are not included
in intra-domain RDs. For instance, the RD for a packet from host w to
host y would be:
RD = < Tunnel = 0; LR.level = 2; RHF Offset = 3; RH = 9 (up) 27
(none) 12 (down) 58>.
Neither of the prefixes for stub domain X (1.14 or 26.81) is in the
packet. Internal communications are not affected by backbone RH
Numbering conventions. Hosts may (or may not) need to know their
backbone RH Numbers for inter-domain traffic, and so the functions for
reconfiguring these parts of all host RH Numbers may be required. This
would be done alongside other host configuration (such as how to use
tunnels, etc.), and is not particularly difficult.
Another reason why multiple RH Numbers are less of a problem with Pip is
that the transport protocol uses only the ID field for the purpose of
labeling connections. This means that the RH Number prefix (or any other
part of the RD) can change arbitrarily during a transport connection
without affecting the connection.
Appendix A shows the forwarding tables for various routers in Figure 6.
5.2.1 Example 2.1: Inter-domain communications without backbone
selection (with tunneling)
For these examples, host x wishes to send a packet to host z, and does not
care which backbone (A or D) is used, but would like the routers to
choose the best path. Assume that routing will find D as the best backbone
for reaching Y from X.
Example 2.1a: Complete host isolation from external RH Numbering
conventions.
This example describes a mode of operation where hosts (or internal
routers) do not need to know the "inter-domain" components of their RH
Numbers (although directory systems still must). This is the extreme case
of isolating internal network operation from external influences.
At a minimum, the host must initially know 1) that the stub-domain
border routers will handle the inter-domain RH Numbers, and 2) which
bit in the LR Field determines that so-called RH-Tunneling will be used
to find exit routers. The host must eventually know 1) how many levels of
inter-domain RH Number there are, and 2) the minimum RHF length for
these levels.
Initially, the host makes its best guess at the number of levels and the
minimum RHF length. For example, if host x thought that there was only
one level of RH Number above the stub domain, it might create the
following RD:
RD = <Tunnel = 0; LR.RH-Tunnel=1; RHF Offset = 3;
RH = 96 (up) 12 (up) 1 (none) 61 (down) 92 (down) 7>.
Note that the host is not using the Tunnel Field per se for this packet.
Instead, the use of an "RH-Tunnel" is encoded in the LR Field. The RH-
Tunnel number is placed in the third RHF. The entries in the RH-Tunnel
forwarding table contain routes to stub exit points. The purpose for using
this method of tunneling, which only works for stubs, not for backbones,
will become clear later in this example.
The RH-Tunnel value of 1 is just a guess on the part of the host. Since the
host has assumed only one level of hierarchy above its own RH Number,
it puts one RHF above its known RH Number (12.96). Since this field will
need to be written to its correct value by the border router, the RHF Offset
initially points to this field.
Through the tunneling mechanism similar to that already described, x will
eventually discover a tunnel that will get the packet to router b. Looking
at the forwarding tables for router b in Appendix A, we see that router b
would first access forwarding table FTt with index 1. This entry contains
a pointer rather than forwarding info (as can be seen by the fact that the
"next-hop" column is empty). Since the RHFR following the third RHF
is "none", the "none" column in the table is used. The exclamation point
("!") indicates that this is an error, and that an error message of some sort
should be sent. In this case, it is an "Incorrect RH" message indicating to
host x that it has not set the correct number of levels in the RH Number.
Upon receiving this message, the host would assume 2 levels of RH
Number above the stub domain, and create the following RD:
RD = <Tunnel = 0; LR.RH-Tunnel=1; RHF Offset = 3;
RH = 96 (up) 12 (up) 1 (up) 1 (none) 61 (down) 92
(down) 7>.
This RD shows 4 levels of source RH Number instead of 3. Both source
RH Number levels 3 and 4 are filled in with the RH-Tunnel value of 1.
When router b receives this packet, it goes through the same steps as
before up to the point where it accesses forwarding table FTt, index 1.
This time, it refers to the "up" column, writes the RHF to value 14, and
increments the RHF Offset (as indicated by the "+"). The question mark
("?") after the value 14 in the new-value field indicates that a check
should be made at this point for sending an error message. In this case, the
check is to make sure that the RHF Length is big enough to hold the new
value. If it weren't, an error message indicating the correct minimum
RHF Length for the inter-domain parts of the RH Number would be sent
to the host.
At this point, the RH is as follows:
RH = 96 (up) 12 (up) 14 (up) 1 (none) 61 (down) 92 (down) 7.
Next, router b goes to forwarding table FT4b index 1, writes the RHF to
value 1, checks again for correct RHF Length, increments the RHF
Offset, and goes to forwarding table FT4a, index 61. This entry indicates
router j (backbone D) as the next hop. The "?" here refers to a check to
see if an RH-Tunnel redirect should be sent. In this case the answer is yes,
because the RH-Tunnel value of 1 indicates backbone A. The RH-Tunnel
redirect would direct host x to subsequently use RH-Tunnel 2 to reach
level 4 RH Number "61".
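The steps router b takes in this example can be condensed into a short
sketch. The forwarding tables of Appendix A are collapsed here into two
small dictionaries whose contents are inferred from the prose. The prefix
for RH-Tunnel 2 in particular is an assumption based on stub X's second
prefix, 26.81. The RHF Length check marked by "?" is omitted, and all names
are invented for the illustration.

```python
# RH-Tunnel value -> (level 3 value, level 4 value) that router b writes.
PREFIX_FOR_RH_TUNNEL = {1: (14, 1), 2: (81, 26)}

# Level 4 destination -> (next hop, RH-Tunnel value matching that exit).
BACKBONE_ROUTE = {61: ("j", 2)}


def router_b(rd):
    """Fill in the inter-domain Source RHFs and pick the exit."""
    rhfs = list(rd["rhfs"])
    rhfrs = list(rd["rhfrs"])  # rhfrs[i] follows rhfs[i]
    i = rd["rhf_offset"] - 1   # 0-based position of the RH-Tunnel RHF
    # Exactly two inter-domain levels are expected: (up) then (none).
    if rhfrs[i:i + 2] != ["up", "none"]:
        return {"error": "Incorrect RH"}
    rh_tunnel = rhfs[i]
    rhfs[i], rhfs[i + 1] = PREFIX_FOR_RH_TUNNEL[rh_tunnel]
    next_hop, best_tunnel = BACKBONE_ROUTE[rhfs[i + 2]]
    redirect = best_tunnel if best_tunnel != rh_tunnel else None
    return {
        "next_hop": next_hop,
        "rh_tunnel_redirect": redirect,
        "rd": {
            "tunnel": 0,
            "lr_level": 4,
            "rhf_offset": i + 3,  # now points at the level 4 Destination RHF
            "rhfs": rhfs,
            "rhfrs": rhfrs,
        },
    }
```

Fed the host's first guess (one level above the stub), the sketch returns
the "Incorrect RH" error. Fed the second RD, it rewrites the RH to
96, 12, 14, 1, 61, 92, 7, selects router j, and reports a redirect to
RH-Tunnel 2.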
When router b forwards the packet to router j, the RD is as follows:
RD = <Tunnel = 0; LR.level = 4; RHF Offset = 5;
RH = 96 (up) 12 (up) 14 (up) 1 (none) 61 (down) 92
(down) 7>.
The RH-Tunnel bit is not relevant to router j, and its semantics no longer
exist in the LR Field. The LR.level has been set at 4, as indicated in the
"none" column for FT4a, index 61.
The source RH Number has been filled in by router b. It is as though host
x knew its full source RH Number. Note that, in a sense, the wrong source
RH Number has been formed. This is because a return packet based on
this source RH Number will come back via backbone A instead of
backbone D (asymmetric paths). The source RH Number is composed
according to the RH-Tunnel value, not according to the actual exit point.
Because of the redirect to host x, however, subsequent packets will go via
RH-Tunnel=2, and therefore the "correct" source RH Number of
Example 2.1b: Partial host isolation from external RH Numbering
conventions
Any number of variations on this theme are possible. For instance, hosts
could normally not know the inter-domain RH Numbers, but learn them
on an as-needed basis.
In this mode of operation, a host could create an RD with RH-Tunnels, as
in the previous example, but intentionally incorrectly compose the RD,
for instance, by putting no levels above the intra-domain RH Numbers.
The error message sent by the border router could include the proper
inter-domain RH numbers. In subsequent packets, the host would
compose correct RDs, with LR.level = 4 and RHF Offset pointing to the
highest-level destination RH Number. This saves the border router from
having to work through two extra levels of hierarchy.
The learned inter-domain RH numbers would be used only for the
appropriate destination(s), and would be flushed periodically.
Or, the host could operate as in example 2.1a, but when it receives a
return packet from the destination host, it can learn the appropriate inter-
domain RH Numbers from the Destination RHFs of the received packet.
If the host later receives a tunnel redirect (implying that a different
outgoing backbone was being used), the host could again write the
inter-domain RH Numbers to zero, thus learning the new RH Number in
subsequent return packets.
Once the host learns and uses the correct inter-domain RH Numbers, it
may use the Tunnel Field to exit the stub domain.
Example 2.1c: No host isolation from external RH Numbering
conventions
This example is quite similar to the previous two examples, except that
the function of the border router filling in the proper inter-domain RH
Numbers is not used. Instead, hosts are configured with <Tunnel value;
matching inter-domain RH Number> tuples, one for each exit backbone.
All hosts in stub X would have two tuples: <Tunnel=1; level 4=1, level
3=14> and <Tunnel=2; level 4=26, level 3=81>. A Tunnel value of 1,
then, represents exit points that reach backbone A (1), and a Tunnel value
of 2 represents exit points that reach backbone D (26). Note that these
tunnel values are not pointing to exit routers per se; they are pointing to
exit backbones. Therefore, a Tunnel value of either 1 or 2 could cause a
packet to go to router b, since it is connected to both backbone A and
backbone D.
Since x wants routing to pick the appropriate exit backbone, it creates the
following RD:
RD = <Tunnel = 1; LR.level = 4; RHF Offset = 5;
RH = 96 (up) 12 (up) 14 (up) 1 (none) 61 (down) 92
(down) 7>.
The thing about this RD that will force routing to choose the best path
(according to routers) is that the RHF Offset points to the destination
backbone (61), and doesn't pre-suppose what exit point to use.
Presumably, routing will have an opinion about what is the best way to
get to backbone 61, and will do the right thing. The Tunnel value (1) was
picked arbitrarily.
Looking at the forwarding tables for b (in Appendix A), we see that b will
access forwarding table TT (because the Tunnel is non-zero), and index 1.
Router b would write the Tunnel value to 0, index the LR Table at
LR.level=4, and go to table FT4a, index 61. (This is deduced from the
tables shown in Appendix A because the other level 4 forwarding table,
FT4b, is only reached via FT3.)
At this point, the behavior is similar to that of example 2.1a, where a
tunnel redirect is sent.
As a result of the tunnel redirect, host x subsequently composes the
following RD:
RD = <Tunnel = 2; LR.level = 4; RHF Offset = 5;
RH = 96 (up) 12 (up) 81 (up) 26 (none) 61 (down) 92
(down) 7>.
Discussion
Note that both modes of operation (hosts that do not know the inter-
domain RH Numbers and hosts that do) can operate in the same domain
using the forwarding tables shown for router b.
Note that the "dumb host" mode of operation (that of Example 2.1a) can
work because the ID function has been partitioned from the "routing"
function. This allows routers to change aspects of the routing information
while still allowing hosts to recognize the source and destination of
packets.
I have mixed feelings about the "dumb host" mode of operation. On one
hand, the notion of not having to administer inter-domain RH Numbers in
machines other than border routers and directory service is appealing. On
the other hand, it seems to me that, given the right protocols, it should be
easy to manage inter-domain RH Numbers in all hosts and routers. For
instance, OSI is in the process of defining a means whereby all hosts in an
area can be informed of new NSAP prefixes. This technique is tied to
current ISIS and ESIS functions, and is actually quite simple.
5.2.2 Example 2.2: Inter-domain communications with backbone
selection, with tunneling.
For these examples, the source host wishes to manipulate the exit
backbone chosen, rather than let the routers choose. Note that this use
assumes that the host (or user) has the knowledge necessary to choose a
backbone that makes sense. For instance, it might be silly for a host to
choose backbone A over backbone D, when backbone A forwards the
packet onto backbone D anyway.
Example 2.2a: Punching holes, different hierarchy depths, and
symmetric paths
As with the previous examples (2.1), host x wishes to send a packet to
host z. But, host x wants the packet to go through and return via backbone
A. We assume that the hosts in X have the same information as with
example 2.1c, that is, that they know which inter-domain RH Numbers
are associated with which Tunnel values.
Host x creates the following RD:
RD = <Tunnel = 1; LR.level = 4; RHF Offset = 4;
RH = 96 (up) 12 (up) 14 (up) 1 (none) 61 (down) 92
(down) 7>.
The difference between this and the previous example is that the RHF
Offset is set to 4 instead of 5, and is therefore pointing to the highest
Source RH Number (1) instead of the highest Destination RH Number
(61). As a result, when router b receives this packet, it replicates the
actions of example 2.1c, except that it indexes FT4a with value 1 instead
of value 61. This retrieves a next-hop of g, which matches the implied
Tunnel value, and so no tunnel redirect is necessary.
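The RDs of Examples 2.1c and 2.2a differ only in where the RHF Offset
points. The sketch below builds both from the configured tuples. It is an
illustration under assumptions: the function and parameter names are
invented, and LR.level is simply taken to be the number of Source RHFs,
which matches the RDs shown but is not stated as a rule in the text.

```python
# <Tunnel value; matching inter-domain RH Number> tuples for stub X.
EXIT_TUPLES = {
    1: {"level4": 1, "level3": 14},   # backbone A
    2: {"level4": 26, "level3": 81},  # backbone D
}


def compose_interdomain_rd(tunnel, intra_src, dst, force_exit_backbone=False):
    """intra_src and dst are written highest level first, e.g. (12, 96)."""
    prefix = EXIT_TUPLES[tunnel]
    # Source RHFs run lowest level first, ending with the backbone number.
    src_rhfs = list(reversed(intra_src)) + [prefix["level3"], prefix["level4"]]
    n_src = len(src_rhfs)
    rhfrs = ["up"] * (n_src - 1) + ["none"] + ["down"] * (len(dst) - 1)
    # Pointing at the highest Source RHF pins the exit backbone; pointing
    # at the highest Destination RHF lets routing choose.
    offset = n_src if force_exit_backbone else n_src + 1
    return {
        "tunnel": tunnel,
        "lr_level": n_src,
        "rhf_offset": offset,
        "rhfs": src_rhfs + list(dst),
        "rhfrs": rhfrs,
    }
```

Calling compose_interdomain_rd(1, (12, 96), (61, 92, 7)) gives the RD of
Example 2.1c with RHF Offset = 5. Adding force_exit_backbone=True gives the
RD of Example 2.2a with RHF Offset = 4.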
The RHFR in this case is "none", and so router b keeps the LR.level at 4.
Since the next hop is in backbone A, router b increments the RHF Offset.
Note that if router b had not incremented the RHF Offset, router g would
have taken the extra step of determining that the RHF (1) indicated itself
and incrementing the RHF Offset itself. Router b forwards the packet to
backbone A (router g) rather than backbone D, as it otherwise would
have.
At this point, it is instructive to follow the packet through the internet to
the destination. Router g receives the following RD:
RD = <Tunnel = 0; LR.level = 4; RHF Offset = 5;
RH = 96 (up) 12 (up) 14 (up) 1 (none) 61 (down) 92
(down) 7>.
Router g accesses its forwarding table FT4, index 61, and routes the packet
to router j. (Here we see that, from a purely topological perspective
anyway, host x's choice of A as its backbone does nothing more than
incur 2 extra hops.) The RD received by router j is the same as that shown
above (router g does not change the semantics of the RD, although it
could have modified the bit positions in the LR or HD, if backbone D
interprets the bits differently than backbone A).
Note that the appropriate exit point from backbone D to backbone J is router
l. Within backbone D, however, two ways are shown to get from j to l. By
inspecting the forwarding tables for router j, we see that a "QOS" metric
determines which way is taken. This metric would be encoded in the LR
(along with level), and is used to choose the "Logical Router" (that is, the
appropriate forwarding table) for the metric type. Note that this metric
example only influences the path inside a backbone. A metric could just
as well influence the path of backbones.
For this example, assume that the "QOS" metric bit is 0, and so
forwarding table FT4a is used, indexed by 61. This returns a next hop of l,
and the RD is not modified. Note that if there are routers between routers j
and l, router j would have to tunnel to reach router l.
When router l receives the packet, it indexes 61 into table FT4a. Instead
of retrieving an entry indicating that the packet should be routed to
backbone J, router l is instructed to look into a level 3 table (FT3b). This is
surprising, as the destination stub is not under backbone D. The reason for
it in this case is that 1) there are two ways to enter backbone J, and 2)
router l would like to pick the most appropriate entry point into backbone
J for the given stub. This is analogous to the "east coast/west coast"
problem found sometimes in the USA, where a neighbor backbone can be
entered on either coast, and more detailed information about the location
of the destination is desired to know which entry point to take.
Router l increments the RHF Offset, and indexes 92 into forwarding table
FT3b. This entry indicates that router p is the best next hop into backbone
J. Note that router l has two level 3 forwarding tables, FT3a and FT3b. It
is necessary to separate the forwarding tables for the level 3 destinations
within backbone D from those in backbone J. And indeed, it would be
necessary to have a separate level 3 table for every level 4 entity whose
level 3 details were known. This is in order to distinguish between
identical level 3 values in the different level 4 areas.
This form of gathering detailed information about the internal structure of
other domains is sometimes called "hole punching", and is a feature of the
IDRP routing protocol.
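One way to picture router l's separate level 3 tables is as a dictionary
keyed by the level 4 value. The sketch is illustrative only. The FT3b entry
for 92 comes from the text, while the FT3a entry is invented purely to show
two areas reusing the same level 3 value. The names are assumptions.

```python
# FT4a at router l: level 4 value -> ("hop", router) or ("table", area).
FT4A_L = {61: ("table", 61)}

# One level 3 table per level 4 area whose internals are known.
LEVEL3_TABLES = {
    26: {92: "hypothetical-internal-router"},  # FT3a: backbone D (made up)
    61: {92: "p"},                             # FT3b: backbone J
}


def router_l(rd):
    """Return (next_hop, new RHF Offset, new LR.level)."""
    offset = rd["rhf_offset"]
    level4 = rd["rhfs"][offset - 1]
    kind, target = FT4A_L[level4]
    if kind == "hop":
        return target, offset, rd["lr_level"]
    # Hole punched: descend to the level 3 table for that level 4 area.
    offset += 1
    level3 = rd["rhfs"][offset - 1]
    return LEVEL3_TABLES[target][level3], offset, 3
```

For the RD router l receives in this example (LR.level = 4, RHF Offset = 5),
the sketch returns next hop p with RHF Offset = 6 and LR.level = 3.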
Router p receives a packet with the following RD:
RD = <Tunnel = 0; LR.level = 3; RHF Offset = 6;
RH = 96 (up) 12 (up) 14 (up) 1 (none) 61 (down) 92
(down) 7>.
Router p indexes 92 into forwarding table FT3. This entry returns a next
hop of q. Note that the LR.level of the RD transmitted by router p is set to
1, even though it came in as level 3 and even though the RHF Offset was
incremented only once. This is necessary because stub domain Y only has
one level of hierarchy, and therefore views the "top" of the hierarchy as
level 3 rather than level 4. A host in a stub domain will view the top level
of the RH Number hierarchy as being the number of levels in its RH
Number. This is true whether or not the destination host has the same
number of levels.
A router can view the top level of the hierarchy as being any level equal to
or greater than the number of levels it is aware of. As such, router g, for
instance, could view the top level as level 2. The stub domains would then
be level 1. As long as one router translates the level into the proper value
for the next router, the level value can be chosen somewhat arbitrarily.
To continue the example, router q receives the following RD:
RD = <Tunnel = 0; LR.level = 1; RHF Offset = 7;
RH = 96 (up) 12 (up) 14 (up) 1 (none) 61 (down) 92
(down) 7>.
Router q transmits the packet to r, which transmits it to z (the forwarding
tables for q and r are not shown).
To form a return packet, host z reverses the order of RHFs, resulting in the
following RD:
RD = <Tunnel = 1; LR.level = 3; RHF Offset = 4;
RH = 7 (up) 92 (up) 61 (none) 1 (down) 14 (down) 12
(down) 96>.
The destination RHF pointed to by the RHF Offset (1) signifies backbone
A. This means that the reverse path will be symmetric with the forward
path (at least at the level of domains).
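The reversal performed by host z can be sketched as follows. The function
name and the list representation are assumptions made for illustration;
the flipping of "up" and "down" relators is read off the RDs shown in the
examples rather than stated as a rule in the text. The Tunnel, LR.level,
and RHF Offset fields are chosen by the host and are not modeled.

```python
FLIP = {"up": "down", "down": "up", "none": "none"}


def reverse_rh(rhfs, relators):
    """Reverse the RHF order and flip each up/down relator.

    rhfs has n entries; relators has n - 1 entries, where relators[i]
    sits between rhfs[i] and rhfs[i + 1].
    """
    return rhfs[::-1], [FLIP[r] for r in reversed(relators)]


fwd_rhfs = [96, 12, 14, 1, 61, 92, 7]
fwd_rels = ["up", "up", "up", "none", "down", "down"]
ret_rhfs, ret_rels = reverse_rh(fwd_rhfs, fwd_rels)
print(ret_rhfs)
print(ret_rels)
```

Applied to the forward RH of this example, the sketch yields
7 (up) 92 (up) 61 (none) 1 (down) 14 (down) 12 (down) 96, which matches
the return RD above.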
5.2.3 Example 2.3: General Policy Routing
The previous example showed a small level of policy routing, in that the
source host was able to choose the exit backbone. Recent work [BE, LS]
indicates that policy routing in general can best be achieved with domain-
level source routing. In this example, we show how this can be encoded
with Pip.
For general policy routing, but still with hierarchical RH Numbers, the
RD is of the form shown in Figure 7.
In between the source and destination RHFs are the intermediate RHFs.
These designate the backbones on the path from source to destination.
Example 2.3a: Choosing the inter-domain path
For this example, assume that host x not only wants the packet to go via
backbone A, but to traverse backbones B and C as well. To do this, host x
forms the following RD:
RD = <Tunnel = 1; LR.level = 4; RHF Offset = 4;
RH = 96 (up) 12 (up) 14 (up) 1 (none) 14 (none) 9 (none)
61 (down) 92 (down) 7>.
The packet would reach router g similarly to example 2.2a. Router g
receives the following RD:
RD = <Tunnel = 1; LR.level = 4; RHF Offset = 5;
RH = 96 (up) 12 (up) 14 (up) 1 (none) 14 (none) 9 (none)
61 (down) 92 (down) 7>.
Instead of pointing to the destination backbone (61), the RD points to
backbone B (14). Therefore, router g forwards the packet to router f,
which forwards it to router h. When router h receives the packet, it would
point to backbone C (9), and so on. The domain path taken by the packet
would be X-A-B-C-J-Y.
When host z receives this packet, it knows by inspecting the RD that 14
and 9 are intermediate backbones (because of the "none" RHFRs), and that,
strictly speaking, they are not necessary for returning the packet to x.
If z wants the return path to be symmetric with the forward path, it can
form an RD by reversing the RHFs. However, if z doesn't care about the
return path, or wishes a different return path, it can remove the
intermediate RHFs (14 and 9), and potentially add some of its own.
Example 2.3b: Choosing the intra-domain path
In this example, host w is sending a packet to host z. Host w doesn't care
about the inter-domain path, but wishes the intra-domain path to transit
areas 19 and 12 before exiting the domain. To do this, host w forms the
following RD:
RD = <Tunnel = 0; LR.level = 2; RHF Offset = 3;
RH = 9 (up) 27 (none) 19 (none) 12 (up) 14 (up) 1 (none)
61 (down) 92 (down) 7>.
When router c receives this RD, it will index 19 into its forwarding table
FT2 (not shown, but analogous to router b's FT2), and route the packet to
router a, which will forward it to b (based on an index of 12 into its FT2
forwarding table, also not shown). When router b receives the packet, it
will have the following RD:
RD = <Tunnel = 0; LR.level = 3; RHF Offset = 5;
RH = 9 (up) 27 (none) 19 (none) 12 (up) 14 (up) 1 (none)
61 (down) 92 (down) 7>.
Router b will index 14 into its forwarding table FT3, which indicates that
it should go to level 4 and route on the next RHF. Note that this is an
example where, to save memory, the table could be implemented as a
single "wildcard" entry rather than a full table to be indexed into.
When host z receives this packet, it can again either leave in the
intermediate RHFs or take them out. In this case, the intermediate RHFs
are interspersed between source RHFs, but this can still be detected by
inspection of the RHFRs. Assuming that host z leaves the intermediate
RHFs in, it would form the following RD:
RD = <Tunnel = 1; LR.level = 3; RHF Offset = 4;
RH = 7 (up) 92 (up) 61 (none) 1 (down) 14 (down) 12
(none) 19 (none) 27 (down) 9>.
When router g receives this packet from backbone D, it forwards the
packet to router b. Router b receives the following RD:
RD = <Tunnel = 0; LR.level = 2; RHF Offset = 6;
RH = 7 (up) 92 (up) 61 (none) 1 (down) 14 (down) 12
(none) 19 (none) 27 (down) 9>.
Since the RHFR after the 6th RHF (12) is "none", router b goes to the
"none" column of index 12 in table FT2, increments the RHF Offset, and
indexes again into FT2, but this time 19. As a result, the return packet
takes the reverse path of the forward packet.
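The relator-dependent lookup at router b can be sketched as below. This is
an illustration under stated assumptions: the tuple encoding of table
actions is invented here, the next hop "a" for index 19 is a guess based
on the forward path (c to a to b), and whether the offset is incremented
again on transmission is not stated in the text, so the sketch leaves it
alone.

```python
# Router b's FT2, reduced to the two rows the text mentions. Each row maps
# the relator FOLLOWING the indexed RHF to an action:
#   ("table", name, bump)  consult another table (bump = increment offset)
#   ("hop", router, bump)  transmit to that router
TABLES_B = {
    "FT2": {
        12: {"none": ("table", "FT2", True)},  # from the text
        19: {"none": ("hop", "a", False)},     # next hop "a" is assumed
    },
}


def lookup(tables, table, rhfs, relators, offset):
    """Follow table references until a next hop is found.

    offset is 1-based, matching the RDs in the examples. The final RHF
    has no following relator; that case is not handled in this sketch.
    Returns (next_hop, offset).
    """
    while True:
        rhf = rhfs[offset - 1]
        relator = relators[offset - 1]  # relator after this RHF
        kind, target, bump = tables[table][rhf][relator]
        if bump:
            offset += 1
        if kind == "hop":
            return target, offset
        table = target


rhfs = [7, 92, 61, 1, 14, 12, 19, 27, 9]
rels = ["up", "up", "none", "down", "down", "none", "none", "down"]
print(lookup(TABLES_B, "FT2", rhfs, rels, 6))
```

Starting at offset 6 (RHF 12), the "none" column sends the lookup back
into FT2 with the offset advanced to 7 (RHF 19), mirroring the two
indexing steps described above.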
Note that in general for this to work, since backbone A has two ways to
reach stub X, backbone A should have hole punching information about
stub X. For instance, if backbone A transmits the packet to stub X via
router e, then router c would forward the packet to area 12, which would
then return the packet to router c via area 19. The packet would not loop
more than this once, but nonetheless it is clearly a non-optimal path.
This is a natural consequence of doing policy routing without specifying
the path adequately, and is not a bug in Pip per se. (Alternatively, to
eliminate the need for hole punching information in A's routers, X could
have two level 3 RH numbers under backbone 1. One number would
indicate entry via e, and the other entry via g. Each could serve as an
alternate route to the other in case node or link crashes make the
primary route impossible.)
5.2.4 Comments on Header Size
Even with some policy in the RD, Pip headers are still relatively small
compared to CLNP. For instance, assume that there are no more
than 1000 top level backbones, and that any hierarchy element has no
more than 1000 sub-elements. In this case, the largest RHF is 10 bits.
Therefore, the RHFs of Example 2.2 require only 82 bits, or 3 words
when padded out to 32-bit words. Including two 6-octet IDs, we get 6
words total (note that not all packets must include the IDs). This
compares favorably with CLNP addresses, which require 10 words
(two 5-word addresses). The RHFs of Example 2.3, which have a decent
amount of policy information in them, require only 106 bits, or 4 words
when padded out (7 words when IDs are considered).
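The arithmetic behind these figures can be reproduced with the sketch
below. It assumes a 2-bit RHFR between each pair of adjacent RHFs; that
width is not stated in this section and is chosen here because it
reproduces the 82-bit and 106-bit totals for the seven-RHF and nine-RHF
RDs of the examples. The function names are illustrative only.

```python
WORD_BITS = 32
ID_BITS = 6 * 8   # one 6-octet ID
RHFR_BITS = 2     # assumption: up/down/none encoded in 2 bits


def rh_bits(num_rhfs, rhf_bits=10):
    """Bits for num_rhfs RHFs plus the relators between them."""
    return num_rhfs * rhf_bits + (num_rhfs - 1) * RHFR_BITS


def words(bits):
    """Round a bit count up to whole 32-bit words."""
    return (bits + WORD_BITS - 1) // WORD_BITS


for num_rhfs in (7, 9):
    bits = rh_bits(num_rhfs)
    total = words(bits) + words(2 * ID_BITS)
    print(num_rhfs, "RHFs:", bits, "bits,", words(bits), "words,",
          total, "words with IDs")
```

Under that assumption, seven RHFs come to 82 bits (3 words, 6 with IDs)
and nine RHFs come to 106 bits (4 words, 7 with IDs).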
5.3 Example 3: Node-level Source Routing
Example 2.3 showed how Pip can do domain-level (or area-level) source
routing for policy routing. Other literature [Per2, Che, CG] suggests that
node-level source routing has advantages. In the case of Perlman, source
routing is used to make a network more robust. In the case of Cheriton
(Sirpent) and Cidon (Paris), it is to speed up the forwarding process.
Perlman encodes node identifiers in the source route, Sirpent encodes
outgoing link identifiers, and Paris encodes self-routing switch codes.
Consider a case where a stub domain wishes to use Perlman's Byzantine
routing for internal communications, and to use normal hierarchical RH
Numbering for external communications. For external communications,
the RH numbering of Figure 6 would be used.
For internal communications, a separate RH numbering scheme is used.
In this scheme, each router is given an identifier, counting up from 1. For
instance, if a network had 500 routers, they would be numbered 1 through
500, and the RHF for each router would be 9 bits long. Each host would
have a number assigned by its connected router. Therefore, even if each
router had 500 hosts (for a total of 250,000 hosts), each RHF would still
be only 9 bits.
The RD would be composed as follows:
A separate LR value would be used to distinguish RH numbers in this
local scheme from hierarchical global RH numbers. Assuming 9 bits per
RHF, Pip can encode a source route of 18 hops plus two 6-octet IDs in the
same space required for two NSAP addresses.
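A rough check of these sizes is sketched below. It assumes a 2-bit RHFR
between adjacent RHFs and 20-octet NSAPs (the 5-word addresses of section
5.2.4), and it ignores the other RD fields, as the header-size discussion
does. The function names are illustrative only.

```python
NSAP_BITS = 20 * 8   # one 5-word NSAP address
ID_BITS = 6 * 8      # one 6-octet ID
RHFR_BITS = 2        # assumption: up/down/none encoded in 2 bits


def rhf_width(max_value):
    """Bits needed to hold identifiers numbered 1 through max_value."""
    return max_value.bit_length()


def max_rhfs(budget_bits, rhf_bits):
    """Largest n with n * rhf_bits + (n - 1) * RHFR_BITS <= budget_bits."""
    return (budget_bits + RHFR_BITS) // (rhf_bits + RHFR_BITS)


width = rhf_width(500)
budget = 2 * NSAP_BITS - 2 * ID_BITS
print(width, budget, max_rhfs(budget, width))
```

With these assumptions, 500 routers need 9-bit RHFs, and the 224 bits left
after the two IDs hold 20 RHFs. That would be consistent with the 18-hop
figure if two of the RHFs identify the end hosts, but this reading is a
guess, since the figure showing the RD layout is absent from this text
version.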
5.4 Routing on a path identifier (or VCI number)
There are various advantages to setting up a dynamic path identifier rather
than sending full RH Numbering information in each packet. Because
part of the forwarding function is to modify the RHF, the RD can be used
as a path or virtual circuit identifier. It can also be used as a hierarchical
path identifier as with ATM cells.
It might be possible to use an option field in Pip to convey the information
necessary to set up a path.
5.5 Multicast Routing
Pip provides enormous potential for increasing the sophistication and
efficiency of multicast routing.
For instance, Pip can encode hierarchical multicast routing, in which
one level of the RH indicates a multicast at the backbone level,
while the next level down indicates multicast within a stub. This could be
used to allow a backbone to view the various stub locations
of an international corporation as the group members of a single multicast
tree (a single upper level multicast RHF), while in fact the corporation
has many multicast groups (multiple lower level multicast RHFs).
Since different applications require different multicast trees (for instance,
applications that don't require smallest possible delays could get away
with a single multicast tree instead of multiple source-rooted multicast
trees), multiple multicast algorithms could run in parallel, with bits in the
LR Field distinguishing between them.
6.0 Transition from IP
This section outlines an approach for transitioning from IP to Pip.
I presume that the target architecture for Pip is backbone-oriented
hierarchical RH Numbers such as shown in Example 2.2. This RH
Number structure is essentially the same as what is proposed in RFC 1237
[CGC]. I don't see any reason to use geographically-oriented RH
Numbers, such as proposed by Deering [Ref?], given that 1) the
inter-domain part of RH Numbers can be hidden from stubs, and 2) with
Pip, it is straightforward to take advantage of backbone-oriented RH
Numbers for policy routing. Nonetheless, geographically-oriented RH
Numbers can be used with Pip, and so the issue remains open to debate.
Because the RH Numbers are semantically equivalent to RFC 1237
NSAPs, it should be possible to use the "CNAT" transition plan being
developed by Callon almost as is. The main difference is that Pip will be
used instead of CLNP, and RH Numbers will be used instead of NSAPs.
The transition, then, goes roughly as follows:
1. Start running Pip in the backbones.
a. Modify BGP to carry RH Numbers. Once BGP has been modified
for general masks as currently planned (BGP4), it will be relatively
easy to add RH Numbers, as BGP4 will already have hole
punching capability.
b. An RH Number Authority (perhaps the same authority that assigns
IP addresses, or perhaps the Internet Society) will assign
RH Numbers to backbones. On one hand, this will result in fewer
assignments than are currently done by the IP numbering authority,
but on the other hand each assignment will require some
screening to ensure that the recipient is a valid backbone.
2. Simultaneous with 1, populate border routers with mappings between
IP network number and corresponding RH Numbers (i.e., IP net number
<=> RH backbone.stub). This is to allow for translation between
IP packets and Pip packets at the borders of stubs. These mappings
can be distributed using a new BGP attribute.
3. Simultaneous with 1, modify the DNS root servers to issue RH Numbers
in addition to IP numbers.
4. One-by-one, modify intra-domain routing to use Pip. Because Pip can
use either the subnet/host model of IP or the area/host model of
CLNP, and because inter-domain routing information need not be
seen within stub domains, both IP and CLNP routing protocols can be
modified to carry Pip RH Numbers.
5. Simultaneous with 4, modify the stub DNS servers to issue RH Numbers
in addition to IP numbers.
6. One-by-one, modify hosts to run Pip.
a. At the same time, higher layer protocols such as FTP or TCP that
encode IP addresses should be modified to either not require
internet-layer identifiers, or to handle multiple types, including Pip
IDs. The TCP pseudo-header checksum could be made to include
the whole Pip ID.
b. While any host in a stub is an IP-only host, all Pip hosts should be
able to run IP, in order to talk to that host without translation, and
intra-domain routing must be able to handle IP or Pip.
c. Once a stub domain becomes pure Pip (no IP boxes), that stub
domain should never have to translate Pip packets into IP packets.
The burden of all translations should fall on the stub that still
runs IP.
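The border-router mapping of step 2 can be pictured with the sketch below.
It is a hypothetical illustration only: the IP prefixes are documentation
addresses invented for the example, the (backbone, stub) pairs reuse
numbers from Example 2.2, and nothing here reflects an actual BGP
attribute format.

```python
import ipaddress

# Hypothetical mapping: IP network number <=> RH backbone.stub.
NET_TO_RH = {
    ipaddress.ip_network("192.0.2.0/24"): (1, 14),      # stub under backbone 1
    ipaddress.ip_network("198.51.100.0/24"): (61, 92),  # stub under backbone 61
}
RH_TO_NET = {rh: net for net, rh in NET_TO_RH.items()}


def rh_for_ip(ip):
    """Return the (backbone, stub) RH prefix for an IP address, or None."""
    addr = ipaddress.ip_address(ip)
    for net, rh in NET_TO_RH.items():
        if addr in net:
            return rh
    return None


def net_for_rh(backbone, stub):
    """Return the IP network for an RH backbone.stub pair, or None."""
    return RH_TO_NET.get((backbone, stub))


print(rh_for_ip("192.0.2.7"))
print(net_for_rh(61, 92))
```

A border router translating an IP packet into a Pip packet would consult
the first direction, and the reverse direction when translating back.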
7.0 Further Work
Obviously there is a great deal of work to be done: a detailed Pip
specification; specification of modifications required to existing protocols,
particularly routing but also DNS; development of a transition plan;
specification of configuration protocols; establishment of a Pip addressing
authority; and experimentation, among others.
While I don't expect anybody to buy completely into Pip based on this
paper alone, I hope that this paper convinces most that Pip is an
alternative worth expending considerable resources on.
REFERENCES
[BE] Breslau, L. and Estrin, D., "Design of Inter-Administrative
Domain Routing Protocols", Proceedings of ACM
SIGCOMM `90, Philadelphia PA, September 1990
[Che] Cheriton, D.R., "Sirpent: A High-Performance
Internetworking Approach", Proceedings of ACM
SIGCOMM `89, Austin Texas, September 1989
[CG] Cidon, I., and Gopal, I., "Control Mechanisms for High-
Speed Networks", Proceedings of IEEE International
Conference on Communications `90, Atlanta Georgia,
April 1990
[Chi] Chiappa, J.N., "A New IP Routing and Addressing
Architecture", IETF Internet Draft, draft-chiappa-routing-
00.txt, available by anonymous FTP at nnsc.nsf.net.
[CGC] Colella, R., Gardner, E.P., Callon, R.W., "Guidelines for
OSI NSAP allocation in the internet", RFC-1237, USC/
Information Sciences Institute, July 1991.
[LS] Lepp, M., Steenstrup, M., "An Architecture for Inter-
domain Policy Routing", IETF Internet Draft, draft-
chiappa-routing-00.txt, available by anonymous FTP at
nnsc.nsf.net.
[OSI2] International Organization for Standardization ISO8473,
"Protocol for providing the Connectionless-mode
Network Service"
[OSI3] International Organization for Standardization ISO10589,
"Intermediate System to Intermediate System Intra-
Domain routeing exchange protocol for use in
Conjunction with the Protocol for providing the
Connectionless-mode Network Service (ISO 8473)"
[Per1] Perlman, R., "Incorporation of Service Classes into a
Network Architecture", Proceedings of the Seventh Data
Communications Symposium ACM SIGCOMM, Vol. 11,
No. 4, October 1981, pp. 204-210.
[Per2] Perlman, R., "Byzantine Routing", PhD Thesis,
Department of Computer Science, MIT, 19??.
[Tsu] Tsuchiya, P.F., "Scaling and Policy Routing using
Multiple Hierarchical Addresses," Proceedings of
SIGCOMM `91, Zurich, September 1991.
Appendix A: Forwarding Tables for Routers of
Each table shown is a Forwarding Table or Tunnel Table. The first line
gives the table label, followed by the criteria (LR.level, Tunnel, or
previous forwarding table) under which the table is accessed. No LR
Tables are shown, because the LR Table can be deduced from the criteria
that each forwarding table is labeled with.
Within the body of each table, the first column is the index into the table.
This index is either derived from the Tunnel or an RHF, depending on
which applies for the given table. There are skips in the index values. The
intervening index values are not shown when the corresponding network
components are not shown in Figure 6. Normally, the forwarding tables are
well-packed, and all index values are represented.
The action taken after any table access is to either route to the next-hop
router, in which case the second column (next-hop) will have an entry, or
to access another table, in which case one or more of the three "next-level
or next-table" columns will have an entry. The next table chosen depends
on the meaning of the RHF Relator after the RHF field. The last column
(new-value) is the value written into either the Tunnel or the RHF field,
depending on which applies, upon transmission of the packet. In practice,
both the Tunnel value and RHF may be modified, but for these examples,
it is always only one or the other.
A plus (+) after any entry in these four columns means that the RHF
Offset should be incremented (either before transmitting the packet or
before accessing the next table). A blank entry simply means that the
circumstances under which the entry has been reached should not occur.
This may or may not result in an error message. An exclamation point "!"
after any entry means that the entry might validly be reached, but that an
error message should be sent. A "?" after any entry means that additional
checks will be made to determine if an error message is necessary (the
text will explain these as they are encountered). An entry of RH (in a
tunnel forwarding table) means to evaluate the RH from scratch.