Network Working Group                                    F. Templin, Ed.
Internet-Draft                                      Boeing Phantom Works
Intended status: Informational                         February 14, 2008
Expires: August 17, 2008


             Subnetwork Encapsulation and Adaptation Layer
                       draft-templin-seal-03.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 17, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   Subnetworks are connected network regions bounded by border routers.
   These routers forward unicast and multicast packets over virtual
   links that are tunneled above another forwarding layer.  These
   virtual links span multiple IP- and/or sub-IP layer forwarding hops
   which may cross links with diverse Maximum Transmission Units (MTUs)
   and introduce packet duplication.  This document specifies a
   Subnetwork Encapsulation and Adaptation Layer (SEAL) that
   accommodates diverse underlying link technologies.



Templin                  Expires August 17, 2008                [Page 1]


Internet-Draft                    SEAL                     February 2008


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Terminology and Requirements . . . . . . . . . . . . . . . . .  4
   3.  Applicability Statement  . . . . . . . . . . . . . . . . . . .  5
   4.  SEAL Protocol Specification  . . . . . . . . . . . . . . . . .  6
     4.1.  Model of Operation . . . . . . . . . . . . . . . . . . . .  6
     4.2.  Packetization  . . . . . . . . . . . . . . . . . . . . . .  8
       4.2.1.  Packet Size Considerations . . . . . . . . . . . . . .  8
       4.2.2.  Inner IPv4 Fragmentation . . . . . . . . . . . . . . .  9
       4.2.3.  SEAL Segmentation and Encapsulation  . . . . . . . . .  9
       4.2.4.  Sending Packets  . . . . . . . . . . . . . . . . . . . 11
     4.3.  Reassembly . . . . . . . . . . . . . . . . . . . . . . . . 12
       4.3.1.  Reassembly Buffer Requirements . . . . . . . . . . . . 12
       4.3.2.  IPv4-Layer Reassembly  . . . . . . . . . . . . . . . . 12
       4.3.3.  SEAL-Layer Reassembly  . . . . . . . . . . . . . . . . 13
       4.3.4.  Reassembly Integrity Checks  . . . . . . . . . . . . . 13
     4.4.  Generating Fragmentation Reports . . . . . . . . . . . . . 13
     4.5.  Receiving Fragmentation Reports  . . . . . . . . . . . . . 14
     4.6.  S-MSS Probing and Setting DF . . . . . . . . . . . . . . . 15
     4.7.  Processing ICMP PTBs . . . . . . . . . . . . . . . . . . . 16
   5.  Link Requirements  . . . . . . . . . . . . . . . . . . . . . . 16
   6.  End System Requirements  . . . . . . . . . . . . . . . . . . . 16
   7.  Router Requirements  . . . . . . . . . . . . . . . . . . . . . 17
   8.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 17
   9.  Security Considerations  . . . . . . . . . . . . . . . . . . . 17
   10. Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 17
   11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18
     11.1. Normative References . . . . . . . . . . . . . . . . . . . 18
     11.2. Informative References . . . . . . . . . . . . . . . . . . 18
   Appendix A.  Historic Evolution of PMTUD (written 10/30/2002)  . . 20
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 21
   Intellectual Property and Copyright Statements . . . . . . . . . . 22


1.  Introduction

   For the purpose of this document, subnetworks are defined as
   connected network regions bounded by border routers.  Examples
   include the global Internet interdomain routing core, Mobile Ad Hoc
   Networks (MANETs) and enterprise networks.  These subnetworks are
   manifested as virtual links that may span many underlying networks
   and traditional IP subnets, e.g., in the internal organization of an
   enterprise network.

   Subnetwork border routers forward unicast and multicast packets over
   virtual links that span multiple IP- and/or sub-IP layer forwarding
   hops, which may traverse links with diverse Maximum Transmission
   Units (MTUs) and may also introduce packet duplication due to
   temporal or persistent routing loops.  It is also expected that these
   routers will support operation of the Internet protocols
   [RFC0791][RFC2460].

   As Internet technology and communication have grown and matured, many
   techniques have developed that use virtual topologies (frequently
   tunnels of one form or another) over an actual IP network.  Those
   virtual topologies have elements that appear as one hop in the
   virtual topology, but are actually multiple IP or sub-IP layer hops.
   These multiple hops often have quite diverse properties that are not
   even visible to the endpoints of the virtual hop.  This introduces
   many failure modes that current approaches do not handle well.

   The use of IP encapsulation has long been considered as an
   alternative for creating such virtual topologies.  However, the
   insertion of an outer IP header reduces the effective path MTU as-
   seen by the IP layer.  When IPv4 is used, this reduced MTU can be
   accommodated through the use of IPv4 fragmentation, but unmitigated
   in-the-network fragmentation has been shown to be harmful through
   operational experience and studies conducted over the course of many
   years [FRAG][FOLK][RFC2923][RFC4459][RFC4963].

   This document proposes a Subnetwork Encapsulation and Adaptation
   Layer (SEAL) for the operation of IP over subnetworks that connect
   routers with Ingress- and Egress Tunnel Endpoints (ITEs/ETEs).  SEAL
   supports simple and robust duplicate packet detection, and
   accommodates links with diverse MTUs.  SEAL introduces a new
   encapsulation format that differs from existing encapsulations
   primarily in that it enables a mid-layer segmentation and reassembly
   capability that is distinct from IP fragmentation.  This mid-layer
   segmentation allows an in-the-network cutting and pasting of packets
   that does not violate the IPv6 restriction of no in-the-network
   fragmentation, and also avoids the harmful effects of in-the-network
   IPv4 fragmentation.  The SEAL protocol is specified in the following
   sections.


2.  Terminology and Requirements

   The term subnetwork in this document refers to a connected network
   region bounded by border routers that connect over a virtual link
   manifested through tunneling that appears as a fully-connected shared
   link, or a "virtual ethernet".  SEAL is the Subnetwork Encapsulation
   and Adaptation Layer that adapts this virtual ethernet to the
   underlying heterogeneous networking links and equipment.

   The terms "inner" and "outer" are used extensively throughout this
   document to respectively refer to the innermost {layer, protocol,
   header, packet, etc.} *before* any encapsulation, and the outermost
   {layer, protocol, header, packet, etc.} *after* any encapsulation.
   Between these inner and outer layers, there may also be mid-layer
   encapsulations, including the SEAL encapsulation.  These mid-layer
   encapsulations are denoted as '*' (where '*' may signify NULL, a
   single mid-layer encapsulation, or multiple mid-layer
   encapsulations.)

   The notation IPvX/*/IPvY refers to an inner IPvX packet encapsulated
   in an outer IPvY header separated by any '*' mid-layer headers.  The
   notation "IP" means either IP protocol version (IPv4 or IPv6).

   The following abbreviations correspond to terms used within this
   document and elsewhere in common Internetworking nomenclature:

      Subnetwork - a connected network region that is bounded by border
      routers

      SEAL - Subnetwork Encapsulation and Adaptation Layer

      MANET - Mobile Ad-hoc Network

      VET - Virtual EThernet

      ITE - Ingress Tunnel Endpoint

      ETE - Egress Tunnel Endpoint

      MTU - Maximum Transmission Unit

      S-MSS - SEAL Maximum Segment Size

      EMTU_R - Effective MTU to Receive

      PTB - an ICMPv6 "Packet Too Big" or an ICMPv4 "fragmentation
      needed" message

      DF - the IPv4 header Don't Fragment flag

      ENCAPS - the size of the outer encapsulating SEAL/*/IPv4 headers

      FRAGREP - a Fragmentation Report message

      SEAL packet - a segment of an inner packet encapsulated in outer
      SEAL/*/IPv4 headers

      SEAL ID - a 32-bit Identification value that is randomly
      initialized and monotonically incremented for each SEAL packet
      sent to an ETE

      Unfragmentable - an IPv4 packet with DF=1, or an IPv6 packet

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [RFC2119].


3.  Applicability Statement

   SEAL inserts an additional mid-layer encapsulation when IP/*/IPv4
   encapsulation is used, and appears as a subnetwork encapsulation as
   seen by inner layers.

   While the SEAL approach was motivated by the specific use case of
   duplicate packet detection in MANETs, the domain of applicability is
   not limited to the MANET problem space and extends to other
   subnetwork uses such as tunneling across enterprise networks, the
   interdomain routing core, etc.

   SEAL can be used as a mid-layer encapsulation above an outer UDP/IPv4
   encapsulation; however, the technique of concatenating the 16-bit
   SEAL ID Extension and the IPv4 ID (i.e., co-mingling the two
   identifier spaces) will not work when there are network address
   translators (NATs) in the path that may rewrite the IPv4 ID, e.g.,
   in the Teredo domain of applicability [RFC4380].  Moreover, it may not
   be possible to expect non-initial IPv4 fragments to pass through NATs
   and firewalls in all cases.  A variation of this proposal that
   maintains separate ID spaces for the SEAL ID and IPv4 ID and that
   operates in the presence of NATs and firewalls will be specified in a
   future version of this document.

   The current document version speaks exclusively to the use case of
   encapsulation over IPv4 as the outer layer; however, the same
   principles apply when IPv6 is the outer layer.  Because
   in-the-network fragmentation is not permitted for encapsulations over
   IPv6, the "implicit" probing capabilities specified for IPv4 in this
   document are not available.  Still, encapsulations over IPv6 can use
   "explicit" probing as well as the same architectural concepts as
   specified for IPv4 herein.  A future version of this document will
   address the case of IPv6 as the outer encapsulation layer in more
   detail.

   For further study, SEAL may also be useful for "transport-mode"
   applications, e.g., when the inner packet encapsulates ordinary
   protocol data rather than an IP packet.


4.  SEAL Protocol Specification

4.1.  Model of Operation

   Ingress Tunnel Endpoints (ITEs) insert a SEAL header in the IP/*/
   IPv4-encapsulated packets they inject into a subnetwork, where the
   outermost IPv4 header contains the source and destination addresses
   of the ITE/ETE subnetwork entry/exit points, respectively.  SEAL
   defines a new IP protocol type and a new mid-layer encapsulation for
   both unicast and multicast inner packets.  The ITE inserts a SEAL
   header during encapsulation as shown in Figure 1:

                                      +-------------------------+
                                      |                         |
                                      ~   Outer */IPv4 headers  ~
                                      |                         |
                                      +-------------------------+
                                      +--     SEAL Header     --+
   +-------------------------+        +-------------------------+
   |                         |        |                         |
   ~ Any mid-layer * headers ~        ~ Any mid-layer * headers ~
   |                         |        |                         |
   +-------------------------+        +-------------------------+
   |                         |        |                         |
   ~        Inner IP         ~  --->  ~        Inner IP         ~
   ~         Packet          ~  --->  ~         Packet          ~
   |                         |        |                         |
   +-------------------------+        +-------------------------+
   |  Any mid-layer trailers |        |  Any mid-layer trailers |
   +-------------------------+        +-------------------------+
                                      |    Any outer trailers   |
                                      +-------------------------+

                       Figure 1: SEAL Encapsulation

   where the SEAL header is inserted as follows:

   o  For simple IP/IPv4 encapsulations (e.g.,
      [RFC2003][RFC2004][RFC4213]), the SEAL header is inserted between
      the inner IP and outer IPv4 headers as: IP/SEAL/IPv4.

   o  For tunnel-mode IPsec/ESP encapsulations over IPv4,
      [RFC4301][RFC4303], the SEAL header is inserted between the ESP
      and outer IPv4 headers as: IP/*/ESP/SEAL/IPv4.

   o  For IP encapsulations over transports such as UDP (e.g.,
      [I-D.farinacci-lisp]), the SEAL header is inserted immediately
      after the outer transport layer header, e.g., as IP/*/SEAL/UDP/
      IPv4.

   Encapsulation and tunneling establishes a virtual point-to-multipoint
   interface abstraction of the subnetwork.  From a logical viewpoint,
   this interface appears as a Virtual EThernet (VET)
   [I-D.templin-autoconf-dhcp] that connects the ITE to all ETEs in the
   subnetwork as single-hop neighbors.  From a physical perspective,
   however, packets sent over the VET interface may be forwarded across
   many IP and/or sub-IP layer subnetwork hops.

   SEAL-encapsulated packets include a 32-bit SEAL-ID formed from the
   concatenation of the 16-bit ID Extension field in the SEAL header as
   the most-significant bits and the 16-bit ID value in the outer
   IPv4 header as the least-significant bits.  Routers use the SEAL-ID
   for both duplicate packet detection within the subnetwork and also
   for multi-level segmentation and reassembly of large packets.

   SEAL enables a multi-level segmentation and reassembly capability.
   First, the ITE can use inner IPv4 fragmentation for fragmentable
   inner IPv4 packets before encapsulation to avoid lower-level
   segmentation and reassembly.  Secondly, the SEAL layer itself
   provides a simple mid-layer cutting-and-pasting of inner packets
   without incurring IPv4 fragmentation on the outer packet.  Finally,
   ordinary IPv4 fragmentation for the outer IPv4 packet after SEAL
   encapsulation is permitted under certain limited and carefully
   managed circumstances.

4.2.  Packetization

4.2.1.  Packet Size Considerations

   Due to the ubiquitous deployment of standard Ethernet and similar
   networking gear, the nominal Internet cell size has become 1500
   bytes; this is the de facto size that end systems have come to expect
   will be delivered by the network without loss due to an MTU
   restriction on the path, or a suitable ICMP PTB message returned.
   However, PTB messages are not delivered reliably, and any PTBs
   received could be erroneous or maliciously fabricated.  (Indeed, in
   the case of treating the global Internet interdomain routing core as
   a subnetwork, the PTB messages could come from anywhere in the
   Internet.)  The ITE therefore requires a means for conveying 1500
   byte (or smaller) original packets over the VET interface without
   loss due to link MTU restrictions and/or triggering PTB messages from
   within the subnetwork.

   In common deployments, there may be many forwarding hops between the
   source and the ITE.  Within those hops, there may be additional
   encapsulations (IPsec, L2TP, etc.) such that a 1500 byte original
   packet might grow to a larger size by the time it reaches the ITE.
   Similarly, additional encapsulations on the path from the ITE to the
   ETE could cause the packet to become larger still.  In order to
   preserve the end system expectation of delivery for 1500 byte and
   smaller packets, the ITE therefore requires a means for conveying
   this larger packet over the VET interface even though there may be
   subnetwork links that configure a smaller MTU.

   The ITE upholds the 1500-byte-and-smaller packet delivery expectation
   by instituting a SEAL Maximum Segment Size (S-MSS) variable
   (suggested default 1KB) and configurable within the range of [128 -
   2KB].  The ITE also institutes a segmentation region for packet sizes
   [S-MSS - 2KB] such that all inner packets within this size range are
   segmented into multiple SEAL packets while avoiding in-the-network
   IPv4 fragmentation.

   The ITE additionally admits all inner packets larger than 2KB into
   the VET interface as single-segment SEAL packets under the assumption
   that original sources that send packets larger than 1500 bytes are
   using an end-to-end MTU determination capability such as specified in
   [RFC4821].
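
   As an illustration only, the three size regions described above
   could be expressed as the following Python sketch.  The function
   name, the return labels, the reading of "2KB" as 2048 bytes, and the
   treatment of a packet of exactly S-MSS bytes as a single segment are
   assumptions made for this example and are not part of the
   specification.

```python
# Assumption: "2KB" is read as 2048 bytes and "1KB" as 1024 bytes.
SEGMENTATION_LIMIT = 2048
S_MSS_MIN = 128
S_MSS_DEFAULT = 1024


def classify_inner_packet(inner_len, s_mss=S_MSS_DEFAULT):
    """Return which size region an inner packet of inner_len bytes falls in."""
    if not S_MSS_MIN <= s_mss <= SEGMENTATION_LIMIT:
        raise ValueError("S-MSS must lie in the range [128 - 2KB]")
    if inner_len <= s_mss:
        # Fits in one segment without any segmentation.
        return "single-segment"
    if inner_len <= SEGMENTATION_LIMIT:
        # Segmentation region [S-MSS - 2KB]: split into multiple SEAL packets.
        return "multi-segment"
    # Larger than 2KB: admitted as one segment; the source is assumed to
    # use an end-to-end MTU determination capability.
    return "single-segment-large"
```

   With the suggested default S-MSS, a 1500 byte inner packet falls in
   the segmentation region, while a 9000 byte inner packet is admitted
   as a single segment.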

4.2.2.  Inner IPv4 Fragmentation

   The IPv4 layer of a subnetwork border router that configures an ITE
   fragments inner IPv4 packets larger than 2KB and with the IPv4 Don't
   Fragment (DF) bit set to 0 into IPv4 fragments no larger than the
   minimum of 2KB and S-MSS.  The IP layer then submits each inner IPv4
   fragment to the ITE as an independent IP packet for encapsulation.
   Note that inner fragmentation may not be available for certain ITE
   types, e.g., for tunnel-mode IPsec.  Any inner IPv4 fragments created
   in this fashion will be reassembled by the final destination.

   Inner IPv4 fragmentation is not performed for inner IPv4 packets
   larger than 2KB and with the DF bit set to 1.  Instead, these packets
   are encapsulated by the ITE and sent as single segments as discussed
   in the following section.

4.2.3.  SEAL Segmentation and Encapsulation

   After any inner IPv4 fragmentation, the ITE encapsulates IPv4
   packets/fragments no larger than 2KB in any mid-layer '*' headers,
   then performs SEAL segmentation on this inner packet based on a
   segment size that is likely to avoid IPv4 fragmentation within the
   subnetwork.  The ITE maintains S-MSS for each ETE, e.g., as per-ETE
   IPv4 destination cache soft state, including IPv4 multicast
   destinations.  S-MSS SHOULD be initialized to 1KB by default, and MAY
   be changed to different values in the [128 - 2KB] range based on
   static configuration and/or dynamic segment size probing.

   Note that this SEAL segmentation ignores the DF bit in the inner IPv4
   header or (in the case of IPv6) ignores the fact that the network is
   not permitted to perform IPv6 fragmentation, but this segmentation
   process is a mid-layer (not an IP layer) operation and is necessary
   to adapt the inner packet to the subnetwork path characteristics.
   Moreover, the inner packet will be restored to its original form when
   it is removed from the subnetwork by the ETE, therefore, the fact
   that the packet may have been segmented within the subnetwork is not
   observable outside of the subnetwork.

   The ITE MUST NOT break unfragmentable inner packets larger than 2KB
   into smaller segments, but rather MUST encapsulate them as a single
   segment SEAL packet.  The ITE breaks inner packets no larger than 2KB
   into N segments (N <= 16) that are no larger than S-MSS bytes each,
   i.e., even if the inner packet is unfragmentable.  Each segment
   except the final one MUST be of equal length, while the final segment
   MAY be of different length.  The first byte of each segment MUST
   begin immediately after the final byte of the previous segment, i.e.,
   the segments MUST NOT overlap.
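
   The segmentation rules above can be sketched in Python as follows.
   This is a minimal example under stated assumptions: the function
   name is hypothetical, "2KB" is read as 2048 bytes, and every
   non-final segment is made exactly S-MSS bytes long (the text only
   requires that non-final segments be of equal length and no larger
   than S-MSS).

```python
from typing import List


def segment_inner_packet(inner: bytes, s_mss: int) -> List[bytes]:
    """Split an inner packet of at most 2048 bytes into contiguous segments."""
    if not 128 <= s_mss <= 2048:
        raise ValueError("S-MSS must lie in the range [128 - 2KB]")
    if not 0 < len(inner) <= 2048:
        raise ValueError("only inner packets of 1..2048 bytes are segmented")
    # Consecutive, non-overlapping slices; only the last one may be shorter.
    segments = [inner[i:i + s_mss] for i in range(0, len(inner), s_mss)]
    # 2048 / 128 = 16, so the 4-bit Segment field is always sufficient.
    assert len(segments) <= 16
    return segments
```

   For example, a 1500 byte inner packet with S-MSS set to 1024 yields
   one 1024 byte segment followed by one 476 byte segment.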

   The ITE encapsulates each segment in a SEAL header formatted
   according to the following figure:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          ID Extension         |R|M|CTL|Segment|  Next Header  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 2: Minimal SEAL Header Format

   where the header fields are defined as follows:

   ID Extension (16)
      a 16-bit extension of the 16-bit ID field in the outer IPv4
      header; encodes the most-significant 16 bits of a 32 bit SEAL-ID
      value.

   R (1)
      Reserved.

   M (1)
      the "More Segments" bit.  Set to 1 if this SEAL packet contains a
      non-final segment of a multi-segment inner packet.

   CTL (2)
      a 2-bit "Control" field that identifies the type of SEAL packet as
      follows:

      '00' - a Fragmentation Report (FRAGREP).

      '01' - a non-probe SEAL packet.

      '10' - an implicit probe.

      '11' - an explicit probe.

   Segment (4)
      a 4-bit Segment number.  Encodes a segment number between 0 - 15.

   Next Header (8)
      an 8-bit field that encodes an IP protocol number the same as for
      the IPv4 protocol and IPv6 next header fields.
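
   To make the bit layout of Figure 2 concrete, the following Python
   sketch packs and unpacks the 32-bit minimal SEAL header.  The
   function and constant names are hypothetical.  The sketch assumes
   network byte order with bit 0 of the figure as the most significant
   bit, and assumes the Reserved (R) bit is sent as zero; the text above
   states only that the bit is reserved.

```python
import struct

# CTL code points as listed in the field definitions.
CTL_FRAGREP = 0b00
CTL_NON_PROBE = 0b01
CTL_IMPLICIT_PROBE = 0b10
CTL_EXPLICIT_PROBE = 0b11


def pack_seal_header(id_ext, more, ctl, segment, next_header):
    """Build the 4-byte minimal SEAL header (R bit left as 0)."""
    if not 0 <= id_ext <= 0xFFFF:
        raise ValueError("ID Extension must fit in 16 bits")
    if not 0 <= ctl <= 0b11:
        raise ValueError("CTL must fit in 2 bits")
    if not 0 <= segment <= 15:
        raise ValueError("Segment must be between 0 and 15")
    if not 0 <= next_header <= 0xFF:
        raise ValueError("Next Header must fit in 8 bits")
    word = ((id_ext << 16) | (int(bool(more)) << 14) | (ctl << 12)
            | (segment << 8) | next_header)
    return struct.pack("!I", word)


def unpack_seal_header(data):
    """Return (id_ext, more, ctl, segment, next_header) from 4 header bytes."""
    (word,) = struct.unpack("!I", data[:4])
    return (word >> 16, (word >> 14) & 1, (word >> 12) & 0b11,
            (word >> 8) & 0xF, word & 0xFF)
```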

   For N-segment inner packets (N <= 16), the ITE selects a SEAL header
   format (minimal or extended) and encapsulates each segment in a
   header of the same format with (M=1; Segment=0) for the first
   segment, (M=1; Segment=1) for the second segment, etc., with the
   final segment setting (M=0; Segment=N-1).  Note that single-segment
   inner packets instead set (M=0; Segment=0).
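
   The (M; Segment) assignment described above can be sketched as a
   small Python helper.  The function name is hypothetical and the
   example covers only the two fields discussed in this paragraph.

```python
def segment_flags(n):
    """Return the (M, Segment) pair for each of the N segments, in order."""
    if not 1 <= n <= 16:
        raise ValueError("N must be between 1 and 16")
    # M=1 on every segment except the final one, which carries M=0.
    return [(1 if i < n - 1 else 0, i) for i in range(n)]
```

   A three-segment inner packet therefore carries (1; 0), (1; 1) and
   (0; 2), and a single-segment inner packet carries (0; 0).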

   During encapsulation, the ITE also sets CTL='01' in the SEAL header
   of each segment if this segment is not to be used as a probe.
   Otherwise, the ITE sets CTL='10' or '11' according to the type of
   probe (see: Section 4.6).

   The ITE next writes either the IP protocol number corresponding to
   the inner packet (minimal format) or the value zero (extended format)
   in 'Next Header A' in the SEAL header of each segment.  When extended
   format is used, the ITE also writes a 20-bit flow label value
   corresponding to the inner packet into the Flow Label field and
   writes the IP protocol number corresponding to the inner packet in
   'Next Header B'.  The ITE then encapsulates the segment in the
   requisite */IPv4 outer headers.

   The ITE maintains a 32-bit SEAL-ID value as per-ETE soft state, e.g.
   in the IPv4 destination cache.  The value is randomly-initialized
   when the soft state is created and monotonically incremented (modulo
   2^32) for each successive SEAL packet sent to the ETE.  For each SEAL
   packet, the ITE writes the least-significant 16 bits of the SEAL-ID
   value in the ID field in the outer IPv4 header, and writes the most-
   significant 16 bits in the ID Extension field in the SEAL header.
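
   A minimal Python sketch of this per-ETE soft state follows.  The
   class and method names are hypothetical, and the sketch assumes the
   current value is used for the packet being sent and then incremented;
   the text above does not say whether the increment occurs before or
   after use.

```python
import secrets


class SealIdState:
    """Per-ETE 32-bit SEAL-ID counter."""

    def __init__(self, initial=None):
        # Randomly initialized unless a value is supplied (e.g., for testing).
        if initial is None:
            initial = secrets.randbits(32)
        self.value = initial & 0xFFFFFFFF

    def next_fields(self):
        """Return (id_extension, ipv4_id) for the next SEAL packet."""
        current = self.value
        self.value = (self.value + 1) & 0xFFFFFFFF  # increment modulo 2^32
        # High 16 bits go in the SEAL header, low 16 bits in the IPv4 ID.
        return current >> 16, current & 0xFFFF
```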

   The ITE finally sets other fields in the outer */IPv4 headers
   according to the specific encapsulation format (e.g., [RFC2003],
   [RFC4213], etc.).

4.2.4.  Sending Packets

   For inner packets larger than 2KB, the ITE determines whether the
   size of the packet plus the size of the SEAL/*/IPv4 encapsulation
   headers is larger than the MTU of the underlying interface over which
   the tunnel is configured.  If the packet is too large, the ITE
   discards it and sends an ICMP PTB message back to the original source
   with an MTU value taken from the underlying interface minus the size
   of the encapsulating headers.  Otherwise, the ITE sets the Don't
   Fragment (DF) bit in the outer IPv4 header to DF=1.
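
   The decision for inner packets larger than 2KB can be sketched as
   follows.  The function name and return labels are hypothetical, and
   the caller is assumed to supply the SEAL/*/IPv4 header size and the
   MTU of the underlying interface.

```python
def admit_large_packet(inner_len, encaps_len, link_mtu):
    """Decide how the ITE handles an inner packet larger than 2KB.

    Returns ("send_ptb", mtu_to_report) if the encapsulated packet would
    exceed the underlying interface MTU, else ("send_df1", None).
    """
    if inner_len + encaps_len > link_mtu:
        # Discard and report the MTU that the original source may use.
        return ("send_ptb", link_mtu - encaps_len)
    # Fits: send as a single-segment SEAL packet with DF=1.
    return ("send_df1", None)
```

   For instance, with a hypothetical 24 bytes of encapsulation headers
   and a 1500 byte underlying MTU, a 4000 byte inner packet is
   discarded and a PTB reporting an MTU of 1476 is returned.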

   For inner packets that were no larger than 2KB before segmentation,
   the ITE sets DF=0 in the outer IPv4 header of SEAL packets to be used
   as implicit/explicit probes (as specified in Section 4.6) and MAY set
   DF=1 in the outer IPv4 header of other SEAL packets once the path has
   been probed.  After setting DF, the ITE SHOULD send all SEAL packets
   that encapsulate segments of the same inner packet into the VET
   interface in canonical order, i.e., Segment 0 first, then Segment 1,
   etc.

4.3.  Reassembly

4.3.1.  Reassembly Buffer Requirements

   ETEs MUST be capable of using IPv4-layer reassembly to reassemble
   SEAL packets of at least (2KB+ENCAPS) bytes, i.e., ETEs MUST
   configure an IPv4 Effective MTU to Receive (EMTU_R) of at least (2KB+
   ENCAPS).

   ETEs MUST also be capable of using SEAL-layer reassembly to
   reassemble inner packets of at least 2KB, i.e., ETEs MUST configure a
   SEAL EMTU_R of at least 2KB.

4.3.2.  IPv4-Layer Reassembly

   The ETE performs IPv4 reassembly as-normal, and maintains a
   conservative high- and low-water mark for the number of outstanding
   reassemblies pending for each ITE as is common for widely deployed
   implementations.  When the size of the reassembly buffer exceeds this
   high-water mark, the ETE actively discards incomplete reassemblies
   (e.g., using an Active Queue Management (AQM) strategy such as drop-
   eldest, Random Early Drop (RED), etc.) until the size falls below the
   low-water mark.
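
   One possible shape for this high- and low-water mark behavior is
   sketched below in Python.  The class name, the choice to count
   pending reassemblies rather than bytes, and the use of drop-eldest
   are assumptions for illustration; an implementation may use RED or
   another AQM strategy instead.

```python
from collections import OrderedDict


class ReassemblyCache:
    """Pending IPv4 reassemblies for one ITE, kept oldest first."""

    def __init__(self, high_water, low_water):
        if not 1 <= low_water < high_water:
            raise ValueError("need 1 <= low_water < high_water")
        self.high_water = high_water
        self.low_water = low_water
        self.pending = OrderedDict()  # reassembly key -> list of fragments

    def add_fragment(self, key, fragment):
        """Store a fragment, then trim the cache if it is congested."""
        self.pending.setdefault(key, []).append(fragment)
        if len(self.pending) > self.high_water:
            # Drop-eldest until the count falls below the low-water mark.
            while len(self.pending) >= self.low_water:
                self.pending.popitem(last=False)
```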

   After reassembly, the ETE either accepts or discards the reassembled
   SEAL packet based on the current status of the IPv4 reassembly cache
   (congested vs uncongested).  The choice of accepting/discarding a
   reassembly may also depend on the strength of the upper-layer
   integrity check if known (e.g., IPsec/ESP provides a strong upper-
   layer integrity check) and/or the volatility of the data (e.g.,
   multicast streaming audio/video).

   In the limiting case, the ETE may choose to discard all reassembled
   SEAL packets after sending Fragmentation Reports (see: Section 4.4).

4.3.3.  SEAL-Layer Reassembly

   After any IPv4-layer reassembly, the ETE performs SEAL-layer
   reassembly for N-segment inner packets through simple in-order
   concatenation of the encapsulated segments from N consecutive SEAL
   packets.  These packets contain Segment numbers 0 through N-1 and
   carry consecutive SEAL-ID values encoded in the 32-bit concatenation
   of the ID Extension field in the SEAL header and the ID field in the
   IPv4 header.  That is, for an N-segment inner packet, inner packet
   reassembly entails the concatenation of the segments from SEAL
   packets with (Segment 0, SEAL-ID i), followed by (Segment 1, SEAL-ID
   ((i + 1) mod 2^32)), etc. up to (Segment N-1, SEAL-ID ((i + N-1) mod
   2^32)).  This requires the ETE to maintain a cache of recently
   received SEAL packets for a hold time that would allow for reasonable
   inter-segment delays.
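
   The in-order concatenation described above can be sketched in
   Python as follows.  The function name and the shape of the cache (a
   dictionary keyed by 32-bit SEAL-ID whose values are (M, Segment,
   payload) tuples) are assumptions for this example; hold-time and
   discard policy are not shown.

```python
def try_reassemble(cache, first_id):
    """Return the reassembled inner packet starting at first_id, or None.

    cache maps a 32-bit SEAL-ID to a tuple (more, segment, payload).
    """
    parts = []
    for seg_no in range(16):  # the 4-bit Segment field allows at most 16
        entry = cache.get((first_id + seg_no) % 2**32)
        if entry is None:
            return None  # a segment has not arrived yet
        more, segment, payload = entry
        if segment != seg_no:
            return None  # SEAL-IDs and Segment numbers must advance together
        parts.append(payload)
        if not more:
            return b"".join(parts)  # M=0 marks the final segment
    return None
```

   Note that the SEAL-ID arithmetic is modulo 2^32, so a reassembly may
   span the wrap from 0xFFFFFFFF to 0.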

   Rather than set an absolute hold time, the ETE must actively discard
   any pending reassemblies that appear to have no opportunity for
   completion, e.g., when a considerable number of SEAL packets have
   been received before a packet that completes the pending reassembly
   has arrived.  This assumes that any packet reordering within the
   subnetwork will be on the order of a small number of positions and
   that any gross reordering will be short-lived in nature.

4.3.4.  Reassembly Integrity Checks

   TBD - a future version of this draft may specify an integrity check
   vector, inserted by the ITE during encapsulation and used by the ETE
   to detect packet splicing errors during IPv4 reassembly.  Such an
   integrity check capability is specified in [I-D.templin-inetmtu], and
   could lead to increased packet delivery ratios if used by SEAL.

4.4.  Generating Fragmentation Reports

   When the ETE has received at least the leading 128 bytes (or up to
   the end) of a SEAL packet that was delivered as multiple IPv4
   fragments and with CTL='1X' in the SEAL header (i.e., an implicit or
   explicit probe), it generates a Fragmentation Report (FRAGREP)
   message to send back over the VET interface to the ITE.  The ETE
   also generates a FRAGREP for any SEAL packet with CTL='11' even if
   the packet was not fragmented.

   When the IPv4 reassembly cache is congested, convergence time may be
   improved if the ETE generates the FRAGREP even before the entire SEAL
   packet has been reassembled, since congestion-related loss may cause
   some fragments to be lost.  The 128 byte FRAGREP size was chosen 1)
   to ensure that enough header bytes are included in order to provide
   sufficient information to the ITE, and 2) since RFC1812-compliant
   routers that fragment are permitted to create the smallest fragment
   as the initial fragment but should minimize the number of fragments.
   Thus, by the time the ETE has reassembled at least 128 bytes, it is
   likely to have received a fragment large enough for the ITE to
   determine a reasonable S-MSS estimate.

   The ETE prepares the FRAGREP message by encapsulating the leading 128
   bytes of the fragmented SEAL packet in an outer SEAL/*/IPv4 header.
   The ETE sets the IPv4 length field in the encapsulated packet/
   fragment to the length of the largest IPv4 fragment received, i.e.,
   even if the largest fragment received was not the first fragment.

   The ETE next sets CTL='00' and Segment=0 in the SEAL header, and sets
   the fields of the outer */IPv4 headers according to the specific
   encapsulation type.  In particular, the ETE sets the destination
   address of the FRAGREP to the source address that was included in the
   IPv4 first fragment, and sets the source address of the FRAGREP to
   the destination address that was included in the IPv4 first fragment.
   If the destination address in the first fragment was multicast, the
   ETE instead sets the source address of the FRAGREP to an address
   assigned to the underlying IPv4 interface.

   The FRAGREP message has the following format:

   +-------------------------+
   |                         |
   ~   Outer */IPv4 headers  ~
   |                         |
   +-------------------------+
   |       SEAL Header       |
   |  (CTL='00', Segment=0)  |
   +-------------------------+
   |                         |
   ~ Up to 128 bytes of pkt, ~
   ~  with IPv4 len set to   ~
   | len of largest fragment |
   |                         |
   +-------------------------+

             Figure 3: Fragmentation Report (FRAGREP) Message
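
   As a non-normative illustration, the FRAGREP preparation steps above
   can be sketched in Python.  The function names, the 20-byte minimum
   header check, and the byte-string interface are assumptions of this
   sketch and not part of the specification.  The SEAL header layout
   and the outer */IPv4 encapsulation are left out, and the sketch does
   not recompute the IPv4 header checksum of the encapsulated packet,
   since this section does not say whether the ETE does so.

```python
import ipaddress
import struct

FRAGREP_BODY_MAX = 128   # leading bytes of the fragmented SEAL packet
IPV4_MIN_HEADER = 20     # assumption: sanity bound used by this sketch only


def fragrep_body(leading: bytes, largest_fragment_len: int) -> bytes:
    """Return up to 128 leading bytes with the IPv4 length field rewritten.

    `leading` is assumed to start with the IPv4 header of the received
    SEAL packet; the 16-bit Total Length field sits at byte offset 2.
    """
    if len(leading) < IPV4_MIN_HEADER:
        raise ValueError("need at least a full IPv4 header")
    if not 0 <= largest_fragment_len <= 0xFFFF:
        raise ValueError("IPv4 length must fit in 16 bits")
    body = bytearray(leading[:FRAGREP_BODY_MAX])
    struct.pack_into("!H", body, 2, largest_fragment_len)
    return bytes(body)


def fragrep_addresses(frag_src: str, frag_dst: str, local_addr: str):
    """Return (source, destination) for the FRAGREP's outer IPv4 header."""
    if ipaddress.IPv4Address(frag_dst).is_multicast:
        source = local_addr   # multicast destination: use a local address
    else:
        source = frag_dst     # unicast: swap the first fragment's addresses
    return source, frag_src
```

   For instance, fragrep_addresses("192.0.2.1", "224.0.0.5",
   "198.51.100.7") yields a FRAGREP source of 198.51.100.7 and a
   destination of 192.0.2.1 (documentation addresses chosen only for
   illustration).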

4.5.  Receiving Fragmentation Reports

   FRAGREP messages are carried in SEAL packets that set (CTL='00';
   Segment=0) in their SEAL headers.  When the ITE receives a potential
   FRAGREP message, it first verifies that the message was formatted
   correctly by the ETE (per Section 4.4) and confirms that the FRAGREP
   matches one of the implicit/explicit probes that it actually sent to
   the ETE by examining the encapsulated IPv4 fragment, e.g., by
   examining the ID fields.  If the FRAGREP matches one of the ITE's
   explicit probes, the ITE advances its window of outstanding implicit
   probes.

   For a valid FRAGREP, if the length field in the encapsulated IPv4
   fragment contains a value larger than (128+ENCAPS), the ITE sets
   S-MSS for this ETE to this length minus ENCAPS; otherwise, it sets
   S-MSS = MAX(S-MSS/2, 128).  This limited halving procedure accounts
   for the possibility that the ETE received the leading 128 bytes of
   the fragmented SEAL packet in IPv4 fragments that were significantly
   smaller than the path MTU.  In that case, convergence to an
   acceptable S-MSS size may require multiple iterations of sending SEAL
   packets and receiving FRAGREP messages in a manner that parallels
   classical path MTU discovery [RFC1191], albeit with all path MTU
   feedback coming from the ETE and not a network middlebox.  However,
   the limited halving procedure ensures that convergence will occur
   quickly even in extreme cases, while the correct MTU will normally
   be determined in a single iteration since routers that use IPv4
   fragmentation are recommended to produce the minimum number of
   fragments [RFC1812].
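
   The S-MSS update for a valid FRAGREP can be sketched as follows.
   This is a non-normative illustration: the function and parameter
   names are hypothetical, and the sketch reads the "limited halving"
   step as halving the current estimate with a floor of 128 bytes,
   which is an assumption about the intent of the procedure.

```python
S_MSS_FLOOR = 128  # smallest S-MSS this sketch will settle on (assumption)


def update_s_mss(s_mss: int, reported_len: int, encaps: int) -> int:
    """Return the new S-MSS after a valid FRAGREP.

    reported_len is the length field of the encapsulated IPv4 fragment
    and encaps is the encapsulation overhead (ENCAPS), both in bytes.
    """
    if reported_len > S_MSS_FLOOR + encaps:
        # The report is large enough to be used directly.
        return reported_len - encaps
    # Limited halving: halve the estimate, but never go below the floor.
    return max(s_mss // 2, S_MSS_FLOOR)
```

   With an illustrative ENCAPS of 40 bytes, a FRAGREP reporting a
   1300-byte fragment sets S-MSS to 1260 in one step, while a FRAGREP
   reporting only a 100-byte fragment takes S-MSS from 1500 to 750.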

4.6.  S-MSS Probing and Setting DF

   When S-MSS is larger than 128, the ITE probes the path to the ETE to
   detect and dampen any in-the-network IPv4 fragmentation.  The ITE
   sets CTL='10' in the SEAL header and DF=0 in the outer IPv4 header of
   SEAL packets to be used as implicit probes and will receive FRAGREP
   messages from the ETE if any in-the-network fragmentation occurs.
   The ITE must also send explicit probes periodically to maintain a
   "window" of outstanding implicit probes.  This window allows the ITE
   to validate any FRAGREPs it receives, since any FRAGREP received for
   an implicit probe that was sent prior to the last successful
   explicit probe, or at a later time than the next SEAL packet to be
   sent, must be invalid.

   The 32-bit SEAL-ID value reported in FRAGREP messages can be used as
   an index into the current implicit probe window.
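
   One way to picture the window check is the following non-normative
   sketch.  It assumes that SEAL-IDs are assigned in increasing order
   modulo 2^32, that the ITE remembers the SEAL-ID of the last
   successful explicit probe, and that the SEAL-ID of the next packet
   to be sent (not yet transmitted) is excluded from the window.  The
   function and parameter names are hypothetical.

```python
SEAL_ID_MODULUS = 1 << 32  # SEAL-ID is a 32-bit value


def in_probe_window(seal_id: int, last_explicit_id: int, next_id: int) -> bool:
    """True if seal_id lies in [last_explicit_id, next_id) modulo 2**32.

    Modular subtraction keeps the comparison correct when the 32-bit
    SEAL-ID counter wraps around between the two window edges.
    """
    offset = (seal_id - last_explicit_id) % SEAL_ID_MODULUS
    width = (next_id - last_explicit_id) % SEAL_ID_MODULUS
    return offset < width
```

   A FRAGREP whose reported SEAL-ID fails this check would be discarded
   before any S-MSS update is attempted.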

   The ITE sends explicit probes by sending single-segment SEAL packets
   with CTL='11' in the SEAL header and DF=0 in the IPv4 header.  The
   ITE can also probe for larger S-MSS values by sending explicit probes
   with trailing padding added to create a 2KB probe.  When the ETE
   receives an explicit probe, it will return a FRAGREP message whether
   or not any in-the-network fragmentation occurs, which the ITE will
   process exactly as for any FRAGREP per Section 4.5.

   The ITE can optionally send intervening SEAL packets between explicit
   probing intervals as implicit probes by setting DF=0, or as classical
   path MTU discovery probes by setting DF=1.  The choice of setting
   DF=0/1 is based on the subnetwork trust basis for receiving ICMP PTB
   messages, as discussed in Section 4.7.

   When S-MSS=128, the ITE MUST set CTL='01' in the SEAL header of each
   SEAL packet that is not being used as an explicit probe such that the
   ETE will not generate FRAGREPs for unavoidable in-the-network
   fragmentation.
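
   The CTL choices named in this section can be summarized by the
   following non-normative sketch.  The constant and function names are
   hypothetical, CTL values are shown as two-character bit strings for
   readability, and packets sent with DF=1 as classical path MTU
   discovery probes are outside the scope of the sketch because this
   section does not state their CTL setting.

```python
CTL_NO_REPORT = "01"       # ETE does not report in-the-network fragmentation
CTL_IMPLICIT_PROBE = "10"  # ETE reports only if fragmentation occurred
CTL_EXPLICIT_PROBE = "11"  # ETE always returns a FRAGREP

S_MSS_MIN = 128


def select_ctl(s_mss: int, explicit_probe: bool) -> str:
    """Pick the CTL bits for an outgoing SEAL packet."""
    if explicit_probe:
        return CTL_EXPLICIT_PROBE
    if s_mss > S_MSS_MIN:
        return CTL_IMPLICIT_PROBE
    # S-MSS is already at 128: suppress reports for unavoidable fragmentation.
    return CTL_NO_REPORT
```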

4.7.  Processing ICMP PTBs

   The ITE may receive ICMP PTB messages in response to any packets that
   were admitted into the VET interface with DF=1.  The ITE may
   optionally ignore, log, or honor the messages according to the
   subnetwork trust basis.  For example, ITEs connected to subnetworks
   managed under a single administrative domain may be configured to
   honor ICMP PTBs while ITEs connected to the global interdomain
   routing core may be configured to ignore/log them.

   When ICMP PTBs are honored, the ITE:

   o  SHOULD send translated ICMP PTB messages back to the original
      source (if possible) for ICMP PTBs that correspond to SEAL packets
      that encapsulate a segment larger than 2KB.

   o  SHOULD treat ICMP PTBs that correspond to SEAL packets that
      encapsulate segments no larger than 2KB as an indication to resume
      probing.
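
   The policy above can be sketched as a small decision function.  This
   is a non-normative illustration: the enumeration and function names
   are hypothetical, and "2KB" is taken to mean 2048 bytes, which is an
   assumption of the sketch.

```python
from enum import Enum

PTB_SEGMENT_THRESHOLD = 2048  # "2KB", taken here as 2048 bytes (assumption)


class PtbPolicy(Enum):
    IGNORE = "ignore"
    LOG = "log"
    HONOR = "honor"


class PtbAction(Enum):
    DROP = "drop"
    LOG_ONLY = "log-only"
    TRANSLATE_TO_SOURCE = "translate-to-source"
    RESUME_PROBING = "resume-probing"


def handle_icmp_ptb(policy: PtbPolicy, segment_len: int) -> PtbAction:
    """Map an ICMP PTB to an action, given the subnetwork trust policy.

    segment_len is the size in bytes of the segment encapsulated by the
    SEAL packet that triggered the PTB.
    """
    if policy is PtbPolicy.IGNORE:
        return PtbAction.DROP
    if policy is PtbPolicy.LOG:
        return PtbAction.LOG_ONLY
    if segment_len > PTB_SEGMENT_THRESHOLD:
        return PtbAction.TRANSLATE_TO_SOURCE
    return PtbAction.RESUME_PROBING
```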


5.  Link Requirements

   Subnetwork designers are strongly encouraged to follow the
   recommendations in [RFC3819] when configuring link MTUs.


6.  End System Requirements

   SEAL is a router-to-router encapsulation protocol and therefore
   places no requirements on end systems.  However, end systems that
   send unfragmentable IP packets of 1501 bytes or larger are strongly
   encouraged to use Packetization Layer Path MTU Discovery per
   [RFC4821], since the network may not always be able to return useful
   ICMP PTB messages.


7.  Router Requirements

   IPv4 routers observe the requirements in [RFC1812].


8.  IANA Considerations

   A new IP protocol number for the SEAL protocol is requested.

   A new IPv4 site-scoped ALL_MANET_ROUTERS multicast group is
   requested.


9.  Security Considerations

   Unlike IPv4 fragmentation, overlapping fragment attacks are not
   possible due to the requirement that SEAL segments be non-
   overlapping.

   An amplification/reflection attack is possible when an attacker sends
   spoofed IPv4 fragments to an ETE, resulting in a stream of FRAGREP
   messages returned to a victim ITE.  The encapsulated segment of the
   spoofed IPv4 fragment provides mitigation for the ITE to detect and
   discard spurious FRAGREPs.

   The SEAL header is sent in-the-clear (outside of any IPsec/ESP
   encapsulations), as is the IPv4 header.  Like IPv6 extension
   headers, the SEAL header is protected only by L2 integrity checks,
   and is not covered under any L3 integrity checks.


10.  Acknowledgments

   Path MTU determination through the report of fragmentation
   experienced by the final destination was first proposed by Charles
   Lynn of BBN on the TCP-IP mailing list in May 1987.  An historical
   analysis of the evolution of path MTU discovery appears in
   http://www.tools.ietf.org/html/draft-templin-v6v4-ndisc-01 and is
   reproduced in Appendix A of this document.

   This work was inspired in part by discussions on the IETF MANET and
   IRTF RRG mailing lists in the 12/07 - 01/08 timeframe, and the author
   acknowledges those who participated in the discussions.  The work
   also draws on the earlier investigations of [I-D.templin-inetmtu]
   which acknowledges many who contributed to the effort.

   Jari Arkko and Joel Halpern provided useful comments that improved
   the document.


11.  References

11.1.  Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers",
              RFC 1812, June 1995.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

11.2.  Informative References

   [FOLK]     Shannon, C., Moore, D., and k. claffy, "Beyond Folklore:
              Observations on Fragmented Traffic", December 2002.

   [FRAG]     Kent, C. and J. Mogul, "Fragmentation Considered Harmful",
              October 1987.

   [I-D.farinacci-lisp]
              Farinacci, D., "Locator/ID Separation Protocol (LISP)",
              draft-farinacci-lisp-05 (work in progress), November 2007.

   [I-D.ietf-manet-smf]
              Macker, J. and S. Team, "Simplified Multicast Forwarding
              for MANET", draft-ietf-manet-smf-06 (work in progress),
              November 2007.

   [I-D.templin-autoconf-dhcp]
              Templin, F., Russert, S., and S. Yi, "MANET
              Autoconfiguration", draft-templin-autoconf-dhcp-11 (work
              in progress), February 2008.

   [I-D.templin-inetmtu]
              Templin, F., "Simple Protocol for Robust IP/*/IP Tunnel
              Endpoint MTU Determination  (sprite-mtu)",
              draft-templin-inetmtu-06 (work in progress),
              November 2007.

   [MTUDWG]   "IETF MTU Discovery Working Group mailing list,
              gatekeeper.dec.com/pub/DEC/WRL/mogul/mtudwg-log, November
              1989 - February 1995.".

   [RFC1063]  Mogul, J., Kent, C., Partridge, C., and K. McCloghrie, "IP
              MTU discovery options", RFC 1063, July 1988.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, August 1996.

   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
              October 1996.

   [RFC2004]  Perkins, C., "Minimal Encapsulation within IP", RFC 2004,
              October 1996.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
              RFC 2923, September 2000.

   [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
              RFC 3819, July 2004.

   [RFC4213]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms
              for IPv6 Hosts and Routers", RFC 4213, October 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
              RFC 4303, December 2005.

   [RFC4380]  Huitema, C., "Teredo: Tunneling IPv6 over UDP through
              Network Address Translations (NATs)", RFC 4380,
              February 2006.

   [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
              Network Tunneling", RFC 4459, April 2006.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, March 2007.

   [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
              Errors at High Data Rates", RFC 4963, July 2007.

   [TCP-IP]   "TCP-IP mailing list archives,
              http://www-mice.cs.ucl.ac.uk/multimedia/mist/tcpip, May
              1987 - May 1990.".


Appendix A.  Historic Evolution of PMTUD (written 10/30/2002)

   The topic of Path MTU discovery (PMTUD) saw a flurry of discussion
   and numerous proposals in the late 1980's through early 1990.  The
   initial problem was posed by Art Berggreen on May 22, 1987 in a
   message to the TCP-IP discussion group [TCP-IP].  The discussion that
   followed provided significant reference material for [FRAG].  An IETF
   Path MTU Discovery Working Group [MTUDWG] was formed in late 1989
   with a charter to produce an RFC.  Several variations on a very few
   basic proposals were entertained, including:

   1.  Routers record the PMTUD estimate in ICMP-like path probe
       messages (proposed in [FRAG] and later [RFC1063])

   2.  The destination reports any fragmentation that occurs for packets
       received with the "RF" (Report Fragmentation) bit set (Steve
       Deering's 1989 adaptation of Charles Lynn's Nov. 1987 proposal)

   3.  A hybrid combination of 1) and Charles Lynn's Nov. 1987 proposal
       (straw RFC draft by McCloghrie, Fox and Mogul on Jan 12, 1990)

   4.  Combination of the Lynn proposal with TCP (Fred Bohle, Jan 30,
       1990)

   5.  Fragmentation avoidance by setting "IP_DF" flag on all packets
       and retransmitting if ICMPv4 "fragmentation needed" messages
       occur (Geof Cooper's 1987 proposal; later adapted into [RFC1191]
       by Mogul and Deering).

   Option 1) seemed attractive to the group at the time, since it was
   believed that routers would migrate more quickly than hosts.  Option
   2) was a strong contender, but repeated attempts to secure an "RF"
   bit in the IPv4 header from the IESG failed and the proponents became
   discouraged.  Option 3) was abandoned because it was perceived as
   too complicated, and option 4) never received any apparent serious
   consideration.  Proposal 5) was a late entry into the discussion from
   Steve Deering on Feb. 24th, 1990.  The discussion group soon
   thereafter seemingly lost track of all other proposals and adopted
   5), which eventually evolved into [RFC1191] and later [RFC1981].

   In retrospect, the "RF" bit postulated in 2) is not needed if a
   "contract" is first established between the peers, as in proposal 4)
   and a message to the MTUDWG mailing list from jrd@PTT.LCS.MIT.EDU on
   Feb 19, 1990.  These proposals saw little discussion or rebuttal, and
   were dismissed based on the following assertions:

   o  routers upgrade their software faster than hosts

   o  PCs could not reassemble fragmented packets

   o  Proteon and Wellfleet routers did not reproduce the "RF" bit
      properly in fragmented packets

   o  Ethernet-FDDI bridges would need to perform fragmentation (i.e.,
      "translucent" not "transparent" bridging)

   o  the 16-bit IP_ID field could wrap around and disrupt reassembly at
      high packet arrival rates

   The first four assertions, although perhaps valid at the time, have
   been overcome by historical events, leaving only the final one to
   consider.  However, [FOLK] has shown that IP_ID wraparound simply
   does not occur within several orders of magnitude of the reassembly
   timeout window on high-bandwidth networks.

   (Author's 2/11/08 note: this final point was based on a loose
   interpretation of [FOLK], and is more accurately addressed in
   [RFC4963].)


Author's Address

   Fred L. Templin (editor)
   Boeing Phantom Works
   P.O. Box 3707
   Seattle, WA  98124
   USA

   Email: fltemplin@acm.org


Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).