Network Working Group                                    F. Templin, Ed.
Internet-Draft                                      Boeing Phantom Works
Intended status: Informational                        September 14, 2007
Expires: March 17, 2008


      Packetization Layer Path MTU Discovery for IP/*/IPv4 Tunnels
                      draft-templin-inetmtu-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on March 17, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   The nominal Maximum Transmission Unit (MTU) of the Internet has
   become 1500 bytes, but existing IP/*/IPv4 tunneling mechanisms impose
   an encapsulation overhead that can reduce the effective path MTU to
   smaller values.  Additionally, existing IP/*/IPv4 tunneling
   mechanisms are limited in their ability to discover and utilize
   larger MTUs.  This document specifies new mechanisms for conveying
   packets over IP/*/IPv4 tunnels that address these issues.




Templin                  Expires March 17, 2008                 [Page 1]


Internet-Draft            PLPMTUD  for Tunnels            September 2007


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  3
   3.  Concept of Operation . . . . . . . . . . . . . . . . . . . . .  4
   4.  Tunnel MTU and EMTU_R  . . . . . . . . . . . . . . . . . . . .  4
   5.  Tunnel Soft State  . . . . . . . . . . . . . . . . . . . . . .  5
   6.  Sending Packets  . . . . . . . . . . . . . . . . . . . . . . .  5
     6.1.  Conceptual Sending Algorithm . . . . . . . . . . . . . . .  6
     6.2.  Inner Packet Fragmentation . . . . . . . . . . . . . . . .  7
     6.3.  Encapsulation  . . . . . . . . . . . . . . . . . . . . . .  7
       6.3.1.  Footer . . . . . . . . . . . . . . . . . . . . . . . .  7
       6.3.2.  Checksum Calculation . . . . . . . . . . . . . . . . .  8
       6.3.3.  Data, Probe Request, and Probe Solicitation Format . .  9
       6.3.4.  Probe Reply Format . . . . . . . . . . . . . . . . . .  9
     6.4.  Outer Packet Fragmentation . . . . . . . . . . . . . . . . 11
     6.5.  Setting DF in the Outer Header . . . . . . . . . . . . . . 11
     6.6.  Window Management  . . . . . . . . . . . . . . . . . . . . 11
   7.  Receiving Packets  . . . . . . . . . . . . . . . . . . . . . . 11
     7.1.  Decapsulation  . . . . . . . . . . . . . . . . . . . . . . 11
     7.2.  Receiving Packet Too Big (PTB) Errors  . . . . . . . . . . 12
   8.  Tunnel Qualification and Soft State Management . . . . . . . . 12
     8.1.  Probe Requests . . . . . . . . . . . . . . . . . . . . . . 12
       8.1.1.  Sending Probe Requests . . . . . . . . . . . . . . . . 12
       8.1.2.  Receiving Probe Requests . . . . . . . . . . . . . . . 13
     8.2.  Probe Solicitations  . . . . . . . . . . . . . . . . . . . 13
       8.2.1.  Sending Probe Solicitations  . . . . . . . . . . . . . 13
       8.2.2.  Receiving Probe Solicitations  . . . . . . . . . . . . 13
     8.3.  Probe Replies  . . . . . . . . . . . . . . . . . . . . . . 14
       8.3.1.  Sending Probe Replies  . . . . . . . . . . . . . . . . 14
       8.3.2.  Receiving Probe Replies  . . . . . . . . . . . . . . . 15
   9.  8-bit Fletcher Checksum Calculation  . . . . . . . . . . . . . 16
   10. Updated Specifications . . . . . . . . . . . . . . . . . . . . 16
   11. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 17
   12. Security Considerations  . . . . . . . . . . . . . . . . . . . 17
   13. Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 17
   14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18
     14.1. Normative References . . . . . . . . . . . . . . . . . . . 18
     14.2. Informative References . . . . . . . . . . . . . . . . . . 18
   Appendix A.  Discussion  . . . . . . . . . . . . . . . . . . . . . 19
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 19
   Intellectual Property and Copyright Statements . . . . . . . . . . 20


1.  Introduction

   The nominal Maximum Transmission Unit (MTU) of today's Internet has
   become 1500 bytes due to the preponderance of networking gear that
   configures an MTU of that size.  Since not all links in the Internet
   configure a 1500 byte MTU, however [RFC3819], packets can be dropped
   due to an MTU restriction on the path.  Internet Protocol, Version 4
   (IPv4) [RFC0791] is the predominant network layer protocol in the
   Internet today, and it is likely that IPv4 use will continue to grow
   into the future.  It is therefore essential that tunnels over IPv4
   (hereafter called IP/*/IPv4 tunnels) be made capable of consistent
   and efficient handling of packets of various sizes.

   Upper layers see IP/*/IPv4 tunnels as ordinary links, but even for
   packets no larger than 1500 bytes these links are susceptible to
   silent loss (e.g., due to path MTU restrictions, lost error messages,
   layered encapsulations, reassembly buffer limitations, etc.)
   resulting in poor performance and/or communications failures
   [RFC2923][RFC4459][RFC4821][RFC4963].

   This document specifies new mechanisms for IP/*/IPv4 tunnels that
   assure robust handling for packets of various sizes; it updates the
   functional specifications for Tunnel Endpoints (TEs) found in
   existing IP/*/IPv4 tunneling mechanisms (see: Section 10).


2.  Terminology

   The following abbreviations and terms are used in this document:

      DF - the IPv4 header "Don't Fragment" flag ([RFC0791], Section
      3.1).

      EMTU_R - Effective MTU to Receive ([RFC1122], Section 3.3.2).

      ENCAPS - the size of the encapsulating */IPv4 headers plus
      trailers.

      IPv4 - Internet Protocol, Version 4

      IPv6 - Internet Protocol, Version 6

      MaxFragSize - Maximum Fragment Size

      MaxPktSize - Maximum Packet Size

      ReassTime - Reassembly Timeout

      MTU - Maximum Transmission Unit

      PTB - Packet Too Big error

      TE - Tunnel Endpoint

      TFE - Tunnel Far End

      TNE - Tunnel Near End

      IP/*/IPv4 - an IP packet encapsulated in */IPv4 headers (e.g. for
      "*" = NULL, UDP, TCP, AH, ESP, etc.).

      inner packet/header/payload - an IP packet/header/payload before
      IP/*/IPv4 encapsulation.

      outer packet/header/payload - a */IPv4 packet/header/payload after
      IP/*/IPv4 encapsulation.


3.  Concept of Operation

   TEs that implement this scheme engage in a continuous handshaking
   process to confirm that the TFE is participating and to maintain soft
   state used for determining maximum packet sizes.  When one or both of
   the TEs do not implement the scheme, the behavior automatically
   reverts to that of the legacy IP/*/IPv4 tunneling mechanism.


4.  Tunnel MTU and EMTU_R

   TEs configure an indefinite MTU on the tunnel interface, i.e., there
   is no logical limit on the size of inner packets that upper layers
   can present to the tunnel interface.  Note that since the maximum
   length IPv4 packet is 65535 bytes (64KB-1) the practical maximum
   length inner packet that can traverse the tunnel is 65515 bytes
   (65535 - ENCAPS, for a 20 byte minimum-sized encapsulating IPv4
   header with no trailers).

   TEs MUST configure an EMTU_R that is no smaller than 2048 bytes (2KB)
   on all IPv4 interfaces over which a tunnel interface is configured.
   Additionally, they MUST configure an EMTU_R that is no smaller than
   2KB on the tunnel interface, and SHOULD configure an EMTU_R that is
   no smaller than the largest EMTU_R of any IPv4 interfaces over which
   the tunnel is configured.


5.  Tunnel Soft State

   TEs maintain the following per-TFE conceptual variables as soft state
   (e.g., in a conceptual neighbor cache):

   MaxFragSize
      the current maximum-sized outer packet/fragment that can be
      accommodated by the IPv4 path MTU without further fragmentation.
      Recommended default value: 128 bytes.  Range: 68 bytes to 64KB.

   MaxPktSize
      the current maximum-sized inner packet/fragment that the TFE can
      reassemble over the tunnel, i.e., the EMTU_R. Recommended default
      value: the minimum EMTU_R defined for the specific IP/*/IPv4
      tunneling mechanism (e.g., 1500 bytes for [RFC4213]).  Range: 576
      bytes to (2^32-1) bytes.

   ReassTime
      the current timeout value that the TFE uses for reassembly of
      fragmented packets that traverse the tunnel.  Recommended default
      value: 120 seconds.  Range: 4usec to 4*(2^32-1)usec (~4.77hr).

   IPv4Id
      the current IPv4 ID value that the TE will assign in the outer
      IPv4 header of packets it sends into the tunnel.  Initial value:
      randomly chosen.  Range: 0 to 2^16-1.

   isQualified
      boolean indicating whether the TFE implements the scheme.
      Recommended default value: FALSE.

   isNAT
      boolean indicating whether there is an IPv4 Network Address
      Translator (NAT) on the path to the TFE.  Default value: TRUE or
      FALSE, based on the specific IP/*/IPv4 tunneling mechanism.

   See: [RFC3819], Section 2 for subnetwork MTU recommendations that
   influence 'MaxFragSize'.

   See: [RFC1122], Section 3.3.2 for EMTU_R and reassembly timeout
   recommendations.
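
   The per-TFE conceptual variables above could be held in a small
   record such as the following.  This is only a sketch: the class name,
   field names, and the 'next_ipv4_id' helper are hypothetical, the
   'MaxPktSize' default of 1500 is the [RFC4213] example, and the
   'isNAT' default of FALSE is an assumption, since the text leaves
   that default to the specific tunneling mechanism.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class TunnelFarEndState:
    """Per-TFE soft state; fields mirror the conceptual variables."""

    max_frag_size: int = 128      # 'MaxFragSize' recommended default (bytes)
    max_pkt_size: int = 1500      # 'MaxPktSize'; 1500 is the [RFC4213] example
    reass_time: float = 120.0     # 'ReassTime' recommended default (seconds)
    # 'IPv4Id' starts at a randomly chosen 16-bit value.
    ipv4_id: int = field(default_factory=lambda: secrets.randbelow(1 << 16))
    is_qualified: bool = False    # 'isQualified' recommended default
    is_nat: bool = False          # assumption: mechanism-specific default

    def next_ipv4_id(self) -> int:
        """Return the ID for the next outer packet, then advance mod 2^16."""
        current = self.ipv4_id
        self.ipv4_id = (self.ipv4_id + 1) & 0xFFFF
        return current
```

   A neighbor cache would then map each TFE address to one such record
   and discard records that have not been refreshed by probing.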


6.  Sending Packets

   TEs send packets across a tunnel to the TFE according to the
   following specifications:

6.1.  Conceptual Sending Algorithm

   With reference to Sections 6.2 - 6.6, TEs use the following
   conceptual sending algorithm:

        if inner packet is larger than 'MaxPktSize' and inner packet
          is not fragmentable (see: Section 6.2)
            Send PTB appropriate to the inner protocol
            (e.g., an ICMPv6 PTB [RFC1981]) with MTU = 'MaxPktSize'.
            Drop packet.
        else
            if 'isNAT' and inner packet is not a probe used for
              'MaxFragSize' determination
                if inner packet larger than 2*('MaxFragSize' - ENCAPS)
                  and inner packet not fragmentable (see: Section 6.2)
                    Send PTB appropriate to the inner protocol
                    with MTU = 2*('MaxFragSize' - ENCAPS).
                    Drop packet.
                else
                    Fragment inner packet into fragments no larger
                    than MIN('MaxPktSize', 2*('MaxFragSize' - ENCAPS))
                    (see: Section 6.2).
                endif
            else
                Fragment inner packet into fragments no larger than
                'MaxPktSize' (see: Section 6.2).
            endif
            foreach inner packet/fragment
                Encapsulate as an outer IPv4 packet (see: Section 6.3).
                if outer packet is not a probe used for 'MaxFragSize'
                  determination
                    fragment outer packet into fragments no larger than
                    'MaxFragSize' (see: Section 6.4).
                endif
                foreach outer packet/fragment
                    Set DF in the outer header according to Section 6.5.
                    Send fragment subject to window restrictions
                    (see: Section 6.6).
                endforeach
            endforeach
        endif

                  Figure 1: Conceptual Sending Algorithm
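
   The first half of Figure 1 (deciding whether to return a PTB or to
   pick an inner fragment size limit) might be sketched as follows.
   The function and type names are hypothetical, and encapsulation,
   outer fragmentation, DF handling, and window management are left
   out.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    action: str  # "ptb" (drop and report) or "send"
    size: int    # PTB MTU to report, or inner fragment size limit


def plan_inner_packet(inner_len: int, fragmentable: bool, is_nat: bool,
                      is_mtu_probe: bool, max_pkt_size: int,
                      max_frag_size: int, encaps: int) -> Decision:
    """Apply the size checks of the conceptual sending algorithm."""
    if inner_len > max_pkt_size and not fragmentable:
        return Decision("ptb", max_pkt_size)
    if is_nat and not is_mtu_probe:
        # Behind a NAT, inner fragments are limited to two outer fragments.
        nat_limit = 2 * (max_frag_size - encaps)
        if inner_len > nat_limit and not fragmentable:
            return Decision("ptb", nat_limit)
        return Decision("send", min(max_pkt_size, nat_limit))
    return Decision("send", max_pkt_size)
```

   For example, with 'MaxFragSize' = 128, ENCAPS = 20 and 'isNAT' TRUE,
   the limit is 2*(128 - 20) = 216 bytes, so an unfragmentable
   1400-byte inner packet yields a PTB with MTU = 216.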









6.2.  Inner Packet Fragmentation

   An inner packet is not fragmentable IFF the TE is not permitted to
   break it into inner fragments before encapsulation, e.g., an IPv6
   packet without a fragment header, an IPv4 packet with DF=1, etc.

   TEs break fragmentable inner packets into inner fragments of no more
   than 'MaxPktSize' bytes when 'isNAT' is FALSE and no more than
   MIN('MaxPktSize', 2*('MaxFragSize' - ENCAPS)) bytes when 'isNAT' is
   TRUE.  The TE then encapsulates each inner fragment per Section 6.3.
   These inner fragments will be reassembled by the final destination.

   When 'isNAT' is TRUE, 2*('MaxFragSize' - ENCAPS) may be smaller than
   the minimum IPv6 MTU of 1280 bytes.  In that case, the TE may be
   required to drop an IPv6 packet smaller than 1280 bytes and send an
   ICMPv6 PTB with an MTU value less than 1280 bytes.  The original IPv6
   source will then include a fragment header in subsequent IPv6
   packets, and the TE can perform IPv6 fragmentation on these inner
   packets using the fragment header included by the source, according
   to the final paragraph of [RFC2460], Section 5.

6.3.  Encapsulation

   TEs encapsulate inner IP packets according to the specific IP/*/IPv4
   document except that the TE maintains a randomly-initialized and
   monotonically-increasing (modulo 64K) per-TFE 'IPv4Id' value that it
   encodes in the outer IPv4 headers of successive encapsulated packets.

   The TE also appends trailing data to packets during encapsulation as
   specified below and increments the length field in the outer IPv4
   header by the number of trailing data bytes added.  The following
   sections specify the trailing data formats and encapsulation
   procedures:

6.3.1.  Footer

   The TE adds the following 4-byte footer as the final 4 bytes of the
   trailing data.  The footer is byte-aligned only, and need not be
   aligned on an even word/longword/etc. boundary:

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |Version| Type  |   Reserved    | Fletcher A    |  Fletcher B   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          Figure 2: Footer Format

   where the fields of the footer are specified as follows:

   Version (4 bits)
      The Version field indicates the format of the trailing data.  This
      document describes version 1.

   Type (4 bits)
      The type of encapsulated packet.  The following types are defined:

      0 - Ordinary data packet.

      1 - Probe Request (see: Section 8.1).

      2 - Probe Solicitation (see: Section 8.2).

      3 - Probe Reply (see: Section 8.3).

      4 - 15 - Reserved for future use.

   Reserved (8 bits)
      Reserved for future use.

   Fletcher A (8 bits)
      The 8-bit Fletcher A checksum component.

   Fletcher B (8 bits)
      The 8-bit Fletcher B checksum component.
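
   As an illustration of the footer layout in Figure 2, the four bytes
   could be packed and parsed as shown below.  The function and constant
   names are hypothetical.  The sketch assumes the usual convention for
   such diagrams that bit 0 is the most significant bit, so Version
   occupies the high nibble of the first byte.

```python
import struct

FOOTER_VERSION = 1
TYPE_DATA = 0
TYPE_PROBE_REQUEST = 1
TYPE_PROBE_SOLICITATION = 2
TYPE_PROBE_REPLY = 3


def pack_footer(pkt_type: int, fletcher_a: int, fletcher_b: int,
                version: int = FOOTER_VERSION) -> bytes:
    """Build the 4-byte footer: Version|Type, Reserved, Fletcher A, Fletcher B."""
    if not 0 <= version <= 15 or not 0 <= pkt_type <= 15:
        raise ValueError("Version and Type are 4-bit fields")
    first = (version << 4) | pkt_type
    return struct.pack("!BBBB", first, 0, fletcher_a, fletcher_b)


def unpack_footer(footer: bytes):
    """Return (version, type, fletcher_a, fletcher_b) from a 4-byte footer."""
    first, _reserved, a, b = struct.unpack("!BBBB", footer)
    return first >> 4, first & 0x0F, a, b
```

   Because the footer is byte-aligned only, a receiver would take it
   from the last 4 bytes of the outer payload rather than from a fixed
   offset.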

6.3.2.  Checksum Calculation

   The TE MUST include a non-zero trailing checksum in the footer of all
   probe packets (types 1 through 3) and MUST include a non-zero
   trailing checksum in the footer of data packets (type 0) when 'isNAT'
   is TRUE.  The TE MAY include a zero trailing checksum in the footer
   of data packets when 'isNAT' is FALSE.

   For probe reply packets, the TE appends zero-filled padding bytes as
   necessary to extend the packet to a minimum of 50 bytes beyond the
   beginning of the inner IP header.  The TE then appends a 14 byte
   control block as specified in Section 6.3.4.  For all other packets
   that will include a non-zero trailing checksum, the TE appends zero-
   filled padding bytes as necessary to extend the packet to a minimum
   of 64 bytes beyond the beginning of the inner IP header.  The TE then
   calculates the 8-bit Fletcher checksum as specified in Section 9 and
   encodes the results in the Fletcher A and B fields of the footer.
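
   The padding rule above reduces to simple arithmetic.  The sketch
   below uses a hypothetical helper name and assumes that "beyond the
   beginning of the inner IP header" means the inner packet length plus
   padding, counted before the control block and footer are appended.

```python
CONTROL_BLOCK_LEN = 14  # probe reply control block (Section 6.3.4)


def padding_length(inner_len: int, is_probe_reply: bool) -> int:
    """Number of zero-filled padding bytes to append after the inner packet."""
    # Probe replies pad to 50 bytes so that padding plus the 14-byte
    # control block reaches the same 64-byte minimum as other packets.
    minimum = 50 if is_probe_reply else 64
    return max(0, minimum - inner_len)
```

   For a 40-byte inner packet this gives 24 bytes of padding for a probe
   request and 10 bytes for a probe reply; in both cases 64 bytes
   precede the footer.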

   The footer is appended as the final 4 bytes of the trailing data, as
   specified in the following sections.

6.3.3.  Data, Probe Request, and Probe Solicitation Format

   The TE uses the following packet format for data, probe request, and
   probe solicitation packets (types 0 through 2):

                          +---------------------------------+
                          |           Outer IPv4            |
                          |        Header w/'IPv4Id'        |
                          +---------------------------------+
                          |            * Headers            |
                          |                                 |
   +-------------+        +---------------------------------+
   |   Inner IP  |        |             Inner IP            |
   ~   packet    ~  ===>  ~             packet              ~
   |             |        |                                 |     T
   +-------------+        +---------------------------------+ -\  r
    Inner Packet          |                                 |  |  a
                          ~           Zero Padding          ~  |  i
                          |                                 |   > l
                          +---------------------------------+  |  e
                          |   Footer (see: Section 6.3.1)   |  |  r
                          +---------------------------------+ -/  s
                              Outer Packet with Trailers

       Figure 3: Data, Probe Request, and Probe Solicitation Format

6.3.4.  Probe Reply Format

   The TE uses the following encapsulation format for all probe reply
   packets (type 3):

                          +--------------------------------+
                          |           Outer IPv4           |
                          |        Header w/'IPv4Id'       |
                          +--------------------------------+
                          |           * Headers            |
                          |                                |
   +-------------+        +--------------------------------+
   |   inner IP  |        |            inner IP            |
   ~    echo     ~  ===>  ~              echo              ~
   |    reply    |        |             reply              |
   +-------------+        +--------------------------------+ -\
   Inner Reply            |                                |  |
                          ~           Zero Padding         ~  |
                          |                                |  |
                          |               +----------------+  |  T
                          |               |     YourPort   |  |  r
                          +---------------+----------------+  |  a
                          |            YourAddr            |   > i
                          +--------------------------------+  |  l
                          |            ReassTime           |  |  e
                          +--------------------------------+  |  r
                          |            MaxPktSize          |  |  s
                          +--------------------------------+  |
                          |   Footer (see: Section 6.3.1)  |  |
                          +--------------------------------+ -/
                              Outer Reply with Trailers

                       Figure 4: Probe Reply Format

   where the following 14-byte "control block" information is included
   immediately following the padding and immediately before the trailing
   footer:

      YourPort (16 bits) - 1's complement of the observed port number of
      the TFE, or zero.

      YourAddr (32 bits) - 1's complement of the observed IPv4 address
      of the TFE, or zero.

      ReassTime (32 bits) - non-zero value between 1 - (2^32-1) in 4usec
      increments.

      MaxPktSize (32 bits) - non-zero value between 576 - (2^32-1) in 1
      byte increments.
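
   The 14-byte control block could be serialized as sketched below.
   The function name is hypothetical, and network byte order is an
   assumption; the field list above does not state a byte order.  A
   value of None stands for "not observed" and is encoded as zero.

```python
import ipaddress
import struct
from typing import Optional


def pack_control_block(observed_port: Optional[int],
                       observed_addr: Optional[str],
                       reass_time_units: int,
                       max_pkt_size: int) -> bytes:
    """Pack YourPort, YourAddr, ReassTime, MaxPktSize into 14 bytes."""
    if not 1 <= reass_time_units <= 0xFFFFFFFF:
        raise ValueError("ReassTime must be 1..2^32-1 (4usec units)")
    if not 576 <= max_pkt_size <= 0xFFFFFFFF:
        raise ValueError("MaxPktSize must be 576..2^32-1 bytes")
    # YourPort and YourAddr carry the 1's complement of the observed values.
    your_port = 0 if observed_port is None else (~observed_port) & 0xFFFF
    your_addr = 0
    if observed_addr is not None:
        your_addr = (~int(ipaddress.IPv4Address(observed_addr))) & 0xFFFFFFFF
    return struct.pack("!HIII", your_port, your_addr,
                       reass_time_units, max_pkt_size)
```

   For instance, a 120 second reassembly timeout is encoded as
   30,000,000 units of 4usec.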








6.4.  Outer Packet Fragmentation

   For packets other than probe requests used for 'MaxFragSize'
   determination, TEs use IPv4 fragmentation to fragment outer packets
   after IPv4 encapsulation into fragments no larger than 'MaxFragSize'
   bytes.  These outer fragments will be reassembled by the TFE.

6.5.  Setting DF in the Outer Header

   TEs MUST set DF=1 in the outer IPv4 header of probe requests to be
   used for 'MaxFragSize' determination.  TEs MAY set DF=0 in the outer
   header of other probe requests and SHOULD set DF=0 in the outer
   header of probe replies.

   TEs MUST set DF=1 in the outer header of ordinary data packets/
   fragments when 'isNAT' is TRUE.

   TEs MAY set DF=0 in the outer header of ordinary data packets/
   fragments when 'isNAT' is FALSE.
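
   The DF rules of this section can be summarized as a small decision
   function.  This is a sketch with hypothetical names; where the text
   says MAY or SHOULD set DF=0, the sketch simply chooses DF=0.  No rule
   is stated here for probe solicitations, so the sketch rejects them
   rather than guess.

```python
def outer_df_bit(kind: str, for_max_frag_size: bool = False,
                 is_nat: bool = False) -> int:
    """Return the DF value (0 or 1) for the outer IPv4 header."""
    if kind == "probe_request":
        # MUST be 1 when probing 'MaxFragSize'; MAY be 0 otherwise.
        return 1 if for_max_frag_size else 0
    if kind == "probe_reply":
        # SHOULD be 0.
        return 0
    if kind == "data":
        # MUST be 1 when 'isNAT' is TRUE; MAY be 0 otherwise.
        return 1 if is_nat else 0
    raise ValueError(f"no DF rule stated for packet kind {kind!r}")
```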

6.6.  Window Management

   TEs send packets into a tunnel according to a window based on the
   TFE's advertised 'ReassTime'.  In particular, the TE must not admit
   more than 2^16 packets into the tunnel within the 'ReassTime' window.

   TEs SHOULD NOT admit some of the inner and outer fragments of the
   original packet into the tunnel and drop others, i.e., either all of
   the fragments should be admitted or all should be dropped.
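
   One possible way to enforce this window is a sliding count of recent
   transmissions with all-or-nothing admission, as sketched below.  The
   class and method names are hypothetical, and the sliding-window
   bookkeeping is an assumption; the text only states the limit.

```python
from collections import deque


class AdmissionWindow:
    """Admit at most `limit` packets per 'ReassTime' interval."""

    def __init__(self, reass_time: float, limit: int = 1 << 16):
        self.reass_time = reass_time
        self.limit = limit
        self._sent = deque()  # send times of packets still in the window

    def try_admit(self, now: float, count: int = 1) -> bool:
        """Admit `count` packets/fragments together, or none of them."""
        # Forget transmissions that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.reass_time:
            self._sent.popleft()
        if len(self._sent) + count > self.limit:
            return False
        self._sent.extend([now] * count)
        return True
```

   Passing the total number of fragments of one original packet as
   `count` gives the all-or-nothing behavior recommended above.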


7.  Receiving Packets

7.1.  Decapsulation

   TEs decapsulate each outer packet they receive exactly as specified
   in the appropriate IP/*/IPv4 document except that when 'isQualified'
   is TRUE and the packet includes a non-zero trailing checksum the TE
   first verifies the checksum in the outer packet as specified in
   Section 9.  If the A and B results of the checksum calculation match
   the values stored in the trailing checksum, the TE decapsulates the
   packet; otherwise it drops the packet.

   Note that the initial probe request/reply packets from a new TFE will
   be received before 'isQualified' is set to TRUE.  The TE also
   decapsulates these packets, as specified in Section 8.






7.2.  Receiving Packet Too Big (PTB) Errors

   TEs may receive ICMPv4 PTB errors with Type=3 ("Destination
   Unreachable") and Code=4 ("fragmentation needed, and DF set") that
   include a Next-Hop MTU value [RFC1191] in response to any packets
   that were admitted into the tunnel with DF=1 [RFC0792].

   When a TE receives an ICMPv4 PTB with a Next-Hop MTU value smaller
   than 'MaxFragSize', it SHOULD reduce 'MaxFragSize' and/or actively
   probe to discover and confirm a new 'MaxFragSize'.  The TE SHOULD NOT
   send a translated PTB back to the inner source.
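
   A minimal sketch of the 'MaxFragSize' reduction follows.  The helper
   name is hypothetical.  Clamping to 68 bytes (the lower bound of the
   'MaxFragSize' range in Section 5) is an assumption that also covers
   routers reporting a Next-Hop MTU of zero; an implementation could
   instead probe to find the new value.

```python
MIN_FRAG_SIZE = 68  # lower bound of the 'MaxFragSize' range (Section 5)


def on_icmpv4_ptb(max_frag_size: int, next_hop_mtu: int) -> int:
    """Return the 'MaxFragSize' to use after an ICMPv4 PTB is received."""
    if next_hop_mtu < max_frag_size:
        # Reduce, but never below the smallest permitted value.
        return max(MIN_FRAG_SIZE, next_hop_mtu)
    # A PTB that reports a larger MTU carries no new restriction.
    return max_frag_size
```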


8.  Tunnel Qualification and Soft State Management

   TEs engage in a probing process to qualify new TFEs and refresh per-
   TFE soft state for qualified TFEs thereafter.  TEs discontinue the
   probing process and garbage-collect stale soft state for dormant
   tunnels and unqualified TFEs.  TEs exchange probe requests, probe
   solicitations and probe replies as specified in the following
   sections:

8.1.  Probe Requests

   TEs send and receive probe requests as specified below:

8.1.1.  Sending Probe Requests

8.1.1.1.  Basic Probing Strategy

   TEs send probe requests while data is actively flowing through the
   tunnel.  The TE sends initial probe requests to qualify each new TFE,
   then sends periodic probe requests thereafter.  The TE SHOULD limit
   the rate at which it sends probe requests to each TFE, but MUST probe
   frequently enough to refresh the per-TFE conceptual variables.

   The TE retains a cache of recently-sent probe requests and uses them
   to verify subsequent probe replies.

8.1.1.2.  MaxFragSize Probing

   The TE SHOULD probe to detect larger 'MaxFragSize' values by sending
   progressively larger probe requests padded to the desired probe size.
   When the TE receives sufficient evidence through probing that the
   forward path to the TFE supports the probed size, it advances
   'MaxFragSize' to the probe size.  The TE SHOULD NOT send probe
   requests larger than ('MaxPktSize' + ENCAPS).  The TE MAY send a
   series of probes in parallel to mitigate 'MaxFragSize' fluctuations
   in the case of multipath routes with diverse path MTUs.

8.1.1.3.  Generating and Sending Probe Requests

   TEs generate probe requests by creating a minimum-sized and
   unfragmentable IP echo request packet according to the inner IP
   protocol (e.g., an ICMPv6 echo request [RFC4443] when the inner IP
   protocol is IPv6).  The echo request MUST include source and
   destination addresses that correspond to the TNE and TFE
   respectively, and SHOULD include additional identifying information
   (e.g., sequence/identification numbers, nonce values, etc.) that the
   TFE will echo in its reply.  The TE then encapsulates the echo
   request with padding added to create an outer probe request of the
   desired probe size and sends the probe request into the tunnel as
   specified in Section 6.

8.1.2.  Receiving Probe Requests

   When a TE receives a potential probe request from a TFE, it first
   determines whether the packet includes a valid trailing checksum.  If
   the packet did not include a valid trailing checksum, the TE
   discontinues probe request processing, decapsulates the packet as for
   ordinary data and returns from processing.  Otherwise, the TE
   generates a probe reply as specified in Section 8.3.

8.2.  Probe Solicitations

   TEs send and receive probe solicitations as specified below:

8.2.1.  Sending Probe Solicitations

   When a TE has new information to convey to a TFE, but has not
   received recent probe requests from the TFE, it MAY send a probe
   solicitation to the TFE.  The TE creates a NULL inner IP packet
   (e.g., an IPv6 header with "No Next Header" in the Next Header field)
   with source and destination addresses that correspond to the TNE and
   TFE respectively.  The TE then encapsulates the NULL packet as a
   probe solicitation and sends it into the tunnel as specified in
   Section 6.

8.2.2.  Receiving Probe Solicitations

   When a TE receives a potential probe solicitation from a TFE, it
   first determines whether the packet includes a valid trailing
   checksum.  If the packet did not include a valid trailing checksum,
   the TE discontinues probe solicitation processing, decapsulates the
   inner packet as for ordinary data and returns from processing.
   Otherwise, the TE SHOULD send an expedited probe request with DF=0 to
   the TFE as specified in Section 8.1 if it has not successfully probed
   the TFE recently.  The TE then discards the probe solicitation.

8.3.  Probe Replies

   TEs send and receive probe replies as specified below:

8.3.1.  Sending Probe Replies

   TEs send probe replies in response to valid probe requests and use
   them as a mechanism for advertising 'MaxPktSize' and 'ReassTime'
   values to the TFE.  TEs also use probe replies to inform the TFE of
   the IPv4 address and protocol port number that they observed in the
   TFE's probe request.

   The TE creates an inner IP echo reply packet according to the inner
   IP protocol (e.g., an ICMPv6 echo reply [RFC4443] when the inner
   protocol is IPv6).  The TE includes in the echo reply the destination
   address of the echo request as the source address and the source
   address of the echo request as the destination address.  The TE
   also includes in the echo reply any additional identifying
   information that the TFE included in its echo request.

   The TE then encapsulates the echo reply as specified in Section 6.3.
   For IP/*/IPv4 tunneling mechanisms that include a port number in the
   encapsulating * header, the TE includes the 1's complement of the
   protocol source port number it observed in the TFE's probe request
   (e.g., the UDP source port number for IPv6/UDP/IPv4 encapsulation) in
   the 16-bit 'YourPort' field.  (Otherwise, the TE encodes the value
   '0' in the 'YourPort' field.)  The TE next includes the 1's
   complement of the source address it observed in the outer IPv4 header
   of the TFE's probe request in the 32-bit 'YourAddr' field.

   The TE next includes a value that is less than or equal to an EMTU_R
   appropriate for the interface the TFE's probe request arrived on in
   the 'MaxPktSize' field.  The TE MAY choose to dynamically increase or
   decrease the 'MaxPktSize' values it advertises to a TFE in successive
   probe replies, but if so it SHOULD seek to converge to a stable
   value.

   The TE finally includes a reassembly timeout value appropriate for
   the interface the TFE's probe request arrived on in the 'ReassTime'
   field.  The TE MAY choose to dynamically increase or decrease the
   'ReassTime' value it advertises to a TFE in successive probe replies,
   but if so it SHOULD seek to converge to a stable value.

   Following the encoding of the above trailing data, the TE appends the
   trailing checksum and sends the reply to the TFE.

8.3.2.  Receiving Probe Replies

8.3.2.1.  Probe Reply Verification

   When a TE receives a potential probe reply from a TFE, it first
   determines whether the packet includes a valid trailing checksum.
   The TE next verifies that the packet includes enough trailing data to
   contain a probe reply control block (see: Section 6.3.4) then
   examines the 'MaxPktSize' and 'ReassTime' values in the potential
   control block.  If the packet did not include a valid trailing
   checksum, or the packet did not include a control block, or if either
   of the 'MaxPktSize' or 'ReassTime' values in the potential control
   block lie outside of the acceptable ranges listed in Section 6.3.4,
   the TE discontinues probe reply processing, decapsulates the packet
   as for ordinary data and returns from processing.

   Next, the TE verifies that the inner IP echo request matches one of
   its cached probe requests by examining the inner IP source and
   destination addresses as well as any other identifying information in
   the inner packet.  The TE sets 'isQualified' to TRUE for this TFE if
   the probe reply is valid; otherwise, it discards the probe reply and
   returns from processing.  If the TE receives excessive invalid probe
   replies from a TFE, it resets 'isQualified' to FALSE and restores
   'MaxFragSize' and 'MaxPktSize' to default values.
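
   The verification steps above can be sketched as follows.  This is a
   minimal illustration, not part of the specification: the names
   'TfeState', 'Verdict' and 'verify_probe_reply', the acceptable
   ranges (which Section 6.3.4 defines), the default sizes, and the
   'invalid_limit' threshold for "excessive" invalid replies are all
   assumptions supplied by the caller.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Verdict(Enum):
    ORDINARY_DATA = 1   # decapsulate as for ordinary data
    DISCARD = 2         # discard the probe reply
    ACCEPT = 3          # continue with probe reply processing


@dataclass
class TfeState:
    """Hypothetical per-TFE conceptual variables."""
    is_qualified: bool = False
    max_frag_size: int = 0
    max_pkt_size: int = 0
    invalid_replies: int = 0


def verify_probe_reply(
    state: TfeState,
    checksum_ok: bool,
    control_block: Optional[Tuple[int, int]],  # (MaxPktSize, ReassTime) or None
    matches_cached_request: bool,
    pkt_range: Tuple[int, int],    # assumed stand-in for Section 6.3.4 range
    time_range: Tuple[int, int],   # assumed stand-in for Section 6.3.4 range
    defaults: Tuple[int, int],     # assumed default (MaxFragSize, MaxPktSize)
    invalid_limit: int = 3,        # assumed meaning of "excessive"
) -> Verdict:
    # Stage 1: checksum, control block presence, and range checks.
    if not checksum_ok or control_block is None:
        return Verdict.ORDINARY_DATA
    max_pkt, reass = control_block
    if not pkt_range[0] <= max_pkt <= pkt_range[1]:
        return Verdict.ORDINARY_DATA
    if not time_range[0] <= reass <= time_range[1]:
        return Verdict.ORDINARY_DATA

    # Stage 2: the inner packet must match a cached probe request.
    if not matches_cached_request:
        state.invalid_replies += 1
        if state.invalid_replies >= invalid_limit:
            state.is_qualified = False
            state.max_frag_size, state.max_pkt_size = defaults
        return Verdict.DISCARD

    state.is_qualified = True
    return Verdict.ACCEPT
```

   The sketch never resets the invalid-reply counter; whether and when
   to do so is left open by the text above.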

8.3.2.2.  Probe Reply Processing

   For IP/*/IPv4 tunneling mechanisms that include port numbers in
   encapsulating * headers, the TE next examines the 'YourPort' and
   'YourAddr' values encoded in the packet.  If the values match the 1's
   complement of the probe request's protocol port and IPv4 address,
   respectively, the TE sets 'isNAT' to FALSE; otherwise, it sets
   'isNAT' to TRUE.  (For encapsulating * headers that do not include
   port numbers, the TE ignores the 'YourPort' value in this check.)
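
   The comparison above can be sketched as follows.  This is an
   illustration only: the function name 'detect_nat' and its parameters
   are hypothetical, and 'YourPort' is assumed to be a 16-bit field
   ('YourAddr' is the 32-bit field described earlier).

```python
import ipaddress


def detect_nat(your_port: int, your_addr: int,
               sent_port: int, sent_addr: str,
               has_ports: bool = True) -> bool:
    """Return the value the TE would assign to 'isNAT'.

    your_port / your_addr: values carried in the probe reply.
    sent_port / sent_addr: the port and IPv4 source address the TE used
    in its probe request.
    """
    # 1's complement of the 32-bit IPv4 address.
    expected_addr = ~int(ipaddress.IPv4Address(sent_addr)) & 0xFFFFFFFF
    addr_match = your_addr == expected_addr

    # 1's complement of the (assumed 16-bit) port; ignored when the
    # encapsulating headers carry no port numbers.
    port_match = True
    if has_ports:
        port_match = your_port == (~sent_port & 0xFFFF)

    return not (addr_match and port_match)
```

   For example, a probe request sent from 192.0.2.1 port 4500 is
   reflected unmodified as 'YourAddr' 0x3FFFFDFE and 'YourPort' 0xEE6B;
   any other values indicate that an address or port was rewritten
   along the path.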

   Next, the TE records the 'MaxPktSize' and 'ReassTime' values in the
   corresponding conceptual variables for this TFE.  If the new
   'MaxPktSize' is smaller than ('MaxFragSize' - ENCAPS), the TE SHOULD
   reduce 'MaxFragSize' to ('MaxPktSize' + ENCAPS).  If the 'MaxPktSize'
   and 'ReassTime' values fluctuate significantly between successive
   probe replies, the TE SHOULD record the most conservative values
   received (e.g., a 16KB 'MaxPktSize' instead of 64KB, or a 90sec
   'ReassTime' instead of 60sec).
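
   The 'MaxFragSize' adjustment above can be sketched as follows.  This
   is a minimal illustration; the 'TfeState' container and the function
   name are hypothetical, and the handling of fluctuating values is
   omitted because the text does not define "significantly".

```python
from dataclasses import dataclass


@dataclass
class TfeState:
    """Hypothetical per-TFE conceptual variables."""
    max_frag_size: int
    max_pkt_size: int
    reass_time: int


def record_probe_reply(state: TfeState, max_pkt_size: int,
                       reass_time: int, encaps: int) -> TfeState:
    # Record the advertised values for this TFE.
    state.max_pkt_size = max_pkt_size
    state.reass_time = reass_time
    # A fragment carries ENCAPS bytes of encapsulation on top of the
    # inner packet, so it never needs to exceed MaxPktSize + ENCAPS.
    if max_pkt_size < state.max_frag_size - encaps:
        state.max_frag_size = max_pkt_size + encaps
    return state
```

   For example, with 'MaxFragSize' 1500 and ENCAPS 20, a reply that
   advertises 'MaxPktSize' 1280 satisfies 1280 < 1480, so 'MaxFragSize'
   is reduced to 1300.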

   Following the above processing, the TE discards the probe reply.


9.  8-bit Fletcher Checksum Calculation

   The 8-bit Fletcher Checksum is discussed in [RFC1146][STONE1][STONE2]
   and is used by this specification to provide an integrity check with
   different properties than those used by common link layers and upper
   layer protocols.

   The TE calculates the 8-bit Fletcher checksum of the first 64 bytes
   of the inner packet beginning with the inner IP header according to
   the algorithm of [RFC1146], which is reproduced below with an
   additional rule for representing zero results:

        The 8-bit Fletcher Checksum Algorithm is calculated over a
        sequence of data octets (call them D[1] through D[N]) by
        maintaining 2 unsigned 1's-complement 8-bit accumulators A and B
        whose contents are initially zero, and performing the following
        loop where i ranges from 1 to N:

             A := A + D[i]
             B := B + A

        If, at the end of the loop, either or both of the A, B
        accumulators encode the value 0x00, invert the value
        in the accumulator(s) to 0xff.

   Note that faster algorithms are possible and may be used instead of
   the algorithm above; see [RFC1146] for citations of alternate
   algorithms.
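
   The loop above can be sketched as follows.  This is an illustration
   only: the function names are hypothetical, 1's-complement addition
   is implemented with an end-around carry, and because the
   accumulators are 8 bits wide a zero result is represented as 0xff.
   How A and B are packed into the trailing checksum is not shown.

```python
def ones_complement_add8(x: int, y: int) -> int:
    """Add two 8-bit values with end-around carry (1's complement)."""
    s = x + y
    # Fold the carry out of bit 7 back into the low-order bit.
    return (s & 0xFF) + (s >> 8)


def fletcher8(data: bytes, limit: int = 64) -> tuple:
    """Return accumulators (A, B) over the first `limit` octets."""
    a = 0
    b = 0
    for octet in data[:limit]:
        a = ones_complement_add8(a, octet)
        b = ones_complement_add8(b, a)
    # Additional rule: represent a zero accumulator as all ones.
    if a == 0:
        a = 0xFF
    if b == 0:
        b = 0xFF
    return a, b
```

   For example, the two-octet input 0x01 0x02 yields A = 3 and B = 4,
   while an empty or all-zero input yields A = B = 0xff.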


10.  Updated Specifications

   This document updates the following specifications:

   o  RFC2003 (IP-in-IP)

   o  RFC2529 (6over4)

   o  RFC2661 (L2TP)

   o  RFC2784 (GRE)

   o  RFC3056 (6to4)

   o  RFC3378 (ETHERIP)

   o  RFC3884 (IPSec Transport Mode for Dynamic Routing)

   o  RFC4023 (MPLS-in-IP)

   o  RFC4213 (Basic IPv6 Transition Mechanisms)

   o  RFC4214 (ISATAP)

   o  RFC4301 (IPSec)

   o  RFC4302 (AH)

   o  RFC4303 (ESP)

   o  RFC4380 (TEREDO)

   o  LISP

   o  others....


11.  IANA Considerations

   The IANA is instructed to create a registry for the Version and Type
   values that occur in the footers of encapsulated packets per Section
   6.3.1.


12.  Security Considerations

   Probe replies contain identifying information that is useful for
   defending against off-path attacks.  A possible attack vector
   involves an attacker sending probe requests and/or probe
   solicitations with spoofed source addresses.  TEs mitigate such
   attacks by rate limiting the probe requests/replies they send.
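
   This document does not prescribe a particular rate limiting
   algorithm.  One common choice is a token bucket, sketched below
   under stated assumptions: the class name, the 'rate' and 'burst'
   parameters, and the caller-supplied clock value are all
   hypothetical.

```python
class TokenBucket:
    """Allow at most `burst` probes at once, refilled at `rate` per second."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

   A TE could consult one such bucket per TFE before sending each probe
   request or probe reply and silently skip the transmission when
   'allow' returns False.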

   Security considerations for specific IP/*/IPv4 tunneling mechanisms
   are given in the respective documents.


13.  Acknowledgments

   This work has benefited from discussions with Fred Baker, Iljitsch
   van Beijnum, Steve Casner, Gorry Fairhurst, John Heffner, Joe Macker,
   Matt Mathis, and Joe Touch.  Dan Romascanu mentioned the IEEE 802.3as
   extension of the Ethernet frame size to 2048 bytes.


14.  References

14.1.  Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
              RFC 792, September 1981.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers",
              RFC 1812, June 1995.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

14.2.  Informative References

   [RFC0905]  International Organization for Standardization (ISO), "ISO
              Transport Protocol specification ISO DP 8073", RFC 905,
              April 1984.

   [RFC1146]  Zweig, J. and C. Partridge, "TCP alternate checksum
              options", RFC 1146, March 1990.

   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, August 1996.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
              RFC 2923, September 2000.

   [RFC3385]  Sheinwald, D., Satran, J., Thaler, P., and V. Cavanna,
              "Internet Protocol Small Computer System Interface (iSCSI)
              Cyclic Redundancy Check (CRC)/Checksum Considerations",
              RFC 3385, September 2002.

   [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
              RFC 3819, July 2004.

   [RFC4213]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms
              for IPv6 Hosts and Routers", RFC 4213, October 2005.

   [RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control
              Message Protocol (ICMPv6) for the Internet Protocol
              Version 6 (IPv6) Specification", RFC 4443, March 2006.

   [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
              Network Tunneling", RFC 4459, April 2006.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, March 2007.

   [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
              Errors at High Data Rates", RFC 4963, July 2007.

   [STONE1]   Stone, J., "Checksums in the Internet (Stanford Doctoral
              Dissertation)", August 2001.

   [STONE2]   Stone, J., Greenwald, M., Partridge, C., and J. Hughes,
              "Performance of Checksums and CRCs over Real Data",
              IEEE/ACM Transactions on Networking, Vol. 6, No. 5,
              October 1998.


Appendix A.  Discussion

   Probing strategies for packetization layer protocols are specified in
   ([RFC4821], Section 7) and apply also to the TE's 'MaxFragSize'
   probing process.

   Further strategies for handling ICMPv4 PTB errors are specified in
   ([RFC4821], Section 7) and apply also to the TE's 'MaxFragSize'
   probing process.

   Note that decapsulation automatically erases any padding that may
   have been inserted by the TE along with the trailing checksum.


Author's Address

   Fred L. Templin (editor)
   Boeing Phantom Works
   P.O. Box 3707
   Seattle, WA  98124
   USA

   Email: fred.l.templin@boeing.com


Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).