Network Working Group                                          R. Hinden
Internet-Draft                                      Check Point Software
Intended status: Experimental                               G. Fairhurst
Expires: September 12, 2019                       University of Aberdeen
                                                          March 11, 2019


                IPv6 Minimum Path MTU Hop-by-Hop Option
                    draft-hinden-6man-mtu-option-01

Abstract

   This document specifies a new Hop-by-Hop IPv6 option that is used to
   record the minimum Path MTU along the forward path between a source
   host and a destination host.  The recorded value can then be
   communicated back to the source host by an ICMPv6 Packet Too Big
   message.

   This Hop-by-Hop option is intended to be used in environments like
   Data Centers and on paths between Data Centers, to allow them to
   better take advantage of paths able to support a large Path MTU.

Status of This Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 12, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents



Hinden & Fairhurst     Expires September 12, 2019               [Page 1]


Internet-Draft               Path MTU Option                  March 2019


   carefully, as they describe your rights and restrictions with respect
   to this document.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Motivation and Problem Solved . . . . . . . . . . . . . . . .   4
   3.  Requirements Language . . . . . . . . . . . . . . . . . . . .   5
   4.  Applicability Statements  . . . . . . . . . . . . . . . . . .   5
   5.  IPv6 Minimum Path MTU Hop-by-Hop Option . . . . . . . . . . .   5
   6.  Router, Host, and Transport Behaviors . . . . . . . . . . . .   6
     6.1.  Router Behavior . . . . . . . . . . . . . . . . . . . . .   6
     6.2.  Host Behavior . . . . . . . . . . . . . . . . . . . . . .   7
     6.3.  Transport Behavior  . . . . . . . . . . . . . . . . . . .   8
   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   9
   8.  Security Considerations . . . . . . . . . . . . . . . . . . .  10
   9.  Acknowledgments . . . . . . . . . . . . . . . . . . . . . . .  10
   10. Change log [RFC Editor: Please remove]  . . . . . . . . . . .  10
   11. References  . . . . . . . . . . . . . . . . . . . . . . . . .  11
     11.1.  Normative References . . . . . . . . . . . . . . . . . .  11
     11.2.  Informative References . . . . . . . . . . . . . . . . .  11
   Appendix A.  Planned Experiments  . . . . . . . . . . . . . . . .  12
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  12

1.  Introduction

   This draft proposes a new Hop-by-Hop Option to be used to record the
   minimum MTU along the forward path between the source and destination
   nodes.  The source node creates a packet with this Hop-by-Hop Option
   and fills the Reported PMTU Field in the option with the value of the
   MTU for the outbound link that will be used to forward the packet
   towards the destination.

   At each subsequent hop where the option is processed, the router
   compares the value of the Reported PMTU in the option and the MTU of
   its outgoing link.  If the MTU of the outgoing link is less than the
   Reported PMTU specified in the option, it rewrites the value in the
   Option Data with the smaller value.  When the packet arrives at the
   Destination node, the Destination node can send the minimum reported
   PMTU value back to the Source Node.  This can be done by creating an
   ICMPv6 Packet Too Big message.

   The figure below illustrates the operation of the method.  In this
   case, the path between the Sender and Destination nodes comprises
   three links: the sender has a link MTU of size MTU-S, the link
   between routers R1 and R2 has an MTU of size 8 KBytes, and the final
   link to the destination has an MTU of size MTU-D.

      +--------+         +----+        +----+         +-------+
      |        |         |    |        |    |         |       |
      | Sender +---------+ R1 +--------+ R2 +---------+ Dest. |
      |        |         |    |        |    |         |       |
      +--------+  MTU-S  +----+  8 KB  +----+  MTU-D  +-------+


   Three scenarios are described:

   Scenario 1 considers all links to have an 8 KByte MTU, and the method
   is supported by both routers.

   Scenario 2 considers the destination link to have an MTU of 1500
   bytes.  This is the smallest MTU on the path, so router R2 resets the
   Reported PMTU to 1500 bytes, and the method detects it.  Had there
   been another smaller MTU at a link further along the path that
   supports the method, the lower PMTU would also have been detected.

   Scenario 3 considers the case where the router preceding the
   smallest link does not support the method, and the method then fails
   to detect the actual PMTU.  This scenario would also arise if the PTB
   message was not delivered to the sender.

   These scenarios are summarized in the table below, where "H" in the
   R1 and R2 columns indicates a router that supports the method and
   "-" indicates one that does not.


      +-+-----+-----+----+----+----------+-----------------------+
      | |MTU-S|MTU-D| R1 | R2 | Rec PMTU | Note                  |
      +-+-----+-----+----+----+----------+-----------------------+
      |1| 8KB | 8KB | H  | H  |  8 KB    | Endpoints attempt to  |
      | |     |     |    |    |          | use an 8 KB PMTU.     |
      +-+-----+-----+----+----+----------+-----------------------+
      |2| 8KB |1500B| H  | H  |  1500 B  | Endpoints attempt to  |
      | |     |     |    |    |          | use a 1500 B PMTU.    |
      +-+-----+-----+----+----+----------+-----------------------+
      |3| 8KB |1500B| H  | -  |  8 KB    | Endpoints attempt to  |
      | |     |     |    |    |          | use an 8 KB PMTU, but |
      | |     |     |    |    |          | need to implement a   |
      | |     |     |    |    |          | method to fall back   |
      | |     |     |    |    |          | to a 1500 B PMTU.     |
      +-+-----+-----+----+----+----------+-----------------------+
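
   The three scenarios can be reproduced with a small simulation.  The
   following is only an illustrative sketch: the function name
   recorded_pmtu and the choice of 8192 bytes for "8 KB" are
   assumptions made for the example, not part of this specification.

```python
# Illustrative simulation of the three scenarios in the table above.
# All names are hypothetical; "8 KB" is assumed to mean 8192 bytes.

def recorded_pmtu(sender_mtu, hops):
    """Return the Reported PMTU value that arrives at the destination.

    hops is a list of (outgoing_link_mtu, supports_option) pairs, one
    per router on the forward path, in path order.
    """
    reported = sender_mtu  # the source fills in its outbound link MTU
    for link_mtu, supports_option in hops:
        # Only routers that process the option can lower the value.
        if supports_option and link_mtu < reported:
            reported = link_mtu
    return reported

KB8 = 8192  # assumed size of the "8 KB" links

# Scenario 1: every link is 8 KB and both routers support the option.
s1 = recorded_pmtu(KB8, [(KB8, True), (KB8, True)])
# Scenario 2: the final link is 1500 bytes and R2 supports the option.
s2 = recorded_pmtu(KB8, [(KB8, True), (1500, True)])
# Scenario 3: the final link is 1500 bytes but R2 ignores the option.
s3 = recorded_pmtu(KB8, [(KB8, True), (1500, False)])

print(s1, s2, s3)
```

   In Scenario 3 the simulated value stays at the sender's link MTU
   even though the path cannot carry packets of that size, which is why
   the endpoints still need a fallback method.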


   IPv6 as specified in [RFC8200] allows nodes to optionally process
   Hop-by-Hop headers.  Specifically from Section 4:

   o  The Hop-by-Hop Options header is not inserted or deleted, but may
      be examined or processed by any node along a packet's delivery
      path, until the packet reaches the node (or each of the set of

      nodes, in the case of multicast) identified in the Destination
      Address field of the IPv6 header.  The Hop-by-Hop Options header,
      when present, must immediately follow the IPv6 header.  Its
      presence is indicated by the value zero in the Next Header field
      of the IPv6 header.

   o  NOTE: While [RFC2460] required that all nodes must examine and
      process the Hop-by-Hop Options header, it is now expected that
      nodes along a packet's delivery path only examine and process the
      Hop-by-Hop Options header if explicitly configured to do so.

   The Hop-by-Hop Option defined in this document is designed to take
   advantage of this property of how Hop-by-Hop options are processed.
   Nodes that do not support this Option SHOULD ignore it.  This can
   mean that the value returned in the response message does not account
   for all links along a path.

2.  Motivation and Problem Solved

   The current state of Path MTU Discovery on the Internet is
   problematic.  The mechanisms defined in [RFC8201] are known not to
   work well in all environments.  Nodes in the middle of the network
   may not send ICMPv6 Packet Too Big messages, or may rate-limit them
   to the point where they are no longer a useful mechanism.

   This results in many connections defaulting to 1280 octets and makes
   it very difficult to take advantage of links with larger MTU where
   they exist.  Applications that need to send large packets over UDP
   are forced to use IPv6 Fragmentation.

   Transport encapsulations and network-layer tunnels reduce the PMTU
   available for a transport to use.  For example, Network
   Virtualization Using Generic Routing Encapsulation (NVGRE) [RFC7637]
   encapsulates L2 packets in an outer IP header and does not allow IP
   Fragmentation.

   The use of 10G Ethernet will not achieve its potential if the packet
   size is limited to 1280 octets, because the packet-per-second rate
   needed to achieve multi-gigabit rates will exceed what most nodes
   can send.  For example, the rate required to reach wire speed on a
   10G Ethernet link with 1280 octet packets is about 977K packets per
   second (pps), compared with 139K pps for 9,000 octet packets, a
   significant difference.
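
   The packet rates quoted above can be checked with simple arithmetic.
   The sketch below assumes a link rate of exactly 10^10 bits per
   second and ignores Ethernet framing overhead (preamble, inter-frame
   gap, and so on), so it is an approximation only; the function name
   is hypothetical.

```python
# Back-of-the-envelope check of the packet rates quoted above.
# Assumes 10 Gb/s = 10**10 bit/s and ignores link-layer overhead.

def packets_per_second(link_bps, packet_octets):
    """Packets per second needed to fill a link with fixed-size packets."""
    return link_bps / (packet_octets * 8)

LINK_BPS = 10_000_000_000  # 10G Ethernet

pps_1280 = packets_per_second(LINK_BPS, 1280)
pps_9000 = packets_per_second(LINK_BPS, 9000)

print(f"1280 octets: {pps_1280 / 1000:.0f}K pps")
print(f"9000 octets: {pps_9000 / 1000:.0f}K pps")
```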

   The purpose of this draft is to improve the situation by defining a
   mechanism that does not rely on nodes in the middle of the network
   to send ICMPv6 Packet Too Big messages.  Instead, it provides the
   destination host with information on the minimum Path MTU, which the
   destination host can send back to the source host.  This is expected
   to work better than the current RFC 8201-based mechanisms.

3.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

4.  Applicability Statements

   This Hop-by-Hop Option is intended to be used in environments such
   as Data Centers and on paths between Data Centers, to allow them to
   better take advantage of a path that is able to support a large
   PMTU.  For example, it helps inform a sender that the path includes
   links that have an MTU of 9,000 bytes.  This has many performance
   advantages compared to the current practice of limiting packets to
   1280 bytes.

   The design of the option is sufficiently simple that it could be
   executed on a router's fast path.  For this to reach critical mass,
   there will have to be a strong pull from router vendors' customers.
   This could be the case for connections within and between Data
   Centers.

   The method could also be useful in other environments, including the
   general Internet.

5.  IPv6 Minimum Path MTU Hop-by-Hop Option

   The Minimum Path MTU Hop-by-Hop Option has the following format:

      Option    Option    Option
       Type    Data Len   Data
     +--------+--------+--------+--------+
     |BBCTTTTT|00000010|  2 octet value  |
     +--------+--------+--------+--------+

     Option Type:

     BB     00   Skip over this option and continue processing.

     C       1   Option data can change en route to the packet's final
                 destination.

     TTTTT 11110 Experimental Option Type from [IANA-HBH].

     Length: 2   The size of the value field in the Option Data field
                 supports Path MTU values from 0 to 65,535 octets.

     Value:  n   The Reported PMTU in octets, reflecting the smallest
                 link MTU that the packet experienced across the path.
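
   As an illustration only, the 4-octet option described above could be
   encoded and decoded as follows.  The sketch assumes that the 2-octet
   value is carried in network byte order, which this document does not
   state explicitly.  The names OPT_TYPE, OPT_DATA_LEN, encode_option,
   and decode_option are hypothetical.

```python
import struct

# Option Type octet built from the bit fields above:
# BB = 00, C = 1, TTTTT = 11110, giving 0x3E.
OPT_TYPE = 0b00_1_11110
OPT_DATA_LEN = 2  # the Option Data is a single 2-octet value

def encode_option(reported_pmtu):
    """Build the 4-octet option: type, data length, Reported PMTU."""
    if not 0 <= reported_pmtu <= 0xFFFF:
        raise ValueError("Reported PMTU must fit in 2 octets")
    # "!" selects network (big-endian) byte order; assumed, see above.
    return struct.pack("!BBH", OPT_TYPE, OPT_DATA_LEN, reported_pmtu)

def decode_option(data):
    """Return the Reported PMTU carried in a 4-octet option."""
    opt_type, data_len, reported_pmtu = struct.unpack("!BBH", data[:4])
    if opt_type != OPT_TYPE or data_len != OPT_DATA_LEN:
        raise ValueError("not a Minimum Path MTU option")
    return reported_pmtu

option = encode_option(9000)
print(option.hex(), decode_option(option))
```

   The sketch covers only the option itself.  A complete Hop-by-Hop
   Options header also carries the Next Header and Hdr Ext Len fields
   and any padding required by [RFC8200].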


6.  Router, Host, and Transport Behaviors

6.1.  Router Behavior

   Routers that do not support Hop-by-Hop options SHOULD ignore this
   option and SHOULD forward the packet.

   Routers that support Hop-by-Hop Options, but do not recognize this
   option SHOULD ignore the option and SHOULD forward the packet.

   Routers that recognize this option SHOULD compare the Reported PMTU
   in the Option Value field and the MTU configured for the outgoing
   link.  If the MTU of the outgoing link is less than the Reported
   PMTU, the router rewrites the Reported PMTU in the Option to use the
   smaller value.

   Discussion:

   o  The design of this Hop-by-Hop Option makes it feasible to be
      implemented within the fast path of a router, because the required
      processing is simple.
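
   The per-packet work for a router that recognizes the option is a
   single comparison and a conditional rewrite.  A minimal sketch
   follows, assuming the 4-octet layout of Section 5 with the value in
   network byte order; the function name router_update is hypothetical.

```python
import struct

def router_update(option, outgoing_link_mtu):
    """Return the option as it should be forwarded on the outgoing link.

    option is the 4-octet option (type, data length, Reported PMTU).
    The value is rewritten only when the outgoing link MTU is smaller.
    """
    opt_type, data_len, reported = struct.unpack("!BBH", option)
    if outgoing_link_mtu < reported:
        reported = outgoing_link_mtu  # record the smaller value
    return struct.pack("!BBH", opt_type, data_len, reported)

# A 9000-octet Reported PMTU meeting a 1500-octet outgoing link.
before = bytes.fromhex("3e022328")
after = router_update(before, 1500)
print(before.hex(), "->", after.hex())
```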

6.2.  Host Behavior

   A source host that supports this option SHOULD create a packet with
   this Hop-by-Hop Option and fill the Reported PMTU field of the option
   with the MTU configured for the link over which it will send the
   packet on the next hop towards the destination.

   Discussion:

   o  This option does not need to be sent in all packets belonging to a
      flow.  A transport protocol (or packetization layer) can set this
      option only on specific packets used to test the path.

   o  In the case of TCP, the option could be included in packets
      carrying a SYN segment as part of the connection set up, or could
      periodically be sent in packets carrying other segments.
      Including this option in a SYN could increase the probability that
      the SYN segment is lost, when routers on the path drop packets
      with this option.  Including this option in a large packet is not
      likely to be useful, since the large packet might itself also be
      dropped by a link along the path with a smaller MTU, preventing
      the Reported PMTU information from reaching the Destination node.

   o  The use with datagram transport protocols (e.g., UDP) is harder to
      characterize because applications using datagram transports range
      from very short-lived exchanges (low data-volume applications) to
      longer (bulk) exchanges of packets between the Source and
      Destination nodes [RFC8085].

   o  For applications that use Anycast, this option should be included
      in all packets as the actual destination will vary due to the
      nature of Anycast.

   o  Simple-exchange protocols (i.e., low data-volume applications
      [RFC8085]) that only send one or a few packets per transaction
      could be optimised by assuming that the Path MTU is symmetrical,
      that is, that the Path MTU is the same in both directions, or at
      least not smaller in the return path.  This optimisation does not
      hold when the paths are not symmetric.

   o  The use of this option with DNS and DNSSEC over UDP ought to work
      as long as the paths are symmetric.  The DNS server will learn the
      Path MTU from the DNS query messages.  If the return Path MTU is
      smaller, then the large DNSSEC response may be dropped and the
      known problems with PMTUD will occur.  DNS and DNSSEC over
      transport protocols that can carry the Path MTU should work.

   A Destination Host MUST NOT respond to every packet received with
   this option when the option carries the same value as the one
   previously received.  This requires the implementation to cache the
   last received value of the option, and is necessary to avoid
   generating excessive feedback traffic.  When sending an ICMPv6
   Packet Too Big message, the node MUST follow the procedures in
   [RFC4443] and [RFC8201] to avoid sending too many ICMPv6 Packet Too
   Big messages to the source.

   When a Destination Host that supports this option receives a packet
   with this option, it SHOULD first compare the Reported PMTU value
   with a value received earlier from this source.  If this is the first
   value, or if the received value is lower, it SHOULD record the value
   as the Received PMTU for the Source of the packet, and it SHOULD send
   the new value back to the Source of the packet.  This can be done by
   creating an ICMPv6 Packet Too Big message.

   NOTE: The Received PMTU could also be reset by a timer to allow
   periodic refresh of the state.  This would also allow a sender to
   discover cases where the Path MTU has increased (e.g., due to a
   change in the forwarding path).

   Discussion:

   o  A simple mechanism could send an ICMPv6 Packet Too Big message
      only the first time this option is received or when the Received
      PMTU is reduced.  This is good because it limits the number sent,
      but there is no provision for retransmission of the Path MTU if
      the ICMPv6 Packet Too Big message fails to reach the sender, or
      the sender loses state.

   o  The Reported PMTU value could increase or decrease over time.  For
      instance, it would increase when the path changes and the packets
      are then forwarded over a link with an MTU larger than the link
      previously used.
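
   The destination host behavior described in this section amounts to a
   small per-source cache.  The following is a sketch under stated
   assumptions: the class and method names are hypothetical, the cache
   is keyed by source address only, and the rate limiting required by
   [RFC4443] and the actual sending of the ICMPv6 Packet Too Big
   message are left out.

```python
class ReceivedPmtuCache:
    """Per-source cache of the last Received PMTU (illustrative only)."""

    def __init__(self):
        self._by_source = {}

    def on_option(self, source, reported_pmtu):
        """Record a Reported PMTU and say whether feedback is warranted.

        Returns True for the first value seen from a source or for a
        lower value; returns False otherwise, so that repeated or
        larger values do not generate feedback traffic.
        """
        previous = self._by_source.get(source)
        if previous is None or reported_pmtu < previous:
            self._by_source[source] = reported_pmtu
            return True
        return False

    def expire(self, source):
        """Forget a source, e.g. from a refresh timer, so that a later
        and possibly larger value is treated as a first value."""
        self._by_source.pop(source, None)

cache = ReceivedPmtuCache()
for value in (9000, 9000, 1500):
    print(value, cache.on_option("2001:db8::1", value))
```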

6.3.  Transport Behavior

   A transport endpoint using this option needs to use a method to
   verify the information provided by this option.

   The Received PMTU does not necessarily reflect the actual PMTU
   between the sender and destination.  Care therefore needs to be
   exercised in using this value at the sender.  Specifically:

   o  If the Received PMTU value returned by the Destination is the same
      as the initial Reported PMTU value, there could still be a router
      or layer 2 device on the path that does not support this PMTU.
      The usable PMTU therefore needs to be confirmed.

   o  If the Received PMTU value returned by the Destination is smaller
      than the initial Reported PMTU value, this is an indication that
      there is at least one router in the path with a smaller MTU.
      There could still be another router or layer 2 device on the path
      that does not support this MTU.

   o  If the Received PMTU value returned by the Destination is larger
      than the initial Reported PMTU value, this may be a corrupted,
      delayed or mis-ordered response, and SHOULD be ignored.
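
   The three cases above can be expressed as a small decision function
   at the sender.  This is a sketch only: the name probe_target is
   hypothetical, and the returned value is merely a candidate size that
   still has to be confirmed by a PMTUD or PLPMTUD method.

```python
def probe_target(initial_reported_pmtu, received_pmtu):
    """Return a candidate size to probe, or None to ignore the report.

    initial_reported_pmtu is the value the sender placed in the option;
    received_pmtu is the value returned by the destination.
    """
    if received_pmtu > initial_reported_pmtu:
        # Possibly corrupted, delayed, or mis-ordered: ignore it.
        return None
    # Equal or smaller: usable as a starting point, but a device that
    # does not process the option could still have a smaller MTU, so
    # the value must be confirmed by probing.
    return received_pmtu

print(probe_target(9000, 9000))
print(probe_target(9000, 1500))
print(probe_target(1500, 9000))
```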

   A sender needs to discriminate between the Received PMTU value in a
   PTB message generated in response to a Hop-by-Hop option requesting
   this, and a PTB message received from a router on the path.

   A PMTUD or PLPMTUD method could use the Received PMTU value as an
   initial target size to probe the path.  This can significantly
   decrease the number of probe attempts (and hence time taken) to
   arrive at a workable PMTU.  It has the potential to complete
   discovery of the correct value in a single Round Trip Time (RTT),
   even over paths that may have successive links configured with lower
   MTUs.

   Since the method can delay notification of an increase in the actual
   PMTU, a sender with a link MTU larger than the current PMTU SHOULD
   periodically probe for a PMTU value that is larger than the Received
   PMTU value.  This specification does not define an interval for the
   time between probes.

   Since the option consumes less capacity than a full probe packet,
   there may be an advantage in using this to detect a change in the
   path characteristics.

   Note: Further details to be included in next version.

   NOTE: A future version of the document will further consider the impact
   of Equal Cost Multipath (ECMP).  Specifically, whether a Received
   PMTU value should be maintained by the method for each transport
   endpoint, or for each network address, and how these are best used by
   methods such as PLPMTUD or DPLPMTUD.

7.  IANA Considerations

   No IANA assignments are requested.  This document uses an
   experimental option type from [IANA-HBH].

8.  Security Considerations

   The method has no way to protect the destination from off-path attack
   using this option in packets that do not originate from the source.
   This attack could be used to inflate or reduce the size of the
   reported PMTU.  Mechanisms to provide this protection can be provided
   at a higher layer (e.g., the transport packetization layer using
   PLPMTUD or DPLPMTUD), where more information is available about the
   size of packet that has successfully traversed a path.

   The method solicits a response from the destination, which should be
   used to generate a response to the IPv6 node originating the option
   packet.  A malicious attacker could generate a packet to the
   destination for a previously inactive flow or one that advertises a
   change in the size of the MTU for an active flow.  This would create
   additional work at the destination, and could induce creation of
   state when a new flow is created.  It could potentially result in
   additional traffic on the return path to the sender, which could be
   mitigated by limiting the rate at which responses are generated.

   A sender MUST check the quoted packet within the PTB message to
   validate that the message is in response to a packet that was
   originated by the sender.  This is intended to provide protection
   against off-path insertion of ICMP PTB messages by an attacker trying
   to disrupt the service.  Messages that fail this check MAY be logged,
   but the information they contain MUST be discarded.

   TBD

9.  Acknowledgments

   Helpful comments were received from [your name here] and other
   members of the 6MAN working group.

10.  Change log [RFC Editor: Please remove]

   draft-hinden-6man-mtu-option-01, 2019-March-05

   o  Changed requested status from Standards Track to Experimental to
      allow use of experimental option type (11110) to allow for
      experimentation.  Removed request for IANA Option assignment.
   o  Added Section 2, "Motivation and Problem Solved", to better
      describe what the purpose of this document is.
   o  Added Appendix A describing planned experiments and how the
      results will be measured.
   o  Editorial changes.

   draft-hinden-6man-mtu-option-00, 2018-Oct-16

   o  Initial draft.

11.  References

11.1.  Normative References

   [IANA-HBH]
              "Destination Options and Hop-by-Hop Options",
              <https://www.iana.org/assignments/ipv6-parameters/
              ipv6-parameters.xhtml#ipv6-parameters-2>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/
              RFC2119, March 1997, <https://www.rfc-editor.org/info/
              rfc2119>.

   [RFC4443]  Conta, A., Deering, S., and M. Gupta, Ed., "Internet
              Control Message Protocol (ICMPv6) for the Internet
              Protocol Version 6 (IPv6) Specification", STD 89, RFC
              4443, DOI 10.17487/RFC4443, March 2006, <https://www.rfc-
              editor.org/info/rfc4443>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", STD 86, RFC 8200, DOI 10.17487/
              RFC8200, July 2017, <https://www.rfc-editor.org/info/
              rfc8200>.

   [RFC8201]  McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed.,
              "Path MTU Discovery for IP version 6", STD 87, RFC 8201,
              DOI 10.17487/RFC8201, July 2017, <https://www.rfc-
              editor.org/info/rfc8201>.

11.2.  Informative References

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460,
              December 1998, <https://www.rfc-editor.org/info/rfc2460>.

   [RFC7637]  Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
              Virtualization Using Generic Routing Encapsulation", RFC
              7637, DOI 10.17487/RFC7637, September 2015,
              <https://www.rfc-editor.org/info/rfc7637>.

   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
              March 2017, <https://www.rfc-editor.org/info/rfc8085>.

Appendix A.  Planned Experiments

   TBD

   This section will describe a set of experiments planned for the use
   of the option defined in this document.  There are many aspects of
   the design that require experimental data or experience to evaluate
   this experimental specification.

   This includes experiments to understand the pathology of packets sent
   with the specified option to determine the likelihood that they are
   lost within specific types of network segment.

   This includes consideration of the cost and alternatives for
   providing the feedback required by the mechanism and how to
   effectively limit the rate of transmission.

   This includes consideration of the potential for integration in
   frameworks such as that offered by DPLPMTUD.

   There are also security-related topics to be understood as described
   in the Security Considerations (Section 8).

Authors' Addresses

   Robert M. Hinden
   Check Point Software
   959 Skyway Road
   San Carlos, CA  94070
   USA

   Email: bob.hinden@gmail.com


   Godred Fairhurst
   University of Aberdeen
   School of Engineering
   Fraser Noble Building
   Aberdeen  AB24 3UE
   UK

   Email: gorry@erg.abdn.ac.uk
