Path MTU Discovery for IP version 6
RFC 8201

Note: This ballot was opened for revision 06 and is now closed.

Suresh Krishnan Yes

(Alia Atlas) No Objection

Deborah Brungard No Objection

Ben Campbell No Objection

Comment (2017-05-10 for -06)
I agree with Alvaro's DISCUSS point about 2119 language.

(Benoît Claise) (was Discuss) No Objection

Comment (2017-05-11 for -07)
In this document, I see:

   IPv6 nodes SHOULD implement Path MTU Discovery in order to discover
   and take advantage of paths with PMTU greater than the IPv6 minimum
   link MTU [I-D.ietf-6man-rfc2460bis].  A minimal IPv6 implementation
   (e.g., in a boot ROM) may choose to omit implementation of Path MTU
   Discovery.

In draft-ietf-6man-rfc2460bis-09:
   It is strongly recommended that IPv6 nodes implement Path MTU
   Discovery [RFC1981], in order to discover and take advantage of path
   MTUs greater than 1280 octets.  However, a minimal IPv6
   implementation (e.g., in a boot ROM) may simply restrict itself to
   sending packets no larger than 1280 octets, and omit implementation
   of Path MTU Discovery.

So a SHOULD in one document versus "strongly recommended" in the other. 
We should reconcile the two texts.
Note: may and may are consistent.


ICMPv6 PTB => ICMPv6 Packet Too Big (PTB)

Alissa Cooper No Objection

Comment (2017-05-10 for -06)
I agree with Alvaro's proposed resolution to the 2119 issue, which was also raised by the Gen-ART reviewer.

Spencer Dawkins No Objection

Comment (2017-05-10 for -06)
I'm watching the many e-mail threads on IESG ballot positions for this draft, but don't have anything to add.

Warren Kumari (was Discuss) No Objection

Comment (2017-05-19 for -07)
I sent email about this to the authors on Feb 23rd - I still seem to have many of the same questions...

Comments:
1: Sec 1: "Path MTU Discovery relies on such messages to determine the MTU of the path."
 -- it is unclear which "such" refers to. Perhaps s/such/ICMPv6/ (or PTB).

2: Sec 3: "Upon receipt of such a message, the
   source node reduces its assumed PMTU for the path based on the MTU of
   the constricting hop as reported in the Packet Too Big message" -- this says that it reduces it *for the path*. But (as somewhat alluded to later in the draft) the node doesn't know what the path *is* -- it can decrease it for the destination, or flow, or even interface, but (unless it is strict source routing) it doesn't control or really know the path (see also #4).

3: Sec 4: "The recommended setting for this timer is twice its minimum value (10 minutes)." - as above. This was from 1996 - were these metrics discussed at all during the -bis? I suspect that the average flow is much shorter these days (more web traffic, fatter pipes, etc) and so a flow of 10 minutes seems really long (to me at least). 

4: Sec 5.2: "The packetization layers must be notified about decreases in the
   PMTU.  Any packetization layer instance (for example, a TCP
   connection) that is actively using the path must be notified if the
   PMTU estimate is decreased.
      Note: even if the Packet Too Big message contains an Original
      Packet Header that refers to a UDP packet, the TCP layer must be
      notified if any of its connections use the given path."
 - this is related to #2 -- I don't know *which* path my packets take - once I launch them into the void, they may be routed purely based upon destination IP address, or they may be hashed based upon some set of header fields to a particular ECMP link or LSP. Once packets hit a load balancer, it is probably even *likely* that the UDP and TCP packets end up on different things. So, if I get a PTB from a router somewhere, I can probably guess that other packets to the same destination address will also follow that path, but I cannot know that for sure. I'm fine to decrease MTU towards that destination IP, but is that what this is suggesting? If so, please say that. If not, please let me know what I should do. The above is even more tricky / fun when I'm using flow id as the flow identifier -- if I get a PTB for flow 0x1234, what do I do? 

5: Sec 5.3: "Once a minute, a timer-driven procedure runs through all cached PMTU values, and for each PMTU whose timestamp is not "reserved" and is older than the timeout interval ...". Please consider providing clarifications here. The wording implies that I should set a timer to fire on the minute, and trigger the behavior. If all of the (NTP synced!) machines in my datacenter do this, and all try to send bigger packets (on 1/10th of long flows), their first hop router will get many, many over-sized packets and it will severely rate-limit the PTBs.
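
To illustrate the concern in comment 5, here is a minimal sketch in Python of one way an implementation might avoid synchronized probing: each host randomizes the delay before its next cache scan instead of firing on the minute. The function names, the cache layout, and the 50% jitter fraction are assumptions made for this illustration; none of them come from the draft.

```python
import random

TIMEOUT_S = 600.0      # 10-minute aging interval quoted from the draft
SCAN_PERIOD_S = 60.0   # "once a minute" scan quoted from the draft


def next_scan_delay(rng, period=SCAN_PERIOD_S, jitter_fraction=0.5):
    """Return the delay until the next cache scan, randomized per host.

    Assumption (not from the draft): drawing the delay uniformly from
    [period * (1 - jitter_fraction), period * (1 + jitter_fraction)]
    keeps clock-synchronized hosts from scanning in lockstep.
    """
    low = period * (1.0 - jitter_fraction)
    high = period * (1.0 + jitter_fraction)
    return rng.uniform(low, high)


def entries_to_reprobe(cache, now, timeout=TIMEOUT_S):
    """Return keys whose PMTU was last decreased more than `timeout` ago.

    `cache` maps a key to (pmtu, last_decrease_time). A timestamp of None
    stands in for the draft's "reserved" value and is never aged.
    """
    return [key for key, (_pmtu, stamp) in cache.items()
            if stamp is not None and now - stamp > timeout]
```

With this shape, two hosts that boot at the same instant still drift apart after the first scan, so their oversized probe packets are less likely to reach the first-hop router in one burst.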


Nits (Some of these are purely academic.)
I understand that you are trying to limit the changes, so feel free to ignore these:

1: "A node sending packets much smaller than the Path
MTU allows is wasting network resources and probably getting
suboptimal throughput." - the "much" confuses me. If I'm using anything less than the MTU I'm wasting network resources and getting suboptimal throughput - I might not care, but if (used MTU) < (path MTU) I'm wasting resources.

2: "Nodes implementing Path MTU Discovery and sending packets larger than
the IPv6 minimum link MTU are susceptible to problematic connectivity
if ICMPv6 [ICMPv6] messages are blocked or not transmitted." The "implementing Path MTU Discovery and" seems redundant. ALL nodes sending packets larger than minimum MTU are "susceptible to problematic connectivity if ICMPv6 [ICMPv6] messages are blocked or not transmitted.". I get what you are trying to say, but my OCD tendencies would not allow me to ignore this... 

3: "In the case of multipath routing (e.g., Equal Cost Multipath Routing, ECMP),"- this is vague / confusing -- (Equal Cost Multipath Routing, ECMP) makes it sound like either ECMP is an acronym for Equal Cost Multipath Routing, or that ECMP is something different to Equal Cost Multipath Routing.
I'd suggest just dropping the "ECMP" (or, "Equal Cost Multipath (ECMP) routing", but that seems clumsy)

Mirja Kühlewind (was Discuss) No Objection

Comment (2017-05-29)
Thanks for addressing my discuss. I believe there is an editing nit in section 5.4 now:
"Alternatively, the retransmission could be done in immediate response
   to a notification that the Path MTU was decreased, but only for the
   specific connection specified by the Packet Too Big message, but only
   based on the message and connection. "
Is it on purpose to have twice "but only" here?

I leave my previous comments below for the record. I don't think all of them have been addressed but I also don't recognize any further discussion about those points:

1) I agree with Ekr on this sentence:
"Nodes SHOULD appropriately validate the payload of ICMPv6 PTB
   messages to ensure these are received in response to transmitted
   traffic (i.e., a reported error condition that corresponds to an IPv6
   packet actually sent by the application) per [ICMPv6]."
This sounds like it should be a MUST but I guess it depends on the upper layer protocol if such a validation is possible or not, e.g. if information is available that can be used for validation. Maybe you can be more explicit here and even say something like pmtu discovery should/must only be used if the upper layer protocol provides means for validation of the icmp payload (like a sequence number in TCP)…?

Further also note that if the upper layer does the validation while the IP layer maintains EMTU_S, there must be an interface from the upper layer to the IP layer to tell if a packet is valid or not before the IP layer updates the MTU estimate. This seems actually more complicated than this one sentence indicates.

2) Also as Ekr says, I also have trouble fully understanding this normative text in section 4:
"After receiving a Packet Too Big message, a node MUST attempt to
   avoid eliciting more such messages in the near future.  The node MUST
   reduce the size of the packets it is sending along the path.  Using a
   PMTU estimate larger than the IPv6 minimum link MTU may continue to
   elicit Packet Too Big messages.  Since each of these messages (and
   the dropped packets they respond to) consume network resources, the
   node MUST force the Path MTU Discovery process to end.

   Nodes using Path MTU Discovery MUST detect decreases in PMTU as fast
   as possible."
I especially don't understand the first part, given that a PTB message may still indicate an MTU that is larger than the minimum link MTU, which then may cause another PTB message later on the path. This text reads as if, on receiving one PTB message, you should end discovery and fall back to the minimum link MTU to avoid any further PTB messages and not waste any resources. I don't think that's the intention, and as such I don't understand when it is recommended to end discovery here...?

3) Section 5.2 seems to be written with only single homed hosts in mind. It might be good to advise that the pmtu information should always be stored on a per interface basis...?

4) Also section 5.2:
You only advise to store information per flow ID, however, if the flow label is not used, wouldn't it really make sense to just use the 5-tuple instead? Also note that ECMP is often done based on the 5-tuple or even 6-tuple (with the ToS field).

5) And more in section 5.2:
"When a Packet Too Big message is received, the node determines which
   path the message applies to based on the contents of the Packet Too
   Big message. "
MAYBE:
"When a valid Packet Too Big message is received, the node determines which
   path the message applies to based on the contents of the Packet Too
   Big message."
And further on:
"If the tentative PMTU is less than the existing PMTU estimate, the
   tentative PMTU replaces the existing PMTU as the PMTU value for the
   path."
This doesn't cover the case where a pmtu probe with a larger size was sent and the PTB message returns a larger value than the stored one. Maybe state this explicitly.

This applies similar to this sentence in section 6:
OLD
"A node, however, should never raise its estimate of the
      PMTU based on a Packet Too Big message, so should not be
      vulnerable to this attack."
NEW
"A node, however, MUST NOT raise its estimate of the
      PMTU based on a Packet Too Big message that is not a (validated) response to a PMTU probe that was previously sent by this node, so should not be
      vulnerable to this attack."

6) Further section 5.2:
Should this statement maybe use an upper-case MUST:
"The packetization layers must be notified about decreases in the PMTU. "

7) Technical comment on section 5.3 in general:
There is a difference in aging depending on whether a flow is active or not. While I maybe don't want to probe again for this connection because my application already decided to use a mode where it can live with the current pmtu and it's too much effort to switch, I really want to probe again at the beginning of the next connection to check if I can use a different mode now. While the IP layer does not have a notion of connection, it can observe if packets are frequently sent with the same 5-tuple and reset the cached pmtu after a certain idle time.

8) Section 5.4: should this maybe be normative, at least the last MUST NOT (be fragmented):
"A packetization layer (e.g., TCP) must track the PMTU for the path(s)
   in use by a connection; it should not send segments that would result
   in packets larger than the PMTU, except to probe during PMTU
   discovery (this probe packet must not be fragmented to the PMTU). "
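
Comments 4) and 5) above can be sketched together in Python: a cache key that falls back from flow label to 5-tuple to destination address, and an update rule that only lowers the estimate unless the Packet Too Big message is a validated answer to the node's own probe. This is a rough illustration of the reviewer's suggestions, not the draft's specified behavior. The names `cache_key`, `apply_ptb`, `FiveTuple`, and the `answers_probe` flag are hypothetical, and ignoring reports below 1280 octets is an assumption made here.

```python
from typing import NamedTuple

IPV6_MIN_LINK_MTU = 1280  # octets, per the rfc2460bis text quoted earlier


class FiveTuple(NamedTuple):
    src: str
    dst: str
    proto: int
    sport: int
    dport: int


def cache_key(dst, flow_label=0, five_tuple=None):
    """Pick a PMTU cache key: flow label if set, else 5-tuple, else destination."""
    if flow_label:
        return ("flow", dst, flow_label)
    if five_tuple is not None:
        return ("5tuple", five_tuple)
    return ("dst", dst)


def apply_ptb(current_pmtu, reported_mtu, validated, answers_probe=False):
    """Return the PMTU estimate after processing a Packet Too Big message.

    Unvalidated messages never change the estimate. A validated message
    lowers it. It raises it only when `answers_probe` is True, which
    follows the reviewer's proposed NEW text rather than the draft.
    """
    if not validated:
        return current_pmtu
    if reported_mtu < IPV6_MIN_LINK_MTU:
        # Assumption for this sketch: reports below the minimum are ignored.
        return current_pmtu
    if reported_mtu < current_pmtu:
        return reported_mtu
    if answers_probe and reported_mtu > current_pmtu:
        return reported_mtu
    return current_pmtu
```

For example, `apply_ptb(1400, 1500, validated=True)` leaves the estimate at 1400, which is the property the Section 6 text relies on to resist forged messages reporting a larger MTU.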


Nit:
The abbreviation PTB is only used once in section 4 (and never expanded).

Terry Manderson No Objection

Alexey Melnikov No Objection

(Kathleen Moriarty) No Objection

Comment (2017-05-10 for -06)
Thanks for the agreed text update from the SecDir review that should show up in the next revision:
https://mailarchive.ietf.org/arch/msg/secdir/TSP93gEx0QW9WDOHUK3X3ipGiMk

I also agree with others that use of RFC2119 and ensuring consistent use of normative language would be helpful.

Eric Rescorla No Objection

Comment (2017-05-08 for -06)
Document: draft-ietf-6man-rfc1981bis-06.txt

OVERALL
I see in the shepherd's writeup that you have opted not to cite RFC
2119, but that makes the mixed case use of SHOULD/MUST even more
confusing. I would suggest that at minimum you go through the document
and evaluate whether each should/must should be capitalized, though I
would prefer a cite to 2119.

For instance:
   changed.  Therefore, attempts to detect increases in a path's PMTU
   should be done infrequently.

Is this normative?


I also share the concerns others have raised about whether, given the
actual state of PMTU this is something we should be making IS, but
I'm willing to bow to the majority here.


S 3.
   Note that Path MTU Discovery must be performed even in cases where a
   node "thinks" a destination is attached to the same link as itself.

I think you need to qualify this must because you just said above that
you don't need to if you use the minimum. Perhaps:

   Note that even when a node "thinks" a destination is attached to
   the same link as itself, it might have a PMTU lower than the
   link MTU...


S 4.

   Nodes SHOULD appropriately validate the payload of ICMPv6 PTB
   messages to ensure these are received in response to transmitted
   traffic (i.e., a reported error condition that corresponds to an IPv6
   packet actually sent by the application) per [ICMPv6].

This seems like it ought to be a MUST. Is there a good reason why
it is not? Perhaps also a cite to how one validates.


   When a node receives a Packet Too Big message, it MUST reduce its

a valid Packet Too Big message, I think because in graf 2 you say you
should validate.


   elicit Packet Too Big messages.  Since each of these messages (and
   the dropped packets they respond to) consume network resources, the
   node MUST force the Path MTU Discovery process to end.

It's not clear to me what the requirement is.

   Nodes using Path MTU Discovery MUST detect decreases in PMTU as fast
   as possible.  Nodes MAY detect increases in PMTU, but because doing

Same thing: what are you requiring? How could I be nonconformant to
this?

S 5.
   This section discusses a number of issues related to the
   implementation of Path MTU Discovery.  This is not a specification,
   but rather a set of notes provided as an aid for implementers.

However, this section contains a lot of normative language. Is that all
non-normative?


S 5.3.
   If the stale PMTU value is too large, this will be discovered almost
   immediately once a large enough packet is sent on the path.  No such
   mechanism exists for realizing that a stale PMTU value is too small,
   so an implementation SHOULD "age" cached values.  When a PMTU value
   has not been decreased for a while (on the order of 10 minutes), the
   PMTU estimate should be set to the MTU of the first-hop link, and the
   packetization layers should be notified of the change.  This will
   cause the complete Path MTU Discovery process to take place again.

Is this really good advice for TCP? It seems like if you have a
situation where it required several attempts to get the true PMTU (for
instance, if you have successively narrower tunnels), then a PMTU
reset could have a pretty material impact on throughput.
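
To make the "successively narrower tunnels" case concrete, here is a small Python simulation of what a reset to the first-hop MTU costs. It is only a sketch: the function name `rediscover` and the example hop MTUs are hypothetical, and it assumes every constricting hop returns a Packet Too Big message that reports its own link MTU.

```python
def rediscover(hop_mtus, first_hop_mtu):
    """Simulate Path MTU Discovery restarting from the first-hop link MTU.

    `hop_mtus` lists the MTU of each successive link on a hypothetical
    path. Returns the sequence of PMTU estimates; every step after the
    first costs one dropped packet and one Packet Too Big message.
    """
    estimates = [first_hop_mtu]
    while True:
        size = estimates[-1]
        # Find the first link too small for a packet of the current size.
        constricting = next((m for m in hop_mtus if m < size), None)
        if constricting is None:
            return estimates
        estimates.append(constricting)


# Three nested tunnels, each shaving a little off the MTU.
steps = rediscover([1500, 1480, 1460, 1400], first_hop_mtu=1500)
dropped_packets = len(steps) - 1
```

On this example path, each aging reset costs three dropped packets before the sender is back at 1400 octets, and each drop has to be repaired by the packetization layer.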


S 6.
      dropped.  A node, however, should never raise its estimate of the
      PMTU based on a Packet Too Big message, so should not be
      vulnerable to this attack.

I get that this is now not a normative statement but rather a claim
about what nodes that follow the MUST NOT in S 4 do, but it might still
be better to make it a MUST to avoid confusion.

Alvaro Retana (was Discuss) No Objection

Adam Roach No Objection

Comment (2017-05-10 for -06)
I'd like to add my voice to the concerns expressed regarding RFC2119 language. I understand the desire to only deal with the changed parts of the specification; but the current process as I understand it is that bis versions of documents are expected to be "brought up to code" according to modern IETF document practices. Perhaps we should have a conversation about whether that practice is in need of revising, but I'm not sure making piecemeal exceptions is the best way to go about starting that conversation. In any case, I believe that current practice is that new RFCs cite 2119 and adhere to its definitions when using these specific terms in all-caps; and that this will be published as a new RFC.

Minor technical comment: Like others, I also had a really hard time with the paragraph in section 4 concluding with "MUST force the Path MTU Discovery process to end." It's difficult to read this as anything *other* than "you get one Path Too Big, and just shut down discovery," but that's clearly not what the rest of the document says. (I'll also note that if we are treating capital "MUST" as normative, then "MUST attempt to" is kind of meaningless).

Minor technical comment: Section 5.4 has three paragraphs, starting "Alternatively, the retransmission could be done in immediate response to a notification" that propose a more aggressive means of dealing with packets lost to PMTU issues; most of this text is a warning about how this can go awry and (if you'll excuse a bit of hyperbole) melt the Internet. Given that the first alternative works just fine and appears to be much safer, is this alternative actually something we want to recommend for today's implementations?

The remainder of my comments are editorial.

The second-to-last paragraph of the introduction uses the phrase "such messages" in a way that makes the antecedent difficult to find. I spent a while trying to figure out how PMTUD used TCP three-way-handshakes or blackholed TCP packets to determine PMTU. Suggest: "...relies on ICMPv6 messages to determine..."

The abbreviation "PTB" appears in the second paragraph of section 4. I would ordinarily suggest expanding on first use; but as this is the first and only use, I suggest simply replacing it with the long form used in the rest of the document.

Section 5.1 introduces the term MMS_S, and relates it to EMTU_S. I note that the former is not in the terminology section, while the latter is -- I suspect that they should both be present.

Section 5.2 uses the acronym "ECMP". I would suggest citing the related document in which ECMP is defined and optionally expanding the acronym.

Section 5.2 indicates:

   Also, the instance that sent the packet that elicited the Packet Too
   Big message should be notified that its packet has been dropped, even
   if the PMTU estimate has not changed, so that it may retransmit the
   dropped data.

It is quite nonintuitive how this situation could arise: if the packet is of size X, and the PMTU has not changed, then it follows that X <= PMTU (as the packet would have been reduced in size otherwise). If the PMTU has not changed, it also follows that PMTU <= MTU (from the Packet Too Big message). It then follows that X <= MTU (from the Packet Too Big message), and so it really should not have been dropped unless the corresponding router has an implementation flaw of some kind. More importantly, it would seem that an attempt to transmit another packet of size X at this point would run an overwhelmingly high chance of triggering another Packet Too Big for whatever errant reason caused the first one to be sent.

I'm sure this naïve explanation overlooks whatever nonintuitive situation is envisioned by this paragraph. It would be quite helpful if such a situation were described: I, as an implementor, would look at this and say "What? No. I'm not doing that. It's extra work for no benefit."

I suspect it will be dealt with by the RFC editor, but the first normative reference seems to have some kind of issue with the production of the authors' names.

I see several acknowledgements in section B.1 that will be removed prior to publication. The authors may wish to consider moving these names to section 7 for the sake of posterity.