TUBA Working Group                                  D. Piscitello
Internet Draft                           Core Competence, Inc.
Expires 9 November 1994                            9 May 1994

File name draft-ietf-tuba-mtu-00.txt


CLNP Path MTU Discovery

Status of this Memo

This document is an Internet Draft. Internet Drafts are
working documents of the Internet Engineering Task Force
(IETF), its Areas, and its Working Groups. Note that other
groups may also distribute working documents as Internet
Drafts.

Internet Drafts are draft documents valid for a maximum of
six months. Internet Drafts may be updated, replaced, or
obsoleted by other documents at any time. It is not
appropriate to use Internet Drafts as reference material or
to cite them other than as a "working draft" or "work in
progress."

Please check the 1id-abstracts.txt listing contained in the
internet-drafts Shadow Directories on nic.ddn.mil,
nnsc.nsf.net, nic.nordu.net, ftp.nisc.sri.com, or
munnari.oz.au to learn the current status of any Internet
Draft.

Distribution of this memo is unlimited. Comments should be
submitted to the tuba@lanl.gov mailing list.

Abstract

This memo describes a technique for dynamically discovering
the maximum transmission unit (MTU) of an arbitrary CLNP
path. The mechanism described here is applicable to both
"pure-stack" OSI and TUBA/CLNP [6] environments, i.e.,
environments where Internet transport protocols (UDP and TCP)
are operated over CLNP. The memo specifies a small change to
the way routers generate one type of CLNP Error Report. For a
path that passes through a router that has not been changed,
this technique might not discover the correct Path MTU, but
it will always choose a Path MTU as accurate as, and in many
cases more accurate than, the Path MTU that would be chosen
by current practice.

Acknowledgements

The mechanism proposed here was first suggested by Geof
Cooper, and incorporated into RFC 1191 [1], Path MTU

Piscitello                                            [page 1]


TUBA Working Group                    CLNP Path MTU Discovery

Discovery, by Jeff Mogul and Steve Deering. The excellent
work of these folks readily extends to CLNP-based internets.


1. Introduction

ISO/IEC 8473, Protocol for Providing the Connectionless
Network Service, [2] is a network layer datagram protocol. As
is the case for hosts in IP-based internets, a CLNP-based
host that has a large amount of data to send to another CLNP-
based host transmits that data as a series of CLNP datagrams.
The desire to reduce or eliminate fragmentation is the same
in CLNP-based internetworking environments as for IP [3].
(Refer to [4] for arguments against fragmentation.) It is
thus desirable to define a mechanism that determines the
largest size datagram that does not require fragmentation
anywhere along the path from the source to the destination;
this is referred to as the Path MTU (PMTU), and it is equal
to the minimum of the MTUs of each hop in the path.
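
The definition above can be illustrated with a minimal sketch.
The function name and the hop MTU values are hypothetical and
chosen only for illustration; they are not part of this memo.

```python
def path_mtu(hop_mtus):
    """Return the Path MTU: the smallest MTU of any hop on the path."""
    hop_mtus = list(hop_mtus)
    if not hop_mtus:
        raise ValueError("a path must contain at least one hop")
    return min(hop_mtus)


# Hypothetical three-hop path; the MTU values are illustrative only.
example_hops = [4352, 1006, 1500]
print(path_mtu(example_hops))  # prints 1006
```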

A shortcoming of the current OSI protocol suite is the lack
of a standard mechanism for a host to discover the PMTU of an
arbitrary path. This document addresses this shortcoming by
applying a mechanism demonstrated to be effective on IP-based
internets.

ISO/IEC 8473 indicates that the minimum subnetwork service
data unit size an underlying service must offer to CLNP is 512
octets. This is as close as OSI comes to specifying a host
requirement on what is referred to in Internet literature as
a maximum segment size (MSS, [5]). The current practice in
CLNP-based internets is to use the smaller of 512 and the
first-hop MTU as the PMTU for any destination that is not
connected to the same subnetwork as the source. This often
results in the use of smaller CLNP datagrams than necessary,
because it is increasingly the case that paths supporting
CLNP offer a PMTU greater than 512. As is the case with IP, a
host that sends CLNP datagrams smaller than the Path MTU
allows wastes Internet resources, and applications operating
on that host receive suboptimal throughput.
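
As a rough illustration of the cost of current practice, the
sketch below compares the number of datagrams needed to carry
the same user data at 512 octets and at a larger PMTU. The
function names are hypothetical, and the per-datagram header
allowance of 74 octets (TCP plus CLNP headers) is an
assumption borrowed from the TUBA figures used later in this
memo; a pure-stack OSI host would use a different allowance.

```python
CLNP_MIN_SNSDU = 512    # minimum SNSDU size attributed to ISO/IEC 8473
HEADER_ALLOWANCE = 74   # assumption: TCP + CLNP header octets (TUBA)


def current_practice_size(first_hop_mtu, same_subnetwork):
    """Datagram size chosen under current practice (no PMTU discovery)."""
    if same_subnetwork:
        return first_hop_mtu
    return min(CLNP_MIN_SNSDU, first_hop_mtu)


def datagrams_needed(user_octets, datagram_size,
                     header_octets=HEADER_ALLOWANCE):
    """Number of datagrams needed to carry user_octets of data."""
    payload = datagram_size - header_octets
    if payload <= 0:
        raise ValueError("datagram size leaves no room for user data")
    return -(-user_octets // payload)  # ceiling division


size = current_practice_size(4352, same_subnetwork=False)
print(size, datagrams_needed(65536, size), datagrams_needed(65536, 1500))
```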

Future routing protocols may be required to provide accurate
PMTU information within a routing domain, although perhaps
not across multi-level routing hierarchies. Like IP networks,
CLNP-based networks need a simple mechanism that discovers
PMTUs without wasting resources within a routing domain, and
in interdomain communications exchanges as well. The
mechanism described here should serve the community until
(and perhaps beyond) such time as routing protocol extensions
are developed and deployed.


2. Protocol overview

The technique of using the Don't Fragment (DF) bit in the IP
header to dynamically discover the PMTU of an IP path is
easily extended to CLNP by using the Segmentation Permitted
(SP) flag in the CLNP header. The basic idea from RFC 1191
extends to CLNP in the following manner. A source CLNP host
initially assumes that the MTU of a path is the (known) MTU
of its first hop, and sends all datagrams on that path with
the SP flag set to zero. If any of the datagrams are too
large to be forwarded without fragmentation by some router
along the path, that router will discard them and return a
CLNP Error Report message with the Reason for Discard
parameter set to the value indicating "segmentation needed
but not permitted". Upon receipt of such a message
(consistent with RFC 1191, this is referred to as a "Datagram
Too Big" message), the source host reduces its assumed PMTU
for the path. Since the mechanism relies on the generation of
an Error Report message by a router along the path, hosts
MUST NOT set the Suppress Error Report flag in CLNP headers
when attempting Path MTU discovery.

The PMTU discovery process ends when a host's estimate of the
PMTU is low enough that its datagrams can be delivered
without fragmentation. Alternatively, the host could end the
discovery process by setting the SP flag to one in the
datagram headers; it could do so, for example, because it is
willing to have datagrams fragmented in some circumstances.
Normally, the host continues to set the SP flag to zero in
all datagrams, so that if the route changes and the new PMTU
is lower, the lower PMTU will be discovered.

The Datagram Too Big message as originally specified in ICMP
[7] did not report the MTU of the hop for which the rejected
datagram was too big; the CLNP Error Report fails in this
regard as well, so again, the source host cannot tell exactly
how much to reduce its assumed PMTU given the information
returned in the Error Report message. To remedy this, a new
option is defined for CLNP, the Next-Hop-MTU option, which
shall have the same semantics as the corresponding parameter
in the ICMP header as specified in RFC 1191; i.e., this field
shall be used to report the MTU of what RFC 1191 refers to as
the "constricting (next) hop". This is the only change needed
for routers to fully support CLNP PMTU Discovery.
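
The exchange described in this section can be sketched as a
small simulation. It assumes every router on the path reports
the Next-Hop-MTU; the function name and the list-based model
of the path are hypothetical and serve only as illustration.

```python
def discover_pmtu(first_hop_mtu, router_next_hop_mtus):
    """Simulate discovery with SP = 0 on a path of modified routers.

    router_next_hop_mtus lists, in path order, the MTU of the hop
    onto which each router forwards. Returns the discovered PMTU
    and the number of Datagram Too Big messages received.
    """
    estimate = first_hop_mtu
    too_big_messages = 0
    while True:
        # The first router whose next hop cannot carry the datagram
        # discards it and reports that hop's MTU.
        blocking = next(
            (mtu for mtu in router_next_hop_mtus if mtu < estimate), None)
        if blocking is None:
            return estimate, too_big_messages
        too_big_messages += 1
        estimate = blocking  # the source lowers its assumed PMTU


print(discover_pmtu(4352, [1500, 1006, 1500]))  # prints (1006, 2)
```

Because each Datagram Too Big message strictly lowers the
estimate, the loop terminates after at most one message per
distinct constricting hop.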

The PMTU of a path may change over time, due to changes in
the routing topology. Reductions of the PMTU are indicated by
Datagram Too Big messages.


Hosts that choose to implement MTU discovery and cease the
process by setting the SP flag to one change the composition
of the CLNP header (by forcing the addition of a segmentation
part). RFC 1191 suggests that IP hosts that implement MTU
discovery will normally continue to set the DF bit in all
datagrams to detect PMTU changes resulting from routing
changes; it is STRONGLY RECOMMENDED that under the same
circumstances, CLNP hosts follow suit and continue to
transmit datagrams in the discovery mode.

To detect increases in a PMTU, a host may periodically
increase its assumed PMTU.  As is the case with IPv4, this
will almost always result in CLNP datagrams being discarded
and Datagram Too Big messages being generated, because in
most cases the PMTU of the path will not have changed, so the
increase "probe" should be done infrequently.

Note: this mechanism essentially guarantees that a CLNP host
will not receive any fragments from a peer doing PMTU
Discovery, so if hosts continue to operate in MTU discovery
mode, it will aid in interoperating with "segmentation-
challenged" hosts; i.e., hosts that are unable to reassemble
fragmented datagrams as a result of having implemented the
non-segmenting subset rather than the full version of CLNP.
(These are also distinguished from the "data transfer-
challenged" hosts that only implemented the inactive network
layer protocol.)


3. Host specification

When a host receives a Datagram Too Big message, it MUST
reduce its estimate of the PMTU for the relevant path, based
on the value of the Next-Hop-MTU field in the Error Report
message (see section 4). The precise behavior of a host in
this circumstance is not specified here, since different
applications may have different requirements, and different
implementation architectures may favor different strategies.

After receiving a Datagram Too Big message, a host MUST avoid
eliciting more such messages in the near future. The host has
two choices: (a) reduce the size of the datagrams it sends
along the path as it receives MTU information from the
routers, or (b) set the SP flag to one in the CLNP header and
use segmentation. A host MUST force the PMTU Discovery
process to converge.

Hosts performing PMTU Discovery MUST detect decreases in Path
MTU as fast as possible. Hosts MAY detect increases in PMTU,
but since (a) doing so requires sending datagrams larger than
the current estimated PMTU, and (b) it is likely that the
PMTU will not have increased, this MUST be done at infrequent
intervals. Consistent with RFC 1191 recommendations for IP,
an attempt to detect an increase by sending a CLNP datagram
larger than the current estimate MUST NOT be done less than 5
minutes after a Datagram Too Big message has been received
for the given destination, or less than one minute after a
previous, successful attempted increase. The recommended
setting of these timers is twice their minimum values (10 and
2 minutes, respectively).
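
One possible way to enforce these intervals is sketched below.
The function and parameter names are hypothetical; times are
in seconds, and the defaults are the recommended settings of
twice the minimum values.

```python
MIN_AFTER_TOO_BIG = 5 * 60    # minimum wait after a Datagram Too Big
MIN_AFTER_INCREASE = 1 * 60   # minimum wait after a successful increase
REC_AFTER_TOO_BIG = 2 * MIN_AFTER_TOO_BIG     # recommended: 10 minutes
REC_AFTER_INCREASE = 2 * MIN_AFTER_INCREASE   # recommended: 2 minutes


def may_probe(now, last_too_big=None, last_increase=None,
              after_too_big=REC_AFTER_TOO_BIG,
              after_increase=REC_AFTER_INCREASE):
    """Return True if an increase probe may be sent at time now."""
    if after_too_big < MIN_AFTER_TOO_BIG:
        raise ValueError("timer is below the 5 minute minimum")
    if after_increase < MIN_AFTER_INCREASE:
        raise ValueError("timer is below the 1 minute minimum")
    if last_too_big is not None and now - last_too_big < after_too_big:
        return False
    if last_increase is not None and now - last_increase < after_increase:
        return False
    return True
```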

Hosts MUST be able to deal with "pre-PMTU discovery" Error
Reports, since it is not feasible to upgrade all the routers
in an internet in any finite time. These are distinguished
from new Error Report messages because they contain a Reason
for Discard parameter indicating that "segmentation is needed
but not permitted", but DO NOT contain the Next-Hop-MTU
parameter (see section 4). Section 5 discusses possible
strategies a host may follow in response to an old-style
Datagram Too Big message (one sent by an unmodified router).

RFC 1191 specifies that a host MUST NOT reduce its estimate
of the PMTU below 68 octets (the size that RFC 791 requires
every IP module to forward without fragmentation: an IPv4
header of up to 60 octets plus a minimum fragment of 8
octets). CLNP implementations should not allow the MTU size
to be configured to be less than 512 octets. A CLNP host
SHOULD NOT reduce its estimate of the PMTU below 512 octets.

Note: this is preferred over insisting that a TUBA host never
reduce its estimate of the Path MTU below 82 octets,
hereafter referred to as "the Somewhat-less-than-official
CLNP minimum MTU"; the value of 82 guarantees that a minimum
of 8 octets of user data can be transmitted, given a TCP
header of 20 octets, and assuming a CLNP header composed of a
fixed part (9 octets), address part (42 octets), and a
padding parameter of 3 octets.

A host MUST NOT increase its estimate of the Path MTU in
response to the contents of a Datagram Too Big message. A
message purporting to announce an increase in the Path MTU
might be a stale datagram that has been floating around in
the Internet, a false packet injected as part of a denial-of-
service attack, or the result of having multiple paths to the
destination.
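
The two rules above (never go below 512 octets, never increase
in response to a Datagram Too Big message) can be combined in
a short sketch. It covers only messages that carry a
Next-Hop-MTU value, and the function name is hypothetical.

```python
CLNP_MIN_PMTU = 512  # floor on the PMTU estimate for CLNP hosts


def updated_estimate(current_estimate, next_hop_mtu):
    """Return the new PMTU estimate after a Datagram Too Big message."""
    # Do not let a reported value pull the estimate below the floor.
    candidate = max(next_hop_mtu, CLNP_MIN_PMTU)
    # Never increase the estimate in response to such a message.
    return min(current_estimate, candidate)


print(updated_estimate(1500, 1006))  # prints 1006
print(updated_estimate(1006, 4352))  # prints 1006
```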

3.1. TCP MSS Option

A host performing CLNP PMTU Discovery must obey the rule that
it not send datagrams larger than 512 octets unless it has
permission from the receiver. For TCP connections, this means
that a CLNP host must not send datagrams larger than 74
octets plus the Maximum Segment Size (MSS) sent by its peer.

Note: In RFC 879, the TCP MSS is defined to be the relevant
IP datagram size minus 40, where 40 represents what is
referred to as the "liberal or optimistic" assumption
regarding TCP and IP header size (20 octets each); the
default of 576 octets for the maximum IP datagram size in
this scenario yields a default of 536 octets for the TCP MSS.
Using CLNP, with a correspondingly liberal and optimistic
assumption about CLNP header size (52 octets), the default
CLNP MSS of 512 octets yields a default of 440 octets for the
TCP MSS.

Hosts SHOULD NOT lower the value they send in the MSS option;
doing so prevents the PMTU Discovery mechanism from
discovering PMTUs larger than the default TCP MSS. For
TUBA/CLNP hosts, the TCP MSS option should be 74 octets less
than the size of the largest datagram the host is able to
reassemble (MMS_R, as defined in RFC 1122 [8]). In many
cases, this will be the architectural limit of 65461 (65535 -
74) octets. A host MAY send an MSS value derived from the MTU
of its connected network (the maximum MTU over its connected
networks, for a multi-homed host); this should not cause
problems for PMTU Discovery, and may dissuade a broken peer
from sending enormous datagrams.

Note: RFC 1191 recommends that hosts refrain from sending an
MSS greater than the architectural limit of 65535 minus the
IP header size. This recommendation applies for TUBA/CLNP
hosts as well (i.e., do not use a value greater than 65461).

4. Router specification

When a router is unable to forward a datagram because (a) the
datagram length exceeds the MTU of the next-hop network, (b)
the SP flag is set to zero in the datagram header, indicating
that segmentation may not be performed on this datagram, and
(c) the Suppress Error Reports flag is reset, the router MUST
attempt to return an Error Report message to the source of
the datagram, with the Reason for Discard parameter code set
to indicate "segmentation required but not permitted".
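
The forwarding decision described above might be modeled as in
the sketch below. The enumeration, the function name, and the
boolean errors_suppressed parameter are hypothetical
simplifications of the header flags, not definitions taken
from ISO/IEC 8473.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward unchanged"
    SEGMENT = "segment and forward"
    DISCARD_WITH_ERROR_REPORT = "discard and send Error Report"
    DISCARD_SILENTLY = "discard without Error Report"


def router_action(datagram_len, next_hop_mtu, sp_flag, errors_suppressed):
    """Decide what a router does with a datagram bound for a next hop."""
    if datagram_len <= next_hop_mtu:
        return Action.FORWARD
    if sp_flag == 1:
        # Segmentation is permitted, so no discovery feedback is needed.
        return Action.SEGMENT
    if errors_suppressed:
        return Action.DISCARD_SILENTLY
    # "segmentation required but not permitted"
    return Action.DISCARD_WITH_ERROR_REPORT


print(router_action(1500, 1006, sp_flag=0, errors_suppressed=False).value)
```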

To support the Path MTU Discovery technique specified in this
memo, a router MUST include the MTU of the constricting next-
hop network in a new Next-Hop-MTU parameter in the Error
Report header. The format of the Next-Hop-MTU parameter is
illustrated in Figure 4.1.

        0          1          2          3
        01234567 89012345 67890123 45678901
       +--------+--------+--------+--------+
       |   Code | Length |   (value of)    |
       |11000010|  (4)   |  Next-Hop-MTU   |
       +--------+--------+--------+--------+

   Figure 4.1. Next-Hop-MTU parameter for CLNP

The value carried in the Next-Hop-MTU field is the size in
octets of the largest CLNP datagram that could be forwarded,
along the path of the original datagram, without being
fragmented at this router. The size includes the CLNP header
and CLNP data, and does not include any lower level headers.

This field MUST NOT contain a value less than 512, since
every router must be able to forward a datagram of 512 octets
without fragmentation.
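
A minimal sketch of encoding and decoding the parameter shown
in Figure 4.1 follows. Two points are assumptions rather than
statements of this memo: the 16-bit value is taken to be in
network (big-endian) byte order, and the length octet is
copied from the figure as 4. The function names are
hypothetical.

```python
import struct

NEXT_HOP_MTU_CODE = 0b11000010   # parameter code from Figure 4.1
LENGTH_OCTET = 4                 # assumption: value shown in Figure 4.1
MIN_NEXT_HOP_MTU = 512


def encode_next_hop_mtu(mtu):
    """Build the 4-octet Next-Hop-MTU parameter."""
    if not MIN_NEXT_HOP_MTU <= mtu <= 0xFFFF:
        raise ValueError("Next-Hop-MTU must be in the range 512..65535")
    # Assumption: network byte order for the 16-bit value.
    return struct.pack("!BBH", NEXT_HOP_MTU_CODE, LENGTH_OCTET, mtu)


def decode_next_hop_mtu(parameter):
    """Return the MTU carried in a 4-octet Next-Hop-MTU parameter."""
    code, _length, mtu = struct.unpack("!BBH", parameter)
    if code != NEXT_HOP_MTU_CODE:
        raise ValueError("not a Next-Hop-MTU parameter")
    return mtu


print(encode_next_hop_mtu(1500).hex())  # prints c20405dc
```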


5. Host processing of old-style Error Report messages

RFC 1191 outlines several possible strategies a host may
follow upon receiving a Datagram Too Big message from a
router that has not implemented the next-hop-MTU parameter.
This section describes the strategies as they apply to
TUBA/CLNP hosts; however, the discussion here is limited to
the strategies that RFC 1191 identifies as tractable.

This section is not part of the protocol specification.

The simplest thing for a CLNP host to do in response to a
Datagram Too Big message is to assume that the PMTU is the
minimum of its currently-assumed PMTU and 512, and to stop
setting the SP flag to zero in datagrams sent on that path.
Thus, the host falls back to the same PMTU as it would choose
under current practice. This strategy terminates quickly and
does
no worse than existing practice, but it fails to avoid
fragmentation in some cases, and fails to make the most
efficient utilization of the internetwork in other cases.

More sophisticated strategies involve "searching" for an
accurate PMTU estimate, by continuing to send datagrams with
the SP flag reset while varying datagram sizes.

A good search strategy is one that obtains an accurate
estimate of the Path MTU without causing many packets to be
lost in the process. Several strategies apply algorithmic
functions to the previous PMTU estimate to generate a new
estimate.

The strategy recommended in RFC 1191 for IP applies to CLNP
hosts. It begins with the assumption that there are
relatively few MTU values in use in the Internet, so the
search can be constrained to include only the MTU values that
are likely to appear. In RFC 1191, Mogul and Deering make the
additional assumption that designers tend  to choose MTUs in
similar ways, so they collect groups of similar MTU values
and use the lowest value in the group as a search "plateau",
suggesting that it is better to underestimate an MTU by a few
per cent than to overestimate it by even one octet.

Section 7 provides a table of representative MTU plateaus for
use in PMTU estimation, derived from RFC 1191, but extended
to include technologies that have emerged since its
publication. With this table, convergence is as good as
binary search in the worst case, and is far better in common
cases. Since the plateaus lie near powers of two, if an MTU
is not represented in this table, the algorithm will not
underestimate it by more than a factor of 2.

In RFC 1191, Mogul and Deering note that any search strategy
must have some "memory" of previous estimates in order to
choose the next one, and suggest that the information
available in the Datagram Too Big message itself can be used
for this purpose. Like ICMP Destination Unreachable messages,
all CLNP Error Report messages contain the header of the
original datagram, which contains the Total Length of the
datagram that was too big to be forwarded without
fragmentation (note that when the SP flag is reset, the total
length of the CLNP datagram is recorded in the Segment Length
field). Since this Total Length may be less than the current
PMTU estimate, but is nonetheless larger than the actual
PMTU, it may be a good input to the method for choosing the
next PMTU estimate.

The strategy recommended for IP in RFC 1191, and for CLNP in
this document, is to use as the next PMTU estimate the
greatest plateau value that is less than the returned Total
Length field.
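
The plateau rule can be sketched as below. The plateau values
shown are an illustrative subset, taken from the RFC 1191
plateau table and limited to values above the 512-octet CLNP
floor; the table in Section 7 of this memo is the
authoritative list. The function name is hypothetical.

```python
# Illustrative subset of plateaus, in descending order (assumption:
# drawn from RFC 1191; Section 7 of this memo governs).
PLATEAUS = (65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006)
CLNP_MIN_PMTU = 512


def next_estimate(total_length):
    """Return the greatest plateau strictly below total_length,
    never going below the 512-octet floor."""
    for plateau in PLATEAUS:
        if plateau < total_length:
            return plateau
    return CLNP_MIN_PMTU


print(next_estimate(4352))  # prints 2002
print(next_estimate(1500))  # prints 1492
```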


6. Host implementation

In RFC 1191, Mogul and Deering discuss how PMTU Discovery is
implemented in host software. Those aspects of the discussion
that are applicable to CLNP MTU Discovery are discussed here.
The issues include:

- What layer or layers implement PMTU Discovery?
- Where is the PMTU information cached?
- How is stale PMTU information removed?
- What must transport and higher layers do?

6.1. Layering

In the IP architecture, the choice of what size datagram to
send is made by a transport or higher layer protocol, i.e., a
layer above IP. Mogul and Deering call such protocols
"packetization protocols". They explain how implementing PMTU
Discovery in the packetization layers simplifies some of the
inter-layer issues but has several drawbacks, and conclude
that the IP layer should store PMTU information and that the
ICMP layer should process received Datagram Too Big messages.

In the OSI architecture, the functions ascribed to ICMP and
IP are both provided in the same (connectionless network)
layer. The division of function between the packetization
and network layers changes slightly. The packetization layers
must still
respond to changes in the Path MTU by changing the size of
the datagrams they send, and must also be able to specify
when datagrams are to be sent with the SP flag reset. (As is
the case with IP, the network (CLNP) layer does not simply
reset the SP bit in every packet, since it is possible that a
packetization layer, perhaps a UDP or application outside the
kernel, is unable to change its datagram size.)

To support this layering in IP, packetization layers require
an extension of the network service interface defined in [8];
for CLNP, this is similarly described as follows:

A way to learn of changes in the value of MMS_S, the "maximum
send transport-message size", which is derived from the Path
MTU by subtracting the minimum CLNP header size (52 octets).

Applying the OSI service model, this interaction might take
the form of an OSI network service primitive; i.e., an
N-MSS_S-CHANGE.indication. (For completeness, one may wish to
extend the N-UNITDATA.request primitive in [9] to enable
transport-entities to signal that the SP flag is to be
reset.)

6.2. Storing PMTU information

The general guidelines for storing PMTU information are the
same for CLNP as IP. The network (CLNP) layer should
associate each PMTU value that it has learned with a specific
path, identified by a source address, a destination address,
a CLNP quality-of-service, and if implemented, a security
classification. This association can be stored as a field in
the routing table entries. A host will not have a route for
every possible destination, but it should be able to cache a
per-host route for every active destination. This requirement
is already imposed by the need to process ES-IS Redirect
messages [10].

Mogul and Deering describe PMTU storing guidelines for IP,
which also apply to CLNP. When the first packet is sent to a
host for which no per-host route exists, a route is chosen
either from the set of per-network routes, or from the set of
default routes. The PMTU fields in these route entries should
be initialized to be the MTU of the associated first-hop data
link, and must never be changed by the PMTU Discovery
process. (PMTU Discovery only creates or changes entries for
per-host routes.) Until a Datagram Too Big message is
received, the PMTU associated with the initially-chosen route
is presumed to be accurate.

When a Datagram Too Big message is received, the network
layer determines a new estimate for the Path MTU (either from
a non-zero Next-Hop-MTU value in the Error Report message, or
using the method described in section 5). If a per-host route
for this path does not exist, then one is created (as if a
per-host ES-IS Redirect is being processed; the new route
uses the same first-hop router as the current route). If the
PMTU estimate associated with the per-host route is higher
than the new estimate, then the value in the routing entry is
changed.
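
The storage rules above might be realized as in the following
sketch. The class and field names are hypothetical, and a
plain dictionary stands in for the per-host entries of a real
routing table; per-network and default routes are represented
only by the first-hop MTU passed in by the caller, which this
sketch never modifies.

```python
from typing import NamedTuple, Optional


class PathKey(NamedTuple):
    """Identifies a path as described in section 6.2."""
    source: str
    destination: str
    qos: Optional[str] = None
    security: Optional[str] = None


class PmtuCache:
    def __init__(self):
        self._per_host = {}   # PathKey -> PMTU estimate

    def lookup(self, key, first_hop_mtu):
        """Return the per-host estimate, or the first-hop MTU if no
        per-host route exists yet."""
        return self._per_host.get(key, first_hop_mtu)

    def on_datagram_too_big(self, key, first_hop_mtu, new_estimate):
        """Create the per-host entry if needed and lower it if the new
        estimate is smaller. Returns True if the estimate decreased."""
        current = self._per_host.setdefault(key, first_hop_mtu)
        if current > new_estimate:
            self._per_host[key] = new_estimate
            return True
        return False
```

A return value of True is the point at which the packetization
layers using the path would be notified of the decrease.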

The packetization layers must be notified about decreases in
the PMTU (for example, through an implementation equivalent
of the primitive earlier described). Any packetization layer
instance (for example, a TCP connection) that is actively
using the path must be notified if the PMTU estimate is
decreased. Even if the Datagram Too Big message contains an
original datagram header that refers to a UDP packet, the TCP
layer must be notified if any of its connections use the
given path. (The same would be true for CLTP and TP-4
connections in OSI internets.)

The packetization layer instance that sent the CLNP datagram
that elicited the Datagram Too Big message should be notified
that its datagram has been dropped, even if the PMTU estimate
has not changed, so that it may retransmit the dropped
datagram. This notification can be asynchronously generated
by the network (CLNP) layer, or the notification can be
postponed until the packetization instance next attempts to
send a CLNP datagram larger than the PMTU estimate. In the
latter approach, if one assumes that an N-UNITDATA.request is
used to model the request to send a datagram, and the
primitive is extended to include the ability to twiddle the
SP flag, and the datagram is larger than the PMTU estimate,
the send function should fail and return a suitable error
indication. In RFC 1191, Mogul and Deering suggest that this
approach may be more suitable to a connectionless
packetization layer (such as one using UDP), which may be
hard to "notify" from the ICMP (or network) layer; this
should not be the case for CLNP; if it is, however, the
normal timeout-based retransmission mechanisms would be used
to recover from the dropped datagrams.

Mogul and Deering are careful to note that the notification
to the packetization layer instances using the path about the
change in the PMTU is distinct from the notification of a
specific instance that a packet has been dropped. The latter
should be done as soon as practical (i.e., asynchronously from
the point of view of the packetization layer instance), while
the former may be delayed until a packetization layer
instance wants to create a packet. Retransmission should be
done for only those packets that are known to be dropped, as
indicated by a Datagram Too Big message. This applies to CLNP
Path MTU discovery for TUBA/CLNP environments as well.

6.3. Purging stale PMTU information

RFC 1191 provides guidelines for aging PMTU information.
Similar guidelines apply for TUBA/CLNP MTU discovery.

Because (under normal circumstances) a host performing CLNP
PMTU Discovery always resets the SP bit, a stale PMTU value
(one that is too large) will be discovered almost immediately
once a datagram is sent to the given destination. No such
mechanism exists for determining that a stored PMTU value is
too small, so an implementation SHOULD "age" cached PMTU
values. When a PMTU value has not been decreased for some
time (on the order of 10 minutes), the PMTU estimate SHOULD
be set to the first-hop data-link MTU, and the packetization
layers should be notified of the change. This will cause the
complete PMTU Discovery process to take place again.

Note: an implementation should provide a means for changing
the timeout duration, including setting it to "infinity". In
RFC 1191, Mogul and Deering cite the example of hosts
attached to an FDDI network, which is then attached to the
rest of the Internet via a slow serial line; such hosts will
never discover a larger, non-local PMTU, so they should not
be subjected to dropped datagrams every 10 minutes.

An upper layer MUST NOT retransmit datagrams in response to
an increase in the PMTU estimate, since this increase never
comes in response to an indication of a dropped datagram.

RFC 1191 and this memo recommend that PMTU aging be
implemented by adding a timestamp field to the routing table
entry. This field SHOULD be initialized to a "reserved" value
that indicates that the PMTU has never been changed. Whenever
the PMTU is decreased in response to a Datagram Too Big
message, the timestamp is set to the current time. Once a
minute thereafter, a timer-driven procedure should run
through the routing table, and for each entry whose timestamp
is not "reserved" and is older than the timeout interval,

- set the PMTU estimate to the MTU of the associated first
  hop

- notify the packetization layers using this route of the
  increase.
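
The timer-driven procedure can be sketched as follows. The
names are hypothetical. None stands for the "reserved"
timestamp value, and a timeout of None stands for "infinity".
Returning the timestamp to the reserved value after a reset is
an assumption made here so that an entry is not reset again on
every subsequent run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouteEntry:
    first_hop_mtu: int
    pmtu: int
    timestamp: Optional[float] = None   # None means "reserved"


def record_decrease(entry, new_pmtu, now):
    """Apply a decrease caused by a Datagram Too Big message."""
    entry.pmtu = new_pmtu
    entry.timestamp = now


def age_routes(routes, now, timeout=600.0):
    """Run once a minute. Resets stale estimates and returns the keys
    whose packetization layers should be notified of the increase."""
    if timeout is None:      # aging disabled ("infinity")
        return []
    to_notify = []
    for key, entry in routes.items():
        if entry.timestamp is None:
            continue
        if now - entry.timestamp > timeout:
            entry.pmtu = entry.first_hop_mtu
            entry.timestamp = None   # assumption: back to "reserved"
            to_notify.append(key)
    return to_notify
```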

PMTU estimates may disappear from the routing table if the
per-host routes are removed; this can happen in response to
an ES-IS Redirect message, or because certain routing-table
daemons delete old routes after several minutes. Also, on a
multi-homed host a topology change may result in the use of a
different source interface. When this happens, if the
packetization layer is not notified then it may continue to
use a cached PMTU value that is now too small. RFC 1191 and
this memo suggest that the packetization layer be notified of
a possible PMTU change whenever a Redirect message causes a
route change, and whenever a route is deleted from the
routing table.

6.4. TCP layer actions

RFC 1191 provides guidelines for TCP layers when Path MTU
discovery is being performed. Similar guidelines apply for
TUBA/CLNP MTU discovery.

The TCP layer must track the PMTU for the destination of a
connection; it should not send datagrams that would be larger
than this. A simple implementation could ask the network
(CLNP) layer for this value (using a TUBA/CLNP equivalent of
the GET_MAXSIZES interface described in [8]) each time it
created a new segment, but this could be inefficient.
Moreover, TCP implementations that follow the "slow-start"
congestion-avoidance algorithm [11] typically calculate and
cache several other values derived from the PMTU. It may be
simpler to receive asynchronous notification when the PMTU
changes, so that these variables may be updated.

A TCP implementation must also store the MSS value received
from its peer (which defaults to 440), and not send any
segment larger than this MSS, regardless of the PMTU.

When a Datagram Too Big message is received, it implies that
a datagram was dropped by the router that sent the Error
Report message. It is sufficient to treat this as any other
dropped segment, and wait until the retransmission timer
expires to cause retransmission of the segment. If the PMTU
Discovery process requires several steps to estimate the
right PMTU, this could delay the connection by many round-
trip times.

Alternatively, the retransmission could be done in immediate
response to a notification that the Path MTU has changed, but
only for the specific connection specified by the Datagram
Too Big message. The datagram size used in the retransmission
should be no larger than the new PMTU.

Note: Retransmissions MUST NOT be sent in response to every
Datagram Too Big message, since a burst of several oversized
segments will give rise to several such messages and hence
several retransmissions of the same data. Mogul and Deering
note that if the new estimated PMTU is still wrong, the
process repeats, and there is an exponential growth in the
number of superfluous segments sent.

The TCP layer must be able to recognize when a Datagram Too
Big notification actually decreases the PMTU that it has
already used to send a datagram on the given connection, and
should ignore any other notifications.
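
One way a TCP connection might filter notifications is
sketched below. The class and method names are hypothetical.
As a simplification, the sketch assumes that once a decrease
is accepted, all outstanding data is resent at the new size,
so later notifications carrying the same value are ignored.

```python
class ConnectionPmtu:
    """Per-connection PMTU state for filtering notifications."""

    def __init__(self, pmtu):
        self.pmtu = pmtu
        self.largest_sent = 0   # largest datagram size used so far

    def record_send(self, datagram_len):
        self.largest_sent = max(self.largest_sent, datagram_len)

    def on_too_big(self, new_pmtu):
        """Return True only if the notification lowers the PMTU below
        a size already used on this connection, i.e. when a
        retransmission at the new size is warranted."""
        decreases_used_size = new_pmtu < self.largest_sent
        self.pmtu = min(self.pmtu, new_pmtu)
        if not decreases_used_size:
            return False
        # Simplification: outstanding data is resent at the new size.
        self.largest_sent = new_pmtu
        return True
```

With this filter, a burst of oversized segments that elicits
several identical Datagram Too Big messages triggers a single
retransmission rather than one per message.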

Many TCP implementations now incorporate "congestion
avoidance" and "slow-start" algorithms to improve
performance [11, 12]. Unlike a retransmission caused by a TCP
retransmission timeout, a retransmission caused by a Datagram
Too Big message should not change the congestion window. It
should, however, trigger the slow-start mechanism (i.e., only
one segment should be retransmitted until acknowledgements
begin to arrive again).

TCP performance can be reduced if the sender's maximum window
size is not an exact multiple of the segment size in use
(this is not the congestion window size, which is always a
multiple of the segment size). In many systems (such as those
derived from 4.2BSD), the segment size is often set to 1024
octets, and the maximum window size (the "send space") is
usually a multiple of 1024 octets, so the proper relationship
holds by default. If PMTU Discovery is used, however, the
segment size may not be a submultiple of the send space, and
it may change during a connection; this means that the TCP
layer may need to change the transmission window size when
PMTU Discovery changes the PMTU value. The maximum window
size should be set to the greatest multiple of the segment
size (PMTU - 74) that is less than or equal to the sender's
buffer space size.
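The window computation described above amounts to a single
integer division. The sketch below is illustrative; the
function and parameter names are hypothetical, and the
default overhead of 74 octets is taken from the (PMTU - 74)
segment size given in the text.

```python
def max_window(pmtu: int, buffer_space: int, overhead: int = 74) -> int:
    """Greatest multiple of the segment size that fits in the buffer.

    The segment size is taken as (pmtu - overhead), per the text above.
    """
    segment_size = pmtu - overhead
    if segment_size <= 0:
        raise ValueError("PMTU must exceed the header overhead")
    return (buffer_space // segment_size) * segment_size
```

For example, with a PMTU of 1492 octets and 16384 octets of
buffer space, the segment size is 1418 octets and the window
becomes 11 segments, or 15598 octets.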

PMTU Discovery does not affect the value sent in the TCP MSS
option, because that value is used by the other end of the
connection, which may be using an unrelated PMTU value.

6.5. Issues for other transport protocols

Some transport protocols (such as OSI TP4 [13]) are not
allowed to repacketize when doing a retransmission. That is,
once an attempt is made to transmit a datagram of a certain
size, its contents cannot be split into smaller datagrams for
retransmission. In such a case, the original CLNP datagram
should be retransmitted without the SP flag reset, allowing
it to be fragmented as necessary to reach its destination.
Subsequent datagrams, when transmitted for the first time,
should be no larger than allowed by the Path MTU, and should
have the SP flag reset.
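A minimal sketch of this behavior follows. The Datagram type,
the function names, and the header_len parameter are
hypothetical and exist only for illustration; the sketch
assumes that a set SP flag permits segmentation and a reset
SP flag forbids it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Datagram:
    """Hypothetical CLNP datagram model; only the fields needed here."""
    payload: bytes
    sp: bool  # True = SP set (segmentation permitted), False = SP reset


def build_new(payload: bytes, pmtu: int, header_len: int) -> Datagram:
    """Build a first-time transmission sized to the Path MTU, SP reset."""
    if len(payload) + header_len > pmtu:
        raise ValueError("new datagrams must be no larger than the Path MTU")
    return Datagram(payload=payload, sp=False)


def retransmit_unchanged(original: Datagram) -> Datagram:
    """Resend the same contents with SP set so routers may segment it."""
    return replace(original, sp=True)
```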

The Sun Network File System (NFS) uses a Remote Procedure
Call (RPC) protocol [14] that, in many cases, sends datagrams
that must be fragmented even for the first-hop link. This
might improve performance in certain cases, but it is known
to cause reliability and performance problems, especially
when the client and server are separated by routers. NFS
implementations SHOULD use PMTU Discovery whenever routers
are involved. Most NFS implementations allow the RPC datagram
size to be changed at mount-time (indirectly, by changing the
effective file system block size), but might require some
modification to support changes later on.

Also, since a single NFS operation cannot be split across
several UDP datagrams, certain operations (primarily, those
operating on file names and directories) require a minimum
datagram size that may be larger than the PMTU. NFS
implementations SHOULD NOT reduce the datagram size below
this threshold, even if PMTU Discovery suggests a lower
value. (In this case datagrams should not be sent with SP
flag reset.)

6.6. Management interface

In RFC 1191, Mogul and Deering suggest that an implementation
provide a way for a system utility program to:

-  Specify that PMTU Discovery not be done on a given route

-  Change the PMTU value associated with a given route

The former can be accomplished by associating a flag with the
routing entry; when a packet is sent via a route with this
flag set, the IP layer leaves the DF bit clear no matter what
the upper layer requests. The same can be provided for CLNP
PMTU discovery; when a packet is sent via a route with a
"suppress PMTU discovery" flag set, the network (CLNP) layer
sends the datagram with the SP flag set, so that it may be
segmented, irrespective of upper layer requests.

The implementation should also provide a way to change the
timeout period for aging stale PMTU information.
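One way such a management interface might be modeled is shown
below. This is a sketch under stated assumptions, not a
prescribed design: the RouteEntry fields and the sp_flag
function are hypothetical, the 600-second default is only the
example value used in section 7.1, and the sketch assumes,
consistent with section 6.5, that a datagram sent with the SP
flag set may be segmented, so suppressing discovery means
always permitting segmentation.

```python
from dataclasses import dataclass


@dataclass
class RouteEntry:
    """Hypothetical routing entry carrying per-route PMTU settings."""
    pmtu: int                              # may be changed by a utility
    suppress_pmtu_discovery: bool = False  # per-route "do not discover"
    pmtu_age_timeout_s: float = 600.0      # adjustable aging period


def sp_flag(route: RouteEntry, upper_layer_wants_discovery: bool) -> bool:
    """Return True if the datagram should carry SP set.

    When the route suppresses PMTU Discovery, segmentation is always
    permitted, whatever the upper layer asked for.
    """
    if route.suppress_pmtu_discovery:
        return True
    return not upper_layer_wants_discovery
```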


7. Likely values for Path MTUs

The algorithm recommended in section 5 for "searching" the
space of Path MTUs is based on a table of values that
severely restricts the search space. In RFC 1191, Mogul and
Deering describe a table of MTU values that represented all
major data-link technologies then in use in the Internet.
Table 7-1 in this document revises that table to include
technologies that have been introduced to the Internet since
the publication of RFC 1191.

The author has also removed technologies that seem unlikely
transmission media for CLNP; notably, SLIP, WIDEBAND,
1822/ARPANET, Experimental Ethernets, and ARCNET.

Table 7-1 lists data links in order of decreasing MTU, and
groups them so that each set of similar MTUs is associated
with a "plateau" equal to the lowest MTU in the group. As
indicated in RFC 1191, the values in the table, especially
for higher MTU levels, will not remain valid forever; they
are presented here as an implementation suggestion, NOT as a
specification or requirement. Implementors should use up-to-
date references to pick a set of plateaus. It is important
that the table not contain too many entries or the process of
searching for a PMTU might waste Internet resources.
Implementors should also make it convenient for customers
without source code to update the table values in their
systems.

    Plateau     MTU      Comments                  Reference
    ------      ---      --------                  ---------
                65535    Official maximum MTU      RFC 791
                65535    Official maximum NSDU     ISO 8348
                65535    Hyperchannel              RFC 1044
    65535
    32000                Just in case
                17914    16Mb IBM Token Ring       (RFC 1191)
    17914
                 9180    SMDS                      RFC 1209
                 9180    ATM over AAL5             RFC iiii
    9180
                 8166    IEEE 802.4                RFC 1042
    8166
                 4464    IEEE 802.5 (4Mb max)      RFC 1042
                 4352    FDDI (Revised)            RFC 1188
    4352
                 2002    IEEE 802.5 (4Mb)          RFC 1042
    2002
                 1600    Frame Relay (recommended) RFC 1490
                 1600    X.25 Networks             RFC 1356
                 1500    Ethernet Networks         RFC  894
                 1500    Point-to-Point (default)  RFC 1548
                 1492    IEEE 802.3                RFC 1042
    1492
                  512    NETBIOS                   RFC 1088
                  512    Minimum SNSDU size        ISO 8473
    512

Table 7-1: CLNP MTUs in the Internet
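The plateau column of Table 7-1 can be held as a short
ordered list and searched for the next value below a given
datagram size. The sketch below is illustrative only; the
names PLATEAUS and next_lower_plateau are hypothetical, and
an implementation would load the values from an updatable
table rather than hard-code them.

```python
# Plateau values copied from the Plateau column of Table 7-1,
# highest first.
PLATEAUS = (65535, 32000, 17914, 9180, 8166, 4352, 2002, 1492, 512)


def next_lower_plateau(size: int) -> int:
    """Return the greatest plateau strictly below `size`.

    If no plateau is smaller, return the lowest plateau, since the
    estimate is never taken below the minimum.
    """
    for plateau in PLATEAUS:
        if plateau < size:
            return plateau
    return PLATEAUS[-1]
```

For instance, a 1500-octet datagram that proved too big would
be followed by an estimate of 1492 octets, and a 4352-octet
datagram by an estimate of 2002 octets.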

7.1. A better way to detect PMTU increases

Rather than detecting increases in the PMTU value by
periodically increasing the PMTU estimate to the first-hop
MTU, it is possible to periodically increase a PMTU estimate
to the lesser of the next-highest value in the plateau table
or the first-hop MTU. If the increased estimate is wrong, at
most one round-trip time is wasted before the correct value
is rediscovered. If the increased estimate is still too low,
a higher estimate will be attempted somewhat later.

Because it may take several such periods to discover a
significant increase in the PMTU, a short timeout period
should be used after the estimate is increased, and a longer
timeout should be used after the PMTU estimate is decreased
because of a Datagram Too Big message. For example, after the
PMTU estimate is decreased, the timeout should be set to 10
minutes; once this timer expires and a larger MTU is
attempted, the timeout can be set to a much smaller value
(say, 2 minutes). In no case should the timeout be shorter
than the estimated round-trip time, if this is known.
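The stepwise increase and the two timeout periods can be
sketched as follows. This is an illustration under stated
assumptions: the function and constant names are
hypothetical, the plateau values are those of Table 7-1, and
the 10-minute and 2-minute periods are the example values
given above rather than required settings.

```python
import bisect
from typing import Optional

# Plateau values from Table 7-1, lowest first for bisect.
PLATEAUS = [512, 1492, 2002, 4352, 8166, 9180, 17914, 32000, 65535]

TIMEOUT_AFTER_DECREASE_S = 600.0  # example value: 10 minutes
TIMEOUT_AFTER_INCREASE_S = 120.0  # example value: 2 minutes


def next_probe_size(current_pmtu: int, first_hop_mtu: int) -> int:
    """Lesser of the next-highest plateau and the first-hop MTU."""
    i = bisect.bisect_right(PLATEAUS, current_pmtu)
    next_plateau = PLATEAUS[i] if i < len(PLATEAUS) else PLATEAUS[-1]
    return min(next_plateau, first_hop_mtu)


def probe_timeout(last_change_was_increase: bool,
                  rtt_estimate_s: Optional[float] = None) -> float:
    """Timeout before the next increase attempt, never below the RTT."""
    if last_change_was_increase:
        timeout = TIMEOUT_AFTER_INCREASE_S
    else:
        timeout = TIMEOUT_AFTER_DECREASE_S
    if rtt_estimate_s is not None:
        timeout = max(timeout, rtt_estimate_s)
    return timeout
```

Starting from an estimate of 1492 octets on a host whose
first-hop MTU is 4352 octets, successive probes would try
2002 and then 4352 octets, and would not go higher.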


8. Security considerations

The CLNP Path MTU Discovery mechanism is vulnerable to the
same two denial-of-service attacks as IP Path MTU Discovery
[1]. Both attacks are based on a malicious party sending
false Datagram Too Big messages to an Internet host. The
description of these attacks is repeated here.

In the first attack, the false message indicates a PMTU much
smaller than reality. This should not entirely stop data
flow, since the victim host should never set its PMTU
estimate below the absolute minimum. Since the minimum MTU is
512, this has less impact than with IP but is nonetheless
intrusive.

In the other attack, the false message indicates a PMTU
greater than reality. If believed, this could cause temporary
blockage as the victim sends datagrams that will be dropped
by some router. Within one round-trip time, the host would
discover its mistake (receiving Datagram Too Big messages
from that router), but frequent repetition of this attack
could cause lots of datagrams to be dropped. A host, however,
should never raise its estimate of the PMTU based on a
Datagram Too Big message, so should not be vulnerable to this
attack.
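Both defenses reduce to two checks when a reported value is
applied to the estimate. The sketch below is illustrative;
the function name and the minimum_mtu parameter are
hypothetical, with the default of 512 octets taken from the
minimum MTU stated above.

```python
def apply_reported_pmtu(current: int, reported: int,
                        minimum_mtu: int = 512) -> int:
    """Return the new PMTU estimate after a Datagram Too Big message.

    The estimate is never raised by such a message, and is never set
    below the absolute minimum.
    """
    if reported >= current:
        return current  # a message may only lower the estimate
    return max(reported, minimum_mtu)
```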

A malicious party could also cause problems if it could stop
a victim from receiving legitimate Datagram Too Big messages,
but in this case there are simpler denial-of-service attacks
available.


References

[1]  Mogul, J., and S. Deering. Path MTU Discovery, RFC 1191,
     Internet Network Information Center, November 1990.

[2]  ISO/IEC 8473-1992. ISO - Data Communications - Protocol
     for Providing the Connectionless Network Service,
     Edition 2.

[3]  Postel, J. Internet Protocol. RFC 791, Internet Network
     Information Center, September 1981.

[4]  Kent, C., and J. Mogul. Fragmentation Considered
     Harmful. Proc. SIGCOMM '87 Workshop on Frontiers in
     Computer Communications Technology. August, 1987.

[5]  Postel, J. The TCP Maximum Segment Size and Related
     Topics. RFC 879, Internet Network Information Center,
     Nov. 1983.

[6]  Piscitello, D. Use of ISO CLNP in TUBA Environments, RFC
     1561, Internet Network Information Center, Dec. 1993.

[7]  Postel, J. Internet Control Message Protocol. RFC 792,
     Internet Network Information Center, September, 1981.

[8]  R. Braden, ed. Requirements for Internet Hosts --
     Communication Layers. RFC 1122, Internet Network
     Information Center, October, 1989.

[9]  ISO/IEC 8348-1992. International Standards Organization
     --Data Communications, OSI Network Service Definition.

[10] ISO/IEC 9542-1992. International Standards Organization
     -- Telecommunications and Information Exchange Between
     Systems, End-system to Intermediate-system exchange
     protocol for use in conjunction with ISO/IEC 8473.

[11] Jacobson, V. Congestion Avoidance and Control. In Proc.
     SIGCOMM '88 Symposium on Communications Architectures
     and Protocols, pages 314-329. Stanford, CA, Aug. 1988.

[12] Jacobson, V., R. Braden, and D. Borman. TCP Extensions
     for High Performance. RFC 1323, Internet Network
     Information Center, May 1992.

[13] ISO/IEC 8073. International Standards Organization --
     Open Systems Interconnection. ISO Transport Protocol
     Specification, 1986.

[14] Sun Microsystems, Inc. RPC: Remote Procedure Call
     Protocol. RFC 1057, SRI Network Information Center,
     June, 1988.


Author's Address

David M. Piscitello
Core Competence, Inc.
1620 Tuckerstown Road
Dresher, PA 19025 USA
dave@corecom.com