Network Working Group                                         V. Kashyap
INTERNET DRAFT                                  Sequent Computer Systems
Expiration date:  9 August 1998                               9 Feb 1998


                Modification in Datagram Too Big message
                    <draft-kashyap-too-big-00.txt>

Status of this Memo


   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months, and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet Drafts
   as reference material or to cite them other than as ``work in
   progress''.

   To learn the current status of any Internet Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet Drafts
   shadow directories on ftp.is.co.za (Africa), nic.nordu.net
   (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
   Coast), or ftp.isi.edu (US West Coast).

   This memo provides information for the Internet community. This
   memo does not specify an Internet standard of any kind.
   Distribution of this memo is unlimited.



Abstract

   This memo describes a small modification to the 'Datagram Too Big'
   message for both the IPv4 (ICMPv4) and IPv6 (ICMPv6) standards. The
   modification allows a reduction in the resources that large servers
   consume when implementing the Path MTU discovery process.


Table of Contents

1.   Introduction
2.   Changes to Datagram Too Big message
2.1  IPv6 Datagram Too Big message
2.2  IPv4 Datagram Too Big message
3.   Router specification
4.   Impact on host implementation of Path MTU Discovery
5.   Security Considerations
6.   References
7.   Author's Address




1. Introduction

   When one IP host has a large amount of data to send to another
   host, the data is transmitted as a series of IP datagrams. It is
   usually preferable that these datagrams be of the largest size that
   does not require fragmentation anywhere along the path from the
   source to the destination. This datagram size is referred to as the
   Path MTU (PMTU), and it is equal to the minimum of the MTUs of each
   hop in the path.

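   The IP protocols themselves give a host no way of learning the PMTU
   of an arbitrary path. This shortcoming is overcome by the use of
   the Path MTU discovery process as outlined in [1] and [2]. The
   Datagram Too Big message is defined in [1] and [2].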

   With the current specification of Datagram Too Big, the source host
   learns only that there is a bottleneck somewhere in the path. It
   cannot aggregate this information to share with other connections
   (unless they are to the same destination). Thus the source host has
   to cache the path information on a per-destination-host basis. Any
   representation of the path may be used, but in all the current
   implementations of Path MTU discovery that I am aware of, the path
   information is kept as a routing table entry: the receipt of a
   Datagram Too Big message causes a routing table entry to be created
   for the destination host.


   Any reduction in the size of this table reduces both the resources
   needed to keep this information and the time spent searching
   through a large routing table.


   The suggestion in this document aims at the following advantages:

        . Aggregation of paths having the same PMTU.

        . Reduction in the resources used to store the Path MTU.

        . Faster MTU updates: all connections on a host learn of a
          change together rather than each discovering it
          independently.

        . On deletion of a network route on the host, each of the
          derived routes/paths has to be deleted too. With fewer such
          cache entries this takes less time, and simpler algorithms
          can be used.

        . Utilities such as netstat need to list a smaller set of
          routes.

        . Routing protocols need to exchange smaller tables and/or
          need not weed through a large set of derived routes.




2. Changes to Datagram Too Big message


2.1     IPv6 'Datagram too Big'


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Ver   | Pri   |                  Flow Label                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                      Source Address                           |
      |                                                               |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                      Destination Address                      |
      |                                                               |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   Type = 2    |   mask code   |           Checksum            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Maximum Transmission Unit Size                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


                The mask code translates to a prefix mask as:

                prefix mask = -1 << mask code

   With addresses being 128 bits long, the mask code takes values from
   0 to 128 and therefore needs 8 bits, which fits in the unused ICMP
   code octet.

   The route/path for the Path MTU process is identified by the final
   destination address ANDed with the prefix mask. The path/route may
   also include the flow id [3], but that does not affect this
   discussion; it is simply another component of the path
   identification.


   A mask code of 0 implies a mask of all 1s, i.e. a host route. This
   is exactly the same behaviour as the current definition. A value of
   128 implies that the router used its default route. An indication
   of the default route does not provide any useful information,
   though, so it can be considered equivalent to the host route case.

   If the ICMP message is received in response to a datagram sent to a
   multicast address, the prefix mask information may not be useful.
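
   The mask arithmetic above can be sketched as follows. This is only
   an illustration under stated assumptions, not part of the proposed
   specification: the function names prefix_mask and path_key are
   hypothetical, and the example addresses are from the documentation
   prefix. Because Python integers are unbounded, the shift is
   truncated explicitly to 128 bits.

```python
import ipaddress

ADDR_BITS = 128
ALL_ONES = (1 << ADDR_BITS) - 1


def prefix_mask(mask_code: int) -> int:
    """Return the 128-bit prefix mask for a mask code (-1 << mask code)."""
    if not 0 <= mask_code <= ADDR_BITS:
        raise ValueError("mask code must be in the range 0..128")
    # Truncate the unbounded Python shift to the 128-bit address width.
    return (-1 << mask_code) & ALL_ONES


def path_key(destination: str, mask_code: int) -> ipaddress.IPv6Address:
    """Return the final destination address ANDed with the prefix mask."""
    dest = int(ipaddress.IPv6Address(destination))
    return ipaddress.IPv6Address(dest & prefix_mask(mask_code))


print(path_key("2001:db8:1:2::42", 0))    # host route: address unchanged
print(path_key("2001:db8:1:2::42", 64))   # low-order 64 bits cleared
print(path_key("2001:db8:1:2::42", 128))  # default route: all bits cleared
```

   A mask code of 128 yields an all-zero key, which is why the default
   route case carries no usable information for the host.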





2.2     IPv4  'Datagram too Big'

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   Type = 3    |   Code = 4    |           Checksum            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           mask code           |         Next-Hop MTU          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      Internet Header + 64 bits of Original Datagram Data      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


      The mask code is defined as follows :


      Code              Mask                    Comment
      ----              -----                   -------

      0                 255.255.255.255         host route (default)
      32                0.0.0.0                 default route
      31                128.0.0.0
      .
      .
      .
      8                 255.255.255.0
      .
      .
      .
      1                 255.255.255.254



   The table above is derived from:

                mask = -1 << mask code

   The route/path MTU cache entry is identified by the final
   destination address ANDed with the mask.
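
   As a check on the table, the following sketch recomputes a few of
   its rows. The helper names ipv4_mask and cache_key are hypothetical
   and the destination address is only an example. Note that in a
   language with fixed 32-bit integers a shift by 32 needs special
   handling; here the result is simply truncated to 32 bits.

```python
import ipaddress


def ipv4_mask(mask_code: int) -> str:
    """Return the dotted-quad mask for a mask code (-1 << mask code)."""
    if not 0 <= mask_code <= 32:
        raise ValueError("mask code must be in the range 0..32")
    # Truncate the unbounded Python shift to the 32-bit address width.
    value = (-1 << mask_code) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(value))


def cache_key(destination: str, mask_code: int) -> str:
    """Return the final destination ANDed with the mask."""
    dest = int(ipaddress.IPv4Address(destination))
    mask = (-1 << mask_code) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(dest & mask))


# Recompute the rows listed explicitly in the table.
for code in (0, 32, 31, 8, 1):
    print(code, ipv4_mask(code))

print(cache_key("192.0.2.77", 8))  # 192.0.2.0
```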



3. Router specification

   In addition to returning the next-hop MTU (as is done now), the
   router returns the mask code. The mask code, as defined in section
   2, informs the end host of the net mask used by the router in
   determining the path to the final destination.

   IPv4

   Currently routers return 0 in the 16 bits used by the mask code.
   Hence a message from a router implementing the existing ICMP
   Datagram Too Big message will be interpreted exactly as now, i.e.
   the host creates a host-specific route (or Path MTU cache entry).





   IPv6

   Currently the 8 bits of the ICMP code are unused and set to 0.
   Hence a message from a router implementing the existing ICMP
   Datagram Too Big message will be interpreted exactly as now, i.e.
   the host creates a host-specific route (or Path MTU cache entry).
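
   The router side can be sketched as below for IPv4. This is a
   minimal illustration, assuming the router knows the prefix length
   of the route it matched; the names mask_code_for_route,
   internet_checksum and build_too_big_v4 are hypothetical. The mask
   code is the number of address bits not covered by the prefix, so a
   host route gives 0 and the default route gives 32, in agreement
   with the table in section 2.2.

```python
import struct


def mask_code_for_route(prefix_len: int, addr_bits: int = 32) -> int:
    """Map the prefix length of the matched route to a mask code."""
    if not 0 <= prefix_len <= addr_bits:
        raise ValueError("prefix length out of range")
    return addr_bits - prefix_len


def internet_checksum(data: bytes) -> int:
    """Ones-complement sum of 16-bit words, as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_too_big_v4(prefix_len: int, next_hop_mtu: int,
                     original: bytes) -> bytes:
    """Build Type 3, Code 4 with the mask code in the formerly unused bits.

    'original' stands for the Internet header plus 64 bits of the
    original datagram.
    """
    code = mask_code_for_route(prefix_len)
    # First pass with a zero checksum, second pass with the real one.
    header = struct.pack("!BBHHH", 3, 4, 0, code, next_hop_mtu)
    checksum = internet_checksum(header + original)
    header = struct.pack("!BBHHH", 3, 4, checksum, code, next_hop_mtu)
    return header + original


message = build_too_big_v4(24, 1400, bytes(28))
print(struct.unpack("!HH", message[4:8]))  # (mask code, next-hop MTU)
```

   A router that does not implement the modification leaves the mask
   code at 0, which the host reads as a host route.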



4. Impact on host implementation of Path MTU Discovery



   With the current message format, when a host receives a Datagram
   Too Big message for a connection it has no way of knowing whether
   the whole subnet the peer belongs to is behind the bottleneck. As a
   result the host is forced to create a path specifically for the
   peer. This information cannot be shared with a connection to
   another host, even one that is on the same subnetwork and behind
   the same bottleneck.

   With the modification suggested in section 2 such sharing of
   information becomes possible.


4.1 Case 1

   Consider two different connections: endpoint Ca to Cx and endpoint
   Cb to Cy.

        +----+      +------+  M1  +------+      +------+     +----+
        | Ca |----->|  Ra  |----->|  Rb  |----->|  Rc  |---->| Cx |
        | Cb |      |      |      |      |----->|      |--+  +____+
        +____+      +------+      +------+      +------+  |
                                                          |
                                                          |
                                                          V
                                                       +------+
                                                       |  Cy  |
                                                       +------+
                                fig 1


   The first MTU reduction occurs on the hop from Ra to Rb. Instead
   of creating a separate per-path route for each of the connections
   from Ca and Cb, the host may have both connections use the same
   route (or any other cache entry that stores the PMTU). Currently
   the host has to create two entries, one for each connection.




4.2 Case 2

   A routing change occurs such that the PMTU for the connection from
   Cb is M2.

        +----+      +------+  M1  +------+      +------+     +----+
        | Ca |----->|  Ra  |----->|  Rb  |----->|  Rc  |---->| Cx |
        | Cb |      |      |      |      |      |      |--+  +____+
        +____+      +------+      +------+      +------+  |
                                                          |
                                                          |M2
                                                          V
                                                       +------+
                                                       |  Cy  |
                                                       +------+

                                fig 2

   A Datagram Too Big message will be received for the connection Cb
   to Cy. The host can utilize the information in the 'mask code' to
   create a more specific route.

   The above two cases can be extended to thousands of connections on
   the two paths considered.
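
   Cases 1 and 2 can be sketched with a small host-side cache. This is
   an illustration only: the class PmtuCache, its methods, and the
   addresses and MTU values are hypothetical, and a real host would
   more likely keep the entries in its routing table. Entries are
   keyed by the masked destination, and the most specific matching
   entry wins on lookup.

```python
import ipaddress


class PmtuCache:
    """PMTU cache keyed by (masked IPv4 destination, mask code)."""

    def __init__(self, first_hop_mtu: int) -> None:
        self.first_hop_mtu = first_hop_mtu
        self.entries = {}

    @staticmethod
    def _key(destination: str, mask_code: int):
        mask = (-1 << mask_code) & 0xFFFFFFFF
        return int(ipaddress.IPv4Address(destination)) & mask, mask_code

    def on_too_big(self, destination: str, mask_code: int,
                   next_hop_mtu: int) -> None:
        if mask_code == 32:
            # Default route: no information, treat as a host route.
            mask_code = 0
        self.entries[self._key(destination, mask_code)] = next_hop_mtu

    def lookup(self, destination: str) -> int:
        # Try the most specific entries (smallest mask code) first.
        for (net, code), mtu in sorted(self.entries.items(),
                                       key=lambda item: item[0][1]):
            if self._key(destination, code)[0] == net:
                return mtu
        return self.first_hop_mtu


cache = PmtuCache(first_hop_mtu=1500)

# Case 1: one message with mask code 8 covers both Cx and Cy.
cache.on_too_big("192.0.2.10", 8, 1400)            # M1, reported for Cx
print(cache.lookup("192.0.2.10"), cache.lookup("192.0.2.200"))

# Case 2: a later message for Cy alone adds a more specific entry.
cache.on_too_big("192.0.2.200", 0, 1000)           # M2, reported for Cy
print(cache.lookup("192.0.2.10"), cache.lookup("192.0.2.200"))
```

   With the current message format the same two connections would need
   two host entries from the start, and every further peer on the
   subnet would add one more.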


4.3 Case 3



        +----+      +------+      +------+      +------+ M1  +----+
        | Ca |----->|  Ra  |----->|  Rb  |----->|  Rc  |---->| Cx |
        | Cb |      |      |      |      |      |      |--+  +____+
        +____+      +------+      +------+      +------+  |
                                                          |
                                                          |
        Ha                                                V
                                                       +------+
                                                       |  Cy  |
                                                       +------+
                                                        Hy
                                fig 3


   The route to Cx from Rc is determined by the route x.y.255.255 but
   the path to Cy is determined by x.y.z.255 (considering an IPv4
   internetwork). The PMTU to Cx is determined by M1 at Rc, but the
   PMTU to Cy is the same as the first-hop MTU all the way from the
   host Ha.

   If the connection Ca to Cx is made first, Ha will have an entry
   for the path from Ha to the network x.y.255.255. This will cause
   the connection to Cy to use the same information as was determined
   for the connection to Cx. A similar situation can occur if the
   topology changes from that of fig 1 to that of fig 3 while the
   connections Ca-Cx and Cb-Cy are active.




   Since the information is shared between the connections, it is
   possible that at the 10 minute probe interval (as suggested in [1]
   and [2]) host Ha fails to discover the larger Path MTU to Hy.


   This problem can be avoided if the end hosts implement the
   following policy:

        . On a new connection, use a non-PMTU-discovered route/path.

        . At every probe time (if the host has data to send), use the
          original route. This will cause a rediscovery of the paths.
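
   One possible reading of this policy is sketched below. It is an
   assumption-laden illustration, not a prescribed implementation:
   the class ConnectionPmtu and its methods are hypothetical, the
   "original route" is modelled simply as the first-hop MTU, and time
   is passed in explicitly as seconds so the behaviour is easy to
   follow.

```python
PROBE_INTERVAL = 600.0  # 10 minutes, the interval suggested in [1] and [2]


class ConnectionPmtu:
    """Per-connection view of the PMTU under the policy above."""

    def __init__(self, first_hop_mtu: int, now: float) -> None:
        self.first_hop_mtu = first_hop_mtu
        # New connection: start from the non-PMTU-discovered route.
        self.pmtu = first_hop_mtu
        self.next_probe = now + PROBE_INTERVAL

    def on_too_big(self, next_hop_mtu: int, now: float) -> None:
        """Lower the PMTU and restart the probe timer."""
        self.pmtu = min(self.pmtu, next_hop_mtu)
        self.next_probe = now + PROBE_INTERVAL

    def mtu_for_send(self, now: float) -> int:
        """Return the MTU to use; called only when there is data to send."""
        if now >= self.next_probe:
            # Probe time: fall back to the original route so that the
            # path is rediscovered.
            self.pmtu = self.first_hop_mtu
            self.next_probe = now + PROBE_INTERVAL
        return self.pmtu


conn = ConnectionPmtu(first_hop_mtu=1500, now=0.0)
conn.on_too_big(1400, now=1.0)
print(conn.mtu_for_send(now=2.0))    # reduced PMTU in use
print(conn.mtu_for_send(now=700.0))  # probe time reached: original MTU
```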

4.4 Case 4

   If the Datagram Too Big message returns the code indicating that
   the router used the default route, it may be treated as equivalent
   to an indication that a host route was used.

   If the information returned indicates a more general route than the
   route that the host itself used, then the information must be
   discarded and the message treated as carrying a host route mask.
   The local routing information is received through a routing
   protocol or set up by an administrator and must not be overridden.
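
   The two rules of this case can be sketched as a single check. The
   function name effective_mask_code is hypothetical, and the sketch
   assumes the host knows the prefix length of the local route it
   used for the destination. A larger mask code means a more general
   route.

```python
def effective_mask_code(reported: int, local_prefix_len: int,
                        addr_bits: int = 32) -> int:
    """Return the mask code the host should act on."""
    local_code = addr_bits - local_prefix_len
    if reported >= addr_bits:
        # Default route reported: equivalent to a host route.
        return 0
    if reported > local_code:
        # More general than the local route: discard, use a host route.
        return 0
    return reported


print(effective_mask_code(8, 16))    # more specific than local /16: kept
print(effective_mask_code(24, 16))   # more general than local /16: 0
print(effective_mask_code(32, 16))   # default route: 0
```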

5. Security considerations


   If the Datagram Too Big message returns a more general route than
   was used by the host, the indication is treated as equivalent to
   the host route mask. This prevents the host from being fed faulty
   network information. The host may, however, be sent Datagram Too
   Big messages indicating the default route. The end host will then
   end up creating host routes instead of subnet routes, which is no
   different from what happens now. A code that indicates a more
   precise route has no effect on the flow of data or on the path MTU
   information related to the path.




6. References

[1]  J. Mogul, S. Deering, "Path MTU Discovery", RFC 1191,
     November 1990.

[2]  J. McCann, S. Deering, J. Mogul, "Path MTU Discovery for IP
     version 6", RFC 1981, August 1996.

[3]  S. Deering, R. Hinden, "Internet Protocol, Version 6 (IPv6)
     Specification", RFC 1883, December 1995.

[4]  A. Conta, S. Deering, "Internet Control Message Protocol
     (ICMPv6) for the Internet Protocol Version 6 (IPv6)
     Specification", RFC 1885, December 1995.



7. Author's Address


Vivek Kashyap
Sequent Computer Systems, Inc.
15450, SW Koll Parkway
Beaverton, OR 97006

ph. 503 - 578 3422

email: viv@sequent.com