Internet Engineering Task Force                         Ed Ellesson
INTERNET DRAFT                                          Steven Blake
draft-ellesson-tos-00.txt                               IBM
                                                        November 1997
                                                        Expires May 1998


       A Proposal for the Format and Semantics of the TOS Byte and
              Traffic Class Byte in IPv4 and IPv6 Headers



Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To view the entire list of current Internet-Drafts, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).


Abstract

   This draft proposes an arrangement of fields in the IPv4 TOS byte
   [RFC795, RFC1349], and in the IPv6 Traffic Class byte [IPv6], and
   proposes a definition for their associated semantics. The intention
   is to enable the preservation of currently useful differential class-
   based queueing behavior on existing network devices, while
   simultaneously enabling new methods of bandwidth allocation and
   policing via drop preference, all within the context of flows which
   may be encrypted at the IP layer using IPSEC.

   (Note: the IPv6 Class field has recently been expanded to eight bits,
   but this is not yet available in any version of the specification.)


1. Introduction

   Many intranets have deployed class-based queueing and/or class-based
   buffer management for a number of years.  Routers and/or other
   network edge devices in these networks typically classify each packet
   received based on IP and TCP/UDP header fields which offer clues
   as to the application type or traffic type, and which therefore


Ellesson/Blake              Expires 5/98                        [Page 1]


INTERNET-DRAFT      draft-ellesson-tos-00.txt             November, 1997

   reflect the relative business importance of the communicating
   application.  These clues include the source/destination port
   numbers, protocol id, and source/destination addresses.  These
   classification criteria are then used to direct packets to
   statically configured queues, or to identify the packets for drop
   eligibility within the network device.

   With the advent of IPSEC encryption of the IP payload, it is no
   longer possible for routers and other network devices to look inside
   the payload of an encrypted IP packet to extract the protocol id or
   source/destination port numbers for the purpose of learning the
   traffic type.  As a result, the functionality of packet
   classification and differential queueing/dropping based on traffic
   type is lost when migrating applications to a network that supports
   IPSEC, unless a different mechanism is enabled for packet
   classification.

   (Classification information is also lost under IP fragmentation,
   since only the first fragment carries the transport header.)

   Commercial enterprises which are considering migrating their private
   corporate network traffic to the Internet must ensure the security
   and privacy of their traffic while, at the same time, preserving the
   precedence of certain traffic types over others in order to support
   service level agreements with their users.  If no solution is
   standardized, either the deployment of VPNs and extranets based on
   IPSEC will be impeded, or the current ability to prioritize by
   traffic type will be lost.


2. Requirements

   The following requirements drove the proposed solution:

   -ability to prioritize packets by traffic class, for both delay and
    drop preference, in a standardized, multivendor-interoperable way.

   -ability to support explicit congestion notification, as proposed by
    [ECN94, ECN97, Clark97].

   -ability to isolate certain transport protocols (especially those
    using non-TCP congestion control algorithms) and/or traffic types
    into separate traffic classes for individualized queueing treatment
    in a network device, according to the needs of individual users,
    network operators, and network administrators in the context of
    intranets, extranets, and VPNs [Blake97].

   -ability to provision minimum bandwidth available to specific traffic
    classes (for starvation avoidance), either via marking at the
    network edge for dropping priority (new capability), or via setting
    dropping thresholds, leaky bucket parameters, and/or minimum drain
    rate in each queue that is dedicated to a specific traffic class
    (existing capability extended to work within the IPSEC context).

   -ability to mark packets for differential service treatment at the
    packet source, or at an intervening network device such as a router
    or traffic shaper/policer (for example, at the access edge of an
    ISP's wide area network).

   -avoid introduction of a migration penalty for current best effort
    users of a network which has partially deployed profile-meter
    marking.

   -ability to provide all the above functionality in the context of a
    network that supports IPSEC encryption.  (Encryption may be
    implemented either at the packet stream source, or in a downstream
    device.  Packet marking, via the TOS byte or the Traffic Class byte,
    must be introduced at, or upstream of, the encrypting device).

   -same semantics for IPv4 and IPv6, to enable coexistence/migration.


3. Proposed Solution

   IPv4 TOS Byte (and IPv6 Traffic Class Byte) Definition:

                0   1   2   3   4   5   6   7
              +---+---+---+---+---+---+---+---+
              |CE |ECT|DP | Service Class |MBZ|
              +---+---+---+---+---+---+---+---+

   Explicit Congestion Notification (ECN):

     CE: Congestion Experienced (1 bit)
         (router-, or traffic shaper/policer-settable)
        - 0: no congestion experienced
        - 1: congestion experienced

     ECT: ECN-Capable Transport (1 bit)
          (source host-settable)
        - 0: transport protocol not ECN-capable
        - 1: transport protocol ECN-capable

    Notes on Explicit Congestion Notification:

        -There are two proposals for the use of the CE bit.  In [ECN94]
         and [ECN97], the CE bit is set stochastically based on random
         early detection of congestion (when ECT is set) [RED].  In
         [Clark97], the CE bit is set deterministically for every packet
         under impending congestion, and then the CE bit is filtered
         downstream by a receiver profile meter.  The proposed mapping
         supports both approaches.  Routers shall support the behavior
         specified in [ECN97] by default.  Routers shall support the
         behavior specified in [Clark97] as a configuration option (to
         allow the deployment of receiver-based service allocation).
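
         The two CE-marking behaviors described above can be sketched
         as follows.  This is an illustration only, not part of the
         proposal: the function names, the parameters min_th, max_th,
         and max_p, and the reading of "impending congestion" as an
         average queue depth at or above min_th are all assumptions,
         and the stochastic branch uses a simplified linear curve
         rather than the full algorithm of [RED].

```python
import random


def red_mark_probability(avg_queue, min_th, max_th, max_p):
    """Simplified RED curve: 0 below min_th, linear up to max_p, then 1."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def should_set_ce(ect, avg_queue, min_th, max_th, max_p=0.1,
                  deterministic=False, rng=random.random):
    """Decide whether a router sets CE on one packet (hypothetical helper)."""
    if deterministic:
        # [Clark97]-style option: mark every packet while congestion is
        # impending (assumed here to mean avg_queue >= min_th).
        return avg_queue >= min_th
    if not ect:
        # [ECN97]-style default: only ECN-capable packets are marked.
        # What happens to non-ECT packets is outside this sketch.
        return False
    return rng() < red_mark_probability(avg_queue, min_th, max_th, max_p)
```

         The rng argument exists only so that the stochastic branch can
         be exercised deterministically; a router would presumably use
         its own random source.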

   Drop Preference (DP) (1 bit):

        - 0: not discard-eligible/in-profile/loss-sensitive
        - 1: discard eligible/out-of-profile/loss-insensitive

     Notes on Drop Preference:

        -[Clark97]- and [Feng97]-style service-allocation
         implementations should use the DP bit to signal in- or out-of-
         profile packets.  Integrated Services policers shall use the DP
         bit to signal non-conformant packets.  Routers shall not re-
         order packets within a flow when the DP bit is toggled.
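
         As an illustration of in-/out-of-profile marking with the DP
         bit, the sketch below assumes a token-bucket profile meter at
         the network edge.  The class name ProfileMeter, its rate and
         burst parameters, and the choice to leave an already-set DP
         bit untouched are assumptions made for the example; the
         proposal itself does not specify a metering algorithm.

```python
DP_MASK = 0x20  # DP is bit 2 of the byte, counting the MSB as bit 0


class ProfileMeter:
    """Token-bucket meter (assumed design) that marks out-of-profile packets."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last = 0.0                   # time of the previous packet

    def mark(self, tos, length, now):
        """Return the byte to forward with; the packet is never held back."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if length <= self.tokens:
            self.tokens -= length
            return tos            # in profile: leave the byte untouched
        return tos | DP_MASK      # out of profile: set DP, keep other bits
```

         Because the meter only rewrites one bit and forwards every
         packet in arrival order, marking by itself does not re-order
         packets within a flow.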

   Explicit Service Class (4 bits):

        - 0001: Delay Insensitive (minimize cost/worse than best effort)
        - 0000: Normal            (best effort)
        - 1000: Interactive Delay (low average delay)
        - 1001: Low Maximum Delay (low delay/jitter, e.g., real time)
        - 0010: Network Control   (maximize reliability)
        - 0100: Maximize Throughput
        - 0011: Network-specific 1
        - 0101: Network-specific 2
        - 0110: Network-specific 3
        - 0111: Network-specific 4
        - 1010: Network-specific 5
        - 1011: IntServ (low priority),    Network-specific 6
        - 1100: IntServ (medium priority), Network-specific 7
        - 1101: IntServ (high priority),   Network-specific 8
        - 1110: Reserved 1
        - 1111: Reserved 2

     Notes on Explicit Service Class:

        -The specific value assignments noted above were chosen to
         preserve backwards compatibility with [RFC1349].

        -The value assignments noted above are shown out of numerical
         order to highlight proximity of interpretation.

        -The first four service classes noted above are ranked in order
         of increasing delay priority.

        -Any of the above service classes may have specific minimum
         bandwidth allocations, delay priorities, and/or drop
         thresholds configured within the routers.  The means of
         configuring these parameters are beyond the scope of this
         specification.

         (One possible implementation method is to direct packets
         with a specific service class to an associated class-based
         queue [CBQ].  This permits the service class to act as an index



Ellesson/Blake              Expires 5/98                        [Page 4]


INTERNET-DRAFT      draft-ellesson-tos-00.txt             November, 1997

         into a queue that has been configured for a particular traffic
         type, in the same way that the port numbers are used today on
         commercial intranets.  Note that the combination of source/
         destination address, protocol id, and service class can be
         used in the same way, that is, as an index into a queue.)
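
         A minimal sketch of the queue-index idea in the parenthetical
         note above.  The class name ClassQueues, the service_order
         parameter, and the strict-priority service discipline are
         assumptions chosen for brevity; a [CBQ] implementation would
         typically share link bandwidth among classes instead.

```python
from collections import deque


class ClassQueues:
    """One FIFO per service class value, served in a configured order."""

    def __init__(self, service_order):
        self.queues = [deque() for _ in range(16)]  # 4-bit class, 16 queues
        self.service_order = list(service_order)    # hypothetical local policy

    def enqueue(self, tos, packet):
        service_class = (tos >> 1) & 0x0F           # bits 3-6 of the byte
        self.queues[service_class].append(packet)

    def dequeue(self):
        for service_class in self.service_order:    # strict priority (assumed)
            if self.queues[service_class]:
                return self.queues[service_class].popleft()
        return None                                 # nothing to send
```

         Only the service class bits select the queue, so packets that
         differ only in CE, ECT, or DP share a queue and keep their
         relative order.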

        -The network-specific code points are available to be used with
         any application/traffic type, or set of application types,
         agreed to by the network administrator/network operator and
         the end user/service subscriber.  The network administrator
         may provision resources for each network-specific service class
         as appropriate to provide the level of performance required for
         traffic mapped to that class.  The network administrator may
         associate any network-specific service class with a particular
         drop or delay priority.  By specifying minimum bandwidths per
         class (the mechanism for doing this is outside the scope of
         this draft), the network administrator can avoid starvation of
         lower priority flows.

        -The network-specific code points are not ranked in any implied
         order of loss, delay, or throughput priority.

        -Routing and other network control protocols using this mapping
         which require prioritized handling or reliable delivery by the
         network shall be marked with service class '0010' (Network
         Control).

        -IntServ code values '1011', '1100', and '1101' are intended for
         use by packets of a reserved Integrated Services flow where
         RSVP aggregation is deployed [GBH97].  Packets within an
         aggregated reservation should be mapped to one of the three
         service classes (depending on the IntServ traffic class) to
         facilitate packet classification.  The router implementation
         must isolate traffic in these service classes from traffic
         which has not been policed at an RSVP aggregation point.  The
         IntServ service class implementations must prevent non-
         conformant packets (marked by DP) from degrading the QoS of
         other flows within the same service class.

         If RSVP aggregation is not deployed, then these code values
         are available as network-specific service classes.  If
         additional IntServ service classes are desired, they may be
         allocated from the network-specific code points at the
         discretion of the network administrator.

        -Service class-specific routing (i.e., TOS routing) may be
         implemented at the option of the network administrator.
         Specification of the routing metrics to be associated with
         each service class is beyond the scope of this draft.

        -Interaction of network-specific code points with DP and ECN
         field values:

           If supported by network policy, an edge device may use the DP
           marking of out-of-profile traffic to provide minimum
           bandwidth guarantees, rather than relying on minimum
           bandwidths configured in each router.

           In addition, drop and congestion control may be provided
           individually within each service class.  That is, packets may
           be marked for drop eligibility or for explicit congestion
           notification within a specific allocated bandwidth,
           configured for a specific service class of traffic, rather
           than relative to the entire bandwidth available on an
           outgoing interface.

           The service class implementation may choose to ignore the CE,
           ECT, or DP bits.
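
           One way to picture drop control applied within a service
           class, rather than across the whole interface, is a buffer
           with two limits.  The sketch below is an assumption-laden
           illustration: the class name ClassBuffer, the packet-count
           units, and the two-threshold policy are not taken from the
           proposal.

```python
class ClassBuffer:
    """Buffer accounting for a single service class (illustrative only)."""

    def __init__(self, capacity, dp_threshold, honor_dp=True):
        self.capacity = capacity          # packets allotted to this class
        self.dp_threshold = dp_threshold  # lower limit applied when DP = 1
        self.honor_dp = honor_dp          # a class may choose to ignore DP
        self.depth = 0

    def admit(self, tos):
        """Return True if the packet is queued, False if it is dropped."""
        dp = (tos >> 5) & 1
        limit = self.dp_threshold if (dp and self.honor_dp) else self.capacity
        if self.depth >= limit:
            return False
        self.depth += 1
        return True

    def release(self):
        """Account for one packet leaving the queue."""
        if self.depth > 0:
            self.depth -= 1
```

           With capacity 4 and dp_threshold 2, DP-marked packets are
           refused once two packets are queued in this class, while
           unmarked packets are accepted until the class allocation is
           full, whatever the load in other classes.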

        -Reserved codepoints are available for future standardization or
         experimentation.

   MBZ (1 bit):

         Must be zero.  Reserved for use on experimental networks with
         TOS Byte or Traffic Class Byte definition other than above.
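
   The field layout defined in this section can be sketched as a pair of
   helper functions.  The names TosFields, pack_tos, and unpack_tos are
   hypothetical; the only facts taken from the text are the bit
   positions, with bit 0 of the diagram treated as the most significant
   bit of the byte.

```python
from typing import NamedTuple


class TosFields(NamedTuple):
    ce: int
    ect: int
    dp: int
    service_class: int
    mbz: int


def pack_tos(ce, ect, dp, service_class, mbz=0):
    """Assemble the byte; bit 0 in the diagram is the most significant bit."""
    for bit in (ce, ect, dp, mbz):
        if bit not in (0, 1):
            raise ValueError("single-bit fields must be 0 or 1")
    if not 0 <= service_class <= 0x0F:
        raise ValueError("service class is a 4-bit value")
    return (ce << 7) | (ect << 6) | (dp << 5) | (service_class << 1) | mbz


def unpack_tos(byte):
    """Split a byte (0-255) into its five fields."""
    return TosFields((byte >> 7) & 1, (byte >> 6) & 1, (byte >> 5) & 1,
                     (byte >> 1) & 0x0F, byte & 1)
```

   For example, pack_tos(0, 1, 0, 0b1000) yields 0x50: the ECT bit plus
   the "Interactive Delay" class, whose low five bits (0x10) match the
   [RFC1349] encoding for minimize delay.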


4. Network Interoperability

   Service providers which exchange traffic and support differentiated
   services via service-class-value-marked packets should either agree
   to compatible definitions for Network-specific values, or they should
   agree to map the Network-specific values into one of the standardized
   values at their interconnection point.
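
   The second option, mapping at the interconnection point, might look
   like the sketch below.  The function remap_at_boundary, the class_map
   argument, and the fallback to "Normal" for unmapped values are
   assumptions; the sketch also treats the IntServ and Reserved values
   as requiring a mapping, which the text above does not mandate.

```python
# Values with standardized meanings in the service class table.
STANDARD_CLASSES = {0b0000, 0b0001, 0b0010, 0b0100, 0b1000, 0b1001}
CLASS_MASK = 0x1E  # service class occupies bits 3-6 (MSB is bit 0)


def remap_at_boundary(tos, class_map, default=0b0000):
    """Rewrite a non-standard service class; all other bits are preserved."""
    service_class = (tos & CLASS_MASK) >> 1
    if service_class in STANDARD_CLASSES:
        return tos
    new_class = class_map.get(service_class, default)
    return (tos & ~CLASS_MASK & 0xFF) | (new_class << 1)
```

   A boundary device configured with {0b0011: 0b1000} would carry a
   neighbor's "Network-specific 1" traffic as "Interactive Delay" while
   leaving the CE, ECT, DP, and MBZ bits as received.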

   A scalable administrative mechanism for managing the mapping of
   traffic type to service class, and from service class to service
   class (across a domain boundary), is key to the manageable deployment
   of this solution on a wide scale.  Scalable administrative
   mechanisms are beyond the scope of this draft.


5. Backwards Compatibility With RFC 795

   [Ferg97] suggests the usage of the IPv4 Precedence field to signify
   the drop preference of in-profile or out-of-profile packets, as
   defined for example by [Clark97].  Out-of-profile packets would be
   marked with lower precedence than in-profile packets.  Routers which
   implemented preferential discard based on the semantics of [RFC795]
   would preferentially discard out-of-profile packets in times of
   impending congestion.

   A potential problem that may occur during the phase of partial
   deployment of traffic profile meters is that the bulk of existing
   best-effort traffic is marked with the "Routine" precedence value
   '000'. This un-metered traffic which enters a network implementing
   precedence dropping would be treated as out-of-profile.  It is not
   clear that this is always the correct choice, which motivated our
   decision to abandon the existing semantics of the Precedence field
   and explicitly allocate a drop preference bit.

   However, it is also the case that many routing protocol
   implementations transmit their packets with "Internetwork Control"
   precedence '110', as specified in [RFC1812].  During a transition
   period where not all routers have been upgraded to use the proposed
   service class mapping for network control (service class '0010'), it
   may be valuable to provide backwards compatibility with RFC 795
   Precedence semantics.

   We know of two approaches to achieve this:

   -utilize the MBZ bit as an indicator of the version of the TOS/Class
    mapping semantics

      - 0: compliant with [RFC795] (Precedence) and [RFC1349] (TOS)
      - 1: compliant with the proposed mapping

   -utilize the service class '0000' to indicate that the mapping
    is compliant with [RFC795] (Precedence) and [RFC1349] (TOS).

   In the first approach, the MBZ bit is consumed as a version
   indicator.

   In the second approach, the service class '0000' is consumed, but the
   MBZ bit remains reserved for future specification.
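
   The two approaches can be compared with a short sketch.  The function
   names and the approach labels "mbz" and "class0000" are invented for
   the example; the bit positions follow the layout in Sec. 3 and the
   three-bit Precedence field of the existing IPv4 TOS byte.

```python
def uses_proposed_semantics(tos, approach):
    """True if the byte should be read with the proposed mapping."""
    if approach == "mbz":
        return bool(tos & 0x01)            # MBZ bit acts as version flag
    if approach == "class0000":
        return ((tos >> 1) & 0x0F) != 0    # class '0000' means old semantics
    raise ValueError("unknown approach: %r" % (approach,))


def legacy_precedence(tos):
    """Precedence value (top three bits) under the old semantics."""
    return (tos >> 5) & 0x07
```

   Under either approach a routing packet sent as 0xC0 ("Internetwork
   Control" precedence, zero TOS) is read with the old semantics and
   yields precedence 6.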

   The authors believe that existing routing protocols typically use a
   zero TOS value and, further, that most best-effort traffic utilizes
   the zero TOS value.  Existing routing traffic transmitted with a
   non-zero precedence and zero TOS would continue to receive
   preferential queueing by routers which implemented either of the
   above approaches.

   Best-effort traffic which was transmitted with zero precedence and
   zero TOS (which we believe to include the bulk of Internet data
   traffic) and which was not metered would not receive degraded service
   from routers which implemented either of the above TOS approaches
   (this could be configurable in routers implementing the proposed
   mapping).

   Traffic generated by hosts or routers which have not implemented the
   proposed TOS semantics and which utilize a non-zero TOS value would
   be mapped into the corresponding service class by routers
   implementing the proposed semantics (unless the traffic was remapped
   upstream by some other device).  Since the proposed semantics are
   compatible with the TOS classes defined in [RFC1349], this is no
   more of a potential problem than the case where hosts or routers
   which have implemented the proposed TOS mapping send traffic mapped
   into a service class without network authorization or monitoring.
   (A scalable mechanism for network devices to remotely acquire
   network authorization policy is beyond the scope of this draft.)

   One effect of this change is that the "Normal" service class can no
   longer be utilized by hosts, routers, and profile-meters implementing
   the new proposed TOS semantics (since "Normal" equals "old"
   semantics).  Best-effort traffic which does not require service
   differentiation, but wishes to take advantage of ECN, for example,
   would need to specify an alternative service class, such as
   "Delay Insensitive" ('0001'), or one of the reserved classes (one of
   these options would be standardized).

   It should be noted that backwards compatibility is proposed *FOR IPV4
   ONLY*.  The [RFC795] Precedence semantics would never be utilized by
   IPv6 routers, and the "Normal" service class ('0000') would be
   available for best-effort traffic not requiring service
   differentiation.


6. Security Considerations

   Security considerations are not discussed in this memo.


7. References

   [Blake97]  S. Blake, "Some Issues and Applications of Packet Marking
              for Differentiated Services", Internet Draft
              <draft-blake-diffserv-marking-00.txt>, November 1997.

   [Clark97]  D. Clark and J. Wroclawski, "An Approach to Service
              Allocation in the Internet", Internet Draft
              <draft-clark-diff-svc-alloc-00.txt>, July 1997.

   [CBQ]      S. Floyd and V. Jacobson, "Link-sharing and Resource
              Management Models for Packet Networks", IEEE/ACM
              Transactions on Networking, Vol. 3 no. 4, pp. 365-386,
              August 1995.

   [ECN94]    S. Floyd, "TCP and Explicit Congestion Notification",
              ACM Computer Communications Review, Vol. 24 no. 5, pp. 10-
              23, October 1994.

   [ECN97]    K. Ramakrishnan and S. Floyd, "A Proposal to Add Explicit
              Congestion Notification (ECN) to IPv6 and to TCP",
              Internet Draft <draft-kksjf-ecn-00.txt>, November 1997.

   [Feng97]   W. Feng, D. Kandlur, D. Saha, and K. Shin, "Adaptive
              Packet Marking for Providing Differentiated Services in
              the Internet", Univ. Michigan Technical Report
              CSE-TR-347-97, October 1997,
              http://www.eecs.umich.edu/~wuchang/work/pmg.ps.Z.

   [Ferg97]   P. Ferguson, "Simple Differential Services: IP TOS and
              Precedence, Delay Indication, and Drop Preference",
              Internet Draft <draft-ferguson-delay-drop-00.txt>,
              November 1997.

   [GBH97]    R. Guerin, S. Blake, and S. Herzog, "Aggregating RSVP-
              based QoS Requests", Internet Draft
              <draft-guerin-aggreg-rsvp-00.txt>, November 1997.

   [IPv6]     S. Deering and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", Internet Draft
              <draft-ietf-ipngwg-ipv6-spec-v2-00.txt>, July 1997.

   [RED]      S. Floyd and V. Jacobson, "Random Early Detection Gateways
              for Congestion Avoidance", IEEE/ACM Transactions on
              Networking, August 1993.

   [RFC795]   J. Postel, "Service Mappings", Internet RFC 795, September
              1981.

   [RFC1349]  P. Almquist, "Type of Service in the Internet Protocol
              Suite", Internet RFC 1349, July 1992.

   [RFC1812]  F. Baker, editor, "Requirements for IP Version 4 Routers",
              Internet RFC 1812, June 1995.

   [SIMA]     K. Kilkki, "Simple Integrated Media Access (SIMA)",
              Internet Draft <draft-kalevi-simple-media-access-01.txt>,
              June 1997.


Appendix A:  Specifying Multiple Drop Preference Levels

   Because the concept of service class is completely general, it is
   possible to utilize different service classes to represent different
   drop preference levels, as would be needed for example by [SIMA].
   Shown below is a possible mapping implementing eight separate drop
   preference levels (where higher drop preference results in higher
   probability of loss):

     DP  Service Class (Value)  (Name)               Drop Preference
     --  ---------------------  ------------------   ---------------
      0         0011            Network-specific 1         0
      1         0011            Network-specific 1         1
      0         0101            Network-specific 2         2
      1         0101            Network-specific 2         3
      0         0110            Network-specific 3         4
      1         0110            Network-specific 3         5
      0         0111            Network-specific 4         6
      1         0111            Network-specific 4         7
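
   The table above follows a regular pattern: the level is twice the
   position of the service class in the list (0011, 0101, 0110, 0111)
   plus the DP bit.  A small sketch, with hypothetical function names
   and with the CE, ECT, and MBZ bits left at zero when building a
   byte:

```python
# Service classes in the order used by the Appendix A table.
_DROP_CLASSES = [0b0011, 0b0101, 0b0110, 0b0111]


def drop_level(tos):
    """Drop preference 0-7 for a byte, or None if its class is not mapped."""
    dp = (tos >> 5) & 1
    service_class = (tos >> 1) & 0x0F
    if service_class not in _DROP_CLASSES:
        return None
    return 2 * _DROP_CLASSES.index(service_class) + dp


def tos_for_level(level):
    """Byte (CE, ECT, MBZ all zero) that encodes the given drop level."""
    if not 0 <= level <= 7:
        raise ValueError("drop preference level must be 0-7")
    service_class = _DROP_CLASSES[level // 2]
    dp = level % 2
    return (dp << 5) | (service_class << 1)
```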


Appendix B:  Specifying Multiple Delay Priorities

   As was mentioned in Sec. 3, there are four service classes defined
   which specify relative delay priority (three for IPv4 if backwards
   compatibility with [RFC795] is required):

        - 0001: Delay Insensitive (minimize cost/worse than best effort)
        - 0000: Normal            (best effort)
        - 1000: Interactive Delay (low average delay)
        - 1001: Low Maximum Delay (low delay/jitter, e.g., real time)

   If additional provisioned levels of delay priority are required, they
   can be implemented using the Network-specific service classes.  Shown
   below is a possible mapping implementing eight separate delay
   priorities (where a higher priority results in lower average/maximum
   delay):

     Service Class (Value)  (Name)               Delay Priority
     ---------------------  ------------------   --------------
            0001            Delay Insensitive          0
            0011            Network-specific 1         1
            0101            Network-specific 2         2
            1000            Interactive Delay          3
            0110            Network-specific 3         4
            0111            Network-specific 4         5
            1001            Low Maximum Delay          6
            1010            Network-specific 5         7
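
   Unlike the drop preference mapping, this table has no arithmetic
   pattern, so a lookup is the natural representation.  The names below
   are hypothetical, and service classes absent from the table are
   reported as None rather than assigned a priority.

```python
# Service class value -> delay priority, copied from the table above.
DELAY_PRIORITY = {
    0b0001: 0, 0b0011: 1, 0b0101: 2, 0b1000: 3,
    0b0110: 4, 0b0111: 5, 0b1001: 6, 0b1010: 7,
}


def delay_priority(tos, default=None):
    """Delay priority for a byte, or `default` if its class is unmapped."""
    return DELAY_PRIORITY.get((tos >> 1) & 0x0F, default)


def service_order():
    """Service classes sorted from highest to lowest delay priority."""
    return sorted(DELAY_PRIORITY, key=DELAY_PRIORITY.get, reverse=True)
```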


Authors' Addresses

   Ed Ellesson
   JDGA/501
   IBM Corporation
   4205 S. Miami Blvd.
   Research Triangle Park, NC 27709
   Phone: +1-919-254-4115
   Fax:   +1-919-254-6243
   E-mail: ellesson@raleigh.ibm.com


   Steven Blake
   E95/664
   IBM Corporation
   800 Park Offices Drive
   Research Triangle Park, NC  27709
   Phone:  +1-919-254-2030
   Fax:    +1-919-254-5483
   E-mail: slblake@raleigh.ibm.com