Internet Engineering Task Force                       K. K. Ramakrishnan
INTERNET DRAFT                                        AT&T Labs Research
draft-kksjf-ecn-00.txt                                       Sally Floyd
                                                                    LBNL
                                                           November 1997
                                                      Expires:  May 1998



A Proposal to add Explicit Congestion Notification (ECN) to IPv6 and to TCP



                          Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To view the entire list of current Internet-Drafts, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

Abstract

   This note describes a proposed addition of ECN (Explicit Congestion
   Notification) to IPv6 and to TCP.  First we describe TCP's use of
   packet drops as an indication of congestion.  Next we argue that with
   the addition of active queue management (e.g., RED) to the Internet
   infrastructure, where routers detect congestion before the queue
   overflows, routers are no longer limited to packet drops as an
   indication of congestion, but could instead set an ECN bit in the
   packet header, for ECN-capable transport protocols.  We describe when
   the ECN bit would be set in the routers, and describe what
   modifications would be needed to TCP to make it ECN-capable.
   Modifications to other transport protocols (e.g., unreliable unicast
   or multicast, reliable multicast, other reliable unicast transport
   protocols) could be considered as those protocols advance through the
   standards process.



Ramakrishnan and Floyd       Informational                      [Page 1]


draft-kksjf-ecn     Addition of ECN to IPv6 and TCP        November 1997


1. Introduction

   TCP's congestion control and avoidance algorithms are based on the
   notion that the network is a black-box [Jacobson88, Jacobson90].  The
   network's state of congestion or otherwise is determined by end-
   systems probing for the network state, by gradually increasing the
   load on the network (by increasing the window of packets that are
   outstanding in the network) until the network becomes congested and a
   packet is lost.  Treating the network as a "black-box" and treating
   loss as an indication of congestion in the network is appropriate for
   pure best-effort data carried by TCP that has little or no
   sensitivity to delay or loss of individual packets.  In addition,
   TCP's congestion management algorithms have techniques built-in (such
   as fast retransmit and fast recovery) to minimize the impact of
   losses from a throughput perspective.

   However, these mechanisms are not intended to help applications that
   are in fact sensitive to the delay or loss of one or more individual
   packets.  Interactive traffic such as telnet, web-browsing, and
   transfer of audio and video data ("real-audio" and "real-video") can
   be sensitive to packet losses (for unreliable data delivery such as
   UDP) or to the increased latency of the packet caused by the need to
   retransmit the packet after a loss (for reliable data delivery such
   as TCP).

   Since TCP determines the appropriate congestion window to use by
   gradually increasing the window size until it experiences a dropped
   packet, this causes the queues at the bottleneck router to build up.
   With most packet drop policies at the router that are not sensitive
   to the load placed by each individual flow, this means that some of
   the packets of latency-sensitive flows are going to be dropped.
   Active queue management mechanisms that detect congestion before the
   queue overflows, and provide an indication of this congestion to TCP,
   are desirable because they avoid some bad properties of dropping on
   queue overflow, especially with drop-tail schemes.  Drop-tail
   introduces synchronization of loss across multiple flows, which is
   undesirable.  Indicating incipient congestion means that TCP does not
   have to increase its window size up to the point where a router's
   buffer is filled up. This can reduce queuing delays and avoid
   synchronization, which are desirable characteristics.

2. Random Early Detection (RED)

   Random Early Detection (RED) is a mechanism for active queue
   management that has been proposed to detect incipient congestion
   [FJ93], and is currently being deployed in the Internet backbone
   [RED-ietf-draft].  Although RED is meant to be a general mechanism
   using one of several alternatives for congestion indication, in the
   current environment of the Internet RED is restricted to using packet
   drops as a mechanism for congestion indication.  By dropping packets
   based on the average queue length exceeding a threshold, rather than
   only when the queue overflows, RED maintains the average queue at a
   smaller level, and improves the delay experienced by the flows.
   However, when RED drops packets before the queue actually overflows,
   RED is not forced by memory limitations to discard the packet.  RED
   could set an Explicit Congestion Notification bit in the packet
   header instead of dropping the packet, if such a bit were provided in
   the IP header and understood by the transport protocol.  The use of
   the Explicit Congestion Notification bit would allow the receiver(s)
   to receive the packet, avoiding the potential for excessive delays
   due to retransmissions after packet losses.

3. Explicit Congestion Notification

   We propose that the Internet provide a congestion indication for
   incipient congestion (as in RED and earlier work [RJ90]) where the
   notification can sometimes be through marking packets rather than
   dropping them.  This would require an ECN field in the IP header with
   two bits.  The ECN-Capable bit would be set by the data sender to
   indicate an ECN-capable transport protocol.  The ECN bit would be set
   by the router to indicate congestion to the end nodes. ([Floyd94]
   outlines a scheme where a single bit could be overloaded to serve the
   function of both the ECN-Capable bit and the ECN bit, but the two-bit
   scheme is more straightforward to explain). We expect that routers
   would provide the congestion indication on incipient congestion as
   indicated by the average queue size, using the RED algorithms
   suggested in [FJ93, RED-ietf-draft].  Routers that have a packet
   arriving at a full queue would drop the packet, just as they do now.

   The congestion control algorithms followed at the end-systems would
   be essentially the same as the congestion control response to a
   *single* dropped packet, for a transport protocol where a dropped
   packet is used as an indication of congestion.  For TCP in
   particular, the source TCP would halve its congestion window "cwnd"
   in response to an ECN indication received by the data receiver.
   However, this action is done only once per window of data (i.e., at
   most once per roundtrip time), to avoid reacting multiple times to
   multiple indications of congestion within a roundtrip time.

4. Proposed Algorithm at the Router

   We describe the proposed algorithm at the router in the context of
   current router implementations.  We assume that the router is capable
   of implementing the probability computation for RED and uses a pure
   packet drop mechanism (e.g., drop from front, drop from tail, or
   random drop) whenever a packet arrives at a full queue.

   When the router's buffer is not yet full and the router is prepared
   to drop a packet to inform end nodes of incipient congestion, the
   router should first check to see if the ECN-Capable bit is set in
   that packet's IP header.  If so, then instead of dropping the packet,
   the router could set the ECN bit in the IP header.  When more
   severe congestion has occurred and the router's queue is full, then
   the router has no choice but to drop some packet when a new packet
   arrives.

   The router determines it is congested if the AVERAGE length of any of
   its queues where packets are waiting to be processed or transmitted
   exceeds a threshold. We believe that the router should use the ECN
   bit to notify that it is congested only when the *average* queue
   length, rather than the instantaneous queue length, exceeds a
   threshold.

   There are potentially several alternatives for estimating the average
   queue length and marking the ECN bit. Since there is considerable
   effort involved already in implementing RED, we believe it is best to
   leverage these efforts for ECN as well.  One potential mechanism for
   the averaging and marking is to perform functions similar to RED
   queue management: RED uses an exponential moving average of the queue
   size.  When the average queue size goes above a lower threshold,
   packets are marked with a probability of marking that increases with
   the average queue size.  (Packets that are not ECN-capable are
   dropped instead of marked.) When the average queue size gets up to or
   above a high threshold, all incoming packets should be dropped
   (assuming that the router intends to control the average queue size
   even in the presence of unresponsive traffic).
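
   The averaging and marking steps described above can be sketched as
   follows.  This is a minimal illustration, not a definitive RED
   implementation: the class and parameter names (RedEcnQueue, min_th,
   max_th, max_p, weight) and all default values are assumptions made
   for this example, the marking probability is assumed to grow
   linearly between the two thresholds, and packet departures are
   omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    ecn_capable: bool = False  # set by the data sender
    ecn: bool = False          # set by a router that detects congestion


class RedEcnQueue:
    """Simplified RED-style queue that marks instead of dropping when it can."""

    def __init__(self, min_th=5.0, max_th=15.0, max_p=0.1, weight=0.002,
                 capacity=30, rng=random.random):
        self.min_th = min_th      # lower threshold on the average queue size
        self.max_th = max_th      # high threshold on the average queue size
        self.max_p = max_p        # marking probability just below max_th
        self.weight = weight      # gain of the exponential moving average
        self.capacity = capacity  # physical buffer limit, in packets
        self.rng = rng            # injectable for deterministic testing
        self.avg = 0.0
        self.queue = []           # departures are not modelled in this sketch

    def enqueue(self, pkt):
        """Return 'queued', 'marked', or 'dropped' for an arriving packet."""
        # Exponential moving average of the instantaneous queue length.
        self.avg += self.weight * (len(self.queue) - self.avg)

        # Full buffer, or average at/above the high threshold: drop.
        if len(self.queue) >= self.capacity or self.avg >= self.max_th:
            return "dropped"

        action = "queued"
        if self.avg > self.min_th:
            # Probability grows linearly with the average (an assumption).
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            if self.rng() < p:
                if not pkt.ecn_capable:
                    return "dropped"   # not ECN-capable: drop instead of mark
                pkt.ecn = True         # an already-set bit simply stays set
                action = "marked"
        self.queue.append(pkt)
        return action
```

   Passing a fixed function as "rng" makes the marking decision
   deterministic, which is convenient when experimenting with the
   thresholds.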

   It is anticipated that when all of the source end-systems participate
   in TCP's congestion management mechanisms or other compatible
   congestion control, and respond to ECN by reducing their offered
   load, packet losses would be relatively infrequent.  Packet losses in
   this case would occur primarily during transients and in the presence
   of non-cooperating entities.

   When a packet is received by a router with the ECN bit set indicating
   that congestion was encountered upstream, then the bit is left
   unchanged, and the packet transmitted as usual.

5. Support from the Transport Protocol

   ECN requires support from the transport protocol, in addition to the
   ECN field in the IPv6 packet header.  For TCP, ECN requires two new
   mechanisms:  negotiation between the endpoints during setup to
   determine if they are both ECN-capable, and an ECN-Notify bit in the
   TCP header so that the data receiver can inform the data sender when
   a packet has been received with the ECN bit set.  The support
   required from other transport protocols is likely to be different,
   particularly for unreliable or reliable multicast transport protocols,
   and will have to be determined as other transport protocols are
   brought to the IETF for standardization.  The following sections
   describe in detail the proposed TCP use of ECN.  This is also
   described in [Floyd94].  We assume that the source TCP uses the
   current set of congestion control algorithms of Slow-start, Fast
   Retransmit and Fast Recovery [RFC2001].

5.1. TCP Initialization

   Initially, the source and destination TCPs exchange the desire and/or
   capability to use ECN in the TCP connection setup phase.  As a result
   of the negotiation, the TCP sender indicates using the ECN-Capable
   bit in the IPv6 header that the transport is capable and willing to
   participate in ECN.  This will indicate to the routers that they may
   mark packets with the ECN bit, if they would like to use that as a
   method of congestion notification. If the TCP connection does not
   wish to use ECN notification, the sending TCP sets the ECN-Capable
   bit equal to 0 (i.e., not set), and the TCP receiver ignores the ECN
   bit in received packets.
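
   The outcome of the negotiation can be sketched as a small piece of
   per-connection state.  This document does not define how the
   negotiation is carried in the TCP setup exchange, so the sketch
   below only models its result; the names (EcnState, local_offer,
   remote_offer) are hypothetical.  The rule for pure ACK packets is
   taken from Section 5.4.

```python
from dataclasses import dataclass


@dataclass
class EcnState:
    local_offer: bool   # this end is willing and able to use ECN
    remote_offer: bool  # the peer indicated the same during setup

    @property
    def negotiated(self):
        # ECN is used on the connection only if both ends agreed.
        return self.local_offer and self.remote_offer

    def ecn_capable_bit(self, payload_len):
        """Value of the ECN-Capable bit for an outgoing packet."""
        # Pure ACKs (no data) are sent with the bit off (Section 5.4).
        return 1 if self.negotiated and payload_len > 0 else 0

    def honor_ecn_bit(self, ecn_bit):
        """Whether the receiver should act on an incoming ECN bit."""
        # Without a successful negotiation the ECN bit is ignored.
        return self.negotiated and ecn_bit == 1
```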

5.2. The TCP Sender

   For a connection that expects to use ECN, packets are transmitted
   with the ECN-Capable bit set in the IP header (set to a "1").  If the
   sender receives a TCP acknowledgement with the ECN-Notify bit set in
   the TCP header, then the sender knows that congestion was encountered
   in the network on the path from the sender to the receiver.  The
   indication of congestion should be treated just as a congestion loss
   in non-ECN-Capable TCP. That is, the TCP source halves the congestion
   window "cwnd" and reduces the slow start threshold "ssthresh".  The
   sending TCP does NOT increase the congestion window in response to
   the receipt of an ACK packet with the ECN-Notify bit set.  However, a
   very important difference is that TCP does not react to ECN
   congestion indications more than once every window of data (or more
   loosely, more than once every round-trip time). If a response to the
   ECN-Notify bit was made over the last round-trip time, based on the
   window of packets, then the sending TCP doesn't respond to any
   further ECN messages. If at time "t", the source TCP reacted to an
   ECN, then it notes the packets that are outstanding at that time and
   have not yet been acknowledged. Until all these packets are
   acknowledged, say at time "u", the source TCP does not react to
   another ECN indication of congestion.

   In addition, when a TCP sender receives duplicate acks during the
   time interval between "t" and "u", it does not reduce the congestion
   window.  The result is that decreases in the congestion window occur
   at most once per roundtrip time.
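
   The once-per-window rule can be sketched as follows.  This is an
   illustration under stated assumptions rather than a complete TCP
   sender: sequence numbers count whole segments, normal window growth
   is omitted, "ssthresh" is assumed to be set to the halved window
   (with a floor of 2, as in [RFC2001]), and an ECN-Notify carried on
   the ACK that ends the interval between "t" and "u" is assumed to be
   ignored.  The names EcnSender and recover are hypothetical.

```python
class EcnSender:
    """Tracks when a congestion-window reduction is allowed."""

    def __init__(self, cwnd=16, ssthresh=64):
        self.cwnd = cwnd          # congestion window, in segments
        self.ssthresh = ssthresh  # slow start threshold, in segments
        self.snd_nxt = 0          # next sequence number to be sent
        self.recover = None       # snd_nxt at time "t", or None if idle

    def send(self, n=1):
        self.snd_nxt += n

    def _reduce(self):
        self.cwnd = max(self.cwnd // 2, 1)
        self.ssthresh = max(self.cwnd, 2)  # assumption, see lead-in
        self.recover = self.snd_nxt        # packets outstanding at time "t"

    def on_ack(self, ack_seq, ecn_notify):
        """Process a cumulative ACK; return True if cwnd was reduced."""
        in_window = self.recover is not None
        if in_window and ack_seq >= self.recover:
            self.recover = None            # time "u": window fully acked
        if ecn_notify and not in_window:
            self._reduce()
            return True
        return False

    def on_triple_dupack(self):
        """Duplicate ACKs between "t" and "u" do not reduce cwnd again."""
        if self.recover is not None:
            return False
        self._reduce()
        return True
```

   With 16 segments outstanding, the first ECN-Notify halves "cwnd"
   from 16 to 8; further ECN-Notify or duplicate-ACK signals are
   ignored until all 16 segments have been acknowledged.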

   When the TCP sender receives a packet with the ECN-Notify bit set,
   and therefore reduces its congestion window, the sender does not need
   to slow-start (as is done in Tahoe TCP in response to a packet drop)
   or to stop sending packets for a period of time to allow the queue to
   dissipate (as is done by Reno TCP for roughly half a round-trip time
   in response to a packet drop).  The ECN-Notify bit being set does not
   indicate the urgent transient congestion state of a buffer overflow.
   Incoming acknowledgements will still arrive to "clock out" outgoing
   packets when allowed by the congestion window.

   TCP follows existing algorithms for sending data packets in response
   to incoming ACKs, multiple duplicate acknowledgements, or retransmit
   timeouts [RFC2001].

5.3. The TCP Receiver

   At the destination end-system, when TCP receives a packet with the
   ECN bit set in the IP header, TCP sets the ECN-Notify bit in the TCP
   header in the returning ACK packet.  We do not provide here any
   notion of destination congestion, because this is already being
   indicated in the receiver's advertised window.

   The destination TCP continues to perform the duplicate ACK procedure
   already specified - to generate a duplicate ACK when an out-of-
   sequence packet is received.

   If there is any ACK withholding implemented, as in current TCP
   implementations where the TCP receiver often sends an ACK for two
   arriving data packets, then the TCP destination will send the OR of
   all the ECN bits of packets that the ACK is acknowledging. That is,
   if any packet is received with the ECN bit set, then the ACK carries
   the ECN-Notify bit set.
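
   The OR rule for withheld ACKs can be sketched as below.  The sketch
   assumes in-order arrival only and an ACK for every second packet;
   the delayed-ACK timer and the duplicate-ACK path are omitted, and
   the names (EcnReceiver, ack_every) are hypothetical.

```python
class EcnReceiver:
    """Accumulates ECN bits across the packets covered by one ACK."""

    def __init__(self, ack_every=2):
        self.ack_every = ack_every  # data packets per ACK (assumed value)
        self.rcv_nxt = 0            # next expected segment number
        self.pending = 0            # packets received since the last ACK
        self.pending_ecn = False    # OR of their ECN bits

    def on_data(self, ecn_bit):
        """Return (ack_seq, ecn_notify) when an ACK is sent, else None."""
        self.rcv_nxt += 1
        self.pending += 1
        self.pending_ecn = self.pending_ecn or bool(ecn_bit)
        if self.pending < self.ack_every:
            return None             # ACK withheld for now
        ack = (self.rcv_nxt, self.pending_ecn)
        self.pending = 0
        self.pending_ecn = False    # each ACK reports only its own packets
        return ack
```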

5.4. Congestion on the ACK-path

   For the current generation of TCP congestion control algorithms, pure
   acknowledgement packets (i.e., packets that do not contain any
   accompanying data) should be sent with the ECN-capable bit off.
   Current TCP receivers have no mechanisms for reducing traffic on the
   ACK-path in response to congestion notification.  Mechanisms for
   responding to congestion on the ACK-path remain an area for future
   research.  (One simple possibility would be for the sender
   to reduce its congestion window when it receives a pure ACK packet
   with the ECN bit set). For current TCP implementations, a single
   dropped ACK generally has only a very small effect on the TCP's
   sending rate.

6. Summary of changes required in IPv6 and TCP

   Two bits need to be specified in the IPv6 header, the ECN-Capable bit
   and the ECN bit.  The ECN-Capable bit set to "0" indicates that the
   transport protocol will ignore the ECN bit.  This is the default
   value.  The ECN-Capable bit set to "1" indicates that the transport
   protocol is willing and able to participate in ECN.

   The default value for the ECN bit is "0".  The router sets the ECN
   bit to "1" to indicate congestion to the end nodes.  The ECN bit in a
   packet header should never be reset by a router from "1" to "0".
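
   The semantics of the two bits can be sketched at the bit level as
   follows.  This document does not assign bit positions within the
   IPv6 header, so the two mask values below are purely hypothetical,
   as are the function names.

```python
ECN_CAPABLE = 0b10  # hypothetical position of the ECN-Capable bit
ECN = 0b01          # hypothetical position of the ECN bit


def new_field(ecn_capable):
    """Field as emitted by the data sender; both bits default to 0."""
    return ECN_CAPABLE if ecn_capable else 0


def try_mark(field):
    """Router-side marking.

    Returns the field with the ECN bit set, or None when the transport
    is not ECN-capable and the packet has to be dropped instead.
    The ECN bit is only ever set here, never cleared.
    """
    if not field & ECN_CAPABLE:
        return None
    return field | ECN
```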

   TCP requires two changes, a negotiation phase during setup to
   determine if both end nodes are ECN-capable, and a bit in the TCP
   header (possibly one of the "reserved" bits in the TCP flags field)
   as an ECN-Notify bit so that the receiver can inform the sender of a
   packet received with the ECN bit set.

7. Non-relationship to ATM's EFCI indicator or Frame Relay's FECN

   Since these ATM and Frame Relay mechanisms typically have been
   defined without any notion of average queue size as the basis for
   concluding that there is congestion, we believe that they provide a
   very noisy signal. The interpretation we have here for ECN is NOT the
   appropriate reaction for such a noisy signal of congestion
   notification. It is our belief that such mechanisms would be phased
   out over time within the ATM network.  However, if the routers that
   interface to the ATM network have a way of maintaining the average
   queue at the interface, and use it to come to a conclusion that the
   ATM subnet is congested or otherwise, they may use the ECN
   notification that is defined here.

8. Non-compliance by the End Nodes

   We believe that, for the most part, the fairness properties of TCP
   will not be changed with the introduction of ECN.

   A key issue concerns the vulnerability of ECN to non-compliant end-
   nodes (i.e., end nodes that set the ECN-capable bit in packets, but
   do not respond to the ECN bit itself).  These concerns exist even in
   non-ECN environments.  An end-node could "turn off congestion
   control" by not reducing its congestion window in response to packet
   drops.  We recognize that this is a concern for the current Internet.
   It has been argued that routers will have to deploy mechanisms to
   detect and differentially treat packets from non-compliant flows.  It
   is likely that techniques such as end-to-end per-flow scheduling and
   isolation of one flow from another, potentially accompanied by end-
   to-end reservations, could mitigate such effects. Such isolation
   mechanisms could remove some of the more egregious effects of non-
   compliance.

   However, even in networks just restricted to packet losses as an
   indication of congestion, several methods have been proposed to
   identify and treat non-compliant or unresponsive flows.  These
   mechanisms would be equally applicable for identifying flows that do
   not respond to ECN.  If anything, routers would have a slightly
   easier time identifying flows that do not respond to ECN.  For
   example, routers can observe packets arriving at the router with the
   ECN bit set, as well as keeping note of packets that have the ECN bit
   set at that router itself.

   It has been argued that dropping packets in itself may be considered
   a deterrent for non-compliance.  However, we believe that the packet
   drop rates are likely to be reasonably low in environments where ECN
   is deployed.  The reduction in load due to packet drops to deal with
   non-compliant nodes is likely to be small.  The control of congestion
   is more likely to come from end-nodes reacting to congestion - either
   from responding to dropped packets or ECN Notify indications and
   halving the window.  ECN should be used at a router when the average
   queue size is below some high threshold; when the average queue size
   exceeds the high threshold, and therefore packet drop/marking rates
   are higher, our recommendation is that routers drop packets, rather
   than setting the ECN bit in packet headers.  Thus, in scenarios with
   low packet drop rates, the fact that the congestion control
   indications are in the form of packet drops rather than ECN bits does
   not significantly change the negative consequences on the compliant
   flows because of some flow "turning off" congestion control.

   We also do not believe that packet dropping itself is an effective
   deterrent for non-compliance.  Many flows that retransmit dropped
   packets could have an incentive to maintain or even increase their
   sending rate in response to packet drops, rather than decreasing
   their sending rate, in the absence of mechanisms at the router to
   provide a deterrent against such behavior.  For example, flows
   that use unreliable transport protocols could simply increase their
   use of FEC in response to an increased packet drop rate, and might
   choose increased FEC and no congestion control.  We believe that the
   effect of packet dropping as a deterrent to non-compliance with
   congestion control mechanisms is quite small.  The possibility of
   non-compliant flows does not offer a compelling reason not to deploy
   ECN.

9. Additional Considerations

   Some care is required to handle the ECN and ECN-Capable bits
   appropriately when packets are encapsulated and un-encapsulated for
   tunnels.  When the router at the end of the tunnel decapsulates the
   packet, then the ECN bit in the encapsulating ('outside') header
   should be ORed with the ECN bit in the encapsulated ('inside') header
   that remains.  Basically, a 1 in the encapsulating header should be
   copied into the encapsulated header.
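
   The decapsulation rule can be sketched as below.  Only the handling
   of the ECN bit at the tunnel exit is shown, since that is the only
   step described here; the Header type and its field names are
   hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Header:
    ecn_capable: int  # ECN-Capable bit, 0 or 1
    ecn: int          # ECN bit, 0 or 1


def decapsulate(outer, inner):
    """Return the inner header that remains after the tunnel exit.

    The ECN bit of the outer header is ORed into the inner header, so
    congestion seen inside the tunnel is not lost.
    """
    return replace(inner, ecn=outer.ecn | inner.ecn)
```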

   An additional issue concerns packets that have the ECN bit set at one
   router, and are later dropped at another router.  For the proposed
   use for ECN in this document (that is, for data packets for TCP), this
   is not a concern, because end nodes detect dropped data packets, and
   the congestion response of the end nodes to a dropped data packet is
   at least as strong as the congestion response to a packet received
   with the ECN bit set.  This issue will have to be addressed if ECN
   and ECN-Capable bits are used on pure ACK packets, because in current
   implementations of TCP the drop of an ACK packet is not explicitly
   detected by the end nodes.

   If a packet with the ECN bit set is later dropped due to corruption
   (bit errors), the end node should still invoke congestion control,
   just as TCP would today in response to a dropped data packet.  This
   issue would also have to be addressed in future proposals for
   distinguishing between packets dropped due to corruption and packets
   dropped due to congestion.

10. Conclusions

   Given the current effort to implement RED, we believe this is the
   right time for router vendors to examine how to also implement
   congestion avoidance mechanisms that do not depend on packet drops
   alone.  With the growth of applications and transports that are
   sensitive to delay and loss of a single packet, depending on packet
   loss as a normal congestion notification mechanism appears to be
   insufficient (or at the very least, non-optimal).

REFERENCES

   [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways
   for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1
   N.4, August 1993, p. 397-413.  URL
   "ftp://ftp.ee.lbl.gov/papers/early.pdf".

   [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM
   Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.
   URL "ftp://ftp.ee.lbl.gov/papers/tcp_ecn.4.ps.Z".

   [Floyd97] Floyd, S., and Fall, K., "Router Mechanisms to Support
   End-to-End Congestion Control", Technical report, February 1997.  URL
   "ftp://ftp.ee.lbl.gov/papers/collapse.ps".

   [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection",
   SIGCOMM '97, September 1997.  URL
   "http://www.inria.fr/rodeo/sigcomm97/program.html#ab078".

   [Jacobson88] Jacobson, V., "Congestion Avoidance and Control", Proc.
   ACM SIGCOMM '88, pp. 314-329.  URL
   "ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z".

   [Jacobson90] Jacobson, V., "Modified TCP Congestion Avoidance
   Algorithm", Message to end2end-interest mailing list, April 1990.
   URL "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

   [RED-ietf-draft] Braden, B., Clark, D., Crowcroft, J., Davie, B.,
   Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
   Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S.,
   Wroclawski, J., and Zhang, L., "Recommendations on Queue Management
   and Congestion Avoidance in the Internet", Internet draft
   draft-irtf-e2e-queue-mgt-00.txt, March 25, 1997.

   [RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast
   Retransmit, and Fast Recovery Algorithms", RFC 2001, January 1997.

   [RJ90] Ramakrishnan, K. K., and Jain, R., "A Binary Feedback Scheme
   for Congestion Avoidance in Computer Networks", ACM Transactions on
   Computer Systems, Vol.8, No.2, pp. 158-181, May 1990.

SECURITY CONSIDERATIONS

   Security issues are not discussed in this document.

AUTHORS' ADDRESSES


   K. K. Ramakrishnan
   AT&T Labs. Research
   Phone: +1 (973) 360-8766
   Email: kkrama@research.att.com
   URL: http://www.research.att.com/info/kkrama

   Sally Floyd
   Lawrence Berkeley National Laboratory
   Phone: +1 (510) 486-7518
   Email: floyd@ee.lbl.gov
   URL: http://www-nrg.ee.lbl.gov/floyd/


   This draft was created in November 1997.
   It expires May 1998.