Internet Congestion Control                                     M. Welzl
Internet-Draft                                        University of Oslo
Intended status: Informational                                   W. Eddy
Expires: 7 January 2025                                      MTI Systems
                                                             6 July 2024

                     Pacing in Transport Protocols
                      draft-welzl-iccrg-pacing-00

Abstract

   Applications or congestion control mechanisms can produce bursty
   traffic, which can cause unnecessary queuing and packet loss.  To
   reduce the burstiness of traffic, the concept of evenly spacing out
   the traffic from a data sender over a round-trip time, known as
   "pacing", has been used in many transport protocol implementations.
   This document gives an overview of pacing and of how some known
   pacing implementations work.

About This Document

   This note is to be removed before publishing as an RFC.

   The latest revision of this draft can be found at
   https://mwelzl.github.io/draft-iccrg-pacing/draft-welzl-iccrg-
   pacing.html.  Status information for this document may be found at
   https://datatracker.ietf.org/doc/draft-welzl-iccrg-pacing/.

   Discussion of this document takes place on the Internet Congestion
   Control Research Group mailing list (mailto:iccrg@irtf.org), which is
   archived at https://mailarchive.ietf.org/arch/browse/iccrg.
   Subscribe at https://www.ietf.org/mailman/listinfo/iccrg/.

   Source for this draft and an issue tracker can be found at
   https://github.com/mwelzl/draft-iccrg-pacing.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 7 January 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Definitions
   3.  Pacing: general considerations and consequences
     3.1.  More likely to saturate a bottleneck
       3.1.1.  Backing off after the increase
       3.1.2.  Able to work with smaller queues
     3.2.  Getting good RTT estimates
   4.  Implementation examples
     4.1.  Linux TCP
     4.2.  Apple OSes
     4.3.  QUIC BBR implementations
   5.  Security Considerations
   6.  IANA Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Acknowledgments
   Authors' Addresses

1.  Introduction

   Applications commonly generate either bulk data (e.g. files) or
   bursts of data (e.g. segments of media) that transport protocols
   deliver into the network based on congestion control algorithms.

   RFCs describing congestion control generally refer to a congestion
   window (cwnd) state variable as an upper limit on the number of
   unacknowledged packets or bytes that a sender is allowed to emit.
   This limits the sender's transmission rate at the granularity of a
   round-trip time (RTT).  If the sender transmits an entire cwnd's
   worth of data in an instant, this can result in unnecessarily high
   queuing and eventually packet losses at the bottleneck.  Such
   consequences are detrimental to users' applications in terms of
   both responsiveness and goodput.  To solve this problem, the
   concept of pacing was introduced.  Pacing allows a sender to
   transmit the same cwnd's worth of data, but to spread it more
   evenly across a round-trip time.
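
   As a simple illustration of this idea (not taken from any
   specification or implementation; the function and variable names
   below are chosen only for this sketch), the inter-packet send
   interval that spreads one cwnd of MSS-sized segments evenly over
   one RTT can be computed as follows:

      # Illustrative sketch: evenly spread one cwnd over one RTT.
      def pacing_interval(cwnd_bytes, mss_bytes, rtt_seconds):
          packets_per_rtt = max(1, cwnd_bytes // mss_bytes)
          return rtt_seconds / packets_per_rtt

      # Example: cwnd = 40 packets of 1500 bytes, RTT = 30 ms:
      # one packet roughly every 0.75 ms instead of a 40-packet burst.
      print(pacing_interval(40 * 1500, 1500, 0.030))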

   Congestion control specifications always allow a sender to send
   less than the cwnd, or to temporarily emit packets at a lower rate.
   Accordingly, pacing packets is in line with these specifications.
   Pacing is known to have advantages: if packets arrive at a
   bottleneck as a burst (all packets being back-to-back), loss can be
   more likely to happen than in a case where there are time gaps
   between packets (e.g., when they are spread out over the RTT).  For
   the same reason, pacing is less likely to cause sudden, ephemeral
   increases in queuing delay.  Since keeping the queues short reduces
   packet losses, pacing can also yield higher goodput by reducing the
   time lost in loss recovery.

   Because of its known advantages, pacing has become common in
   implementations of congestion controlled transports.  It is also an
   integral element of the "BBR" congestion control mechanism
   [I-D.cardwell-iccrg-bbr-congestion-control].

2.  Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Pacing: general considerations and consequences

3.1.  More likely to saturate a bottleneck

   We can distinguish between two reasons for packet losses that are due
   to congestion at a bottleneck with a DropTail (FIFO) queue:

   1.  A flight of N packets arrives.  The amount of data in this flight
       exceeds the amount of data that can be transmitted by the
       bottleneck during the flight's arrival plus the queue length,
       i.e. some data do not fit into the queue.

   2.  The bottleneck is fully saturated.  The queue is full, and
       packets drain from it more slowly than new packets arrive.

   The second type of loss matches the typical expectation of a
   congestion control algorithm: the cwnd value when loss happens is
   indicative of the bottleneck being fully saturated.  When the first
   type of loss happens, however, a sender's cwnd can be much smaller
   than the Bandwidth*Delay Product (BDP) of the path (the amount of
   data that can be in flight, ignoring the queue).  In the absence of
   other traffic, the probability of the first type of loss depends on
   the queue length and on the ratio between the departure rate and
   the arrival rate during the flight's arrival.  Introducing time
   gaps between the packets of a burst increases this ratio, i.e. the
   difference between the departure rate and the arrival rate becomes
   smaller, and the second type of loss becomes more likely.

   For example, consider a network path with a bottleneck capacity of 50
   Mbit/s, a queue length of 15000 bytes (or 10 packets of size 1500
   bytes) and an RTT of 30 ms.  Assume that all packets emitted by the
   sender have a size of 1500 bytes.  Then, the BDP equals 125 packets.
   The bottleneck of this network path is fully saturated when a (BDP +
   queue length) amount of bytes are in flight: 135 packets.

   In this network, the first type of loss can happen as follows: say,
   N=40 packets arrive at this bottleneck at a rate of 100 Mbit/s.  In
   an otherwise empty network and assuming an initial window of 10
   packets and no delayed ACKs, this occurs in the third round of slow
   start without pacing, provided that the capacities of all links
   before the bottleneck are at least 100 Mbit/s.  In this case, an
   overshoot will occur: packets are forwarded with half their arrival
   rate, i.e. less than 20 packets can be forwarded during the burst's
   arrival.  The remaining 20 (or more) packets cannot fit into the
   10-packet queue.  A cwnd of 40 packets is much smaller than the (BDP
   + queue) limit of 135 packets, and the bottleneck is not fully
   saturated.

   Let us now assume that the flight of 40 packets is instead paced,
   such that the arrival rate only mildly exceeds the departure rate --
   e.g., they arrive at a rate of 60 Mbit/s.  When the last packet of
   this flight arrives at the bottleneck, the bottleneck should already
   have forwarded 5/6 * 39 = 32.5 packets.  Since only complete packets
   can be sent, the bottleneck has really forwarded 32 packets, and the
   remaining 40-32 = 8 packets fit in the queue.  No loss occurs.

   This example explains how pacing can enable a rate increase to last
   longer than without pacing.  This makes it more likely that a
   bottleneck is saturated, such that cwnd reflects the BDP plus the
   queue length (loss type 2).
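
   The arithmetic of the example above can be reproduced with the
   following sketch (illustrative Python only; it models a single
   flight arriving at an otherwise empty DropTail bottleneck and
   ignores ACK clocking and cross traffic):

      LINK_RATE = 50e6 / 8      # bottleneck departure rate, bytes/s
      RTT = 0.030               # seconds
      PKT = 1500                # packet size, bytes
      QUEUE_PKTS = 10           # queue length, in packets

      BDP_PKTS = LINK_RATE * RTT / PKT   # 125; saturation at 135

      def lost_during_burst(n_pkts, arrival_rate_bits):
          """Packets lost while a single n-packet flight arrives."""
          arrival_rate = arrival_rate_bits / 8          # bytes/s
          burst_duration = (n_pkts - 1) * PKT / arrival_rate
          forwarded = int(burst_duration * LINK_RATE / PKT)
          return max(0, n_pkts - forwarded - QUEUE_PKTS)

      print(lost_during_burst(40, 100e6))  # back-to-back: loss
      print(lost_during_burst(40, 60e6))   # paced: no loss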

3.1.1.  Backing off after the increase

   The two loss types explained in Section 3.1 require different back-
   off factors to allow the queue to drain and congestion to dissipate.
   Specifically, in the single-sender, single-bottleneck example above,
   when a slow start overshoot occurs as loss type 2, a back-off
   function of the form ssthresh = cwnd * beta with beta >= 0.5 is
   guaranteed to cause a second loss after the end of loss recovery.
   This is because, when cwnd exceeds a fully saturated bottleneck
   (i.e., cwnd > BDP + queue length), cwnd will have grown further by
   another (BDP + queue length) by the time the sender learns about the
   loss.  In this case, beta = 0.5 will cause ssthresh to exceed (BDP +
   queue length) again.

   Since pacing makes loss type 2 more likely, beta < 0.5 may be a
   better choice after slow start overshoot when pacing is used.
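
   A short worked calculation with the numbers from Section 3.1
   (illustrative only; beta = 0.35 below is merely an arbitrary
   example of a value smaller than 0.5) makes this concrete:

      SATURATION = 135          # BDP + queue length, in packets

      cwnd_at_loss = 136                     # just above saturation
      cwnd_when_detected = 2 * cwnd_at_loss  # one more doubling in
                                             # slow start

      for beta in (0.5, 0.35):
          ssthresh = int(cwnd_when_detected * beta)
          print(beta, ssthresh, ssthresh > SATURATION)
      # beta = 0.5  -> ssthresh = 136 > 135: the queue stays full
      # beta = 0.35 -> ssthresh = 95 <= 135: the queue can drain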

3.1.2.  Able to work with smaller queues

   The probability of loss type 1 in Section 3.1 is inversely related
   to the queue length: the longer the queue, the larger the burst it
   can absorb.  Pacing therefore enables a rate increase to continue
   with a smaller queue at the bottleneck than in the case without
   pacing.

3.2.  Getting good RTT estimates

   Since pacing algorithms generally attempt to spread out packets
   evenly across an RTT, it is important to have a good RTT estimate.
   Especially in the beginning of a transfer, when sending the initial
   window, the only RTT estimate available may be from the connection
   establishment handshake.  Being based on only one sample, this is a
   very unreliable estimate, and using it to pace the initial window can
   cause unnecessary delay.  This may be the reason why the Linux TCP
   implementation does not pace the first 10 packets (see Section 4.1).
   As a possible improvement, the initial RTT estimate could also be
   based on a previous connection (temporal sharing) or on another
   ongoing connection (ensemble sharing) [RFC9040].

4.  Implementation examples

4.1.  Linux TCP

   The following description is based on Linux kernel version 6.8.9.

   There are two ways to enable pacing in Linux: 1) via a socket option,
   2) by configuring the FQ queue discipline.  We describe case 1.

   Independent of the value of the Initial Window (IW), the first 10
   (hardcoded) packets are not paced.  Later, 10 packets will generally
   be sent without pacing every 2^32 packets.

   Every time an ACK arrives, a pacing rate is calculated as factor *
   MSS * cwnd / SRTT, where "factor" is a configurable value that, by
   default, is 2 in slow start and 1.2 in congestion avoidance.  MSS is
   the sender maximum segment size [RFC5681], and SRTT is the smoothed
   round-trip time [RFC6298] [TODO check: Linux calculates SRTT
   differently from the standard, though RFC 6298 relaxes the rules, so
   maybe it's ok?].  The sender transmits data in line with the
   calculated pacing rate; this is approximated by calculating the rate
   per millisecond, and generally sending the resulting amount of data
   as a small burst, every millisecond.  As an exception, the per-
   millisecond amount of data can be a little larger when the peer is
   very close, depending on a configurable value (by default, when the
   minimum RTT is less than 3 milliseconds).

   If the pacing rate is smaller than 2 packets per millisecond, these
   bursts will become 2 packets in size, and they will not be sent every
   millisecond but with a varying time delay (depending on the pacing
   rate).  If the pacing rate is larger than 64 Kbyte per millisecond,
   these bursts will be 64 Kbyte in size, and they will not be sent
   every millisecond but with a varying time delay (depending on the
   pacing rate).  Bursts can always be smaller than described above, or
   be "nothing", if a limiting factor such as the receiver window (rwnd)
   [RFC5681] or the current cwnd disallows transmission.  If the
   previous packet was not sent when expected by the pacing logic, but
   more than half of a pacing gap ago (e.g., due to a cwnd limitation),
   the pacing gap is halved.
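
   The following sketch is a strongly simplified model of the behavior
   described above; it is not the kernel code, and the function names,
   parameters and clamping constants merely mirror this section's
   description:

      def pacing_rate(mss, cwnd_pkts, srtt, in_slow_start,
                      ss_factor=2.0, ca_factor=1.2):
          """Pacing rate in bytes per second."""
          factor = ss_factor if in_slow_start else ca_factor
          return factor * mss * cwnd_pkts / srtt

      def burst_size(rate_bytes_per_sec, mss):
          """Bytes sent as one small burst, nominally once per ms."""
          chunk = rate_bytes_per_sec / 1000.0   # per millisecond
          chunk = max(chunk, 2 * mss)           # at least 2 packets
          chunk = min(chunk, 64 * 1024)         # at most 64 Kbyte
          # Outside these bounds, the delay between bursts varies
          # instead of the burst size.
          return int(chunk)

      rate = pacing_rate(mss=1448, cwnd_pkts=100, srtt=0.05,
                         in_slow_start=False)
      print(rate, burst_size(rate, 1448))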

   *TEMPORARY NOTE - TO BE REMOVED:* This description is based on the
   longer Linux pacing analysis text that is currently available at:
   https://docs.google.com/document/
   d/1h5hN9isFjT76YjaCphHZdW9LCRYqV4y3GKwRKxgqEO0/edit?usp=sharing -
   comments or corrections are very welcome!

4.2.  Apple OSes

   (TODO)

4.3.  QUIC BBR implementations

   Pacing capability is expected in QUIC senders.  While standard QUIC
   congestion control [RFC9002] is based on TCP NewReno, which is not
   defined to include pacing (but also does not prohibit it), QUIC
   congestion control requires either pacing or some other burst
   limitation (Section 7.7 of [RFC9002]).  BBR congestion control
   implementations are common in QUIC stacks, and pacing is integral to
   BBR, so this section focuses on BBR-based QUIC stacks.

   Pacing in QUIC stacks commonly involves:

   1.  Access to lower-level (e.g.  OS and hardware) capabilities needed
       for effective pacing.

   2.  Managing additional timers related to pacing, along with those
       already needed for retransmission, and other events.

   3.  Details of the actual pacing algorithm (e.g. granularity of
       bursts allowed, etc.).

   Examples of different approaches to dealing with these challenges in
   ways that work on multiple operating systems and hardware platforms
   can be found in open source QUIC stacks, such as Google's QUIC
   implementation and Meta's "mvfst".  These provide examples for some
   of the concepts discussed below.

   Unlike TCP implementations that typically run within the operating
   system kernel, QUIC implementations more typically run in user space
   and are thus faced with more challenges regarding timing and coupling
   with the underlying protocol stack and hardware needed to achieve
   pacing.  For instance, if an application trying to do pacing is
   running on a highly loaded system, it may often "wake up late" and
   miss the times that it intends to pace packets.

   When a large amount of data needs to be sent, naive pacing could
   result in an excessive number of timers that need to be managed and
   adjusted along with all of the other timers that the QUIC stack and
   the rest of the application require.  The Hashed Hierarchical
   Timing Wheel [VL87] provides one approach for such cases, but
   implementations may also simply schedule the next send event based
   on the current pacing rate, and then schedule subsequent events as
   needed, rather than adjusting timers for them.  In any case, a
   pacing algorithm should typically allow for some amount of
   burstiness, in order to use the hardware efficiently, to remain
   responsive for bursty (but low overall rate) applications, and to
   avoid excessive timer management.
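
   A minimal sketch of the "schedule only the next send event"
   approach follows (illustrative Python; the class, its names and the
   event-loop integration are assumptions, not taken from any
   particular QUIC stack):

      import time

      class NextSendPacer:
          """Keep a single timer armed for the next send event."""

          def __init__(self, rate_bytes_per_sec, max_burst_bytes):
              self.rate = rate_bytes_per_sec
              self.max_burst = max_burst_bytes
              self.next_send_time = time.monotonic()

          def on_send_event(self, now, bytes_ready):
              # Release at most one burst now, then compute when the
              # following burst may go out; the caller re-arms its
              # (single) pacing timer for next_send_time.
              burst = min(bytes_ready, self.max_burst)
              self.next_send_time = now + burst / self.rate
              return burst

      pacer = NextSendPacer(12_500_000, 10 * 1500)  # 100 Mbit/s
      print(pacer.on_send_event(time.monotonic(), 100_000))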

   Pacing can be based on different approaches, such as a token-based
   or a tokenless algorithm.  For instance, a tokenless algorithm
   (e.g. as used in mvfst) might compute a regular interval time and a
   batch size (number of packets) to be released every interval in
   order to achieve the pacing rate.  This allows specific future
   transmissions to be scheduled.  In contrast, a token-based
   algorithm accumulates tokens that permit transmission based on the
   pacing rate, using a "leaky bucket" to control bursts.  In this
   case, the size of bursts can vary, depending on how much time has
   elapsed between evaluations.
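
   The two styles could be sketched roughly as follows (neither
   fragment is taken from mvfst or from Google's implementation; all
   names and parameters are assumptions for illustration):

      # Tokenless: a fixed interval and batch size derived from the
      # pacing rate; batches can be scheduled ahead of time.
      def interval_and_batch(rate_pkts_per_sec, batch_pkts=2):
          return batch_pkts / rate_pkts_per_sec, batch_pkts

      # Token-based ("leaky bucket"): tokens accrue as time passes,
      # capped by a bucket size that bounds bursts.
      class TokenPacer:
          def __init__(self, rate_pkts_per_sec, bucket_pkts):
              self.rate = rate_pkts_per_sec
              self.bucket = bucket_pkts
              self.tokens = float(bucket_pkts)
              self.last = None

          def sendable_now(self, now):
              # Returns how many whole packets may leave now and
              # deducts them from the bucket.
              if self.last is not None:
                  self.tokens = min(self.bucket, self.tokens +
                                    (now - self.last) * self.rate)
              self.last = now
              allowed = int(self.tokens)
              self.tokens -= allowed
              return allowed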

   The additional notion of "burst tokens" (or some other burst
   allowance) may be present in order to rapidly transmit data when
   coming out of a quiescent period, e.g. when a flow has been
   application-limited without data to send; this is used, for
   instance, in Google's implementation.  A number of burst tokens,
   representing packets that can be sent unpaced, is initialized to
   some value (e.g. 10) when a flow starts or becomes quiescent.  If
   burst tokens are available, outgoing packets are sent immediately,
   without pacing, up to the limit permitted by the congestion window,
   and the burst tokens are depleted by each packet sent.  The number
   of burst tokens is reduced to zero on congestion events.  When
   coming out of quiescence, it is set to the minimum of the initial
   burst size and the number of packets that fit into the congestion
   window (given in bytes).
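
   The burst-token handling described above could be sketched as
   follows (illustrative only; the constant and the method names are
   assumptions based on this section's description, not on a specific
   code base):

      INITIAL_BURST_TOKENS = 10   # example value from the text

      class BurstTokens:
          def __init__(self):
              self.tokens = INITIAL_BURST_TOKENS      # at flow start

          def on_quiescence_exit(self, cwnd_bytes, pkt_size):
              # min(initial burst size, packets that fit in cwnd)
              self.tokens = min(INITIAL_BURST_TOKENS,
                                cwnd_bytes // pkt_size)

          def on_congestion_event(self):
              self.tokens = 0

          def take_unpaced(self, pkts_allowed_by_cwnd):
              # Packets that may leave immediately, without pacing;
              # each unpaced packet consumes one burst token.
              unpaced = min(self.tokens, pkts_allowed_by_cwnd)
              self.tokens -= unpaced
              return unpaced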

   There may be additional "lumpy tokens" that allow further unpaced
   packets after the burst tokens have been consumed, provided that
   the congestion window does not limit sending.  The number of lumpy
   tokens is determined using heuristics, and is generally limited to
   a small number of packets (e.g. 1 or 2).

5.  Security Considerations

   While congestion control designs, including aspects such as pacing,
   could result in unwanted competing traffic, they do not directly
   result in new security considerations.

   Transport protocols that provide authentication (including those
   using encryption), or are carried over protocols that provide
   authentication, can protect their congestion control algorithm from
   network attack.  This is orthogonal to the congestion control rules.

6.  IANA Considerations

   This document has no IANA actions.

7.  References

7.1.  Normative References

   [I-D.cardwell-iccrg-bbr-congestion-control]
              Cardwell, N., Cheng, Y., Yeganeh, S. H., Swett, I., and V.
              Jacobson, "BBR Congestion Control", Work in Progress,
              Internet-Draft, draft-cardwell-iccrg-bbr-congestion-
              control-02, 7 March 2022,
              <https://datatracker.ietf.org/doc/html/draft-cardwell-
              iccrg-bbr-congestion-control-02>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

7.2.  Informative References

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
              <https://www.rfc-editor.org/rfc/rfc5681>.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298,
              DOI 10.17487/RFC6298, June 2011,
              <https://www.rfc-editor.org/rfc/rfc6298>.

   [RFC9002]  Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection
              and Congestion Control", RFC 9002, DOI 10.17487/RFC9002,
              May 2021, <https://www.rfc-editor.org/rfc/rfc9002>.

   [RFC9040]  Touch, J., Welzl, M., and S. Islam, "TCP Control Block
              Interdependence", RFC 9040, DOI 10.17487/RFC9040, July
              2021, <https://www.rfc-editor.org/rfc/rfc9040>.

   [VL87]     Varghese, G. and T. Lauck, "Hashed and hierarchical timing
              wheels: data structures for the efficient implementation
              of a timer facility", DOI 10.1145/37499.37504, 1 November
              1987, <https://doi.org/10.1145/37499.37504>.

Acknowledgments

   TODO acknowledge.

Authors' Addresses

   Michael Welzl
   University of Oslo
   PO Box 1080 Blindern
   0316  Oslo
   Norway
   Email: michawe@ifi.uio.no
   URI:   http://welzl.at/

   Wesley Eddy
   MTI Systems
   25111 Country Club Blvd, Suite 295
   North Olmsted, OH 44070,
   United States of America
   Email: wes@mti-systems.com
