Skip to main content

rLEDBAT: receiver-driven Low Extra Delay Background Transport for TCP
draft-irtf-iccrg-rledbat-08

Document Type Active Internet-Draft (iccrg RG)
Authors Marcelo Bagnulo , Alberto Garcia-Martinez , Gabriel Montenegro , Praveen Balasubramanian
Last updated 2024-06-22
RFC stream Internet Research Task Force (IRTF)
Intended RFC status Experimental
Formats
Additional resources Mailing list discussion
Stream IRTF state Waiting for IRTF Chair
Consensus boilerplate Yes
Document shepherd Vidhi Goel
Shepherd write-up Show Last changed 2024-06-17
IESG IESG state I-D Exists
Telechat date (None)
Responsible AD (None)
Send notices to vidhi_goel@apple.com
draft-irtf-iccrg-rledbat-08
Network Working Group                                         M. Bagnulo
Internet-Draft                                        A. Garcia-Martinez
Intended status: Experimental           Universidad Carlos III de Madrid
Expires: 24 December 2024                                  G. Montenegro
                                                                        
                                                      P. Balasubramanian
                                                               Confluent
                                                            22 June 2024

 rLEDBAT: receiver-driven Low Extra Delay Background Transport for TCP
                      draft-irtf-iccrg-rledbat-08

Abstract

   This document specifies the rLEDBAT, a set of mechanisms that enable
   the execution of a less-than-best-effort congestion control algorithm
   for TCP at the receiver end.  This document is a product of the
   Internet Congestion Control Research Group (ICCRG) of the Internet
   Research Task Force (IRTF).

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 24 December 2024.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components

Bagnulo, et al.         Expires 24 December 2024                [Page 1]
Internet-Draft                   rLEDBAT                       June 2024

   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Motivations for rLEDBAT . . . . . . . . . . . . . . . . . . .   3
   3.  rLEDBAT mechanisms  . . . . . . . . . . . . . . . . . . . . .   4
     3.1.  Controlling the receive window  . . . . . . . . . . . . .   6
       3.1.1.  Avoiding window shrinking . . . . . . . . . . . . . .   7
       3.1.2.  Window Scale Option . . . . . . . . . . . . . . . . .   8
     3.2.  Measuring delays  . . . . . . . . . . . . . . . . . . . .   8
       3.2.1.  Measuring RTT to estimate the queueing delay  . . . .   9
       3.2.2.  Measuring one way delay to estimate the queueing
               delay . . . . . . . . . . . . . . . . . . . . . . . .  11
     3.3.  Detecting packet losses and retransmissions . . . . . . .  13
   4.  Experiment Considerations . . . . . . . . . . . . . . . . . .  13
     4.1.  Status of the experiment at the time of this writing. . .  14
   5.  Security Considerations . . . . . . . . . . . . . . . . . . .  15
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  16
   7.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  16
   8.  Informative References  . . . . . . . . . . . . . . . . . . .  16
   Appendix A.  rLEDBAT pseudo-code  . . . . . . . . . . . . . . . .  17
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  19

1.  Introduction

   LEDBAT (Low Extra Delay Background Transport) [RFC6817] is a
   congestion-control algorithm used for less-than-best-effort (LBE)
   traffic.

   When LEDBAT traffic shares a bottleneck with other traffic using
   standard congestion control algorithms (for example, TCP traffic
   using Cubic[RFC9438], hereafter referred as standard-TCP for short),
   it reduces its sending rate earlier and more aggressively than
   standard-TCP congestion control, allowing other non-background
   traffic to use more of the available capacity.  In the absence of
   competing traffic, LEDBAT aims to make an efficient use of the
   available capacity, while keeping the queuing delay within predefined
   bounds.

   LEDBAT reacts both to packet loss and to variations in delay.
   Regarding to packet loss, LEDBAT reacts with a multiplicative
   decrease, similar to most TCP congestion controllers.  Regarding
   delay, LEDBAT aims for a target queueing delay.  When the measured
   current queueing delay is below the target, LEDBAT increases the
   sending rate and when the delay is above the target, it reduces the

Bagnulo, et al.         Expires 24 December 2024                [Page 2]
Internet-Draft                   rLEDBAT                       June 2024

   sending rate.  LEDBAT estimates the queuing delay by subtracting the
   measured current one-way delay from the estimated base one-way delay
   (i.e. the one-way delay in the absence of queues).

   The LEDBAT specification [RFC6817] defines the LEDBAT congestion-
   control algorithm, implemented in the sender to control its sending
   rate.  LEDBAT is specified in a protocol and layer agnostic manner.

   LEDBAT++ [I-D.irtf-iccrg-ledbat-plus-plus] is also an LBE congestion
   control algorithm which is inspired in LEDBAT while addressing
   several problems identified with the original LEDBAT specification.
   In particular the differences between LEDBAT and LEDBAT++ include: i)
   LEDBAT++ uses the round-trip-time (RTT) (as opposed to the one way
   delay used in LEDBAT) to estimate the queuing delay; ii) LEDBAT++
   uses an Additive Increase/Multiplicative Decrease algorithm to
   achieve inter-LEDBAT++ fairness and avoid the late-comer advantage
   observed in LEDBAT; iii) LEDBAT++ performs periodic slowdowns to
   improve the measurement of the base delay; iv) LEDBAT++ is defined
   for TCP.

   In this specification, we describe rLEDBAT, a set of mechanisms that
   enable the execution of an LBE delay-based congestion control
   algorithm such as LEDBAT or LEDBAT++ in the receiver end of a TCP
   connection.

   The consensus of the Internet Congestion Control Research Group
   (ICCRG) is to publish this document to encourage further
   experimentation and review of rLEDBAT.  This document is not an IETF
   product and is not a standard.  The status of this document is
   experimental.  In section 4 titled Experiment Considerations, we
   describe the purpose of the experiment and its current status.

2.  Motivations for rLEDBAT

   rLEDBAT enables new use cases and new deployment models, fostering
   the use of LBE traffic and benefitting the global Internet by
   improving overall allocation of resources.  The following scenarios
   are enabled by rLEDBAT:

      Content Delivery Networks and more sophisticated file distribution
      scenarios: Consider the case where the source of a file to be
      distributed (e.g., a software developer that wishes to distribute
      a software update) would prefer to use LBE and it enables LEDBAT/
      LEDBAT++ in the servers containing the source file.  However,
      because the file is being distributed through a CDN that does not
      implement LBE congestion control, the result is that the file
      transfers originated from CDN surrogates will not be using LBE.
      Interestingly enough, in the case of the software update, the

Bagnulo, et al.         Expires 24 December 2024                [Page 3]
Internet-Draft                   rLEDBAT                       June 2024

      developer may also control the software performing the download in
      the client, the receiver of the file, but because current LEDBAT/
      LEDBAT++ are sender-based algorithms, controlling the client is
      not enough to enable LBE congestion control in the communication.
      rLEDBAT would enable the use of LBE traffic class for file
      distribution in this setup.

      Interference from proxies and other middleboxes: Proxies and other
      middleboxes are a commonplace in the Internet.  For instance, in
      the case of mobile networks, proxies are frequently used.  In the
      case of enterprise networks, it is common to deploy corporate
      proxies for filtering and firewalling.  In the case of satellite
      links, Performance Enhancement Proxies (PEPs) are deployed to
      mitigate the effect of the long delay in TCP connection.  These
      proxies terminate the TCP connection on both ends and prevent the
      use of LBE congestion control in the segment between the proxy and
      the sink of the content, the client.  By enabling rLEDBAT, clients
      would be able to enable LBE traffic between them and the proxy.

      Receiver-defined preferences.  It is frequent that the bottleneck
      of the communication is the access link.  This is particularly
      true in the case of mobile devices.  It is then especially
      relevant for mobile devices to properly manage the capacity of the
      access link.  With current technologies, it is possible for the
      mobile device to use different congestion control algorithms
      expressing different preferences for the traffic.  For instance, a
      device can choose to use standard-TCP for some traffic and to use
      LEDBAT/LEDBAT++ for other traffic.  However, this would only
      affect the outgoing traffic since both standard-TCP and LEDBAT/
      LEDBAT++ are sender-driven.  The mobile device has no means to
      manage the traffic in the down-link, which is in most cases, the
      communication bottleneck for a typical eye-ball end-user. rLEDBAT
      enables the mobile device to selectively use LBE traffic class for
      some of the incoming traffic.  For instance, by using rLEDBAT, a
      user can use regular standard-TCP/UDP for video stream (e.g.,
      Youtube) and use rLEDBAT for other background file download.

3.  rLEDBAT mechanisms

   rLEDBAT provides the mechanisms to implement an LBE congestion
   control algorithm at the receiver-end of a TCP connection.  The
   rLEDBAT receiver controls the sender's rate through the Receive
   Window announced by the receiver in the TCP header.

   rLEDBAT assumes that the sender is a standard TCP sender. rLEDBAT
   does not require any rLEDBAT-specific modifications to the TCP
   sender.  The envisioned deployment model for rLEDBAT is that the
   clients implement rLEDBAT and this enables rLEDBAT in communications

Bagnulo, et al.         Expires 24 December 2024                [Page 4]
Internet-Draft                   rLEDBAT                       June 2024

   with existent standard TCP senders.  In particular, the sender MUST
   implement [RFC9293] and it also MUST implement the Time Stamp Option
   as defined in [RFC7323].  Also, the sender SHOULD implement some of
   the standard congestion control mechanisms, such as Cubic [RFC9438]
   or New Reno [RFC5681].

   rLEDBAT does not define a new congestion control algorithm.  The LBE
   congestion control algorithm executed in the rLEDBAT receiver is
   defined in other documents.  The rLEDBAT receiver MUST use an LBE
   congestion control algorithm.  Because rLEDBAT assumes a standard TCP
   sender, the sender will be using a "best effort" congestion control
   algorithm (such as Cubic or New Reno).  Since rLEDBAT uses the
   Receive Window to control the sender's rate and the sender calculates
   the sender's window as the minimum of the Receive window and the
   congestion window, rLEDBAT will only be effective as long as the
   congestion control algorithm executed in the receiver yields a
   smaller window than the one calculated by the sender.  This is
   normally the case when the receiver is using an LBE congestion
   control algorithm.  The rLEDBAT receiver SHOULD use the LEDBAT
   congestion control algorithm [RFC6817] or the LEDBAT++ congestion
   control algorithm [I-D.irtf-iccrg-ledbat-plus-plus].  The rLEDBAT MAY
   use other LBE congestion control algorithms defined elsewhere.
   Irrespective of which congestion control algorithm is executed in the
   receiver, an rLEDBAT connection will never be more aggressive than
   standard-TCP since it is always bounded by the congestion control
   algorithm executed at the sender.

   rLEDBAT is essentially composed of three types of mechanisms, namely,
   those that provide the means to measure the packet delay (either the
   round trip time or the one way delay, depending on the selected
   algorithm), mechanisms to detect packet loss and the means to
   manipulate the Receive Window to control the sender's rate.  The
   former provide input to the LBE congestion control algorithm while
   the latter uses the congestion window computed by the LBE congestion
   control algorithm to manipulate the Receive window, as depicted in
   the figure.

Bagnulo, et al.         Expires 24 December 2024                [Page 5]
Internet-Draft                   rLEDBAT                       June 2024

               +------------------------------------------+
               |   TCP receiver                           |
               |                      +-----------------+ |
               |                      |  +------------+ | |
               |   +---------------------|     RTT    | | |
               |   |                  |  | Estimation | | |
               |   |                  |  +------------+ | |
               |   |                  |                 | |
               |   |                  |  +------------+ | |
               |   |      +--------------| Loss, RTX  | | |
               |   |      |           |  | Detection  | | |
               |   |      |           |  +------------+ | |
               |   v      v           |                 | |
               | +----------------+   |                 | |
               | | LBE Congestion |   |    rLEDBAT      | |
               | |    Control     |   |                 | |
               | +----------------+   |                 | |
               |       |              |  +------------+ | |
               |       |              |  | RCV-WND    | | |
               |       +---------------->| Control    | | |
               |                      |  +------------+ | |
               |                      +-----------------+ |
               +------------------------------------------+

                    Figure 1: The rLEDBAT architecture.

   We describe each of the rLEDBAT components next.

3.1.  Controlling the receive window

   rLEDBAT uses the Receive Window (RCV.WND) of TCP to enable the
   receiver to control the sender's rate.  [RFC9293] defines that the
   RCV.WND is used to announce the available receive buffer to the
   sender for flow control purposes.  In order to avoid confusion, we
   will call fcwnd the value that a standard RFC793bis TCP receiver
   calculates to set in the receive window for flow control purposes.
   We call rlwnd the window value calculated by rLEDBAT algorithm and we
   call RCV.WND the value actually included in the Receive Window field
   of the TCP header.  For a RFC793bis receiver, RCV.WND == fcwnd.

   In the case of rLEDBAT receiver, the rLEDBAT receiver MUST NOT set
   the RCV.WND to a value larger than fcwnd and it SHOULD set the
   RCV.WND to the minimum of rlwnd and fcwnd, honoring both.

   When using rLEDBAT, two congestion controllers are in action in the
   flow of data from the sender to the receiver, namely, the congestion
   control algorithm of TCP in the sender side and the LBE congestion
   control algorithm executed in the receiver and conveyed to the sender

Bagnulo, et al.         Expires 24 December 2024                [Page 6]
Internet-Draft                   rLEDBAT                       June 2024

   through the RCV.WND.  In the normal TCP operation, the sender uses
   the minimum of the congestion window cwnd and the receiver window
   RCV.WND to calculate the sender's window SND.WND.  This is also true
   for rLEDBAT, as the sender is a regular TCP sender.  This guarantees
   that the rLEDBAT flow will never transmit more aggressively than a
   standard-TCP flow, as the sender's congestion window limits the
   sending rate.  Moreover, because a LBE congestion control algorithm
   such as LEDBAT/LEDBAT++ is designed to react earlier and more
   aggressively to congestion than regular TCP congestion control, the
   rlwnd contained in the RCV.WND field of TCP will be in general
   smaller than the congestion window calculated by the TCP sender,
   implying that the rLEDBAT congestion control algorithm will be
   effectively controlling the sender's window.

   In summary, the sender's window is: SND.WND = min(cwnd, rlwnd, fcwnd)

3.1.1.  Avoiding window shrinking

   The LEDBAT/LEDBAT++ algorithm executed in a rLEDBAT receiver
   increases or decreases the rlwnd according to congestion signals
   (variations on the estimations of the queueing delay and packet
   loss).  If the new congestion window is smaller than the current one
   then directly announcing it in the RCV.WND may result in shrinking
   the window, i.e., moving the right window edge to the left.
   Shrinking the window is discouraged as per [RFC9293], as it may cause
   unnecessary packet loss and performance penalty.  To be consistent
   with [RFC9293], the rLEDBAT receiver SHOULD NOT shrink the receive
   window.

   In order to avoid window shrinking, upon the reception of a data
   packet, the announced window can be reduced in the number of bytes
   contained in the packet at most.  This may fall short to honor the
   new calculated value of the rlwnd.  So, in order to reduce the window
   as dictated by the rLEDBAT algorithm, the receiver will progressively
   reduce the advertised RCV.WND, always honoring that the reduction is
   less or equal than the received bytes, until the target window
   determined by the rLEDBAT algorithm is reached.  This implies that it
   may take up to one RTT for the rLEDBAT receiver to drain enough in-
   flight bytes to completely close its receive window without shrinking
   it.  This is more than sufficient to honor the window output from the
   LEDBAT/LEDBAT++ algorithms since they only allow to perform at most
   one multiplicative decrease per RTT.

Bagnulo, et al.         Expires 24 December 2024                [Page 7]
Internet-Draft                   rLEDBAT                       June 2024

3.1.2.  Window Scale Option

   The Window Scale (WS) option [RFC7323] is a means to increase the
   maximum window size permitted by the Receive Window.  The use of the
   WS option implies that the changes in the window are expressed in the
   units resulting of the WS option used in the TCP connection.  This
   means that the rLEDBAT client will have to accumulate the increases
   resulting from different received packets, and only convey a change
   in the window when the accumulated sum of increases is equal or
   higher than one unit used to express the receive window according to
   the WS option in place for the TCP connection.

   Changes in the receive window that are smaller than 1 MSS are
   unlikely to have any immediate impact on the sender's rate, as usual
   TCP segmentation practice results in sending full segments (i.e.,
   segments of size equal to the MSS).  So, accumulating changes in the
   receive window until completing a full MSS in the sender or in the
   receiver makes little difference.

   Current WS option specification [RFC7323] defines that allowed values
   for the WS option are between 0 and 14.  Assuming a MSS around 1500
   bytes, WS option values between 0 and 11 result in the receive window
   being expressed in units that are about 1 MSS or smaller.  So, WS
   option values between 0 and 11 have no impact in rLEDBAT.

   WS option values higher than 11 can affect the dynamics of rLEDBAT,
   since control may become too coarse (e.g., with WS of 14, a change in
   one unit of the receive window implies a change of 10 MSS in the
   effective window).

   For the above reasons, the rLEDBAT client SHOULD set WS option values
   lower than 12.  Additional experimentation is required to explore the
   impact of larger WS values in rLEDBAT dynamics.

   Note that the recommendation for rLEDBAT to set the WS option value
   to lower values does not precludes the communication with servers
   that set the WS option values to larger values, since the WS option
   value used is set independently for each direction of the TCP
   connection.

3.2.  Measuring delays

   Both LEDBAT and LEDBAT++ measure base and current delays to estimate
   the queueing delay.  LEDBAT uses the one way delay while LEDBAT++
   uses the round trip time.  In the next sections we describe how
   rLEDBAT mechanisms enable the receiver to measure the one way delay
   or the round trip time, whatever needed depending on the congestion
   control algorithm used.

Bagnulo, et al.         Expires 24 December 2024                [Page 8]
Internet-Draft                   rLEDBAT                       June 2024

3.2.1.  Measuring RTT to estimate the queueing delay

   LEDBAT++ uses the round trip time (RTT) to estimate the queueing
   delay.  In order to estimate the queueing delay using RTT, the
   rLEDBAT receiver estimates the base RTT (i.e., the constant
   components of RTT) and also measures the current RTT.  By subtracting
   these two values, we obtain the queuing delay to be used by the
   rLEDBAT controller.

   LEDBAT++ discovers the base RTT (RTTb) by taking the minimum value of
   the measured RTTs over a period of time.  The current RTT (RTTc) is
   estimated using a number of recent samples and applying a filter,
   such as the minimum (or the mean) of the last k samples.  Using RTT
   to estimate the queueing delay has a number of shortcomings and
   difficulties that we discuss next.

   The queuing delay measured using RTT includes also the queueing delay
   experienced by the return packets in the direction from the rLEDBAT
   receiver to the sender.  This is a fundamental limitation of this
   approach.  The impact of this error is that the rLEDBAT controller
   will also react to congestion in the reverse path direction which
   results in an even more conservative mechanism.

   In order to measure RTT, the rLEDBAT client MUST enable the Time
   Stamp (TS) option [RFC7323].  By matching the TSVal value carried in
   outgoing packets with the TSecr value observed in incoming packets,
   it is possible to measure RTT.  This allows the rLEDBAT receiver to
   measure RTT even if it is acting as a pure receiver.  In a pure
   receiver there is no data flowing from the rLEDBAT receiver to the
   sender, making impossible to match data packets with acknowledgements
   packets to measure RTT, as it is usually done in TCP for other
   purposes.

   Depending on the frequency of the local clock used to generate the
   values included in the TS option, several packets may carry the same
   TSVal value.  If that happens, the rLEDBAT receiver will be unable to
   match the different outgoing packets carrying the same TSVal value
   with the different incoming packets carrying also the same TSecr
   value.  However, it is not necessary for rLEDBAT to use all packets
   to estimate RTT and sampling a subset of in-flight packets per RTT is
   enough to properly assess the queueing delay.  RTT MUST then be
   calculated as the time since the first packet with a given TSVal was
   sent and the first packet that was received with the same value
   contained in the TSecr.  Other packets with repeated TS values SHOULD
   NOT be used for RTT calculation.

Bagnulo, et al.         Expires 24 December 2024                [Page 9]
Internet-Draft                   rLEDBAT                       June 2024

   Several issues must be addressed in order to avoid an artificial
   increase of the observed RTT.  Different issues emerge depending
   whether the rLEDBAT capable host is sending data packets or pure ACKs
   to measure RTT.  We next consider the issues separately.

3.2.1.1.  Measuring RTT sending pure ACKs

   In this scenario, the rLEDBAT node (node A) sends a pure ACK to the
   other endpoint of the TCP connection (node B), including the TS
   option.  Upon the reception of the TS Option, host B will copy the
   value of the TSVal into the TSecr field of the TS option and include
   that option into the next data packet towards host A.  However, there
   are two reasons why B may not send a packet immediately back to A,
   artificially increasing the measured RTT.  The first reason is when A
   has no data to send.  The second is when A has no available window to
   put more packets in-flight.  We describe next how each of these cases
   is addressed.

   The case where the host B has no data to send when it receives the
   pure Acknowledgement is expected to be rare in the rLEDBAT use cases.
   rLEDBAT will be used mostly for background file transfers so the
   expected common case is that the sender will have data to send
   throughout the lifetime of the communication.  However, if, for
   example, the file is structured in blocks of data, it may be the case
   that seldom, the sender will have to wait until the next block is
   available to proceed with the data transfer and momentarily lack of
   data to send.  To address this situation, the filter used by the
   congestion control algorithm executed in the receiver SHOULD discard
   the larger samples (e.g. a min filter would achieve this) when
   measuring RTT using pure ACK packets.

   The limitation of available sender's window to send more packets can
   come either from the TCP congestion window in host B or from the
   announced receive window from the rLEDBAT in host A.  Normally, the
   receive window will be the one to limit the sender's transmission
   rate, since the LBE congestion control algorithm used by the rLEDBAT
   node is designed to be more restrictive on the sender's rate than
   standard-TCP.  If the limiting factor is the congestion window in the
   sender, it is less relevant if rLEDBAT further reduces the receive
   window due to a bloated RTT measurement, since the rLEDBAT node is
   not actively controlling the sender's rate.  Nevertheless, the
   proposed approach to discard larger samples would also address this
   issue.

   To address the case in which the limiting factor is the receive
   window announced by rLEDBAT, the congestion control algorithm at the
   receiver SHOULD discard RTT measurements done using pure ACK packets
   while reducing the window and avoid including bloated samples in the

Bagnulo, et al.         Expires 24 December 2024               [Page 10]
Internet-Draft                   rLEDBAT                       June 2024

   queueing delay estimation.  The rLEDBAT receiver is aware whether a
   given TSVal value was sent in a pure ACK packet where the window was
   reduced, and if so, it can discard the corresponding RTT measurement.

3.2.1.2.  Measuring RTT when sending data packets

   In the case that the rLEDBAT node is sending data packets and
   matching them with pure ACKs to measure RTT, a factor that can
   artificially increase the RTT measured is the presence of delayed
   Acknowledgements.  According to the TS option generation rules
   [RFC7323], the value included in the TSecr for a delayed ACK is the
   one in the TSVal field of the earliest unacknowledged segment.  This
   may artificially increase the measured RTT.

   If both endpoints of the connection are sending data packets,
   Acknowledgments are piggybacked into the data packets and they are
   not delayed.  Delayed ACKs only increase RTT measurements in the case
   that the sender has no data to send.  Since the expected use case for
   rLEDBAT is that the sender will be sending background traffic to the
   rLEDBAT receiver, the cases where delayed ACKs increase the measured
   RTT are expected to be rare.

   Nevertheless, for those measurements done using data packets sent by
   the rLEDBAT node matching pure ACKs sent from the other endpoint of
   the connection, they will result in an increased RTT.  The additional
   increase in the measured RTT will be up to 500 ms.  The reason for
   this is that delayed ACKs are generated every second data packet
   received and not delayed more than 500 ms according to [RFC9293].
   The rLEDBAT receiver MAY discard RTT measurements done using data
   packets from the rLEBDAT receiver and matching pure ACKs, especially
   if it has recent measurements done using other packet combinations.
   Also, applying a filter that discard larger samples would also
   address this issue (e.g. a min filter).

3.2.2.  Measuring one way delay to estimate the queueing delay

   The LEDBAT algorithm uses the one-way delay of packets as input.  A
   TCP receiver can measure the delay of incoming packets directly (as
   opposed to the sender-based LEDBAT, where the receiver measures the
   one-way delay and needs to convey it to the sender).

   In the case of TCP, the receiver can use the TimeStamp option to
   measure the one way delay by subtracting the timestamp contained in
   the incoming packet from the local time at which the packet has
   arrived.  As noted in [RFC6817] the clock offset between the clock of
   the sender and the clock in the receiver does not affect the LEDBAT
   operation, since LEDBAT uses the difference between the base one way
   delay and the current one way delay to estimate the queuing delay,

Bagnulo, et al.         Expires 24 December 2024               [Page 11]
Internet-Draft                   rLEDBAT                       June 2024

   effectively canceling the clock offset error in the queueing delay
   estimation.  There are however two other issues that the rLEDBAT
   receiver needs to take into account in order to properly estimate the
   one way delay, namely, the units in which the received timestamps are
   expressed and the clock skew.  We address them next.

   In order to measure the one way delay using TCP timestamps, the
   rLEDBAT receiver needs to discover the units in which the values of
   the TS option are expressed and second, to account for the skew
   between the two clocks of the endpoints of the TCP connection.  Note
   that a mismatch of 100 ppm (parts per million) in the estimation at
   the receiver of the clock rate of the sender accounts for 6 ms of
   variation per minute in the measured delay for a communication, just
   one order of magnitude below the target set for controlling the rate
   by rLEDBAT.  Typical skew for untrained clocks is reported to be
   around 100-200 ppm [RFC6817].

   In order to learn both the TS units and the clock skew, the rLEDBAT
   receiver measures how much local time has elapsed between the sender
   has issued two packets with different TS values.  By comparing the
   local time difference and the TS value difference, the receiver can
   assess the TS units and relative clock skews.  In order for this to
   be accurate, the packets carrying the different TS values should
   experience equal (or at least similar delay) when traveling from the
   sender to the receiver, as any difference in the experienced delays
   would introduce error in the unit/skew estimation.  One possible
   approach is to select packets that experienced the minimum delay
   (i.e. close to zero queueing delay) to make the estimations.

   An additional difficulty regarding the estimation of the TS units and
   clock skew in the context of (r)LEDBAT is that the LEDBAT congestion
   controller actions directly affect the (queueing) delay experienced
   by packets.  In particular, if there is an error in the estimation of
   the TS units/skew, the LEDBAT controller will attempt to compensate
   it by reducing/increasing the load.  The result is that the LEDBAT
   operation interferes with the TS units/clock skew measurements.
   Because of this, measurements are more accurate when there is no
   traffic in the connection (in addition to the packets used for the
   measurements).  The problem is that the receiver is unaware if the
   sender is injecting traffic at any point in time, and
   opportunistically seizes quiet intervals to perform measurements.
   The receiver can however, force periodic slowdowns, reducing the
   announced receive window to a few packets and perform the
   measurements then.

   It is possible for the rLEDBAT receiver to perform multiple
   measurements to assess both the TS units and the relative clock skew
   during the lifetime of the connection, in order to obtain more

Bagnulo, et al.         Expires 24 December 2024               [Page 12]
Internet-Draft                   rLEDBAT                       June 2024

   accurate results.  Clock skew measurements are more accurate if the
   time period used to discover the skew is larger, as the impact of the
   skew becomes more apparent.  Due to the same logic, accurately
   learning the clock skew is more pressing as the time separating the
   two delays to compare increases.  It is a reasonable approach for the
   rLEDBAT receiver to perform an early discovery of the TS units (and
   the clock skew) using the first few packets of the TCP connection and
   then improve the accuracy of the TS units/clock skew estimation using
   periodic measurements later in the lifetime of the connection.

3.3.  Detecting packet losses and retransmissions

   The rLEDBAT receiver is capable of detecting retransmitted packets in
   the following way.  We call RCV.HGH the highest sequence number
   corresponding to a received byte of data (not assuming that all bytes
   with smaller sequence numbers have been received already, there may
   be holes) and we call TSV.HGH the TSVal value corresponding to the
   segment in which that byte was carried.  SEG.SEQ stands for the
   sequence number of a newly received segment and we call TSV.SEQ the
   TSVal value of the newly received segment.

   If SEG.SEQ < RCV.HGH and TSV.SEQ > TSV.HGH then the newly received
   segment is a retransmission.  This is so because the newly received
   segment was generated later than another already received segment
   which contained data with a larger sequence number.  This means that
   this segment was lost and was retransmitted.

   The proposed mechanism to detect retransmissions at the receiver
   fails when there are window tail drops.  If all packets in the tail
   of the window are lost, the receiver will not be able to detect a
   mismatch between the sequence numbers of the packets and the order of
   the timestamps.  In this case, rLEDBAT will not react to losses but
   the TCP congestion controller at the sender will, most likely
   reducing its window to 1MSS and take over the control of the sending
   rate, until slow start ramps up and catches the current value of the
   rLEDBAT window.

4.  Experiment Considerations

   The status of this document is Experimental.  The general purpose of
   the proposed experiment is to gain more experience running rLEDBAT
   over different network paths to see if the constants work well in all
   cases.  Specifically, we would like to learn about the following
   aspects of the rLEDBAT mechanism:

      - Interaction between the sender and the receiver Congestion
      control algorithms. rLEDBAT posits that because the rLEDBAT
      receiver is using a less-than-best-effort congestion control

Bagnulo, et al.         Expires 24 December 2024               [Page 13]
Internet-Draft                   rLEDBAT                       June 2024

      algorithm, the receiver congestion control algorithm will expose a
      smaller congestion window (conveyed though the Receive Window)
      than the one resulting from the congestion control algorithm
      executed at the sender.  One of the purposes of the experiment is
      learn how these two interact and if the assumption that the
      receiver side is always controlling the sender's rate (and making
      rLEDBAT effective) holds.  The experiment should include the
      different congestion control algorithms that are currently widely
      used in the Internet, including Cubic, BBR and LEDBAT(++).

      - Interaction between rLEDBAT and Active Queue Management
      techniques such as Codel, PIE and L4S.

4.1.  Status of the experiment at the time of this writing.

   Currently there are the following implementations of rLEDBAT that can
   be used for experimentation:

      - Windows 11. rLEDBAT is available in Microsoft's Windows 11 22H2
      since October 2023 [Windows11].

      - Windows Server 2022. rLEDBAT is available in Microsoft's Windows
      Server 2022 since September 2022 [WindowsServer].

      - Apple. rLEDBAT is available in MacOS and iOS since 2021 [Apple].

      - Linux implementation, open source, available since 2022 at
      https://github.com/net-research/rledbat_module.

      - ns3 implementation, open source, available since 2020 at
      https://github.com/manas11/implementation-of-rLEDBAT-in-ns-3.

   In addition, rLEDBAT has been deployed by Microsoft in wide scale in
   the following services:

      - BITS (Background Intelligent Transfer Service)

      - DO (Delivery Optimization) service

      - Windows update # using DO

      - Windows Store # using DO

      - OneDrive

      - Windows Error Reporting # wermgr.exe; werfault.exe

      - System Center Configuration Manager (SCCM)

Bagnulo, et al.         Expires 24 December 2024               [Page 14]
Internet-Draft                   rLEDBAT                       June 2024

      - Windows Media Player

      - Microsoft Office

      - Xbox (download games) # using DO

   Some initial experiments involving rLEDBAT have been reported in
   [COMNET3].  Experiments involving the interaction of LEDBAT++ and BBR
   are presented in [COMNET2].  An experimental evaluation of the
   LEDBAT++ algorithm is presented in [COMNET1].  As LEDBAT++ is one of
   the less-than-best-effort congestion control algorithms that rLEDBAT
   relies on, the results regarding LEDBAT++ interaction with other
   congestion control algorithms are relevant for the understanding of
   rLEDBAT as well.

5.  Security Considerations

   Overall, we believe that rLEDBAT does not introduce any new
   vulnerabilities to existing TCP endpoints, as it relies on existing
   TCP knobs, notably the Receive Window and Timestamps.

   Specifically, rLEDBAT uses RCV WND to modulate the rate of the
   sender.  An attacker wishing to starve a flow can simply reduce the
   RCV WND, irrespective of whether rLEDBAT is being used or not.

   We can further ask ourselves whether the attacker can use the rLEDBAT
   mechanisms in place to force the rLEDBAT receiver to reduce the RCV
   WND.  There are two ways an attacker can do that.  One would be to
   introduce an artificial delay to the packets either by actually
   delaying the packets or modifying the Timestamps.  This would cause
   the rLEDBAT to believe that a queue is building up and reduce the RCV
   WND.  Note that an attacker to do that must be on path, so if that is
   the case, it is probably more direct to simply reduce the RCV WND.

   The other option would be for the attacker to make believe the
   rLEDBAT receiver that a loss has occurred.  To do that, it basically
   needs to retransmit an old packet (to be precise, it needs to
   transmit a packet with the right sequence number and the right port
   and IP numbers).  This means that the attacker can achieve a
   reduction of incoming traffic to the rLEDBAT receiver not only by
   modifying the RCV WND field of the packets originated from the
   rLEDBAT host, but also by injecting packets with the proper sequence
   number in the other direction.  This may slightly expand the attack
   surface.

Bagnulo, et al.         Expires 24 December 2024               [Page 15]
Internet-Draft                   rLEDBAT                       June 2024

6.  IANA Considerations

   No actions are required from IANA.

7.  Acknowledgements

   This work was supported by the EU through the StandICT CCI project,
   the StandICT CEL6 project, the NGI Pointer RIM project and the H2020
   5G-RANGE project and by the Spanish Ministry of Economy and
   Competitiveness through the 5G-City project (TEC2016-76795-C6-3-R).

   We would like to thank ICCRG chairs Reese Enghardt and Vidhi Goel for
   their support on this work.  We would also like to thank Daniel Havey
   for his help.  We would like to thank Colin Perkins and Vidhi Goel
   for their reviews and comments on earlier versions of this document.

8.  Informative References

   [Apple]    Stuart, S.C. and V.G. Vidhi, "Reduce network delays for
              your app", WWDC21 https://developer.apple.com/videos/play/
              wwdc2021/10239/, 2021.

   [COMNET1]  Bagnulo, M.B. and A.G. Garcia-Martinez, "An experimental
              evaluation of LEDBAT++", Computer Networks Volume 212,
              2022.

   [COMNET2]  Bagnulo, M.B. and A.G. Garcia-Martinez, "When less is
              more: BBR versus LEDBAT++", Computer Networks Volume 219,
              2022.

   [COMNET3]  Bagnulo, M.B., Garcia-Martinez, A.G., Mandalari, A.M.,
              Balasubramanian, P.B,., Havey, D.H., and G.M. Montenegro,
              "Design, implementation and validation of a receiver-
              driven less-than-best-effort transport", Computer
              Networks Volume 233, 2022.

   [I-D.irtf-iccrg-ledbat-plus-plus]
              Balasubramanian, P., Ertugay, O., and D. Havey, "LEDBAT++:
              Congestion Control for Background Traffic", Work in
              Progress, Internet-Draft, draft-irtf-iccrg-ledbat-plus-
              plus-01, 25 August 2020,
              <https://datatracker.ietf.org/doc/html/draft-irtf-iccrg-
              ledbat-plus-plus-01>.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
              <https://www.rfc-editor.org/info/rfc5681>.

Bagnulo, et al.         Expires 24 December 2024               [Page 16]
Internet-Draft                   rLEDBAT                       June 2024

   [RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
              "Low Extra Delay Background Transport (LEDBAT)", RFC 6817,
              DOI 10.17487/RFC6817, December 2012,
              <https://www.rfc-editor.org/info/rfc6817>.

   [RFC7323]  Borman, D., Braden, B., Jacobson, V., and R.
              Scheffenegger, Ed., "TCP Extensions for High Performance",
              RFC 7323, DOI 10.17487/RFC7323, September 2014,
              <https://www.rfc-editor.org/info/rfc7323>.

   [RFC9293]  Eddy, W., Ed., "Transmission Control Protocol (TCP)",
              STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022,
              <https://www.rfc-editor.org/info/rfc9293>.

   [RFC9438]  Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed.,
              "CUBIC for Fast and Long-Distance Networks", RFC 9438,
              DOI 10.17487/RFC9438, August 2023,
              <https://www.rfc-editor.org/info/rfc9438>.

   [Windows11]
              Forsmann, C.F., "What's new in Delivery Optimization",
              Microsoft Documentation https://learn.microsoft.com/en-
              us/windows/deployment/do/whats-new-do, 2023.

   [WindowsServer]
              Havey, D.H., "LEDBAT Background Data Transfer for
              Windows", Microsoft Blog 
              https://techcommunity.microsoft.com/t5/networking-
              blog/ledbat-background-data-transfer-for-windows/ba-
              p/3639278, 2022.

Appendix A.  rLEDBAT pseudo-code

   We next describe how to integrate the proposed rLEDBAT mechanisms and
   an LBE delay-based congestion control algorithm such as LEDBAT or
   LEDBAT++. We describe the integrated algorithm as two procedures, one
   that is executed when a packet is received by a rLEDBAT-enabled
   endpoint (Figure 2) and another that is executed when the rLEDBAT-
   enabled endpoint sends a packet (Figure 3).  At the beginning, rlwnd
   is set to its maximum value, so that the sending rate of the sender
   is governed by the flow control algorithm of the receiver and the TCP
   slow start mechanism of the sender, and the ackedBytes variable is
   set to 0.

   We assume that the LBE congestion control algorithm defines a
   WindowIncrease() function and a WindowDecrease() function.  For
   example, in the case of LEDBAT++, the WindowIncrease() function is an
   additive increase, while the WindowDecrease() function is a

Bagnulo, et al.         Expires 24 December 2024               [Page 17]
Internet-Draft                   rLEDBAT                       June 2024

   multiplicative decrease.  In the case of the WindowIncrease(), we
   assume that it takes as input the current window size and the the
   number of bytes that were acknowledged since the last window update
   (ackedBytes) and returns as output the updated window size.  In the
   case of WindowDecrease(), it takes as input the current window size
   and returns the updated window size.

   The data structures used in the algorithms are as follows.  The
   sentList is a list that contains the TSval and the local send time of
   each packet sent by the rLEDBAT-enabled endpoint.  The TSecr field of
   the packets received by the rLEDBAT-enabled endpoint are matched with
   the sendList to compute the RTT.

   The RTT values computed for each received packet are stored in the
   RTTlist, which contains also the received TSecr (to avoid using
   multiple packets with the same TSecr for RTT calculations, only the
   first packet received for a given TSecr is used to compute the RTT).
   It also contains the local time at which the packet was received, to
   allow selecting the RTTs measured in a given period (e.g., in the
   last 10 minutes).  RTTlist is initialized with all its values to its
   maximum.

   procedure receivePacket()
     //Looks for first sent packet with same TSval as TSecr, and,
     //returns time difference
     receivedRTT = computeRTT(sentList, receivedTSecr, receivedTime)

     //Inserts minimum value for a given receivedTSecr
     //note that many received packets may contain same receivedTSecr
     insertRTT (RTTlist, receivedRTT, receivedTSecr, receivedTime)

     filteredRTT = minLastKMeasures(RTTlist, K=4)
     baseRTT = minLastNSeconds(RTTlist, N=180)
     qd = filteredRTT - baseRTT

     //ackedBytes is the number of bytes that can be used to reduce
     //the Receive Window - without shrinking it - if necessary
     ackedBytes = ackedBytes + receiveBytes

     if retransmittedPacketDetected then
           rlwnd = DecreaseWindow(rlwnd) // Only once per RTT
     end if
     if qd < T then
           rlwnd = IncreaseWindow(rlwnd, ackedBytes)
     else
           rlwnd = DecreaseWindow(rlwnd)
     end if
   end procedure

Bagnulo, et al.         Expires 24 December 2024               [Page 18]
Internet-Draft                   rLEDBAT                       June 2024

           Figure 2: Procedure executed when a packet is received

   procedure SENDPACKET
     if (rlwnd > rlwndPrevious) or (rlwnd - rlwndPrevious < ackedBytes)
     then
           rlwndPrevious = rlwnd
     else
           rlwndPrevious = rlwnd - ackedBytes
     end if
     ackedBytes = 0
     rlwndPrevious = rlwnd

     //Compute the RWND to include in the packet
     rlwnd = min(rlwnd, fcwnd)
   end procedure

             Figure 3: Procedure executed when a packet is sent

Authors' Addresses

   Marcelo Bagnulo
   Universidad Carlos III de Madrid
   Email: marcelo@it.uc3m.es

   Alberto Garcia-Martinez
   Universidad Carlos III de Madrid
   Email: alberto@it.uc3m.es

   Gabriel Montenegro
   Email: g.e.montenegro@hotmail.com

   Praveen Balasubramanian
   Confluent
   Email: pravb.ietf@gmail.com

Bagnulo, et al.         Expires 24 December 2024               [Page 19]