Network Working Group                                           Q. Xiong
Internet-Draft                                           ZTE Corporation
Intended status: Informational                                    K. Yao
Expires: 8 June 2025                                        China Mobile
                                                                C. Huang
                                                           China Telecom
                                                                  Z. Han
                                                            China Unicom
                                                                 J. Zhao
                                                                   CAICT
                                                         5 December 2024

       Problem Statement for High Performance Wide Area Networks
                 draft-xiong-hpwan-problem-statement-00

Abstract

   High Performance Wide Area Networks (HP-WANs) are designed for
   scientific research, academia, education, and other data-intensive
   applications that demand large-volume data transmission over WANs.
   HP-WANs need to support large-scale data processing and provide
   efficient transmission services.  This document outlines the
   problems that HP-WANs face.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 8 June 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Xiong, et al.              Expires 8 June 2025                  [Page 1]
Internet-Draft   Problems Statement for High Performance   December 2024

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Terminology
   3.  High-performance Goals for HP-WANs
   4.  Problem Statements
     4.1.  Long-distance Delay and Slow Feedback
     4.2.  Coarse-grained Exploitation of Network Capacities
     4.3.  Instantaneous Traffic
     4.4.  Incast Congestion upon Bottleneck Links
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
   Authors' Addresses

1.  Introduction

   As described in [I-D.kcrh-hpwan-state-of-art], data is fundamental
   for research, academia, education, industry, and other
   data-intensive applications, such as High Performance Computing
   (HPC) for scientific research, cloud storage and backup of
   industrial internet data, and distributed training of Artificial
   Intelligence (AI) models.  These applications may generate huge
   volumes of data using advanced instruments and high-end computing
   devices.  Large-scale data transfers need to finish within a
   required completion time, with stable and efficient transmission
   services over non-dedicated Wide Area Networks (WANs).  These WANs
   connect research institutions, universities, and data centers across
   large geographical areas, so massive data transmission usually runs
   over long-distance links.  For example, data shared between research
   institutes may travel hundreds or thousands of kilometers.
   Moreover, some applications require periodic or on-demand data
   migration with variable transmission frequency, which calls for
   timely data transmission.  Large data transfers that co-exist with
   other services over WANs demand high performance, such as effective
   high throughput, fairness among multiple services, and high network
   utilization.

   More recently, massive data transmission and long-distance
   connections over complex WANs have become key factors limiting the
   performance of existing technologies.  High-volume data transmitted
   over WANs depends on transport protocols such as the Transmission
   Control Protocol (TCP), QUIC, and Remote Direct Memory Access
   (RDMA).  Traditional congestion control mechanisms, which are
   typically implemented at the hosts (sender and receiver) to control
   or prevent congestion, cannot achieve the required performance.  A
   host adjusts its sending rate based on feedback from the network
   after packet loss or congestion has occurred.  Performance suffers
   from the long feedback loop, and rate adjustment is inefficient
   without fine-grained awareness of network capability.  The network,
   in turn, forwards packets reactively, which leads to low bandwidth
   utilization because of bottleneck links and instantaneous
   congestion.  Instead, the network could be enhanced to regulate
   traffic so that incast congestion is avoided preemptively, and it
   could actively collaborate with the host so that the host adjusts
   its rate efficiently and rapidly when congestion occurs.
   Negotiation between the host and the network is needed to assist the
   network operator with traffic management, bandwidth allocation, and
   utilization optimization, and to help the host adjust its rate
   according to the network's resource scheduling acknowledgements.
   Therefore, sophisticated host congestion control combined with more
   active network coordination should be considered to improve overall
   HP-WAN transmission performance.

   HP-WANs are designed specifically to meet the high-speed,
   low-latency, and high-capacity needs of applications with massive
   data sets.  They impose high-performance requirements such as
   effective high throughput, fairness among multiple services, and
   high bandwidth utilization.  This document outlines the problems for
   HP-WANs.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Terminology

   This document uses the following terminology.

   High Performance Wide Area Networks (HP-WANs): wide area networks
   designed specifically to meet the high-speed, low-latency, and
   high-capacity needs of research, academia, education, industrial,
   and other data-intensive applications.  The primary goal of an
   HP-WAN is to complete massive data transmission within a required
   completion time, which imposes high-performance requirements such as
   effective high throughput, fairness among multiple services, and
   high bandwidth utilization.

   This document also uses the following abbreviations:

   DC:            Data Center

   DCI:           Data Center Interconnection

   HPC:           High Performance Computing

   WAN:           Wide Area Network

   MAN:           Metropolitan Area Network

   PFC:           Priority Flow Control

   ECN:           Explicit Congestion Notification

   ECMP:          Equal-Cost Multipath

   RTT:           Round-Trip Time

   TCP:           Transmission Control Protocol

   RDMA:          Remote Direct Memory Access

   QUIC:          QUIC transport protocol (originally named "Quick UDP
                  Internet Connections")

3.  High-performance Goals for HP-WANs

   The services provided in HP-WANs mainly involve massive data that
   must be transmitted in a timely manner, while multiple services may
   co-exist over long-distance networks, as described below.

   *  Massive data transmission: bulk or high-volume data transfer,
      e.g. the rate of a single flow could range from 2 Gbps to 1 Tbps.

   *  Timely data transmission: transfers have a completion time but
      no strict real-time requirements, e.g. completion times ranging
      from milliseconds to minutes.

   *  Predictable transmission: the transmission frequency is variable
      but predictable, e.g. periodic or on-demand data migration.

   *  Long-distance transmission over non-dedicated WANs between
      multiple sites or DCs, e.g. over more than 100 km or 1000 km.

   *  Multiple services co-existing with concurrent flows.

   *  Minimize cost.

   *  Data security and integrity.

   From the application perspective, an HP-WAN flow requires effective
   high-throughput data transmission to meet its completion time.  It
   is also crucial to maximize bandwidth utilization while ensuring
   fairness among multiple services.  This document identifies the
   following high-performance requirements for HP-WANs:

   *  Effective high throughput: HP-WANs require high throughput for
      high-volume data transmission within a completion time over
      WANs.  Throughput depends on performance indicators such as
      bandwidth, packet loss ratio, and latency; for example,
      throughput decreases as the packet loss ratio and RTT increase.
      Ultra-high goodput, an ultra-low packet loss ratio, low latency,
      and resilience are required to ensure effective high-throughput
      transmission in HP-WANs.

   *  Multiple services fairness: HP-WANs require fairness when
      multiple services co-exist with concurrent flows.  Different
      types of services need to obtain reasonable network resources
      through resource allocation and management so that each meets
      its quality of service (QoS) requirements, while resource
      allocation remains fair.

   *  Ultra-high bandwidth utilization: HP-WANs require high
      utilization of network bandwidth.  Available network capacity
      needs to be used efficiently to maximize data transfer rates and
      minimize latency at low cost.  A bandwidth utilization rate
      exceeding 90% is required to ensure that network resources are
      fully utilized.
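   The negative correlation between throughput and both packet loss
   and RTT can be illustrated with the well-known Mathis et al.
   approximation for Reno-style (AIMD) TCP, rate ~ (MSS / RTT) *
   (C / sqrt(p)) with C = sqrt(3/2).  This model is not part of this
   document and does not describe CUBIC or BBR exactly; the function
   name and the MSS, RTT, and loss values below are illustrative
   assumptions.

```python
import math


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state throughput (bits/s) of a Reno-style TCP flow.

    Mathis et al. model: rate ~= (MSS / RTT) * (C / sqrt(p)), C = sqrt(3/2).
    """
    if rtt_s <= 0 or not 0 < loss_rate < 1:
        raise ValueError("need rtt_s > 0 and 0 < loss_rate < 1")
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


# Same 1460-byte MSS and 1e-5 loss rate, metro-scale vs. long-haul RTT.
for rtt_ms in (2, 40):
    rate = mathis_throughput_bps(1460, rtt_ms / 1e3, 1e-5)
    print(f"RTT {rtt_ms:>2} ms: ~{rate / 1e9:.2f} Gbps")
```

   Because the model scales as 1/RTT, the 40 ms path delivers one
   twentieth of the 2 ms path's throughput at the same loss rate, and
   quadrupling the loss rate halves throughput.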

4.  Problem Statements

   Providing effective high-performance transmission is challenging in
   HP-WAN scenarios with massive concurrent services, long-distance
   delays, and packet loss.  Long-distance networks have more
   uncertainties, such as long Round-Trip Time (RTT), routing changes,
   network congestion, packet loss, and link quality fluctuations, all
   of which can reduce throughput.  The services are massive,
   concurrent, and of multiple types with different traffic models.
   For example, elephant flows with short inter-arrival times, high
   speed, and large data scale may occupy a large amount of network
   resources and lead to unfairness among flows, low network
   utilization, and poor cost-effectiveness.

   Existing network technologies have various problems and cannot meet
   these performance requirements.  The following sections describe
   these problems for HP-WANs.

4.1.  Long-distance Delay and Slow Feedback

   Several classes of congestion control algorithms are deployed, such
   as loss-based algorithms (e.g. Reno and CUBIC, which rely on packet
   loss as the congestion signal) and congestion-based (model-based)
   algorithms (e.g. BBR, which relies on measurements of the path).
   Over long distances, the large propagation delay and RTT delay
   network state feedback, so the sender cannot adjust its transmission
   rate in a timely manner.  This makes it challenging for congestion
   control in WANs to limit the total amount of data entering the
   network and keep the traffic at an acceptable level.  Ideally,
   feedback should be independent of the transmission distance and as
   timely as possible.

   For example, Explicit Congestion Notification (ECN) [RFC3168] can be
   used with Reno and CUBIC to provide end-to-end congestion
   notification at the IP and transport layers.  When congestion
   occurs, the network signals it by ECN marking or by dropping
   packets, and the receiver passes this information back to the sender
   in transport-layer acknowledgements, notifying the source to adjust
   its transmission rate.  Long distances delay the notification and
   slow the feedback, which results in untimely adjustment and buffer
   overflow and degrades network performance.  This is especially
   severe for incast congestion, where multiple sources target the same
   destination; the network needs to send fast feedback based on the
   offered load.
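   The ECN signaling path described above can be sketched using the
   two-bit ECN field codepoints defined in [RFC3168].  This is a
   simplified model for illustration: the function names are
   hypothetical, and details such as the CWR handshake and the
   once-per-RTT window reduction are omitted.

```python
from enum import IntEnum
from typing import Optional


class Ecn(IntEnum):
    """Two-bit ECN field in the IP header; codepoints per RFC 3168."""
    NOT_ECT = 0b00  # transport is not ECN-capable
    ECT1 = 0b01     # ECN-Capable Transport (1)
    ECT0 = 0b10     # ECN-Capable Transport (0)
    CE = 0b11       # Congestion Experienced


def router_on_congestion(ecn: Ecn) -> Optional[Ecn]:
    """Hypothetical helper: action of a congested ECN-enabled queue.

    ECN-capable packets are marked CE instead of being dropped; packets that
    are not ECN-capable are dropped (None), since loss is their only signal.
    """
    if ecn == Ecn.NOT_ECT:
        return None
    return Ecn.CE


def receiver_echoes_congestion(ecn: Ecn) -> bool:
    """Hypothetical helper: a TCP receiver that sees CE sets ECE in its ACKs."""
    return ecn == Ecn.CE


for codepoint in (Ecn.ECT0, Ecn.NOT_ECT):
    outcome = router_on_congestion(codepoint)
    if outcome is None:
        print(f"{codepoint.name}: dropped, sender infers loss")
    else:
        print(f"{codepoint.name}: marked {outcome.name}, "
              f"ECE echoed = {receiver_echoes_congestion(outcome)}")
```

   The CE mark must first reach the receiver and the echo must then
   return to the sender, so the sender's reaction lags the congestion
   event by roughly the remaining forward delay plus the return delay,
   which over a WAN is on the order of the RTT.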

   BBR actively measures the bottleneck bandwidth (BtlBw) and the
   round-trip propagation time (RTprop) to build a path model,
   calculates the bandwidth-delay product (BDP), and adjusts its
   sending rate to maximize throughput and minimize latency.  However,
   BBR relies on real-time measurement of these parameters, which may
   vary greatly and are fed back slowly, reducing the control precision
   of BBR in long-distance networks.
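   The BBR model described above can be sketched as follows, loosely
   after BBR version 1: BtlBw is a windowed maximum of delivery-rate
   samples, RTprop is a windowed minimum of RTT samples, and
   BDP = BtlBw * RTprop.  Real BBR uses time-based windows (about 10
   RTTs for BtlBw and 10 seconds for RTprop) and a state machine that
   is omitted here; the class name, sample-count windows, and default
   gains are assumptions.  Because every sample arrives one RTT after
   the data was sent, a long RTT makes these estimates lag behind the
   path.

```python
from collections import deque


class BbrPathModel:
    """Simplified BBR-style path model (after BBR v1), for illustration only.

    Window sizes are counted in samples here (an assumption); real BBR uses
    about 10 RTTs for BtlBw and 10 seconds for RTprop.
    """

    def __init__(self, bw_window: int = 10, rtt_window: int = 100) -> None:
        self.bw_samples: deque = deque(maxlen=bw_window)
        self.rtt_samples: deque = deque(maxlen=rtt_window)

    def on_ack(self, delivered_bytes: float, interval_s: float, rtt_s: float) -> None:
        """Record one delivery-rate sample and one RTT sample."""
        self.bw_samples.append(delivered_bytes / interval_s)
        self.rtt_samples.append(rtt_s)

    def btl_bw(self) -> float:
        """Windowed max of delivery rate, bytes/s."""
        return max(self.bw_samples)

    def rt_prop(self) -> float:
        """Windowed min of RTT, seconds."""
        return min(self.rtt_samples)

    def bdp_bytes(self) -> float:
        return self.btl_bw() * self.rt_prop()

    def pacing_rate(self, pacing_gain: float = 1.0) -> float:
        return pacing_gain * self.btl_bw()

    def cwnd_bytes(self, cwnd_gain: float = 2.0) -> float:
        return cwnd_gain * self.bdp_bytes()


model = BbrPathModel()
model.on_ack(delivered_bytes=1_250_000, interval_s=0.001, rtt_s=0.021)
model.on_ack(delivered_bytes=1_000_000, interval_s=0.001, rtt_s=0.020)
print(f"BtlBw {model.btl_bw() * 8 / 1e9:.1f} Gbps, "
      f"RTprop {model.rt_prop() * 1e3:.0f} ms, BDP {model.bdp_bytes() / 1e6:.1f} MB")
```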

   Moreover, other congestion control algorithms, such as Data Center
   Quantized Congestion Notification (DCQCN) and High Precision
   Congestion Control (HPCC++), are designed for data centers and do
   not tolerate the slow feedback loop over WANs.

4.2.  Coarse-grained Exploitation of Network Capacities

   Existing congestion control mechanisms focus on rate adjustment:
   they control the sending rate of data flows at the source, thereby
   avoiding or reducing network congestion.  Without awareness of
   network capacity, it is challenging for the host to adjust its
   sending rate efficiently.  For example, when CUBIC [RFC9438] detects
   congestion through packet loss or a classic ECN mark, it reduces the
   congestion window by its multiplicative window decrease factor, so
   the sending rate follows a sawtooth pattern.  L4S [RFC9330] uses
   more frequent ECN marking to provide low latency and scalable
   throughput, reduce convergence time, and eliminate the sawtooth
   effect.  However, because congestion is still signaled through ECN
   feedback and the rate is adjusted frequently, throughput can
   fluctuate significantly, which affects bandwidth utilization and
   transmission efficiency.  These mechanisms still lack accurate
   network information, which is critical for closing the gap between
   the sending rate and the available network capacity, especially when
   transmitting high-volume data over WANs.
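   The contrast between CUBIC's sawtooth and a scalable response can be
   sketched with the CUBIC window function from [RFC9438] (beta_cubic
   = 0.7, C = 0.4) and a DCTCP-style update, used here as one example
   of the scalable congestion controls that L4S [RFC9330] relies on.
   The function names are hypothetical, windows are in segments, and
   the EWMA gain g = 1/16 is an assumed value.

```python
from typing import Tuple

BETA_CUBIC = 0.7  # multiplicative window decrease factor (RFC 9438)
C_CUBIC = 0.4     # CUBIC scaling constant, segments/s^3 (RFC 9438)
G_EWMA = 1 / 16   # assumed EWMA gain for the scalable (DCTCP-style) response


def cubic_on_congestion(cwnd: float) -> float:
    """CUBIC reduction on loss or a classic ECN mark: a fixed factor."""
    return cwnd * BETA_CUBIC


def cubic_window(t_s: float, w_max: float) -> float:
    """W_cubic(t) = C * (t - K)^3 + W_max, t seconds after the last reduction."""
    k = (w_max * (1 - BETA_CUBIC) / C_CUBIC) ** (1 / 3)
    return C_CUBIC * (t_s - k) ** 3 + w_max


def scalable_update(cwnd: float, alpha: float, marked_fraction: float) -> Tuple[float, float]:
    """DCTCP-style response, applied once per window of data.

    alpha tracks the fraction of CE-marked packets; the window shrinks in
    proportion to alpha, so light marking causes only a small reduction.
    """
    alpha = (1 - G_EWMA) * alpha + G_EWMA * marked_fraction
    if marked_fraction > 0:
        cwnd *= 1 - alpha / 2
    return cwnd, alpha


w_max = 1000.0
print("CUBIC after one reduction:", [round(cubic_window(t, w_max)) for t in range(0, 11, 2)])
cwnd, alpha = 1000.0, 0.0
for _ in range(5):
    cwnd, alpha = scalable_update(cwnd, alpha, marked_fraction=0.1)
print(f"Scalable after 5 lightly marked rounds: cwnd={cwnd:.0f}, alpha={alpha:.3f}")
```

   With W_max = 1000 segments, K = cbrt(1000 * 0.3 / 0.4), about 9.1 s,
   so the CUBIC window stays below its previous peak for several
   seconds after a single reduction.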

   Moreover, the sending rate of the host is inconsistent with the
   network's transmission capability, which prevents accurate sending
   rate adjustment.  For example, when determining the starting rate of
   a transfer, slow start limits overall throughput and leaves bandwidth
   underutilized, failing to unleash the full network capacity.  A fast
   start, in contrast, cannot adapt to the buffer capacity of network
   devices, especially when multiple flows share the same link, causing
   network congestion, packet loss, and transmission delay.  For
   HP-WANs, fine-grained, network-aware sending rate negotiation needs
   to consider factors such as predictable network bandwidth, latency,
   and packet loss rate, while balancing bandwidth utilization and
   congestion avoidance in WANs.
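   The cost of a conservative start on a long path can be estimated by
   counting RTTs: classic slow start roughly doubles the congestion
   window every RTT, so reaching a window of one BDP takes about
   log2(BDP / initial window) RTTs.  The sketch assumes an initial
   window of 10 segments, 1460-byte segments, and a hypothetical
   100 Gbps path with a 20 ms RTT, and it ignores losses and pacing.

```python
import math


def slow_start_rtts(target_bytes: float, initial_window_segments: int = 10,
                    mss_bytes: int = 1460) -> int:
    """RTTs for classic slow start (cwnd doubles every RTT) to reach target_bytes."""
    target_segments = target_bytes / mss_bytes
    if target_segments <= initial_window_segments:
        return 0
    return math.ceil(math.log2(target_segments / initial_window_segments))


rate_bps, rtt_s = 100e9, 0.020   # hypothetical path: 100 Gbps, 20 ms RTT
bdp_bytes = rate_bps / 8 * rtt_s  # about 250 MB in flight to fill the pipe
n = slow_start_rtts(bdp_bytes)
print(f"{n} RTTs (~{n * rtt_s * 1e3:.0f} ms) before cwnd reaches the BDP")
```

   Under these assumptions the sender needs 15 RTTs, roughly 300 ms,
   before its window covers the 250 MB BDP; for a 10 GB transfer, which
   would take about 0.8 s at full rate, that ramp-up is a large
   fraction of the completion time.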

4.3.  Instantaneous Traffic

   From the network perspective, the network can only transfer
   high-volume data reactively; it does not schedule predictable
   traffic and network resources to anticipate congestion
   preemptively.  Without awareness of instantaneous traffic, which can
   occupy a large amount of network resources, the network suffers from
   uneven resource allocation and low bandwidth utilization.

   For example, HP-WAN applications transmit large amounts of data,
   e.g. a single flow may carry from 10 GB to 1 TB.  Such massive
   transfers with large bursts may cause instantaneous congestion,
   packet loss, and queuing delay within network devices in WANs.
   Traffic aggregates at the edge of WANs, and bursts may accumulate as
   flows traverse, join, and separate across hops.  Congestion caused
   by such bursty traffic is challenging to manage.
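   The burst aggregation described above can be illustrated with a
   simple fluid model of one output queue: per time step, arrivals are
   added, the link drains at its line rate, and anything beyond the
   buffer is dropped.  The function, the two-sender scenario, the
   10 Gbps link, and the 2 MB buffer are hypothetical values chosen for
   illustration.

```python
from typing import List, Tuple


def simulate_queue(arrivals_bytes: List[float], link_rate_bps: float,
                   buffer_bytes: float, tick_s: float) -> Tuple[float, float]:
    """Fluid model of one output queue; returns (peak queue, dropped bytes)."""
    drain_per_tick = link_rate_bps / 8 * tick_s
    queue = peak = dropped = 0.0
    for arriving in arrivals_bytes:
        # Add this tick's arrivals, then serve up to one tick of line rate.
        queue = max(0.0, queue + arriving - drain_per_tick)
        if queue > buffer_bytes:
            # Tail drop: bytes that do not fit in the buffer are lost.
            dropped += queue - buffer_bytes
            queue = buffer_bytes
        peak = max(peak, queue)
    return peak, dropped


# Two senders, each at 10 Gbps for 5 ms, converge on one 10 Gbps link.
tick = 0.001
per_sender_bytes = 10e9 / 8 * tick
burst = [2 * per_sender_bytes] * 5 + [0.0] * 5
peak, dropped = simulate_queue(burst, 10e9, 2e6, tick)
print(f"peak queue {peak / 1e6:.2f} MB, dropped {dropped / 1e6:.2f} MB")
```

   Although the link stays busy throughout the burst, 4.25 MB of the
   12.5 MB burst is lost: the aggregate arrival rate is twice the line
   rate, and the 2 MB buffer absorbs only part of the 6.25 MB excess.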

   Moreover, meeting transmission completion times when goodput is
   bottlenecked makes traffic scheduling challenging.  Applications may
   run multiple concurrent services alongside existing dynamic flows.
   Given multiple services of various types with different traffic
   requirements, traffic needs to be scheduled onto multiple paths with
   fine-grained network resources to achieve high utilization and QoS
   guarantees.
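   The deadline-driven multi-path scheduling described above could, for
   example, start from the rate that a transfer needs to meet its
   completion time and spread that rate over the available paths.  The
   sketch below is a hypothetical policy, not one defined by this
   document: it splits the required rate in proportion to each path's
   residual capacity and reports infeasibility when the paths cannot
   meet the deadline.  Path names and capacities are illustrative.

```python
from typing import Dict, Optional


def split_for_deadline(volume_bytes: float, deadline_s: float,
                       residual_bps: Dict[str, float]) -> Optional[Dict[str, float]]:
    """Hypothetical scheduler: per-path rates (bps) that finish by the deadline.

    Returns None when the total residual capacity cannot meet the deadline.
    """
    required_bps = volume_bytes * 8 / deadline_s
    total_bps = sum(residual_bps.values())
    if required_bps > total_bps:
        return None
    share = required_bps / total_bps
    return {path: capacity * share for path, capacity in residual_bps.items()}


# 1 TB to move within 10 minutes over two paths with spare capacity.
plan = split_for_deadline(1e12, 600, {"path-A": 60e9, "path-B": 40e9})
if plan is not None:
    for path, rate in plan.items():
        print(f"{path}: {rate / 1e9:.2f} Gbps")
```

   Proportional splitting leaves every path with the same relative
   headroom for other services; other policies are equally possible.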

4.4.  Incast Congestion upon Bottleneck Links

   Incast congestion caused by limited bottleneck link bandwidth is
   challenging in long-distance, multi-hop networks.  It makes packet
   loss, queuing latency, and jitter difficult to control, which
   decreases throughput.  Incast traffic from greedy senders is the
   main cause of congestion.  The network may regulate such traffic to
   avoid congestion preemptively: it may proactively avoid path-level
   congestion and actively reserve and allocate network bandwidth
   through a scheduler so that traffic matches the bottleneck link
   bandwidth as closely as possible, thus fully utilizing bandwidth and
   preventing packet loss.
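   One allocation policy that such a bandwidth scheduler could apply at
   a bottleneck link shared by incast senders is max-min fairness,
   computed by water-filling.  This document does not specify a policy;
   max-min fairness, the function name, and the demands below are
   assumptions for illustration.

```python
from typing import Dict


def max_min_fair(capacity_bps: float, demands_bps: Dict[str, float]) -> Dict[str, float]:
    """Water-filling allocation of one bottleneck link among competing flows.

    Flows demanding no more than the current equal share get their full
    demand; the remaining capacity is split equally among the larger flows.
    """
    alloc: Dict[str, float] = {}
    remaining = capacity_bps
    pending = sorted(demands_bps.items(), key=lambda item: item[1])
    while pending:
        fair_share = remaining / len(pending)
        flow, demand = pending[0]
        if demand <= fair_share:
            alloc[flow] = demand
            remaining -= demand
            pending.pop(0)
        else:
            # Every remaining flow wants more than the fair share: cap them all.
            for name, _ in pending:
                alloc[name] = fair_share
            break
    return alloc


# Three senders converge on a 100 Gbps bottleneck link.
rates = max_min_fair(100e9, {"s1": 10e9, "s2": 50e9, "s3": 80e9})
print({name: f"{rate / 1e9:.0f} Gbps" for name, rate in rates.items()})
```

   Reserving these rates keeps the aggregate at the bottleneck capacity;
   the small sender is fully served and the two large senders split the
   rest equally.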

   Moreover, effective flow control can reduce network congestion and
   thereby reduce packet loss caused by buffer overflow.  Flow control
   ensures that data is transmitted efficiently and reliably by
   controlling the transmission rate, preventing a fast sender from
   overwhelming a slow receiver and preventing packet loss in congested
   situations.  However, ensuring fairness among multiple services over
   different distances is challenging because network resources are
   allocated unequally among flows with different RTTs.  For example,
   some flows may occupy more bandwidth because they use larger window
   sizes, have smaller RTTs, or send larger packets.
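   The RTT unfairness described above can be approximated with the
   Mathis et al. model for Reno-style (AIMD) TCP: when flows see the
   same loss rate at a shared bottleneck, each flow's rate is
   proportional to 1/RTT.  This ignores window limits, packet size
   differences, and the behavior of CUBIC or BBR; the function name and
   RTT values are illustrative assumptions.

```python
from typing import Dict


def rtt_biased_shares(capacity_bps: float, rtts_s: Dict[str, float]) -> Dict[str, float]:
    """Approximate bottleneck shares of Reno-style flows seeing equal loss rates.

    Under the Mathis et al. model, each flow's rate scales with 1/RTT, so the
    capacity is divided in proportion to the inverse RTTs.
    """
    weights = {flow: 1.0 / rtt for flow, rtt in rtts_s.items()}
    total = sum(weights.values())
    return {flow: capacity_bps * w / total for flow, w in weights.items()}


# A metro flow and a long-haul flow share one 100 Gbps bottleneck.
shares = rtt_biased_shares(100e9, {"metro-5ms": 0.005, "long-haul-50ms": 0.050})
for flow, rate in shares.items():
    print(f"{flow}: {rate / 1e9:.1f} Gbps")
```

   Under this model the 5 ms flow receives about ten times the bandwidth
   of the 50 ms flow (roughly 90.9 Gbps versus 9.1 Gbps).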

5.  Security Considerations

   This document covers several representative applications and
   network scenarios that are expected to make use of HP-WAN
   technologies.  None of the potential use cases raises security
   concerns or issues by itself, but each may have security
   considerations from both the use-case-specific and the
   technology-specific perspectives.

6.  IANA Considerations

   This document makes no requests for IANA action.

7.  Acknowledgements

   The authors would like to acknowledge Guangping Huang, Yao Liu and
   Zheng Zhang for their thorough review and very helpful comments.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <https://www.rfc-editor.org/info/rfc3168>.

   [RFC7424]  Krishnan, R., Yong, L., Ghanwani, A., So, N., and B.
              Khasnabish, "Mechanisms for Optimizing Link Aggregation
              Group (LAG) and Equal-Cost Multipath (ECMP) Component Link
              Utilization in Networks", RFC 7424, DOI 10.17487/RFC7424,
              January 2015, <https://www.rfc-editor.org/info/rfc7424>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC8664]  Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W.,
              and J. Hardwick, "Path Computation Element Communication
              Protocol (PCEP) Extensions for Segment Routing", RFC 8664,
              DOI 10.17487/RFC8664, December 2019,
              <https://www.rfc-editor.org/info/rfc8664>.

   [RFC9232]  Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and
              A. Wang, "Network Telemetry Framework", RFC 9232,
              DOI 10.17487/RFC9232, May 2022,
              <https://www.rfc-editor.org/info/rfc9232>.

   [RFC9330]  Briscoe, B., Ed., De Schepper, K., Bagnulo, M., and G.
              White, "Low Latency, Low Loss, and Scalable Throughput
              (L4S) Internet Service: Architecture", RFC 9330,
              DOI 10.17487/RFC9330, January 2023,
              <https://www.rfc-editor.org/info/rfc9330>.

   [RFC9438]  Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed.,
              "CUBIC for Fast and Long-Distance Networks", RFC 9438,
              DOI 10.17487/RFC9438, August 2023,
              <https://www.rfc-editor.org/info/rfc9438>.

Authors' Addresses

   Quan Xiong
   ZTE Corporation
   China
   Email: xiong.quan@zte.com.cn

   Kehan Yao
   China Mobile
   China
   Email: yaokehan@chinamobile.com

   Cancan Huang
   China Telecom
   China
   Email: huangcanc@chinatelecom.cn

   Zhengxin Han
   China Unicom
   China
   Email: hanzx21@chinaunicom.cn

   Junfeng Zhao
   CAICT
   Beijing
   China
   Email: zhaojunfeng@caict.ac.cn
