Problem Statement for High Performance Wide Area Networks
draft-xiong-hpwan-problem-statement-03
This document is an Internet-Draft (I-D).
Anyone may submit an I-D to the IETF.
This I-D is not endorsed by the IETF and has no formal standing in the
IETF standards process.
| Field | Value |
|---|---|
| Type | Active Internet-Draft (individual) |
| Authors | Quan Xiong, Kehan Yao, Cancan Huang, Han Zhengxin, Junfeng Zhao |
| Last updated | 2026-03-02 |
| Replaces | draft-xiong-hpwan-uc-req-problem |
| RFC stream | (None) |
| Intended RFC status | (None) |
| Stream state | (No stream defined) |
| Consensus boilerplate | Unknown |
| RFC Editor Note | (None) |
| IESG state | I-D Exists |
| Telechat date | (None) |
| Responsible AD | (None) |
| Send notices to | (None) |
Network Working Group Q. Xiong
Internet-Draft ZTE Corporation
Intended status: Informational K. Yao
Expires: 3 September 2026 China Mobile
C. Huang
China Telecom
Z. Han
China Unicom
J. Zhao
CAICT
2 March 2026
Problem Statement for High Performance Wide Area Networks
draft-xiong-hpwan-problem-statement-03
Abstract
A High Performance Wide Area Network (HP-WAN) is designed for
data-intensive applications, such as scientific research, academia,
and education, which demand high-speed data transmission over WANs.
It needs to provide high-throughput transmission within a requested
completion time. This document outlines the problems for HP-WANs.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 3 September 2026.
Copyright Notice
Copyright (c) 2026 IETF Trust and the persons identified as the
document authors. All rights reserved.
Xiong, et al. Expires 3 September 2026 [Page 1]
Internet-Draft Problems Statement for High Performance March 2026
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction
1.1. Requirements Language
2. Terminology
3. Technical Goals for HP-WANs
4. Problem Statement
4.1. Poor Convergence Speed
4.2. Unscheduled Traffic
4.3. Long Feedback Loop
4.4. Multi-flow Concurrent Transmission
5. Security Considerations
6. IANA Considerations
7. Acknowledgements
8. References
8.1. Normative References
Authors' Addresses
1. Introduction
As described in [I-D.kcrh-hpwan-state-of-art], data is fundamental
to research, academia, education, industry, and other data-intensive
applications, such as High Performance Computing (HPC) for
scientific research, cloud storage and backup of industrial internet
data, and distributed training of Artificial Intelligence (AI). Use
cases in non-dedicated networks run by public operators, such as
large file transfer, traffic across data centers, and traffic shared
between dedicated and non-dedicated networks, are also described in
[I-D.yx-hpwan-uc-requirements-public-operator].
These applications may generate huge volumes of data using advanced
instruments and high-end computing devices. The research
institutions, universities, and data centers involved need to be
connected across large geographical areas over long-distance links.
For example, data shared between research institutes may have to
travel hundreds or thousands of kilometers. The network needs to
support large-scale data transfer and provide stable and efficient
transmission services over Wide Area Networks (WANs). These
applications may require periodic or on-demand high-speed transfers
with variable start times, data volumes, and transmission patterns,
each of which demands data transmission within a completion time.
More recently, massive data transmission and long-distance
connections over WANs have become key factors affecting the
performance of existing transport protocols such as the Transmission
Control Protocol (TCP), QUIC, and Remote Direct Memory Access (RDMA).
Moreover, traditional congestion control algorithms are typically
implemented at the host (sender and receiver) and perform blind
transmission: they control the size of the congestion window and
adjust the rate only after detecting overloaded links. Performance
is therefore difficult to predict, given the unpredictable behaviour
of WANs. For example, a host that is unaware of network capability
converges slowly because of slow start and passive rate adjustment,
which impacts the completion time. The long feedback loop also
leads to Round-Trip Time (RTT) fluctuation caused by large buffers
and long queues. The network, for its part, carries unscheduled
traffic with low bandwidth utilization because of bottleneck links
and instantaneous congestion. Concurrent transmission of multiple
flows can lead to slow-flow tailing and to jitter and deviation in
Flow Completion Time (FCT). All of the above degrades performance
and results in untimely transmission of high-volume data.
A High Performance Wide Area Network (HP-WAN) is designed for
data-intensive applications, such as scientific research, academia,
and education, which demand high-speed data transmission over WANs.
It needs to provide high-throughput transmission within a requested
completion time. This document outlines the specific problems that
stand in the way of meeting HP-WAN requirements.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
2. Terminology
This document adopts the terminology defined in
[I-D.kcrh-hpwan-state-of-art].
It also uses the following abbreviations:
BDP: Bandwidth Delay Product
DC: Data Center
DCI: Data Center Interconnection
HPC: High Performance Computing
WAN: Wide Area Network
PFC: Priority Flow Control
ECN: Explicit Congestion Notification
ECMP: Equal-Cost Multipath
RTT: Round-Trip Time
TCP: Transmission Control Protocol
RDMA: Remote Direct Memory Access
QUIC: QUIC transport protocol (a name, not an acronym)
FCT: Flow Completion Time
3. Technical Goals for HP-WANs
The services that HP-WANs need to provide mainly focus on the timely
transmission of massive data, while multiple services may co-exist
over long-distance WANs, as described below.
* Massive data transmission: high-volume data with high-speed
transfer, e.g. the data rate of a flow could range from 2 Gbps to
1 Tbps.
* Requested completion time: the data transmission should be
completed within a requested completion time, e.g. the completion
time could range from milliseconds to minutes.
* Scheduled transmission: traffic patterns could be scheduled by the
sender, e.g. data volume, start time, finish time, service type.
* Long-distance transmission over non-dedicated WANs: multiple hops
and domains, long RTT, routing changes, network congestion, packet
loss, and link quality fluctuations, e.g. the distance between two
sites or DCs could be more than 100 km or even 1000 km.
* Co-existence of multiple services with concurrent flows.
It is required to achieve high-speed data transmission within a
completion time. Moreover, it is also crucial to maximize bandwidth
utilization while ensuring fairness among multiple services. This
document outlines the technical goals for HP-WANs as described below.
* Completion time: achieving the target job completion time within
seconds to minutes, while meeting FCT requirements for all
incoming traffic flows.
* High throughput: ensuring high-speed data transmission within a
requested completion time for a flow, which could be impacted by
the bandwidth, convergence speed, start time and RTT.
* Efficient use of capacity: efficiently using available network
capacity with fairness to maximize data transfer rates and
minimize the completion time for multiple flows.
* Efficient transmission of concurrent flows: ensuring fair sharing
of link resources among multiple concurrent flows, avoiding
slow-flow tailing and FCT jitter caused by competition among
those flows.
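As a rough illustration of how the completion-time and throughput
goals relate, the following Python sketch computes the average rate a
flow must sustain to move a given volume within a requested
completion time. The function name and the example figures are
illustrative assumptions, not values taken from this document, and
the calculation ignores protocol overhead and ramp-up time.

```python
def required_rate_bps(volume_bytes: float, completion_time_s: float) -> float:
    """Average rate (bit/s) needed to move volume_bytes in completion_time_s."""
    if completion_time_s <= 0:
        raise ValueError("completion time must be positive")
    return volume_bytes * 8 / completion_time_s


# Hypothetical example: 1 TB (10**12 bytes) within 10 minutes.
rate = required_rate_bps(1e12, 600)
print(f"{rate / 1e9:.1f} Gbps")  # prints "13.3 Gbps"
```

Any time lost to slow convergence or queuing has to be recovered by
sending faster than this average for the remainder of the transfer.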
4. Problem Statement
The specific requirements of HP-WANs may encompass a wide range of
aspects. These include transport-related technologies such as proxy,
flow control, QoS negotiation, congestion control, admission control
and traffic scheduling. They also involve routing-related
technologies such as traffic engineering, resource scheduling, and
load balancing.
Existing network technologies face numerous challenges and fall short
of meeting performance requirements. This document highlights the
key issues associated with HP-WANs in the following sub-sections.
4.1. Poor Convergence Speed
Traditional congestion control mechanisms perform blind
transmission: they control the size of the congestion window and
adjust the rate only after detecting overloaded links. A WAN is a
black box whose behavior under high-speed transmission is
unpredictable, owing to multiple hops and domains, long Round-Trip
Time (RTT), routing changes, network congestion, packet loss, and
link quality fluctuations. The Bandwidth Delay Product (BDP), which
represents the maximum amount of data that can be in transit on the
network at any given time, varies over WANs, so the amount of
inflight data is difficult to predict for host-based congestion
control algorithms. This leads to poor convergence speed: the host
takes a long time, relative to the requested completion time, to
identify the optimal sending rate.
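To make the scale of the BDP concrete, the sketch below computes it
as bandwidth multiplied by RTT for a few round-trip times. The
10 Gbps link rate and the RTT values are assumed example figures, and
bdp_bytes is a hypothetical helper name.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth Delay Product in bytes: data in flight on a full pipe."""
    return bandwidth_bps * rtt_s / 8


# Assumed example: a 10 Gbps path at several round-trip times.
for rtt_ms in (1, 30, 100, 200):
    bdp = bdp_bytes(10e9, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>3} ms -> BDP {bdp / 1e6:.2f} MB")

print(f"{bdp_bytes(10e9, 0.1) / 1e6:.1f} MB")  # prints "125.0 MB"
```

The same link needs a hundred times more inflight data at 100 ms than
at 1 ms, which is why a window tuned for a short path is far from
optimal over a WAN.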
For example, slow start and blind probing without awareness of
network capability lead to long convergence times for algorithms
such as CUBIC (e.g. over 50 s), BBR (e.g. over 30 s) and BBRv2
(e.g. 30 to 50 s). BBR divides the entire process into four states:
Startup, Drain, ProbeBW and ProbeRTT. The probe cycle of the
ProbeRTT state is long, e.g. 10 s. The convergence time can span
multiple probe cycles, which impacts the completion time at the
level of seconds. There is a significant gap between the sending
rate the host achieves and the available network capacity. The
transport protocols should signal and collaborate with the network
to negotiate the rate at which the host sends traffic.
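The effect of RTT on ramp-up can be sketched with an idealized model
of slow start, in which the congestion window doubles once per RTT
until it covers the BDP. This is a loss-free lower bound under
assumed parameters (1460-byte segments, an initial window of 10
segments, a 125 MB BDP), not a reproduction of the convergence times
quoted above, which also include loss recovery and probing.

```python
def slow_start_rounds(target_bytes: float, mss: int = 1460,
                      initial_segments: int = 10) -> int:
    """Count RTTs of window doubling needed to reach target_bytes."""
    cwnd = mss * initial_segments
    rounds = 0
    while cwnd < target_bytes:
        cwnd *= 2  # idealized slow start: window doubles every RTT
        rounds += 1
    return rounds


# Assumed example: BDP of 125 MB (10 Gbps at 100 ms RTT).
rtt_s = 0.1
rounds = slow_start_rounds(125_000_000)
print(f"{rounds} RTTs, about {rounds * rtt_s:.1f} s")  # prints "14 RTTs, about 1.4 s"
```

Even in this best case, more than a second passes before the pipe is
full, and a single loss during ramp-up pushes convergence out much
further.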
4.2. Unscheduled Traffic
A host that sends large unscheduled traffic without collaborating
with the network causes instantaneous congestion in WANs. With
multiple high-speed flows, the random arrival and departure of
unscheduled cross-traffic creates significant fluctuations in the
available capacity of WANs. The network infrastructure may struggle
to handle high-volume data transfers efficiently if applications do
not proactively schedule their traffic. Without awareness of the
traffic patterns, the network risks allocating resources without
scheduling, which leads to low bottleneck bandwidth utilization,
reduced overall throughput, and uncontrolled completion time.
For example, HPC applications transmit large amounts of data, e.g.
the data volume of a single flow may range from 10 GB to 1 TB. A
host that sends such large unscheduled traffic causes instantaneous
congestion, packet loss, and queuing delay within network devices in
WANs, resulting in low throughput. When multiple services carry
various types of flows, the optimal bandwidth and transmission time
may differ per flow, and traffic joins and leaves at random without
being scheduled onto multiple paths and fine-grained network
resources, so timely transmission cannot be achieved. The resources
of WANs should be scheduled at the elements along the path to
provide predictable capability for high-speed transmission.
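One way to picture the difference between scheduled and unscheduled
traffic is a simple capacity check over announced transfers. The
sketch below is a hypothetical illustration only: the Transfer type,
the constant-rate assumption, and the single shared bottleneck are
assumptions of this example, not a mechanism defined by this
document.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    volume_bytes: float
    start_s: float
    finish_s: float

    @property
    def rate_bps(self) -> float:
        # Constant rate that exactly meets the requested finish time.
        return self.volume_bytes * 8 / (self.finish_s - self.start_s)


def peak_demand_bps(transfers: list[Transfer]) -> float:
    """Highest aggregate rate; demand only rises at transfer start times."""
    peak = 0.0
    for t in transfers:
        active = (o for o in transfers if o.start_s <= t.start_s < o.finish_s)
        peak = max(peak, sum(o.rate_bps for o in active))
    return peak


def fits(transfers: list[Transfer], capacity_bps: float) -> bool:
    return peak_demand_bps(transfers) <= capacity_bps


# Assumed example: a 100 Gbps bottleneck and two announced transfers.
overlapping = [Transfer(1e12, 0, 100), Transfer(5e11, 50, 150)]
staggered = [Transfer(1e12, 0, 100), Transfer(5e11, 100, 200)]
print(fits(overlapping, 100e9))  # prints "False" (80 + 40 Gbps overlap)
print(fits(staggered, 100e9))    # prints "True"
```

With the traffic pattern known in advance, the second transfer can be
shifted rather than colliding with the first at the bottleneck.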
4.3. Long Feedback Loop
Congestion control algorithms work by controlling the size of the
congestion window and adjusting the sending rate in response to
network status feedback. Over long distances, transmission delay and
large RTT delay that feedback, so the sender cannot adjust its
transmission rate in a timely manner. This makes it challenging for
congestion control over WANs to keep the total amount of data
entering the network at an acceptable level. With high-speed
transmission and a long network state feedback loop, long queues and
large buffers build up at network devices, leading to RTT
fluctuation. In particular, when multiple flows target an
aggregating node, the peak amount of queued data can exceed the
device's buffer capacity.
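The buffer pressure created by a long feedback loop can be estimated
with simple arithmetic: until feedback arrives, every bit sent above
the bottleneck rate has to be queued or dropped. The sketch below
assumes constant sending rates and a single bottleneck; the function
name and all figures are illustrative assumptions.

```python
def excess_bytes(send_rates_bps: list[float], bottleneck_bps: float,
                 feedback_delay_s: float) -> float:
    """Bytes queued (or dropped) before senders can react to congestion."""
    overload = max(0.0, sum(send_rates_bps) - bottleneck_bps)
    return overload * feedback_delay_s / 8


# Assumed example: four flows at 4 Gbps each converge on a 10 Gbps port.
flows = [4e9] * 4
for delay_ms in (1, 100):
    queued = excess_bytes(flows, 10e9, delay_ms / 1000)
    print(f"feedback after {delay_ms:>3} ms -> {queued / 1e6:.2f} MB queued")

print(f"{excess_bytes(flows, 10e9, 0.1) / 1e6:.1f} MB")  # prints "75.0 MB"
```

The same overload that costs under a megabyte of buffer inside a data
center costs tens of megabytes when the feedback takes 100 ms to
return.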
For example, loss-based congestion control algorithms, such as Reno
and CUBIC [RFC9438], depend on packet loss as the congestion
notification. Explicit Congestion Notification (ECN) [RFC3168] can
be used to achieve end-to-end congestion notification based on the
IP and transport layers. When congestion occurs, the network may
signal it by ECN markings or by dropping packets, and the receiver
passes this information back to the sender in transport-layer
acknowledgements, notifying the source to adjust the transmission
rate. These algorithms also rely on slow start and require large
buffers, both of which are affected by the multiple hops and long
RTT of WANs.
Congestion-based congestion control algorithms, such as BBR, depend
on measuring congestion. BBR actively measures the bottleneck
bandwidth (BtlBw) and round-trip propagation time (RTprop), uses
them in its model to calculate the BDP, and then adjusts the
transmission rate to maximize throughput and minimize latency.
However, BBR relies on real-time measurement of these parameters.
Although it mitigates buffer overflow, the benefit is not
significant under large RTT, e.g. retransmissions increase when the
buffer size is less than two BDPs, which affects the control
precision of BBR in long-distance networks.
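The measurement model described above can be sketched as a pair of
windowed filters: the maximum recent delivery rate approximates BtlBw
and the minimum recent RTT approximates RTprop. The PathModel class
below is a deliberately simplified illustration, not BBR itself: it
uses a fixed count of samples where BBR uses time-based windows, and
it omits pacing, gains, and the state machine.

```python
from collections import deque


class PathModel:
    """Simplified BBR-style path estimate from per-ACK samples."""

    def __init__(self, window: int = 10):
        self.rates = deque(maxlen=window)  # delivery-rate samples, bit/s
        self.rtts = deque(maxlen=window)   # RTT samples, seconds

    def on_ack(self, delivery_rate_bps: float, rtt_s: float) -> None:
        self.rates.append(delivery_rate_bps)
        self.rtts.append(rtt_s)

    def bdp_bytes(self) -> float:
        if not self.rates:
            raise ValueError("no samples yet")
        btlbw = max(self.rates)   # max filter: bottleneck bandwidth
        rtprop = min(self.rtts)   # min filter: propagation delay
        return btlbw * rtprop / 8


# Assumed samples: (delivery rate in bit/s, measured RTT in seconds).
model = PathModel()
for rate, rtt in [(9e9, 0.105), (10e9, 0.100), (8e9, 0.120)]:
    model.on_ack(rate, rtt)
print(f"{model.bdp_bytes() / 1e6:.1f} MB")  # prints "125.0 MB"
```

Each sample takes at least one RTT to obtain, so over a long path the
estimate lags behind changes in the available capacity.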
4.4. Multi-flow Concurrent Transmission
An AI/HPC job may be decomposed into multiple tasks for parallel
transmission over a network. Insufficient transmission throughput
and blind competition among multiple flows lead to slow-flow tailing
and FCT jitter.
For a single flow, traditional congestion control mechanisms
implemented on hosts lack rate control, so rate adjustments are
unbounded and the transmission rate exhibits a sawtooth-like
fluctuation. When this flow is transmitted concurrently with other
flows, they compete for bottleneck bandwidth, resulting in tail
latency that drags down overall task throughput. This competition
triggers queuing delays and congestion-induced packet loss, creating
slow flows and making the completion time of a single flow
uncontrollable.
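The sawtooth can be reproduced with a generic additive-increase,
multiplicative-decrease (AIMD) loop. The sketch below is a toy model
under assumed parameters (unit-less rates, one step per RTT, a fixed
capacity); it does not model any specific algorithm named in this
document.

```python
def aimd_trace(capacity: float, rounds: int, start: float = 1.0,
               add: float = 1.0, mult: float = 0.5) -> list[float]:
    """Per-RTT sending rate of one AIMD flow against a fixed capacity."""
    rate = start
    trace = []
    for _ in range(rounds):
        trace.append(rate)
        if rate > capacity:
            rate *= mult  # congestion detected: multiplicative decrease
        else:
            rate += add   # no congestion seen: additive increase
    return trace


trace = aimd_trace(capacity=10.0, rounds=30)
carried = sum(min(r, 10.0) for r in trace) / (10.0 * len(trace))
print(" ".join(f"{r:g}" for r in trace))
print(f"average utilization: {carried:.0%}")
```

The rate repeatedly overshoots the capacity and then falls back, so
the flow alternates between building queues and leaving bandwidth
idle.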
For multiple flows within a job, the passive competition for
bandwidth resources often leads to a cyclical pattern of "peak
overflows" (causing queuing delays) and "valley underflows" (causing
waiting delays), resulting in significant jitter and deviation in
FCTs of multiple flows. The FCT jitter significantly undermines job
completion reliability and performance in concurrent network
environments.
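Because a job finishes only when its slowest flow finishes, the
effect of FCT deviation on a job can be summarized from the per-flow
completion times. In the sketch below, the choice of spread and
population standard deviation as jitter measures, and the sample
values, are assumptions of this example.

```python
import statistics


def fct_summary(fcts_s: list[float]) -> dict[str, float]:
    """Summarize flow completion times (seconds) for one job."""
    slowest = max(fcts_s)
    return {
        "job_completion_s": slowest,  # the job waits for the slowest flow
        "mean_fct_s": statistics.mean(fcts_s),
        "fct_spread_s": slowest - min(fcts_s),
        "fct_stdev_s": statistics.pstdev(fcts_s),
    }


# Assumed example: three flows finish together, one slow flow tails.
summary = fct_summary([10.0, 10.5, 11.0, 18.0])
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here the mean FCT is about 12.4 s, yet the job takes 18 s because of
a single tailing flow.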
5. Security Considerations
This document covers several representative applications and
network scenarios that are expected to make use of HP-WAN
technologies. None of the potential use cases raises any security
concerns or issues by itself, but each may have security
considerations from both the use-specific perspective and the
technology-specific perspective.
6. IANA Considerations
This document makes no requests for IANA action.
7. Acknowledgements
The authors would like to acknowledge Bin Tan, Guangping Huang, Yao
Liu and Zheng Zhang for their thorough review and very helpful
comments.
8. References
8.1. Normative References
[I-D.kcrh-hpwan-state-of-art]
King, D., Chown, T., Rapier, C., Huang, D., and K. Yao,
"Current State of the Art for High Performance Wide Area
Networks", Work in Progress, Internet-Draft,
draft-kcrh-hpwan-state-of-art-03, 20 October 2025,
<https://datatracker.ietf.org/doc/html/draft-kcrh-hpwan-state-of-art-03>.
[I-D.yx-hpwan-uc-requirements-public-operator]
Yao, K. and Q. Xiong, "High Performance Wide Area Network
(HPWAN) Use Cases and Requirements -- From Public
Operator's View", Work in Progress, Internet-Draft,
draft-yx-hpwan-uc-requirements-public-operator-00, 20 February
2025,
<https://datatracker.ietf.org/doc/html/draft-yx-hpwan-uc-requirements-public-operator-00>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001,
<https://www.rfc-editor.org/info/rfc3168>.
[RFC7424] Krishnan, R., Yong, L., Ghanwani, A., So, N., and B.
Khasnabish, "Mechanisms for Optimizing Link Aggregation
Group (LAG) and Equal-Cost Multipath (ECMP) Component Link
Utilization in Networks", RFC 7424, DOI 10.17487/RFC7424,
January 2015, <https://www.rfc-editor.org/info/rfc7424>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8664] Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W.,
and J. Hardwick, "Path Computation Element Communication
Protocol (PCEP) Extensions for Segment Routing", RFC 8664,
DOI 10.17487/RFC8664, December 2019,
<https://www.rfc-editor.org/info/rfc8664>.
[RFC9232] Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and
A. Wang, "Network Telemetry Framework", RFC 9232,
DOI 10.17487/RFC9232, May 2022,
<https://www.rfc-editor.org/info/rfc9232>.
[RFC9331] De Schepper, K. and B. Briscoe, Ed., "The Explicit
Congestion Notification (ECN) Protocol for Low Latency,
Low Loss, and Scalable Throughput (L4S)", RFC 9331,
DOI 10.17487/RFC9331, January 2023,
<https://www.rfc-editor.org/info/rfc9331>.
[RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed.,
"CUBIC for Fast and Long-Distance Networks", RFC 9438,
DOI 10.17487/RFC9438, August 2023,
<https://www.rfc-editor.org/info/rfc9438>.
Authors' Addresses
Quan Xiong
ZTE Corporation
China
Email: xiong.quan@zte.com.cn
Kehan Yao
China Mobile
China
Email: yaokehan@chinamobile.com
Cancan Huang
China Telecom
China
Email: huangcanc@chinatelecom.cn
Zhengxin Han
China Unicom
China
Email: hanzx21@chinaunicom.cn
Junfeng Zhao
CAICT
Beijing
China
Email: zhaojunfeng@caict.ac.cn