Scenarios and Deployment Considerations for High Performance Wide Area Network
draft-zhao-hpwan-scenarios-deployment-00

Document Type Active Internet-Draft (individual)
Authors Junfeng Zhao, Quan Xiong
Last updated 2024-10-21
draft-zhao-hpwan-scenarios-deployment-00
Network Working Group                                            J. Zhao
Internet-Draft                                                     CAICT
Intended status: Informational                                  Q. Xiong
Expires: 24 April 2025                                   ZTE Corporation
                                                         21 October 2024

 Scenarios and Deployment Considerations for High Performance Wide Area
                                Network
                draft-zhao-hpwan-scenarios-deployment-00

Abstract

   This document describes the typical scenarios and deployment
   considerations for High Performance Wide Area Networks (HP-WANs).  It
   also provides simulation results for data transmission in WANs and
   analyses the impacts on throughput.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 24 April 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Zhao & Xiong              Expires 24 April 2025                 [Page 1]
Internet-Draft   Scenarios and Deployment Considerations    October 2024

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   3
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   3
   3.  Typical Scenarios for HP-WANs . . . . . . . . . . . . . . . .   3
     3.1.  Long-distance Data Transmission . . . . . . . . . . . . .   3
     3.2.  Collaborative and Interactive Data Transmission . . . . .   4
   4.  Deployment Considerations for HP-WANs . . . . . . . . . . . .   5
     4.1.  Host Optimization Deployment  . . . . . . . . . . . . . .   5
     4.2.  WAN Optimization Deployment . . . . . . . . . . . . . . .   6
     4.3.  Gateway Deployment  . . . . . . . . . . . . . . . . . . .   6
   5.  Simulation Results  . . . . . . . . . . . . . . . . . . . . .   7
     5.1.  The Impact of Long-distance Delay . . . . . . . . . . . .   7
     5.2.  The Impact of Packet Loss . . . . . . . . . . . . . . . .   8
   6.  Security Considerations . . . . . . . . . . . . . . . . . . .   9
   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   9
   8.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   9
   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . .   9
     9.1.  Normative References  . . . . . . . . . . . . . . . . . .   9
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  10

1.  Introduction

   As per [I-D.xiong-hpwan-uc-req-problem], the High Performance Wide
   Area Network (HP-WAN) places higher performance requirements on
   WANs.  High-performance data transmission should provide low
   latency, high throughput and low CPU utilization, which can
   significantly improve the performance and efficiency of intra-DC
   and DC interconnection networks.  At present, tests and deployments
   of long-distance, high-performance data transmission have been
   carried out in operators' WANs, cloud service providers' DC
   interconnection networks and research institutions' private
   networks.  However, challenges remain in providing high performance
   in long-distance, wide area network deployments:

   *  achieving high utilization and high throughput on long-distance
      links;

   *  providing efficient congestion control mechanisms to avoid packet
      loss;

   *  sharing link bandwidth fairly among multiple concurrent
      applications;

   *  coping with packet ACK delay, which grows with distance and is
      challenging for high-performance applications, especially
      distributed processing models.

   This document describes the typical scenarios and deployment
   considerations for High Performance Wide Area Networks (HP-WANs).  It
   also provides simulation results for data transmission in WANs and
   analyses the impacts on throughput.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Terminology

   The terminology used in this document is defined in
   [I-D.xiong-hpwan-uc-req-problem].

3.  Typical Scenarios for HP-WANs

   Depending on the transmission distance and deployment requirements,
   high-throughput transmission covers two types of scenarios:
   high-volume data transmission over thousands of kilometers in WANs,
   and collaborative data transmission over hundreds of kilometers in
   Metropolitan Area Networks (MANs).

3.1.  Long-distance Data Transmission

   This category covers two scenarios: massive research data
   transmission between High Performance Computing (HPC) centers, and
   transmission of training samples between DCs for AI.  The
   long-distance data transmission scenario is shown in Figure 1, where
   data flows are transmitted between two sites or DCs located 100km to
   1000km apart.

                             +---100km~1000km---+
                             |                  |
             +--------+      |                  |      +--------+
             | Host A |------+       WAN        +------| Host B |
             +--------+      |                  |      +--------+
              Site/DC        |                  |       Site/DC
                             +------------------+

            Figure 1: Long-distance Data Transmission over WANs

   Massive research data transmission between HPC centers: this
   scenario, in which big data is migrated over thousands of
   kilometers, mainly refers to the high-throughput transmission of
   massive data between scientific research institutions.  At present,
   research networks and programs such as ESnet6 in the US and EuroHPC
   in the EU are deploying wide area high-performance networks to
   support the construction and operation of high-performance computing
   and data interconnection infrastructure.  In this scenario, data is
   usually transmitted on a regular schedule or on demand, with each
   transmission ranging from a few terabytes to several hundred
   terabytes.  Transmission cost and security need to be balanced.
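
   To give a feel for the volumes mentioned above, the following is a
   minimal sketch that estimates how long one transfer would take.  The
   100Gbps link rate and the 90% utilization are illustrative
   assumptions, not measurements from this document, and the function
   name is hypothetical.

```python
def transfer_time_hours(volume_terabytes: float, link_gbps: float,
                        utilization: float) -> float:
    """Hours needed to move a data volume over a link at a given utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = volume_terabytes * 1e12 * 8             # decimal terabytes -> bits
    effective_bps = link_gbps * 1e9 * utilization  # usable bits per second
    return bits / effective_bps / 3600


# Illustrative volumes in terabytes; link rate and utilization are assumed.
for volume in (5, 100, 500):
    hours = transfer_time_hours(volume, link_gbps=100, utilization=0.9)
    print(f"{volume:>4} TB: {hours:.2f} h")
```

   Under these assumptions a transfer of several hundred terabytes
   occupies the link for many hours, which is why sustained utilization
   matters in this scenario.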

   Transmission of training samples between DCs for AI: the
   construction of large-scale DCs for AI is limited by energy and land
   resources.  Allocating training tasks to data centers with lower
   prices for computing power and electricity has become a
   cost-effective option.  When the distance between DCs exceeds
   1000km, a wide area high-performance network is required to transmit
   training samples and corpus data at high throughput.  Training large
   models on billions or trillions of tokens usually requires from
   several hundred terabytes to more than a petabyte of corpus data,
   with a large amount of data transmitted per session, which places
   high demands on transmission throughput and stability.

3.2.  Collaborative and Interactive Data Transmission

   This category covers two scenarios: data transmission between data
   centers with separated storage and computing, and high-throughput
   data transmission between DCs for distributed intelligent computing.
   The collaborative and interactive data transmission scenario is
   shown in Figure 2, where data flows are transmitted between two or
   more DCs located 80km to 100km apart.

                  +-------------80km~100km-----------------+
                  |                                        |
             +----+----+                              +----+----+
             | Core DC |                              | Core DC |
             +----+----+            MAN               +----+----+
                  |                                        |
                  |                                        |
             +----+----+          +---------+         +----+----+
             | Edge DC +----------+ Edge DC +---------+ Edge DC |
             +---------+          +---------+         +---------+

    Figure 2: Collaborative and Interactive Data Transmission over MANs

   Storage and computing separation scenario: cloud service providers
   deploy multiple data centers in a MAN (under 100km), with storage
   and intelligent computing devices deployed separately.  By extending
   the high-performance transmission technology used within a single
   DC to run across data centers, a DC cluster with separated storage
   and computing is constructed.  In 2023, Amazon implemented a data
   center with separated storage and computing for high-throughput
   data transmission over a MAN at 100Gbps across 100 kilometers.  In
   addition, the training samples of customers in industries such as
   government and finance are sensitive data, and the consequences of
   data leakage are very serious.  The sample data needs to be stored
   in the customer's private DC and connected to the cloud service
   provider's DC for AI through a wide area high-performance network.

   Distributed collaborative reasoning scenario: to improve the user
   experience of computing services, an architecture with centralized
   training and distributed reasoning is deployed.  Training is carried
   out at core computing nodes that are far away from the user, while
   inference responses are served to the user from distributed edge
   nodes that are closer and offer shorter latency and a better
   experience.  Local sample data needs to be transmitted between the
   edge and core DCs through a high-performance MAN to fine-tune and
   optimize the trained model.  In addition, user inference requests
   and response data require low-latency transmission.

4.  Deployment Considerations for HP-WANs

4.1.  Host Optimization Deployment

   The host optimization deployment mainly adopts an improved transport
   layer protocol on the NIC of the host server to achieve efficient
   long-distance transmission over lossy networks.  The optimization of
   the transport layer protocol may involve caching and reassembly of
   out-of-order packets, and packet loss tolerance and error correction
   mechanisms for lossy networks.  The host optimization deployment is
   shown in Figure 3.

         +--------------+      +---------------+      +--------------+
         |              |      |               |      |              |
    +----+----+         |      |      WAN      |      |         +----+----+
    | Host A  |         +------+     (lossy)   +------+         | Host B  |
    +----+----+         |      |               |      |         +----+----+
         |DCN or        |      |               |      |DCN or        |
         |dedicated line|      |               |      |dedicated line|
         +--------------+      +---------------+      +--------------+
   The NIC with transport                             The NIC with transport
   protocol optimization                              protocol optimization

         Figure 3: Host Optimization Deployment Consideration
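
   The caching and reassembly of out-of-order packets mentioned in this
   section can be sketched as a small reorder buffer.  This is only an
   illustration under simplifying assumptions (packets carry a
   consecutive integer sequence number, and the buffer is unbounded);
   the class and method names are hypothetical and do not describe any
   particular NIC or transport protocol.

```python
from typing import Dict, List


class ReorderBuffer:
    """Holds packets that arrive early until the missing ones arrive."""

    def __init__(self, first_seq: int = 0) -> None:
        self.next_seq = first_seq             # next sequence number to deliver
        self.pending: Dict[int, bytes] = {}   # early packets, keyed by sequence

    def receive(self, seq: int, payload: bytes) -> List[bytes]:
        """Accept one packet and return the payloads now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []                         # duplicate packet, ignore it
        self.pending[seq] = payload
        delivered: List[bytes] = []
        # Drain every packet that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


# Packets 2 and 3 arrive before packet 1 and are held until it shows up.
buf = ReorderBuffer()
for seq in (0, 2, 3, 1):
    print(seq, buf.receive(seq, f"pkt{seq}".encode()))
```

   A real implementation would also bound the buffer size and time out
   missing packets, which is where the loss tolerance and error
   correction mechanisms come in.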

4.2.  WAN Optimization Deployment

   The WAN optimizes packet loss, bandwidth utilization, and latency to
   provide high-throughput data transmission between DCs.  The
   optimization of wide area networks may involve path selection,
   congestion control, flow control, etc.  Deterministic forwarding
   may also reduce the packet loss ratio, latency, and jitter in wide
   area networks.  The WAN optimization deployment is shown in
   Figure 4.

      +--------------+      +------------------+      +--------------+
      |              |      |                  |      |              |
 +----+----+         |      |      WAN         |      |         +----+----+
 | Host A  |         +------+(High performance)+------+         | Host B  |
 +----+----+         |      |                  |      |         +----+----+
      |DCN or        |      |                  |      |DCN or        |
      |dedicated line|      |                  |      |dedicated line|
      +--------------+      +------------------+      +--------------+
                             The optimization of
                             packet loss, bandwidth
                             utilization, and latency
                             in WAN

         Figure 4: WAN Optimization Deployment Consideration

4.3.  Gateway Deployment

   This solution requires the deployment of gateway devices at the DC
   edge to isolate or relay traffic between the data center and the
   wide area network.  The gateway devices should support packet
   caching, buffering, and retransmission for high-performance
   services, and implement collaboration and interaction between the
   gateway and the WAN by running optimized high-performance transport
   layer protocols, including awareness of high-performance services,
   route selection and congestion control.  In addition, the gateway
   also needs to map and convert between the different high-performance
   protocols running in the data center and the WAN.  The gateway
   deployment is shown in Figure 5.

                             +-------------+
 +---------+   +---------+   |             |   +---------+   +---------+
 | Host A  +---+ Gateway +---+   WAN       +---+ Gateway +---+ Host B  |
 +---------+   +---------+   |  (Lossy)    |   +---------+   +---------+
                             +-------------+

               Figure 5: Gateway Deployment Consideration

5.  Simulation Results

5.1.  The Impact of Long-distance Delay

   Based on the current implementation over 100km, the delay parameters
   in this experiment target wide area scenarios of 100~2000km, with a
   round trip time (RTT) of 1~20ms.  The experiment verifies
   performance incrementally, from 100km (1ms delay) up to 2000km (20ms
   delay).
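
   The mapping between distance and delay used above (about 1ms of RTT
   per 100km) matches the propagation delay of light in optical fiber.
   The following sketch reproduces it, assuming a propagation speed of
   roughly 200,000 km/s (about two thirds of the speed of light in
   vacuum) and ignoring queuing, switching and serialization delay.
   The constant and function name are illustrative.

```python
FIBER_KM_PER_MS = 200.0  # assumed propagation speed in fiber, in km per ms


def propagation_rtt_ms(distance_km: float) -> float:
    """Round trip propagation delay in milliseconds for a fiber path."""
    return 2 * distance_km / FIBER_KM_PER_MS


for distance_km in (100, 200, 500, 1000, 2000):
    rtt = propagation_rtt_ms(distance_km)
    print(f"{distance_km:>5} km -> RTT of about {rtt:.0f} ms")
```

   Real paths are usually longer than the geographic distance, so the
   values should be read as lower bounds.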

   The impact of long-distance delay on throughput is shown in Figure 6.

 +---------------+----------------+-----------------+------------------+
 | RTT latency   | Message length | Distance        | Throughput       |
 |               | (bytes)        |                 | (100Gbps link)   |
 +---------------+----------------+-----------------+------------------+
 | less than 1ms | less than 1024 | less than 100km | more than 90%    |
 +---------------+----------------+-----------------+------------------+
 | 1ms           | 256K           | 100km           | more than 90%    |
 +---------------+----------------+-----------------+------------------+
 | 2ms           | 512K           | 200km           | more than 90%    |
 +---------------+----------------+-----------------+------------------+
 | 5ms           | 1M             | 500km           | more than 90%    |
 +---------------+----------------+-----------------+------------------+
 | 10ms          | 8M             | 1000km          | more than 90%    |
 +---------------+----------------+-----------------+------------------+

      Figure 6: The Impact of Long-distance Delay on Throughput

   The transmission performance of RDMA in different network
   environments is verified.  The impact of long distance and latency
   on throughput performance is shown in Figure 6.  As latency
   increases (1~20ms), the RDMA message size needs to be continuously
   increased to achieve high-performance transmission at full
   throughput.  Because the maximum message length is 2GB, a bandwidth
   of 100Gbit/s can be achieved without loss, which is consistent with
   the theoretical throughput equation:

   Throughput = Window_Size/RTT (1)
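
   Rearranging equation (1) gives the amount of data that has to be in
   flight to keep a link busy: Window_Size = Throughput * RTT.  The
   sketch below evaluates this for a 100Gbps link.  It assumes decimal
   units, and the function name is illustrative.  The window is the
   total data outstanding, which may span several messages, so the
   results are not directly comparable to the per-message lengths in
   Figure 6.

```python
def required_window_bytes(throughput_gbps: float, rtt_ms: float) -> float:
    """Window size from equation (1): Window_Size = Throughput * RTT."""
    bits_per_ms = throughput_gbps * 1e6   # 1 Gbps = 1e6 bits per millisecond
    return bits_per_ms * rtt_ms / 8       # bits -> bytes


for rtt_ms in (1, 2, 5, 10, 20):
    window_mb = required_window_bytes(100, rtt_ms) / 1e6
    print(f"RTT {rtt_ms:>2} ms -> window of about {window_mb:.1f} MB in flight")
```

   The required window grows linearly with RTT, which is why the
   message length has to be increased as the distance grows.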

   The overall analysis shows that, by adjusting RDMA parameters such
   as the message length, high-performance transmission over 1000km
   (with over 90% throughput) can be achieved.  In practice, the
   message length setting depends on the specific network application,
   the device cache space, and the cache threshold settings, so the
   message length cannot be increased without limit.

5.2.  The Impact of Packet Loss

   Traditional RDMA adopts the Go-Back-N retransmission mechanism,
   which retransmits the dropped data packet N and all data packets
   sent after it.  Packet loss can therefore cause significant
   performance degradation in RDMA.  In contrast, TCP only needs to
   retransmit the individual lost packets, and the latest RDMA network
   cards have started using selective repeat.  Therefore, the TCP
   formula relating throughput to the packet loss rate (p), message
   size (MSS), latency (RTT) and the bandwidth capacity parameter (C)
   can be used as a reference:

   Throughput = Min{MSS/RTT*C*(1/p)} (2)

   The measured performance of RDMA differs from that of TCP.  However,
   the main impact of wide area networks is latency, and the
   retransmission and congestion control algorithm models are similar.
   Therefore, the theoretical rate of RDMA is estimated empirically by
   adjusting the value of parameter C in equation (2).  (The empirical
   value for TCP is C = 1.0.)
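
   As a rough illustration of the retransmission difference described
   at the start of this section, the sketch below counts the packets
   resent after a single loss.  It assumes that the sender runs at the
   full link rate and that the loss is detected one RTT after the lost
   packet was sent; the 100Gbps rate, the 4096-byte packet size and all
   function names are illustrative assumptions.

```python
def go_back_n_resends(lost_seq: int, highest_sent_seq: int) -> int:
    """Go-Back-N resends the lost packet and everything sent after it."""
    return highest_sent_seq - lost_seq + 1


def selective_repeat_resends(lost_count: int = 1) -> int:
    """Selective repeat resends only the lost packets themselves."""
    return lost_count


def packets_in_flight(link_gbps: float, rtt_ms: float, packet_bytes: int) -> int:
    """Approximate number of packets sent during one RTT at full link rate."""
    return int(link_gbps * 1e6 * rtt_ms / 8 / packet_bytes)


# Assumed example: 100Gbps link, 4096-byte packets, one lost packet.
for rtt_ms in (1, 10, 20):
    sent_after_loss = packets_in_flight(100, rtt_ms, 4096)
    gbn = go_back_n_resends(lost_seq=0, highest_sent_seq=sent_after_loss)
    print(f"RTT {rtt_ms:>2} ms: Go-Back-N resends {gbn} packets, "
          f"selective repeat resends {selective_repeat_resends()}")
```

   Under these assumptions the cost of a single loss with Go-Back-N
   grows with the RTT, while the cost with selective repeat does not.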

   When larger delay and packet loss coexist, achieving over 80%
   throughput on a 100G link requires the packet loss rate in the data
   center to be less than 0.005%.  In the scenario of wide area
   interconnection of DCs, the retransmission cost and response time
   increase with the propagation delay of the link, so the packet loss
   threshold is stricter than in the data center, and the network needs
   to be as close to lossless as possible.  In a wide area scenario, it
   is difficult to achieve a bandwidth utilization of over 70% when the
   packet loss rate is less than 0.001%, even with the optimization of
   selective retransmission.

   In general, the network performance indicators for RDMA over a wide
   area of 1000 kilometers are as follows: the throughput of RDMA over
   a wide area is directly proportional to the message length, and
   inversely proportional to the network packet loss rate and latency.
   To ensure 80% throughput on a 100Gbps link over 1000 kilometers, the
   message length needs to be greater than 512KB, and the increased
   latency makes the packet loss rate requirement extremely strict.

6.  Security Considerations

   TBA

7.  IANA Considerations

   This document makes no requests for IANA action.

8.  Acknowledgements

   TBA

9.  References

9.1.  Normative References

   [I-D.xiong-hpwan-uc-req-problem]
              Xiong, Q., Yao, K., Huang, C., Zhengxin, H., and J. Zhao,
              "Use Cases, Requirements and Problems for High Performance
              Wide Area Network", Work in Progress, Internet-Draft,
              draft-xiong-hpwan-uc-req-problem-00, 12 October 2024,
              <https://datatracker.ietf.org/doc/html/draft-xiong-hpwan-
              uc-req-problem-00>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <https://www.rfc-editor.org/info/rfc3168>.

   [RFC7424]  Krishnan, R., Yong, L., Ghanwani, A., So, N., and B.
              Khasnabish, "Mechanisms for Optimizing Link Aggregation
              Group (LAG) and Equal-Cost Multipath (ECMP) Component Link
              Utilization in Networks", RFC 7424, DOI 10.17487/RFC7424,
              January 2015, <https://www.rfc-editor.org/info/rfc7424>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC8664]  Sivabalan, S., Filsfils, C., Tantsura, J., Henderickx, W.,
              and J. Hardwick, "Path Computation Element Communication
              Protocol (PCEP) Extensions for Segment Routing", RFC 8664,
              DOI 10.17487/RFC8664, December 2019,
              <https://www.rfc-editor.org/info/rfc8664>.

   [RFC9232]  Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and
              A. Wang, "Network Telemetry Framework", RFC 9232,
              DOI 10.17487/RFC9232, May 2022,
              <https://www.rfc-editor.org/info/rfc9232>.

   [RFC9438]  Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed.,
              "CUBIC for Fast and Long-Distance Networks", RFC 9438,
              DOI 10.17487/RFC9438, August 2023,
              <https://www.rfc-editor.org/info/rfc9438>.

Authors' Addresses

   Junfeng Zhao
   CAICT
   Beijing
   China
   Email: zhaojunfeng@caict.ac.cn

   Quan Xiong
   ZTE Corporation
   China
   Email: xiong.quan@zte.com.cn
