Use Cases and Requirements for Implementing Lossless Techniques in Wide Area Networks
draft-huang-rtgwg-wan-lossless-uc-00

Document Type Active Internet-Draft (individual)
Authors Hongyi Huang , Tao He , Tianran Zhou
Last updated 2024-03-03
Networking                                                 H. Huang, Ed.
Internet-Draft                                                    Huawei
Intended status: Standards Track                                   T. He
Expires: 5 September 2024                                   China Unicom
                                                                 T. Zhou
                                                                  Huawei
                                                            4 March 2024

Use Cases and Requirements for Implementing Lossless Techniques in Wide
                             Area Networks
                  draft-huang-rtgwg-wan-lossless-uc-00

Abstract

   This document outlines the use cases and requirements for
   implementing lossless data transmission techniques in Wide Area
   Networks (WANs), motivated by the increasing demand for high-
   bandwidth and reliable data transport in applications such as high-
   performance computing (HPC), genetic sequencing, and audio/video
   production.  The challenges associated with existing data transport
   protocols in WAN environments are discussed, along with the proposal
   of requirements for enhancing lossless transmission capabilities to
   support emerging data-intensive applications.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 5 September 2024.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Huang, et al.           Expires 5 September 2024                [Page 1]
Internet-Draft   Lossless WAN Use Cases and Requirements      March 2024

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . .   3
     2.1.  High-Performance Computing (HPC) Services for Scientific
           Research  . . . . . . . . . . . . . . . . . . . . . . . .   3
     2.2.  Rapid Transmission Services for Genetic Sequencing for
           Timely Medical Services . . . . . . . . . . . . . . . . .   4
     2.3.  Stable Transmission Services for Large-Scale Audio/Video
           Data Migration  . . . . . . . . . . . . . . . . . . . . .   4
   3.  Problem Analysis and Goal . . . . . . . . . . . . . . . . . .   4
     3.1.  Problem Analysis  . . . . . . . . . . . . . . . . . . . .   4
       3.1.1.  Impact of Packet Loss . . . . . . . . . . . . . . . .   5
     3.2.  Goal  . . . . . . . . . . . . . . . . . . . . . . . . . .   6
   4.  Challenges and Requirements . . . . . . . . . . . . . . . . .   6
   5.  Security Considerations . . . . . . . . . . . . . . . . . . .   7
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   7
   7.  Informative References  . . . . . . . . . . . . . . . . . . .   7
   Appendix A.  Appendix-title . . . . . . . . . . . . . . . . . . .   8
     A.1.  Appendix-subtitle . . . . . . . . . . . . . . . . . . . .   8
   Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . . .   8
   Contributors  . . . . . . . . . . . . . . . . . . . . . . . . . .   8
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .   8

1.  Introduction

   Big data is the foundation of innovation across numerous
   fields.  From high-performance computing (HPC) in scientific research
   to the latest advancements in genetic sequencing and the production
   of high-definition multimedia content, the need for rapid, reliable,
   and lossless data transmission across wide area networks (WANs) has
   never been more critical.  Traditional network protocols, designed in
   an era before these immense data demands, struggle to keep up,
   particularly when it comes to ensuring zero data loss over long
   distances.

   This document focuses on the pressing need for lossless data
   transmission techniques in WANs, driven by the requirements of data-
   intensive applications that form the backbone of scientific, medical,
   and creative industries.  For example, the Energy Sciences Network
   (ESnet) [ESnet] supports the movement of vast amounts of scientific
   data that underpins groundbreaking research.  Similarly, in the
   healthcare sector, the explosion of data from genetic sequencing
   calls for unprecedented levels of data transmission reliability and
   efficiency.  The media and entertainment industry also faces
   challenges in moving large volumes of raw content over a stable
   network rather than by manually transporting physical storage media.

   These scenarios underscore a growing disconnect between the
   capabilities of existing WAN protocols and the evolving demands of
   modern applications.  The challenges of ensuring zero-loss
   transmission in an infrastructure not originally designed for such
   demands highlight the need for new solutions.

   This document aims to shed light on the necessity for advanced
   lossless transmission technologies in WANs.  By identifying the
   limitations of current network protocols and outlining the
   requirements for new developments, we hope to pave the way for a new
   generation of WANs.  These networks will not only meet the current
   demands of data-intensive applications but will also support the next
   wave of digital innovation.

2.  Use Cases

   The necessity for implementing lossless data transmission techniques
   in Wide Area Networks (WANs) is underscored by several critical
   application areas.  These use cases highlight the imperative for
   reliable, high-speed data transfer capabilities to support the
   demanding requirements of modern data-intensive operations.

2.1.  High-Performance Computing (HPC) Services for Scientific Research

   High-Performance Computing (HPC) services are fundamental to
   scientific advancements, where collaborative efforts across various
   geographical regions are commonplace.  For instance, the study of
   PSII proteins, which are crucial for understanding how water
   molecules split to produce oxygen, generates between 30 and 120
   high-resolution images per second during experiments.  This results
   in 60-100 GB of data every five minutes, necessitating rapid and
   lossless data transfer from the National Renewable Energy
   Laboratory's equipment back to analysis labs such as the Lawrence
   Berkeley National Laboratory.  The efficiency and reliability of WANs
   in this context are not just beneficial but essential for
   facilitating the seamless collaboration between scientists in
   different domains, enabling them to share and analyze large datasets
   effectively.

2.2.  Rapid Transmission Services for Genetic Sequencing for Timely
      Medical Services

   The field of genetic sequencing has seen exponential growth, driven
   by the decreasing costs and widespread application of sequencing
   technologies.  This growth is matched by the burgeoning data volumes
   generated, which require efficient and lossless transmission to cloud
   or private data centers for analysis.  For example, sequencing a
   single human genome produces 100GB to 200GB of data.  With daily data
   production rates reaching 6TB to 12TB and annual data management
   needs surpassing 1.6PB, the demand for high-speed, reliable data
   transfer is evident.  The existing network transfer efficiencies
   present significant bottlenecks, extending the turnaround times for
   sequencing services and impacting the timely delivery of precision
   medicine.

2.3.  Stable Transmission Services for Large-Scale Audio/Video Data
      Migration

   The competitive landscape of the audio and video industry, coupled
   with the shift towards cloud-based post-production processes,
   necessitates the transfer of large volumes of raw footage across
   WANs.  Traditional methods of data transportation, involving physical
   media and manual transfer, are not only time-consuming but also
   inefficient.  For instance, film crews generating 2TB of data daily
   resort to physically moving storage media to processing locations, a
   process that significantly delays the production cycle and weakens
   market responsiveness.  The requirement for a network infrastructure
   capable of handling such extensive data transfers quickly and without
   loss is critical for maintaining the pace of production and ensuring
   the quality of the final multimedia content.
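
   As a rough illustration of what the volumes quoted in Section 2 mean
   for the network, the following sketch converts each volume into the
   average rate needed to move it within its time window.  Decimal
   units (1 GB = 10^9 bytes) and evenly spread transfers are assumptions
   of this sketch, not statements from the use cases; the helper name
   required_gbps is hypothetical.

```python
# Sustained rate needed to move a data volume within a time window.
# Volumes and windows are the figures quoted in Section 2; decimal
# units (1 GB = 10**9 bytes) are an assumption of this sketch.

def required_gbps(volume_bytes: float, window_seconds: float) -> float:
    """Return the average rate in Gbit/s needed to move the volume in time."""
    return volume_bytes * 8 / window_seconds / 1e9

GB, TB = 10**9, 10**12
DAY = 24 * 3600  # seconds

use_cases = {
    "HPC imaging, 60 GB per 5 min": (60 * GB, 300),
    "HPC imaging, 100 GB per 5 min": (100 * GB, 300),
    "Sequencing, 6 TB per day": (6 * TB, DAY),
    "Sequencing, 12 TB per day": (12 * TB, DAY),
    "Film footage, 2 TB per day": (2 * TB, DAY),
}

for name, (volume, window) in use_cases.items():
    print(f"{name}: {required_gbps(volume, window):.2f} Gbit/s")
```

   Under these assumptions the averages fall between roughly 0.2 and
   2.7 Gbit/s per source.  Transfers that must complete in a shorter
   window than the one in which the data is produced need
   proportionally higher rates.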

3.  Problem Analysis and Goal

3.1.  Problem Analysis

   The primary objective in the realm of Wide Area Networks (WANs) is to
   provide long-term, stable, and high-capacity network services that
   can accommodate the sudden surges in data transmission demands,
   essential for data migration across diverse geographical locations.
   This goal is predicated on leveraging the inherent statistical
   multiplexing advantage of IP networks, which allows for cost-
   effective bandwidth allocation and enhanced overall network
   throughput.  The ability to meet these data transmission requirements
   efficiently is crucial for supporting the backbone of today’s data-
   driven applications, ranging from scientific research to global
   financial transactions and multimedia content delivery.

   Despite the advantages of statistical multiplexing in IP networks,
   such as cost reduction and throughput optimization, this model makes
   it difficult to provide absolute resource guarantees and,
   consequently, zero packet loss.  Overprovisioning bandwidth, a common
   practice among service providers, reduces the likelihood of loss but
   does not eliminate it, which is a critical shortfall when compared
   to dedicated optical networks or resources with hard isolation.

3.1.1.  Impact of Packet Loss

   In the scenarios outlined for data migration—whether for high-
   performance computing services, genetic sequencing, or audio/video
   data migration—the reliance on traditional transmission protocols
   like TCP or RDMA [RoCEv2] is common.  However, both protocols are
   adversely affected by packet loss, especially over long-haul
   transmissions.

   For TCP, CUBIC, a loss-based congestion control algorithm, sees a
   throughput decline of up to 89.9% with just 2% packet loss when the
   Round-Trip Time (RTT) is 30 ms.  BBR, a TCP congestion control
   algorithm based on measured bandwidth and delay, also suffers
   significantly once packet loss exceeds 5%, and its throughput
   plummets when packet loss reaches 20%.  BBR's retransmission cost is
   also notably high: under slight packet loss (<1%), its
   retransmission rate is 6-10 times that of CUBIC, and under severe
   packet loss the rate can increase exponentially.
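
   To illustrate why loss-based congestion control is so sensitive to
   loss on long paths, the sketch below evaluates the Mathis et al.
   approximation for Reno-style TCP, rate ~ (MSS / RTT) * C / sqrt(p).
   This is a different and simpler model than CUBIC or BBR and is not
   the source of the figures quoted above; the 1460-byte MSS and the
   function name are assumptions made for the example.

```python
import math

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Mathis et al. approximation: rate ~ (MSS / RTT) * C / sqrt(p)."""
    if not 0 < loss < 1:
        raise ValueError("loss must be a probability strictly between 0 and 1")
    c = math.sqrt(3 / 2)  # constant for periodic loss without delayed ACKs
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss)

MSS = 1460   # bytes; assumed typical Ethernet MSS, not from this document
RTT = 0.030  # seconds; the 30 ms RTT used in the text above

for loss in (0.0001, 0.001, 0.01, 0.02):
    mbps = mathis_throughput_bps(MSS, RTT, loss) / 1e6
    print(f"loss={loss:.2%}  per-flow ceiling ~ {mbps:.1f} Mbit/s")
```

   In this model the per-flow ceiling scales with 1/RTT and with
   1/sqrt(p), so a single flow at 30 ms RTT and 2% loss is limited to a
   few Mbit/s regardless of link capacity.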

   RDMA, often used within data centers for inter-node data access and
   carried over UDP in RoCEv2, relies on a go-back-N retransmission
   mechanism.  Its throughput decreases dramatically at packet loss
   rates above 0.1%, and a 2% packet loss rate effectively reduces
   throughput to zero.  To keep throughput unaffected, the packet loss
   rate must stay below one in a hundred thousand (0.001%).
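
   The sensitivity of go-back-N to loss can be sketched with the
   textbook efficiency model, in which every lost packet forces roughly
   one window of in-flight packets to be resent.  The link rate, RTT,
   and packet size below are hypothetical values chosen for
   illustration, and the model is an idealization: it shows the trend
   but is not expected to reproduce the measured thresholds quoted
   above.

```python
def go_back_n_efficiency(loss: float, packets_in_flight: int) -> float:
    """Textbook go-back-N goodput fraction for independent packet loss.

    Each loss forces roughly `packets_in_flight` packets to be resent,
    giving efficiency = (1 - p) / (1 - p + K * p).
    """
    p, k = loss, packets_in_flight
    return (1 - p) / (1 - p + k * p)

# Hypothetical path parameters chosen only for illustration.
LINK_BPS = 10e9      # assumed 10 Gbit/s flow
RTT_S = 0.030        # assumed 30 ms WAN round-trip time
PACKET_BYTES = 4096  # assumed RDMA path MTU

# Packets needed to fill the pipe for one round-trip time.
in_flight = max(1, round(LINK_BPS * RTT_S / 8 / PACKET_BYTES))

for loss in (1e-5, 1e-3, 2e-2):
    eff = go_back_n_efficiency(loss, in_flight)
    print(f"loss={loss:g}  in_flight={in_flight}  efficiency ~ {eff:.3f}")
```

   Because the number of in-flight packets grows with the bandwidth-
   delay product, the same loss rate costs far more goodput on a WAN
   path than inside a data center.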

   These challenges underscore a critical gap in the current
   capabilities of IP networks to support the demanding requirements of
   modern, data-intensive applications.  The inability to ensure zero
   packet loss across WANs not only impacts application performance but
   also limits the potential for innovation and collaboration across key
   sectors reliant on rapid and reliable data transmission.

3.2.  Goal

   The overarching goal in the evolution of Wide Area Networks (WANs) to
   serve the aforementioned use cases is to enable zero-packet-loss
   transmission services tailored for the seamless migration
   of data across different geographical areas.  In an age where digital
   data's volume, velocity, and variety are expanding exponentially,
   ensuring the lossless transmission of this data during inter-regional
   migration activities becomes indispensable.  This is critically
   important for applications and operations that rely on the integrity
   and timeliness of data, such as AI/HPC computing and data backup and
   recovery.

4.  Challenges and Requirements

   The quest for lossless data transmission in Wide Area Networks (WANs)
   is confronted with significant challenges, notably the phenomenon of
   elephant flows—large, bursty data transfers that can cause
   instantaneous congestion and packet loss within network device
   queues.  This not only increases application latency but also
   diminishes throughput, adversely affecting application performance.
   In data centers, certain lossless technologies are deployed to
   enhance the performance of such applications:

   *  *Priority-based Flow Control (PFC)*: Widely adopted for its
      ability to manage traffic flow, PFC [PFC] works by halting the
      transmission of specific queues when downstream congestion is
      detected, thereby achieving zero packet loss.  The foundational
      flow control mechanism, defined by IEEE 802, involves sending a
      pause frame from a receiving device to a sending device to
      temporarily halt traffic, allowing time for congestion to clear
      before resuming transmission.

   *  *Explicit Congestion Notification (ECN) with Data Center Quantized
      Congestion Notification (DCQCN)*: DCQCN [DCQCN], the most
      extensively used congestion control algorithm in RDMA networks,
      requires network devices to support ECN functionality [RFC3168],
      with other protocol functionalities implemented on the network
      card of the host machine.  DCQCN sustains high throughput in RDMA
      networks that need zero packet loss: congested nodes set ECN marks
      on packets, and the receiver returns Congestion Notification
      Packets (CNPs) to the sender, prompting a reduction in sending
      rate.
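
   The pause-and-resume behavior of PFC described above can be sketched
   as a queue with two occupancy thresholds.  This is a minimal model,
   not an implementation of [PFC]: the class name, the XOFF/XON
   threshold names, and the byte values are assumptions made for the
   example.

```python
from dataclasses import dataclass

@dataclass
class PfcIngressQueue:
    """One priority queue on the receiving side of a link."""
    xoff_bytes: int               # occupancy at which a pause is sent upstream
    xon_bytes: int                # occupancy at which the sender may resume
    occupancy: int = 0
    upstream_paused: bool = False

    def enqueue(self, size: int) -> None:
        """A packet of `size` bytes arrives from the upstream sender."""
        self.occupancy += size
        if not self.upstream_paused and self.occupancy >= self.xoff_bytes:
            self.upstream_paused = True   # pause frame for this priority

    def dequeue(self, size: int) -> None:
        """A packet of `size` bytes is forwarded downstream."""
        self.occupancy = max(0, self.occupancy - size)
        if self.upstream_paused and self.occupancy <= self.xon_bytes:
            self.upstream_paused = False  # resume frame for this priority

queue = PfcIngressQueue(xoff_bytes=8000, xon_bytes=4000)
for _ in range(6):
    queue.enqueue(1500)        # 9000 bytes queued, crosses the XOFF threshold
print(queue.upstream_paused)   # True
for _ in range(4):
    queue.dequeue(1500)        # drains to 3000 bytes, below the XON threshold
print(queue.upstream_paused)   # False
```

   The gap between the two thresholds avoids rapid toggling.  Buffer
   space above the pause threshold must absorb whatever the sender
   emits before the pause takes effect, which is what makes the
   mechanism sensitive to link length.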

   However, the application of these data center-oriented lossless
   techniques to WANs encounters obstacles due to the larger scale and
   longer RTTs inherent in WAN environments.  The resulting challenges
   and corresponding requirements include:

   *  *Backpressure from PFC*: The widespread application of PFC in
      large-scale networks can lead to head-of-line blocking, deadlocks,
      and congestion spreading, which degrade network throughput.  Such
      challenges make the traditional PFC backpressure mechanisms poorly
      suited for the high stability demands of WANs, necessitating
      innovation in protocol design to alleviate issues like deadlocks
      and PFC storms. *Requirement 1*: Innovate and improve upon the PFC
      backpressure mechanism for WANs, addressing and mitigating the
      risk of deadlocks and congestion spreading to ensure stable and
      lossless data transmission.

   *  *ECN-Based Congestion Control Limitations*: While ECN facilitates
      sender rate control through network collaboration, its
      effectiveness diminishes over longer distances typical of WANs.
      The delayed congestion notifications result in prolonged control
      loops, making it challenging to quickly alleviate congestion.
      *Requirement 2*: Optimize the ECN control loop for WANs, enhancing
      the network's ability to manage congestion through improved
      routing and control strategies, thereby ensuring efficient and
      lossless transmission across vast geographical distances.
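
   Both challenges share a common cause: a pause frame or a congestion
   notification takes about one round trip to change what arrives at
   the congested node, and the data sent in the meantime must be
   buffered or dropped.  The sketch below estimates that volume.  The
   100 Gbit/s rate and the three RTT values are assumptions chosen for
   illustration, and the estimate ignores MTU-sized terms and device
   response times.

```python
def bytes_in_flight(rate_bps: float, delay_s: float) -> float:
    """Data a sender emits before a signal taking `delay_s` can act."""
    return rate_bps * delay_s / 8

RATE_BPS = 100e9  # assumed 100 Gbit/s link, not a figure from this document

# Assumed round-trip times: a data-center hop versus two WAN paths.
for label, rtt_s in (("data center, 10 us", 10e-6),
                     ("metro WAN, 1 ms", 1e-3),
                     ("long-haul WAN, 30 ms", 30e-3)):
    mb = bytes_in_flight(RATE_BPS, rtt_s) / 1e6
    print(f"{label}: ~{mb:.3f} MB arrives before feedback takes effect")
```

   Under these assumptions the volume grows from about 0.125 MB within
   a data center to hundreds of megabytes on a long-haul path, per
   congested queue, which is why per-hop buffering and end-to-end
   reaction times that work in data centers do not carry over directly
   to WANs.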

   These challenges underscore the need for tailored solutions that
   address the unique demands and conditions of WANs.  By adapting and
   innovating on existing lossless transmission technologies from data
   center networks, the goal of achieving zero packet loss in WANs
   becomes attainable, paving the way for enhanced data mobility and
   application performance.

5.  Security Considerations

   TBD.

6.  IANA Considerations

   TBD.

7.  Informative References

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <https://www.rfc-editor.org/rfc/rfc3168>.

   [RoCEv2]   "Supplement to InfiniBand architecture specification
              volume 1 release 1.2.2 annex A17 - RoCEv2 (IP routable
              RoCE)", n.d.

   [DCQCN]    Zhu, Y., et al., "Congestion Control for Large-Scale RDMA
              Deployments", August 2015,
              <https://conferences.sigcomm.org/sigcomm/2015/pdf/papers/
              p523.pdf>.

   [PFC]      "IEEE Standard for Local and metropolitan area networks--
              Media Access Control (MAC) Bridges and Virtual Bridged
              Local Area Networks--Amendment 17: Priority-based Flow
              Control", n.d.

   [ESnet]    "Energy Sciences Network", n.d.

Appendix A.  Appendix-title

A.1.  Appendix-subtitle

Acknowledgements

   TBD.

Contributors

   TBD.

Authors' Addresses

   Hongyi Huang (editor)
   Huawei
   Beijing
   China
   Email: hongyi.huang@huawei.com

   Tao He
   China Unicom
   Email: het21@chinaunicom.cn

   Tianran Zhou
   Huawei
   Email: zhoutianran@huawei.com
