Internet-Draft | transport challenges | September 2023 |
Huang, et al. | Expires 15 March 2024 | [Page] |
- Workgroup: tsvwg
- Internet-Draft: draft-huang-tsvwg-transport-challenges-00
- Published: September 2023
- Intended Status: Informational
- Expires: 15 March 2024
The Challenges that Current Service Transports are Facing
Abstract
This document discusses the challenges of improving transmission quality when little information is exchanged between the network and applications, and then provides some basic requirements that new synergy mechanisms should meet.¶
Status of This Memo
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 15 March 2024.¶
Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
1. Introduction
Internet transport protocols are currently evolving rapidly. On the one hand, concerns about user privacy and security are driving transport protocols towards built-in encryption. On the other hand, the ossification of TCP, caused by excessive interference from intermediate devices, has frustrated the industry. As a result, end-to-end (E2E) built-in encryption has become the most popular design choice for new transport protocols. However, the network and the transport are neither independent nor unrelated: they rely closely on each other, so synergy mechanisms between them are needed to help transmission work better. In the past, transport protocols such as TCP enabled collaboration between the network and applications through plaintext headers. This is no longer possible with increasingly popular secure transport protocols such as QUIC, which encrypt most of their header information, and the industry urgently needs a new way to achieve this synergy.¶
This document discusses the challenges of improving transmission quality when little information is exchanged between the network and applications, and then provides some basic requirements that new synergy mechanisms should meet.¶
2. Conventions and Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
3. Challenges of Improving Transmission Quality in the WAN
3.1. Network Undifferentiated Scheduling
DSCP is designed to provide Quality of Service (QoS) in the network: a 6-bit codepoint in the IP header classifies packets into service classes so that they can receive differentiated treatment. However, as the variety of Internet applications keeps growing, the existing service differentiation has become too coarse-grained. For example, Internet traffic is generally all treated as Best Effort, and network devices cannot obtain effective and legitimate application information with which to forward traffic with appropriate quality. As a result, service-specific bandwidth, latency, or jitter requirements cannot be adequately met, leading to a relatively poor end-user experience. This is also pointed out in [I-D.kaippallimalil-tsvwg-media-hdr-wireless]. Even where DSCP is used in real deployments by agreement between service providers and ISPs, the benefit is limited because the marking carries little information. For example, traffic whose owner pays for a better quality of service may still not receive a satisfactory quality improvement during busy hours.¶
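To make the "little information" point concrete: the DSCP occupies the upper six bits of the IPv4 TOS / IPv6 Traffic Class byte, with the two ECN bits below it, so at most 64 classes can be expressed per packet. The Python sketch below packs and unpacks that byte and shows how a host might request a marking on a socket. The helper names are illustrative, EF (46) and AF41 (34) are standard DiffServ codepoints used only as examples, and whether a marking is honoured depends entirely on local and operator policy.¶

```python
import socket

# Example DiffServ codepoints: Expedited Forwarding (EF) and Assured Forwarding AF41.
DSCP_EF = 46
DSCP_AF41 = 34


def tos_byte(dscp: int, ecn: int = 0) -> int:
    """Pack DSCP (upper 6 bits) and ECN (lower 2 bits) into one TOS/Traffic Class byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit field (0-63)")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN is a 2-bit field (0-3)")
    return (dscp << 2) | ecn


def dscp_of(tos: int) -> int:
    """Extract the DSCP from a TOS/Traffic Class byte, ignoring the ECN bits."""
    return (tos >> 2) & 0x3F


def mark_ipv4_socket(sock: socket.socket, dscp: int) -> None:
    """Request a DSCP marking for outgoing IPv4 packets on this socket.

    The value passed carries ECN = 0 (Not-ECT); a TCP stack may manage the ECN bits itself.
    """
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp))


if __name__ == "__main__":
    print(hex(tos_byte(DSCP_EF)))           # 0xb8
    print(dscp_of(tos_byte(DSCP_AF41, 1)))  # 34
    print(2 ** 6, "possible codepoints")    # 64 possible codepoints
```

Nothing in these six bits tells the network which application, or which of its bandwidth, latency, or jitter requirements, a packet belongs to.¶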
Undifferentiated scheduling in the network also prevents some network functions from being fully utilized. One example is Coordinated Multipoint transmission/reception (CoMP) in LTE, which mitigates interference through coordinated processing among different cells or base stations, thereby improving network efficiency. In our experience with intra-eNB deployments, when additional service-level information, such as the desired completion time and start/end signals, is provided, the CoMP success rate can be greatly improved, and so can the network's goodput.¶
3.2. Heuristic Network Conditions
Application transmission relies on the network, so network conditions greatly affect application performance. Because applications and the network currently cooperate only loosely and share little information, applications can only passively make speculative adjustments based on end-to-end feedback. Such adjustments lack precision and lag behind changes in the network. This is discussed in the following sections.¶
3.2.1. Slow Start
Current transport protocols ramp up their sending rate gradually through slow start, usually beginning with a small initial window of around 10 segments, to avoid injecting too many packets into the network. This has effectively protected the network from congestion collapse for decades. However, it also means that bandwidth utilization is low during the slow-start phase, which becomes significant with the widespread adoption of technologies such as 5G and fiber-to-the-home (FTTH). In particular, as the network's bandwidth-delay product (BDP) increases, slow start lasts longer, resulting in poor transmission efficiency for short flows. The measurements in [_5G] report that, in a 5G mobile web-browsing scenario, BBR's slow-start phase lasts around 6 s before it converges to the high available bandwidth. [flash] highlights that, for a flow lasting 1 s (transferring over 1 MB of data), the bandwidth efficiency of CUBIC and BBR was only 53% and 48%, respectively. This significantly impacts the transmission quality of short-flow applications, e.g., mobile app download/update, cloud albums, or the first page load of apps.¶
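The dependence of slow start on the BDP can be illustrated with an idealized model, assumed here purely for illustration: the congestion window starts at 10 segments of 1460 bytes and doubles every RTT with no loss, at most one BDP is delivered per RTT, and any unused part of the BDP in a round counts as wasted capacity. This model is not the one used in [_5G] or [flash] and does not reproduce their numbers; it only shows the trend.¶

```python
import math


def bdp_segments(bandwidth_bps: float, rtt_s: float, mss_bytes: int = 1460) -> float:
    """Bandwidth-delay product expressed in MSS-sized segments."""
    return bandwidth_bps * rtt_s / 8 / mss_bytes


def slow_start_rounds(bandwidth_bps: float, rtt_s: float,
                      mss_bytes: int = 1460, iw_segments: int = 10) -> int:
    """RTTs an idealized slow start needs before the window covers the BDP."""
    bdp = bdp_segments(bandwidth_bps, rtt_s, mss_bytes)
    if bdp <= iw_segments:
        return 0
    return math.ceil(math.log2(bdp / iw_segments))


def slow_start_efficiency(flow_bytes: int, bandwidth_bps: float, rtt_s: float,
                          mss_bytes: int = 1460, iw_segments: int = 10) -> float:
    """Fraction of link capacity used by a flow under the idealized model."""
    bdp = bdp_segments(bandwidth_bps, rtt_s, mss_bytes)
    if bdp <= 0:
        raise ValueError("bandwidth and RTT must be positive")
    total = remaining = math.ceil(flow_bytes / mss_bytes)
    cwnd, rounds = iw_segments, 0
    while remaining > 0:
        remaining -= min(cwnd, bdp, remaining)  # at most one BDP delivered per RTT
        cwnd *= 2                               # window doubles every RTT
        rounds += 1
    return total / (rounds * bdp)


if __name__ == "__main__":
    # 1 Gbps, 50 ms RTT: the BDP is about 4281 segments.
    print(slow_start_rounds(1e9, 0.05))  # 9
    print(round(slow_start_efficiency(1_000_000, 1e9, 0.05), 3))
```

Under these assumptions, a 1 MB transfer on a 1 Gbps, 50 ms path finishes in 7 RTTs while using only a few percent of the capacity, and for a fixed flow size a larger BDP lowers the efficiency further.¶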
3.2.2. Bandwidth and RTT Probing
Current congestion control algorithms often rely on E2E feedback to infer the network state and adjust packet transmission accordingly. However, when the RTT is relatively large, which is common in WAN scenarios, the longer transit time results in longer E2E feedback cycles, and feedback signals may not reach the sender in a timely manner. In such situations, the sender cannot accurately perceive congestion or adjust promptly, which lowers the effective throughput in wide-area and long-distance networks.¶
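One way to see why the reaction delay grows with the RTT: many estimators driven by E2E feedback work in units of round trips, so any filter defined over a number of round trips reacts on a time scale proportional to the RTT. The sketch below is an illustration, not any specific algorithm's implementation. It uses a windowed-max bandwidth filter in the spirit of the one BBR uses; the 10-round window, the one-sample-per-round-trip assumption, and the bandwidth values are all assumptions.¶

```python
from collections import deque


class WindowedMaxFilter:
    """Maximum of the most recent `window_rounds` samples (one sample per RTT)."""

    def __init__(self, window_rounds: int):
        self.samples = deque(maxlen=window_rounds)

    def update(self, sample: float) -> float:
        self.samples.append(sample)
        return max(self.samples)


def reaction_time(rtt_s: float, window_rounds: int = 10,
                  old_bw: float = 100.0, new_bw: float = 50.0) -> float:
    """Seconds until the estimate reflects a capacity drop from old_bw to new_bw."""
    f = WindowedMaxFilter(window_rounds)
    for _ in range(window_rounds):
        f.update(old_bw)              # steady state before the drop
    estimate, rounds = old_bw, 0
    while estimate > new_bw:
        estimate = f.update(new_bw)   # one new delivery-rate sample per round trip
        rounds += 1
    return rounds * rtt_s


if __name__ == "__main__":
    for rtt in (0.005, 0.05):
        print(f"RTT {rtt * 1000:.0f} ms -> stale estimate for {reaction_time(rtt):.2f} s")
```

With the same 10-round window, the estimate stays stale for 0.05 s at a 5 ms RTT but for 0.5 s at 50 ms, and the sender keeps acting on the old value throughout.¶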
We tested the throughput of BBR and CUBIC under different network conditions, with 64 concurrent flows over a 2 Gbps link and varying levels of latency and packet loss. With 5 ms latency and a 0.01% packet loss rate, the total throughput of CUBIC already dropped to less than 10% of the link bandwidth. BBR performed significantly better in this scenario, achieving over 50% throughput even with 5 ms latency and a 0.1% packet loss rate. However, as the latency increased to 10 ms, BBR's throughput decreased to only about 30%, and further to around 20% at 15 ms. Although BBR improves overall throughput, it fails to fully utilize the available network resources as latency increases.¶
The problem is particularly prominent in heterogeneous environments, e.g., where traffic is aggregated across both data centers and the WAN. The short internal delay within a data center allows quick convergence, which leads to large, rapid changes in available bandwidth. The WAN side, on the other hand, has a longer feedback period and converges more slowly, making it difficult to accurately predict the available bandwidth. As a result, overall network resource usage and performance cannot be well balanced. This is also discussed in [Annulus] and [Cross-Datacenter].¶
3.2.3. Multipath
As network coverage and diversity continue to improve, wide-area multipath transport is becoming a trend. 3GPP has already introduced Access Traffic Steering, Switching, and Splitting (ATSSS), one of the prevalent use cases of network-assisted multipath transport. However, practical multipath deployments often combine high-quality and low-quality links, with different loss rates and RTTs on different disjoint paths. Relying solely on per-path E2E congestion control to guess the network condition of each path, especially over highly dynamic wireless networks, can easily lead to unstable traffic scheduling and suboptimal operation. Quantifying network behaviour precisely and exploiting it in multipath mechanisms could help achieve faster convergence and a better experience.¶
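To illustrate how much a multipath scheduler depends on per-path estimates, the Python sketch below implements a simple lowest-RTT-first policy, assumed here as a representative scheduler: each packet goes to the path with the smallest smoothed RTT that still has congestion-window space. The path names and numbers are hypothetical, and the RTT and cwnd values stand for whatever the sender has inferred from E2E feedback.¶

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Path:
    name: str
    srtt_ms: float     # smoothed RTT inferred from end-to-end feedback
    cwnd: int          # congestion window, in packets
    inflight: int = 0  # packets sent but not yet acknowledged


def pick_path(paths: List[Path]) -> Optional[Path]:
    """Lowest-RTT-first: the fastest path with window space, or None if all are full."""
    ready = [p for p in paths if p.inflight < p.cwnd]
    return min(ready, key=lambda p: p.srtt_ms) if ready else None


def schedule(paths: List[Path], n_packets: int) -> Dict[str, int]:
    """Assign up to n_packets packets and return how many each path received."""
    sent = {p.name: 0 for p in paths}
    for _ in range(n_packets):
        path = pick_path(paths)
        if path is None:
            break  # every window is full; wait for ACKs
        path.inflight += 1
        sent[path.name] += 1
    return sent


if __name__ == "__main__":
    paths = [Path("wifi", srtt_ms=20, cwnd=10), Path("cellular", srtt_ms=60, cwnd=30)]
    print(schedule(paths, 25))  # {'wifi': 10, 'cellular': 15}
```

If the Wi-Fi path degrades but its RTT estimate has not yet caught up, this policy keeps filling it first; more precise and timely per-path information would directly improve such decisions.¶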
3.3. Heterogeneous Environment
Real networks are heterogeneous from segment to segment. For example, WAN-side traffic and data-center-internal traffic may affect each other, or a flow may traverse a comparatively less stable wireless segment inside an enterprise or home broadband network as well as a stable fixed-line segment in the WAN. A single set of parameters, or even a single congestion control algorithm, cannot achieve optimal performance in such a complex environment. For instance, according to the tests in [Pantheon], BBR handles scenarios with random packet loss, such as 5G and Wi-Fi, but its throughput may be lower than CUBIC's in other situations, while CUBIC's throughput is poor in scenarios with random packet loss. In a multipath scenario, the dynamic and diverse nature of different paths means that a fixed set of algorithm parameters may not achieve optimal performance. Currently, work in the IETF is mainly limited to idealized scenarios that rely only on E2E feedback, as has been done for decades, and has not extensively considered new approaches, e.g., adaptive transport protocol solutions, for traversing heterogeneous networks. Such approaches may require good collaboration and information exchange between the network and endpoints.¶
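As a sketch of the kind of adaptive behaviour mentioned above, an endpoint on Linux can already select the congestion control algorithm per socket through the TCP_CONGESTION socket option; what is missing is trustworthy input for the choice. In the Python sketch below, the random (non-congestive) loss rate is assumed to come from some external source, such as a hypothetical network-provided hint, since an endpoint cannot easily separate random from congestive loss on its own. The 0.1% threshold is an illustrative assumption, not a tuned value.¶

```python
import socket
from typing import Optional

# Illustrative assumption, not a tuned value: above this random-loss rate, prefer
# BBR, which the tests cited above ([Pantheon]) found to handle random loss better.
RANDOM_LOSS_THRESHOLD = 0.001


def choose_cc(random_loss_rate: Optional[float]) -> str:
    """Pick a Linux congestion control module name from a (hypothetical) loss hint."""
    if random_loss_rate is None:
        return "cubic"  # no information: keep the usual Linux default
    return "bbr" if random_loss_rate > RANDOM_LOSS_THRESHOLD else "cubic"


def apply_cc(sock: socket.socket, name: str) -> None:
    """Set the per-socket algorithm (Linux only; the module must be loaded and permitted)."""
    if not hasattr(socket, "TCP_CONGESTION"):
        raise OSError("TCP_CONGESTION is not supported on this platform")
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, name.encode())


if __name__ == "__main__":
    print(choose_cc(None), choose_cc(0.0001), choose_cc(0.01))  # cubic cubic bbr
```

A per-connection switch like this is still coarse: it cannot adapt to the different segments of a single path.¶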
4. Requirements for Synergy Mechanisms between Network and Endpoint
In conclusion, improving transmission quality should not rely solely on endpoints passively inferring network conditions through heuristics. Further enhancement requires synergy between the network and the endpoints. Several requirements for such collaborative mechanisms are listed below:¶
- There should be two kinds of collaboration: host-to-network and network-to-host. They may be provided by one mechanism each or by a single mechanism covering both.¶
- The mechanisms should enhance the corresponding transmission quality and end-user experience, rather than deteriorating them.¶
- The mechanisms must be secure and trustworthy, preventing malicious attacks and tampering.¶
- The mechanisms must effectively prevent deception and abuse.¶
- The mechanisms should not cause transport protocols to ossify. Specifically, the information conveyed through the collaborative mechanism should be incremental and for reference only, rather than decisive or something the transport depends on heavily. Its presence should result in a better experience, while its absence SHOULD NOT degrade the experience compared to the current situation.¶
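The last requirement can be illustrated with a hypothetical hint consumer. In the Python sketch below, an endpoint may receive an optional bandwidth hint from the network; the hint, its delivery mechanism, and the cap are assumptions, not an existing protocol. The hint is only used as a reference and clamped to a bounded range, and its absence falls back exactly to today's default initial window.¶

```python
from typing import Optional

DEFAULT_IW = 10  # segments: the common initial window used today
MAX_IW = 100     # hypothetical cap, so a wrong or malicious hint has bounded impact


def initial_window(hint_bw_bps: Optional[float], rtt_s: float, mss_bytes: int = 1460) -> int:
    """Initial congestion window (in segments) using an optional, advisory bandwidth hint."""
    if hint_bw_bps is None or hint_bw_bps <= 0 or rtt_s <= 0:
        return DEFAULT_IW  # no usable hint: behave exactly as today
    bdp_segments = int(hint_bw_bps * rtt_s / 8 / mss_bytes)
    # Referential, not decisive: never below today's default, never above the cap.
    return max(DEFAULT_IW, min(bdp_segments, MAX_IW))


if __name__ == "__main__":
    print(initial_window(None, 0.05))  # 10
    print(initial_window(2e7, 0.05))   # 85
    print(initial_window(1e9, 0.05))   # 100
```

Clamping bounds the damage a deceptive hint can do, but it does not by itself make the hint trustworthy; authenticating the hint is a separate problem.¶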
5. Existing Mechanisms
ECN [rfc3168], which uses 2 bits in the IP header to convey congestion information, is widely deployed in the industry. Combined with AQM mechanisms in network devices, it sets the CE codepoint in the IP header to indicate congestion before the queue overflows, thereby notifying endpoints to reduce their sending rate. Furthermore, L4S [rfc9330] redefines the semantics of the ECT(1) codepoint and isolates L4S traffic from classic traffic by using a dual-queue AQM in network devices, to achieve low latency.¶
ECN and L4S are essentially forms of collaboration between the network and endpoints to achieve low loss and low latency. However, this approach cannot fully address the challenges described in Section 3. Additionally, as elaborated in [L4SinCellular], L4S is quite sensitive to time-varying networks, such as cellular and Wi-Fi networks, which may make it difficult to achieve high throughput and low latency simultaneously in such environments. If more information were provided for collaboration, these issues might be overcome more easily.¶
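The ECN field and the L4S classification can be summarized in a few lines. The Python sketch below uses the four ECN codepoints defined in [rfc3168] and steers ECT(1) packets, and also CE packets as the L4S ECN specification recommends, into the low-latency queue. The single fixed marking threshold is a simplifying assumption; deployed AQMs use more elaborate, typically probabilistic, marking.¶

```python
from enum import IntEnum
from typing import Optional


class ECN(IntEnum):
    """The 2-bit ECN field (low-order bits of the TOS/Traffic Class byte)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def ecn_of(tos: int) -> ECN:
    return ECN(tos & 0b11)


def aqm_mark(tos: int, queue_delay_ms: float, threshold_ms: float = 5.0) -> Optional[int]:
    """Return the (possibly CE-marked) TOS byte, or None if the packet is dropped."""
    if queue_delay_ms <= threshold_ms:
        return tos                     # no congestion signal needed
    if ecn_of(tos) == ECN.NOT_ECT:
        return None                    # not ECN-capable: signal by dropping
    return tos | ECN.CE                # ECN-capable: mark instead of dropping


def dualq_classify(tos: int) -> str:
    """L4S-style classifier: ECT(1) and CE go to the low-latency 'L' queue, others to 'C'."""
    return "L" if ecn_of(tos) in (ECN.ECT1, ECN.CE) else "C"


if __name__ == "__main__":
    print(aqm_mark(0b10, 8.0))                         # 3 (ECT(0) marked CE)
    print(aqm_mark(0b00, 8.0))                         # None (dropped)
    print(dualq_classify(0b01), dualq_classify(0b10))  # L C
```

In this model the entire network-to-endpoint signal is a single CE mark per packet.¶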
6. Security Considerations
This document has no security considerations.¶
7. IANA Considerations
This document has no IANA actions.¶
8. References
8.1. Normative References
- [RFC2119]
- Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/rfc/rfc2119>.
- [RFC8174]
- Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/rfc/rfc8174>.
8.2. Informative References
- [Annulus]
- Saeed, A., Gupta, V., Goyal, P., Sharif, M., Pan, R., Ammar, M., Zegura, E., Jang, K., Alizadeh, M., Kabbani, A., and A. Vahdat, "A Dual Congestion Control Loop for Datacenter and WAN Traffic Aggregates".
- [Cross-Datacenter]
- Zeng, G., Bai, W., Chen, G., Chen, K., Han, D., Zhu, Y., and L. Cui, "Congestion Control for Cross-Datacenter Networks".
- [flash]
- Guo, L. and J. Lee, "TCP-FLASH - A Fast Reacting TCP for Modern Networks".
- [I-D.kaippallimalil-tsvwg-media-hdr-wireless]
- Kaippallimalil, J., Gundavelli, S., and S. Dawkins, "Media Header Extensions for Wireless Networks", Work in Progress, Internet-Draft, draft-kaippallimalil-tsvwg-media-hdr-wireless-02, <https://datatracker.ietf.org/doc/html/draft-kaippallimalil-tsvwg-media-hdr-wireless-02>.
- [L4SinCellular]
- Mathieu, B. and S. Tuffin, "Evaluating the L4S Architecture in Cellular Networks with a Programmable Switch".
- [Pantheon]
- Yan, F., Ma, J., Hill, G., Raghavan, D., Wahby, R., Levis, P., and K. Winstein, "Pantheon: the training ground for Internet congestion-control research".
- [rfc3168]
- Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, <https://www.rfc-editor.org/rfc/rfc3168>.
- [rfc9330]
- Briscoe, B., Ed., De Schepper, K., Bagnulo, M., and G. White, "Low Latency, Low Loss, and Scalable Throughput (L4S) Internet Service: Architecture", RFC 9330, DOI 10.17487/RFC9330, <https://www.rfc-editor.org/rfc/rfc9330>.
- [_5G]
- Xu, D., Zhou, A., Zhang, X., Wang, G., Liu, X., An, C., Shi, Y., Liu, L., and H. Ma, "Understanding Operational 5G: A First Measurement Study on Its Coverage, Performance and Energy Consumption".