BGP PORT EC for AIDC
draft-zhang-idr-portid-ec-01
This document is an Internet-Draft (I-D).
Anyone may submit an I-D to the IETF.
This I-D is not endorsed by the IETF and has no formal standing in the
IETF standards process.
| Document | Type | Active Internet-Draft (individual) | |
|---|---|---|---|
| Authors | Junye Zhang , Rui Zhuang , Zheng Zhang , Dongyu Yuan | ||
| Last updated | 2026-03-15 | ||
| RFC stream | (None) | ||
| Intended RFC status | (None) | ||
| Formats | |||
| Stream | Stream state | (No stream defined) | |
| Consensus boilerplate | Unknown | ||
| RFC Editor Note | (None) | ||
| IESG | IESG state | I-D Exists | |
| Telechat date | (None) | ||
| Responsible AD | (None) | ||
| Send notices to | (None) |
draft-zhang-idr-portid-ec-01
IDR J. Zhang
Internet-Draft R. Zhuang
Intended status: Standards Track China Mobile
Expires: 16 September 2026 Z. Zhang
D. Yuan
ZTE Corporation
15 March 2026
BGP PORT EC for AIDC
draft-zhang-idr-portid-ec-01
Abstract
This document introduces a new BGP extended community attribute for
use in AI computing, which announces the port ID between Leaf
switches and servers as preparation for sending large-scale traffic
before initiating AI tasks.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 16 September 2026.
Copyright Notice
Copyright (c) 2026 IETF Trust and the persons identified as the
document authors. All rights reserved.
Zhang, et al. Expires 16 September 2026 [Page 1]
Internet-Draft Abbreviated Title March 2026
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4
2. Format . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Specification . . . . . . . . . . . . . . . . . . . . . . . . 5
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 6
5. Security Considerations . . . . . . . . . . . . . . . . . . . 6
6. References . . . . . . . . . . . . . . . . . . . . . . . . . 6
6.1. Normative References . . . . . . . . . . . . . . . . . . 7
6.2. Informative References . . . . . . . . . . . . . . . . . 7
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 7
1. Introduction
With the rapid development of Artificial Intelligence (AI) and
Machine Learning (ML), AI tasks often generate large traffic due to
the characteristics of large language model computation (LLM). If
the link bandwidth is insufficient, packet loss may occur. AI
computation has very high reliability requirements and extremely low
tolerance for packet loss and latency. When there is link congestion
in the network that leads to packet loss or excessive latency, it
will have a significant impact on the computational efficiency of AI
tasks.
In data centers used for AI and machine learning, BGP is often used
as the routing protocol [RFC7938]. In some implementations,
sufficient bandwidth between the destination server and its connected
leaf switches must be ensured before sending traffic for AI tasks.
On the network side, specifically the area comprised of the Leaf and
Spine switches in Figure 1, there are numerous ECMP links.
Techniques such as Packet Spray can be used to minimize congestion
and packet loss. However, on the computing side, specifically the
last hop between the Leaf switches and the server, congestion can
easily lead to packet loss, significantly reducing the efficiency of
AI tasks. To minimize or eliminate packet loss on the last hop, BGP
needs to be extended to include port information on the destination
leaf switch. This allows the sender to negotiate based on this
information before sending traffic, ensuring sufficient bandwidth is
Zhang, et al. Expires 16 September 2026 [Page 2]
Internet-Draft Abbreviated Title March 2026
available in the last hop and preventing congestion and packet loss
due to insufficient bandwidth. To reduce the stress caused by full-
mesh connections, Leaf switches do not establish neighbors with each
other.
+--------+ +--------+
| Spine1 | | Spine2 |
+-+-+-+-++ +-+-+-+-++
| | | | | | | |
+------------------------------------+ | | |
| | | | | +-------------------------+ | |
| | | | | | +-------+ |
| | | | | | | |
| | | | +-------------------------------------+
| | | +-----------------------+ | | |
+---+ +------+ | | | | |
| | | | | | | |
+-+-+---+ +--+-+--+ +-+--+--+ ++--+---+
| Leaf1 | | Leaf2 | | Leaf3 | | Leaf4 |
+--+-+--+ +---+-+-+ +--+-+--+ +---+-+-+
| | | | | | | |
| +--------+ | | | +----------+ | |
| | | | | | | |
| +--------|-+ | | +----------|-+ |
| | | | | | | |
+-----+-+-+ +-+-+-+---+ +---+-+---+ +-+---+---+
| Server1 | | Server2 | | Server3 | | Server4 |
+---------+ +---------+ +---------+ +---------+
Figure 1
Figure 1 shows a typical data center used for AI computing. In this
network, when Server2, 3, or 4 sends traffic to Server1 through
leaf1, a common incast congestion problem may occur. That is, the
link 1 between Leaf1 switch and Server1 may be congested due to
insufficient bandwidth, resulting in packet loss.
Currently, some implementations negotiate before sending traffic from
devices like Server2 and Server3 to Server1. The AI task traffic is
only sent if the link bandwidth between the destination server and
its connected Leaf switch (referred to as the destination switch) is
sufficient. This negotiation method is outside the scope of this
draft. However, before negotiation, the port information connecting
the destination switch to the server needs to be obtained. This
information will be sent via the newly added extended community
"Route Port ID" in BGP.
Zhang, et al. Expires 16 September 2026 [Page 3]
Internet-Draft Abbreviated Title March 2026
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
2. Format
When announcing the route to the connected server, the BGP protocol
on the Leaf switch carries the switch's address and the port ID
information connected to the destination server.
Transitive IPv4-Address-Specific Extended Community defined in
[RFC7153] and [I-D.ietf-idr-rfc4360-bis] with new sub-type "Route
Port ID" is used for carry the IPv4 address of Leaf switch and the
related port ID to the destination server.
Transitive IPv6-Address-Specific Extended Community defined in
[RFC5701] with new sub-type "Route Port ID" is used for carry the
IPv6 address of the leaf switch and the related port ID to the
destination server.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x01 or 0x41 | Sub-Type | Global Administrator |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Global Administrator (cont.) | Local Administrator |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2
Figure 2 shows the format of IPv4-Address-Specific Extended
Community, where:
* Sub-Type: TBD. This indicates that this is the Route Port ID
extended community;
* Global Administrator: 4 octets, set to the IPv4 address of the
switch that advertises the server route. This address can be the
loopback address for establishing the BGP connection;
* Local Administrator: 2 octets, set to the ID of the port
connecting the switch and the server, with a value range of 1 to
65535.
Zhang, et al. Expires 16 September 2026 [Page 4]
Internet-Draft Abbreviated Title March 2026
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x00 or 0x40 | Sub-Type | Global Administrator |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Global Administrator (cont.) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Global Administrator (cont.) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Global Administrator (cont.) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Global Administrator (cont.) | Local Administrator |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3
Figure 3 shows the format of IPv6-Address-Specific Extended
Community, where:
* Sub-Type: TBD. This indicates that this is the Route Port ID
extended community;
* Global Administrator: 16 octets, set to the IPv6 address of the
switch that advertises the server route. This address can be the
loopback address for establishing the BGP connection;
* Local Administrator: 2 octets, set to the ID of the port
connecting the switch and the server, with a value range of 1 to
65535.
3. Specification
When the Leaf switch advertises routes to the server, the
advertisement includes the Route Port ID extended community, which is
transmitted along with the route advertisement.
In the example shown in Figure 1, Leaf1, when advertising routes to
the Spine switch, includes the Route Port ID extended community,
which contains the Loopback address used to establish the BGP
connection and the port ID connected to the server. The Leaf2 is the
same.
Upon receiving the route carrying the Route Port ID extended
community, the leaf switch checks if the address is reachable. If
unreachable, the extended community is ignored. If reachable, the
address and port information are stored locally or sent to the
server. This storing or sending process is outside the scope of this
draft.
Zhang, et al. Expires 16 September 2026 [Page 5]
Internet-Draft Abbreviated Title March 2026
Because data centers used for AI computing have a large number of
ECMP paths, deploying this feature requires enabling the ADD-PATH
advertisement function defined in [RFC7911], to ensure the
propagation of extended community attributes. Spine or higher-level
switches do not need to generate entries based on this extended
community attribute. To avoid a large number of route advertisements
that may result from enabling the ADD-PATH function, this
advertisement SHOULD be limited to a single PoD.
When a server wants to send large traffic for AI tasks, it will
negotiate bandwidth based on the destination switch and port
information obtained from BGP. Traffic will only be sent after
successful negotiation, thus avoiding packet loss caused by
congestion. Traffic will be sent to the server via the successfully
negotiated Leaf switch. This negotiation process is outside the
scope of this draft.
In the example shown in Figure 1, the routes advertised by Leaf1 and
Leaf2 to Server1 will carry the Route Port ID extended community.
When Server3 wants to send AI task traffic to Server1, it can first
negotiate with Leaf1. If the negotiation fails, it may negotiate
with Leaf2. Only after the negotiation succeeds will the traffic be
sent. In this example, assuming Leaf1 is successfully negotiated,
traffic will be sent to Server1 through Leaf1.
4. IANA Considerations
IANA is requested to allocate two new code points from the
"Transitive IPv4-Address-Specific Extended Community Sub-Types" and
the "Transitive IPv6-Address-Specific Extended Community Sub-Types"
registry.
+======+===============+===============+
| Type | Description | Reference |
+======+===============+===============+
| TBD | Route Port ID | This Document |
+------+---------------+---------------+
Table 1: TABLE_1
5. Security Considerations
This extension to BGP has similar security implications as BGP
Extended Communities [RFC7153], [RFC5701] and
[I-D.ietf-idr-rfc4360-bis].
6. References
Zhang, et al. Expires 16 September 2026 [Page 6]
Internet-Draft Abbreviated Title March 2026
6.1. Normative References
[I-D.ietf-idr-rfc4360-bis]
Sangli, S. R. and N. Kao, "BGP Extended Communities
Attribute", Work in Progress, Internet-Draft, draft-ietf-
idr-rfc4360-bis-02, 6 December 2025,
<https://datatracker.ietf.org/doc/html/draft-ietf-idr-
rfc4360-bis-02>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC5701] Rekhter, Y., "IPv6 Address Specific BGP Extended Community
Attribute", RFC 5701, DOI 10.17487/RFC5701, November 2009,
<https://www.rfc-editor.org/info/rfc5701>.
[RFC7153] Rosen, E. and Y. Rekhter, "IANA Registries for BGP
Extended Communities", RFC 7153, DOI 10.17487/RFC7153,
March 2014, <https://www.rfc-editor.org/info/rfc7153>.
6.2. Informative References
[RFC7911] Walton, D., Retana, A., Chen, E., and J. Scudder,
"Advertisement of Multiple Paths in BGP", RFC 7911,
DOI 10.17487/RFC7911, July 2016,
<https://www.rfc-editor.org/info/rfc7911>.
[RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of
BGP for Routing in Large-Scale Data Centers", RFC 7938,
DOI 10.17487/RFC7938, August 2016,
<https://www.rfc-editor.org/info/rfc7938>.
Authors' Addresses
Junye Zhang
China Mobile
China
Email: zhangjunye@chinamobile.com
Rui Zhuang
China Mobile
China
Email: zhuangruiyjy@chinamobile.com
Zhang, et al. Expires 16 September 2026 [Page 7]
Internet-Draft Abbreviated Title March 2026
Zheng Zhang
ZTE Corporation
China
Email: zhang.zheng@zte.com.cn
Dongyu Yuan
ZTE Corporation
China
Email: yuan.dongyu@zte.com.cn
Zhang, et al. Expires 16 September 2026 [Page 8]