Export of RoCEv2 Base Transport Header (BTH) Information Using IP Flow Information Export (IPFIX)
draft-lin-opsawg-ipfix-rocev2-00
OPSAWG C. Lin
Internet-Draft H. Zhang
Intended status: Standards Track New H3C Technologies
Expires: 31 August 2026 27 February 2026
Export of RoCEv2 Base Transport Header (BTH) Information Using IP Flow
Information Export (IPFIX)
draft-lin-opsawg-ipfix-rocev2-00
Abstract
This document defines a new set of IP Flow Information Export (IPFIX)
Information Elements (IEs) for exporting Base Transport Header (BTH)
information for RDMA over Converged Ethernet version 2 (RoCEv2)
traffic. These extensions enable network monitoring systems to
collect and analyze the characteristics of RDMA traffic widely used
in high-performance computing, storage, and artificial intelligence
applications.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 31 August 2026.
Copyright Notice
Copyright (c) 2026 IETF Trust and the persons identified as the
document authors. All rights reserved.
Lin & Zhang Expires 31 August 2026 [Page 1]
Internet-Draft Export of RoCEv2 BTH Using IPFIX February 2026
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction
2. Terminology
3. New IPFIX Information Elements for RoCEv2 BTH
4. Operational Considerations
5. Security Considerations
6. IANA Considerations
6.1. New IPFIX IEs for RoCEv2 BTH
6.1.1. rdmaOpCode
6.1.2. rdmaPartitionKey
6.1.3. rdmaDestinationQP
6.1.4. rdmaSourceQP
6.1.5. rdmaPacketSequenceNumber
6.1.6. rdmaBTHFlags1
6.1.7. rdmaBTHFlags2
6.1.8. rdmaBTHFlags3
7. References
7.1. Normative References
7.2. Informative References
Authors' Addresses
1. Introduction
Remote Direct Memory Access (RDMA) [RFC5040] is a network technology
that allows a computer to read from or write to the memory of another
computer directly, without involving the operating system. This
zero-copy, kernel-bypass behavior greatly reduces CPU overhead and
communication latency. InfiniBand [IBTA-SPEC] and RDMA over
Converged Ethernet (RoCE) are two mainstream RDMA implementations.
RoCE has become a key component of high-performance data center
networks, especially in low-latency, high-throughput scenarios such
as artificial intelligence training, distributed storage, and
financial transactions. RDMA over Converged Ethernet version 2
(RoCEv2) runs on top of UDP (destination port 4791) and inherits the
transport layer protocol of the InfiniBand Architecture (IBA).
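As an illustration of the UDP encapsulation described above, the following minimal Python sketch classifies a UDP datagram as RoCEv2 by its destination port. The function name `is_rocev2` and the constant name are hypothetical and not defined by this document; the sketch assumes the caller has already located the start of the UDP header.

```python
import struct

ROCEV2_UDP_PORT = 4791  # UDP destination port used by RoCEv2


def is_rocev2(udp_datagram: bytes) -> bool:
    """Return True if the UDP destination port is 4791.

    `udp_datagram` is assumed to start at the first octet of the
    UDP header (source port, destination port, length, checksum).
    """
    if len(udp_datagram) < 8:  # a UDP header is 8 octets long
        return False
    _src, dst, _length, _checksum = struct.unpack("!HHHH", udp_datagram[:8])
    return dst == ROCEV2_UDP_PORT
```

A port match is only a first-level classifier; an exporter may still need to validate the payload before treating it as a BTH.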
IPFIX [RFC7011] currently has no Information Elements for
RoCEv2-specific fields, which limits the ability of network operators
to perform in-depth analysis, troubleshooting, and performance
optimization of RDMA traffic. To close this gap, this document
defines a new set of Information Elements (IEs) that carry the key
fields of the RoCEv2 BTH.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
The following terms are used as defined in [RFC7011]:
* IPFIX
* IPFIX Information Elements
The following terms are used in this document:
* QP: Queue Pair, a communication endpoint in RDMA architecture.
* BTH: Base Transport Header, the RDMA transport header containing
QP information.
* P_Key: Partition Key.
* PSN: Packet Sequence Number.
3. New IPFIX Information Elements for RoCEv2 BTH
This section defines new IPFIX IEs for exporting RoCEv2 BTH key
fields.
rdmaOpCode
The RoCEv2 BTH OpCode field, which indicates the type of the
InfiniBand Architecture data packet.
rdmaPartitionKey
The RoCEv2 BTH Partition Key field, which identifies the logical
partition to which the data packet belongs.
rdmaDestinationQP
The RoCEv2 BTH Destination Queue Pair (QP) field, indicating the
work QP number at the destination.
rdmaSourceQP
The RoCEv2 BTH Source Queue Pair (QP) field (if present),
indicating the work QP number at the source.
rdmaPacketSequenceNumber
The RoCEv2 BTH Packet Sequence Number (PSN) field, which is used
to detect lost or duplicated data packets and thereby supports
reliable, in-order transmission.
rdmaBTHFlags1
The SE, M, Pad, and TVer fields of the RoCEv2 BTH. The size of
this Information Element is 1 octet.
rdmaBTHFlags2
The F/R and B/R fields of the RoCEv2 BTH. The size of this
Information Element is 1 octet.
rdmaBTHFlags3
The A field of the RoCEv2 BTH. The size of this Information
Element is 1 octet.
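The mapping from a received packet to these IEs can be sketched as follows. This is a minimal Python sketch, not a normative procedure: it assumes the 12-octet BTH layout of the InfiniBand Architecture Specification (OpCode; SE/M/Pad/TVer; P_Key; one octet carrying F/R and B/R; 24-bit Destination QP; one octet carrying A; 24-bit PSN) and that the BTH starts at the first octet of the UDP payload. The names `BthFields` and `parse_bth` are hypothetical. The sketch does not derive rdmaSourceQP, because that value is not among these 12 octets and locating it would require parsing further headers.

```python
from typing import NamedTuple

BTH_LEN = 12  # assumed BTH length in octets


class BthFields(NamedTuple):
    rdmaOpCode: int
    rdmaPartitionKey: int
    rdmaDestinationQP: int
    rdmaPacketSequenceNumber: int
    rdmaBTHFlags1: int
    rdmaBTHFlags2: int
    rdmaBTHFlags3: int


def parse_bth(udp_payload: bytes) -> BthFields:
    """Split the first 12 octets of a RoCEv2 UDP payload into IE values."""
    if len(udp_payload) < BTH_LEN:
        raise ValueError("payload is shorter than a BTH")
    b = udp_payload[:BTH_LEN]
    return BthFields(
        rdmaOpCode=b[0],
        rdmaBTHFlags1=b[1],  # SE, M, Pad, TVer, exported as the raw octet
        rdmaPartitionKey=int.from_bytes(b[2:4], "big"),
        rdmaBTHFlags2=b[4],  # F/R, B/R, reserved bits, raw octet
        rdmaDestinationQP=int.from_bytes(b[5:8], "big"),  # 24-bit value
        rdmaBTHFlags3=b[8],  # A, reserved bits, raw octet
        rdmaPacketSequenceNumber=int.from_bytes(b[9:12], "big"),  # 24-bit value
    )
```

The flag octets are copied unmodified here; whether reserved bits should be masked before export is a design choice this sketch leaves open.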
4. Operational Considerations
The exporter needs to parse RoCEv2 BTH information, which may affect
the exporter's performance. Implementers SHOULD consider measures to
mitigate this impact, such as sampling, rate limiting, or hardware
acceleration.
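One simple way to bound the parsing cost is deterministic 1-in-N packet sampling. The Python sketch below is an assumption about how an implementation might do this, not a mechanism defined by this document; the class name `OneInNSampler` and its method are hypothetical.

```python
class OneInNSampler:
    """Select every n-th packet for BTH parsing (deterministic sampling)."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self._count = 0

    def should_parse(self) -> bool:
        """Return True for one packet out of every n consecutive calls."""
        self._count += 1
        if self._count == self.n:
            self._count = 0
            return True
        return False
```

With `n` set to 1 every packet is parsed; larger values trade visibility into individual packets for lower exporter load.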
5. Security Considerations
The Security Considerations for IPFIX [RFC7011] apply to this
document as well.
To prevent the leakage of sensitive details such as QP numbers when
RDMA BTH information is exported across security domains, it is
recommended to protect the confidentiality and integrity of the
exported data in transit, for example by using the secured transport
options of IPFIX such as DTLS [RFC9147].
Maliciously crafted RoCEv2 packets, or a large volume of invalid
traffic, could overload the exporter and cause a Denial-of-Service
(DoS) condition. To keep the exporter and the monitoring system
stable, it is recommended to adopt protective measures such as
configuring reasonable rate-limiting policies.
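A token bucket is one common way to realize a rate-limiting policy of the kind mentioned above. The Python sketch below is illustrative only; this document does not specify an algorithm. The class name `TokenBucket` and the parameters `rate`, `burst`, and `clock` are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow at most `rate` packets per second, with bursts up to `burst`."""

    def __init__(
        self,
        rate: float,
        burst: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.burst = burst
        self._clock = clock  # injectable so the limiter can be tested
        self._tokens = burst
        self._last = clock()

    def allow(self) -> bool:
        """Return True if one more packet may be processed now."""
        now = self._clock()
        # Refill in proportion to the elapsed time, capped at the burst size.
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

Packets for which `allow()` returns False would simply be skipped by the BTH parser, so a flood of crafted packets cannot consume more than the configured budget.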
6. IANA Considerations
6.1. New IPFIX IEs for RoCEv2 BTH
This document specifies new IPFIX IEs to enable export of RoCEv2 BTH
key fields along with other flow information. This document requests
IANA to add these IPFIX IEs to the "IPFIX Information Elements"
registry available at [IANA-IPFIX].
Table 1 lists the new IPFIX IEs for RoCEv2 BTH:
+============+==========================+===============+
| Element ID | Name | Reference |
+============+==========================+===============+
| TBD1 | rdmaOpCode | This document |
+------------+--------------------------+---------------+
| TBD2 | rdmaPartitionKey | This document |
+------------+--------------------------+---------------+
| TBD3 | rdmaDestinationQP | This document |
+------------+--------------------------+---------------+
| TBD4 | rdmaSourceQP | This document |
+------------+--------------------------+---------------+
| TBD5 | rdmaPacketSequenceNumber | This document |
+------------+--------------------------+---------------+
| TBD6 | rdmaBTHFlags1 | This document |
+------------+--------------------------+---------------+
| TBD7 | rdmaBTHFlags2 | This document |
+------------+--------------------------+---------------+
| TBD8 | rdmaBTHFlags3 | This document |
+------------+--------------------------+---------------+
Table 1: New IEs in the "IPFIX Information Elements" Registry
6.1.1. rdmaOpCode
Name: rdmaOpCode
Element ID: TBD1
Description: The RoCEv2 BTH OpCode field, which indicates the type
of the InfiniBand Architecture data packet.
Abstract Data Type: unsigned8
Data Type Semantics: identifier
Status: current
Reference: [this document]
6.1.2. rdmaPartitionKey
Name: rdmaPartitionKey
Element ID: TBD2
Description: The RoCEv2 BTH Partition Key field, which identifies
the logical partition to which the data packet belongs.
Abstract Data Type: unsigned16
Data Type Semantics: identifier
Status: current
Reference: [this document]
6.1.3. rdmaDestinationQP
Name: rdmaDestinationQP
Element ID: TBD3
Description: The RoCEv2 BTH Destination Queue Pair (QP) field,
indicating the work QP number at the destination. Only the
low-order 24 bits of the 32-bit value are significant; the
high-order 8 bits should be 0.
Abstract Data Type: unsigned32
Data Type Semantics: identifier
Status: current
Reference: [this document]
6.1.4. rdmaSourceQP
Name: rdmaSourceQP
Element ID: TBD4
Description: The RoCEv2 BTH Source Queue Pair (QP) field (if
present), indicating the work QP number at the source. Only the
low-order 24 bits of the 32-bit value are significant; the
high-order 8 bits should be 0.
Abstract Data Type: unsigned32
Data Type Semantics: identifier
Status: current
Reference: [this document]
6.1.5. rdmaPacketSequenceNumber
Name: rdmaPacketSequenceNumber
Element ID: TBD5
Description: The RoCEv2 BTH Packet Sequence Number (PSN) field,
which is used to detect lost or duplicated data packets and thereby
supports reliable, in-order transmission. Only the low-order
24 bits of the 32-bit value are significant; the high-order 8 bits
should be 0.
Abstract Data Type: unsigned32
Data Type Semantics: default
Status: current
Reference: [this document]
6.1.6. rdmaBTHFlags1
Name: rdmaBTHFlags1
Element ID: TBD6
Description: The SE, M, Pad, and TVer fields of the RoCEv2 BTH.
The size of this Information Element is 1 octet.
0 1 2 3 4 5 6 7
+--+-+-+-+-+-+-+-+
|SE|M|Pad| TVer |
+--+-+-+-+-+-+-+-+
Bit 0: Solicited Event (SE) field.
Bit 1: Migration Request (M) field.
Bits 2-3: Pad Count (Pad) field.
Bits 4-7: Transport Header Version (TVer) field.
Abstract Data Type: unsigned8
Data Type Semantics: default
Status: current
Reference: [this document]
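On the collector side, the subfields of rdmaBTHFlags1 can be recovered with shifts and masks. The Python sketch below assumes that bit 0 in the diagram above is the most significant bit of the octet, as is conventional in IETF bit diagrams; the function name `decode_bth_flags1` is hypothetical.

```python
from typing import Dict


def decode_bth_flags1(value: int) -> Dict[str, int]:
    """Split an rdmaBTHFlags1 octet into SE, M, Pad, and TVer.

    Assumes bit 0 of the diagram is the most significant bit.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("rdmaBTHFlags1 is a single octet")
    return {
        "SE": (value >> 7) & 0x1,   # bit 0
        "M": (value >> 6) & 0x1,    # bit 1
        "Pad": (value >> 4) & 0x3,  # bits 2-3
        "TVer": value & 0xF,        # bits 4-7
    }
```

For example, the octet 0xA1 (binary 1010 0001) decodes to SE=1, M=0, Pad=2, TVer=1 under this assumption.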
6.1.7. rdmaBTHFlags2
Name: rdmaBTHFlags2
Element ID: TBD7
Description: The F/R and B/R fields of the RoCEv2 BTH.
The size of this Information Element is 1 octet.
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
|F/R|B/R| Resv |
+---+---+---+---+---+---+---+---+
Bit 0: Forward Explicit Congestion Notification (FECN)/Res1
(F/R) field.
Bit 1: Backward Explicit Congestion Notification (BECN)/Res1
(B/R) field.
Bits 2-7: Reserved field.
Abstract Data Type: unsigned8
Data Type Semantics: flags
Status: current
Reference: [this document]
6.1.8. rdmaBTHFlags3
Name: rdmaBTHFlags3
Element ID: TBD8
Description: The A field of the RoCEv2 BTH.
The size of this Information Element is 1 octet.
0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|A| Resv |
+-+-+-+-+-+-+-+-+
Bit 0: Acknowledge Request (A) field.
Bits 1-7: Reserved field.
Abstract Data Type: unsigned8
Data Type Semantics: flags
Status: current
Reference: [this document]
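The abstract data types above determine how many octets each value occupies in an IPFIX Data Record. The Python sketch below packs the eight values in network byte order using their full-size encodings (1, 2, 4, 4, 4, 1, 1, and 1 octets). The field order is an assumption that follows Table 1; in a real exporter the order and lengths are announced by a Template Record, which is not shown here because the Element IDs are still TBD. The names `RECORD_FORMAT` and `pack_record` are hypothetical.

```python
import struct

# unsigned8, unsigned16, unsigned32 x3, unsigned8 x3, big-endian, no padding
RECORD_FORMAT = "!BHIIIBBB"


def pack_record(
    op_code: int,
    p_key: int,
    dest_qp: int,
    src_qp: int,
    psn: int,
    flags1: int,
    flags2: int,
    flags3: int,
) -> bytes:
    """Encode the eight RoCEv2 BTH IE values as consecutive fields."""
    for name, value in (("dest_qp", dest_qp), ("src_qp", src_qp), ("psn", psn)):
        # 24-bit values go in the low-order bits; the high-order 8 bits stay 0.
        if not 0 <= value <= 0xFFFFFF:
            raise ValueError(f"{name} does not fit in 24 bits")
    return struct.pack(
        RECORD_FORMAT, op_code, p_key, dest_qp, src_qp, psn, flags1, flags2, flags3
    )
```

The resulting field block is 18 octets long, and each 24-bit value appears as a 4-octet field whose first octet is zero.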
7. References
7.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D.
Garcia, "A Remote Direct Memory Access Protocol
Specification", RFC 5040, DOI 10.17487/RFC5040, October
2007, <https://www.rfc-editor.org/info/rfc5040>.
[RFC7011] Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
"Specification of the IP Flow Information Export (IPFIX)
Protocol for the Exchange of Flow Information", STD 77,
RFC 7011, DOI 10.17487/RFC7011, September 2013,
<https://www.rfc-editor.org/info/rfc7011>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
7.2. Informative References
[IANA-IPFIX]
"IP Flow Information Export (IPFIX) Entities", n.d.,
<https://www.iana.org/assignments/ipfix/ipfix.xhtml>.
[IBTA-SPEC]
InfiniBand Trade Association, "InfiniBand Architecture
Specification", InfiniBand Architecture
Specification Volume 1-2, Release 1.6, December 2023,
<https://www.infinibandta.org/ibta-specification/>.
[RFC9147] Rescorla, E., Tschofenig, H., and N. Modadugu, "The
Datagram Transport Layer Security (DTLS) Protocol Version
1.3", RFC 9147, DOI 10.17487/RFC9147, April 2022,
<https://www.rfc-editor.org/info/rfc9147>.
Authors' Addresses
Changwang Lin
New H3C Technologies
Beijing
China
Email: linchangwang.04414@h3c.com
Haiyang Zhang
New H3C Technologies
Beijing
China
Email: zhang.haiyangA@h3c.com