IKEv2 support for per-resource Child SAs
draft-ietf-ipsecme-multi-sa-performance-08

Yes

Erik Kline
Roman Danyliw

No Objection

Deb Cooley
Jim Guichard

Recuse


No Record

Francesca Palombini
John Scudder
Murray Kucherawy
Zaheduzzaman Sarker
Éric Vyncke

Summary: Needs one more YES or NO OBJECTION position to pass.

Erik Kline
Yes
Roman Danyliw
Yes
Deb Cooley
No Objection
Gunter Van de Velde
No Objection
Comment (2024-04-22 for -06) Sent
# Gunter Van de Velde, RTG AD, comments for draft-ietf-ipsecme-multi-sa-performance-06

Thank you for the work put into this document.

Please find some non-blocking COMMENT points.

Please find https://www.ietf.org/blog/handling-iesg-ballot-positions/ documenting the handling of ballots.

###COMMENTS:
##generic comments:
The abstract implies that various resources could be used to enhance performance for the same traffic selector, yet the document consistently mentions only the CPU. If multiple CPUs are indeed the sole resource envisaged for the Child SAs associated with a single traffic selector, it would be advantageous for the document to state this more clearly in the abstract. Generally, the term "resource" encompasses a broad range of elements within networking (e.g., bandwidth, QoS queues, optical paths, ECMP paths, etc.); however, in this draft, it appears to refer specifically to computing resources.

I was on the verge of making the ballot a DISCUSS because Section 2, which discusses performance bottlenecks and explains that some state cannot be shared without impacting performance, is what justifies the existence of this RFC-to-be. The benefit is well understood when the resources are different CPUs (sequence numbers, etc.), but it is not as easy to see what the performance benefit is when separate queues are the differing resources. Maybe I misunderstood how these operate together in symbiosis? Or I misunderstood the word "queue" (it may have a different meaning in IPsec than on L3 network interfaces).

##detailed comments
134	1.2.  Terminology

This section only has the pre-existing terminology. I was wondering if new terms like SA_RESOURCE_INFO should be mentioned, to have everything documented in a single place.
This section could also be a good place to explain more explicitly what exactly is meant by the term "resource" in the context of this draft.
Should SADB_ACQUIRE be mentioned? SPD?

141	2.  Performance bottlenecks

The header here is 'Performance bottlenecks', but only a single bottleneck is mentioned in the section. Maybe the section title can be rephrased so that it describes the content more explicitly, for readability. For a network generalist it is unclear which other bottlenecks exist.

143	   There are a number of practical reasons why most implementations have
144	   to limit a Child SA to only one specific hardware resource, but a key
145	   limitation is that sharing the cryptographic state, counters and
146	   sequence numbers between multiple CPUs that are trying to use these
147	   shared states at the same time is not feasible without a significant
148	   performance penalty.  There is a need to negotiate and establish
149	   multiple Child SAs with identical TSi/TSr on a per-resource basis.

This sentence is rather long and not easy to digest. What about this re-edit? I also took the liberty of expanding TSi/TSr on first use:

"
There are several pragmatic reasons why most implementations must restrict a Child Security Association (SA) to a single specific hardware resource. A primary limitation arises from the challenges associated with sharing cryptographic states, counters, and sequence numbers among multiple CPUs. When these CPUs attempt to simultaneously utilize shared states, it becomes impractical to do so without incurring a significant performance penalty. It is necessary to negotiate and establish multiple Child Security Associations (SAs) with identical Traffic Selector initiator (TSi) and Traffic Selector responder (TSr) on a per-resource basis."

168	   Upon installation, each resource-specific Child SA is associated with
169	   an additional local selector, such as CPU or queue.  These resource-
170	   specific Child SAs MUST be negotiated with identical Child SA
171	   properties that were negotiated for the initial Child SA.  This

Section 2 states that, for improved performance, "the cryptographic state, counters and sequence numbers between multiple CPUs" are difficult to share. This is trivial to understand with CPUs, but how does it explicitly correlate with queues?

192	   There are various considerations that an implementation can use to
193	   determine the best way to install multiple Child SAs.

The best 'way' or the best 'procedure'?

195	   A simple distribution could be to install one additional Child SA on
196	   each CPU.  An implementation MAY ensure that one Child SA can be used

A distribution of what? Is this referring to an implementation?

213	   When the number of queue or CPU resources are different between the
214	   peers, the peer with the least amount of resources may decide to not
215	   install a second outbound Child SA for the same resource as it will
216	   never use it to send traffic.  However, it MUST install all inbound
217	   Child SAs as it has committed to receiving traffic on these
218	   negotiated Child SAs.

Is there a risk of creating an overload of SAs for a single resource?
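
For readers following the quoted paragraph, the asymmetric case can be sketched as below. This is only an illustration of the quoted rule, not text or code from the draft: the function name `plan_child_sas` and the idea of numbering SAs by index are assumptions made for the example.

```python
def plan_child_sas(local_cpus: int, negotiated_sas: int) -> dict:
    """Sketch: which negotiated Child SAs a peer installs per direction.

    Hypothetical helper; SAs are identified by index 0..negotiated_sas-1.
    """
    # Inbound: all negotiated SAs are installed, because the peer has
    # committed to receiving traffic on every one of them.
    inbound = list(range(negotiated_sas))
    # Outbound: at most one SA per local CPU; a second outbound SA for the
    # same resource would never be used to send traffic.
    outbound = list(range(min(local_cpus, negotiated_sas)))
    return {"inbound": inbound, "outbound": outbound}


# A peer with 2 CPUs talking to a peer that negotiated 4 Child SAs.
plan = plan_child_sas(local_cpus=2, negotiated_sas=4)
print(plan)
```

In this sketch the smaller peer keeps four inbound SAs but only two outbound ones, which is the situation the overload question above is concerned with.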

224	   Section 2.9.  Based on the trigger TSi entry, an implementations can

s/implementations/implementation/

243	   All multi-octet fields representing integers are laid out in big
244	   endian order (also known as "most significant byte first", or
245	   "network byte order").

Does this need to be explained? Is that not part of what RFC 7296 specifies anyway?

261	   *  Protocol ID (1 octet) - MUST be 0.  MUST be ignored if not 0.
263	   *  SPI Size (1 octet) - MUST be 0.  MUST be ignored if not 0.

and

280	   *  Protocol ID (1 octet) - MUST be 0.  MUST be ignored if not 0.
282	   *  SPI Size (1 octet) - MUST be 0.  MUST be ignored if not 0.

Are no code-point reservations needed for experimental or future use?

267	   *  Resource Identifier (optional).  This opaque data may be set to
268	      convey the local identity of the resource.

Should there be no restrictions on what can be considered a local identity?
What if this identity is an extremely long blockchain blob? What would happen? Is that allowed?
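
As a rough illustration of the fields quoted above and of the length question, the following sketch builds and parses a Notify payload using the RFC 7296 layout (generic payload header, Protocol ID, SPI Size, Notify Message Type, notification data). The notify type value is a placeholder taken from the private-use status range, not the assigned code point, and `MAX_RESOURCE_ID_LEN` is a hypothetical local policy limit; neither comes from the draft. Note that the generic payload header carries a 2-octet Payload Length, so a single payload cannot exceed 65535 octets in any case.

```python
import struct

# Placeholder from the private-use status range; NOT the assigned code point.
SA_RESOURCE_INFO_PLACEHOLDER = 40960
# Hypothetical local policy limit on the opaque identifier (an assumption).
MAX_RESOURCE_ID_LEN = 64


def build_resource_notify(resource_id: bytes = b"", next_payload: int = 0) -> bytes:
    """Build a Notify payload whose data is an optional opaque resource ID."""
    if len(resource_id) > MAX_RESOURCE_ID_LEN:
        raise ValueError("resource identifier exceeds local policy limit")
    # Protocol ID = 0, SPI Size = 0, then the 2-octet notify type.
    # "!" selects network byte order (big endian) for multi-octet fields.
    body = struct.pack("!BBH", 0, 0, SA_RESOURCE_INFO_PLACEHOLDER) + resource_id
    # Generic payload header: next payload, critical/reserved, total length.
    header = struct.pack("!BBH", next_payload, 0, 4 + len(body))
    return header + body


def parse_resource_notify(payload: bytes) -> bytes:
    """Return the opaque resource ID; Protocol ID and SPI Size are ignored."""
    _next, _flags, length = struct.unpack("!BBH", payload[:4])
    _proto_id, _spi_size, _ntype = struct.unpack("!BBH", payload[4:8])
    return payload[8:length]


payload = build_resource_notify(b"cpu3")
print(len(payload), parse_resource_notify(payload))
```

A receiver that treats the identifier as opaque only needs the length check; whether the draft should mandate such a limit is the open question raised above.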

290	   Implementations supporting per-CPU SAs SHOULD extend their local SPD

Later in the text, per-queue is mentioned as well. Does this behave differently than the per-CPU principle?


344	   An implementation that does not accept any further resource specific
345	   Child SAs MUST NOT return the NO_ADDITIONAL_SAS error because this
346	   can be interpreted by the peer that no other Child SAs with different
347	   TSi/TSr are allowed either.  Instead, it MUST return TS_MAX_QUEUE.

Should anything be mentioned about the state kept on an implementation that has no more resources?
What if the remote side tries to open 10M SAs? (Is it an attack vector?)
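
One way a responder might bound this state is sketched below, purely as an illustration of the concern: it keeps a per-traffic-selector counter and answers with TS_MAX_QUEUE once a locally configured cap is reached. The class name, the cap, and the string return values are assumptions for the example and are not taken from the draft.

```python
class ChildSaLimiter:
    """Sketch of a responder-side cap on Child SAs per traffic selector pair."""

    def __init__(self, max_per_selector: int):
        # Hypothetical local policy, e.g. the number of local CPUs or queues.
        self.max_per_selector = max_per_selector
        self.count = {}

    def request(self, ts_pair) -> str:
        """Accept a new per-resource Child SA or refuse with TS_MAX_QUEUE."""
        installed = self.count.get(ts_pair, 0)
        if installed >= self.max_per_selector:
            # Refuse without creating state; NO_ADDITIONAL_SAS is avoided
            # because it could be read as refusing other TSi/TSr as well.
            return "TS_MAX_QUEUE"
        self.count[ts_pair] = installed + 1
        return "OK"


limiter = ChildSaLimiter(max_per_selector=2)
ts = ("10.0.0.0/24", "10.0.1.0/24")
print([limiter.request(ts) for _ in range(3)])
```

With such a cap, the only state kept per traffic selector pair is one counter, regardless of how many requests the remote side sends.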
Jim Guichard
No Objection
Mahesh Jethanandani
No Objection
Comment (2024-04-29) Sent
My comments are split between COMMENTs and NITs.

-------------------------------------------------------------------------------
COMMENT
-------------------------------------------------------------------------------

From an operational perspective, the shepherd write-up brought up the question of how this draft would be operationalized. In other words, is there an augment of the existing YANG model planned that would update the model to add the ability to configure multiple SAs? If not, how does a user specify their interest in enabling this feature?

No reference entries found for these items, which were mentioned in the text:
[TBD2] and [TBD1].

-------------------------------------------------------------------------------
NIT
-------------------------------------------------------------------------------

All comments below are about very minor potential issues that you may choose to
address in some way - or ignore - as you see fit. Some were flagged by
automated tools (via https://github.com/larseggert/ietf-reviewtool), so there
will likely be some false positives. There is no need to let me know what you
did with these suggestions.

Reference [RFC6982] to RFC6982, which was obsoleted by RFC7942 (this may be on
purpose).

Section 1.2, paragraph 1
> n initial IKEv2 exchange is used to setup an IKE SA and the initial Child SA.
>                                     ^^^^^
The verb "set up" is spelled as two words. The noun "setup" is spelled as one.

Section 2, paragraph 1
> he Exchange negotiating the Child SA (eg IKE_AUTH or CREATE_CHILD_SA). If thi
>                                       ^^
The abbreviation "e.g." (= for example) requires two periods.

Section 4, paragraph 2
> hild SAs. If per-CPU packet trigger (eg SADB_ACQUIRE) messages are implemente
>                                      ^^
The abbreviation "e.g." (= for example) requires two periods.

Section 4, paragraph 3
> ed on the trigger TSi entry, an implementations can select the most optimal t
>                              ^^^^^^^^^^^^^^^^^^
The plural noun "implementations" cannot be used with the article "an". Did you
mean "an implementation" or "implementations"?

Section 5.1, paragraph 5
>  identifier in their packet trigger (eg SADB_ACQUIRE) message from the SPD t
>                                      ^^
The abbreviation "e.g." (= for example) requires two periods.

Section 6, paragraph 1
> lthough having a very large number (eg hundreds or thousands) of SAs may slo
>                                     ^^
The abbreviation "e.g." (= for example) requires two periods.

Section 6, paragraph 2
> he inbound SA and outbound SA independently from each other. It is likely tha
>                               ^^^^^^^^^^^^^^^^^^
The usual collocation for "independently" is "of", not "from". Did you mean
"independently of"?

Section 6, paragraph 4
> elonging to a specific resource. The notify data SHOULD NOT be an identifier 
>                                  ^^^^^^^^^^
The verb "notify" does not usually follow articles like "The". Check that
"notify" is spelled correctly; using "notify" as a noun may be non-standard.

Section 8, paragraph 4
> the ESP flow, to a specific Q or CPU e.g ethtool ntuple configuration. The SP
>                                      ^^^
The abbreviation "e.g." (= for example) requires two periods.
Orie Steele
No Objection
Comment (2024-04-30) Not sent
```
19	   Version 2 (IKEv2) to support the negotiation of multiple Child SAs
```

Expand SA on first use, even though the term is defined in RFC7296?

It's possible that expanding TSi/TSr would also be helpful for readers.
Warren Kumari
(was Discuss) No Objection
Comment (2024-04-30) Sent
Thank you for addressing my DISCUSS!

I think that this document is both cool and useful....
Paul Wouters
Recuse
Comment (2024-04-06 for -06) Not sent
I am an author
Francesca Palombini
No Record
John Scudder
No Record
Murray Kucherawy
No Record
Zaheduzzaman Sarker
No Record
Éric Vyncke
No Record