Minimal IP Encapsulating Security Payload (ESP)
draft-ietf-lwig-minimal-esp-12

Yes

Erik Kline

No Objection

Murray Kucherawy
(Alvaro Retana)
(Robert Wilton)

Note: This ballot was opened for revision 08 and is now closed.

Erik Kline
Yes
Murray Kucherawy
No Objection
Paul Wouters
(was Discuss) No Objection
Comment (2022-08-19 for -11) Sent
Thanks for changes to the document.

My DISCUSS items were addressed well enough that I will not block publication of the document.


Old DISCUSS items:

[1]

Section 2:

It suggests a partial SPI match can be used, based on the assumption that
the SPI number is known to have mostly zeros because the device only uses
a hardcoded limited set (eg 257 to 260). While this is true for the outbound
SPI, this may not be true for the inbound SPI, especially if the peer is not
a "minimal ESP" device but a regular multipurpose OS. I think some clarification
is needed for this minimum implementation optimization.

[2]

Section 2.1:

       SPI that are not randomly generated over 32 bits may lead to privacy
       and security concerns.

The "may lead to security concerns" would be something that at the very least needs
to be understood and specified in the Security Considerations section. If it is too
difficult to determine the concerns, perhaps this optimization should be removed from
the draft.

       As a result, the use of alternative designs requires careful security
       and privacy reviews.

If it is known this proposal requires careful security reviews, were these done? If
so, why not replace this warning of danger with the actual output of those reviews?
If reviews were not done, it would imply this document hasn't fully worked out its
Security Considerations.

[3]

       SPI can typically be used to implement a key update

What is a "key update" in this context? It seems this section is suggesting to use
part of the SPI octet space to signal things to another part of the code on the device?
If so, would that code part then clear out those overloaded SPI octets or would they go
(unencrypted!) over the network for everyone to see?

[4]

       While the use of randomly generated SPIs may reduce the leakage or
       privacy of security related information by ESP itself, these
       information may also be leaked otherwise.

This is not a strong argument. This sentence and the entire paragraph really seem to
want to say something like "if you can see the network packets, the information
leak would already be present by seeing the encrypted traffic, irrespective of
whether the SPI is truly random or selected in a way that identifies the manufacturer"

[5]

       The security of all data
       protected under a given key decreases slightly with each message

I do not know of a generic claim like this for ESP. Can a reference be provided?
In general, rekeying is done to avoid decrypting previous traffic in case of a key compromise.
Or perhaps you mean the limits of algorithms like AES_CBC (or 3DES) with respect to
birthday and collision attacks, e.g. the commonly used maximum of 2^32-1 crypto operations
(which is not the same as the maximum number of packets)?

In these cases, the SN is only relevant for very high speed links, e.g. Gbps links, and would never
apply to an IoT device that requires minimal ESP.

[6]

As noted in the TSVART review:

       Also, for devices that spend significant time sleeping, the SN
       would jump hugely on first waking. That shouldn't require any
       larger window (unless a stale packet from prior to the sleep was
       only released after a new packet on waking). But the receiver
       would need to be able to somehow detect massive jumps in the high
       order bits that are not communicated in the SN field.

Perhaps the document can add more specific detail on how to map the commonly
implemented time values to valid SNs that avoid ESN issues?


[7]

       so the constrained device may not proceed to such checks

The language issue here inverts the meaning. What is meant is "so the constrained device
may omit such checks"

[8]

       TFC has not yet being widely adopted for standard ESP traffic.

It is widely implemented (e.g. in Linux), although I agree that it seems to be rarely used.
I am not convinced the reason for this is the one given in the text. I think the issue
relates more to deciding what size to pad to. The easiest choice is the MTU,
but due to various encapsulation techniques (ESPinUDP, PPPoE) it is not always
clear what the MTU of the IPsec link is. And path MTU discovery with IPsec does
not really work in practice.

But if the application/device tends to send packets of between 1 and, say, 125 bytes,
it could always pad to 125 bytes so that the packet size leaks no information. Whether
to do this really depends on the traffic being protected. If that is
the case, then it might be best to let the IKEv2 negotiation determine whether or not
to use this, just like regular use of TFC.

Regardless, TFC is optional and a minimum implementation can just omit it. Since
this document would also be combined with efforts to reduce the number of bytes sent in order to
preserve energy, it would make sense to avoid using TFC padding, especially for sensors
that, for example, just always send a one-byte temperature value to begin with.

       Such information could be used by the attacker in case a vulnerability is
       disclosed on the specific device.

I don't think "vulnerability" here is the issue. It could lead to exposing the size
of the original packet being protected by IPsec, which could (or could not) leak
information to an observer on the network.

[9]

       a minimal ESP implementation may not generate such dummy packet.

I think what is meant is "MUST NOT generate".

[10]

The Next Header Section is better named Dummy Packet. While it discusses the mandatory
Next Header field, it really only states not to send Dummy Packets. But it almost reads
as if the Next Header can be ignored or omitted.

[11]

       4.  Avoid Padding by sending payload data which are aligned to
           the cipher block length - 2 for the ESP trailer.

Isn't this advice just moving the padding from the IPsec layer to the application
layer? That is, the packet size and energy use would be no different if one implements
this advice?
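
The arithmetic behind this question can be sketched as follows. This is only an illustration: the function names and the 16-byte block length are assumptions, and the 2 octets are the Pad Length and Next Header fields of the ESP trailer.

```python
def esp_pad_len(payload_len: int, block_len: int = 16) -> int:
    """Padding octets ESP must add so that payload + padding + 2 trailer
    octets (Pad Length, Next Header) is a multiple of block_len.
    Assumes block_len is itself a multiple of 4."""
    return -(payload_len + 2) % block_len


def protected_len(payload_len: int, block_len: int = 16) -> int:
    """Total length of payload, ESP padding, and the 2-octet trailer."""
    return payload_len + esp_pad_len(payload_len, block_len) + 2


# A 10-byte reading padded by ESP, versus the same reading padded to
# 14 bytes by the application so that ESP adds no padding itself.
print(esp_pad_len(10), protected_len(10))
print(esp_pad_len(14), protected_len(14))
```

Under these assumptions both cases produce the same protected length, so aligning the payload only saves bytes when the data is naturally of the aligned size.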

[12]

Would it be useful to be able to signal a "minimum ESP" via IKEv2? I can imagine a simple
Notify could be used to signal this. A peer receiving this could then ensure it is
behaving in a "minimum ESP" compatible way even if it is a multi-purpose OS.

Comments:

The linking to RFC 4303 throughout the document is somewhat excessive and
inconsistent. I think the RFC can be referenced on first use of ESP, but
after that the document can just talk about ESP without further links to RFC 4303.
(I also thought there should not be any links in the abstract?)

The document should maybe mention IPsec v3 is meant for "ESP". IPsec v3 is a superset
of IPsec v2. There is no compatibility issue because the "new" things in v3 are
all negotiated via IKE.

I don't understand "a form of partial sequence integrity", as integrity
is a boolean: it passes or fails. I don't understand what "partial" means here.

"it becomes crucial" is a bit weak. I would say it must be guaranteed
that ESP on IoT remains interoperable with currently deployed ESP.

       This may raise some privacy issues as an
       observer is likely to be able to determine the constrained devices of
       the network.

This text might be better placed in a Privacy Considerations section.

The term "traffic shaping" is used in the document to refer to traffic being
padded (padding or TFC). Perhaps my personal exposure to Linux has caused me
to think of "traffic shaping" as controlling the speed or flow of traffic,
rather than "modifying traffic size".
Roman Danyliw
(was Discuss) No Objection
Comment (2022-09-12 for -11) Sent for earlier
Thank you to David Mandelberg for the SECDIR review.

Thank you for addressing my DISCUSS and COMMENT feedback.
Zaheduzzaman Sarker
No Objection
Comment (2022-04-06 for -08) Not sent
Thanks for this specification. Thanks to Bob Briscoe for the TSVART review.
Éric Vyncke
No Objection
Comment (2022-04-04 for -08) Sent
Thank you for the work put into this document. 

Please find below some non-blocking COMMENT points (but replies would be appreciated even if only for my own education), and some nits.

Special thanks to Mohit Sethi for the shepherd's write-up, including the difficulties in reaching WG consensus. I only regret the absence of justification for the intended status.

I hope that this helps to improve the document,

Regards,

-éric

## Section 2

"SA lookup needs to be performed using the longest match", I will let the SEC ADs raise this point if required, but my understanding of IPsec is that it is not a "longest match" but a "full match on IP SA & SPI".

"the combination of a fixed value and the memory address of the SAD structure", should the 'fixed value' be changed on every reboot/reset of the IPsec code?

Please expand "SAD" on first use.

## Section 2.1

The first paragraphs indicate that the local SPI is for inbound traffic, but the last paragraph appears to be about outbound traffic from sensors. I am unsure how to reconcile the two parts of this section.

Probably just editorial in this informational document, but I wonder how to reconcile the two proposed alternatives for SPI generation:

- Section 2 uses a 'low grade' random SPI

- Section 2.1 uses a combo of SAD + rekey index

## Section 3

When using time for the sequence number (I like the idea BTW), what measures should be taken to handle the 32-bit rollover?

Unsure whether I agree with the text around disabling anti-replay even for an IoT device, especially for actuators. The text has 
  "These resources need also to balance that absence of anti-replay mechanism,
   may lead to unnecessary integrity check operations that might be
   significantly more expensive as well."

which appears too lenient IMHO.

# NITS  

s/32 bits field/32-bit field/

## Section 2

s/ valueand checks/ value and checks/
Alvaro Retana Former IESG member
No Objection
No Objection (for -08) Not sent

                            
Lars Eggert Former IESG member
No Objection
No Objection (2022-04-05 for -08) Sent
Section 2. , paragraph 6, comment:
>    [RFC4303] does not require the SPI to be randomly generated over 32
>    bits.  However, this is the recommended way to generate SPIs as it
>    provides some privacy benefits and avoids, for example, correlation
>    between ESP communications.  To randomly generate a 32 bit SPI, the
>    node generates a random 32 bit valueand checks it does not fall in
>    the 0-255 range.  If the SPI has an acceptable value, it is used to
>    index the inbound session, otherwise the SPI is re-generated until an
>    acceptable value is found.

Wouldn't it be simpler to compute a 24-bit random value and left-shift it by
eight? Or left-shift the 32-bit value; both remove the need to check.
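
A minimal sketch of the two approaches, for comparison. The function names are illustrative and not taken from the draft. Note that shifting guarantees the values 1-255 cannot occur, but an all-zero draw still yields the reserved value 0, so one comparison remains; shifting also leaves only 24 random bits in the SPI.

```python
import secrets

RESERVED_MAX = 255  # SPI values 0-255 are reserved by RFC 4303


def spi_by_rejection() -> int:
    """Approach in the quoted text: redraw until outside 0-255."""
    while True:
        spi = secrets.randbits(32)
        if spi > RESERVED_MAX:
            return spi


def spi_by_shift() -> int:
    """Suggested alternative: 24 random bits shifted left by eight.

    The low octet is always zero, so only the all-zero draw
    needs to be rejected.
    """
    while True:
        spi = secrets.randbits(24) << 8
        if spi != 0:
            return spi


print(hex(spi_by_rejection()), hex(spi_by_shift()))
```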

Found terminology that should be reviewed for inclusivity; see
https://www.rfc-editor.org/part2/#inclusive_language for background and more
guidance:

 * Term "dummy"; alternatives might be "placeholder", "sample", "stand-in",
   "substitute".

Thanks to Roni Even for their General Area Review Team (Gen-ART) review
(https://mailarchive.ietf.org/arch/msg/gen-art/W3R6WdPRLgAuvMIJYWnld5uaCmU).

-------------------------------------------------------------------------------
NIT
-------------------------------------------------------------------------------
All comments below are about very minor potential issues that you may choose to
address in some way - or ignore - as you see fit. Some were flagged by
automated tools (via https://github.com/larseggert/ietf-reviewtool), so there
will likely be some false positives. There is no need to let me know what you
did with these suggestions.

Section 2. , paragraph 6, nit:
-    node generates a random 32 bit valueand checks it does not fall in
+    node generates a random 32 bit value and checks it does not fall in
+                                        +

Section 3. , paragraph 6, nit:
-    are no requirements to implement an anti-replay protection mechanism
-    implemented by IPsec.  Similarly to the SN the implementation of anti
-  -----------------------
+    are no requirements to implement an anti-replay protection mechanism.
+                                                                        +

Section 4. , paragraph 4, nit:
-    would typically be the case when the Data Payload is of fix size.
+    would typically be the case when the Data Payload is of fixed size.
+                                                               ++

Document still refers to the "Simplified BSD License", which was corrected in
the TLP on September 21, 2021. It should instead refer to the "Revised BSD
License".

Uncited references: [RFC2119] and [RFC8174].

Section 1. , paragraph 1, nit:
> igure 1 describes an ESP Packet. Currently ESP is implemented in the kernel o
>                                  ^^^^^^^^^
A comma may be missing after the conjunctive/linking adverb "Currently".

Section 1. , paragraph 2, nit:
> y to fit multiple purpose usage of these OS. However, completeness of the IPs
>                                    ^^^^^^^^
The plural demonstrative "these" does not agree with the singular noun "OS".

Section 1. , paragraph 2, nit:
>  as well as multipurpose scope of these OS is often performed at the expense
>                                   ^^^^^^^^
The plural demonstrative "these" does not agree with the singular noun "OS".

Section 1. , paragraph 3, nit:
> or constrained devices remains inter-operable with the standard ESP implemen
>                                ^^^^^^^^^^^^^^
This word is normally spelled as one.

Section 2. , paragraph 2, nit:
> ommunications, this document recommends to index SA with the SPI only. The i
>                              ^^^^^^^^^^^^^^^^^^^
The verb "recommends" is used with the gerund form.

Section 2.1. , paragraph 3, nit:
> g the key is being used. For example, a SPI might be encoded with the Securit
>                                       ^
Use "an" instead of "a" if the following word starts with a vowel sound, e.g.
"an article", "an hour".

Section 2.1. , paragraph 4, nit:
> h privacy and security concerns. Typically some specific values or subset of
>                                  ^^^^^^^^^
A comma may be missing after the conjunctive/linking adverb "Typically".

Section 2.1. , paragraph 5, nit:
> ed information by ESP itself, these information may also be leaked otherwise
>                               ^^^^^^^^^^^^^^^^^
The plural demonstrative "these" does not agree with the singular noun
"information".

Section 2.1. , paragraph 5, nit:
> affic pattern before determining non random SPI can be used. Typically, temp
>                                  ^^^^^^^^^^
This expression is normally spelled as one or with a hyphen.

Section 2.1. , paragraph 5, nit:
> s, used outdoors may not leak privacy sensitive information and most of its t
>                               ^^^^^^^^^^^^^^^^^
This word is normally spelled with a hyphen.

Section 2.1. , paragraph 5, nit:
> opened) may leak truly little privacy sensitive information outside the local
>                               ^^^^^^^^^^^^^^^^^
This word is normally spelled with a hyphen.

Section 2.1. , paragraph 6, nit:
>  packet. The SN is set by the sender so the receiver can implement anti-repl
>                                     ^^^
Use a comma before "so" if it connects two independent clauses (unless they are
closely connected and short).

Section 3. , paragraph 6, nit:
> ability to spoof and replay an acknowledgement is of limited interest and mig
>                                ^^^^^^^^^^^^^^^
Do not mix variants of the same word ("acknowledgement" and "acknowledgment")
within a single text.

Section 3. , paragraph 6, nit:
> y discarding any packets that present a SN whose value is too much in the pa
>                                       ^
Use "an" instead of "a" if the following word starts with a vowel sound, e.g.
"an article", "an hour".

Section 3. , paragraph 6, nit:
> s based on the largest possible value a SN can take over a session. When SN
>                                       ^
Use "an" instead of "a" if the following word starts with a vowel sound, e.g.
"an article", "an hour".

Section 3. , paragraph 7, nit:
> ned devices, this document recommends to implement some rekey mechanisms (see
>                            ^^^^^^^^^^^^^^^^^^^^^^^
The verb "recommends" is used with the gerund form.

Section 4. , paragraph 5, nit:
> ns - may also reveal important privacy oriented information. Some constrained
>                                ^^^^^^^^^^^^^^^^
This word is normally spelled with a hyphen.

Section 4. , paragraph 5, nit:
>  a sufficient tradeoff between the require energy to send additional payload
>                                    ^^^^^^^
The word "require" is not a noun. Did you mean "requirement"?

Section 5. , paragraph 4, nit:
> on the cryptographic suite used. Currently [RFC8221] only recommends cryptog
>                                  ^^^^^^^^^
A comma may be missing after the conjunctive/linking adverb "Currently".

Section 5. , paragraph 4, nit:
> ent with a size different from zero. It length is defined by the security rec
>                                      ^^
It seems that the possessive pronoun "its" fits better in this context. Please
verify.

Section 7. , paragraph 2, nit:
> oss reboots, this document recommends to consider algorithms that are nonce
>                            ^^^^^^^^^^^^^^^^^^^^^^
The verb "recommends" is used with the gerund form.

Section 7. , paragraph 5, nit:
> of the encryption algorithm transform or the energy associated with it are es
>                                      ^^^
Use a comma before "or" if it connects two independent clauses (unless they are
closely connected and short).

Section 7. , paragraph 10, nit:
> eration must follow [RFC4086]. In addition [SP-800-90A-Rev-1] provides approp
>                                   ^^^^^^^^
A comma may be missing after the conjunctive/linking adverb "addition".

Section 9. , paragraph 2, nit:
> In particular Scott Fluhrer suggested to include the rekey index in the SPI.
>                             ^^^^^^^^^^^^^^^^^^^^
The verb "suggested" is used with the gerund form.
Martin Duke Former IESG member
No Objection
No Objection (2022-04-04 for -08) Sent
Thanks to Bob Briscoe for the TSVART review.

Sec 2.1. I find it odd that a node implementing IPsec is overburdened by generating a random number, but this is not my domain.

Sec 3. Bob and the authors had an interesting discussion on time-based SN and replay windows. It seems to me that the best way to do this would be for the receiver to keep a replay window of some number of packets rather than SNs. The receiver would then store the last, say, 10 packet SNs regardless of how many SNs that covered. This would avoid all the issues with the sender skipping many SNs.
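
One possible reading of this suggestion, as a rough sketch only. The class name, the eviction policy (drop the smallest stored SN), and the rule of rejecting anything below the smallest stored SN once the window is full are all assumptions; SN wraparound is ignored, and the check is assumed to run only after the packet has passed its integrity check.

```python
import bisect


class PacketCountReplayWindow:
    """Remember the last `size` accepted SNs, however far apart they are."""

    def __init__(self, size: int = 10) -> None:
        self.size = size
        self.seen: list[int] = []  # accepted SNs, kept sorted ascending

    def check_and_update(self, sn: int) -> bool:
        """Return True and record sn if it is not a replay, else False."""
        i = bisect.bisect_left(self.seen, sn)
        if i < len(self.seen) and self.seen[i] == sn:
            return False  # exact duplicate of a remembered packet
        if len(self.seen) == self.size and sn < self.seen[0]:
            return False  # older than everything remembered: cannot verify
        self.seen.insert(i, sn)
        if len(self.seen) > self.size:
            self.seen.pop(0)  # forget the smallest SN
        return True


w = PacketCountReplayWindow(size=3)
# A sender that sleeps may jump from SN 100 to SN 5000000.
print([w.check_and_update(sn) for sn in (100, 5_000_000, 5_000_001, 100)])
```

Because the window is counted in packets rather than SNs, the large jump after sleeping costs no extra memory on the receiver.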
Robert Wilton Former IESG member
No Objection
No Objection (for -09) Not sent