Encrypted Key Transport for DTLS and Secure RTP

Summary: Has 2 DISCUSSes. Needs 3 more YES or NO OBJECTION positions to pass.

(Ben Campbell) (was Yes) Discuss

Discuss (2019-02-21 for -09)
I'm adding a process discuss to hold things until we get clarity around the IANA expert reviews. 

I know Benjamin mentioned this in his DISCUSS; I am duplicating it here in case we clear up the rest of Benjamin's discuss points prior to the IANA questions.

Benjamin Kaduk Discuss

Discuss (2019-11-19)
The -10 introduced text implying that the DTLS 1.3 retransmission rules
are normative, which conflicts with the existing text indicating
that DTLS 1.2 retransmission rules are normative (see COMMENT).

The DTLS 1.3 Ack message is a dedicated content-type, not a
handshake message (see COMMENT).

I support Alexey's Discuss about the ABNF breakage.
Note that there is a similar issue in the names of the TLS extensions in
the IANA considerations -- the names now include "\_" instead of just "_".

Comment (2019-11-19)
This document is written under the assumption that the EKT content will
be the only content after the encrypted SRTP payload (and authentication
tag, if present).  That's true at present, of course, but I would still
like to see a little discussion of how it might coexist with other SRTP
extensions that place content as a trailer (both would need to be
parseable from the tail of the content and have a length field; and they
would either need to share a message-type namespace or have a profile
specification to indicate what order they appear in), though the
discussion that already occurred suffices to make this not a
Discuss-level point.

Section 5 has:

   the DTLS-SRTP peer in the server role to the client.  This allows
   those peers to process EKT keying material in SRTP (or SRTCP) and
   retrieve the embedded SRTP keying material.  This combination of

but in Section 4 we say that "use with SRTCP would be similar, but is
reserved for a future specification".  (There may be one or two other
places that have text placing SRTCP on the same footing as SRTP even
though they are not, at present.)

Also section 5

   In cases where the DTLS termination point is more trusted than the
   media relay, the protection that DTLS affords to EKT key material can
   allow EKT keys to be tunneled through an untrusted relay such as a
   centralized conference bridge.  For more details, see

I did not chase the reference, but it seems like this sentence might
apply equally for "EKT keys to be tunneled" and "SRTP master keys to be
tunneled".  I trust the authors to say what they mean :)

Section 5.2.2

What do I do when I receive an EKTKey containing an ekt_spi value for
which I already have stored parameters?

   When an EKTKey is received and processed successfully, the recipient
   MUST respond with an Ack handshake message as described in Section 7
   of [I-D.ietf-tls-dtls13].  The EKTKey message and Ack MUST be

Ack is a content type, not a handshake type.  (Per DISCUSS point)

   When an EKTKey is received and processed successfully, the recipient
   MUST respond with an Ack handshake message as described in Section 7
   of [I-D.ietf-tls-dtls13].  The EKTKey message and Ack MUST be
   retransmitted following the rules in Section 4.2.4 of [RFC6347].

It's a little weird to cite DTLS 1.3 for the Ack message but then revert
to DTLS 1.2 for the retransmission schedule...

   EKT MAY be used with versions of DTLS prior to 1.3.  In such cases,
   the Ack message is still used to provide reliability.  Thus, DTLS
   implementations supporting EKT with DTLS pre-1.3 will need to have
   explicit affordances for sending the Ack message in response to an
   EKTKey message, and for verifying that an Ack message was received.
   The retransmission rules for both sides are the same as in DTLS 1.3.

...but here we say that the DTLS 1.3 retransmission rules are
authoritative.  (per DISCUSS)

Section 6

   With EKT, each SRTP sender and receiver MUST generate distinct SRTP
   master keys.  This property avoids any security concern over the re-

Er, does an SRTP receiver have a master key ("what does it encrypt if
it's not sending anything")?

   In some systems, when a member of a conference leaves the
   conferences, the conferences is rekeyed so that member no longer has
   the key.  When changing to a new EKTKey, it is possible that the
   attacker could block the EKTKey message getting to a particular
   endpoint and that endpoint would keep sending media encrypted using
   the old key.  To mitigate that risk, the lifetime of the EKTKey MUST
   be limited using the ekt_ttl.

Do we want to give any concrete guidance about ekt_ttl values?

Alexey Melnikov (was Yes, Discuss) Discuss

Discuss (2019-07-15)
Recent ABNF changes (from "*" to "\*") made the ABNF invalid.

Comment (2019-07-15)
I share Benjamin's concern about extensibility.

In 4.4.1:

   The default EKT Cipher is the Advanced Encryption Standard (AES) Key
   Wrap with Padding [RFC5649] algorithm.  It requires a plaintext
   length M that is at least one octet, and it returns a ciphertext with
   a length of N = M + (M mod 8) + 8 octets.

I started looking at RFC 5649. Maybe I was tired and my math was wrong, but I couldn't figure out how you came up with the N value above.
In particular, where is the "+ 8" coming from?
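
One possible reading is that the "+ 8" is the 8-octet Alternative Initial Value block that RFC 5649 places in front of the plaintext, after the plaintext has been padded up to a multiple of 8 octets. The sketch below compares that reading with the formula quoted from the draft. The function names are illustrative, not taken from the draft or the RFC. Under this reading the two lengths agree only when M is a multiple of 4, since the padding needed is (-M mod 8) rather than (M mod 8).

```python
def kwp_ciphertext_len(m: int) -> int:
    """Ciphertext length in octets for AES Key Wrap with Padding (RFC 5649)."""
    if m < 1:
        raise ValueError("plaintext must be at least one octet")
    padded = 8 * ((m + 7) // 8)  # pad up to the next multiple of 8 octets
    return padded + 8            # plus the 8-octet AIV block


def draft_formula_len(m: int) -> int:
    """N = M + (M mod 8) + 8, as quoted from Section 4.4.1 of the draft."""
    return m + (m % 8) + 8


if __name__ == "__main__":
    # Print M, the RFC 5649 length, and the draft formula side by side.
    for m in (1, 4, 16, 21, 25):
        print(m, kwp_ciphertext_len(m), draft_formula_len(m))
```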

In 6:

   An attacker who tampers with the bits in FullEKTField can prevent the
   intended receiver of that packet from being able to decrypt it.  This
   is a minor denial of service vulnerability.  Similarly the attacker
   could take an old FullEKTField from the same session and attach it to
   the packet.  The FullEKTField would correctly decode and pass
   integrity checks.  However, the key extracted from the FullEKTField ,
   when used to decrypt the SRTP payload, would be wrong and the SRTP
   integrity check would fail.  Note that the FullEKTField only changes
   the decryption key and does not change the encryption key.  None of
   these are considered significant attacks as any attacker that can
   modify the packets in transit and cause the integrity check to fail.

The last sentence seems to be incomplete. Did you mean "can" instead of the last "and"?

Adam Roach Yes

Comment (2019-02-19 for -09)
Thanks to the work that everyone has put in on getting an EKT mechanism
specified and finalized. I have a handful of comments that I would like to see
considered prior to publication of the document.



>  EKT provides a way for an SRTP session participant, to securely
>  transport its SRTP master key and current SRTP rollover counter to
>  the other participants in the session.

Nit: "...participant to securely..."



>   EKTMsgTypeExtension = %x03-FF

Shouldn't this be "%x01 / %x03-ff" ?
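
The gap can be seen by enumerating the type byte. This is a minimal sketch that assumes 0x00 identifies the ShortEKTField and 0x02 the FullEKTField (the latter matches the final byte in the Section 4.1 diagram). The function names are hypothetical.

```python
SHORT_EKT_FIELD = 0x00  # assumed type value for ShortEKTField
FULL_EKT_FIELD = 0x02   # type byte shown in the Section 4.1 diagram


def in_extension_range_as_written(b: int) -> bool:
    """EKTMsgTypeExtension = %x03-FF"""
    return 0x03 <= b <= 0xFF


def in_extension_range_as_suggested(b: int) -> bool:
    """EKTMsgTypeExtension = %x01 / %x03-FF"""
    return b == 0x01 or 0x03 <= b <= 0xFF


def unassigned_type_values(in_range) -> list:
    """Type bytes that are neither Short, Full, nor covered by the extension rule."""
    return [b for b in range(0x100)
            if b not in (SHORT_EKT_FIELD, FULL_EKT_FIELD) and not in_range(b)]


if __name__ == "__main__":
    print(unassigned_type_values(in_extension_range_as_written))
    print(unassigned_type_values(in_extension_range_as_suggested))
```

With the rule as written, the value 0x01 is left without any production. With the suggested rule, every byte value is covered.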

>   SRTPMasterKeyLength = BYTE
>   SRTPMasterKey = 1*256BYTE

I think this either needs to be "1*255BYTE", or we need text that explicitly
indicates that an SRTPMasterKeyLength value of 0x00 means "256 bytes." Probably
the former.

I think this is even further constrained by the fact that EKTCiphertext is
limited to 256 bytes, and contains the SRTPMasterKeyLength, SRTPMasterKey,
SSRC, and ROC (and is not compressed) -- which means the SRTPMasterKeyLength
can't be more than (256 - 1 - 4 - 4 =) 247 bytes. So perhaps "1*247BYTE" is
more appropriate?
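
The bound above can be written out as a small calculation. This is only a sketch of the arithmetic in this comment: the constant names are illustrative, and like the arithmetic it ignores any expansion added by the cipher itself.

```python
EKT_CIPHERTEXT_MAX = 256   # stated limit on EKTCiphertext, in bytes
KEY_LENGTH_FIELD = 1       # SRTPMasterKeyLength
SSRC_FIELD = 4             # SSRC
ROC_FIELD = 4              # rollover counter


def max_master_key_len(limit: int = EKT_CIPHERTEXT_MAX) -> int:
    """Largest SRTPMasterKey that fits alongside the fixed-size fields."""
    return limit - KEY_LENGTH_FIELD - SSRC_FIELD - ROC_FIELD


if __name__ == "__main__":
    print(max_master_key_len())
```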



>  The creation of the EKTField MUST precede the normal SRTP
>  packet processing.

Why? This seems unnecessary and unnecessarily complicated. If the order of
operations has an impact on the bits on the wire (I don't see how it does?),
then please include some explanatory text here that clarifies the reason for
this constraint.



>  When a packet is sent with the ShortEKTField, the ShortEKFField is
>  simply appended to the packet.

Nit: s/ShortEKFField/ShortEKTField/



>  5.  If the SSRC in the EKTPlaintext does not match the SSRC of the
>      SRTP packet received, then all the information from this
>      EKTPlaintext MUST be discarded and the following steps in this
>      list are skipped.

I can see implementors easily interpreting this as requiring them to discard
the RTP payload as well. If that's not the intention (I don't think it is),
consider adding text like "The FullEKTField is removed from the packet then
normal SRTP or SRTCP processing occurs."



>  Section 4.2.1 recommends that SRTP senders continue using an old key
>  for some time after sending a new key in an EKT tag.

This is the first appearance of the phrase "EKT tag," which never seems to be
properly defined. I presume this is meant to be the combination of the EKT
Ciphertext and the SPI?

In any case, please clearly define this term somewhere, preferably before using
it the first time.



>  cannot be used and they also need to create a counter that keeps
>  track of how many times the key has been used to encrypt data to
>  ensure it does not exceed the T value for that cipher (see ).

The parenthetical phrase appears to be missing something here.

>  If
>  either of these limits are exceeded, the key can no longer be used

Nit: "...either... is exceeded..."

>  for encryption.  At this point implementation need to either use the

Nit: "...implementations need..."



>  If a source has its EKTKey changed by the key management, it MUST
>  also change its SRTP master key

I suppose it's not terribly important for interop, but the implication that this
change takes place immediately seems to contradict the 250 ms period specified
in §4.2.1. Perhaps a few words here about how these two normative statements
are intended to interact would save implementors a bit of grief.



>  This document defines the use of EKT with SRTP.  Its use with SRTCP
>  would be similar, but is reserved for a future specification.

After reading this far, I was quite surprised to find this qualification. If
this is the intention for this document, please adjust the rest of the text to
match. Some examples follow.

>  The following shows the syntax of the EKTField expressed in ABNF
>  [RFC5234].  The EKTField is added to the end of an SRTP or SRTCP
>  packet.
>  Rollover Counter (ROC): On the sender side, this is set to the
>  current value of the SRTP rollover counter in the SRTP/SRTCP context
>  associated with the SSRC in the SRTP or SRTCP packet.
>  1.  The final byte is checked to determine which EKT format is in
>      use.  When an SRTP or SRTCP packet contains a ShortEKTField, the
>      ShortEKTField is removed from the packet then normal SRTP or
>      SRTCP processing occurs.
>      The reason for
>      using the last byte of the packet to indicate the type is that
>      the length of the SRTP or SRTCP part is not known until the
>      decryption has occurred.
>  7.  At this point, EKT processing has successfully completed, and the
>      normal SRTP or SRTCP processing takes place.
>  This allows
>  those peers to process EKT keying material in SRTP (or SRTCP) and
>  retrieve the embedded SRTP keying material.



>     To accommodate packet loss, it is
>     RECOMMENDED that three consecutive packets contain the
>     FullEKTField be transmitted.

Nit: "...containing..." (alternately, remove "be transmitted" -- both make a
grammatically correct sentence)

More substantially -- under "New sender:", I'm a little surprised that there
isn't any mention of other senders re-keying in response to a new sender
joining. In the vast majority of conferences, when a sender joins, that same
entity generally will also be a receiver. It seems this should trigger other
senders to include the key in their next packet.



>  Rekey:
>     By sending EKT tag over SRTP, the rekeying event shares fate with
>     the SRTP packets protected with that new SRTP master key.

Is this actually true? Going back to the 250 ms period specified in §4.2.1, it
seems that the master key is sent out in packets pretty far removed from those
it actually protects.

Between this and the inconsistency I mention in §4.5 above, this increasingly
feels like maybe there were two different ways of reasoning about the timing
of sending a master key versus the timing of actually using it. Does the text
in §4.2.1 perhaps represent an outdated notion of how this is intended to



>     If sending audio and video, the RECOMMENDED
>     frequency is the same as the rate of intra coded video frames.  If
>     only sending audio, the RECOMMENDED frequency is every 100ms.

Is this "100ms" correct?  Assuming, say, the use of Opus at voice quality with
20 ms packets, this is taking packets on the order of 40 bytes in length and
tacking on something like 20 to 30 bytes to every fifth packet. That's an
increase in overall stream size on the order of roughly 15% to 20%.

At the same time, when using real-time video, intra frames are going to happen
roughly every 500 ms to 1500 ms. If a cadence on that order is okay for
audiovisual streams, I have to imagine it's okay for audio streams.

So, to clarify: is this "100ms" a typo for "1000 ms"?
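
A rough check of this estimate, using only the figures mentioned above (40-byte packets every 20 ms, a 20 to 30 byte field). The function name and parameters are illustrative. With exactly these inputs the 100 ms cadence works out to roughly 10% to 15%, the same order of magnitude as the estimate above, while a 1000 ms cadence would be about 1% to 1.5%.

```python
def ekt_overhead(packet_bytes: int, field_bytes: int,
                 packet_interval_ms: int, ekt_interval_ms: int) -> float:
    """Fractional stream growth when one packet per EKT interval carries the field."""
    packets_per_field = ekt_interval_ms // packet_interval_ms
    if packets_per_field < 1:
        raise ValueError("EKT interval must be at least one packet interval")
    return field_bytes / (packet_bytes * packets_per_field)


if __name__ == "__main__":
    for cadence_ms in (100, 1000):
        low = ekt_overhead(40, 20, 20, cadence_ms)
        high = ekt_overhead(40, 30, 20, cadence_ms)
        print(f"{cadence_ms} ms: {low:.1%} to {high:.1%}")
```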



>                  +----------+-------+---------------+
>                  | Name     | Value | Specification |
>                  +----------+-------+---------------+
>                  | AESKW128 |     1 | RFCAAAA       |
>                  | AESKW256 |     2 | RFCAAAA       |
>                  | Reserved |   255 | RFCAAAA       |
>                  +----------+-------+---------------+
>                        Table 3: EKT Cipher Types

Section 5.2.1 reserves "0" as well. I suspect we want to replicate that
reservation in this table.

Deborah Brungard No Objection

Alissa Cooper No Objection

Comment (2019-02-20 for -09)
I think I-D.ietf-tls-dtls13 needs to be a normative reference.

(Spencer Dawkins) No Objection

Suresh Krishnan No Objection

Mirja Kühlewind No Objection

Comment (2019-02-19 for -09)
Just a quick clarification question:
Sec 4.2.1: "   Outbound packets SHOULD continue to use the old SRTP Master Key for
   250 ms after sending any new key.  This gives all the receivers in
   the system time to get the new key before they start receiving media
   encrypted with the new key."
I assume that 250ms is selected under the assumption that longer RTTs are a problem for interactive communication anyway? Or where does this value come from?
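
For reference, the quoted rule can be sketched as a simple sender-side key selection. This is an illustration only: the function and parameter names are hypothetical, and the 250 ms value is taken directly from the quoted text.

```python
from typing import Optional

KEY_SWITCH_DELAY_S = 0.250  # "250 ms after sending any new key"


def outbound_key(now: float, new_key_sent_at: Optional[float],
                 old_key: bytes, new_key: bytes,
                 delay: float = KEY_SWITCH_DELAY_S) -> bytes:
    """Pick the SRTP master key for an outbound packet at time `now` (seconds)."""
    if new_key_sent_at is None:
        return old_key  # no new key has been announced yet
    if now - new_key_sent_at >= delay:
        return new_key  # receivers have had `delay` seconds to learn the key
    return old_key      # still inside the hold-off window


if __name__ == "__main__":
    print(outbound_key(10.1, 10.0, b"old", b"new"))
    print(outbound_key(11.0, 10.0, b"old", b"new"))
```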

(Eric Rescorla) No Objection

Comment (2019-02-16 for -09)
Rich version of this review at:

S 4.4.1.
>      FullEKTField is retransmitted 3 times, that only counts as 1
>      encryption.
>      Security requirements for EKT ciphers are discussed in Section 6.
>   4.4.1.  Ciphers

How do I know which cipher is in use? Is it attached to EKTKey?

S 5.2.2.
>      Note: To be clear, EKT can be used with versions of DTLS prior to
>      1.3.  The only difference is that in a pre-1.3 TLS stacks will not
>      have built-in support for generating and processing Ack messages.
>      If an EKTKey message is received that cannot be processed, then the
>      recipient MUST respond with an appropriate DTLS alert.

How important is it that you (a) be able to change EKTKeys and (b) be
able to work with DTLS < 1.3? Because if the answer to these is "no",
then you can just send EKTKeys in EncryptedExtensions.

S 6.
>      With EKT, each SRTP sender and receiver MUST generate distinct SRTP
>      master keys.  This property avoids any security concern over the re-
>      use of keys, by empowering the SRTP layer to create keys on demand.
>      Note that the inputs of EKT are the same as for SRTP with key-
>      sharing: a single key is provided to protect an entire SRTP session.
>      However, EKT remains secure even when SSRC values collide.

How am I supposed to decrypt in case I don't have a FullEKTField? Am I
supposed to use the IP address?

S 6.
>      context, e.g., from a different sender.  When the underlying SRTP
>      transform provides integrity protection, this attack will just result
>      in packet loss.  If it does not, then it will result in random data
>      being fed to RTP payload processing.  An attacker that is in a
>      position to mount these attacks, however, could achieve the same
>      effects more easily without attacking EKT.

Why don't you add an epoch so that you can't roll back?

S 4.1.
>        :                                                               :
>        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>        |   Security Parameter Index    | Length                        |
>        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>        |0 0 0 0 0 0 1 0|
>        +-+-+-+-+-+-+-+-+

This encoding seems suboptimal, in that you burn an extra byte for
every FullEKTField. Given that:

1. You are only defining two types
2. It seems unlikely that there will ever be an EKTCiphertext longer
than 128 bits.

I would suggest the following encoding:

- The first bit of the last byte indicates whether this is
FullEKTField or <Something else.>. If it's FullEKTField, the rest is
used for length. Otherwise, the rest is used for type.
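
A minimal sketch of how the suggested last byte could be parsed. Several details are assumptions made for illustration rather than part of the suggestion: "first bit" is taken to be the most significant bit, a set bit is taken to mean FullEKTField, and the unit of the 7-bit length is left uninterpreted. The function names are hypothetical.

```python
FULL_FLAG = 0x80   # assumed: most significant bit set means FullEKTField
VALUE_MASK = 0x7F  # remaining 7 bits: length (full) or type (anything else)


def encode_last_byte(is_full: bool, value: int) -> int:
    """Build the final byte from the flag and a 7-bit length or type."""
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("value must fit in 7 bits")
    return (FULL_FLAG if is_full else 0) | value


def parse_last_byte(b: int) -> tuple:
    """Return ("full", length) or ("other", type) for the final byte."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    kind = "full" if b & FULL_FLAG else "other"
    return kind, b & VALUE_MASK


if __name__ == "__main__":
    print(parse_last_byte(encode_last_byte(True, 40)))
    print(parse_last_byte(encode_last_byte(False, 3)))
```

Under these assumptions the separate type byte is folded into the length byte for FullEKTField, which is where the one-byte saving would come from.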

Alvaro Retana No Objection

Martin Vigoureux No Objection

Ignas Bagdonas No Record

Roman Danyliw No Record

Warren Kumari No Record

Barry Leiba No Record

Éric Vyncke No Record

Magnus Westerlund No Record