Internet Engineering Task Force
AVT Working Group Baugher, McGrew,
INTERNET-DRAFT Oran (Cisco)
Expires: April 2002 Blom, Carrara, Naslund,
Norrman (Ericsson)
November 2001
The Secure Real Time Transport Protocol
<draft-ietf-avt-srtp-02.txt>
Status of this memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Abstract
This document describes the Secure Real Time Transport Protocol
(SRTP), a profile of the Real Time Transport Protocol (RTP), which
can provide confidentiality, message authentication, and replay
protection.
SRTP can achieve high throughput and low packet expansion. SRTP
is intended to be a suitable protection for heterogeneous environments,
i.e. environments including both wired and wireless links. To achieve
these features, default transforms are described, based on an additive
stream cipher for encryption, a keyed-hash based function for message
Baugher, et al. [Page 1]
INTERNET-DRAFT SRTP November, 2001
authentication, and an 'implicit' index for sequencing based on the
RTP sequence number.
TABLE OF CONTENTS
1. Notational Conventions.........................................3
2. Goals..........................................................3
3. SRTP Framework.................................................4
3.1 SRTP Cryptographic Contexts...................................6
3.1.1 Transform-independent parameters............................6
3.1.2 Transform-dependent parameters..............................7
3.1.3 Mapping SRTP Packets to Cryptographic Contexts..............7
3.2 SRTP Packet Processing........................................7
3.2.1 Packet Index Determination..................................8
3.2.2 Cryptographic Transforms....................................9
3.2.3 Replay Protection...........................................10
3.3 Secure RTCP...................................................10
4. Pre-Defined Transforms.........................................13
4.1 Encryption....................................................13
4.1.1 AES in Counter Mode.........................................15
4.1.1.1 Keystream generation......................................15
4.1.2 AES in f8-Mode..............................................15
4.1.2.1 Keystream Generation......................................16
4.1.2.2 SRTP IV Formation.........................................17
4.1.2.3 SRTCP IV Formation........................................17
4.1.3 NULL Cipher.................................................18
4.2 Message Authentication and Integrity..........................18
4.2.1. HMAC/SHA1..................................................18
4.2.2 TMMH/16.....................................................18
4.3 Key Derivation................................................20
4.3.1 Key Derivation Algorithm....................................20
4.3.2 AES-CM PRF..................................................21
4.3.3 SRTCP Key Derivation........................................21
5. Default and Mandatory Transforms...............................22
5.1 Encryption: AES-CM............................................22
5.2 Authentication/Integrity: HMAC/SHA1...........................22
5.3 Key Derivation: AES-CM PRF....................................22
6. SRTP Parameters................................................22
7. Adding SRTP Transforms.........................................23
8. Rationale......................................................23
8.1 Key derivation................................................23
8.2 Salting key...................................................24
8.3 TMMH - Message Integrity from Universal Hashing...............24
8.4 Data Origin Authentication considerations.....................24
9. Key Management Considerations..................................25
10. Security Considerations.......................................25
10.1 Key Usage....................................................25
10.2 SSRC collision and two-time pad..............................26
10.3 Confidentiality of the RTP Payload...........................26
10.4 Confidentiality of the RTP Header............................27
10.5 Integrity of the RTP packet..................................27
10.5.1 Integrity of the RTP header: IHA...........................28
11. Interaction with Forward Error Correction mechanisms..........28
12. IANA Considerations...........................................29
13. Open issue....................................................29
14. Acknowledgements..............................................29
15. Author's Addresses............................................29
16. References....................................................30
Appendix A: Pseudocode for Index Determination,
and ROC and s_l Update............................................32
Appendix B: Test Vectors..........................................32
B.1 AES-f8 Test Vectors...........................................32
B.2 AES-CM Test Vectors...........................................33
B.3 TMMH/16 Test Vectors..........................................34
1. Notational Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
Terminology conforms to [RFC2828].
By convention, the leftmost bit (byte) is the most significant one.
By XOR we mean bitwise addition modulo 2 of binary strings, and ||
denotes concatenation. E.g. if C = A || B, then the most significant
bits of C are the bits of A, and the least significant bits of C
are the bits of B. Hexadecimal numbers are prefixed by 0x.
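As an illustration of these conventions, the following minimal sketch
treats bit strings as non-negative integers of a stated bit length.
The helper names concat and xor are assumptions made for this example
only and are not part of the specification.

```python
def concat(a: int, a_bits: int, b: int, b_bits: int) -> int:
    """Return A || B: A fills the most significant bits, B the least."""
    if not (0 <= a < (1 << a_bits) and 0 <= b < (1 << b_bits)):
        raise ValueError("operand does not fit in its stated bit length")
    return (a << b_bits) | b


def xor(a: int, b: int) -> int:
    """Bitwise addition modulo 2 of two equal-length bit strings."""
    return a ^ b


# C = A || B with A = 0xAB and B = 0xCD (8 bits each).
c = concat(0xAB, 8, 0xCD, 8)
```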
At the time of writing, NIST has not published the Advanced
Encryption Standard, AES [AES]. However, as it is clear that AES will
be the Rijndael algorithm as specified in [AES], we shall throughout
this document let AES denote the block cipher Rijndael.
2. Goals
The security goals for SRTP are to ensure:
* the confidentiality of the RTP payload, and
* the integrity protection of the entire RTP packet, together with
protection against replayed RTP packets.
Each of these security services is optional and independent.
Other, functional, goals for the protocol are:
* a framework that permits upgrade to new cryptographic transforms,
* low bandwidth cost, i.e. a framework preserving RTP header
compression efficiency,
and, as provided by the pre-defined transforms:
* a low computational cost,
* a small footprint (i.e. small code size and data memory for keying
information and replay lists),
* limited packet expansion to support the bandwidth economy goal,
* independence from the underlying transport, network, and physical
layer used by RTP, in particular high tolerance to packet loss and
re-ordering, and robustness to transmission bit-errors.
The described security services are also provided for RTCP, the
control protocol defined for RTP [RFC1889], with the exception that
integrity and replay protection for the RTCP packets are mandatory
when SRTP services are applied to the RTP packets of the
corresponding session.
These properties ensure that SRTP is a suitable protection scheme for
RTP in both wired and wireless scenarios.
3. SRTP Framework
RTP is the Real Time Transport Protocol [RFC1889]. We define SRTP as
a profile of RTP, in a way analogous to RFC1890 which defines the
audio/video profile for RTP. Conceptually, we consider a 'bump in the
stack' implementation which resides between the RTP application and
the transport layer, which intercepts RTP packets and then forwards
an equivalent SRTP packet on the sending side, and which intercepts
SRTP packets and passes an equivalent RTP packet up the stack on the
receiving side.
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |                           timestamp                           |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |           synchronization source (SSRC) identifier            |
|   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|   |            contributing source (CSRC) identifiers             |
|   |                             ....                              |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |                   RTP extension (optional)                    |
| +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                                                               |
| | |                            payload                            |
| | |                             ....                              |
+>+>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                 authentication tag (optional)                 |
| | |                                                               |
| | |                             ....                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| +- Encrypted Portion
+---- Authenticated Portion
Figure 1. The format of an SRTP packet.
The format of an SRTP packet is illustrated in Figure 1. The optional
authentication tag is the only field defined by SRTP that is not in
RTP.
The added field is:
Authentication tag: variable length, optional
The authentication tag shall be used to carry authentication
data. The Authenticated Portion of an SRTP packet consists of
the entire equivalent RTP packet. Note that, if encryption and
authentication are applied, then 'payload' in the
Authenticated Portion refers to the corresponding encrypted
payload. The authentication tag provides authentication of the
RTP header and payload, and it indirectly provides replay
protection by authenticating the sequence number.
The Encrypted Portion of an SRTP packet consists of the RTP payload
of the equivalent RTP packet.
3.1 SRTP Cryptographic Contexts
Each SRTP session requires the sender and receiver to maintain
cryptographic state information. This information is called the
cryptographic context.
By a session key, we mean a key that is to enter a cryptographic
transform (e.g. encryption or authentication), and a master key is a
random bit string (given by the key management protocol) from which
session keys are derived in a cryptographically secure way.
3.1.1 Transform-independent parameters
The transform-independent parameters of the cryptographic context
consist of:
* a 32-bit rollover counter, ROC, which records how many times the
16-bit RTP sequence number has been reset to zero after passing
through 65,535. Unlike the sequence number, SEQ, which SRTP extracts
from the RTP packet header, the ROC is maintained by SRTP. This ROC
is thus a parameter internal to SRTP.
* for the receiver only, a sequence number s_l, which is the last
received sequence number (possibly authenticated, if authentication
is provided). Here, 'sequence number' refers to the 16-bit SEQ
carried in the RTP packet header.
* identifier for the encryption algorithm, i.e. the cipher and its
mode of operation, and related parameters,
* identifier for the authentication protection algorithm, and related
parameters (when authentication is provided),
* a replay list L, maintained by the receiver only (when
authentication is provided),
* integers n_e and n_a, determining the length of the session keys
for encryption and authentication,
* the master key(s),
* a 16-bit integer, the session key derivation-rate,
* FirstSEQ+ROC and LastSEQ+ROC as key lifetime for each of the master
keys (FirstSEQ and LastSEQ are the RTP sequence numbers inside whose
range the master key is valid, and ROC is the rollover counter).
These values are absolute quantities, not relative.
3.1.2 Transform-dependent parameters
Any encryption, authentication/integrity, and key derivation
parameters that depend on the transform definitions are defined in
the Transforms section. Future SRTP transform specifications MUST
include a section to list the cryptographic context's parameters for
that transform.
3.1.3 Mapping SRTP Packets to Cryptographic Contexts
Recall that an RTP session for each participant is defined [RFC1889]
by a pair of destination transport addresses (one network address
plus a port pair for RTP and RTCP), and that a multimedia session is
defined as a collection of RTP sessions. For example, a particular
multimedia session could include an audio RTP session, a video RTP
session, and a text RTP session.
A cryptographic context shall be uniquely identified by the triplet
context identifier:
<SSRC, destination network address, destination transport port
number>
where the destination network address and the destination transport
port are the ones in the current packet. It is assumed that, when
presented with this information, the key management returns a context
with the information as described in Section 3.1.
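The lookup described above can be pictured as a table keyed by the
triplet. The following is a minimal sketch only: the class and field
names (CryptoContext, ContextTable, and so on) are hypothetical, only
a subset of the parameters of Section 3.1.1 is shown, and a real key
management interface is outside the scope of this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# <SSRC, destination network address, destination transport port number>
ContextId = Tuple[int, str, int]


@dataclass
class CryptoContext:
    """Illustrative subset of the transform-independent parameters."""
    master_key: bytes
    roc: int = 0                  # 32-bit rollover counter
    s_l: Optional[int] = None     # last received SEQ (receiver only)
    key_derivation_rate: int = 0  # 16-bit session key derivation rate


class ContextTable:
    """Maps a context identifier triplet to its cryptographic context."""

    def __init__(self) -> None:
        self._contexts: Dict[ContextId, CryptoContext] = {}

    def add(self, ssrc: int, address: str, port: int,
            context: CryptoContext) -> None:
        self._contexts[(ssrc, address, port)] = context

    def lookup(self, ssrc: int, address: str,
               port: int) -> Optional[CryptoContext]:
        # The address and port are those of the current packet.
        return self._contexts.get((ssrc, address, port))
```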
3.2 SRTP Packet Processing
To construct a proper SRTP packet, given an RTP packet, the sender
does the following:
1. Determine which cryptographic context to use as described in
Section 3.1.3.
2. Determine the index of the SRTP packet as described in Section
3.2.1, using the rollover counter in the cryptographic context and
the sequence number in the RTP packet.
3. Determine the session keys, as described in Section 4.3.
4. Encrypt the Encrypted Portion of the packet (see Section 4, for
the defined ciphers), using the encryption keys found in Step 3.
5. If authentication is provided, compute the authentication tag for
the Authenticated Portion of the packet, as described in Section 4,
using the index determined in Step 2 and the authentication key found
in Step 3. Note that the Encrypted Portion is encrypted before the
authentication tag is computed.
To authenticate and decrypt an SRTP packet, the receiver does the
following:
1. Determine which cryptographic context to use as described in
Section 3.1.3.
2. Estimate the index of the SRTP packet from the rollover counter in
the cryptographic context and the sequence number in the RTP packet,
as described in Section 3.2.1.
3. Determine the session keys, as described in Section 4.3.
4. If authentication is provided, check if the packet has been
replayed, by checking the Replay List to ensure that no packet with
that index has been received and authenticated before. If that index
is in the list, then the packet has been replayed and is invalid. It
MUST be discarded, and the event SHOULD be logged.
Next, perform verification of the authentication tag, using the
authentication key and packet index from Step 2. If the result is
'AUTHENTICATION FAILURE' (see Section 4), the packet MUST be
discarded from further processing and the event SHOULD be logged.
5. Decrypt the Encrypted Portion of the packet (see Section 4, for
the defined ciphers), using the decryption keys found in Step 3.
6. Update the rollover counter and last sequence number, s_l, in the
local context to the values used in the packet index estimated in
Step 2.
3.2.1 Packet Index Determination
SRTP implementations use an 'implicit' packet index for sequencing.
When the session starts, the sender side shall set the rollover
counter, ROC, to zero. Each time the RTP sequence number, SEQ, wraps
modulo 2^16, the sender side shall increment ROC by one. The sender's
packet index is then defined as i = 65,536 * ROC + SEQ.
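The sender-side bookkeeping can be sketched as follows. The class
name SenderIndex and its methods are assumptions for illustration; the
sketch also assumes that the sender emits consecutive sequence
numbers, so that a wrap is detected when SEQ returns to zero.

```python
class SenderIndex:
    """Tracks ROC and SEQ on the sending side (illustrative only)."""

    def __init__(self, first_seq: int) -> None:
        if not 0 <= first_seq < 65536:
            raise ValueError("SEQ is a 16-bit value")
        self.roc = 0          # set to zero when the session starts
        self.seq = first_seq  # RTP sequence number of the first packet

    def index(self) -> int:
        # i = 65,536 * ROC + SEQ
        return 65536 * self.roc + self.seq

    def advance(self) -> int:
        """Move to the next packet and return its index."""
        self.seq = (self.seq + 1) % 65536
        if self.seq == 0:     # SEQ wrapped modulo 2^16
            self.roc += 1
        return self.index()
```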
Receiver-side implementations use the RTP sequence number to
reconstruct the correct index (that is, location in the sequence of
all RTP packets). Also here, the index is defined as SEQ + ROC *
65,536, where the RTP sequence number is SEQ and the rollover
counter is ROC, maintained locally by the receiver as described
below.
A robust approach for the proper use of a rollover counter requires
its handling and use to be well defined. In particular, out-of-order
RTP packets with sequence numbers close to 65,536 or zero must be
properly dealt with.
A receiver reconstructs the index i of a packet with sequence number
SEQ using the estimate
i = 65,536 * v + SEQ,
where v is chosen from the set { ROC-1, ROC, ROC+1 } such that i is
closest to the value 65,536 * ROC + s_l. If the value ROC+1 is used,
then the rollover counter ROC in the cryptographic context is
incremented by one (see Appendix A).
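The estimate can be sketched as follows. This is a minimal sketch
and not the pseudocode of Appendix A: the function name
estimate_index is hypothetical, candidates for v outside the 32-bit
range of ROC are skipped as an assumption, and ties are resolved in
favour of the smaller v. The function only returns the estimate;
updating ROC and s_l in the context is left to the caller.

```python
from typing import Tuple


def estimate_index(roc: int, s_l: int, seq: int) -> Tuple[int, int]:
    """Return (v, i) with i = 65,536 * v + SEQ closest to 65,536 * ROC + s_l."""
    reference = 65536 * roc + s_l
    # v is chosen from { ROC-1, ROC, ROC+1 }, restricted to valid ROC values.
    candidates = [v for v in (roc - 1, roc, roc + 1) if 0 <= v < 2**32]
    v = min(candidates, key=lambda c: abs(65536 * c + seq - reference))
    return v, 65536 * v + seq
```

For example, with ROC = 0 and s_l = 65,530, a packet carrying SEQ = 3
is taken to belong to the next rollover period (v = 1), whereas with
ROC = 1 and s_l = 2, a late packet carrying SEQ = 65,534 is assigned
to the previous period (v = 0).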
The index i is used in replay protection (Section 3.2.3), encryption
and authentication (Section 4), and for the key derivation (Section
4.3).
As the rollover counter is 32 bits long, the maximum number of
packets in any given SRTP session is 2^48 = 281,474,976,710,656.
After that number of SRTP packets have been sent with a given key,
the sender MUST NOT send any more packets with that key. This
limitation enforces a security benefit by providing an upper bound on
the amount of traffic that can pass before cryptographic keys are
changed. Re-keying (see Section 9) MUST be triggered, no later than
after this amount of traffic, and MAY be triggered earlier, e.g. for
increased security and access control to media. Re-occurring key
derivation, as determined by a non-zero derivation rate (see Section
4.3), gives even stronger security benefits, but does NOT change the
above absolute maximum value.
For the receiver, the 'implicit index' approach works as long as
packet reordering and loss are not too great. In particular,
32,768 packets would need to be lost, or a packet would need to be
32,768 packets out of sequence, in order for synchronization to be
lost. Such drastic loss or reordering is likely to disrupt the RTP
application itself.
3.2.2 Cryptographic Transforms
While there are numerous encryption and message authentication
algorithms that can be used in SRTP, we define (Section 4) default
algorithms in order to avoid the complexity of specifying the
encodings for the signaling of algorithm and parameter identifiers.
The defined algorithms have been chosen as they fulfil the goals
listed in Section 2. Recommendations on how to extend SRTP with new
transforms are given in Section 7.
3.2.3 Replay Protection
Robust replay protection is possible when authentication of RTP
packets is present.
A packet is 'replayed' when it is stored by an adversary, and then
re-injected into the network. SRTP provides protection against such
attacks whenever authentication is provided, through the storage of
the indices of the most recently received and authenticated packets.
Each SRTP receiver maintains a Replay List, which conceptually
contains the indices of all of the packets which have been received
and authenticated. In practice, the list can use a 'sliding window'
approach, so that a fixed amount of storage suffices for replay
protection. Packet indices which lag behind the packet index in the
context by more than SRTP-WINDOW-SIZE can be assumed to have been
received, where SRTP-WINDOW-SIZE is a parameter that MUST be at least
64, and which MAY be set to a higher value.
The Replay List can be efficiently implemented by using a bitmap to
represent which packets have been received, as described in the
Security Architecture for IP [RFC2401].
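A bitmap-based sliding window of this kind can be sketched as
follows. The class name ReplayWindow and its method names are
assumptions for illustration. In line with the text above, indices
that lag behind the highest authenticated index by SRTP-WINDOW-SIZE
or more are treated as already received. The mark method is meant to
be called only after the packet has been authenticated.

```python
class ReplayWindow:
    """Sliding-window Replay List kept as an integer bitmap."""

    def __init__(self, size: int = 64) -> None:
        if size < 64:
            raise ValueError("SRTP-WINDOW-SIZE MUST be at least 64")
        self.size = size
        self.top = -1     # highest authenticated index so far
        self.bitmap = 0   # bit k set means index (top - k) was received

    def is_replay(self, index: int) -> bool:
        if index > self.top:
            return False              # newer than anything seen
        offset = self.top - index
        if offset >= self.size:
            return True               # too old: assumed already received
        return bool((self.bitmap >> offset) & 1)

    def mark(self, index: int) -> None:
        """Record an index after successful authentication."""
        if index > self.top:
            shift = index - self.top
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.top = index
        else:
            offset = self.top - index
            if offset < self.size:
                self.bitmap |= 1 << offset
```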
Note that there are no provisions for managing transmitted Sequence
Number values among multiple senders using the same crypto contexts,
thus the anti-replay service SHOULD NOT be used in a multi-sender
environment that employs a single crypto context.
3.3 Secure RTCP
Secure RTCP follows the definition of Secure RTP. SRTCP is defined as
a profile of RTCP, and it adds two mandatory new fields to the RTCP
packet definition, the SRTCP index and the authentication tag. Those
fields are appended to an RTCP packet in order to form an equivalent
SRTCP packet, so that they follow any other profile specific
extensions. An SRTCP packet is illustrated in Figure 2.
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |V=2|P|    RC   |   PT=SR=200   |             length            |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |                         SSRC of sender                        |
| +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                              ...                              |
| | |                          sender info                          |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                              ...                              |
| | |                        report block 1                         |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                              ...                              |
| | |                        report block 2                         |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                                                               |
| | |                              ...                              |
| | |                                                               |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |V=2|P|    SC   |  PT=SDES=202  |             length            |
| | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                          SSRC/CSRC_1                          |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                          SDES items                           |
| | |                              ...                              |
| | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                                                               |
| | |                              ...                              |
| | |                                                               |
| +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |E|                         SRTCP index                         |
+-|>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                              ...                              |
| | |                     authentication field                      |
| | |                                                               |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| +-- Encrypted Portion (optional)
+---- Authenticated Portion (mandatory when SRTP is used for RTP
      session)
Figure 2. The format of a Secure RTCP packet, consisting of
underlying RTCP compound packet with Sender Report and SDES packet.
The added fields are:
E bit and SRTCP index: 32 bits, mandatory
The SRTCP index is a 31-bit counter for the SRTCP
packets. The index is explicitly included in each packet, in
contrast to the 'implicit' index approach used for SRTP.
As Section 9.1 of [RFC1889] allows the split of a compound
RTCP packet into two lower-layer packets, one to be encrypted
and one to be sent in the clear, indices with their most
significant bit (E bit) set to '1' are reserved for encrypted
packets, and indices with most significant bit set to '0' are
used for non-encrypted packets. With this restriction, the
remaining 31 bits are set to zero before the first SRTCP packet
is sent, and are incremented by one after each SRTCP packet is sent.
Except for differences in the most significant (E) bit,
indices form a strictly increasing sequence.
Authentication Tag: variable length, mandatory
The authentication tag shall be used to carry message
authentication data. The Authenticated Portion of an SRTCP
packet consists of the entire equivalent (possibly compound)
RTCP packet and the SRTCP index.
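The 32-bit word carrying the E bit and the SRTCP index can be
sketched as follows. The function names are hypothetical, and network
byte order is assumed, as is usual for RTP and RTCP header fields.

```python
import struct
from typing import Tuple


def pack_srtcp_index(encrypted: bool, index: int) -> bytes:
    """Build the 32-bit word: E bit (most significant) || 31-bit index."""
    if not 0 <= index < 2**31:
        raise ValueError("the SRTCP index is a 31-bit counter")
    return struct.pack("!I", (int(encrypted) << 31) | index)


def unpack_srtcp_index(word: bytes) -> Tuple[bool, int]:
    """Split the 32-bit word into (E bit, 31-bit SRTCP index)."""
    (value,) = struct.unpack("!I", word)
    return bool(value >> 31), value & 0x7FFFFFFF
```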
The Encrypted Portion of an SRTCP packet consists of the RTCP payload
of the equivalent compound RTCP packet, from the first RTCP packet,
i.e. from the ninth (9) byte to the end of the compound packet.
SRTCP packet processing is identical to that of SRTP packet
processing, with the following changes:
* SRTCP replay protection is as defined in Section 3.2.3, but uses
the SRTCP index as the index i and maintains separate values for s_l
and the replay list, specific to SRTCP. SRTCP replay protection is
mandatory.
* SRTCP encryption is as defined in Section 4, but using the
definition of the SRTCP Encrypted Portion as defined in this section,
using the SRTCP index as the index i. The encryption transforms shall
be the same selected for the protection of the associated SRTP
stream(s) (when RTP is encrypted too), while the NULL algorithm shall
be applied to the RTCP packets to be authenticated but not encrypted.
* The SRTCP authentication tag is defined as in Section 4, but with
the Authenticated Portion of the SRTCP packet defined in this
section, and using the SRTCP index as the index i. SRTCP
authentication is mandatory. The authentication transforms and
related parameters (e.g., key size) shall be the same selected for
the protection of the associated SRTP stream(s) (when SRTP is
authenticated too).
* SRTCP decryption is performed as in Section 4, but only if the
SRTCP index has its most significant bit (E bit) equal to 1. If so,
the encrypted portion is decrypted, using the SRTCP index as the
index i. In case the most significant bit of the index is 0, the
payload is simply copied.
There MAY also exist some minor transform specific changes, see
Section 4 for the defined transforms.
The encryption prefix (Section 6.1 of [RFC1889]), a random 32-bit
quantity intended to improve privacy, MUST NOT be used. This is
because we strongly recommend ciphers secure against known plaintext
attacks. The pre-defined SRTP encryption uses a secure, additive
stream cipher, and thus the prefix offers no benefit at all.
The maximum number of SRTCP packets with a fixed key is limited to
2^31 = 2,147,483,648.
Authentication MUST be applied to RTCP, as it is the control
protocol (e.g. it has a BYE packet). Note, however, that the cost of
RTCP authentication is not of the same order as that of RTP
authentication, as the recommended share of the session bandwidth
allocated to RTCP is 5% and RTCP packets are sent less frequently.
Nevertheless, when adding authentication to RTCP, the overhead in
bandwidth SHOULD be considered (it will be more than 5%).
4. Pre-Defined Transforms
4.1 Encryption
Generic parameters, common to all pre-defined, non-NULL, encryption
transforms:
* BLOCK CIPHER is the block cipher used
* n_b is the bit-size of the block for the block cipher
* k_e is the session encrypting key
* n_e is the length of k_e (the default is 128 bits)
* k_s is the so-called salting key
* n_s is the length of the salting key. The default value is equal to
n_b. Another (shorter) value MUST be explicitly signaled.
* SRTP_PREFIX_LENGTH is the octet length of the keystream prefix, an
(at least) 8-bit integer, inferred from the message authentication
code in use.
The session key is by default derived as specified in Section 4.3.
The salting key is obtained directly from the cryptographic context.
The encryption transforms defined in SRTP use a "seekable" segmented
keystream generator, which for each secret key maps the RTP packet
index into a pseudorandom keystream segment, used to encrypt a single
RTP packet (with that packet index). The process of encrypting a
packet consists of generating the keystream segment corresponding to
the packet, and then bitwise exclusive-oring that keystream segment
onto the Encrypted Portion of the RTP packet. Decryption is done the
same way, but swapping the roles of the plaintext and ciphertext.
The definition of how the keystream is generated, given the index,
depends on the cipher and its mode of operation. Below, two such
keystream generators are defined. The NULL cipher is also defined, to be
used when encryption of RTP is not required.
The initial octets of each keystream segment MAY be reserved for
use in a message authentication code, in which case the keystream
used for encryption starts immediately after the last reserved
octet. The initial reserved octets are called the keystream prefix,
and the remaining octets are called the keystream suffix. This
process is illustrated in Figure 3.
+----+   +------------------+---------------------------------+
| KG |-->| Keystream Prefix |        Keystream Suffix         |---+
+----+   +------------------+---------------------------------+   |
                                                                  |
                            +---------------------------------+   v
                            | Encrypted Portion of RTP Packet |->(*)
                            +---------------------------------+   |
                                                                  |
                            +---------------------------------+   |
                            | Encrypted Portion of SRTP Packet|<--+
                            +---------------------------------+
Figure 3: SRTP Encryption. Here KG denotes the keystream
generator, and (*) denotes bitwise exclusive-or.
The number of octets in the keystream prefix is denoted as
SRTP_PREFIX_LENGTH. The keystream prefix is reserved for use with
certain message authentication transforms, as indicated by a non-zero
value of this parameter. This means that even if
confidentiality is not to be provided, the keystream generator output
MAY still need to be computed, in which case the default keystream
generator SHALL be used.
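The use of the keystream suffix can be sketched as follows. The
function name xor_with_keystream is hypothetical, and the keystream
segment is assumed to have been produced already by one of the
keystream generators defined below. Because the operation is a
bitwise exclusive-or, the same function serves for encryption and
decryption.

```python
def xor_with_keystream(segment: bytes, prefix_length: int,
                       portion: bytes) -> bytes:
    """XOR the Encrypted Portion with the keystream suffix.

    segment:       keystream segment for this packet index
    prefix_length: SRTP_PREFIX_LENGTH, in octets (reserved for the MAC)
    portion:       Encrypted Portion (plaintext in, ciphertext out,
                   or the reverse)
    """
    suffix = segment[prefix_length:]  # skip the keystream prefix
    if len(suffix) < len(portion):
        raise ValueError("keystream segment too short for this packet")
    return bytes(p ^ k for p, k in zip(portion, suffix))
```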
The default cipher is the Advanced Encryption Standard (AES), and we
define two modes of running AES, Segmented Integer Counter Mode AES
and AES in f8-mode. In the sequel, let E(k,x) be AES applied to key k
and input block x.
4.1.1 AES in Counter Mode
The default keystream generator cipher SHALL be AES [AES] used in the
Segmented Integer Counter Mode, with an n_e = 128-bit key size and an
n_b = 128-bit block size.
Conceptually, counter mode consists of encrypting successive
integers. The actual definition is somewhat more complicated, in
order to randomize the starting point of the integer sequence. Each
packet is encrypted with a distinct keystream segment, which is
computed as follows.
4.1.1.1 Keystream generation
A keystream segment is the concatenation of the 128-bit output blocks
of the AES cipher in the encrypt direction, using key k = k_e, in
which the block indices are in increasing order. Symbolically, each
keystream segment looks like
E(k,A) || E(k,A + 1 mod 2^128) || E(k,A + 2 mod 2^128) ...
The 128-bit integer value A is defined as 2^16 times the packet
index, i, plus k_s (the salting key), modulo 2^128:
A = (k_s + (i * 2^16)) modulo 2^128.
Note that the initial value A is fixed for each packet. The number of
blocks of keystream generated for any fixed value of A MUST NOT
exceed 2^16.
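The construction above can be sketched as follows. The block cipher
is passed in as a callable E(k, x) so that the sketch stays
self-contained; the stand-in toy_block_function used here is NOT AES
and offers no security, and an implementation would substitute a real
AES encryption of one 128-bit block. The function names are
assumptions for illustration.

```python
import hashlib
from typing import Callable

BlockCipher = Callable[[bytes, bytes], bytes]


def toy_block_function(key: bytes, block: bytes) -> bytes:
    """Stand-in for E(k, x). NOT AES; for demonstration only."""
    return hashlib.sha256(key + block).digest()[:16]


def counter_keystream(E: BlockCipher, k_e: bytes, k_s: bytes,
                      index: int, num_blocks: int) -> bytes:
    """Return E(k,A) || E(k,A+1) || ... for the given packet index."""
    if num_blocks > 2**16:
        raise ValueError("at most 2^16 blocks per value of A")
    # A = (k_s + (i * 2^16)) modulo 2^128
    a = (int.from_bytes(k_s, "big") + (index << 16)) % 2**128
    out = bytearray()
    for j in range(num_blocks):
        counter = (a + j) % 2**128
        out += E(k_e, counter.to_bytes(16, "big"))
    return bytes(out)
```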
The AES has a block size of 128 bits, so 2^16 output blocks are
sufficient to generate the 2^23 bits of keystream needed to encrypt
the largest possible RTP packet (actually, except for IPv6
'jumbograms' [RFC2675], which are not likely to be used for RTP-based
multimedia traffic).
This restriction on the maximum number of RTP packets ensures
the security of the encryption method by limiting the effectiveness
of probabilistic attacks [BR98].
4.1.2 AES in f8-mode
To encrypt UMTS (Universal Mobile Telecommunications System, i.e. 3G
networks) data, a solution (see [ES3D]) known as the f8-algorithm has
been developed. At a high level, the proposed scheme is a variant of
Output Feedback Mode (OFB) [HAC], with a more elaborate
initialization and feedback function. As in normal OFB, the core
consists of a block cipher. Here, too, we define AES as the
default block cipher to be used in f8-mode for RTP encryption, with a
128-bit key and block size.
Figure 4 shows the structure of a block cipher, E, running in what we
shall call "f8-mode of operation".
                  IV
                   |
                   |
                   v
                +------+
                |      |
           +--->|  E   |
           |    |      |
           |    +------+
           |       |
     m --> *       +-----------+-------------+--  ...   ------+
           |   IV' |           |             |                |
           |       |   j=1 --> *     j=2 --> * ...  j=L-1 --> *
           |       |           |             |                |
           |       |      +--> *        +--> *    ...    +--> *
           |       |      |    |        |    |           |    |
           |       v      |    v        |    v           |    v
           |    +------+  | +------+    | +------+       | +------+
           |    |      |  | |      |    | |      |       | |      |
    k_e ---+--->|  E   |  | |  E   |    | |  E   |       | |  E   |
                |      |  | |      |    | |      |       | |      |
                +------+  | +------+    | +------+       | +------+
                   |      |    |        |    |           |    |
                   +------+    +--------+    +-- ... ----+    |
                   |           |             |                |
                   v           v             v                v
                  S(0)        S(1)          S(2)   . . .    S(L-1)
Figure 4. f8-mode of operation (asterisk, *, denotes bitwise XOR).
4.1.2.1 Keystream Generation
As above, let E(k_e,x) be the 128-bit output of AES in the encrypt
direction when applied to the n_e = 128-bit key k_e and n_b = 128-bit
plaintext block x. The Initialization Vector (IV) is determined as
described in Section 4.1.2.2.
Let IV', S(j), and m denote n_b-bit blocks, determined below. The
keystream, S(0) || ... || S(L-1), for an N-bit message is defined by
setting IV' = E(k_e XOR m, IV), and S(-1) = 00..0. For j = 0,1,..,
L-1 where L = N/n_b (rounded up to nearest integer) compute
S(j) = E(k_e, IV' XOR j XOR S(j-1))
Notice that the IV is not used directly. Instead it is fed through E
under another key to produce an internal, "masked" value (denoted
IV') to prevent an attacker from gaining known input/output pairs.
The role of the internal counter is to prevent short keystream
cycles. The value of the key mask m is defined to be
m = k_s || 0x555..5,
i.e. the salting key, appended by the binary pattern 0101.. to fill
the entire desired key size, n_e.
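The keystream generation above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the block cipher E is passed in by the caller as a function E(key, block) (for example an AES-128 encryption routine from a library of the implementer's choice, which is not shown here), the function names are ours, and the counter j is assumed to be encoded as a big-endian 128-bit block, as the "IV' XOR j" rows of Appendix B.1 suggest.

```python
from typing import Callable

BLOCK_BYTES = 16  # n_b = 128 bits


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length octet strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def key_mask(k_s: bytes, key_len: int = 16) -> bytes:
    """m = k_s || 0x555..5, filled to the key size n_e (in octets)."""
    return k_s + b"\x55" * (key_len - len(k_s))


def f8_keystream(E: Callable[[bytes, bytes], bytes],
                 k_e: bytes, k_s: bytes, iv: bytes, n_bits: int) -> bytes:
    """Return S(0) || ... || S(L-1), cut to n_bits rounded up to octets."""
    m = key_mask(k_s, len(k_e))
    iv_prime = E(xor_bytes(k_e, m), iv)          # IV' = E(k_e XOR m, IV)
    s_prev = bytes(BLOCK_BYTES)                  # S(-1) = 00..0
    n_blocks = -(-n_bits // (8 * BLOCK_BYTES))   # L = N/n_b, rounded up
    out = b""
    for j in range(n_blocks):
        j_block = j.to_bytes(BLOCK_BYTES, "big")  # assumed encoding of j
        # S(j) = E(k_e, IV' XOR j XOR S(j-1))
        s_prev = E(k_e, xor_bytes(xor_bytes(iv_prime, j_block), s_prev))
        out += s_prev
    return out[: (n_bits + 7) // 8]
```

Encryption then XORs the keystream with the payload, e.g. `xor_bytes(payload, f8_keystream(E, k_e, k_s, iv, 8 * len(payload)))`. Keeping E as a parameter leaves the choice of AES implementation open.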
The maximum allowable packet size can be determined as follows. The
AES has a block size of 128 bits, and assuming that AES behaves like
a random function, it is (heuristically) secure to generate about
2^64 output blocks, which is sufficient to generate 2^71 bits of
keystream. For practical RTP packet sizes, far fewer blocks are
required, and the counter j above will often be sufficient if
implemented as a 16- or 32-bit counter.
4.1.2.2 SRTP IV Formation
The purpose of the following IV formation is to provide a feature
which we call implicit header authentication (IHA), see Section
10.5.1.
The IV for 128-bit block AES-f8 is formed in the following way:
IV = 0x00 || M || PT || SEQ || TS || SSRC || ROC
M, PT, SEQ, TS, SSRC are taken from the RTP header; ROC is from the
crypto context.
The presence of the SSRC as part of the IV allows AES-f8 to be used
when a master key is shared between multiple streams, see Section
10.2.
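The SRTP IV formation can be sketched as follows. This is an illustrative Python sketch; the function name is ours, and it assumes the standard 12-octet fixed RTP header layout, in which octet 0 carries V, P, X and CC, and octets 1 to 11 carry M, PT, SEQ, TS and SSRC. ROC is assumed to be encoded as a 32-bit big-endian value.

```python
def srtp_f8_iv(rtp_header: bytes, roc: int) -> bytes:
    """IV = 0x00 || M || PT || SEQ || TS || SSRC || ROC (128 bits)."""
    if len(rtp_header) < 12:
        raise ValueError("need the 12-octet fixed RTP header")
    # Octet 0 (V, P, X, CC) is replaced by 0x00; octets 1..11 are copied.
    return b"\x00" + rtp_header[1:12] + roc.to_bytes(4, "big")


# RTP header and ROC taken from Appendix B.1 (hexadecimal).
iv = srtp_f8_iv(bytes.fromhex("806e5cba50681de55c621599"), 0xD462564A)
print(iv.hex())
```

With the Appendix B.1 inputs this yields the IV listed in that appendix.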
4.1.2.3 SRTCP IV Formation
The IV for 128-bit block AES-f8 is formed in the following way:
IV = 0x00000000 || E || SRTCP index || V || P || RC || PT || length
|| SSRC
V, P, RC, PT, length, SSRC are taken from the first header in the
RTCP compound packet. E || SRTCP index is the added 32-bit index to
the packet.
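The SRTCP IV formation can be sketched in the same way. This is an illustrative Python sketch; the function name and the sample header bytes are ours. It assumes that the first eight octets of the RTCP header carry V, P, RC, PT, length and SSRC in that order, and that the E-bit is the most significant bit of the 32-bit word E || SRTCP index.

```python
def srtcp_f8_iv(rtcp_header: bytes, e_bit: int, srtcp_index: int) -> bytes:
    """IV = 0x00000000 || E || SRTCP index || V..SSRC of the first header."""
    if len(rtcp_header) < 8:
        raise ValueError("need the first 8 octets of the RTCP header")
    if e_bit not in (0, 1) or not 0 <= srtcp_index < 2 ** 31:
        raise ValueError("E must be one bit and the index must fit 31 bits")
    e_and_index = (e_bit << 31) | srtcp_index   # 32-bit E || SRTCP index
    return bytes(4) + e_and_index.to_bytes(4, "big") + rtcp_header[:8]


# Hypothetical RTCP header: V=2, RC=0, PT=200, length=6, SSRC=0xdeadbeef.
sample_iv = srtcp_f8_iv(bytes.fromhex("80c80006deadbeef"), 1, 1)
print(sample_iv.hex())
```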
4.1.3 NULL Cipher
The NULL cipher is used when no confidentiality for RTP is requested.
The keystream can be thought of as "000..0", i.e. the encryption
simply copies the plaintext input into the ciphertext output.
4.2 Message Authentication and Integrity
Common parameters
* k_a is the session authentication key.
* n_a is the bit-length of the authentication key. The default is 128
bits.
* n_tag is the bit-length of the output authentication tag. The
default is 32 bits.
* SRTP_PREFIX_LENGTH is the octet length of the keystream prefix as
defined above.
* M is the Authenticated Portion as specified in Section 3 for RTP
and 3.3 for RTCP.
The session key is by default derived as specified in Section 4.3.
The values of n_a, n_tag, and SRTP_PREFIX_LENGTH MUST be fixed for
any particular fixed value of the key.
Below we describe the process of computing authentication tags. The
SRTP receiver verifies a message/authentication tag pair as follows.
A new authentication tag is computed using one of the algorithms
below, and it is compared to the tag associated with the message. If
the two tags are equal, then the message/tag pair is valid;
otherwise, it is not and the error audit message "AUTHENTICATION
FAILURE" MUST be returned.
4.2.1 HMAC/SHA1
The default authentication code is HMAC with SHA1 [HMAC]. When
HMAC/SHA1 is used, the SRTP_PREFIX_LENGTH is 0. For RTP, the HMAC is
applied to the concatenation of the Authenticated Portion of the
packet (M) and the rollover counter in the cryptographic context,
i.e. HMAC(k_a, M || ROC). For RTCP, HMAC is applied to the
corresponding M only. By default, the output shall be truncated to
the n_tag left-most bits.
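The tag computation and the receiver-side comparison can be sketched with the Python standard library as follows. This is a minimal sketch, not a definitive implementation: the function names are ours, ROC is assumed to be encoded as a 32-bit big-endian value, and n_tag is assumed to be a multiple of 8.

```python
import hashlib
import hmac


def srtp_hmac_tag(k_a: bytes, m: bytes, roc: int, n_tag: int = 32) -> bytes:
    """HMAC-SHA1(k_a, M || ROC), truncated to the n_tag left-most bits."""
    mac = hmac.new(k_a, m + roc.to_bytes(4, "big"), hashlib.sha1).digest()
    return mac[: n_tag // 8]


def srtcp_hmac_tag(k_a: bytes, m: bytes, n_tag: int = 32) -> bytes:
    """HMAC-SHA1(k_a, M) for RTCP, truncated to the n_tag left-most bits."""
    return hmac.new(k_a, m, hashlib.sha1).digest()[: n_tag // 8]


def verify_tag(computed: bytes, received: bytes) -> bool:
    """Compare the recomputed tag with the received one in constant time."""
    return hmac.compare_digest(computed, received)
```

A receiver would recompute the tag over the received Authenticated Portion and report "AUTHENTICATION FAILURE" when `verify_tag` returns False.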
4.2.2 TMMH/16
TMMH is a simple function that maps a key and a message to a hash
value. This hash value is encrypted by combining it with the
keystream prefix to make the authentication tag, as described below.
TMMH/16 uses sixteen bit unsigned words as a basic data unit, and
besides the above common parameters we define the following
parameters for convenience:
- MESSAGE_LENGTH is the octet length of M.
- K is the key, i.e. k_a.
- KEY_LENGTH is the octet length of K, i.e. n_a divided by 8.
- TAG is the authentication tag, which is the output of TMMH/16
- TAG_LENGTH is the octet length of the authentication tag, i.e.
n_tag divided by 8. This value defines SRTP_PREFIX_LENGTH to be equal
to TAG_LENGTH.
- PREFIX is the key stream prefix for the current packet as defined
in Section 4.1.
The values of KEY_LENGTH and TAG_LENGTH MUST obey the alignment
restrictions described below.
For TMMH/16, a word is 16 bits (2 octets) long, so TAG_LENGTH and
KEY_LENGTH MUST be even. If MESSAGE_LENGTH is odd, the message MUST
be padded with a zero octet, but this does not change the value of
MESSAGE_LENGTH.
The words of the key are denoted as K[0], K[1], ..., K[KEY_WORDS-1],
and the words of the message (after zero padding, if needed) are
denoted as M[1], M[2], ..., M[MSG_WORDS], where MSG_WORDS is the
smallest number such that 2 * MSG_WORDS is at least MESSAGE_LENGTH,
and KEY_WORDS is KEY_LENGTH / 2.
If MESSAGE_LENGTH is greater than KEY_LENGTH - TAG_LENGTH, then the
value of TMMH/16 is undefined. Implementations MUST indicate an
error if asked to hash a message with such a length. Otherwise,
the hash value is defined to be the sequence of TAG_WORDS words in
which the j-th word is defined as
T[j] = [[ K[j] * MESSAGE_LENGTH +32 K[j+1] * M[1] +32 K[j+2] * M[2]
+32 ... K[j+MSG_WORDS] * M[MSG_WORDS] ] modulo p ] modulo 2^16
where j ranges from zero to TAG_WORDS-1.
Here, TAG_WORDS is equal to TAG_LENGTH/2, and p is equal to
2^16 + 1. The symbol * denotes multiplication and the symbol +32
denotes addition modulo 2^32.
To compute the authentication tag of an SRTP packet, the TMMH hash
value of that message is computed, then that value is combined with
the keystream prefix as defined in Section 4.1. The combining
operation is word-wise addition modulo 2^16 (for TMMH/16).
TAG[j] = T[j] +16 PREFIX[j], where j ranges from zero to TAG_WORDS-1.
Note that for RTP, whereas HMAC is applied to M || ROC, TMMH is
applied to M only. This is because, for TMMH, the dependence on ROC
is already inherent in the PREFIX quantity.
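The TMMH/16 hash and tag computation can be sketched as follows. This is an illustrative Python sketch of the formulas above, with names of our own choosing. The text does not state the octet order of the 16-bit words, so big-endian (network) order is an assumption here, and the key words are indexed from K[0].

```python
P = 2 ** 16 + 1   # the modulus p


def _words(data: bytes) -> list:
    """Split octets into 16-bit words (big-endian order is an assumption)."""
    if len(data) % 2:
        data += b"\x00"   # zero-pad an odd-length message
    return [int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2)]


def tmmh16_hash(key: bytes, message: bytes, tag_length: int) -> list:
    """Return the words T[0..TAG_WORDS-1] of the TMMH/16 hash value."""
    if len(key) % 2 or tag_length % 2:
        raise ValueError("KEY_LENGTH and TAG_LENGTH must be even")
    if len(message) > len(key) - tag_length:
        raise ValueError("MESSAGE_LENGTH exceeds KEY_LENGTH - TAG_LENGTH")
    k, m = _words(key), _words(message)
    t = []
    for j in range(tag_length // 2):
        acc = (k[j] * len(message)) & 0xFFFFFFFF           # K[j] * MESSAGE_LENGTH
        for n, word in enumerate(m, start=1):
            acc = (acc + k[j + n] * word) & 0xFFFFFFFF     # "+32" addition
        t.append((acc % P) % 2 ** 16)
    return t


def tmmh16_tag(key: bytes, message: bytes, prefix: bytes) -> bytes:
    """TAG[j] = T[j] +16 PREFIX[j]; TAG_LENGTH is taken from the prefix."""
    t = tmmh16_hash(key, message, len(prefix))
    p = _words(prefix)
    return b"".join(((tj + pj) & 0xFFFF).to_bytes(2, "big")
                    for tj, pj in zip(t, p))
```

As a small hand-checkable case, with key words 1, 2, 3, the two-octet message 0x0001 and TAG_LENGTH 4, the hash words are 1*2 + 2*1 = 4 and 2*2 + 3*1 = 7.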
4.3 Key Derivation
4.3.1 Key Derivation Algorithm
Regardless of the encryption or authentication transform that is
employed (it may be a defined transform or newly introduced according
to Section 7), SRTP key derivation is the process of generating
session keys, without extra communication between the parties and in
a sender-receiver synchronized way.
packet index ---+
|
|
v
+-----------+ +--------+ session encr_key
| ext | master | |---------->
| key mgmt | key | key |
| (optional |-------->| deriv |---------->
| rekey) | | | session auth_key
+-----------+ +--------+
Figure 4: SRTP key derivation.
At least one initial key derivation is always performed by SRTP.
Further applications of the key derivation MAY be performed,
according to the 'key derivation rate' value in the crypto context.
Let m >= 64, and n be positive integers. A pseudo random function
family is a set of keyed functions {PRF_m^n(k,x)} such that for
(secret) random key k, given m-bit x, PRF_m^n(k,x) is an n-bit
string, computationally indistinguishable from random n-bit strings.
Let a DIV t denote integer division of a by t, rounded down, and with
the convention that a DIV 0 = 0 for all a. We also make the
convention of treating a DIV t as a bit string of the same length as
a, and thus "a DIV t" will in general have leading zeros. Key
derivation is defined as follows. To generate session key(s) for the
current packet, let the n-bit SRTP key for this packet be
PRF_m^n(k_master, <label> || (index DIV key_derivation_rate) ||
0x555...)
where <label> is a 4-bit constant (see below), key_derivation_rate is
as determined in the crypto context, and index is the packet index
(i.e. the 48-bit ROC || SEQ for SRTP). The input is then padded with
the pattern 0x555... (binary 0101...) to fill the m-bit input size.
The session keys are now derived using:
- k_e (SRTP encryption): <label> = 0x0, n = n_e.
- k_a (SRTP authentication): <label> = 0x1, n = n_a.
where n_e and n_a are as determined in the cryptographic context.
Note that for the defined counter mode and f8 transforms, the salting
key k_s is used directly as determined in the cryptographic context
(not going through the derivation).
Note that even for a key_derivation_rate of 0, the initial key
derivation still takes place once. The derivation operation is
facilitated if the non-zero rates are chosen to be powers of 2, or
preferably, powers of 256.
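The construction of the PRF input can be sketched as follows. This is an illustrative Python sketch with names of our own choosing; it uses the 0x555... padding given in the formula above, takes m = 128, and treats the PRF itself as a caller-supplied function prf(k_master, x, n_bits), which is a hypothetical interface and not defined by this document.

```python
def div(a: int, t: int) -> int:
    """a DIV t, rounded down, with the convention a DIV 0 = 0."""
    return a // t if t else 0


def derivation_input(label: int, index: int, rate: int,
                     index_bits: int = 48, m_bits: int = 128) -> int:
    """<label> || (index DIV key_derivation_rate) || 0x555... as an integer."""
    if not 0 <= label < 16 or not 0 <= index < (1 << index_bits):
        raise ValueError("label must fit 4 bits, index must fit index_bits")
    pad_bits = m_bits - 4 - index_bits       # assumed to be a multiple of 4
    pad = int("5" * (pad_bits // 4), 16)     # 0x555...5
    return (label << (m_bits - 4)) | (div(index, rate) << pad_bits) | pad


def derive_session_key(prf, k_master: bytes, label: int, index: int,
                       rate: int, n_bits: int) -> bytes:
    """Apply the caller-supplied PRF to the 128-bit derivation input."""
    x = derivation_input(label, index, rate).to_bytes(16, "big")
    return prf(k_master, x, n_bits)


# Label 0x1 (k_a), a sample 48-bit index, and a rate of 256 (a power of 256).
print(f"{derivation_input(0x1, 0x123456789ABC, 256):032x}")
```

With a rate that is a power of 256, the division simply drops whole octets of the index, which is why such rates are convenient. The `index_bits` parameter is included so that the same sketch can be used with a 32-bit index.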
Note that the previously mentioned limit on key usage to at most 2^48
packets for one given key applies both to the derived session keys
and to the master keys, as key derivation does not increase this
maximum number.
4.3.2 AES-CM PRF
The currently defined PRF is keyed by 128 to 256 bit (master) keys,
has input block size m = 128 and can produce n-bit outputs for
essentially arbitrary n. We define PRF_m^n(k,x) to be AES in counter
mode as described in Section 4.1.1, applied to (master) key k, input
block A = x, and with the output keystream truncated to the n first
(left-most) bits. (Requiring n/128, rounded up, applications of AES.)
4.3.3 SRTCP Key Derivation
SRTCP uses the same master key as SRTP, i.e. it is shared between the
two protocols. To do this securely, the following changes are made to
Section 4.3.1 when applying session key derivation for SRTCP.
Replace the index by the 32-bit quantity: 0 || SRTCP index (i.e.
excluding the E-bit, replacing it with a fixed 0-bit), and use
<label> = 0x2 for the SRTCP encryption key and <label> = 0x3 for the
SRTCP authentication key.
SRTCP SHALL use the same salting key as SRTP.
5. Default and Mandatory Transforms
The "default" transforms are also "mandatory-to-implement" transforms
in SRTP. Of course, "mandatory-to-implement" does not imply
"mandatory-to-use".
5.1 Encryption: AES-CM
AES running in Counter Mode, as defined in Section 4.1.1, is the
default encryption algorithm, which is mandatory-to-implement.
5.2 Authentication/Integrity: HMAC/SHA1
HMAC/SHA1, as defined in Section 4.2.1, is the default and mandatory-
to-implement message authentication code.
5.3 Key Derivation: AES-CM PRF
The AES Counter Mode PRF defined in Sections 4.3.1 and 4.3.2 is the
default and mandatory-to-implement method for generating keys.
6. SRTP Parameters
The SRTP-WINDOW-SIZE is defined to be at least 64 (Section 3.2.3).
The currently defined modes are Segmented Integer Counter Mode
(default), f8-mode (Section 4), and the NULL Cipher. The default
cipher is AES (Section 4), used with a block and encryption key size
of n_b = n_e = 128 bits.
The currently defined authentication functions are HMAC/SHA1 and
TMMH/16. The default is no authentication for RTP (authentication is
mandatory for RTCP). For HMAC/SHA1, the default key size is n_a = 128
bits and the output length is n_tag = 32 bits. SRTP_PREFIX_LENGTH is
therefore 0 by default.
The default size of the master key and salting key shall thus also be
128 bits.
The default value for the key derivation-rate field in the context is
"0", in practice meaning "no key-derivation" (though one (1)
application of it is mandatory, see Section 4.3).
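For implementers, the defaults listed in this section can be collected in one place, for example as below. This is only an illustrative Python sketch: the class and field names are ours, and recording HMAC/SHA1 as the RTCP authentication function is our reading of "authentication is mandatory for RTCP" combined with HMAC/SHA1 being the default authentication function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrtpDefaults:
    """Default parameter values as listed in Section 6 (names are ours)."""
    replay_window_size: int = 64             # SRTP-WINDOW-SIZE, minimum value
    cipher: str = "AES-CM"                   # Segmented Integer Counter Mode
    n_b: int = 128                           # block size in bits
    n_e: int = 128                           # encryption key size in bits
    rtp_authentication: str = "none"         # default: no authentication
    rtcp_authentication: str = "HMAC/SHA1"   # assumption, see lead-in
    n_a: int = 128                           # authentication key size in bits
    n_tag: int = 32                          # authentication tag size in bits
    srtp_prefix_length: int = 0              # octets of keystream prefix
    master_key_bits: int = 128
    salting_key_bits: int = 128
    key_derivation_rate: int = 0             # only the initial derivation


defaults = SrtpDefaults()
print(defaults)
```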
7. Adding SRTP Transforms
Section 4 provides examples of the level of detail needed for
defining transforms. Whenever a new transform is to be added to SRTP,
a companion standards-track RFC MUST be written to define exactly how
the new transform can be used with SRTP (and SRTCP). Such a companion
RFC should avoid overlapping with the SRTP protocol document. Note,
however, that it might be necessary to extend
the cryptographic context's definition with new parameters, or add
steps to the packet processing. The companion RFC shall explain any
known issues regarding interactions between the transform and other
aspects of SRTP.
Encryption and authentication transforms require some set of optional
parameters or have optional modes of operation. The companion RFC
shall select fixed or default values for these parameters (whenever
possible), to reduce key management complexity. The mode of operation
of ciphers and related parameters (e.g. IV-formation for RTP and
RTCP) shall be defined.
Each new transform document should specify its key attributes, e.g.
size of keys (minimum, maximum, recommended), format of keys,
recommended/required processing of input keying material,
requirements/recommendations on re-keying and key derivation, etc.
8. Rationale
8.1 Key derivation
Key derivation has been introduced to lighten the burden on the key
exchange: the four keys necessary to protect the RTP session (SRTP
and SRTCP encryption keys, SRTP and SRTCP authentication keys) are
derived from a single master key in a cryptographically secure way.
The security stands (and falls) with the master key as the derived
session keys are cryptographically independent (under reasonable
assumptions on the PRF, here AES-based).
Subsequent applications of the key derivation are optional but will
give security benefits when enabled. They prevent a cryptanalyst
from obtaining large amounts of ciphertext produced by a single fixed
session key. They provide backwards and forward security in the sense
that a compromised session key does not compromise other session keys
derived from the same master (but of course, a leaked master key
reveals all session keys).
If future encryption transforms are added whose IV is too short to
fit the SEQ+ROC combination, a proper refresh policy will enable
these algorithms to encrypt longer streams without the need for
expensive key management operations.
8.2 Salting key
The salting key has been introduced to protect against some attacks
on additive stream ciphers, see Section 10.1. For simplicity, we
require by default that the salting key have the same size as the
block size of the cipher.
8.3 TMMH: Message Integrity from Universal Hashing
The Truncated Multi-Modular Hash Function (TMMH) is a "universal"
hash function suitable for message authentication in the Wegman-
Carter paradigm [WC81]. It is simple, quick, and especially
appropriate for Digital Signal Processors and other processors with a
fast multiply operation, though a straightforward implementation
requires storage equal in length to the largest message to be hashed.
TMMH offers secure (provably secure under randomness assumptions on
the added prefix) and very efficient MACs. However, as this approach
to message integrity is new (not conceptually, but within
standardization), we have chosen to make HMAC the default transform
as many devices already have an HMAC implementation used for other
purposes. We envision a migration to TMMH so that HMAC may eventually
be phased-out from SRTP.
8.4 Data Origin Authentication Considerations
Note that in unicast and, in general, in keys-per-user scenarios,
integrity and data origin authentication are provided together.
However, in group scenarios where the keys are shared between
members, the MAC tag only proves that a member of the group sent the
packet, but does not prove the actual sender. Data origin
authentication (DOA) for multicast and group RTP sessions is a hard
problem that needs a solution; while some promising proposals are
being investigated [PCST1, PCST2], more work is needed to rigorously
specify these technologies. Thus SRTP data origin authentication in
groups is for further study.
DOA can alternatively be provided using signatures. However, this
has a high cost in terms of bandwidth and processing time; therefore
we do not consider signatures in this discussion.
The presence of mixers and translators does not allow data origin
authentication in case the RTP payload and/or the RTP header are
manipulated. Note that such intermediate entities also disrupt
end-to-end confidentiality (since the IV formation depends, e.g., on
the RTP header being preserved).
9. Key Management Considerations
The SSRC and the random initial sequence number are known to the key
management.
A particular key management system might allow the different RTP
sessions to use identical cryptographic master keys. Note that this
is possible if the design of the synchronization mechanism, i.e. the
IV in the case of the f8-mode, avoids keystream re-use (the two-time
pad, Section 10.2). If this is used, the SSRC MUST be unique per
stream.
A particular key management system might choose to provide re-key by
associating a key for a crypto context with a pair of SEQ+ROC values,
<FirstSEQ+ROC, LastSEQ+ROC>. The key management specification may
require the SRTP implementation to check the SEQ+ROC of an incoming
SRTP packet against the interval for the master key in the context
before using the key. These interactions are defined by the key
management interface to SRTP and are not defined by this protocol
specification.
The key management interface might use the defaults for the SRTP
protocol or define values for any and all SRTP parameters such as the
following:
- cipher and related parameters, including mode of operation
- key(s), i.e. correct master (and salting) key(s), and related
parameters,
- authentication algorithm(s), and related parameters,
- re-keying (key lifetime) and key derivation parameters,
- SSRC, network address, RTP port pair
- Current value of ROC and SEQ (or zeros prior to session
commencement)
- Replay window size
10. Security Considerations
10.1 Key Usage
The effective key size is determined (upper bounded) by the size of
the master key and, for encryption, the size of the salting key.
Any additive stream cipher is vulnerable to attacks that use
statistical knowledge about the plaintext source to enable key
collision and time-memory tradeoff attacks [MF00,H80,Bi96]. These
attacks take advantage of commonalities among plaintexts, and provide
a way for a cryptanalyst to amortize the computational effort of
decryption over many keys, thus reducing the effective key size of
the cipher. A detailed analysis of these attacks and their
applicability to the encryption of Internet traffic is provided in
[MF00]. In summary, the effective key size of SRTP when used in a
security system in which m distinct keys are used, is equal to the
key size of the cipher less the logarithm (base two) of m. Protection
against such attacks can be provided simply by increasing the size of
the keys used, which here can be accomplished by the use of the
"salting key". Note that the salting key MUST be random, but MAY be
public.
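The effective key size rule can be made concrete with a short calculation. This is an illustrative Python sketch; the function name is ours, and treating the salting key simply as additional key size follows the informal argument above rather than a precise bound.

```python
import math


def effective_key_bits(key_bits: int, num_keys: int) -> float:
    """Key size of the cipher less the logarithm (base two) of m."""
    if num_keys < 1:
        raise ValueError("at least one key must be in use")
    return key_bits - math.log2(num_keys)


without_salt = effective_key_bits(128, 2 ** 20)        # 128-bit key only
with_salt = effective_key_bits(128 + 128, 2 ** 20)     # plus 128-bit salt
print(without_salt, with_salt)
```

For example, with 2^20 distinct keys in use, a 128-bit key has an effective size of 108 bits; counting a 128-bit salting key as well raises this to 236 bits.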
Implementations SHOULD use keys that are as large as possible. Please
note that in many cases increasing the key size of a cipher does not
affect the throughput of that cipher.
10.2 SSRC collision and two-time pad
Any fixed keystream output, generated from the same key and index
should only be used to encrypt once. Re-using such keystream
(jokingly called a 'two-time pad' system by cryptographers), can
seriously compromise security. The NSA's VENONA project [C99]
provides a historical example of such a compromise. In SRTP, a 'two-
time pad' is avoided by requiring the key, or some other parameter of
cryptographic significance, to be unique per RTP stream.
It may in some cases be desirable that multiple crypto contexts
contain identical master keys. For instance, there could be a desire
for a group to share a single key. Issues as above (two-time pad)
MUST then be considered. As discussed in Section 9, f8 may allow such
sharing by its use of the SSRC in the IV; however, the effect of a
possible RTP SSRC collision (and its detection) MUST be taken into
account.
Note that sharing a master key between multiple streams in a
multimedia session implies using a distinct SSRC in the IV of AES-f8.
This means that each SSRC MUST be unique among all the RTP streams
within that multimedia session, to avoid colliding IVs that would
result in a two-time pad.
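The danger of a two-time pad can be illustrated in a few lines. This is an illustrative Python sketch; the second plaintext is a made-up example, while the keystream block and the first plaintext are taken from Appendix B.1 and serve only as sample data.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length octet strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# S(0) from Appendix B.1, reused (incorrectly) for two different plaintexts.
keystream = bytes.fromhex("71ef82d70a172660240709c7fbb19d8e")
p1 = b"pseudorandomness"     # first plaintext block from Appendix B.1
p2 = b"attack at dawn!!"     # hypothetical second plaintext

c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)

# The keystream cancels out: an eavesdropper learns p1 XOR p2 directly.
leak = xor_bytes(c1, c2)
print(leak == xor_bytes(p1, p2))
```

Anyone who knows or can guess one of the two plaintexts then recovers the other by a single XOR with `leak`, without ever learning the key.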
10.3 Confidentiality of the RTP Payload
By using 'seekable' stream ciphers, SRTP avoids the denial of service
attacks that are possible on stream ciphers that lack this property
(these attacks are described in Section 3.4 of [B96]). It is
important to be aware that, as with any stream cipher, the exact
length of the payload is revealed by the encryption. This means
that it may be possible to deduce certain "formatting bits" of the
payload, as the length of the codec output might vary due to certain
parameter settings etc. This, in turn, implies that the corresponding
bit of the keystream can be deduced. However, if the stream cipher is
secure (counter mode and f8 are provably secure under certain
assumptions), knowledge of a few bits of the keystream will not aid
an attacker in predicting the following keystream bits. Thus, the
payload length (and information deducible from this) will leak, but
nothing else.
10.4 Confidentiality of the RTP Header
With the described proposal, RTP headers are sent in the clear to
allow for header compression. This means that data such as payload
type, synchronization source identifier, and timestamp are available
to an eavesdropper. Moreover, since RTP allows for future extensions
of headers, we cannot foresee what kind of possibly sensitive
information might also be "leaked".
The described proposal is a low-cost method, which allows header
compression to reduce bandwidth. It is up to the endpoints' policies
to decide which security protocol to employ. If the header
compression is omitted, other solutions might be applicable. In other
words, we provide a solution that works in the most general scenario,
even in the most demanding one (like conversational multimedia over
low-bandwidth, unreliable media). Of course the solution will then
also work in less restricted environments, but we suggest that if one
really needs to protect headers, and is allowed to do so by the
surrounding environment, then one should also look at alternatives,
e.g. IPsec. In addition, we strongly recommend the use of profiles to
select the right trade-off for the required level of security, e.g.
if the headers can be left in cleartext or not.
10.5 Integrity of the RTP packet
Additive ciphers do not provide any security service other than
privacy. In particular, they do not provide message authentication
(see [RK99] or [HAC] for a discussion of this security service).
However, SRTP uses a message authentication code to provide that
security service.
With HMAC being a well-studied authentication scheme, based on a
provably secure construction, the security against MAC forgery
depends on the key-size and the size of the output tags (or for some
attacks, half the size of the tag due to the "birthday-paradox").
The default tag size for HMAC has been fixed to 32 bits. Other sizes
may be defined. The use of a truncated tag is motivated by the fact
that it may be desirable, e.g. in wireless environments, to save
bandwidth. The choice of such a truncation MUST be weighed against
the reduction in security it implies. The default 32-bit size is a
compromise, offering a reasonable level of security, taking into
account the real-time aspects of the protected protocol. High
security applications SHOULD however use larger tags.
The fact that authentication is optional is motivated by the fact
that, while the function is typically highly desired, there are
certain cases (notably in cellular environments) where it has an
impact in terms of cost, e.g. for bandwidth consumption. Also,
independently of the tag length, a single transmission bit error in
the protected part of the packet or in the tag itself forces the
entire packet to be dropped. Given a fixed quality, it implies the
necessity of higher protection of the transmitted unit, hence higher
cost. In those cases, it is up to the user security profile to
request authentication.
10.5.1 Integrity of the RTP header: IHA
The IV formation of the f8-mode gives implicit authentication of the
RTP header, even if no cryptographic integrity protection is present.
This means that modifying bits of the RTP header will cause the
decryption process at the receiver to produce essentially random
garbage.
11. Interaction with Forward Error Correction mechanisms
Some care is needed when Forward Error Correction (FEC) mechanisms
are used, e.g. as specified in RFC 2733. In particular, the order in
which SRTP processing and the error correction processing are applied
is of concern.
The optimal order would be the following:
- on the sender side, first encrypt the packet, then perform the FEC
processing, finally authenticate
- on the receiver side, first authenticate the packet, then perform
the FEC processing, finally decrypt.
The motivations for the above ordering are:
- FEC expands the packet, so performing encryption after FEC would be
more expensive
- on the receiver side, authentication has to be verified before
getting engaged in the FEC processing, to reduce effects of certain
denial of service attacks
- adding redundancy before encrypting, slightly reduces the effective
key-size and resistance to attacks
However, this ordering requires splitting the security processing.
Implementations may benefit from keeping the security processing
together; in that case, the recommendation is that the security
processing takes place after FEC on the sender's side, and before FEC
on the receiver's side. This incurs the cost of performing encryption
after FEC processing, as explained above, so the choice is left to
the application. For interoperability, implementations are requested
to place the security processing after FEC on the sender's side, and
before FEC on the receiver's side. This is also the default behavior;
any other choice has to be agreed out-of-band.
12. IANA Considerations
The RTP specification establishes a registry of profile names for use
by higher-level control protocols, such as the Session Description
Protocol (SDP), to refer to transport methods. This profile registers
the name "RTP/SAVP".
13. Open Issue
It is an open issue whether AES-CM needs to provide a means to
support the use of the same master key for multiple streams. This
feature was supported in the previous drafts by insertion of the SSRC
in the IV (under the constraint of unique SSRC).
The feature is currently supported only by the non-mandatory-to-
implement f8-AES. The reason for raising this question is that there
might be cases where the feature is needed, e.g. when a single master
key is available but there are multiple streams. As an example, it is
likely that such simplistic key management is used in very 'thin'
clients that cannot afford implementing anything but the mandatory
transform. Thus, this may be a restriction in SRTP's applicability in
such devices.
14. Acknowledgements
The authors would like to thank Magnus Westerlund, Brian Weis, Robert
Fairlie-Cuninghame, and Adrian Perrig for their reviews and comments.
15. Authors' Addresses
Questions and comments should be directed to the authors and
avt@ietf.org:
Mark Baugher
Cisco Systems, Inc.
5510 SW Orchid Street Phone: +1 503-245-4543
Portland, OR 97219 USA Email: mbaugher@cisco.com
Rolf Blom
Ericsson Research
SE-16480 Stockholm Phone: +46 8 58531707
Sweden EMail: rolf.blom@era.ericsson.se
Elisabetta Carrara
Ericsson Research
SE-16480 Stockholm Phone: +46 8 50877040
Sweden EMail: elisabetta.carrara@era.ericsson.se
David A. McGrew
Cisco Systems, Inc.
San Jose, CA 95134-1706 Phone: +1 301-349-5815
USA EMail: mcgrew@cisco.com
Mats Naslund
Ericsson Research
SE-16480 Stockholm Phone: +46 8 58533739
Sweden EMail: mats.naslund@era.ericsson.se
Karl Norrman
Ericsson Research
SE-16480 Stockholm Phone: +46 8 4044502
Sweden EMail: karl.norrman@era.ericsson.se
David Oran
Cisco Systems, Inc.
San Jose, CA 95134-1706
USA EMail: oran@cisco.com
16. References
[AES] NIST, "Advanced Encryption Standard (AES)", Draft FIPS,
http://www.nist.gov/aes/
[C99] Crowell, W. P., "Introduction to the VENONA Project",
http://www.nsa.gov:8080/docs/venona/index.html.
[ES3D] ETSI SAGE 3GPP Standard Algorithms Task Force, "Security
Algorithms Group of Experts (SAGE); General Report on the
Design, Specification and Evaluation of 3GPP Standard
Confidentiality and Integrity Algorithms", Public report,
Draft Version 1.0, Dec 1999.
[ES3E] ETSI SAGE 3GPP Standard Algorithms Task Force, "Security
Algorithms Group of Experts (SAGE) Report on the Evaluation of
3GPP Standard Confidentiality and Integrity Algorithms",
Public report, Draft Version 1.0, Dec 1999.
[HAC] Menezes, A., Van Oorschot, P., and Vanstone, S., "Handbook of
Applied Cryptography", CRC Press, 1997, ISBN 0-8493-8523-7.
[HMAC] Krawczyk, H., Bellare, M., and Canetti, R.: "HMAC: Keyed-
hashing for message authentication". IETF RFC 2104, February
1997.
[H80] Hellman, M. E., "A cryptanalytic time-memory trade-off", IEEE
Transactions on Information Theory, July 1980, pp. 401-406.
[MF00] McGrew, D., and Fluhrer, S., "Attacks on Encryption of
Redundant Plaintext and Implications on Internet Security",
the Proceedings of the Seventh Annual Workshop on Selected
Areas in Cryptography (SAC 2000), Springer-Verlag.
[RFC1889] Schulzrinne, H., Casner, S., Frederick, R., Jacobson, V.,
"RTP: A Transport Protocol for Real-Time Applications", IETF
RFC 1889.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", IETF RFC 2119, March 1997.
[RFC2401] Kent, S., and R. Atkinson, "Security Architecture for IP",
IETF RFC 2401, November 1998.
[RFC2675] Borman, D., Deering, S., Hinden, R., "IPv6 Jumbograms",
IETF RFC 2675, August 1999.
[RFC2828] Shirey, R., "Internet Security Glossary", IETF RFC 2828,
May 2000.
[RK99] Rescorla, E., and Korver, B., "Guidelines for Writing RFC
Text on Security Considerations," draft-rescorla-sec-cons-
00.txt
[PCST1] Perrig, A., Canetti, R., Tygar, D., Song, D., "Efficient and
Secure Source Authentication for Multicast", in Proc. of
Network and Distributed System Security Symposium NDSS 2001,
pp. 35-46, 2001.
[PCST2] Perrig, A., Canetti, R., Tygar, D., Song, D., "Efficient
Authentication and Signing of Multicast Streams over Lossy
Channels", in Proc. of IEEE Security and Privacy Symposium
S&P2000, pp. 56-73, 2000.
[WC81] M. N. Wegman and J. L. Carter, "New Hash Functions and Their
Use in Authentication and Set Equality", JCSS 22, 265-279,
1981.
Appendix A: Pseudocode for Index Determination, and ROC and s_l Update
Pseudocode for the algorithm to process a packet with sequence number
SEQ, determining the index i and updating the rollover counter and
sequence number for the last (authenticated) packet, s_l.
   if (s_l < 32,768)
      if (SEQ - s_l > 32,768)
         set i to SEQ + 65,536 * (ROC-1)
      else
         set i to SEQ + 65,536 * ROC
      endif
   else
      if (s_l - 32,768 > SEQ)
         set ROC to ROC + 1
      endif
      set i to SEQ + ROC * 65,536
   endif
   set s_l to SEQ
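For readers who prefer executable code, the pseudocode above can be transcribed into Python as follows. This is a literal transcription, including the unconditional update of s_l at the end; the function name and the choice to return the updated state as a tuple are ours.

```python
def process_sequence_number(seq: int, s_l: int, roc: int):
    """Return (i, roc, s_l) after processing a packet with number seq."""
    if s_l < 32768:
        if seq - s_l > 32768:
            # Packet belongs to the previous rollover period.
            i = seq + 65536 * (roc - 1)
        else:
            i = seq + 65536 * roc
    else:
        if s_l - 32768 > seq:
            # Sequence number has wrapped around.
            roc = roc + 1
        i = seq + roc * 65536
    s_l = seq
    return i, roc, s_l


# SEQ wraps from 65530 to 5: the rollover counter is incremented.
print(process_sequence_number(5, 65530, 0))
```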
Appendix B: Test Vectors
B.1 AES-f8 Test Vectors
All values are in hexadecimal.
SRTP PREFIX LENGTH : 0
RTP packet header : 806e5cba50681de55c621599
RTP packet payload : 70736575646f72616e646f6d6e657373
20697320746865206e65787420626573
74207468696e67
ROC : d462564a
key : 234829008467be186c3de14aae72d62c
salt key : 32f2870d
key-mask (m) : 32f2870d555555555555555555555555
key XOR key-mask : 11baae0dd132eb4d3968b41ffb278379
IV : 006e5cba50681de55c621599d462564a
IV' : 595b699bbd3bc0df26062093c1ad8f73
j : 0
IV' XOR j : 595b699bbd3bc0df26062093c1ad8f73
S(-1) : 00000000000000000000000000000000
S(-1) XOR IV' XOR j : 595b699bbd3bc0df26062093c1ad8f73
S(0) : 71ef82d70a172660240709c7fbb19d8e
plaintext : 70736575646f72616e646f6d6e657373
ciphertext : 019ce7a26e7854014a6366aa95d4eefd
j : 1
IV' XOR j : 595b699bbd3bc0df26062093c1ad8f72
S(0) : 71ef82d70a172660240709c7fbb19d8e
S(0) XOR IV' XOR j : 28b4eb4cb72ce6bf020129543a1c12fc
S(1) : 3abd640a60919fd43bd289a09649b5fc
plaintext : 20697320746865206e65787420626573
ciphertext : 1ad4172a14f9faf455b7f1d4b62bd08f
j : 2
IV' XOR j : 595b699bbd3bc0df26062093c1ad8f70
S(1) : 3abd640a60919fd43bd289a09649b5fc
S(1) XOR IV' XOR j : 63e60d91ddaa5f0b1dd4a93357e43a8c
S(2) : 584d14a591acfca846b3aa3a0ab50fec
plaintext : 74207468696e67
ciphertext : 2c6d60cdf8c29b
B.2 AES-CM Test Vectors
All values are in hexadecimal.
AES-CM Key:
75387824D1F1F3815641B65D78D51EDB96C9781981053CBBCB36927844F1932C
Block Cipher Key: 75387824D1F1F3815641B65D78D51EDB
Salting key: 96C9781981053CBBCB36927844F1932C
Packet Index: 12345678
Counter Keystream
96C9781981053CBBCB36A4AC9B69932C EA0AA027BA6D56E44B28F43A7E3E5F58
96C9781981053CBBCB36A4AC9B69932D CBDB3107EDA8D420D3EF7AB7FF290166
96C9781981053CBBCB36A4AC9B69932E AED6F7CB14ED49174336CC010AEB8780
96C9781981053CBBCB36A4AC9B69932F 4C3A754AF027A5C8CCB40E0FE20AF246
96C9781981053CBBCB36A4AC9B699330 01A6D1CE983EF993E980CC9568587E3D
Keystream Segment (final output)
EA0AA027BA6D56E44B28F43A7E3E5F58CBDB3107EDA8D420D3EF7AB7FF290166
AED6F7CB14ED49174336CC010AEB87804C3A754AF027A5C8CCB40E0FE20AF246
01A6D1CE983EF993E980CC9568587E3D...
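The Counter column above appears consistent with adding the packet
index, shifted left by 16 bits, and the block number to the salting
key, modulo 2^128. That reading is inferred from the listed values;
the normative definition is the one given in the body of this
document. The following Python sketch reproduces the five counter
values under that assumption. The function name counter_block is
hypothetical, and the keystream column is not recomputed because that
would require an AES implementation.

```python
SALTING_KEY = int("96C9781981053CBBCB36927844F1932C", 16)
PACKET_INDEX = 0x12345678


def counter_block(salting_key, packet_index, j):
    """Counter value for keystream block j (assumed construction)."""
    return (salting_key + (packet_index << 16) + j) % (1 << 128)


# Counter values for the first five keystream blocks, as upper-case hex.
counters = [
    format(counter_block(SALTING_KEY, PACKET_INDEX, j), "032X")
    for j in range(5)
]
```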
B.3 TMMH/16 Test Vectors
This section provides test vectors that can be used to test an
implementation of TMMH/16. The key, message, and outputs are
expressed as octet sequences, with each octet in hexadecimal.
KEY_LENGTH: 10
TAG_LENGTH: 2
key: { 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc }
message: { 0xca, 0xfe, 0xba, 0xbe, 0xba, 0xde }
output: { 0x9d, 0x6a }
KEY_LENGTH: 10
TAG_LENGTH: 2
key: { 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc }
message: { 0xca, 0xfe, 0xba }
output: { 0xc8, 0x8e }
KEY_LENGTH: 10
TAG_LENGTH: 4
key: { 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc }
message: { 0xca, 0xfe, 0xba, 0xbe, 0xba, 0xde }
output: { 0x9d, 0x6a, 0xc0, 0xd3 }
This Internet-Draft expires in April 2002.