Internet Engineering Task Force
AVT Working Group                                    Baugher, McGrew,
INTERNET-DRAFT                                           Oran (Cisco)
Expires: April 2002                           Blom, Carrara, Naslund,
                                                   Norrman (Ericsson)
                                                        November 2001


                The Secure Real Time Transport Protocol
                     <draft-ietf-avt-srtp-02.txt>

Status of this memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/lid-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Abstract

This document describes the Secure Real Time Transport Protocol (SRTP), a profile of the Real Time Transport Protocol (RTP), which can provide confidentiality, message authentication, and replay protection.

SRTP can achieve high throughput and low packet expansion. SRTP proves to be a suitable protection for heterogeneous environments, i.e. environments including both wired and wireless links. To get such features, default transforms are described, based on an additive stream cipher for encryption, a keyed-hash based function for message

Baugher, et al.                                                [Page 1]
INTERNET-DRAFT                    SRTP                    November, 2001

authentication, and an 'implicit' index for sequencing based on the RTP sequence number.

TABLE OF CONTENTS

1. Notational Conventions
2. Goals
3. SRTP Framework
3.1 SRTP Cryptographic Contexts
3.1.1 Transform-independent parameters
3.1.2 Transform-dependent parameters
3.1.3 Mapping SRTP Packets to Cryptographic Contexts
3.2 SRTP Packet Processing
3.2.1 Packet Index Determination
3.2.2 Cryptographic Transforms
3.2.3 Replay Protection
3.3 Secure RTCP
4. Pre-Defined Transforms
4.1 Encryption
4.1.1 AES in Counter Mode
4.1.1.1 Keystream generation
4.1.2 AES in f8-Mode
4.1.2.1 Keystream Generation
4.1.2.2 SRTP IV Formation
4.1.2.3 SRTCP IV Formation
4.1.3 NULL Cipher
4.2 Message Authentication and Integrity
4.2.1. HMAC/SHA1
4.2.2 TMMH/16
4.3 Key Derivation
4.3.1 Key Derivation Algorithm
4.3.2 AES-CM PRF
4.3.3 SRTCP Key Derivation
5. Default and Mandatory Transforms
5.1 Encryption: AES-CM
5.2 Authentication/Integrity: HMAC/SHA1
5.3 Key Derivation: AES-CM PRF
6. SRTP Parameters
7. Adding SRTP Transforms
8. Rationale
8.1 Key derivation
8.2 Salting key
8.3 TMMH - Message Integrity from Universal Hashing
8.4 Data Origin Authentication considerations
9. Key Management Considerations
10. Security Considerations
10.1 Key Usage
10.2 SSRC collision and two-time pad
10.3 Confidentiality of the RTP Payload
10.4 Confidentiality of the RTP Header
10.5 Integrity of the RTP packet
10.5.1 Integrity of the RTP header: IHA
11. Interaction with Forward Error Correction mechanisms
12. IANA Considerations
13. Open issue
14. Acknowledgements
15. Author's Addresses
16. References
Appendix A: Pseudocode for Index Determination, and ROC and s_l Update
Appendix B: Test Vectors
B.1 AES-f8 Test Vectors
B.2 AES-CM Test Vectors
B.3 TMMH/16 Test Vectors

1. Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Terminology conforms to [RFC2828].

By convention, the leftmost bit (byte) is the most significant one. By XOR we mean bitwise addition modulo 2 of binary strings, and || denotes concatenation. E.g. if C = A || B, then the most significant bits of C are the bits of A, and the least significant bits of C equal the bits of B. Hexadecimal numbers are prefixed by 0x.

At the time of writing, NIST has not published the Advanced Encryption Standard, AES [AES]. However, as it is clear that AES will be the Rijndael algorithm as specified in [AES], we shall throughout this document let AES denote the block cipher Rijndael.

2. Goals

The security goals for SRTP are to ensure:

* the confidentiality of the RTP payload, and
* the integrity protection of the entire RTP packet, together with protection against replayed RTP packets.

Each of these security services is optional and independent.

Other, functional, goals for the protocol are:

* a framework that permits upgrade to new cryptographic transforms,
* low bandwidth cost, i.e. a framework preserving RTP header compression efficiency,

and, asserted by the pre-defined transforms:

* a low computational cost,
* a small footprint (i.e. small code size and data memory for keying information and replay lists),
* limited packet expansion to support the bandwidth economy goal,
* independence from the underlying transport, network, and physical layer used by RTP, in particular high tolerance to packet loss and re-ordering, and robustness to transmission bit-errors.

The described security services are also provided for RTCP, the control protocol defined for RTP [RFC1889], with the exception that integrity and replay protection for the RTCP packets are mandatory when SRTP services are applied to the RTP packets of the corresponding session.

These properties ensure that SRTP is a suitable protection scheme for RTP in both wired and wireless scenarios.

3. SRTP Framework

RTP is the Real Time Transport Protocol [RFC1889]. We define SRTP as a profile of RTP, in a way analogous to RFC1890 which defines the audio/video profile for RTP.

Conceptually, we consider a 'bump in the stack' implementation which resides between the RTP application and the transport layer, which intercepts RTP packets and then forwards an equivalent SRTP packet on the sending side, and which intercepts SRTP packets and passes an equivalent RTP packet up the stack on the receiving side.
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |                           timestamp                           |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |           synchronization source (SSRC) identifier            |
|   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|   |            contributing source (CSRC) identifiers             |
|   |                             ....                              |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |                   RTP extension (optional)                    |
| +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                                                               |
| | |                            payload                            |
| | |                             ....                              |
+>+>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                 authentication tag (optional)                 |
| | |                                                               |
| | |                             ....                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| +- Encrypted Portion
+---- Authenticated Portion

Figure 1. The format of an SRTP packet.

The format of an SRTP packet is illustrated in Figure 1. The optional authentication tag is the only field defined by SRTP that is not in RTP. The added field is:

Authentication tag: variable length, optional
   The authentication tag shall be used to carry authentication data.

The Authenticated Portion of an SRTP packet consists of the entire equivalent RTP packet. Note that, if encryption and authentication are applied, then 'payload' in the Authenticated Portion refers to the corresponding encrypted payload. The authentication tag provides authentication of the RTP header and payload, and it indirectly provides replay protection by authenticating the sequence number.
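The layout of Figure 1 can be illustrated with a short sketch that splits a received SRTP packet into its Authenticated Portion, its Encrypted Portion (the payload), and the authentication tag. This is only an illustration, not part of the specification: the function name split_srtp_packet and the tag_length argument (the tag length in octets, taken from the cryptographic context) are assumptions, and the header-length arithmetic follows the fixed RTP header of [RFC1889] (12 octets, plus 4 octets per CSRC, plus an optional header extension).

```python
def split_srtp_packet(packet: bytes, tag_length: int = 0):
    """Return (authenticated_portion, encrypted_portion, tag).

    Hypothetical helper: tag_length is the authentication tag length
    in octets (0 when authentication is not provided).
    """
    if len(packet) < 12:
        raise ValueError("shorter than the fixed 12-octet RTP header")

    csrc_count = packet[0] & 0x0F           # CC field, low 4 bits
    has_extension = bool(packet[0] & 0x10)  # X bit
    header_length = 12 + 4 * csrc_count

    if has_extension:
        if len(packet) < header_length + 4:
            raise ValueError("truncated RTP header extension")
        # The extension length counts 32-bit words after the 4-octet
        # extension header.
        words = int.from_bytes(
            packet[header_length + 2:header_length + 4], "big")
        header_length += 4 + 4 * words

    tag_start = len(packet) - tag_length
    if tag_start < header_length:
        raise ValueError("packet too short for header plus tag")

    authenticated = packet[:tag_start]             # header and payload
    encrypted = packet[header_length:tag_start]    # payload only
    tag = packet[tag_start:]
    return authenticated, encrypted, tag
```

Any RTP padding (P bit) is left inside the payload here, since it belongs to the equivalent RTP packet.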
The Encrypted Portion of an SRTP packet consists of the RTP payload of the equivalent RTP packet.

3.1 SRTP Cryptographic Contexts

Each SRTP session requires the sender and receiver to maintain cryptographic state information. This information is called the cryptographic context.

By a session key, we mean a key that is to enter a cryptographic transform (e.g. encryption or authentication), and a master key is a random bit string (given by the key management protocol) from which session keys are derived in a cryptographically secure way.

3.1.1 Transform-independent parameters

The transform-independent parameters of the cryptographic context consist of:

* a 32-bit rollover counter, ROC, which records how many times the 16-bit RTP sequence number has been reset to zero after passing through 65,535. Unlike the sequence number, SEQ, which SRTP extracts from the RTP packet header, the ROC is maintained by SRTP. This ROC is thus a parameter internal to SRTP.
* for the receiver only, a sequence number s_l, which is the last received sequence number (possibly authenticated, if authentication is provided). Here, 'sequence number' refers to the 16-bit SEQ carried in the RTP packet header.
* identifier for the encryption algorithm, i.e. the cipher and its mode of operation, and related parameters,
* identifier for the authentication protection algorithm, and related parameters (when authentication is provided),
* a replay list L, maintained by the receiver only (when authentication is provided),
* integers n_e and n_a, determining the length of the session keys for encryption and authentication,
* the master key(s),
* a 16-bit integer, the session key derivation-rate,
* FirstSEQ+ROC and LastSEQ+ROC as key lifetime for each of the master keys (FirstSEQ and LastSEQ are the RTP sequence numbers inside whose
range the master key is valid, and ROC is the rollover counter). These values are absolute quantities, not relative.

3.1.2 Transform-dependent parameters

Any encryption, authentication/integrity, and key derivation parameters that depend on the transform definitions are defined in the Transforms section. Future SRTP transform specifications MUST include a section to list the cryptographic context's parameters for that transform.

3.1.3 Mapping SRTP Packets to Cryptographic Contexts

Recall that an RTP session for each participant is defined [RFC1889] by a pair of destination transport addresses (one network address plus a port pair for RTP and RTCP), and that a multimedia session is defined as a collection of RTP sessions. For example, a particular multimedia session could include an audio RTP session, a video RTP session, and a text RTP session.

A cryptographic context shall be uniquely identified by the triplet context identifier:

   <SSRC, destination network address, destination transport port number>

where the destination network address and the destination transport port are the ones in the current packet. It is assumed that, when presented with this information, the key management returns a context with the information as described in Section 3.1.

3.2 SRTP Packet Processing

To construct a proper SRTP packet, given an RTP packet, the sender does the following:

1. Determine which cryptographic context to use as described in Section 3.1.3.

2. Determine the index of the SRTP packet as described in Section 3.2.1, using the rollover counter in the cryptographic context and the sequence number in the RTP packet.

3. Determine the session keys, as described in Section 4.3.

4. Encrypt the Encrypted Portion of the packet (see Section 4, for the defined ciphers), using the encryption keys found in Step 3.
5. If authentication is provided, compute the authentication tag for the Authenticated Portion of the packet, as described in Section 4, using the index determined in Step 2 and the authentication key found in Step 3. Note that the Encrypted Portion is encrypted before the authentication tag is computed.

To authenticate and decrypt an SRTP packet, the receiver does the following:

1. Determine which cryptographic context to use as described in Section 3.1.3.

2. Estimate the index of the SRTP packet from the rollover counter in the cryptographic context and the sequence number in the RTP packet, as described in Section 3.2.1.

3. Determine the session keys, as described in Section 4.3.

4. If authentication is provided, check if the packet has been replayed, by checking the Replay List to ensure that no packet with that index has been received and authenticated before. If that index is in the list, then the packet has been replayed and is invalid. It MUST be discarded, and the event SHOULD be logged. Next, perform verification of the authentication tag, using the authentication key and packet index from Step 2. If the result is 'AUTHENTICATION FAILURE' (see Section 4), the packet MUST be discarded from further processing and the event SHOULD be logged.

5. Decrypt the Encrypted Portion of the packet (see Section 4, for the defined ciphers), using the decryption keys found in Step 3.

6. Update the rollover counter and last sequence number, s_l, in the local context to the values used in the packet index estimated in Step 2.

3.2.1 Packet Index Determination

SRTP implementations use an 'implicit' packet index for sequencing. When the session starts, the sender side shall set the rollover counter, ROC, to zero. Each time the RTP sequence number, SEQ, wraps modulo 2^16, the sender side shall increment ROC by one. The sender's packet index is then defined as i = 65,536 * ROC + SEQ.
Receiver-side implementations use the RTP sequence number to reconstruct the correct index (that is, location in the sequence of all RTP packets). Also here, the index is defined as SEQ + ROC * 65,536, where the RTP sequence number is SEQ and the rollover
counter is ROC, maintained locally by the receiver as described below.

A robust approach for the proper use of a rollover counter requires its handling and use to be well defined. In particular, out-of-order RTP packets with sequence numbers close to 65,536 or zero must be properly dealt with.

A receiver reconstructs the index i of a packet with sequence number SEQ using the estimate i = 65,536 * v + SEQ, where v is chosen from the set { ROC-1, ROC, ROC+1 } such that i is closest to the value 65,536 * ROC + s_l. If the value ROC+1 is used, then the rollover counter ROC in the cryptographic context is incremented by one (see Appendix A).

The index i is used in replay protection (Section 3.2.3), encryption and authentication (Section 4), and for the key derivation (Section 4.3).

As the rollover counter is 32 bits long, the maximum number of packets in any given SRTP session is 2^48 = 281,474,976,710,656. After that number of SRTP packets have been sent with a given key, the sender MUST NOT send any more packets with that key.

This limitation enforces a security benefit by providing an upper bound on the amount of traffic that can pass before cryptographic keys are changed. Re-keying (see Section 9) MUST be triggered no later than after this amount of traffic, and MAY be triggered earlier, e.g. for increased security and access control to media. Re-occurring key derivation, as determined by a non-zero derivation rate (see Section 4.3), gives even stronger security benefits, but does NOT change the above absolute maximum value.

For the receiver, the 'implicit index' approach works as long as the reordering and loss of packets are not too great. In particular, 32,768 packets would need to be lost, or a packet would need to be 32,768 packets out of sequence, in order for synchronization to be lost. Such drastic loss or reorder is likely to disrupt the RTP application itself.
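The receiver-side estimate described in this section can be sketched as follows. This is a minimal illustration of the "closest candidate" rule only, not the pseudocode of Appendix A. The function name estimate_index is an assumption, as is the tie-breaking choice (a candidate exactly 32,768 away on either side resolves in favor of the current ROC) and the exclusion of candidates outside the 32-bit ROC range.

```python
def estimate_index(seq: int, roc: int, s_l: int):
    """Estimate the packet index i = 65,536 * v + SEQ.

    seq  -- 16-bit sequence number from the received RTP header
    roc  -- rollover counter currently held in the context
    s_l  -- last received sequence number held in the context
    Returns (i, v), where v is the rollover value that was chosen.
    """
    reference = 65536 * roc + s_l

    # Candidate rollover values, current ROC first so that an exact
    # tie is resolved in its favor (an assumption of this sketch).
    candidates = [v for v in (roc, roc - 1, roc + 1) if 0 <= v < 2**32]

    v = min(candidates, key=lambda c: abs(65536 * c + seq - reference))
    return 65536 * v + seq, v
```

For example, with ROC = 0 and s_l = 65,530, a packet carrying SEQ = 2 is taken to lie just after the wrap (v = 1), while with ROC = 1 and s_l = 3, a late packet carrying SEQ = 65,534 is mapped back before the wrap (v = 0).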
3.2.2 Cryptographic Transforms

While there are numerous encryption and message authentication algorithms that can be used in SRTP, we define (Section 4) default algorithms in order to avoid the complexity of specifying the encodings for the signaling of algorithm and parameter identifiers. The defined algorithms have been chosen as they fulfil the goals
listed in Section 2. Recommendations on how to extend SRTP with new transforms are given in Section 7.

3.2.3 Replay Protection

Robust replay protection is possible when authentication of RTP packets is present. A packet is 'replayed' when it is stored by an adversary, and then re-injected into the network. SRTP provides protection against such attacks whenever authentication is provided, through the storage of the indices of the most recently received and authenticated packets.

Each SRTP receiver maintains a Replay List, which conceptually contains the indices of all of the packets which have been received and authenticated. In practice, the list can use a 'sliding window' approach, so that a fixed amount of storage suffices for replay protection. Packet indices which lag behind the packet index in the context by more than SRTP-WINDOW-SIZE can be assumed to have been received, where SRTP-WINDOW-SIZE is a parameter that MUST be at least 64, and which MAY be set to a higher value.

The Replay List can be efficiently implemented by using a bitmap to represent which packets have been received, as described in the Security Architecture for IP [RFC2401].

Note that there are no provisions for managing transmitted Sequence Number values among multiple senders using the same crypto contexts, thus the anti-replay service SHOULD NOT be used in a multi-sender environment that employs a single crypto context.

3.3 Secure RTCP

Secure RTCP follows the definition of Secure RTP. SRTCP is defined as a profile of RTCP, and it adds two mandatory new fields to the RTCP packet definition, the SRTCP index and the authentication tag. Those fields are appended to an RTCP packet in order to form an equivalent SRTCP packet, so that they follow any other profile specific extensions. An SRTCP packet is illustrated in Figure 2.
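As an aside on the Replay List of Section 3.2.3, the 'sliding window' bitmap mentioned there can be sketched as below. This is an illustrative sketch under stated assumptions, not a normative algorithm: the class name ReplayWindow and its methods are hypothetical, check() is meant to be called before tag verification and update() only after the packet has been authenticated, and an index lagging by exactly the window size is conservatively treated as already received.

```python
class ReplayWindow:
    """Sliding-window Replay List kept as an integer bitmap."""

    def __init__(self, size: int = 64):
        if size < 64:
            raise ValueError("SRTP-WINDOW-SIZE must be at least 64")
        self.size = size
        self.top = -1     # highest authenticated index so far
        self.bitmap = 0   # bit k set => index (top - k) was received

    def check(self, index: int) -> bool:
        """Return True if the index is acceptable (not a replay)."""
        if index > self.top:
            return True
        offset = self.top - index
        if offset >= self.size:
            return False  # too old: assumed to have been received
        return not (self.bitmap >> offset) & 1

    def update(self, index: int) -> None:
        """Record an index whose packet has been authenticated."""
        if index > self.top:
            shift = index - self.top
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.top = index
        else:
            offset = self.top - index
            if offset < self.size:
                self.bitmap |= 1 << offset
```

The same structure can serve SRTCP, with a separate instance per context, since SRTCP keeps its own replay list.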
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |V=2|P|    RC   |   PT=SR=200   |             length            |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |                        SSRC of sender                         |
| +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                              ...                              |
| | |                          sender info                          |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                              ...                              |
| | |                        report block 1                         |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                              ...                              |
| | |                        report block 2                         |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                                                               |
| | |                              ...                              |
| | |                                                               |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |V=2|P|    SC   |  PT=SDES=202  |             length            |
| | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                          SSRC/CSRC_1                          |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                          SDES items                           |
| | |                              ...                              |
| | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                                                               |
| | |                              ...                              |
| | |                                                               |
| +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |E|                         SRTCP index                         |
+-|>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                              ...                              |
| | |                     authentication field                      |
| | |                                                               |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| +-- Encrypted Portion (optional)
+---- Authenticated Portion (mandatory when SRTP is used for RTP session)

Figure 2. The format of a Secure RTCP packet, consisting of underlying RTCP compound packet with Sender Report and SDES packet.
The added fields are:

E bit and SRTCP index: 32 bits, mandatory
   The SRTCP index is a 31-bit counter for the SRTCP packets. The index is explicitly included in each packet, in contrast to the 'implicit' index approach used for SRTP. As Section 9.1 of [RFC1889] allows the split of a compound RTCP packet into two lower-layer packets, one to be encrypted and one to be sent in the clear, indices with their most significant bit (E bit) set to '1' are reserved for encrypted packets, and indices with most significant bit set to '0' are used for non-encrypted packets. With this restriction, the remaining 31 bits are set to zero before the first SRTCP packet is sent, and are incremented by one after each SRTCP packet is sent. Except for differences in the most significant (E) bit, indices form a strictly increasing sequence.

Authentication Tag: variable length, mandatory
   The authentication tag shall be used to carry message authentication data.

The Authenticated Portion of an SRTCP packet consists of the entire equivalent (possibly compound) RTCP packet and the SRTCP index.

The Encrypted Portion of an SRTCP packet consists of the RTCP payload of the equivalent compound RTCP packet, from the first RTCP packet, i.e. from the ninth (9) byte to the end of the compound packet.

SRTCP packet processing is identical to that of SRTP packet processing, with the following changes:

* SRTCP replay protection is as defined in Section 3.2.3, but using the SRTCP index as the index i and maintaining separate values for s_l and the replay list specific to SRTCP. SRTCP replay protection is mandatory.

* SRTCP encryption is as defined in Section 4, but using the definition of the SRTCP Encrypted Portion as defined in this section, using the SRTCP index as the index i.
The encryption transforms shall be the same as those selected for the protection of the associated SRTP stream(s) (when RTP is encrypted too), while the NULL algorithm shall be applied to the RTCP packets to be authenticated but not encrypted.

* The SRTCP authentication tag is defined as in Section 4, but with the Authenticated Portion of the SRTCP packet defined in this section, and using the SRTCP index as the index i. SRTCP authentication is mandatory. The authentication transforms and related parameters (e.g., key size) shall be the same as those selected for the protection of the associated SRTP stream(s) (when SRTP is authenticated too).
* SRTCP decryption is performed as in Section 4, but only if the SRTCP index has its most significant bit (E bit) equal to 1. If so, the encrypted portion is decrypted, using the SRTCP index as the index i. In case the most significant bit of the index is 0, the payload is simply copied.

There MAY also exist some minor transform-specific changes; see Section 4 for the defined transforms.

The encryption prefix (Section 6.1 of [RFC1889]), a random 32-bit quantity intended to improve privacy, MUST NOT be used. This is because we strongly recommend ciphers secure against known plaintext attacks. The pre-defined SRTP encryption uses a secure, additive stream cipher, and thus the prefix offers no benefit at all.

The maximum number of SRTCP packets with a fixed key is limited to 2^31 = 2,147,483,648.

Authentication MUST be applied to RTCP, as it is the control protocol (e.g. it has a BYE packet). Note however, that the cost of RTCP authentication is not of the same order as that of RTP authentication, as the recommended share of the session bandwidth allocated to RTCP is 5% and RTCP packets are sent less frequently. However, when adding authentication to RTCP, the overhead in bandwidth SHOULD be considered (it will be more than 5%).

4. Pre-Defined Transforms

4.1 Encryption

Generic parameters, common to all pre-defined, non-NULL, encryption transforms:

* BLOCK CIPHER is the block cipher used
* n_b is the bit-size of the block for the block cipher
* k_e is the session encrypting key
* n_e is the length of k_e (the default is 128 bits)
* k_s is the so called salting key
* n_s is the length of the salting key. The default value is equal to n_b. Another (shorter) value MUST be explicitly signaled.
* SRTP_PREFIX_LENGTH is the octet length of the keystream prefix, an (at least) 8-bit integer, inferred from the message authentication code in use.

The session key is by default derived as specified in Section 4.3.
The salting key is obtained directly from the cryptographic context.

The encryption transforms defined in SRTP use a "seekable" segmented keystream generator, which for each secret key maps the RTP packet
index into a pseudorandom keystream segment, used to encrypt a single RTP packet (with that packet index). The process of encrypting a packet consists of generating the keystream segment corresponding to the packet, and then bitwise exclusive-oring that keystream segment onto the Encrypted Portion of the RTP packet. Decryption is done the same way, but swapping the roles of the plaintext and ciphertext.

The definition of how the keystream is generated, given the index, depends on the cipher and its mode of operation. Below, two such keystream generators are defined. The NULL cipher is also defined, to be used when encryption of RTP is not required.

The initial octets of each keystream segment MAY be reserved for use in a message authentication code, in which case the keystream used for encryption starts immediately after the last reserved octet. The initial reserved octets are called the keystream prefix, and the remaining octets are called the keystream suffix. This process is illustrated in Figure 3.

+----+   +------------------+---------------------------------+
| KG |-->| Keystream Prefix |        Keystream Suffix         |---+
+----+   +------------------+---------------------------------+   |
                                                                  |
                            +---------------------------------+   v
                            | Encrypted Portion of RTP Packet |->(*)
                            +---------------------------------+   |
                                                                  |
                            +---------------------------------+   |
                            | Encrypted Portion of SRTP Packet|<--+
                            +---------------------------------+

Figure 3: SRTP Encryption. Here KG denotes the keystream generator, and (*) denotes bitwise exclusive-or.

The number of octets in the keystream prefix is denoted as SRTP_PREFIX_LENGTH. The keystream prefix is reserved for use with certain message authentication transforms, indicated by a positive value of this latter parameter. This means that even if confidentiality is not to be provided, the keystream generator output MAY still need to be computed, in which case the default keystream generator SHALL be used.
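The prefix/suffix handling of Figure 3 can be sketched independently of the cipher. In the following illustration the keystream segment is simply passed in as bytes, so no particular keystream generator is implied; the function name apply_keystream and its arguments are assumptions made for this example.

```python
def apply_keystream(segment: bytes, portion: bytes,
                    prefix_length: int = 0):
    """Split a keystream segment and XOR its suffix onto a portion.

    segment        -- keystream segment for one packet index
    portion        -- Encrypted Portion (plaintext or ciphertext)
    prefix_length  -- SRTP_PREFIX_LENGTH in octets
    Returns (keystream_prefix, transformed_portion).
    """
    prefix = segment[:prefix_length]   # reserved for authentication
    suffix = segment[prefix_length:]   # used for encryption
    if len(suffix) < len(portion):
        raise ValueError("keystream segment too short for this packet")

    # Additive stream cipher: the same XOR encrypts and decrypts.
    transformed = bytes(p ^ k for p, k in zip(portion, suffix))
    return prefix, transformed
```

Because encryption and decryption are the same operation, applying the function twice with the same segment returns the original Encrypted Portion.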
The default cipher is the Advanced Encryption Standard (AES), and we define two modes of running AES, Segmented Integer Counter Mode AES and AES in f8-mode. In the sequel, let E(k,x) be AES applied to key k and input block x.
4.1.1 AES in Counter Mode

The default keystream generator cipher SHALL be AES [AES] used in the Segmented Integer Counter Mode, with an n_e = 128-bit key size and an n_b = 128-bit block size.

Conceptually, counter mode consists of encrypting successive integers. The actual definition is somewhat more complicated, in order to randomize the starting point of the integer sequence. Each packet is encrypted with a distinct keystream segment, which is computed as follows.

4.1.1.1 Keystream generation

A keystream segment is the concatenation of the 128-bit output blocks of the AES cipher in the encrypt direction, using key k = k_e, in which the block indices are in increasing order. Symbolically, each keystream segment looks like

   E(k,A) || E(k,A + 1 mod 2^128) || E(k,A + 2 mod 2^128) ...

The 128-bit integer value A is defined as 2^16 times the packet index, i, plus k_s (the salting key), modulo 2^128:

   A = (k_s + (i * 2^16)) modulo 2^128.

Note that the initial value A is fixed for each packet. The number of blocks of keystream generated for any fixed value of A MUST NOT exceed 2^16. The AES has a block size of 128 bits, so 2^16 output blocks are sufficient to generate the 2^23 bits of keystream needed to encrypt the largest possible RTP packet (actually, except for IPv6 'jumbograms' [RFC2675], which are not likely to be used for RTP-based multimedia traffic). This restriction on the maximum number of RTP packets ensures the security of the encryption method by limiting the effectiveness of probabilistic attacks [BR98].

4.1.2 AES in f8-mode

To encrypt UMTS (Universal Mobile Telecommunications System, i.e. 3G networks) data, a solution (see [ES3D]) known as the f8-algorithm has been developed. On a high level, the proposed scheme is a variant of Output Feedback Mode (OFB) [HAC], with a more elaborate initialization and feedback function. As in normal OFB, the core
consists of a block cipher. Here, too, we define the use of AES as the default block cipher to be used in f8-mode for RTP encryption, with 128-bit key and block size.

Figure 4 shows the structure of a block cipher, E, running in what we shall call "f8-mode of operation".

               IV
                |
                |
                v
             +------+
             |      |
        +--->|  E   |
        |    |      |
        |    +------+
        |       |
  m --> *       +-----------+-------------+--   ...   ------+
        |   IV' |           |             |                 |
        |       |   j=1 --> *     j=2 --> * ...   j=L-1 --> *
        |       |           |             |                 |
        |       |      +--> *        +--> *    ...     +--> *
        |       |      |    |        |    |            |    |
        |       v      |    v        |    v            |    v
        |    +------+  | +------+    | +------+        | +------+
        |    |      |  | |      |    | |      |        | |      |
 k_e ---+--->|  E   |  | |  E   |    | |  E   |        | |  E   |
             |      |  | |      |    | |      |        | |      |
             +------+  | +------+    | +------+        | +------+
                |      |    |        |    |            |    |
                +------+    +--------+    +--  ... ----+    |
                |           |             |                 |
                v           v             v                 v
               S(0)        S(1)          S(2)   . . .    S(L-1)

Figure 4. f8-mode of operation (asterisk, *, denotes bitwise XOR).

4.1.2.1 Keystream Generation

As above, let E(k_e,x) be the 128-bit output of AES in the encrypt direction when applied to the n_e = 128-bit key k_e and n_b = 128-bit plaintext block x. The Initialization Vector (IV) is determined as described in Section 4.1.2.2.

Let IV', S(j), and m denote n_b-bit blocks, determined below. The keystream, S(0) || ... || S(L-1), for an N-bit message is defined by setting IV' = E(k_e XOR m, IV), and S(-1) = 00..0. For j = 0,1,.., L-1 where L = N/n_b (rounded up to nearest integer) compute
   S(j) = E(k_e, IV' XOR j XOR S(j-1))

Notice that the IV is not used directly. Instead it is fed through E
under another key to produce an internal, "masked" value (denoted
IV') to prevent an attacker from gaining known input/output pairs.

The role of the internal counter is to prevent short keystream
cycles. The value of the key mask m is defined to be

   m = k_s || 0x555..5,

i.e. the salting key, followed by the binary pattern 0101.. to fill
the entire desired key size, n_e.

The maximum allowable packet size can be determined as follows. The
AES has a block size of 128 bits, and assuming that AES behaves like
a random function, it is (heuristically) secure to generate about
2^64 output blocks, which is sufficient to generate 2^71 bits of
keystream. For practical sizes of RTP packets, far fewer blocks are
required, and the counter j above will often be sufficient if
implemented as a 16- or 32-bit counter.

4.1.2.2 SRTP IV Formation

The purpose of the following IV formation is to provide a feature
which we call implicit header authentication (IHA), see Section
10.5.1.

The IV for 128-bit block AES-f8 is formed in the following way:

   IV = 0x00 || M || PT || SEQ || TS || SSRC || ROC

M, PT, SEQ, TS, SSRC are taken from the RTP header; ROC is from the
crypto context.

The presence of the SSRC as part of the IV allows AES-f8 to be used
when a master key is shared between multiple streams, see Section
10.2.

4.1.2.3 SRTCP IV Formation

The IV for 128-bit block AES-f8 is formed in the following way:

   IV = 0x00000000 || E || SRTCP index || V || P || RC || PT ||
        length || SSRC

V, P, RC, PT, length, SSRC are taken from the first header in the
RTCP compound packet. E || SRTCP index is the 32-bit index added to
the packet.
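The f8 construction of Sections 4.1.2.1 and 4.1.2.2 can be sketched
in Python as follows. This is an illustration under stated
assumptions, not a reference implementation. The function names
(xor_bytes, f8_key_mask, f8_srtp_iv, f8_keystream) and the
encrypt_block callable are introduced for this example only;
encrypt_block(key, block) stands for E(k,x), one AES encryption of a
16-octet block, and must be supplied by the caller because the Python
standard library does not include AES. The sketch further assumes
the fixed 12-octet RTP header, so that M || PT || SEQ || TS || SSRC
is octets 1 through 11 of the header, and that j and ROC are encoded
as big-endian integers.

```python
from typing import Callable

BLOCK = 16  # n_b / 8: block size in octets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length octet strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def f8_key_mask(k_s: bytes, key_len: int = 16) -> bytes:
    """m = k_s || 0x55..55, filled out to the key size n_e."""
    return k_s + b"\x55" * (key_len - len(k_s))


def f8_srtp_iv(rtp_header: bytes, roc: int) -> bytes:
    """IV = 0x00 || M || PT || SEQ || TS || SSRC || ROC."""
    # Assumption: fixed 12-octet header; octet 0 is replaced by 0x00.
    return b"\x00" + rtp_header[1:12] + roc.to_bytes(4, "big")


def f8_keystream(encrypt_block: Callable[[bytes, bytes], bytes],
                 k_e: bytes, k_s: bytes, iv: bytes,
                 n_octets: int) -> bytes:
    """Return the first n_octets of S(0) || S(1) || ..."""
    m = f8_key_mask(k_s, len(k_e))
    iv_masked = encrypt_block(xor_bytes(k_e, m), iv)  # IV'
    s_prev = bytes(BLOCK)                             # S(-1) = 00..0
    out = bytearray()
    j = 0
    while len(out) < n_octets:
        counter = j.to_bytes(BLOCK, "big")
        x = xor_bytes(xor_bytes(iv_masked, counter), s_prev)
        s_prev = encrypt_block(k_e, x)                # S(j)
        out += s_prev
        j += 1
    return bytes(out[:n_octets])
```

With the header, ROC, key, and salt key of Appendix B.1, f8_srtp_iv
and f8_key_mask reproduce the IV and key-mask values listed there.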
4.1.3 NULL Cipher

The NULL cipher is used when no confidentiality for RTP is requested.
The keystream can be thought of as "000..0", i.e. the encryption
simply copies the plaintext input into the ciphertext output.

4.2 Message Authentication and Integrity

Common parameters

* k_a is the session authentication key.
* n_a is the bit-length of the authentication key. The default is
  128 bits.
* n_tag is the bit-length of the output authentication tag. The
  default is 32 bits.
* SRTP_PREFIX_LENGTH is the octet length of the keystream prefix as
  defined above.
* M is the Authenticated Portion as specified in Section 3 for RTP
  and 3.3 for RTCP.

The session key is by default derived as specified in Section 4.3.
The values of n_a, n_tag, and SRTP_PREFIX_LENGTH MUST be fixed for
any particular fixed value of the key.

Below we describe the process of computing authentication tags. The
SRTP receiver verifies a message/authentication tag pair as follows.
A new authentication tag is computed using one of the algorithms
below, and it is compared to the tag associated with the message. If
the two tags are equal, then the message/tag pair is valid;
otherwise, it is invalid and the error audit message "AUTHENTICATION
FAILURE" MUST be returned.

4.2.1. HMAC/SHA1

The default authentication code is HMAC with SHA1 [HMAC]. When
HMAC/SHA1 is used, the SRTP_PREFIX_LENGTH is 0.

For RTP, the HMAC is applied to the concatenation of the
Authenticated Portion of the packet (M) and the rollover counter in
the cryptographic context, i.e. HMAC(k_a, M || ROC). For RTCP, we
apply HMAC to the corresponding M only.

By default, the output shall be truncated to the n_tag left-most
bits.

4.2.2 TMMH/16

TMMH is a simple function that maps a key and a message to a hash
value. This hash value is encrypted by combining it with the
keystream prefix to make the authentication tag, as described below.
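As an illustration of the HMAC/SHA1 transform of Section 4.2.1, the
following Python sketch computes and verifies a truncated tag for
RTP using only the standard hmac and hashlib modules. The function
names (srtp_hmac_sha1_tag, srtp_verify) are chosen for this example,
and the encoding of ROC as a 32-bit big-endian integer is an
assumption of the sketch.

```python
import hashlib
import hmac


def srtp_hmac_sha1_tag(k_a: bytes, m: bytes, roc: int,
                       n_tag: int = 32) -> bytes:
    """Left-most n_tag bits of HMAC-SHA1(k_a, M || ROC)."""
    if n_tag % 8 != 0 or not 0 < n_tag <= 160:
        raise ValueError("n_tag must be a multiple of 8, at most 160")
    # Assumption: ROC is appended as a 32-bit big-endian integer.
    data = m + roc.to_bytes(4, "big")
    return hmac.new(k_a, data, hashlib.sha1).digest()[: n_tag // 8]


def srtp_verify(k_a: bytes, m: bytes, roc: int, tag: bytes,
                n_tag: int = 32) -> bool:
    """Recompute the tag and compare it with the received one."""
    expected = srtp_hmac_sha1_tag(k_a, m, roc, n_tag)
    # Constant-time comparison; False corresponds to the
    # "AUTHENTICATION FAILURE" case.
    return hmac.compare_digest(expected, tag)
```

For RTCP, the same computation would apply to M alone, without the
appended ROC.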
TMMH/16 uses sixteen-bit unsigned words as a basic data unit, and
besides the above common parameters we define the following
parameters for convenience:

- MESSAGE_LENGTH is the octet length of M.
- K is the key, i.e. k_a.
- KEY_LENGTH is the octet length of K, i.e. n_a divided by 8.
- TAG is the authentication tag, which is the output of TMMH/16.
- TAG_LENGTH is the octet length of the authentication tag, i.e.
  n_tag divided by 8. This value defines SRTP_PREFIX_LENGTH to be
  equal to TAG_LENGTH.
- PREFIX is the keystream prefix for the current packet as defined
  in Section 4.1.

The values of KEY_LENGTH and TAG_LENGTH MUST obey the alignment
restrictions described below.

For TMMH/16, a word is 16 bits (2 octets) long, so TAG_LENGTH and
KEY_LENGTH MUST be even. If MESSAGE_LENGTH is odd, the message MUST
be padded with a zero octet, but this does not change the value of
MESSAGE_LENGTH.

The words of the key are denoted as K[0], K[1], ..., K[KEY_WORDS-1],
and the words of the message (after zero padding, if needed) are
denoted as M[1], M[2], ..., M[MSG_WORDS], where MSG_WORDS is the
smallest number such that 2 * MSG_WORDS is at least MESSAGE_LENGTH,
and KEY_WORDS is KEY_LENGTH / 2.

If MESSAGE_LENGTH is greater than KEY_LENGTH - TAG_LENGTH, then the
value of TMMH/16 is undefined. Implementations MUST indicate an error
if asked to hash a message with such a length. Otherwise, the hash
value is defined to be the length TAG_WORDS sequence of words in
which the j-th word in the sequence is defined as

   T[j] = [ [ K[j]           * MESSAGE_LENGTH  +32
              K[j+1]         * M[1]            +32
              K[j+2]         * M[2]            +32
              ...
              K[j+MSG_WORDS] * M[MSG_WORDS] ] modulo p ] modulo 2^16

where j ranges from zero to TAG_WORDS-1. Here, TAG_WORDS is equal to
TAG_LENGTH/2, and p is equal to 2^16 + 1. The symbol * denotes
multiplication and the symbol +32 denotes addition modulo 2^32.
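The hash definition above can be sketched in Python as follows. The
function name tmmh16 is chosen for this example. The sketch assumes
that 16-bit words are read from the octet strings in big-endian
(network) order and treated as unsigned; with that assumption it
reproduces the outputs listed in Appendix B.3. It computes only the
hash value T, not the final tag.

```python
def tmmh16(key: bytes, message: bytes, tag_length: int) -> bytes:
    """TMMH/16 hash value T[0..TAG_WORDS-1] as an octet string."""
    p = 2**16 + 1
    if len(key) % 2 != 0 or tag_length % 2 != 0:
        raise ValueError("KEY_LENGTH and TAG_LENGTH must be even")
    message_length = len(message)
    if message_length > len(key) - tag_length:
        raise ValueError("message too long for this key and tag length")
    if message_length % 2 != 0:
        message += b"\x00"  # zero padding; MESSAGE_LENGTH is unchanged
    # Assumption: words are big-endian unsigned 16-bit integers.
    k = [int.from_bytes(key[i:i + 2], "big")
         for i in range(0, len(key), 2)]
    m = [int.from_bytes(message[i:i + 2], "big")
         for i in range(0, len(message), 2)]
    out = bytearray()
    for j in range(tag_length // 2):
        acc = (k[j] * message_length) & 0xFFFFFFFF
        for i, word in enumerate(m, start=1):
            # "+32": addition modulo 2^32
            acc = (acc + k[j + i] * word) & 0xFFFFFFFF
        out += ((acc % p) % 2**16).to_bytes(2, "big")
    return bytes(out)
```

For example, tmmh16(bytes.fromhex("0123456789abcdeffedc"),
bytes.fromhex("cafebabebade"), 2) yields the octets 0x9d, 0x6a of
the first vector in Appendix B.3.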
To compute the authentication tag of an SRTP packet, the TMMH hash value of that message is computed, then that value is combined with
the keystream prefix as defined in Section 4.1. The combining
operation is word-wise addition modulo 2^16 (for TMMH/16):

   TAG[j] = T[j] +16 PREFIX[j],

where j ranges from zero to TAG_WORDS-1.

Note that for RTP, where HMAC is applied to M || ROC, TMMH is applied
to M only. This is because, for TMMH, the dependence on ROC is
inherent in the PREFIX quantity.

4.3 Key Derivation

4.3.1 Key Derivation Algorithm

Regardless of the encryption or authentication transform that is
employed (it may be a defined transform or one newly introduced
according to Section 7), SRTP key derivation is the process of
generating session keys, without extra communication between the
parties and in a sender-receiver synchronized way.

                packet index ---+
                                |
                                v
   +-----------+           +--------+  session encr_key
   | ext       |  master   |        |---------->
   | key mgmt  |  key      | key    |
   | (optional |---------->| deriv  |---------->
   | rekey)    |           |        |  session auth_key
   +-----------+           +--------+

Figure 4: SRTP key derivation.

At least one initial key derivation is always performed by SRTP.
Further applications of the key derivation MAY be performed,
according to the 'key derivation rate' value in the crypto context.

Let m >= 64, and n be positive integers. A pseudo random function
family is a set of keyed functions {PRF_m^n(k,x)} such that for
(secret) random key k, given m-bit x, PRF_m^n(k,x) is an n-bit
string, computationally indistinguishable from random n-bit strings.

Let a DIV t denote integer division of a by t, rounded down, and with
the convention that a DIV 0 = 0 for all a. We also make the
convention of treating a DIV t as a bit string of the same length as
a, and thus "a DIV t" will in general have leading zeros. Key
derivation is defined as follows. To generate session key(s) for the
current packet, let the n-bit SRTP key for this packet be

   PRF_m^n(k_master, <label> || (index DIV key_derivation_rate)
           || 0x555...)

where <label> is a 4-bit constant (see below), key_derivation_rate is
as determined in the crypto context, and index is the packet index
(i.e. the 48-bit ROC || SEQ for SRTP). We then pad by 1010... to fill
the m-bit input size.

The session keys are now derived using:

- k_e (SRTP encryption): <label> = 0x0, n = n_e.
- k_a (SRTP authentication): <label> = 0x1, n = n_a.

where n_e and n_a are as determined in the cryptographic context.
Note that for the defined counter mode and f8 transforms, the salting
key k_s is used directly as determined in the cryptographic context
(not going through the derivation).

Note that even for a key_derivation_rate of 0, the initial key
derivation still takes place once.

The derivation operation is facilitated if the non-zero rates are
chosen to be powers of 2, or preferably, powers of 256.

Note that the previously mentioned limit on key usage to at most 2^48
packets for one given key applies both to the derived session keys
and to the master keys, as key derivation does not increase this
maximum number.

4.3.2 AES-CM PRF

The currently defined PRF is keyed by 128 to 256 bit (master) keys,
has input block size m = 128 and can produce n-bit outputs for
essentially arbitrary n. We define PRF_m^n(k,x) to be AES in counter
mode as described in Section 4.1.1, applied to (master) key k, input
block A = x, and with the output keystream truncated to the n first
(left-most) bits. (This requires n/128, rounded up, applications of
AES.)

4.3.3 SRTCP Key Derivation

SRTCP uses the same master key as SRTP, i.e. it is shared between the
two protocols. To do this securely, the following changes are made to
Section 4.3.1 when applying session key derivation for SRTCP.
Replace the index by the 32-bit quantity: 0 || SRTCP index (i.e.
excluding the E-bit, replacing it with a fixed 0-bit), and use
<label> = 0x2 for the SRTCP encryption key and <label> = 0x3 for the
SRTCP authentication key. SRTCP SHALL use the same salting key as
SRTP.

5. Default and Mandatory Transforms

The "default" transforms are also the "mandatory-to-implement"
transforms in SRTP. Of course, "mandatory-to-implement" does not
imply "mandatory-to-use".

5.1 Encryption: AES-CM

AES running in Counter Mode, as defined in Section 4.1.1, is the
default encryption algorithm, which is mandatory-to-implement.

5.2 Authentication/Integrity: HMAC/SHA1

HMAC/SHA1, as defined in Section 4.2.1, is the default and
mandatory-to-implement message authentication code.

5.3 Key Derivation: AES-CM PRF

The AES Counter Mode PRF defined in Sections 4.3.1 and 4.3.2 is the
default and mandatory-to-implement method for generating keys.

6. SRTP Parameters

The SRTP-WINDOW-SIZE is defined to be at least 64 (Section 3.2.3).

The currently defined modes are Segmented Integer Counter Mode
(default), f8-mode (Section 4), and the NULL Cipher. The default
cipher is AES (Section 4), used with a block and encryption key size
of n_b = n_e = 128 bits.

The currently defined authentication functions are HMAC/SHA1 and
TMMH/16. The default is no authentication for RTP (authentication is
mandatory for RTCP). For HMAC/SHA1, the default key size is
n_a = 128 bits and the output length is n_tag = 32 bits.
SRTP_PREFIX_LENGTH is therefore by default 0.

The default size of the master key and salting key shall thus also be
128 bits.
The default value for the key derivation-rate field in the context is
"0", in practice meaning "no key-derivation" (though one (1)
application of it is mandatory, see Section 4.3).

7. Adding SRTP Transforms

Section 4 provides examples of the level of detail needed for
defining transforms.

Whenever a new transform is to be added to SRTP, a companion
standards-track RFC MUST be written to exactly define how the new
transform can be used with SRTP (and SRTCP). Such a companion RFC
should avoid overlapping with the SRTP protocol document. Note
however, that it might be necessary to extend the cryptographic
context's definition with new parameters, or add steps to the packet
processing.

The companion RFC shall explain any known issues regarding
interactions between the transform and other aspects of SRTP.

Encryption and authentication transforms may take optional parameters
or have optional modes of operation. The companion RFC shall select
fixed or default values for these parameters (whenever possible), to
reduce key management complexity. The mode of operation of ciphers
and related parameters (e.g. IV-formation for RTP and RTCP) shall be
defined.

Each new transform document should specify its key attributes, e.g.
size of keys (minimum, maximum, recommended), format of keys,
recommended/required processing of input keying material,
requirements/recommendations on re-keying and key derivation, etc.

8. Rationale

8.1 Key derivation

Key derivation has been introduced to lighten the burden on the key
exchange: the four keys necessary to protect the RTP session (SRTP
and SRTCP encryption keys, SRTP and SRTCP authentication keys) are
derived from a single master key in a cryptographically secure way.
The security stands (and falls) with the master key as the derived
session keys are cryptographically independent (under reasonable
assumptions on the PRF, here AES-based).
Subsequent applications of the key derivation are optional but will give security benefits when enabled. They prevent a cryptanalyst from obtaining large amounts of ciphertext produced by a single fixed session key. They provide backwards and forward security in the sense that a compromised session key does not compromise other session keys derived from the same master key (but of course, a leaked master key reveals all session keys).
INTERNET-DRAFT SRTP November, 2001 If future encryption transforms are added, having a short IV that cannot fit the SEQ+ROC combination, a proper refresh-policy will enable these algorithms to encrypt longer streams without need to involve expensive key management operations. 8.2 Salting key The salting key has been introduced to protect against some attacks on additive stream ciphers, see Section 10.1. For simplicity, we per default require the salting key to have the same size as the block size of the cipher. 8.3 TMMH: Message Integrity from Universal Hashing The Truncated Multi-Modular Hash Function (TMMH) is a "universal" hash function suitable for message authentication in the Wegman- Carter paradigm [WC81]. It is simple, quick, and especially appropriate for Digital Signal Processors and other processors with a fast multiply operation, though a straightforward implementation requires storage equal in length to the largest message to be hashed. TMMH offers secure (provably secure under randomness assumptions on the added prefix) and very efficient MACs. However, as this approach to message integrity is new (not conceptually, but within standardization), we have chosen to make HMAC the default transform as many devices already have an HMAC implementation used for other purposes. We envision a migration to TMMH so that HMAC may eventually be phased-out from SRTP. 8.4 Data Origin Authentication Considerations Note that in unicast and, in general, in keys-per-user scenarios, integrity and data origin authentication are provided together. However, in group scenarios where the keys are shared between members, the MAC tag only proves that a member of the group sent the packet, but does not prove the actual sender. Data origin authentication (DOA) for multicast and group RTP sessions is a hard problem that needs a solution; while some promising proposals are being investigated [PCST1, PCST2], more work is needed to rigorously specify these technologies. 
Thus SRTP data origin authentication in groups is for further study. DOA can alternatively be achieved using signatures. However, signatures have a high cost in terms of bandwidth and processing time, so we do not consider them in this discussion. The presence of mixers and translators does not allow data origin authentication in case the RTP payload and/or the RTP header are
manipulated. Note that such intermediate entities also disrupt
end-to-end confidentiality (since the IV formation depends, for
example, on the RTP header being preserved).

9. Key Management Considerations

The SSRC and the random initial sequence number are known to the key
management.

A particular key management system might allow the different RTP
sessions to use identical cryptographic master keys. Note that this
is possible if the design of the synchronization mechanism, i.e. the
IV in the case of the f8-mode, avoids keystream re-use (the two-time
pad, Section 10.2). If this is used, the SSRC MUST be unique per
stream.

A particular key management system might choose to provide re-key by
associating a key for a crypto context with a pair of SEQ+ROC values,
<FirstSEQ+ROC, LastSEQ+ROC>. The key management specification may
require the SRTP implementation to check the SEQ+ROC of an incoming
SRTP packet against the interval for the master key in the context
before using the key. These interactions are defined by the key
management interface to SRTP and are not defined by this protocol
specification.

The key management interface might use the defaults for the SRTP
protocol or define values for any and all SRTP parameters such as the
following:

- cipher and related parameters, including mode of operation
- key(s), i.e. correct master (and salting) key(s), and related
  parameters
- authentication algorithm(s), and related parameters
- re-keying (key lifetime) and key derivation parameters
- SSRC, network address, RTP port pair
- current value of ROC and SEQ (or zeros prior to session
  commencement)
- replay window size

10. Security Considerations

10.1 Key Usage

The effective key size is determined (upper bounded) by the size of
the master key and, for encryption, the size of the salting key.
Any additive stream cipher is vulnerable to attacks that use statistical knowledge about the plaintext source to enable key collision and time-memory tradeoff attacks [MF00,H80,Bi96]. These attacks take advantage of commonalities among plaintexts, and provide
INTERNET-DRAFT SRTP November, 2001 a way for a cryptanalyst to amortize the computational effort of decryption over many keys, thus reducing the effective key size of the cipher. A detailed analysis of these attacks and their applicability to the encryption of Internet traffic is provided in [MF00]. In summary, the effective key size of SRTP when used in a security system in which m distinct keys are used, is equal to the key size of the cipher less the logarithm (base two) of m. Protection against such attacks can be provided simply by increasing the size of the keys used, which here can be accomplished by the use of the "salting key". Note that the salting key MUST be random, but MAY be public. Implementations SHOULD use keys that are as large as possible. Please note that in many cases increasing the key size of a cipher does not affect the throughput of that cipher. 10.2 SSRC collision and two-time pad Any fixed keystream output, generated from the same key and index should only be used to encrypt once. Re-using such keystream (jokingly called a 'two-time pad' system by cryptographers), can seriously compromise security. The NSA's VENONA project [C99] provides a historical example of such a compromise. In SRTP, a 'two- time pad' is avoided by requiring the key, or some other parameter of cryptographic significance, to be unique per RTP stream. It may in some cases be desirable that multiple crypto contexts contain identical master keys. For instance, there could be a desire for a group to share a single key. Issues as above (two-time pad) MUST then be considered. As discussed in Section 9, f8 may allow such sharing by its use of the SSRC in the IV; however, the effect of an eventual RTP SSRC collision detection MUST be taken into account. Note that sharing a master key between multiple streams in a multimedia session implies using a distinct SSRC in the IV of AES-f8. 
This means that each SSRC MUST be unique among all the RTP streams
inside that multimedia session, to avoid unlucky IV combinations that
would result in a two-time pad.

10.3 Confidentiality of the RTP Payload

By using 'seekable' stream ciphers, SRTP avoids the denial of service
attacks that are possible on stream ciphers that lack this property
(these attacks are described in Section 3.4 of [B96]).

It is important to be aware that, as with any stream cipher, the
exact length of the payload is revealed by the encryption. This means
that it may be possible to deduce certain "formatting bits" of the
payload, as the length of the codec output might vary due to certain
parameter settings etc. This, in turn, implies that the corresponding
INTERNET-DRAFT SRTP November, 2001 bit of the keystream can be deduced. However, if the stream cipher is secure (counter mode and f8 are provably secure under certain assumptions), knowledge of a few bits of the keystream will not aid an attacker in predicting the following keystream bits. Thus, the payload length (and information deducible from this) will leak, but nothing else. 10.4 Confidentiality of the RTP Header With the described proposal, RTP headers are sent in the clear to allow for header compression. This means that data such as payload type, synchronization source identifier, and timestamp are available to an eavesdropper. Moreover, since RTP allows for future extensions of headers, we cannot foresee what kind of possibly sensitive information might also be "leaked". The described proposal is a low-cost method, which allows header compression to reduce bandwidth. It is up to the endpoints policies to decide about the security protocol to employ. If the header compression is omitted, other solutions might be applicable. In other words, we provide a solution that works in the most general scenario, even in the most demanding one (like conversational multimedia over low-bandwidth, unreliable media). Of course the solution will then also work in less restricted environments, but we suggest that if one really needs to protect headers, and is allowed to do so by the surrounding environment, then one should also look at alternatives, e.g. IPsec. In addition, we strongly recommend the use of profiles to select the right trade-off for the required level of security, e.g. if the headers can be left in cleartext or not. 10.5 Integrity of the RTP packet Additive ciphers do not provide any security service other than privacy. In particular, they do not provide message authentication (see [RK99] or [HAC] for a discussion of this security service). However, SRTP uses a message authentication code to provide that security service. 
With HMAC being a well-studied authentication scheme, based on a provably secure construction, the security against MAC forgery depends on the key size and the size of the output tags (or for some attacks, half the size of the tag due to the "birthday paradox"). The default tag size for HMAC has been fixed to 32 bits. Other size values may be defined. The use of a truncated size is motivated by the fact that it may be desirable, e.g. in wireless environments, to save bandwidth. The choice of such a truncation MUST be evaluated against the reduction in security it implies. The default 32-bit size is a
compromise, offering a reasonable level of security, taking into
account the real-time aspects of the protected protocol. High
security applications SHOULD however use larger tags.

The fact that authentication is optional is motivated by the fact
that, while the function is typically highly desired, there are
certain cases (notably in cellular environments) where it has an
impact in terms of cost, e.g. for bandwidth consumption. Also,
independently of the tag length, a single transmission bit error in
the protected part of the packet or in the tag itself forces the
entire packet to be dropped. Given a fixed quality, it implies the
necessity of higher protection of the transmitted unit, hence higher
cost. In those cases, it is up to the user security profile to
request authentication.

10.5.1 Integrity of the RTP header: IHA

The IV formation of the f8-mode gives implicit authentication of the
RTP header, even if no cryptographic integrity protection is present.
This means that modifying bits of the RTP header will cause the
decryption process at the receiver to produce essentially random
garbage.

11. Interaction with Forward Error Correction mechanisms

Some considerations are due when Forward Error Correction mechanisms
are performed, e.g. as specified in RFC 2733. In particular, the
order in which SRTP processing and the error correction processing
are applied is of concern. The optimal order would be the following:

- on the sender side, first encrypt the packet, then perform the FEC
  processing, finally authenticate
- on the receiver side, first authenticate the packet, then perform
  the FEC processing, finally decrypt.
The motivations for the above ordering are:

- FEC expands the packet, so performing encryption after FEC would
  be more expensive
- on the receiver side, authentication has to be verified before
  getting engaged in the FEC processing, to reduce effects of
  certain denial of service attacks
- adding redundancy before encrypting slightly reduces the effective
  key-size and resistance to attacks
However, this ordering requires splitting the security processing.
Implementations could gain from keeping the security processing
together; in this case the recommendation is that the security
processing takes place after FEC on the sender's side, and before FEC
on the receiver's side. This implies the cost of placing encryption
after FEC processing, as explained above, hence a convenient choice
is left to the application.

For the sake of interoperability, implementations are requested to
place the security process after FEC on the sender's side, and before
FEC on the receiver's side. This is also the default behavior; any
other choice has to be agreed out-of-band.

12. IANA Considerations

The RTP specification establishes a registry of profile names for use
by higher-level control protocols, such as the Session Description
Protocol (SDP), to refer to transport methods. This profile registers
the name "RTP/SAVP".

13. Open Issue

It is an open issue whether AES-CM needs to provide a means to
support the use of the same master key for multiple streams. This
feature was supported in the previous drafts by insertion of the SSRC
in the IV (under the constraint of unique SSRC).

The feature is currently supported only by the
non-mandatory-to-implement f8-AES.

The reason for raising this question is that there might be cases
where the feature is needed, e.g. when a single master key is
available but there are multiple streams. As an example, it is likely
that such simplistic key management is used in very 'thin' clients
that cannot afford implementing anything but the mandatory transform.
Thus, this may be a restriction in SRTP's applicability in such
devices.

14. Acknowledgements

The authors would like to thank Magnus Westerlund, Brian Weis, Robert
Fairlie-Cuninghame, and Adrian Perrig for their reviews and comments.

15. Authors' Addresses

Questions and comments should be directed to the authors and
avt@ietf.org:

Mark Baugher
Cisco Systems, Inc.
5510 SW Orchid Street Phone: +1 503-245-4543
INTERNET-DRAFT SRTP November, 2001 Portland, OR 97219 USA Email: mbaugher@cisco.com Rolf Blom Ericsson Research SE-16480 Stockholm Phone: +46 8 58531707 Sweden EMail: rolf.blom@era.ericsson.se Elisabetta Carrara Ericsson Research SE-16480 Stockholm Phone: +46 8 50877040 Sweden EMail: elisabetta.carrara@era.ericsson.se David A. McGrew Cisco Systems, Inc. San Jose, CA 95134-1706 Phone: +1 301-349-5815 USA EMail: mcgrew@cisco.com Mats Naslund Ericsson Research SE-16480 Stockholm Phone: +46 8 58533739 Sweden EMail: mats.naslund@era.ericsson.se Karl Norrman Ericsson Research SE-16480 Stockholm Phone: +46 8 4044502 Sweden EMail: karl.norrman@era.ericsson.se David Oran Cisco Systems, Inc. San Jose, CA 95134-1706 USA EMail: oran@cisco.com 16. References [AES] NIST, "Advanced Encryption Standard (AES)", Draft FIPS, http://www.nist.gov/aes/ [C99] Crowell, W. P., "Introduction to the VENONA Project", http://www.nsa.gov:8080/docs/venona/index.html. [ES3D] ETSI SAGE 3GPP Standard Algorithms Task Force, "Security Algorithms Group of Experts (SAGE); General Report on the Design,Specification and Evaluation of 3GPP Standard Confidentiality and Integrity Algorithms", Public report, Draft Version 1.0, Dec 1999. [ES3E] ETSI SAGE 3GPP Standard Algorithms Task Force, "Security Algorithms Group of Experts (SAGE) Report on the Evaluation of 3GPP Standard Confidentiality and Integrity Algorithms", Public report, Draft Version 1.0, Dec 1999. Baugher, et al. [Page 30]
INTERNET-DRAFT SRTP November, 2001 [HAC] Menezes, A., Van Oorschot, P., and Vanstone, S., "Handbook of Applied Cryptography", CRC Press, 1997, ISBN 0-8493-8523-7. [HMAC] Krawczyk, H., Bellare, M., and Canetti, R.: "HMAC: Keyed- hashing for message authentication". IETF RFC 2104, February 1997. [H80] Hellman, M. E., "A cryptanalytic time-memory trade-off", IEEE Transactions on Information Theory, July 1980, pp. 401-406. [MF00] McGrew, D., and Fluhrer, S., "Attacks on Encryption of Redundant Plaintext and Implications on Internet Security", the Proceedings of the Seventh Annual Workshop on Selected Areas in Cryptography (SAC 2000), Springer-Verlag. [RFC1889] Schulzrinne, H., Casner, S., Frederick, R., Jacobson,V., "RTP: A Transport Protocol for Real-Time Applications", IETF RFC 1889. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", IETF RFC 2119, March 1997. [RFC2401] Kent, S., and R. Atkinson, "Security Architecture for IP", IETF RFC 2401, November 1998. [RFC2675] Borman, D., Deering, S., Hinden, R., "IPv6 Jumbograms", IETF RFC 2675, August 1999. [RFC2828] Shirey, R., "Internet Security Glossary", IETF RFC 2828, May 2000. [RK99] Rescorla, E., and Korver, B., "Guidelines for Writing RFC Text on Security Considerations," draft-rescorla-sec-cons- 00.txt [PCST1] Perrig, A., Canetti, R., Tygar, D., Song, D., "Efficient and Secure Source Authentication for Multicast", in Proc. of Network and Distributed System Security Symposium NDSS 2001, pp. 35-46, 2001. [PCST2] Perrig, A., Canetti, R., Tygar, D., Song, D., "Efficient Authentication and Signing of Multicast Streams over Lossy Channels", in Proc. of IEEE Security and Privacy Symposium S&P2000, pp. 56-73, 2000. [WC81] M. N. Wegman and J. L. Carter, "New Hash Functions and Their Use in Authentication and Set Equality", JCSS 22, 265-279, 1981. Baugher, et al. [Page 31]
Appendix A: Pseudocode for Index Determination, and ROC and s_l
Update

Pseudocode for the algorithm to process a packet with sequence number
SEQ, determining the index i and updating the rollover counter and
sequence number for the last (authenticated) packet, s_l.

   if (s_l < 32,768)
      if (SEQ - s_l > 32,768)
         set i to SEQ + 65,536 * (ROC-1)
      else
         set i to SEQ + 65,536 * ROC
      endif
   else
      if (s_l - 32,768 > SEQ)
         set ROC to ROC + 1
      endif
      set i to SEQ + ROC * 65,536
   endif
   set s_l to SEQ

Appendix B: Test Vectors

B.1 AES-f8 Test Vectors

All values are in hexadecimal.

   SRTP PREFIX LENGTH  : 0
   RTP packet header   : 806e5cba50681de55c621599
   RTP packet payload  : 70736575646f72616e646f6d6e657373
                         20697320746865206e65787420626573
                         74207468696e67
   ROC                 : d462564a
   key                 : 234829008467be186c3de14aae72d62c
   salt key            : 32f2870d
   key-mask (m)        : 32f2870d555555555555555555555555
   key XOR key-mask    : 11baae0dd132eb4d3968b41ffb278379
   IV                  : 006e5cba50681de55c621599d462564a
   IV'                 : 595b699bbd3bc0df26062093c1ad8f73

   j                   : 0
   IV' XOR j           : 595b699bbd3bc0df26062093c1ad8f73
   S(-1)               : 00000000000000000000000000000000
   S(-1) XOR IV' XOR j : 595b699bbd3bc0df26062093c1ad8f73
   S(0)                : 71ef82d70a172660240709c7fbb19d8e
   plaintext           : 70736575646f72616e646f6d6e657373
   ciphertext          : 019ce7a26e7854014a6366aa95d4eefd

   j                   : 1
   IV' XOR j           : 595b699bbd3bc0df26062093c1ad8f72
   S(0)                : 71ef82d70a172660240709c7fbb19d8e
   S(0) XOR IV' XOR j  : 28b4eb4cb72ce6bf020129543a1c12fc
   S(1)                : 3abd640a60919fd43bd289a09649b5fc
   plaintext           : 20697320746865206e65787420626573
   ciphertext          : 1ad4172a14f9faf455b7f1d4b62bd08f

   j                   : 2
   IV' XOR j           : 595b699bbd3bc0df26062093c1ad8f70
   S(1)                : 3abd640a60919fd43bd289a09649b5fc
   S(1) XOR IV' XOR j  : 63e60d91ddaa5f0b1dd4a93357e43a8c
   S(2)                : 584d14a591acfca846b3aa3a0ab50fec
   plaintext           : 74207468696e67
   ciphertext          : 2c6d60cdf8c29b

B.2 AES-CM Test Vectors

   All values are in hexadecimal.

   AES-CM Key:
     75387824D1F1F3815641B65D78D51EDB96C9781981053CBBCB36927844F1932C

   Block Cipher Key: 75387824D1F1F3815641B65D78D51EDB
   Salting key:      96C9781981053CBBCB36927844F1932C
   Packet Index:     12345678

   Counter                            Keystream
   96C9781981053CBBCB36A4AC9B69932C   EA0AA027BA6D56E44B28F43A7E3E5F58
   96C9781981053CBBCB36A4AC9B69932D   CBDB3107EDA8D420D3EF7AB7FF290166
   96C9781981053CBBCB36A4AC9B69932E   AED6F7CB14ED49174336CC010AEB8780
   96C9781981053CBBCB36A4AC9B69932F   4C3A754AF027A5C8CCB40E0FE20AF246
   96C9781981053CBBCB36A4AC9B699330   01A6D1CE983EF993E980CC9568587E3D

   Keystream Segment (final output)

     EA0AA027BA6D56E44B28F43A7E3E5F58CBDB3107EDA8D420D3EF7AB7FF290166
     AED6F7CB14ED49174336CC010AEB87804C3A754AF027A5C8CCB40E0FE20AF246
     01A6D1CE983EF993E980CC9568587E3D...
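
   Several steps in the B.1 and B.2 listings involve only XOR or
   integer addition, so they can be cross-checked without an AES
   implementation. The Python sketch below is illustrative only and is
   not part of the specification. The helper name xor_hex is
   hypothetical. The key-mask padding (0x55 octets), the IV layout
   (first header octet replaced by 00, ROC appended), and the counter
   formula (salting key + packet index * 2^16 + block number, modulo
   2^128) are read off the listed values as assumptions rather than
   taken from the normative text. The S(j) values are copied from the
   listing, and only the first two f8 blocks are checked.

```python
def xor_hex(a: str, b: str) -> str:
    """XOR two equal-length hex strings; return lowercase hex."""
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    return "".join(f"{int(x, 16) ^ int(y, 16):x}" for x, y in zip(a, b))


# ---- B.1 (AES-f8): XOR steps only --------------------------------
key = "234829008467be186c3de14aae72d62c"
salt = "32f2870d"

# Assumption: the mask is the salt padded to 16 octets with 0x55.
key_mask = salt + "55" * (16 - len(salt) // 2)
masked_key = xor_hex(key, key_mask)

# Assumption: IV = 00 || header octets 2..12 || ROC.
header = "806e5cba50681de55c621599"
roc = "d462564a"
iv = "00" + header[2:] + roc

# Values copied from the listing (outputs of AES, not recomputed here).
iv_prime = "595b699bbd3bc0df26062093c1ad8f73"
s0 = "71ef82d70a172660240709c7fbb19d8e"
s1 = "3abd640a60919fd43bd289a09649b5fc"

# Block counter j = 1 is XORed into IV' before the feedback XOR.
iv_prime_xor_1 = f"{int(iv_prime, 16) ^ 1:032x}"
feedback_1 = xor_hex(s0, iv_prime_xor_1)

# Ciphertext block = keystream block XOR plaintext block.
ct0 = xor_hex(s0, "70736575646f72616e646f6d6e657373")
ct1 = xor_hex(s1, "20697320746865206e65787420626573")

# ---- B.2 (AES-CM): counter values --------------------------------
salting_key = int("96C9781981053CBBCB36927844F1932C", 16)
packet_index = 0x12345678
modulus = 1 << 128

# Assumption: the first counter is salting key + index * 2^16.
first_counter = (salting_key + (packet_index << 16)) % modulus
counters = [f"{(first_counter + j) % modulus:032X}" for j in range(5)]

if __name__ == "__main__":
    print("key XOR key-mask  :", masked_key)
    print("IV                :", iv)
    print("S(0) XOR IV' XOR 1:", feedback_1)
    print("ciphertext blocks :", ct0, ct1)
    for c in counters:
        print("counter           :", c)
```

   Run as a script, the sketch prints each derived value so that it
   can be compared line by line with the listings above. Checking the
   keystream blocks themselves requires an AES-128 implementation and
   is outside the scope of this sketch.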

B.3 TMMH/16 Test Vectors

   This section provides test vectors which can be used to test an
   implementation of TMMH/16. The key, message, and outputs are
   expressed as octet sequences, with each octet in hexadecimal.

   KEY_LENGTH: 10
   TAG_LENGTH: 2
   key:     { 0x01, 0x23, 0x45, 0x67, 0x89,
              0xab, 0xcd, 0xef, 0xfe, 0xdc }
   message: { 0xca, 0xfe, 0xba, 0xbe, 0xba, 0xde }
   output:  { 0x9d, 0x6a }

   KEY_LENGTH: 10
   TAG_LENGTH: 2
   key:     { 0x01, 0x23, 0x45, 0x67, 0x89,
              0xab, 0xcd, 0xef, 0xfe, 0xdc }
   message: { 0xca, 0xfe, 0xba }
   output:  { 0xc8, 0x8e }

   KEY_LENGTH: 10
   TAG_LENGTH: 4
   key:     { 0x01, 0x23, 0x45, 0x67, 0x89,
              0xab, 0xcd, 0xef, 0xfe, 0xdc }
   message: { 0xca, 0xfe, 0xba, 0xbe, 0xba, 0xde }
   output:  { 0x9d, 0x6a, 0xc0, 0xd3 }

   This Internet-Draft expires in April 2002.