Internet Engineering Task Force Baugher, McGrew, AVT Working Group Oran (Cisco) INTERNET-DRAFT Blom, Carrara, Naslund, EXPIRES: July 2002 Norrman (Ericsson) February 2002 The Secure Real Time Transport Protocol <draft-ietf-avt-srtp-03.txt> Status of this memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or cite them other than as "work in progress". The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/lid-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html Abstract This document describes the Secure Real Time Transport Protocol (SRTP), a profile of the Real Time Transport Protocol (RTP), which can provide confidentiality, message authentication, and replay protection to the RTP/RTCP traffic. SRTP can achieve high throughput and low packet expansion. SRTP proves to be a suitable protection for heterogeneous environments, i.e., environments including both wired and wireless links. To get such features, default transforms are described, based on an additive stream cipher for encryption, a keyed-hash based function for message authentication, and an 'implicit' index for sequencing/synchronization based on the RTP sequence number for SRTP and an index number for Secure RTCP (SRTCP). INTERNET-DRAFT SRTP February, 2002 TABLE OF CONTENTS 1. Notational Conventions............................................3 2. Goals.............................................................3 3. 
SRTP Framework....................................................4 3.1 SRTP Cryptographic Contexts.....................................5 3.1.1 Transform-independent parameters............................6 3.1.2 Transform-dependent parameters..............................7 3.1.3 Mapping SRTP Packets to Cryptographic Contexts..............8 3.2 SRTP Packet Processing..........................................8 3.2.1 Packet Index Determination, and ROC, s_l Update............10 3.2.2 Replay Protection..........................................12 3.3 Secure RTCP....................................................12 4. Pre-Defined Cryptographic Transforms.............................16 4.1 Encryption.....................................................16 4.1.1 AES in Counter Mode........................................18 4.1.2 AES in f8-mode.............................................19 4.1.3 NULL Cipher................................................21 4.2 Message Authentication and Integrity...........................21 4.2.1. HMAC/SHA1.................................................22 4.2.2 TMMH.......................................................22 4.3 Key Derivation.................................................25 4.3.1 Key Derivation Algorithm...................................25 4.3.2 SRTCP Key Derivation.......................................27 4.3.3 AES-CM PRF.................................................27 5. Default and Mandatory Transforms.................................27 5.1 Encryption: AES-CM and NULL....................................27 5.2 Message Authentication/Integrity: HMAC/SHA1....................27 5.3 Key Derivation: AES-CM PRF.....................................27 6. SRTP/SRTCP Parameters............................................28 7. Adding SRTP Transforms...........................................28 8. 
Rationale........................................................29 8.1 Key derivation.................................................29 8.2 Salting key....................................................29 8.3 TMMH: Message Integrity from Universal Hashing.................30 8.4 Data Origin Authentication Considerations......................30 9. Key Management Considerations....................................30 10. Security Considerations.........................................32 10.1 SSRC collision and two-time pad...............................32 10.2 Key Usage.....................................................33 10.3 Confidentiality of the RTP Payload............................34 10.4 Confidentiality of the RTP Header.............................34 10.5 Integrity of the RTP packet...................................35 10.5.1 Integrity of the RTP header: IHA..........................35 11. Interaction with Forward Error Correction mechanisms............36 12. Scenarios.......................................................36 12.1 Two-party Unicast.............................................37 12.1.1 One bi-directional RTP stream.............................37 12.1.2 One master key per party..................................37 12.2 Multicast.....................................................38 Baugher, et al. [Page 2]
12.2.1 Small conference with one sender..........................38
12.2.2 Large multicast with one sender...........................39
12.3 Re-keying and access control..................................39
12.4 Summary of basic scenarios....................................41
13. IANA Considerations.............................................41
14. Acknowledgements................................................41
15. Author's Addresses..............................................41
16. References......................................................42
Appendix A: Pseudocode for Index Determination......................44
Appendix B: Test Vectors............................................44
B.1 AES-f8 Test Vectors............................................44
B.2 AES-CM Test Vectors............................................45
B.3 TMMH Test Vectors..............................................46

1. Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. Terminology conforms to [RFC2828].

By convention, the leftmost bit (byte) is the most significant one. By XOR we mean bitwise addition modulo 2 of binary strings, and || denotes concatenation. In other words, if C = A || B, then the most significant bits of C are the bits of A, and the least significant bits of C equal the bits of B. Hexadecimal numbers are prefixed by 0x.

2. Goals

The security goals for SRTP are to ensure:

* the confidentiality of the RTP and RTCP payloads, and

* the integrity protection of the entire RTP and RTCP packets, together with protection against replayed packets.

These security services are optional and independent from each other, except that SRTCP integrity protection is mandatory.
Other, functional, goals for the protocol are:

* a framework that permits upgrading with new cryptographic transforms,

* low bandwidth cost, i.e., a framework preserving RTP header compression efficiency,

and, asserted by the pre-defined transforms:
* a low computational cost,

* a small footprint (i.e., small code size and data memory for keying information and replay lists),

* limited packet expansion to support the bandwidth economy goal,

* independence from the underlying transport, network, and physical layers used by RTP, in particular high tolerance to packet loss and re-ordering, and robustness to transmission bit-errors in the encrypted payload.

These properties ensure that SRTP is a suitable protection scheme for RTP/RTCP in both wired and wireless scenarios.

3. SRTP Framework

RTP is the Real Time Transport Protocol [RFC1889]. We define SRTP as a profile of RTP, in a way analogous to RFC1890 which defines the audio/video profile for RTP. Conceptually, we consider it to be a 'bump in the stack' implementation which resides between the RTP application and the transport layer, which intercepts RTP packets and then forwards an equivalent SRTP packet on the sending side, and which intercepts SRTP packets and passes an equivalent RTP packet up the stack on the receiving side.

The format of an SRTP packet is illustrated in Figure 1.

The Encrypted Portion of an SRTP packet consists of the encryption of the RTP payload of the equivalent RTP packet. (Our use of the word 'encryption' also includes the possibility of a 'NULL' encryption.) The optional MKI and optional authentication tag are the only fields defined by SRTP that are not in RTP. Only 8-bit alignment is assumed.

MKI (Master Key Identifier): variable length, optional

The MKI is defined, signaled, and used by key management. The MKI identifies the master key from which the session key(s) were derived that authenticate and/or encrypt the particular packet. Note that the MKI SHALL NOT identify the SRTP cryptographic context, which is identified according to Section 3.1.3. The MKI MAY be used by key management for the purposes of re-keying and identifies a particular master key within the cryptographic context, see Section 3.1.1.

Authentication tag: variable length, optional
The authentication tag shall be used to carry message authentication data. The Authenticated Portion of an SRTP packet consists of the RTP header followed by the Encrypted Portion of the SRTP packet. Thus, note that if both encryption and authentication are applied, encryption SHALL be applied before authentication on the sender side and conversely on the receiver side. The authentication tag provides authentication of the RTP header and payload, and it indirectly provides replay protection by authenticating the sequence number.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |                           timestamp                           |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |           synchronization source (SSRC) identifier            |
|   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|   |            contributing source (CSRC) identifiers             |
|   |                             ....                              |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |                   RTP extension (optional)                    |
| +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                                                               |
| | |                            payload                            |
| | |                             ....                              |
+-+>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | ~                      SRTP MKI (optional)                      ~
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | ~                 authentication tag (optional)                 ~
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| +- Encrypted Portion
+---- Authenticated Portion

Figure 1. The format of an SRTP packet.

3.1 SRTP Cryptographic Contexts

Each SRTP session requires the sender and receiver to maintain cryptographic state information. This information is called the cryptographic context.

By a session key, we mean a key which is used directly in a cryptographic transform (e.g. encryption or message authentication), and by a master key, we mean a random bit string (given by the key management protocol) from which session keys are derived in a cryptographically secure way.
3.1.1 Transform-independent parameters

The transform-independent parameters of the cryptographic context for SRTP consist of:

* a 32-bit unsigned rollover counter (ROC), which records how many times the 16-bit RTP sequence number has been reset to zero after passing through 65,535. Unlike the sequence number (SEQ), which SRTP extracts from the RTP packet header, the ROC is maintained by SRTP as described in Section 3.2.1. We define the index of the SRTP packet corresponding to a given ROC and RTP sequence number to be the 48-bit quantity

     i = 2^16 * ROC + SEQ.

* for the receiver only, a 16-bit sequence number s_l, which is the last received sequence number (possibly authenticated, if authentication is provided),

* an identifier for the encryption algorithm, i.e., the cipher and its mode of operation, and related parameters (when encryption is provided),

* an identifier for the message authentication algorithm, and related parameters (when authentication is provided),

* a replay list, maintained by the receiver only (when authentication and replay protection are provided), containing indexes of recently received and authenticated SRTP packets,

* an indicator (0/1) as to whether an MKI is present in SRTP and SRTCP packets,

* if the MKI indicator is set to one, the length (in bytes) of the MKI field, and (for the sender) the actual value of the currently active MKI, (the value of the last three MKI-related parameters above MUST be kept fixed for the life-time of the context)

* the master key(s),

* for each master key, means to maintain a count of the number of SRTP packets that have been processed with that master key (essential for security, see Sections 3.2.1 and 10), either in the form of an explicit counter, or, the value of the first SRTP index for which the key was used,

* non-negative integers n_e, and n_a, determining the length of the session keys for encryption, and message authentication.
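As an illustration only, the parameters listed above might be grouped as in the following Python sketch. The class name SrtpContext, its field names, and the choice of a set for the replay list are hypothetical and not part of this specification; only the index formula i = 2^16 * ROC + SEQ is taken from the text.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SrtpContext:
    """Illustrative holder for some transform-independent parameters."""
    master_key: bytes                 # MUST be random and kept secret
    n_e: int                          # session encryption key length
    n_a: int                          # session authentication key length
    roc: int = 0                      # 32-bit rollover counter (ROC)
    s_l: Optional[int] = None         # receiver only: last sequence number
    replay_list: Set[int] = field(default_factory=set)  # receiver only
    mki_present: bool = False         # MKI indicator (0/1)
    mki_length: int = 0               # MKI length in bytes, if present
    packets_processed: int = 0        # per-master-key packet count

    def packet_index(self, seq: int) -> int:
        """Return the 48-bit index i = 2^16 * ROC + SEQ."""
        if not 0 <= seq < 2**16:
            raise ValueError("SEQ must fit in 16 bits")
        if not 0 <= self.roc < 2**32:
            raise ValueError("ROC must fit in 32 bits")
        return (self.roc << 16) | seq
```

For example, a context with ROC = 1 maps SEQ = 0 to index 65536, the first packet after one sequence number wrap.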
The master key(s) MUST be random and kept secret.

In addition, for each master key, SRTP MAY choose to specify the following associated values:

* a master salt, to be used in the key derivation of session keys. This value, when used, MUST be random, but MAY be public. Use of master salt is strongly recommended, see Section 10.2. A 'NULL' salt is treated as 00...0.

* an integer in the set {1,2,4,...,2^16}, the 'key_derivation_rate', where an unspecified value is treated as zero,

* if the MKI-indicator is one, the actual MKI value for which the master key is valid,

* <'From', 'To'> values, specifying the lifetime for a master key, expressed in terms of the two 48-bit index values inside whose range (including the range end-points) the master key is valid. These values are absolute quantities, not relative.

SRTCP by default uses the same cryptographic context parameters, except:

* no rollover counter or s_l-value needs to be maintained as the RTCP index is explicitly carried in each SRTCP packet,

* a separate replay list is maintained (when replay protection is provided),

* SRTCP maintains a separate counter for its master key (even if the master key is the same as that for SRTP, see below) as a means to maintain a count of the number of SRTCP packets that have been processed with that key (cf. above).

Note in particular that the master key(s) MAY be shared between SRTP and SRTCP, if the pre-defined transforms (including the key derivation) are used, but the session key(s) MUST NOT be so shared.

3.1.2 Transform-dependent parameters

All encryption, authentication/integrity, and key derivation parameters are defined in the Transforms section dedicated to the particular encryption, authentication, or key derivation transform (see Section 4). Typical examples of such parameters are block size of ciphers, session keys, data for IV formation, etc. We note again (it cannot be stressed enough) that SRTP and SRTCP MUST use distinct (pseudo-)random session keys. Future SRTP transform specifications MUST include a section to list the additional cryptographic context's parameters for that transform, if any.
3.1.3 Mapping SRTP Packets to Cryptographic Contexts

Recall that an RTP session for each participant is defined [RFC1889] by a pair of destination transport addresses (one network address plus a port pair for RTP and RTCP), and that a multimedia session is defined as a collection of RTP sessions. For example, a particular multimedia session could include an audio RTP session, a video RTP session, and a text RTP session.

A cryptographic context shall be uniquely identified by the triplet context identifier:

     context id = <SSRC, destination network address, destination transport port number>,

where the destination network address and the destination transport port are the ones in the current RTP packet (for the sender) or SRTP packet (for the receiver). It is assumed that, when presented with this information, the key management returns a context with the information as described in Section 3.1.

As noted above, SRTP and SRTCP by default share the bulk of the parameters in the cryptographic context. Thus, retrieving the crypto context parameters for an SRTCP stream in practice may imply a binding to the corresponding SRTP crypto context. It is up to the implementation to ensure such binding, since the RTCP port may not be directly deducible from the RTP port only. Alternatively, the key management MAY choose to provide separate SRTP- and SRTCP-contexts, duplicating the common parameters (such as master key(s)). The latter approach then also enables SRTP and SRTCP to use, e.g., distinct transforms, if so desired.

If no valid context can be found for a packet corresponding to a certain context identifier, that packet MUST be discarded from further processing.

3.2 SRTP Packet Processing

The following applies to SRTP. SRTCP is described in Section 3.3.
Assuming initialization of the cryptographic context(s) has taken place via key management, and as described in Section 3.2.1, to construct a proper SRTP packet, given an RTP packet, the sender has to do the following:

1. Determine which cryptographic context to use as described in Section 3.1.3.
2. Determine the index of the SRTP packet as described in Section 3.2.1, using the rollover counter in the cryptographic context and the sequence number in the RTP packet.

3. Determine the master key and master salt. If the MKI indicator in the context is set to one, this is done using the current MKI in the cryptographic context; otherwise, the index determined in the previous step is used.

4. Determine the session keys and salt (if used by the transform) as described in Section 4.3, using master key, master salt, key_derivation_rate and session key-lengths in the cryptographic context and the index, determined in Steps 2 and 3.

5. If encryption is provided, encrypt the RTP payload to produce the Encrypted Portion of the packet (see Section 4.1, for the defined ciphers), using the encryption algorithm indicated in the cryptographic context, the session encryption key and salt (if used) found in Step 4, and the index found in Step 2.

6. If the MKI indicator is set to one, append the MKI to the packet.

7. If message authentication is provided, compute the authentication tag for the Authenticated Portion of the packet, as described in Section 4.2, using the current rollover counter (if used by the transform), the authentication algorithm indicated in the cryptographic context, and the session authentication key found in Step 4. Append the authentication tag to the packet.

8. If necessary, update the ROC as in Section 3.2.1, using the packet index determined in Step 2.

To authenticate and decrypt an SRTP packet, the receiver has to do the following:

1. Determine which cryptographic context to use as described in Section 3.1.3.

2. Estimate the index of the SRTP packet from the rollover counter in the cryptographic context and the sequence number in the SRTP packet, as described in Section 3.2.1.

3. Determine the master key and master salt.
If the MKI indicator in the context is set to one, this is done using the MKI in the SRTP packet; otherwise, the index from the previous step is used.

4. Determine the session keys, and session salt (if used by the transform) as described in Section 4.3, using master key, master salt, key_derivation_rate and session key-lengths in the cryptographic context and the index, determined in Steps 2 and 3.
5. If message authentication and replay protection are provided, first check if the packet has been replayed, as described in Section 3.2.2, using the Replay List in the context and the index as determined in Step 2. If the packet is judged to be replayed, then the packet MUST be discarded, and the event SHOULD be logged.

   Next, perform verification of the authentication tag, using the index (rollover counter when used by the transform) from Step 2, the authentication algorithm indicated in the cryptographic context, and the session authentication key from Step 4. If the result is 'AUTHENTICATION FAILURE' (see Section 4.2), the packet MUST be discarded from further processing and the event SHOULD be logged.

6. If encryption is provided, decrypt the Encrypted Portion of the packet (see Section 4.1, for the defined ciphers), using the decryption algorithm indicated in the cryptographic context, the session encryption key and salt found in Step 4, and the index from Step 2.

7. Update the rollover counter and last sequence number, s_l, in the cryptographic context as in Section 3.2.1, using the packet index estimated in Step 2. If replay protection is provided, also update the Replay List as described in Section 3.2.2.

8. When applicable, delete the MKI and authentication tag fields from the packet.

3.2.1 Packet Index Determination, and ROC, s_l Update

SRTP implementations use an 'implicit' packet index for sequencing, i.e., not all of the index is explicitly carried in the SRTP packet, as described below. For the pre-defined transforms, the index i is used in replay protection (Section 3.2.2), encryption and message authentication (Sections 4.1 and 4.2), and for the key derivation (Section 4.3). It MAY also be used to determine the correct master key as indicated above.

When the session starts, the sender side MUST set the rollover counter, ROC, to zero.
Each time the RTP sequence number, SEQ, wraps modulo 2^16, the sender side MUST increment ROC by one, modulo 2^32 (see security aspects below). The sender's packet index is then defined as

     i = 2^16 * ROC + SEQ.

Receiver-side implementations use the RTP sequence number to estimate the correct index, that is, the location of the packet in the sequence of all SRTP packets. Here, the index is defined as 2^16 * v + SEQ, where the RTP sequence number is SEQ, and v is an estimate for the current value of the rollover counter, ROC. This estimate is based on SEQ, a previous estimate for ROC and the value s_l. The latter two are maintained locally by the receiver as described below.

A robust approach for the proper use of a rollover counter for the pre-defined transforms requires its handling and use to be well defined. In particular, out-of-order RTP packets with sequence numbers close to 2^16 or zero must be properly dealt with.

Initially, the receiver MUST be given the current ROC value from the sender using out of band signaling (or ROC is zero at the beginning of the session), see Section 9. Furthermore, the receiver SHALL initialize s_l to the RTP sequence number (SEQ) of the first observed SRTP packet.

On consecutive SRTP packets, the receiver MAY estimate the index as

     i = 2^16 * v + SEQ,

where v is chosen from the set { ROC-1, ROC, ROC+1 } (modulo 2^32) such that i is closest (in modulo 2^48 sense) to the value 2^16 * ROC + s_l.

After the packet has been processed using the estimated index, the receiver MUST decide if s_l and ROC should be updated. For instance, a simple (but not error robust) method is to simply set s_l to SEQ and, if the value v = ROC+1 was used, to update ROC to v. Caveat: if message authentication is not present, neither the initialization of s_l, nor the ROC update can be made completely robust on the receiving side.

After a re-keying (changing to a new master key) occurs, the rollover counter maintains its sequence of values, i.e., it MUST NOT be reset to zero, to avoid inconsistencies in key life-times.

As the rollover counter is 32 bits long and the sequence number is 16 bits long, the maximum number of packets that can be secured with the same key is 2^48 using the pre-defined transforms. After that number of SRTP packets have been sent with a given (master or session) key, the sender MUST NOT send any more packets with that key.
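The receiver-side index estimate described earlier in this section can be sketched in Python as follows. This is a minimal, literal reading of the "closest in modulo 2^48 sense" rule, not the normative pseudocode of Appendix A; the function names are hypothetical, and resolving an exact tie in favour of the current ROC is an assumption made here.

```python
MOD32 = 1 << 32
MOD48 = 1 << 48


def estimate_index(roc: int, s_l: int, seq: int):
    """Return (v, i) with i = 2^16 * v + SEQ closest to 2^16 * ROC + s_l."""
    reference = (roc << 16) + s_l
    best = None
    # ROC is tried first so that an exact tie keeps the current ROC
    # (an assumption of this sketch).
    for v in (roc, (roc - 1) % MOD32, (roc + 1) % MOD32):
        i = (v << 16) + seq
        diff = (i - reference) % MOD48
        distance = min(diff, MOD48 - diff)   # distance modulo 2^48
        if best is None or distance < best[0]:
            best = (distance, v, i)
    return best[1], best[2]


def simple_update(roc: int, seq: int, v: int):
    """The simple, not error-robust update: s_l = SEQ, ROC = v if v = ROC+1."""
    if v == (roc + 1) % MOD32:
        roc = v
    return roc, seq   # new (ROC, s_l)
```

With ROC = 0 and s_l = 65530, a packet carrying SEQ = 3 is placed in the next rollover period (v = 1), whereas with ROC = 1 and s_l = 5, a late packet carrying SEQ = 65533 is placed in the previous one (v = 0).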
(There exists a similar limit for SRTCP, which in practice may be more restrictive, see Section 3.3 and the summary in Section 10.2.) This limitation enforces a security benefit by providing an upper bound on the amount of traffic that can pass before cryptographic keys are changed. Re-keying (see Section 9) MUST be triggered, before this amount of traffic, and MAY be triggered earlier, e.g., for increased security and access control to media. Re-occurring key derivation, as determined by a non-zero key_derivation_rate (see Section 4.3), also gives stronger security, but does not change the above absolute maximum value, i.e. the master key shall still be used for a maximum of 2^48 SRTP packets (or 2^31 SRTCP packets, see below).

The receiver's 'implicit index' approach works for the pre-defined transforms as long as the reorder and loss of the packets are not too great and bit-errors do not occur in unfortunate ways. In particular, 2^15 packets would need to be lost, or a packet would need to be 2^15 packets out of sequence in order for synchronization to be lost. Such drastic loss or reorder is likely to disrupt the RTP application itself.

3.2.2 Replay Protection

Secure replay protection is only possible when integrity protection is present. It is RECOMMENDED to use replay protection, both for RTP and RTCP, as integrity protection alone cannot assure security against replay attacks.

A packet is 'replayed' when it is stored by an adversary, and then re-injected into the network. SRTP provides protection against such attacks whenever message authentication is provided, through the storage of the indices of the most recently received and authenticated packets. Each SRTP receiver maintains a Replay List, which conceptually contains the indices of all of the packets which have been received and authenticated. In practice, the list can use a 'sliding window' approach, so that a fixed amount of storage suffices for replay protection. Packet indices which lag behind the packet index in the context by more than SRTP-WINDOW-SIZE can be assumed to have been received, where SRTP-WINDOW-SIZE is a receiver-side, implementation-dependent parameter and MUST be at least 64, but which MAY be set to a higher value.

The receiver checks the index of an incoming packet against the replay list and the window. Only packets with index ahead of the window, or, inside the window but not already received, SHALL be accepted.
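A sliding-window Replay List of this kind might be sketched in Python as below. The class name ReplayWindow and its methods are hypothetical; the bitmap layout (bit k records index top - k) is one possible choice in the spirit of the bitmap approach of [RFC2401]. The check method corresponds to the acceptance rule above, and update is meant to be called only once the packet has been authenticated.

```python
class ReplayWindow:
    """Bitmap-based replay list; bit k of the bitmap records index top - k."""

    def __init__(self, size: int = 64):
        if size < 64:
            raise ValueError("SRTP-WINDOW-SIZE MUST be at least 64")
        self.size = size
        self.top = None    # highest index authenticated so far
        self.bitmap = 0

    def check(self, index: int) -> bool:
        """Return True if the packet may be accepted (not judged replayed)."""
        if self.top is None or index > self.top:
            return True                      # ahead of the window
        offset = self.top - index
        if offset >= self.size:
            return False                     # too old: assumed received
        return not (self.bitmap >> offset) & 1

    def update(self, index: int) -> None:
        """Record an authenticated index, moving the window ahead if needed."""
        if self.top is None:
            self.top, self.bitmap = index, 1
        elif index > self.top:
            shift = index - self.top
            if shift >= self.size:
                self.bitmap = 1              # nothing old stays in the window
            else:
                mask = (1 << self.size) - 1
                self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.top = index
        elif self.top - index < self.size:
            self.bitmap |= 1 << (self.top - index)
```

A receiver would call check before verifying the authentication tag and update only after verification succeeds, so that forged packets cannot advance the window.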
After the packet has been (successfully) authenticated (if necessary the window is first moved ahead) the replay list SHALL be updated with the new index.

The Replay List can be efficiently implemented by using a bitmap to represent which packets have been received, as described in the Security Architecture for IP [RFC2401].

3.3 Secure RTCP
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |V=2|P|    RC   |  PT=SR or RR  |             length            |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |                        SSRC of sender                         |
| +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                              ...                              |
| | |                          sender info                          |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                              ...                              |
| | |                        report block 1                         |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                              ...                              |
| | |                        report block 2                         |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                                                               |
| | |                              ...                              |
| | |                                                               |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |V=2|P|    SC   |  PT=SDES=202  |             length            |
| | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                          SSRC/CSRC_1                          |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                          SDES items                           |
| | |                              ...                              |
| | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                                                               |
| | |                              ...                              |
| | |                                                               |
| +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |E|                         SRTCP index                         |
+-|>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | ~                      SRTP MKI (optional)                      ~
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | ~                      authentication tag                       ~
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| +-- Encrypted Portion
+---- Authenticated Portion

Figure 2. An example of the format of a Secure RTCP packet, consisting of an underlying RTCP compound packet with a Report and SDES packet.

Secure RTCP follows the definition of Secure RTP. SRTCP adds three new fields to the RTCP packet definition, the SRTCP index, an 'encrypt-flag', and the authentication tag. Those fields MUST be appended to an RTCP packet when at least integrity protection (which is mandatory) is applied to the RTCP packet, in order to form an equivalent SRTCP packet, so that the added fields follow any other profile specific extensions. SRTCP adds an optional fourth field, the MKI, which functions according to the MKI definition in Section 3. An SRTCP packet is illustrated in Figure 2.

According to [RFC1889] there is a 'recommended' packet format for compound packets. SRTCP MUST be given packets according to that recommendation in the sense that the first part MUST be a send/receive report. However, the so-called encryption prefix (Section 6.1 of [RFC1889]), a random 32-bit quantity intended to deter known plaintext attacks, MUST NOT be used (see below).

The Encrypted Portion of an SRTCP packet consists of the encryption of the RTCP payload of the equivalent compound RTCP packet, from the first RTCP packet, i.e., from the ninth (9) byte to the end of the compound packet. The Authenticated Portion of an SRTCP packet consists of the entire equivalent (possibly compound) RTCP packet, the E-flag, and the SRTCP index (after any encryption has been applied to the payload).

The added fields are:

E-flag: 1 bit, mandatory

The E-flag indicates if the current SRTCP packet is encrypted or unencrypted. Section 9.1 of [RFC1889] allows the split of a compound RTCP packet into two lower-layer packets, one to be encrypted and one to be sent in the clear. The E bit set to '1' indicates encrypted packet, and '0' indicates non-encrypted packet.

SRTCP index: 31 bits, mandatory

The SRTCP index is a 31-bit counter for the SRTCP packet. The index is explicitly included in each packet, in contrast to the 'implicit' index approach used for SRTP. The SRTCP index MUST be set to zero before the first SRTCP packet is sent, and MUST be incremented by one, modulo 2^31, after each SRTCP packet is sent.
In particular, after a re-key, the SRTCP index MUST NOT be reset to zero again (cf. Section 3.2.1).

Authentication Tag: variable length, mandatory

The authentication tag shall be used to carry message authentication data.

The optional field is the variable-length MKI (see Section 3).

SRTCP uses the cryptographic context parameters and packet processing of SRTP, with the following changes:
* The receiver need not 'estimate' the index, as it is explicitly signaled in the packet.

* If the MKI indicator in the cryptographic context is zero, the master key is determined by the current SRTP index, even though SRTCP has its own index. Since the SRTCP source, as with any SSRC in an SRTP session, has its own sequence number space, the master key <From, To> lifetime MUST be based on the SRTP master key lifetime. The concomitant re-keying issues are discussed in Sections 9 and 10.

* Pre-defined SRTCP encryption is as defined in Section 4, but using the definition of the SRTCP Encrypted Portion as defined in this section, and using the SRTCP index as the index i. The encryption transform and related parameters SHALL by default be the same selected for the protection of the associated SRTP stream(s), while the NULL algorithm shall be applied to the RTCP packets not to be encrypted. Note that the master key and salt are shared between SRTP and SRTCP, but the (encryption) session key and salt will be distinct due to the key derivation definition (Section 4.3). The E-flag is assigned values by the sender depending on whether the packet was encrypted or not.

* SRTCP decryption is performed as in Section 4, but only if the E-flag is equal to 1. If so, the Encrypted Portion is decrypted, using the SRTCP index as the index i. In case the E-flag is 0, the payload is simply left unmodified.

* SRTCP replay protection is as defined in Section 3.2.2, but using the SRTCP index as the index i, and as noted maintains a separate replay list specific to SRTCP.

* The pre-defined SRTCP authentication tag is defined as in Section 4, but with the Authenticated Portion of the SRTCP packet defined in this section (which includes the index). The authentication transform and related parameters (e.g., key size) SHALL by default be the same as selected for the protection of the associated SRTP stream(s).
(Exception: when SRTP is not authenticated, the default authentication transform MUST be used for SRTCP.) Note that the master key is shared between SRTP and SRTCP, but the (authentication) session key will be distinct due to the key derivation definition (Section 4.3).

* In the last step of the processing, only the sender needs to update the value of the SRTCP index, by incrementing it modulo 2^31 (and for security reasons the sender MUST also check the number of RTCP packets processed, see below).

There MAY also exist some minor transform-specific changes; see Section 4 for the defined transforms.
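The sender-side index update in the last bullet can be sketched as follows. This is a minimal illustration only: the class name, its fields, and the choice to use the index first and increment it afterwards are assumptions made for this sketch, not structures defined by this document. The 2^31 key-usage limit is the one stated later in this section.

```python
SRTCP_INDEX_MODULUS = 2 ** 31   # the SRTCP index is a 31-bit quantity
MAX_PACKETS_PER_KEY = 2 ** 31   # limit on SRTCP packets under one session or master key


class SrtcpSenderCounter:
    """Hypothetical sender-side bookkeeping (illustrative name and layout)."""

    def __init__(self, index=0, packets_under_key=0):
        self.index = index
        self.packets_under_key = packets_under_key

    def rekey(self):
        # A re-key restarts the usage count, but the index is NOT reset to zero.
        self.packets_under_key = 0

    def next_packet(self):
        """Return the index for the packet being sent, then advance the state."""
        if self.packets_under_key >= MAX_PACKETS_PER_KEY:
            raise RuntimeError("key-usage limit reached; new keys are required")
        current = self.index
        # Last processing step: increment the index modulo 2^31.
        self.index = (self.index + 1) % SRTCP_INDEX_MODULUS
        self.packets_under_key += 1
        return current
```

Keeping the packet count separate from the index reflects the point that, after a re-key, reaching the usage limit need not coincide with the index wrapping.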
As noted, the encryption prefix (Section 6.1 of [RFC1889]) is not to be used, because this mechanism supports ciphers that are not secure against known-plaintext attacks. Ciphers that are not secure against known-plaintext attacks SHOULD NOT be used to encrypt RTP messages. The pre-defined SRTP encryption uses a secure, additive stream cipher, and thus the prefix offers no benefit.

The maximum number of SRTCP packets with a given session or master key is limited to 2^31. Due to, for example, re-keying, reaching this limit may or may not coincide with wrapping of the SRTCP index, and thus the sender MUST be able to deduce the packet count, e.g., as indicated before. Also, since the session keys for SRTP and SRTCP are by default derived from the same master key, new session and master keys for both protocols MUST be obtained before either of the two protocols reaches its maximum key-usage limit (cf. Section 3.2.1).

Message authentication for RTCP is REQUIRED, as it is the control protocol (e.g., it has a BYE packet). Note also that the cost in total bandwidth for RTCP authentication is not as high as that of RTP authentication, as the recommended session bandwidth allocated to RTCP is at most 5% and the RTCP packets are less frequent. However, when adding authentication to RTCP, the overhead in bandwidth SHOULD be considered (the bandwidth will be more than 5%). Note, however, that large-scale multicast application of SRTCP might require careful consideration in configuration and use, see Section 12.

The security risks that arise wherever SRTCP is not used MUST be taken seriously into consideration.

4. Pre-Defined Cryptographic Transforms

While there are numerous encryption and message authentication algorithms that can be used in SRTP, we define below default algorithms in order to avoid the complexity of specifying the encodings for the signaling of algorithm and parameter identifiers.
The defined algorithms have been chosen as they fulfill the goals listed in Section 2. Recommendations on how to extend SRTP with new transforms are given in Section 7.

4.1 Encryption

The following parameters are generic and common to all pre-defined, non-NULL, encryption transforms.

* BLOCK CIPHER and mode are the block cipher used and its mode of operation (the default is AES in counter mode, see below)
* n_b is the bit-size of the block for the block cipher
* k_e is the session encrypting key
* n_e is the bit-length of k_e (the default is 128 bits)
* k_s is the so-called session salting key
* n_s is the bit-length of k_s. n_s is at most n_b - 16 bits, and the default value is the maximum (n_b - 16).
* SRTP_PREFIX_LENGTH is the octet length of the keystream prefix, an (at least) 8-bit non-negative integer, inferred from the message authentication code in use.

The distinct session keys and salts for SRTP/SRTCP are by default derived as specified in Section 4.3.

The encryption transforms defined in SRTP use a 'seekable' segmented keystream generator (KG), which for each secret key maps the SRTP packet index into a pseudorandom keystream segment, used to encrypt a single RTP packet. The process of encrypting a packet consists of generating the keystream segment corresponding to the packet, and then bitwise exclusive-oring that keystream segment onto the payload of the RTP packet to produce the Encrypted Portion of the SRTP packet. Decryption is done the same way, but swapping the roles of the plaintext and ciphertext.

The definition of how the keystream is generated, given the index, depends on the cipher and its mode of operation. Below, two such keystream generators are defined. The NULL cipher is also defined, to be used when encryption of RTP is not required.

The initial octets of each keystream segment MAY be reserved for use in a message authentication code, in which case the keystream used for encryption starts immediately after the last reserved octet. The initial reserved octets are called the keystream prefix (not to be confused with the so-called 'encryption prefix' of [RFC1889, Section 6.1]), and the remaining octets are called the keystream suffix. This process is illustrated in Figure 3.
   +----+   +------------------+---------------------------------+
   | KG |-->| Keystream Prefix |        Keystream Suffix         |---+
   +----+   +------------------+---------------------------------+   |
                                                                     |
                               +---------------------------------+   v
                               | Encrypted Portion of RTP Packet |->(*)
                               +---------------------------------+   |
                                                                     |
                               +---------------------------------+   |
                               | Encrypted Portion of SRTP Packet|<--+
                               +---------------------------------+

Figure 3: Default SRTP Encryption Processing. Here KG denotes the keystream generator, and (*) denotes bitwise exclusive-or.

The number of octets in the keystream prefix is denoted as
SRTP_PREFIX_LENGTH. The keystream prefix is reserved for use with certain message authentication transforms, such as the pre-defined TMMH transform (Section 4.2.2). The presence of a prefix is indicated by a positive (non-zero) value of SRTP_PREFIX_LENGTH. This means that, even if confidentiality is not to be provided, the keystream generator output MAY still need to be computed for packet authentication, in which case the default keystream generator (mode) SHALL be used.

The default cipher is the Advanced Encryption Standard (AES), and we define two modes of running AES: Segmented Integer Counter Mode AES and AES in f8-mode. In the sequel, let E(k,x) be AES applied to key k and input block x. AES has (default) n_e = 128-bit key size and (always) n_b = 128-bit block size.

4.1.1 AES in Counter Mode

Conceptually, counter mode consists of encrypting successive integers. The actual definition is somewhat more complicated, in order to randomize the starting point of the integer sequence. Each packet is encrypted with a distinct keystream segment, which is computed as follows.

4.1.1.1 Keystream Generation

A keystream segment is the concatenation of the 128-bit output blocks of the AES cipher in the encrypt direction, using key k = k_e, in which the block indices are in increasing order. Symbolically, each keystream segment looks like

   E(k, IV) || E(k, IV + 1 mod 2^128) || E(k, IV + 2 mod 2^128) ...

where the 128-bit integer value IV SHALL be defined by the SSRC, the SRTP packet index i, and the SRTP session salting key k_s, as below.

   IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16)

The inclusion of the SSRC allows the use of the same key to protect distinct SRTP streams. Exploiting this feature is subject to requirements; see the security caveats in Section 10.1.

(In the case of SRTCP, the SSRC of the first header of the compound packet MUST be used, i SHALL be the 31-bit SRTCP index, and k_s SHALL be replaced by the SRTCP session salt.)
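The IV formula and the sequence of counter-mode input blocks can be illustrated with plain integer arithmetic. The sketch below does not perform AES; it only forms the values that would be fed to E. The function names and the big-endian (network order) serialization of each 128-bit block are assumptions made for this illustration.

```python
def aes_cm_iv(k_s: int, ssrc: int, index: int) -> int:
    """IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16), as a 128-bit integer."""
    if not 0 <= k_s < 2 ** 112:      # n_s is at most n_b - 16 = 112 bits
        raise ValueError("k_s must fit in 112 bits")
    if not 0 <= ssrc < 2 ** 32:
        raise ValueError("SSRC must fit in 32 bits")
    if not 0 <= index < 2 ** 48:     # SRTP index is the 48-bit ROC || SEQ
        raise ValueError("index must fit in 48 bits")
    return (k_s << 16) ^ (ssrc << 64) ^ (index << 16)


def counter_blocks(iv: int, n_blocks: int) -> list:
    """Inputs to E for one keystream segment: IV, IV + 1, ... (mod 2^128)."""
    if n_blocks > 2 ** 16:
        raise ValueError("at most 2^16 blocks per IV")
    # Serialization as 16 big-endian octets is an assumption of this sketch.
    return [((iv + j) % 2 ** 128).to_bytes(16, "big") for j in range(n_blocks)]
```

Because every term is shifted left by at least 16 bits, the low 16 bits of the IV are zero, which leaves room for the per-packet block counter.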
Note that the initial value, IV, is fixed for each packet. The number of blocks of keystream generated for any fixed value of IV MUST NOT exceed 2^16. AES has a block size of 128 bits, so 2^16 output blocks are sufficient to generate the 2^23 bits of keystream needed to encrypt the largest possible RTP packet (except for IPv6
'jumbograms' [RFC2675], which are not likely to be used for RTP-based multimedia traffic). This restriction on the maximum bit-size of the packet that can be encrypted ensures the security of the encryption method by limiting the effectiveness of probabilistic attacks [BDJR].

4.1.2 AES in f8-mode

To encrypt UMTS (Universal Mobile Telecommunications System, as in 3G networks) data, a solution (see [ES3D]) known as the f8-algorithm has been developed. On a high level, the proposed scheme is a variant of Output Feedback Mode (OFB) [HAC], with a more elaborate initialization and feedback function. As in normal OFB, the core consists of a block cipher. We also define here the use of AES as a block cipher to be used in f8-mode for RTP encryption, with default 128-bit key and block size.

Figure 4 shows the structure of block cipher, E, running in what we shall call 'f8-mode of operation'.

                  IV
                   |
                   |
                   v
               +------+
               |      |
          +--->|  E   |
          |    |      |
          |    +------+
          |        |
    m -> (*)       +-----------+-------------+--  ...     ------+
          |    IV' |           |             |                  |
          |        |   j=1 -> (*)    j=2 -> (*)   ...  j=L-1 ->(*)
          |        |           |             |                  |
          |        |      +-> (*)       +-> (*)   ...      +-> (*)
          |        |      |    |        |    |             |    |
          |        v      |    v        |    v             |    v
          |    +------+   | +------+    | +------+         | +------+
          |    |      |   | |      |    | |      |         | |      |
   k_e ---+--->|  E   |   | |  E   |    | |  E   |         | |  E   |
               |      |   | |      |    | |      |         | |      |
               +------+   | +------+    | +------+         | +------+
                   |      |    |        |    |             |    |
                   +------+    +--------+    +--  ...  ----+    |
                   |           |             |                  |
                   v           v             v                  v
                  S(0)        S(1)          S(2)  . . .       S(L-1)

Figure 4. f8-mode of operation (asterisk, (*), denotes bitwise XOR).
The figure represents the KG in Figure 3, when AES-f8 is used.

4.1.2.1 f8 Keystream Generation

As above, let E(k_e,x) be the 128-bit output of AES in the encrypt direction when applied to the key k_e and n_b = 128-bit plaintext block x. The Initialization Vector (IV) is determined as described in Section 4.1.2.2.

Let IV', S(j), and m denote n_b-bit blocks, determined below. The keystream, S(0) || ... || S(L-1), for an N-bit message is defined by setting IV' = E(k_e XOR m, IV), and S(-1) = 00..0. For j = 0,1,..,L-1, where L = N/n_b (rounded up to the nearest integer), compute

   S(j) = E(k_e, IV' XOR j XOR S(j-1))

Notice that the IV is not used directly. Instead, it is fed through E under another key to produce an internal, 'masked' value (denoted IV') to prevent an attacker from gaining known input/output pairs.

The role of the internal counter is to prevent short keystream cycles. The value of the key mask m is defined to be

   m = k_s || 0x555..5,

i.e., the session salting key, followed by the binary pattern 0101.. to fill out the entire desired key size, n_e.

The maximum allowable packet size can be determined as follows. AES has a block size of 128 bits. Assuming that AES behaves like a random function, it is (heuristically) secure to generate somewhat fewer than 2^64 output blocks. We suggest a maximum of 2^32 blocks, which is sufficient to generate 2^39 bits of keystream. For practical sizes of RTP packets, far fewer blocks are required, and the counter j above will often be sufficient if implemented as a 16-bit counter.

4.1.2.2 f8 SRTP IV Formation

The purpose of the following IV formation is to provide a feature which we call implicit header authentication (IHA), see Section 10.5.1.

The SRTP IV for 128-bit block AES-f8 is formed in the following way:

   IV = 0x00 || M || PT || SEQ || TS || SSRC || ROC

M, PT, SEQ, TS, SSRC SHALL be taken from the RTP header; ROC is from the cryptographic context.
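The f8 definitions above can be sketched structurally as follows. The block cipher is passed in as a callable E(key, block); the `toy_E` helper is explicitly NOT AES and exists only so that the sketch runs with the standard library. All function names, the representation of the counter j as a 128-bit big-endian block, and the return of whole blocks (leaving truncation to the caller) are assumptions of this illustration.

```python
import hashlib
import struct

BLOCK = 16  # n_b = 128 bits, in octets


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def f8_key_mask(k_s: bytes, n_e_octets: int = 16) -> bytes:
    """m = k_s || 0x555..5, filled out to the key size n_e."""
    return k_s + b"\x55" * (n_e_octets - len(k_s))


def f8_srtp_iv(marker: int, pt: int, seq: int, ts: int, ssrc: int, roc: int) -> bytes:
    """IV = 0x00 || M || PT || SEQ || TS || SSRC || ROC (16 octets)."""
    # M (1 bit) and PT (7 bits) share one octet, as in the RTP header.
    return struct.pack("!BBHIII", 0, (marker << 7) | pt, seq, ts, ssrc, roc)


def f8_keystream(E, k_e: bytes, k_s: bytes, iv: bytes, n_bits: int) -> bytes:
    """Return S(0) || ... || S(L-1), with L = ceil(N / n_b)."""
    m = f8_key_mask(k_s, len(k_e))
    iv_masked = E(xor(k_e, m), iv)            # IV' = E(k_e XOR m, IV)
    s_prev = bytes(BLOCK)                     # S(-1) = 00..0
    n_blocks = -(-n_bits // (8 * BLOCK))      # round N / n_b up
    out = []
    for j in range(n_blocks):
        j_block = j.to_bytes(BLOCK, "big")    # counter encoding is an assumption
        s_prev = E(k_e, xor(xor(iv_masked, j_block), s_prev))
        out.append(s_prev)
    return b"".join(out)


def toy_E(key: bytes, block: bytes) -> bytes:
    """Stand-in keyed function, NOT AES; for exercising the structure only."""
    return hashlib.sha256(key + block).digest()[:BLOCK]
```

In a real implementation, E would be AES encryption of one block; only the chaining structure is meant to carry over from this sketch.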
The presence of the SSRC as part of the IV allows AES-f8 to be used when a master key is shared between multiple streams, see Section 10.1.

4.1.2.3 f8 SRTCP IV Formation

The SRTCP IV for 128-bit block AES-f8 is formed in the following way:

   IV = 0...0 || E || SRTCP index || V || P || RC || PT || length || SSRC

where V, P, RC, PT, length, SSRC SHALL be taken from the first header in the RTCP compound packet. E and SRTCP index are the 1-bit and 31-bit fields added to the packet.

4.1.3 NULL Cipher

The NULL cipher is used when no confidentiality for RTP/RTCP is requested. The keystream can be thought of as "000..0", i.e., the encryption simply copies the plaintext input into the ciphertext output.

4.2 Message Authentication and Integrity

Common parameters:

* k_a is the session message authentication key
* n_a is the bit-length of the authentication key (the default for the default transform is 128 bits)
* n_tag is the bit-length of the output authentication tag (the default is 32 bits)
* SRTP_PREFIX_LENGTH is the octet length of the keystream prefix as defined above
* M is the Authenticated Portion as specified in Section 3 for RTP and in Section 3.3 for RTCP.

The distinct session authentication keys for SRTP/SRTCP are by default derived as specified in Section 4.3.

The values of n_a, n_tag, and SRTP_PREFIX_LENGTH MUST be fixed for any particular fixed value of the key.

Below we describe the process of computing authentication tags. The sender computes the tag of the Authenticated Portion and appends it to the packet. The SRTP receiver verifies a message/authentication tag pair as follows. A new authentication tag is computed over the Authenticated Portion using the selected algorithm and key, and it is compared to the tag associated with the received message. If the two tags are equal, then the message/tag pair is valid; otherwise,
it is invalid and the error audit message "AUTHENTICATION FAILURE" MUST be returned.

4.2.1. HMAC/SHA1

When HMAC/SHA1 is used, the SRTP_PREFIX_LENGTH is 0. For SRTP, the HMAC is applied to the concatenation of the Authenticated Portion of the packet (M) and the rollover counter in the cryptographic context, i.e., HMAC(k_a, M || ROC). For SRTCP, we apply HMAC to the corresponding M only (as it already includes the SRTCP index). By default, the output shall be truncated to the n_tag left-most bits. The default value for n_a is 128 bits, and for n_tag it is 32 bits.

4.2.2 TMMH

This document describes TMMH version two, which is not interoperable with the earlier version. In the following, the term TMMH refers to version two.

TMMH is a simple function that maps a key and a message to a hash value. This hash value is encrypted by combining it with the keystream prefix to make the authentication tag, as described below.

The key, message, and hash value are treated as sequences of unsigned sixteen-bit integers in network byte order. In the following, we call such a 16-bit integer a word. The number of octets in the key and hash value MUST be a multiple of two (to be word-aligned). (Thus, n_tag and n_a MUST be multiples of 16.)

Besides the above common parameters, we define the following parameters for TMMH:

* TAG_WORDS, the number of words in the hash value, i.e., n_tag/16. The default is 2. The value of TAG_WORDS also defines the quantity SRTP_PREFIX_LENGTH to be TAG_WORDS * 2 (see Section 4.1). The number of words of the key, i.e., n_a/16, depends on the maximum length of any message to be authenticated as follows:

   MAX_MSG_LENGTH (octets)      Key size (16-bit words)
   ----------------------------------------------------
                16                 7 + 2 * TAG_WORDS
               128                14 + 3 * TAG_WORDS
              1024                21 + 4 * TAG_WORDS
              8192                28 + 5 * TAG_WORDS
             65536                35 + 6 * TAG_WORDS
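The key-size table above can be turned into a small lookup. This is a sketch only: the function name is hypothetical, and treating each MAX_MSG_LENGTH value as an inclusive upper bound on the message length is an assumption about how the table is meant to be read.

```python
# (MAX_MSG_LENGTH in octets, fixed words, words per tag word), from the table above
_TMMH_KEY_TABLE = [
    (16, 7, 2),
    (128, 14, 3),
    (1024, 21, 4),
    (8192, 28, 5),
    (65536, 35, 6),
]


def tmmh_key_words(max_msg_octets: int, tag_words: int = 2) -> int:
    """Number of 16-bit key words needed for messages up to max_msg_octets."""
    for limit, base, per_tag in _TMMH_KEY_TABLE:
        if max_msg_octets <= limit:      # inclusive bound is an assumption
            return base + per_tag * tag_words
    raise ValueError("message longer than 65536 octets")
```

For example, `tmmh_key_words(80)` selects the 128-octet row and gives 14 + 3 * 2 = 20 words, i.e., 320 bits, while `tmmh_key_words(65536)` gives 35 + 6 * 2 = 47 words, i.e., 94 octets or 752 bits.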
For instance, an RTP packet of length 80 bytes to be authenticated with a 32-bit (2 word) tag requires a 320-bit TMMH key. The default key-size is defined below. However, applications that know beforehand the size of the longest message they will ever encounter MAY choose a smaller key-size.

* TAG is the authentication tag, which is the output of TMMH
* PREFIX is the keystream prefix for the current packet as defined in Section 4.1.
* K is the key, i.e., k_a as obtained by applying the key derivation. The default key-size is 94 octets, or 752 bits (to accommodate messages up to 65536 bytes with a 2-word tag).
* MSG_LEN, the number of octets in the message (before padding; all-zero padding is needed to align to a word boundary when the message contains an odd number of octets)
* p is equal to the prime number 2^16 + 1.

In the following, we use the symbol * to denote integer multiplication and the symbol +32 to denote integer addition modulo 2^32.

TMMH uses several key-dependent internal data structures: the length multiplier array L, and an array of subkeys A. The length multiplier array L is an array of words, the ith element of which is denoted as L[i], with i ranging from zero to (TAG_WORDS - 1). A subkey is an array consisting of (TAG_WORDS + 7) words, and the ith element of the subkey S is denoted as S[i]. Five subkeys are used in TMMH. The subkeys are stored in an array denoted A. The ith subkey is denoted as A[i], with i ranging from zero to 4.

         +----+----+----+----+----+----+----+----+----+----+----+---
   Key   |K[0]|K[1]|K[2]|K[3]|K[4]|K[5]|K[6]|K[7]|K[8]|K[9]|K[a]|...
         +----+----+----+----+----+----+----+----+----+----+----+---

         +----+----+--------------------------------------------+---
   Field |L[0]|L[1]|                    A[0]                    |...
         +----+----+--------------------------------------------+---

Figure 5. An illustration of how the arrays L and A are assigned from the words of the TMMH key K. In this example, TAG_WORDS is equal to two.
Here K[i] denotes the ith word of the TMMH key (where i is a hexadecimal number). The field A[0] is the 0:th subkey.

The length multiplier array L and the subkey array A are taken from the TMMH key K as follows.
1. The value L[i] is set to K[i], for i = 0 to TAG_WORDS-1.

2. The value A[i][j] (the jth element of the ith subkey) is set to K[ TAG_WORDS + j + (TAG_WORDS + 7) * i ].

This process is illustrated in Figure 5.

We introduce the following notation: Let A_ij be the eight-word vector A[i][j] || A[i][j + 1] || ... || A[i][j + 7].

The function V(S, M) defined below maps a subkey S and an eight-word data string M to a 32-bit unsigned integer.

   V(S,M) = S[0] * M[0] +32 S[1] * M[1] +32 ... +32 S[7] * M[7]

Here +32 denotes integer addition modulo 2^32. The length of the subkey S may be greater than eight, but the excess words are ignored by the function V. (Defining V such that the most significant words of the subkey may be ignored simplifies the exposition below.) If the message consists of fewer than 8 words, the remaining words are set to zero.

The function U(S, M) is defined as

   U(S, M) = [ V(S, M) modulo p ] modulo 2^16.

The core of TMMH is a 'compression' function C which maps a subkey value and an input string to an output string which is about eight times smaller than the input string. To compute C(S, D) for a given subkey value S and data string D of w words, do the following.

1) Divide up D into blocks of eight words each (note that the last block may contain fewer than eight words), i.e.,

      D = D[0] || D[1] || ... || D[ ceil(w/8) - 1 ]

   where D[i] is the ith block, || denotes concatenation, and ceil(x) denotes the smallest integer not less than x.

2) Apply the function U to each block, using the subkey value S each time, then concatenate the outputs as follows:

      C(S, D) = U(S, D[0]) || U(S, D[1]) || ... || U(S, D[ceil(w/8) - 1]).

The j:th word (j starting from zero), T[j], of the TMMH tag is computed using the following algorithm:

   set X to M and set i to zero
   while the number of words in X is greater than eight, do
      set X to C(A_ij, X)
      increment i
   end while
   return [ [ [ L[j] * MSG_LEN ] +32 V(A_ij, X) ] mod p ] mod 2^16

To use TMMH to compute the authentication tag TAG of a message, the TMMH hash value of that message is computed, then that value is combined with the keystream prefix defined in Section 4.1. The combining operation is word-wise addition modulo 2^16:

   TAG[j] = (T[j] + PREFIX[j]) mod 2^16, for j = 0 to TAG_WORDS-1.

Note that for RTP, where HMAC is applied to M || ROC, TMMH is applied to M only. This is because, for TMMH, the dependence on ROC is inherent in the PREFIX quantity.

4.3 Key Derivation

4.3.1 Key Derivation Algorithm

Regardless of the encryption or message authentication transform that is employed (it may be a defined transform or newly introduced according to Section 7), SRTP key derivation is the process of generating session keys, without extra communication between the parties and in a sender-receiver synchronized way.

 master salt, packet index ---+
                              |
                              |
                              v
   +-----------+         +--------+ session encr_key
   | ext       | master  |        |---------->
   | key mgmt  | key     |  key   | session auth_key
   | (optional |-------->| deriv  |---------->
   | rekey)    |         |        | session salt_key
   |           |         |        |---------->
   +-----------+         +--------+

Figure 6: SRTP key derivation.

At least one initial key derivation is always performed by SRTP, i.e., the first key derivation is mandatory. Further applications of the key derivation MAY be performed, according to the 'key derivation-rate' value in the cryptographic context. The key derivation function is defined to be initially invoked before the first packet and then, if derivation rate is r > 0, to be further invoked on every r:th packet, and produce session keys according to
the non-zero key derivation rate. This can be thought of as 'refreshing' the session keys. The value of 'key_derivation_rate' MUST be kept fixed for the lifetime of the associated master key. There is also a derivation of session salting keys for encryption transforms that so require, e.g., both of the pre-defined transforms.

Let m and n be positive integers. A pseudo-random function family is a set of keyed functions {PRF_n(k,x)} such that for the (secret) random key k, given m-bit x, PRF_n(k,x) is an n-bit string, computationally indistinguishable from random n-bit strings, see [HAC]. For the purpose of key derivation in SRTP, a secure PRF with m = 128 (or more) is needed, and a default PRF transform is defined in Section 4.3.3.

Let a DIV t denote integer division of a by t, rounded down, and with the convention that a DIV 0 = 0 for all a. We also make the convention of treating a DIV t as a bit string of the same length as a, and thus 'a DIV t' will in general have leading zeros.

Key derivation is defined as follows. To generate session key(s) (and session salt(s)) for the current packet, let the n-bit SRTP key (or salt) for this packet be

   PRF_n(k_master, ((<label> || (index DIV key_derivation_rate)) XOR master_salt)*2^16)

where <label> is an 8-bit constant (see below), master_salt and key_derivation_rate are as determined in the cryptographic context, and index is the packet index (i.e., the 48-bit ROC || SEQ for SRTP).

The session keys and salt are now derived using:

- k_e (SRTP encryption): <label> = 0x00, n = n_e.
- k_a (SRTP message authentication): <label> = 0x01, n = n_a.
- k_s (SRTP salting key): <label> = 0x02, n = n_s.

where n_e, n_s, and n_a are also as determined in the cryptographic context.

The master key and master salting key MUST be random, but the master salt MAY be public. The default size for the master key is 128 bits and for the master salt, 112 bits.
Note that for a key_derivation_rate of 0, the initial application of the key derivation will take place once.

The derivation operation is facilitated if the non-zero rates are chosen to be powers of 256.
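The 128-bit input to the PRF in the key derivation formula above can be formed with integer arithmetic, as in the following sketch. The function names are hypothetical, and two points are assumptions of this illustration rather than statements of the text: the 56-bit value <label> || (index DIV key_derivation_rate) is XORed with the 112-bit master salt right-aligned (i.e., as integers), and the result is serialized as 16 big-endian octets.

```python
def div(a: int, t: int) -> int:
    """a DIV t, rounded down, with the convention a DIV 0 = 0."""
    return 0 if t == 0 else a // t


def key_derivation_input(label: int, index: int,
                         key_derivation_rate: int, master_salt: int) -> bytes:
    """x = ((<label> || (index DIV rate)) XOR master_salt) * 2^16, as 16 octets."""
    if not 0 <= label < 2 ** 8:
        raise ValueError("label is an 8-bit constant")
    if not 0 <= index < 2 ** 48:
        raise ValueError("SRTP index is the 48-bit ROC || SEQ")
    if not 0 <= master_salt < 2 ** 112:
        raise ValueError("default master salt is 112 bits")
    r = div(index, key_derivation_rate)   # kept as a 48-bit field
    key_id = (label << 48) | r            # <label> || (index DIV rate)
    x = (key_id ^ master_salt) << 16      # right-aligned XOR is an assumption
    return x.to_bytes(16, "big")
```

With a rate that is a power of 256, `div` amounts to dropping whole low-order octets of the index, which is one way to read the remark that such rates facilitate the derivation.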
The upper limit on the number of packets that can be secured using the same master key (see Section 10.2) is independent of the key derivation.

4.3.2 SRTCP Key Derivation

SRTCP uses the same master key as SRTP, i.e., it is shared between the two protocols. To do this securely, the following changes are made to Section 4.3.1 when applying session key derivation for SRTCP. Replace the SRTP index by the 32-bit quantity 0 || SRTCP index (i.e., excluding the E-bit, replacing it with a fixed 0-bit), and use <label> = 0x03 for the SRTCP encryption key, <label> = 0x04 for the SRTCP authentication key, and <label> = 0x05 for the SRTCP salting key.

4.3.3 AES-CM PRF

The currently defined PRF, keyed by 128 to 256 bit (master) keys, has input block size m = 128 and can produce n-bit outputs for n up to 2^23. We define PRF_n(k,x) to be AES in counter mode as described in Section 4.1.1, applied to key k, with IV equal to x, and with the output keystream truncated to the n first (left-most) bits. (This requires n/128, rounded up, applications of AES.)

5. Default and Mandatory Transforms

The default transforms are also the mandatory-to-implement transforms in SRTP. Of course, 'mandatory-to-implement' does not imply 'mandatory-to-use'.

5.1 Encryption: AES-CM and NULL

AES running in Segmented Integer Counter Mode, as defined in Section 4.1.1, is the default and mandatory-to-implement encryption algorithm. The NULL cipher is mandatory to implement too.

5.2 Message Authentication/Integrity: HMAC/SHA1

HMAC/SHA1, as defined in Section 4.2.1, is the default and mandatory-to-implement message authentication code.

5.3 Key Derivation: AES-CM PRF

The AES Counter Mode PRF defined in Sections 4.3.1 and 4.3.3, using a 128-bit key, is the default and mandatory-to-implement method for generating keys.
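As a rough illustration of the SRTCP changes in Section 4.3.2 and the output lengths handled by the AES-CM PRF of Section 4.3.3, the sketch below collects the labels and computes how many AES applications an n-bit output needs. The dictionary and function names are hypothetical; the label values and the round-up rule are taken from the text.

```python
# Label constants as listed in Sections 4.3.1 and 4.3.2
SRTP_LABELS = {"encryption": 0x00, "authentication": 0x01, "salt": 0x02}
SRTCP_LABELS = {"encryption": 0x03, "authentication": 0x04, "salt": 0x05}


def srtcp_derivation_index(srtcp_index: int) -> int:
    """32-bit quantity 0 || SRTCP index: the E-bit position is a fixed 0."""
    return srtcp_index & 0x7FFFFFFF


def prf_aes_invocations(n_bits: int) -> int:
    """Number of AES applications for an n-bit PRF output: n/128, rounded up."""
    if not 0 < n_bits <= 2 ** 23:
        raise ValueError("n must be between 1 and 2^23 bits")
    return -(-n_bits // 128)
```

Under the default sizes, each of the 128-bit keys and the 112-bit salt needs a single AES application, while a 752-bit TMMH key needs six.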
6. SRTP/SRTCP Parameters

The parameters for SRTP are listed in the following. Unless otherwise stated, SRTCP by default applies the same transforms and parameters as the corresponding SRTP stream, though they MAY also be independently selected.

The SRTP-WINDOW-SIZE is defined to be at least 64 (Section 3.2.3).

The currently defined modes are Segmented Integer Counter Mode (default), f8-mode (Section 4), and the NULL Cipher. The default cipher is AES (Section 4), which has a block size of n_b = 128 bits and default encryption key size n_e = 128 bits.

The currently defined message authentication functions are HMAC/SHA1 and TMMH. The default is absence of authentication for SRTP and HMAC/SHA1 for SRTCP. For HMAC/SHA1, the default key-size is n_a = 128 bits and the output length is n_tag = 32 bits. SRTP_PREFIX_LENGTH is 0. For TMMH, the default n_tag is also 32 bits, and the default n_a is 752 bits.

The default size of the master key shall be 128 bits, and the default size of the master and session salting keys shall be n_s = 112 bits.

The default value for the session key_derivation_rate field in the cryptographic context is "0", in practice meaning that the first application of the key derivation is performed (as it is mandatory), but no further applications of it.

7. Adding SRTP Transforms

Section 4 provides examples of the level of detail needed for defining transforms. Whenever a new transform is to be added to SRTP, a companion standards-track RFC MUST be written to exactly define how the new transform can be used with SRTP (and SRTCP). Such a companion RFC should avoid overlap with the SRTP protocol document. Note, however, that it might be necessary to extend the cryptographic context's definition with new parameters, or to add steps to the packet processing.

The companion RFC shall explain any known issues regarding interactions between the transform and other aspects of SRTP.
Encryption and message authentication transforms require some set of optional parameters or have optional modes of operation. The companion RFC shall select fixed or default values for these parameters (whenever possible), to reduce key management complexity.
The mode of operation of ciphers and related parameters (e.g., IV formation for SRTP and SRTCP) shall be defined.

Each new transform document should specify its key attributes, e.g., size of keys (minimum, maximum, recommended), format of keys, recommended/required processing of input keying material, requirements/recommendations on re-keying and key derivation, etc.

8. Rationale

8.1 Key derivation

Key derivation has been introduced to lighten the burden on the key exchange: the (up to) six different keys necessary to protect the RTP session (SRTP and SRTCP encryption keys and salts, SRTP and SRTCP authentication keys) are derived from a single master key in a cryptographically secure way. Note, however, that the key management protocol may provide SRTP with more than one master key, e.g., two distinct master keys with their respective lifetimes. The security stands (and falls) with the master key, as the derived session keys are cryptographically independent (under reasonable assumptions on the PRF, here AES-based).

Subsequent (after the first) applications of the key derivation are optional but will give security benefits when enabled. They prevent a cryptanalyst from obtaining large amounts of ciphertext produced by a single fixed session key. They provide backwards and forward security in the sense that a compromised session key does not compromise other session keys derived from the same master key (but of course, a leaked master key reveals all session keys).

Considerations arise with high-rate key refresh, especially in large multicast settings, see Section 12.

As the TMMH keys may be quite large, the key derivation provides a simple and secure way to obtain a sufficient amount of keying material.

8.2 Salting key

The master salt is introduced to guarantee security against off-line key-collision attacks on the key derivation that might otherwise reduce the effective key size.
The derived session salting key used in the encryption has been introduced to protect against some attacks on additive stream ciphers, see Section 10.2. The method of explicitly including the salt in the IV has been selected for ease of hardware implementation.
8.3 TMMH: Message Integrity from Universal Hashing

The Truncated Multi-Modular Hash Function (TMMH) is a so-called universal hash function family, suitable for message authentication in the Wegman-Carter paradigm [WC81]. It is simple, quick, and especially appropriate for Digital Signal Processors and other processors with a fast multiply operation, though a straightforward implementation requires storage equal in length to the largest message to be hashed.

TMMH offers secure (provably secure under randomness assumptions on the added prefix) and very efficient MACs. For a given tag size (TAG_WORDS), the forgery probability can be shown to be upper bounded by approximately 2^(-11*TAG_WORDS).

However, as this approach to message integrity is new (not conceptually, but within standardization), we have chosen to make HMAC the default transform, as many devices already have an HMAC implementation used for other purposes. We envision a migration to TMMH so that HMAC may eventually be phased out from SRTP.

8.4 Data Origin Authentication Considerations

Note that in unicast, integrity and data origin authentication are provided together. However, in group scenarios where the keys are shared between members, the MAC tag only proves that a member of the group sent the packet, but does not prove the actual sender. Data origin authentication (DOA) for multicast and group RTP sessions is a hard problem that needs a solution; while some promising proposals are being investigated [PCST1, PCST2], more work is needed to rigorously specify these technologies. Thus SRTP data origin authentication in groups is for further study.

DOA can alternatively be done using signatures. However, this has a high impact in terms of bandwidth and processing time; therefore we do not offer this form of message authentication in the pre-defined packet-integrity transforms.
The presence of mixers and translators does not allow data origin authentication in case the RTP payload and/or the RTP header are manipulated. Note that these types of intermediate entities also disrupt end-to-end confidentiality (as the IV formation depends, e.g., on the RTP header preservation).

9. Key Management Considerations

For initialization, the key management needs to be given the SSRC and initial RTP sequence number for the RTP stream, and thus has a dependency on RTP operational parameters.

A particular key management system might allow different RTP sessions to share the same cryptographic master keys. The SRTP sender and receiver typically share a master key to derive session keys for encryption and decryption; SRTCP sources will typically derive keys from the same master key used by the SRTP session for which sender and receiver reports are sent. This is secure if the design of the synchronization mechanism, i.e., the IV, avoids keystream re-use (the two-time pad, Section 10.1). If this feature is used, the SSRCs MUST be unique between all the RTP streams sharing the same master key.

In other words, when a master key is shared among RTP sessions, SRTP/SRTCP cryptographic transforms are vulnerable to unfortunate SSRC collisions owing to normal operation of a compliant RTP implementation. SRTCP implementations that share master keys introduce a non-standard constraint on RTP operation: SSRC values must be unique among RTP sessions that share an SRTP master key. A secure key management system can mitigate this problem by assigning SSRC values to participants at the time of master key establishment.

A particular key management system might choose to provide re-keying by associating a master key for a crypto context with an MKI or a pair of index (sequence number and ROC) values, <From, To>. In the latter case, such values are either explicitly specified, or the default values, 'from the first observed packet' and 'until further notice', respectively, are used. The key management specification may therefore require the SRTP implementation to check the index of an incoming SRTP packet against the interval for the master key in the context before using the key. An SSRC in an RTP session, however, defines its own sequence number space, so knowledge of how many packets have used the same master key is dispersed among multiple RTP session participants.
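
As a non-normative illustration of the <From, To> check described above, the following Python sketch selects the master key whose interval covers a given packet index. The names MasterKeyEntry, covers, and select_master_key are hypothetical, as are the use of None to represent the two defaults and the treatment of 'To' as inclusive; none of these is defined by this specification.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MasterKeyEntry:
    """Hypothetical record tying a master key to a <From, To> interval."""
    key: bytes
    valid_from: Optional[int] = None  # None: 'from the first observed packet'
    valid_to: Optional[int] = None    # None: 'until further notice'


def covers(entry: MasterKeyEntry, index: int) -> bool:
    """True if the packet index lies inside the entry's interval.

    Assumption of this sketch: both interval ends are inclusive.
    """
    if entry.valid_from is not None and index < entry.valid_from:
        return False
    if entry.valid_to is not None and index > entry.valid_to:
        return False
    return True


def select_master_key(entries: Sequence[MasterKeyEntry],
                      index: int) -> Optional[MasterKeyEntry]:
    """Return the first entry whose interval covers the index, else None."""
    for entry in entries:
        if covers(entry, index):
            return entry
    return None


old_key = MasterKeyEntry(b"\x01" * 16, valid_from=None, valid_to=999)
new_key = MasterKeyEntry(b"\x02" * 16, valid_from=1000, valid_to=None)
chosen = select_master_key([old_key, new_key], 1000)
```

A receiver using such a table would drop (or hand to key management) any packet for which no interval matches, rather than guessing a key.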

SRTP senders can reasonably estimate the amount of SRTP and SRTCP traffic being used for a master key and invoke key management to re-key if needed. These interactions are defined by the key management interface to SRTP and are not defined by this protocol specification. Considerations arise with high-rate re-keying, especially in large multicast settings; see Section 12.

The key management interface might use the defaults for the SRTP protocol or define values for any and all SRTP parameters such as the following:

   - cipher and related parameters, including mode of operation
   - key(s), i.e., master (and salting) key(s), and related parameters
   - message authentication algorithm(s), and related parameters
   - re-keying (key lifetime) and key derivation parameters
   - MKI(s)
   - SSRC, network address, RTP port pair
   - Current value of ROC (or zeros prior to session
     commencement) and SEQ
   - Replay window size

10. Security Considerations

10.1 SSRC collision and two-time pad

Any fixed keystream output generated from the same key and index should be used to encrypt only once. Re-using such a keystream (jokingly called a 'two-time pad' system by cryptographers) can seriously compromise security. The NSA's VENONA project [C99] provides a historical example of such a compromise. In SRTP, a 'two-time pad' is avoided by requiring the key, or some other parameter of cryptographic significance, to be unique per RTP stream and packet. The pre-defined SRTP transforms accomplish packet-uniqueness by including the packet index. Stream-uniqueness requires distinct keys, or inclusion of the SSRC, which then (as noted) has to be unique to each RTP stream among the RTP sessions sharing the key.

It may in some cases be desirable that multiple crypto contexts applied to multiple RTP streams contain identical master keys. For instance, there could be a desire for a group to share a single key, or a simple bi-directional flow might want to use the same key in both directions. A multi-media sender might desire to use the same master key to protect multiple streams. Issues as above (two-time pad) MUST then be considered. As discussed in Section 9, the pre-defined transforms (AES-CM and AES-f8) allow such sharing by the use of the SSRC in the IV. Unlike multiple streams in a single RTP session, however, sharing a key among RTP sessions requires the added constraint that SSRC values be unique across RTP sessions (see Section 9). Thus, the SSRC MUST be unique between all the RTP streams and sessions sharing the same master key. It is incumbent upon SRTP implementations to ensure SSRC uniqueness across RTP sessions that share a master key, to avoid unfortunate IV combinations that would result in a two-time pad.

Even with distinct SSRCs, extensive use of the same key MAY improve chances of probabilistic collision and time-memory-tradeoff attacks succeeding. Also, the effect of an eventual RTP SSRC collision detection MUST be taken into account, as a collision could duplicate the SSRC, leading temporarily to a two-time pad before the collision is detected. As discussed above in Section 9, this is a problem that key management can solve.
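
The two-time pad problem discussed in this section can be illustrated with a short, non-normative Python sketch. The keystream below is a fixed stand-in for cipher output (an assumption made purely for illustration, not the output of AES-CM or AES-f8), and xor_bytes is a hypothetical helper.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, as an additive stream cipher does."""
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-in keystream segment; a real one would come from the cipher.
keystream = bytes(range(16))

plaintext_1 = b"attack at dawn!!"
plaintext_2 = b"retreat at noon!"

# The same keystream segment is (wrongly) used for two packets.
ciphertext_1 = xor_bytes(plaintext_1, keystream)
ciphertext_2 = xor_bytes(plaintext_2, keystream)

# The keystream cancels out: an eavesdropper learns P1 XOR P2
# without knowing the key.
leak = xor_bytes(ciphertext_1, ciphertext_2)

# If one plaintext is known or guessable, the other follows directly.
recovered_2 = xor_bytes(leak, plaintext_1)
```

This is why the key, the SSRC, and the packet index together must never repeat: any repetition reproduces the same keystream segment.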

10.2 Key Usage

The effective key size is determined (upper bounded) by the size of the master key and, for encryption, the size of the salting key. Any additive stream cipher is vulnerable to attacks that use statistical knowledge about the plaintext source to enable key collision and time-memory tradeoff attacks [MF00,H80,Bi96]. These attacks take advantage of commonalities among plaintexts, and provide a way for a cryptanalyst to amortize the computational effort of decryption over many keys, thus reducing the effective key size of the cipher. A detailed analysis of these attacks and their applicability to the encryption of Internet traffic is provided in [MF00]. In summary, the effective key size of SRTP, when used in a security system in which m distinct keys are used, is equal to the key size of the cipher less the logarithm (base two) of m. Protection against such attacks can be provided simply by increasing the size of the keys used, which here can be accomplished by the use of the salting key. Note that the salting key MUST be random, but MAY be public. A salt of the suggested size, 112 bits, protects against attacks in scenarios where at most 2^112 keys are in use. This is sufficient for all practical purposes.

Implementations SHOULD use keys that are as large as possible. Please note that in many cases increasing the key size of a cipher does not affect the throughput of that cipher.

The use of the SRTP and SRTCP indexes in the pre-defined transforms fixes the maximum number of packets that can be secured with the same key. This limit is 2^48 packets for SRTP and 2^31 packets for SRTCP, when SRTP and SRTCP are considered independently. However, since the session keys for related SRTP and SRTCP are derived from the same master key (Section 4.3), the upper bound that has to be considered is in practice the minimum of the two quantities.
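
The two bounds above can be made concrete with a small, non-normative Python sketch. The function names and the packet rates used in the example are assumptions chosen for illustration only.

```python
import math

SRTP_MAX_PACKETS = 2 ** 48   # SRTP index space
SRTCP_MAX_PACKETS = 2 ** 31  # SRTCP index space


def effective_key_bits(key_bits: int, num_keys: int) -> float:
    """Key size of the cipher less log2 of the number of keys in use."""
    return key_bits - math.log2(num_keys)


def rekey_required(srtp_packets: int, srtcp_packets: int) -> bool:
    """True once either index space is used up under one master key."""
    return (srtp_packets >= SRTP_MAX_PACKETS
            or srtcp_packets >= SRTCP_MAX_PACKETS)


def seconds_until_exhaustion(srtp_pps: float, srtcp_pps: float) -> float:
    """Time until the first of the two index spaces is exhausted."""
    return min(SRTP_MAX_PACKETS / srtp_pps, SRTCP_MAX_PACKETS / srtcp_pps)


# Example: a 128-bit key in a system with 2^32 keys in use.
bits = effective_key_bits(128, 2 ** 32)

# Example rates (assumed): 50 SRTP and 200 SRTCP packets per second.
days = seconds_until_exhaustion(50, 200) / 86400
```

With these example rates the SRTCP bound dominates and is reached after roughly 124 days; the ratio of the two index spaces is 2^48 / 2^31 = 2^17 = 131,072 packets.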
That is, when 2^48 SRTP packets or 2^31 SRTCP packets have been secured with the same key (whichever occurs first), the key management MUST be called to provide new master key(s) (previously stored and used keys MUST NOT be used again), or the session MUST be terminated.

Note: in most typical applications (assuming at least one RTCP packet for every 128,000 RTP packets) it will be the SRTCP index that first reaches the upper limit (although the time until this occurs is very long). Still, note that even at 200 SRTCP packets/sec, the 2^31 index space of SRTCP is enough to secure approximately 4 months of communication.

Note that the only purpose of key derivation is to limit the amount of plaintext that is encrypted with a fixed session key and made available to an attacker for analysis. It does not extend the master key's lifetime. To see this, simply consider our requirements to avoid a two-time pad: two distinct packets must either be processed
with distinct IVs, or with distinct session keys, and both the distinctness of the IVs and of the session keys are (for the pre-defined transforms) dependent on the distinctness of the packet indices.

For the TMMH-based message integrity, the keystream prefixes MUST NOT be correlated with each other, nor with the messages they protect, in the sense that, given the messages, the prefixes together with the TMMH key MUST be computationally indistinguishable from random bits. This is assured by our predefined keystream generators and key derivation.

10.3 Confidentiality of the RTP Payload

By using 'seekable' stream ciphers, SRTP avoids the denial of service attacks that are possible on stream ciphers that lack this property (these attacks are described in Section 3.4 of [B96]).

It is important to be aware that, as with any stream cipher, the exact length of the payload is revealed by the encryption. This means that it may be possible to deduce certain 'formatting bits' of the payload, as the length of the codec output might vary due to certain parameter settings, etc. This, in turn, implies that the corresponding bit of the keystream can be deduced. However, if the stream cipher is secure (counter mode and f8 are provably secure under certain assumptions [BDJR,KSYH]), knowledge of a few bits of the keystream will not aid an attacker in predicting the following keystream bits. Thus, the payload length (and information deducible from this) will leak, but nothing else.

As some RTP packets could contain highly predictable data, e.g., SID, it is important to use a cipher designed to resist known-plaintext attacks (which is the current practice).

10.4 Confidentiality of the RTP Header

With the described proposal, RTP headers are sent in the clear to allow for header compression. This means that data such as payload type, synchronization source identifier, and timestamp are available to an eavesdropper.
Moreover, since RTP allows for future extensions of headers, we cannot foresee what kind of possibly sensitive information might also be 'leaked'.

The described proposal is a low-cost method, which allows header compression to reduce bandwidth. It is up to the endpoints' policies to decide which security protocol to employ. If the header compression is omitted, other solutions might be applicable. In other words, we provide a solution that works in the most general scenario, even in the most demanding one (like conversational multimedia over low-bandwidth, unreliable media). Of course the solution will then also work in less restricted environments, but we
suggest that if one really needs to protect headers, and is allowed to do so by the surrounding environment, then one should also look at alternatives, e.g., IPsec.

10.5 Integrity of the RTP packet

Additive ciphers do not provide any security service other than confidentiality. In particular, they do not provide message authentication (see [RK99] or [HAC] for a discussion of this security service). However, SRTP uses a message authentication code to provide that security service.

With HMAC being a well-studied authentication scheme, based on a provably secure construction, the security against MAC forgery depends on the key size and the size of the output tags (or for some attacks, half the size of the tag due to the 'birthday paradox'). The default tag size for HMAC has been fixed to 32 bits. Other size values may be chosen (via the key management protocol). The use of a truncated size is motivated by the fact that it may be desirable, e.g., in wireless environments, to save bandwidth. The choice of such a truncation MUST be evaluated against the reduction in security it implies. The default 32-bit size is a compromise, offering a reasonable level of security, taking into account the real-time aspects of the protected protocol. High security applications SHOULD however use larger tags.

The fact that message authentication is optional (for SRTP) is motivated by the fact that, while the function is typically highly desired, there are certain cases (notably in cellular environments) where it has an impact in terms of cost, e.g., for bandwidth consumption. Also, independently of the tag length, a single transmission bit error in the protected part of the packet or in the tag itself forces the entire packet to be dropped. Given a fixed quality of service, this implies the necessity of higher protection of the transmitted unit, hence higher cost. In those cases, it is up to the user's security profile to request authentication.
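
A truncated HMAC/SHA1 tag of the kind discussed above can be sketched in Python as follows. This is a non-normative illustration: the function names are hypothetical, and exactly which octets of the packet form the authenticated portion is defined elsewhere in this document, so the input here is simply an opaque byte string. The tag keeps the leftmost octets of the HMAC output, as described for truncation in [HMAC].

```python
import hashlib
import hmac


def compute_tag(auth_key: bytes, authenticated_portion: bytes,
                tag_len: int = 4) -> bytes:
    """HMAC/SHA1 over the input, truncated to the leftmost tag_len octets.

    tag_len = 4 corresponds to the default 32-bit tag.
    """
    full = hmac.new(auth_key, authenticated_portion, hashlib.sha1).digest()
    return full[:tag_len]


def verify_tag(auth_key: bytes, authenticated_portion: bytes,
               tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = compute_tag(auth_key, authenticated_portion, len(tag))
    return hmac.compare_digest(expected, tag)


key = bytes(20)                      # placeholder key for the example
packet = b"example authenticated portion"
tag = compute_tag(key, packet)

# A single flipped bit in the protected data makes verification fail,
# which is why a transmission bit error forces the packet to be dropped.
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]
```

A blind guess of a 32-bit tag succeeds with probability 2^-32 per attempt; choosing a larger tag_len lowers this at the cost of additional octets per packet.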

The use of error detection and protection mechanisms (e.g., Unequal Error Detection, UED, and Unequal Error Protection, UEP) is compatible with SRTP and the pre-defined encryption transforms, since stream ciphers maintain the position of the bits. However, the use of UED/UEP may be difficult to combine with authentication because any bit errors will cause authentication to fail.

10.5.1 Integrity of the RTP header: IHA

The IV formation of the f8-mode gives implicit header authentication (IHA) of the RTP header, even if no cryptographic integrity protection is present. This means that modifying bits of the RTP header will cause the decryption process at the receiver to produce essentially random garbage.

11. Interaction with Forward Error Correction mechanisms

Some considerations are due when Forward Error Correction (FEC) mechanisms are performed, e.g., as specified in RFC 2733. In particular, the order in which SRTP processing and the error correction processing are applied is of concern. The optimal order would be the following:

   - on the sender side, first encrypt the packet, then perform the FEC processing, finally authenticate

   - on the receiver side, first authenticate the packet, then perform the FEC processing, finally decrypt.

The motivations for the above ordering are:

   - FEC expands the packet, so performing encryption after FEC would be more expensive

   - on the receiver side, authentication has to be verified before getting engaged in the FEC processing, to reduce effects of certain denial of service attacks

   - adding redundancy before encrypting slightly reduces the effective key size and resistance to attacks.

However, this implies splitting the security processing (FEC processing occurs between encryption/decryption and authentication). Implementations could gain from keeping the security processing together; in this case the recommendation is that the security processing takes place after FEC on the sender's side, and before FEC on the receiver's side. This implies the cost of placing encryption after FEC processing, as explained above; hence the choice is left to the application.

For clarity of interoperability, implementations are requested to place the security processing after FEC on the sender's side, and before FEC on the receiver's side. This is also the default behavior; another choice has to be agreed out-of-band.

12. Scenarios

SRTP can be used as a security protocol for the RTP/RTCP traffic in different scenarios. SRTP has a number of configuration options, and can have an impact on the total performance of the application according to the way it is used. Hence, the use of SRTP is very dependent on the kind of scenario and application it is used with. In the following, we briefly illustrate some use cases for SRTP, and give some guidelines for the recommended setting of its options.

12.1 Two-party Unicast

12.1.1 One bi-directional RTP stream

A typical example would be a voice call, or perhaps some streaming application. It is possible for the two parties to share the same master key in the two directions. The first round of the key derivation splits the master key into any or all of the following session keys (according to the provided security functions): SRTP_encr_key, SRTP_auth_key, SRTCP_encr_key, and SRTCP_auth_key. (For simplicity, we omit discussion of the salts, which are also derived.) In this scenario, it will in most cases suffice to have a single master key with unspecified lifetime (i.e., unrestricted key lifetime, not using explicit <From, To> values). This guarantees a sufficiently long lifetime of the keys and a minimum set of keys in place for most practical purposes. Also, in this case RTCP protection can be applied without problems. As the key derivation in combination with a large difference in the packet rate in the respective directions may require simultaneous storage of several session keys, if storage is an issue, we recommend using low-rate key derivation.

The same considerations can be extended to the two-party unicast scenario with multiple RTP sessions sharing the master key if particular care is taken to guarantee unique SSRCs for the streams.

12.1.2 One master key per party

Here, each sender provides the security for its own RTP and RTCP streams, as well as for the RTCP receiver reports sent back to it.
This results in two master keys, split (by key derivation) into a maximum of eight session keys (on each side of the communication link). The SSRC-uniqueness MUST be guaranteed for the streams on each side. It is recommended not to restrict the master key lifetimes using the <From, To> fields. This gives in most cases a sufficient key lifetime, with the benefit of a minimum set
of keys in place, and a smooth run of SRTCP. The same storage considerations as above apply for the optional key derivation.

The same considerations can be extended to the two-party unicast scenario with multiple mono-directional RTP sessions. Unique SSRCs MUST be guaranteed for the streams.

12.2 Multicast

Just as with (unprotected) RTP, a scalability issue arises in big groups due to the possibly very large amount of (S)RTCP receiver reports that the sender might need to process. In SRTP, the sender may have to keep state (the cryptographic context) for each receiver, or more precisely, for the SRTCP used to protect receiver reports. The problem increases proportionally to the size of the group. In particular, re-keying requires special concern; see below.

We describe in the following multicast for small groups, and give guidelines for use with large group multicast.

12.2.1 Small conference with one sender

The sender secures his RTP stream using one cryptographic context. The sender's RTP and RTCP are secured with the same master key. Key derivation gives the necessary session keys, i.e., SRTP_encr_key, SRTP_auth_key, SRTCP_encr_key, and SRTCP_auth_key. If there are multiple streams, the SSRCs MUST, as noted, be unique to avoid a two-time pad (see Section 9). Key derivation may (for increased security) be enabled for the sender's outgoing SRTP streams.

There are many possible setups for the distribution of the master keys. One possibility is that the receivers share the same master key to secure their respective SRTCP (this requires the receivers to trust each other). This shared master key could be the same one used by the sender to protect its outgoing traffic. Alternatively, it could be a master key shared only among the receivers and used solely for their SRTCP.

Considering SRTCP and key storage, it is recommended to use low-rate (or zero) key_derivation (except the mandatory initial one), so that the sender does not need to store too many session keys (each SRTCP stream might otherwise have a different session key at a given point in time, as the SRTCP sources send at different times). Thus, in case key derivation is wanted for SRTP, the cryptographic context
for SRTP can be kept separate from the SRTCP crypto context, so that it is possible to have a key_derivation_rate of 0 for SRTCP and a non-zero value for SRTP.

Re-keying gives two problems: the number of master keys stored at the sender side, and the triggering of re-keying. Forcing re-keying using the <From, To> fields creates the problem that the sender needs to maintain multiple keys, as the re-keying will typically happen at different times on each SRTCP stream from the receivers (because each SSRC defines a sequence number space). Also, problems may occur in retrieving the current master key for the SRTCP packets in some cases, since that is done based on the SRTP index, not the SRTCP index. Use of the MKI for re-keying is probably best for most applications (see Section 9).

Moreover, the upper limit of 2^48 SRTP packets / 2^31 SRTCP packets means that, as soon as (or rather, shortly before) one of the streams reaches such a maximum number of packets, re-keying MUST be triggered on ALL the streams. A possible solution to this may be to keep the SRTP/SRTCP contexts separated, but still sharing the master key. The sender then has to estimate which stream (among the sender's SRTP/SRTCP streams and the receivers' SRTCP streams) will first reach the 2^48 / 2^31 limit, and force a re-keying well in advance. The MKI or <From, To> may be employed for key synchronization during changeover to a new key. Use of <From, To> fields to obtain key synchronization in such a case is described in Section 12.3.

12.2.2 Large multicast with one sender

The same considerations as for the small group multicast hold. The biggest issue in this scenario is the additional load placed at the sender side, due to the state (cryptographic contexts) that has to be maintained for each receiver sending back RTCP receiver reports. At minimum, a replay window might be maintained for each RTCP source.

Therefore, with big groups and where the load at the sender is considered unacceptable, it might be an option to simply disable all security for RTCP. This is STRONGLY NOT RECOMMENDED from a security point of view, but may appear a reasonable compromise to have at least security guaranteed on the RTP traffic. Alternatively, an SRTCP receiver may choose not to authenticate or protect against replay for SRTCP messages, or do so selectively (e.g., only messages containing sender reports are authenticated). Of course, the security impact of neglecting to authenticate certain packets MUST be carefully considered.

12.3 Re-keying and access control

Re-keying may occur due to access control (e.g., when a member is removed during a multicast RTP session), or for purely cryptographic reasons.

As mentioned, the master key MUST be replaced before any of the index spaces (2^48 for SRTP, 2^31 for SRTCP) are exhausted for any of the streams protected by one and the same master key. Thus, there is always the necessity of keeping track of when the master key has to be replaced due to exhaustion of the index spaces. In addition to this, it is possible to control the master key lifetime using the <From, To> fields, which could mean that a key expires, and a new one is needed.

One may choose to have key management provide, at the start, arrays of master keys with associated lifetimes. Alternatively, key management is called each time the master key has to be changed.

In one-sender multicast, it is the responsibility of the sender to determine when a new key is needed. The sender is the only one that can keep track of when the maximum number of packets has been sent, as receivers may join and leave the session at any time, there may be packet loss and delay, etc. In scenarios other than one-sender multicast, it is recommended that the Initiator of the key management/session request new key material well before any stream reaches the maximum key lifetime. Here, one must take into consideration that key exchange can be a costly operation, taking several seconds for a single exchange.

Hence, some time before the master key is exhausted/expires, out-of-band key management is initiated, resulting in a new master key shared with the receiver(s). To maintain synchronization when switching to the new key, one could use the MKI or assign the new master key a 'valid-from' index, far enough into the future so that key management will be finished before that, but still before the current key is exhausted.
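
One way a sender might pick such a 'valid-from' index is sketched below in Python. This is a non-normative illustration under stated assumptions: the function name, the safety_factor parameter, and the idea of estimating the lead from the packet rate and the expected duration of the key exchange are all hypothetical and not prescribed by this specification.

```python
import math

SRTP_INDEX_LIMIT = 2 ** 48  # size of the SRTP index space


def choose_valid_from(current_index: int, packets_per_second: float,
                      key_mgmt_seconds: float,
                      safety_factor: float = 2.0) -> int:
    """Pick an activation index for the new master key.

    The index is placed far enough ahead that key management should
    have completed, yet it must stay inside the current index space.
    """
    lead = math.ceil(packets_per_second * key_mgmt_seconds * safety_factor)
    valid_from = current_index + lead
    if valid_from >= SRTP_INDEX_LIMIT:
        raise ValueError("re-keying started too late for this index space")
    return valid_from


# Example (assumed figures): 50 packets/s, a 5 second key exchange.
activation_index = choose_valid_from(1000, 50, 5)
```

The example yields an activation index 500 packets ahead of the current one; an implementation would start the key exchange early enough that the error branch is never reached.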

For access control purposes, the <From, To> periods are set at the desired granularity, dependent on the packet rate. High-rate re-keying SHOULD NOT be used in some large-group scenarios when SRTCP is enabled. This is an effect of using the SRTP index, rather than the SRTCP index, for determining the master key. In particular, for short periods during switching of master keys, it may be the case that SRTCP packets are not under the current master key of the corresponding SRTP stream. Therefore, using the MKI for re-keying in such scenarios is the recommended method.

Note that even if the MKI is used to signal key usage to the receiver, there might still be cases when the <From, To> fields are also in use at the same time. For instance, a From-value could be used to signal when a certain master key (MKI) is to be activated for the first time.
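
The combined use of an MKI and a From-value mentioned above can be sketched as follows. This non-normative Python example assumes a simple in-memory table keyed by the MKI octets; the names MkiEntry and lookup_by_mki, and the choice to return None for an unknown or not yet active key, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MkiEntry:
    """Hypothetical table entry: a master key and its optional From-value."""
    master_key: bytes
    active_from: Optional[int] = None  # None: usable immediately


def lookup_by_mki(table: Dict[bytes, MkiEntry], mki: bytes,
                  index: int) -> Optional[bytes]:
    """Return the master key named by the MKI, if known and active."""
    entry = table.get(mki)
    if entry is None:
        return None                      # unknown MKI
    if entry.active_from is not None and index < entry.active_from:
        return None                      # key not yet activated
    return entry.master_key


table = {
    b"\x00\x01": MkiEntry(b"\xaa" * 16),
    b"\x00\x02": MkiEntry(b"\xbb" * 16, active_from=5000),
}
current = lookup_by_mki(table, b"\x00\x01", 4999)
```

Because the key is named explicitly in each packet, an SRTCP packet arriving close to the changeover is still matched to the key it was protected with.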

12.4 Summary of basic scenarios

The description of these scenarios highlights some recommendations on the use of SRTP, mainly related to re-keying and large scale multicast:

   - Do not use SRTP for fast re-keying using the <From, To> feature. It may, in particular, give problems in retrieving the correct SRTCP key, if an SRTCP packet arrives close to the re-keying time. The MKI SHOULD be used in this case.

   - If multiple SRTP streams share the same master key, even moderate-rate re-keying MAY have the same problems, and the MKI SHOULD be used.

   - Carefully consider the additional load at the sender side in multicast scenarios. Optionally, but NOT RECOMMENDED, SRTCP could be disabled altogether by the SRTCP receiver.

   - Though offering increased security, a non-zero key_derivation_rate is NOT RECOMMENDED when trying to minimize the number of keys in use with multiple streams.

13. IANA Considerations

The RTP specification establishes a registry of profile names for use by higher-level control protocols, such as the Session Description Protocol (SDP), to refer to transport methods. This profile registers the name "RTP/SAVP".

14. Acknowledgements

The authors would like to thank Magnus Westerlund, Brian Weis, Robert Fairlie-Cuninghame, and Adrian Perrig for their reviews and comments.

15. Authors' Addresses

Questions and comments should be directed to the authors and avt@ietf.org:

   Mark Baugher
   Cisco Systems, Inc.
   5510 SW Orchid Street       Phone:  +1 408-853-4418
   Portland, OR 97219 USA      Email:  mbaugher@cisco.com

   Rolf Blom
   Ericsson Research
   SE-16480 Stockholm          Phone:  +46 8 58531707
   Sweden                      EMail:  rolf.blom@era.ericsson.se

   Elisabetta Carrara
   Ericsson Research
   SE-16480 Stockholm          Phone:  +46 8 50877040
   Sweden                      EMail:  elisabetta.carrara@era.ericsson.se

   David A. McGrew
   Cisco Systems, Inc.
   San Jose, CA 95134-1706     Phone:  +1 301-349-5815
   USA                         EMail:  mcgrew@cisco.com

   Mats Naslund
   Ericsson Research
   SE-16480 Stockholm          Phone:  +46 8 58533739
   Sweden                      EMail:  mats.naslund@era.ericsson.se

   Karl Norrman
   Ericsson Research
   SE-16480 Stockholm          Phone:  +46 8 4044502
   Sweden                      EMail:  karl.norrman@era.ericsson.se

   David Oran
   Cisco Systems, Inc.
   San Jose, CA 95134-1706
   USA                         EMail:  oran@cisco.com

16. References

Normative

   [AES]     NIST, "Advanced Encryption Standard (AES)", FIPS PUB 197, http://www.nist.gov/aes/

   [HMAC]    Krawczyk, H., Bellare, M., and Canetti, R., "HMAC: Keyed-hashing for message authentication", IETF RFC 2104, February 1997.

   [RFC1889] Schulzrinne, H., Casner, S., Frederick, R., Jacobson, V., "RTP: A Transport Protocol for Real-Time Applications", IETF RFC 1889.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", IETF RFC 2119, March 1997.

   [RFC2401] Kent, S., and R. Atkinson, "Security Architecture for IP", IETF RFC 2401, November 1998.

   [RFC2675] Borman, D., Deering, S., Hinden, R., "IPv6 Jumbograms", IETF RFC 2675, August 1999.

   [RFC2828] Shirey, R., "Internet Security Glossary", IETF RFC 2828, May 2000.

Informative

   [BDJR]    Bellare, M., Desai, A., Jokipii, E., and Rogaway, P., "A Concrete Treatment of Symmetric Encryption: Analysis of DES Modes of Operation", Proceedings 38th IEEE FOCS, pp. 394-403, 1997.

   [C99]     Crowell, W. P., "Introduction to the VENONA Project", http://www.nsa.gov:8080/docs/venona/index.html.

   [CTR]     Dworkin, M., NIST Special Publication 800-38A, "Recommendation for Block Cipher Modes of Operation: Methods and Techniques", 2001. Online at http://csrc.nist.gov/publications/nistpubs/800-38a/sp800-38a.pdf.

   [ES3D]    ETSI SAGE 3GPP Standard Algorithms Task Force, "Security Algorithms Group of Experts (SAGE); General Report on the Design, Specification and Evaluation of 3GPP Standard Confidentiality and Integrity Algorithms", Public report, Draft Version 1.0, Dec 1999.

   [ES3E]    ETSI SAGE 3GPP Standard Algorithms Task Force, "Security Algorithms Group of Experts (SAGE) Report on the Evaluation of 3GPP Standard Confidentiality and Integrity Algorithms", Public report, Draft Version 1.0, Dec 1999.

   [HAC]     Menezes, A., Van Oorschot, P., and Vanstone, S., "Handbook of Applied Cryptography", CRC Press, 1997, ISBN 0-8493-8523-7.

   [H80]     Hellman, M. E., "A cryptanalytic time-memory trade-off", IEEE Transactions on Information Theory, July 1980, pp. 401-406.

   [KSYH]    Kang, J-S., Shin, S-U., Hong, D., and Yi, O., "Provable Security of KASUMI and 3GPP Encryption Mode f8", Proceedings Asiacrypt 2001, Springer Verlag LNCS 2248, pp. 255-271, 2001.

   [MF00]    McGrew, D., and Fluhrer, S., "Attacks on Encryption of Redundant Plaintext and Implications on Internet Security", the Proceedings of the Seventh Annual Workshop on Selected Areas in Cryptography (SAC 2000), Springer-Verlag.

   [RK99]    Rescorla, E., and Korver, B., "Guidelines for Writing RFC Text on Security Considerations", draft-rescorla-sec-cons-00.txt

   [PCST1]   Perrig, A., Canetti, R., Tygar, D., Song, D., "Efficient and Secure Source Authentication for Multicast", in Proc. of Network and Distributed System Security Symposium NDSS 2001, pp. 35-46, 2001.

   [PCST2]   Perrig, A., Canetti, R., Tygar, D., Song, D., "Efficient Authentication and Signing of Multicast Streams over Lossy Channels", in Proc. of IEEE Security and Privacy Symposium S&P2000, pp. 56-73, 2000.

   [WC81]    Wegman, M. N., and Carter, J. L., "New Hash Functions and Their Use in Authentication and Set Equality", JCSS 22, 265-279, 1981.

Appendix A: Pseudocode for Index Determination

The following is an example of pseudocode for the algorithm to process an SRTP packet with sequence number SEQ and estimate its index i. In the following, signed arithmetic is assumed.

      if (s_l < 32,768)
         if (SEQ - s_l > 32,768)
            set v to (ROC-1) mod 2^32
         else
            set v to ROC
         endif
      else
         if (s_l - 32,768 > SEQ)
            set v to (ROC+1) mod 2^32
         else
            set v to ROC
         endif
      endif
      return SEQ + v*65,536

Appendix B: Test Vectors

B.1 AES-f8 Test Vectors

All values are in hexadecimal.

   SRTP PREFIX LENGTH  :   0

   RTP packet header   :   806e5cba50681de55c621599

   RTP packet payload  :   70736575646f72616e646f6d6e657373
                           20697320746865206e65787420626573
                           74207468696e67

   ROC                 :   d462564a
   key                 :   234829008467be186c3de14aae72d62c
   salt key            :   32f2870d
   key-mask (m)        :   32f2870d555555555555555555555555
   key XOR key-mask    :   11baae0dd132eb4d3968b41ffb278379

   IV                  :   006e5cba50681de55c621599d462564a
   IV'                 :   595b699bbd3bc0df26062093c1ad8f73

   j                   :   0
   IV' XOR j           :   595b699bbd3bc0df26062093c1ad8f73
   S(-1)               :   00000000000000000000000000000000
   S(-1) XOR IV' XOR j :   595b699bbd3bc0df26062093c1ad8f73
   S(0)                :   71ef82d70a172660240709c7fbb19d8e
   plaintext           :   70736575646f72616e646f6d6e657373
   ciphertext          :   019ce7a26e7854014a6366aa95d4eefd

   j                   :   1
   IV' XOR j           :   595b699bbd3bc0df26062093c1ad8f72
   S(0)                :   71ef82d70a172660240709c7fbb19d8e
   S(0) XOR IV' XOR j  :   28b4eb4cb72ce6bf020129543a1c12fc
   S(1)                :   3abd640a60919fd43bd289a09649b5fc
   plaintext           :   20697320746865206e65787420626573
   ciphertext          :   1ad4172a14f9faf455b7f1d4b62bd08f

   j                   :   2
   IV' XOR j           :   595b699bbd3bc0df26062093c1ad8f70
   S(1)                :   3abd640a60919fd43bd289a09649b5fc
   S(1) XOR IV' XOR j  :   63e60d91ddaa5f0b1dd4a93357e43a8c
   S(2)                :   584d14a591acfca846b3aa3a0ab50fec
   plaintext           :   74207468696e67
   ciphertext          :   2c6d60cdf8c29b

B.2 AES-CM Test Vectors

   Keystream segment length:  1044512 octets (65282 AES blocks)
   Key:                       2B7E151628AED2A6ABF7158809CF4F3C
   Rollover Counter:          00000000
   Sequence Number:           0000
   SSRC:                      00000000
   Salt:                      F0F1F2F3F4F5F6F7F8F9FAFBFCFD0000
   Offset:                    F0F1F2F3F4F5F6F7F8F9FAFBFCFD0000

   Counter                            Keystream

   F0F1F2F3F4F5F6F7F8F9FAFBFCFD0000   E03EAD0935C95E80E166B16DD92B4EB4
   F0F1F2F3F4F5F6F7F8F9FAFBFCFD0001   D23513162B02D0F72A43A2FE4A5F97AB
   F0F1F2F3F4F5F6F7F8F9FAFBFCFD0002   41E95B3BB0A2E8DD477901E4FCA894C0
   ...                                ...
   F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF   EC8CDF7398607CB0F2D21675EA9EA1E4
   F0F1F2F3F4F5F6F7F8F9FAFBFCFDFF00   362B7C3C6773516318A077D7FC5073AE
   F0F1F2F3F4F5F6F7F8F9FAFBFCFDFF01   6A2CC3787889374FBEB4C81B17BA6C44

Nota Bene: this test case is contrived so that the latter part of the keystream segment coincides with the test case in Section F.5.1 of [CTR].

B.3 TMMH Test Vectors

This section provides test vectors which can be used to test an implementation of TMMH. The key, message, and outputs are expressed as octet sequences, with each octet in hexadecimal.

   TAG_WORDS: 2

   key:     { e627 6a01 5ea7 f27a c536 2192 11be ea35 db9d 63d6
              fa8a fc45 e08b d216 ced2 7853 1a82 22f5 90fb 1c29
              708e d06f 82c3 bee6 4f21 6f33 65c0 d211 c25e 9138
              4fa3 7c1f 61ac 3489 2976 8c19 8252 ddbf cad3 c28f
              68d6 58dd 504f 2bbf 0278 70b7 cfca }

   L:       { e627 6a01 }
   A[0]:    { 5ea7 f27a c536 2192 11be ea35 db9d 63d6 fa8a }
   A[1]:    { fc45 e08b d216 ced2 7853 1a82 22f5 90fb 1c29 }
   A[2]:    { 708e d06f 82c3 bee6 4f21 6f33 65c0 d211 c25e }
   A[3]:    { 9138 4fa3 7c1f 61ac 3489 2976 8c19 8252 ddbf }
   A[4]:    { cad3 c28f 68d6 58dd 504f 2bbf 0278 70b7 cfca }

   message: { 6015 f141 5ba1 29a0 f604 0d1c 02d9 aa8a 7931 }

   tag:     { 8a82 4bb0 }

This Internet-Draft expires in July 2002.