Internet Engineering Task Force                    Baugher, McGrew,
        AVT Working Group                                      Oran (Cisco)
        INTERNET-DRAFT                              Blom, Carrara, Naslund,
        EXPIRES: July 2002                               Norrman (Ericsson)
                                                              February 2002
                     The Secure Real Time Transport Protocol
     Status of this memo
        This document is an Internet-Draft and is in full conformance with
        all provisions of Section 10 of RFC2026.
        Internet-Drafts are working documents of the Internet Engineering
        Task Force (IETF), its areas, and its working groups. Note that
        other groups may also distribute working documents as Internet-
        Drafts.
        Internet-Drafts are draft documents valid for a maximum of six
        months and may be updated, replaced, or obsoleted by other documents
        at any time. It is inappropriate to use Internet-Drafts as reference
        material or cite them other than as "work in progress".
        The list of current Internet-Drafts can be accessed at
        The list of Internet-Draft Shadow Directories can be accessed at
        This document describes the Secure Real Time Transport Protocol
        (SRTP), a profile of the Real Time Transport Protocol (RTP), which
        can provide confidentiality, message authentication, and replay
        protection to the RTP/RTCP traffic.
        SRTP can achieve high throughput and low packet expansion. SRTP
        proves to be a suitable protection for heterogeneous environments,
        i.e., environments including both wired and wireless links. To get
        such features, default transforms are described, based on an
        additive stream cipher for encryption, a keyed-hash based function
        for message authentication, and an 'implicit' index for
        sequencing/synchronization based on the RTP sequence number for SRTP
        and an index number for Secure RTCP (SRTCP).
     INTERNET-DRAFT                    SRTP                  February, 2002
     1. Notational Conventions
     2. Goals
     3. SRTP Framework
      3.1 SRTP Cryptographic Contexts
        3.1.1 Transform-independent parameters
        3.1.2 Transform-dependent parameters
        3.1.3 Mapping SRTP Packets to Cryptographic Contexts
      3.2 SRTP Packet Processing
        3.2.1 Packet Index Determination, and ROC, s_l Update
        3.2.2 Replay Protection
      3.3 Secure RTCP
     4. Pre-Defined Cryptographic Transforms
      4.1 Encryption
        4.1.1 AES in Counter Mode
        4.1.2 AES in f8-mode
        4.1.3 NULL Cipher
      4.2 Message Authentication and Integrity
        4.2.1 HMAC/SHA1
        4.2.2 TMMH
      4.3 Key Derivation
        4.3.1 Key Derivation Algorithm
        4.3.2 SRTCP Key Derivation
        4.3.3 AES-CM PRF
     5. Default and Mandatory Transforms
      5.1 Encryption: AES-CM and NULL
      5.2 Message Authentication/Integrity: HMAC/SHA1
      5.3 Key Derivation: AES-CM PRF
     6. SRTP/SRTCP Parameters
     7. Adding SRTP Transforms
     8. Rationale
      8.1 Key derivation
      8.2 Salting key
      8.3 TMMH: Message Integrity from Universal Hashing
      8.4 Data Origin Authentication Considerations
     9. Key Management Considerations
     10. Security Considerations
      10.1 SSRC collision and two-time pad
      10.2 Key Usage
      10.3 Confidentiality of the RTP Payload
      10.4 Confidentiality of the RTP Header
      10.5 Integrity of the RTP packet
        10.5.1 Integrity of the RTP header: IHA
     11. Interaction with Forward Error Correction mechanisms
     12. Scenarios
      12.1 Two-party Unicast
        12.1.1 One bi-directional RTP stream
        12.1.2 One master key per party
      12.2 Multicast
        12.2.1 Small conference with one sender
        12.2.2 Large multicast with one sender
      12.3 Re-keying and access control
      12.4 Summary of basic scenarios
     13. IANA Considerations
     14. Acknowledgements
     15. Author's Addresses
     16. References
     Appendix A: Pseudocode for Index Determination
     Appendix B: Test Vectors
      B.1 AES-f8 Test Vectors
      B.2 AES-CM Test Vectors
      B.3 TMMH Test Vectors
     1. Notational Conventions
        The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in [RFC2119].
        Terminology conforms to [RFC2828].
        By convention, the leftmost bit (byte) is the most significant one.
        By XOR we mean bitwise addition modulo 2 of binary strings, and ||
        denotes concatenation. In other words, if C = A || B, then the most
        significant bits of C are the bits of A, and the least significant
        bits of C equal the bits of B. Hexadecimal numbers are prefixed by
     2. Goals
        The security goals for SRTP are to ensure:
        * the confidentiality of the RTP and RTCP payloads, and
        * the integrity protection of the entire RTP and RTCP packets,
          together with protection against replayed packets.
        These security services are optional and independent from each
        other, except that SRTCP integrity protection is mandatory.
        Other, functional, goals for the protocol are:
        * a framework that permits upgrading with new cryptographic
          transforms,
        * low bandwidth cost, i.e., a framework preserving RTP header
          compression efficiency,
        and, asserted by the pre-defined transforms:
        * a low computational cost,
        * a small footprint (i.e. small code size and data memory for keying
          information and replay lists),
        * limited packet expansion to support the bandwidth economy goal,
        * independence from the underlying transport, network, and physical
          layers used by RTP, in particular high tolerance to packet loss
          and re-ordering, and robustness to transmission bit-errors in the
          encrypted payload.
        These properties ensure that SRTP is a suitable protection scheme
        for RTP/RTCP in both wired and wireless scenarios.
     3. SRTP Framework
        RTP is the Real Time Transport Protocol [RFC1889]. We define SRTP as
        a profile of RTP, in a way analogous to RFC1890 which defines the
        audio/video profile for RTP. Conceptually, we consider it to be a
        'bump in the stack' implementation which resides between the RTP
        application and the transport layer, which intercepts RTP packets
        and then forwards an equivalent SRTP packet on the sending side, and
        which intercepts SRTP packets and passes an equivalent RTP packet up
        the stack on the receiving side.
        The format of an SRTP packet is illustrated in Figure 1.
        The Encrypted Portion of an SRTP packet consists of the encryption
        of the RTP payload of the equivalent RTP packet. (Our use of the
        word 'encryption' also includes the possibility of a 'NULL'
        cipher; see Section 4.1.3.)
        The optional MKI and optional authentication tag are the only fields
        defined by SRTP that are not in RTP. Only 8-bit alignment is
        MKI (Master Key Identifier): variable length, optional
               The MKI is defined, signaled, and used by key management.
               The MKI identifies the master key from which the session
               key(s) were derived that authenticate and/or encrypt the
               particular packet.  Note that the MKI SHALL NOT identify the
               SRTP cryptographic context, which is identified according to
               Section 3.1.3.  The MKI MAY be used by key management for the
               purposes of re-keying and identifies a particular master key
               within the cryptographic context (see Section 3.1.1).
        Authentication tag: variable length, optional
               The authentication tag shall be used to carry message
               authentication data. The Authenticated Portion of an SRTP
               packet consists of the RTP header followed by the Encrypted
               Portion of the SRTP packet. Thus, note that if both
               encryption and authentication are applied, encryption SHALL
               be applied before authentication on the sender side and
               conversely on the receiver side. The authentication tag
               provides authentication of the RTP header and payload, and it
               indirectly provides replay protection by authenticating the
               sequence number.
          0                   1                   2                   3
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     |   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
     |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   |                           timestamp                           |
     |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   |           synchronization source (SSRC) identifier            |
     |   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
     |   |            contributing source (CSRC) identifiers             |
     |   |                               ....                            |
     |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   |                   RTP extension (optional)                    |
     | +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | | |                                                               |
     | | |                           payload                             |
     | | |                             ....                              |
     | | ~                     SRTP MKI (optional)                       ~
     | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | | ~                  authentication tag (optional)                ~
     | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | |
     | +- Encrypted Portion
     +---- Authenticated Portion
        Figure 1.  The format of an SRTP packet.
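        The layout in Figure 1 can be illustrated with a short sketch that
        splits a received SRTP packet into its parts. This is only an
        illustration under stated assumptions: the function and type names
        are hypothetical, the MKI and tag lengths are taken as inputs (they
        come from the cryptographic context, not from the packet), and the
        RTP header length is computed from the CC and X fields of the
        standard RTP header.

```python
import struct
from typing import NamedTuple


class SrtpParts(NamedTuple):
    header: bytes             # RTP header, including CSRCs and extension
    encrypted_portion: bytes  # encrypted RTP payload
    mki: bytes                # empty when no MKI is present
    auth_tag: bytes           # empty when no authentication tag is present


def rtp_header_length(packet: bytes) -> int:
    """Length in bytes of the fixed header, CSRC list and optional extension."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    csrc_count = packet[0] & 0x0F          # CC field
    has_extension = bool(packet[0] & 0x10)  # X bit
    length = 12 + 4 * csrc_count
    if has_extension:
        if len(packet) < length + 4:
            raise ValueError("truncated RTP header extension")
        # The extension starts with a 16-bit profile field followed by a
        # 16-bit length counted in 32-bit words.
        (ext_words,) = struct.unpack_from("!H", packet, length + 2)
        length += 4 + 4 * ext_words
    if len(packet) < length:
        raise ValueError("truncated RTP header")
    return length


def split_srtp_packet(packet: bytes, mki_len: int, tag_len: int) -> SrtpParts:
    """Split a packet into header, Encrypted Portion, MKI and tag."""
    header_len = rtp_header_length(packet)
    end = len(packet) - mki_len - tag_len
    if end < header_len:
        raise ValueError("packet too short for the configured MKI and tag")
    return SrtpParts(
        header=packet[:header_len],
        encrypted_portion=packet[header_len:end],
        mki=packet[end:end + mki_len],
        auth_tag=packet[end + mki_len:],
    )
```

        Note that the Authenticated Portion is simply the header followed
        by the Encrypted Portion, i.e., everything before the MKI.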
     3.1 SRTP Cryptographic Contexts
        Each SRTP session requires the sender and receiver to maintain
        cryptographic state information. This information is called the
        cryptographic context.
        By a session key, we mean a key which is used directly in a
        cryptographic transform (e.g. encryption or message authentication),
        and by a master key, we mean a random bit string (given by the key
        management protocol) from which session keys are derived in a
        cryptographically secure way.
     3.1.1 Transform-independent parameters
        The transform-independent parameters of the cryptographic context
        for SRTP consist of:
        * a 32-bit unsigned rollover counter (ROC), which records how many
          times the 16-bit RTP sequence number has been reset to zero after
          passing through 65,535. Unlike the sequence number (SEQ), which
          SRTP extracts from the RTP packet header, the ROC is maintained by
          SRTP as described in Section 3.2.1.
          We define the index of the SRTP packet corresponding to a given
          ROC and RTP sequence number to be the 48-bit quantity
              i = 2^16 * ROC + SEQ.
        * for the receiver only, a 16-bit sequence number s_l, which is the
          last received sequence number (possibly authenticated, if
          authentication is provided),
        * an identifier for the encryption algorithm, i.e., the cipher and
          its mode of operation, and related parameters (when encryption is
          provided),
        * an identifier for the message authentication algorithm, and
          related parameters (when authentication is provided),
        * a replay list, maintained by the receiver only (when
          authentication and replay protection are provided), containing
          indexes of recently received and authenticated SRTP packets,
        * an indicator (0/1) as to whether an MKI is present in SRTP and
          SRTCP packets,
        * if the MKI indicator is set to one, the length (in bytes) of the
          MKI field, and (for the sender) the actual value of the currently
          active MKI,
        (the value of the last three MKI-related parameters above MUST be
        kept fixed for the life-time of the context)
        * the master key(s),
        * for each master key, means to maintain a count of the number of
          SRTP packets that have been processed with that master key
          (essential for security, see Sections 3.2.1 and 10), either in the
          form of an explicit counter, or, the value of the first SRTP index
          for which the key was used,
        * non-negative integers n_e, and n_a, determining the length of the
          session keys for encryption, and message authentication.
        The master key(s) MUST be random and kept secret. In addition, for
        each master key, SRTP MAY choose to specify the following associated
        values:
        * a master salt, to be used in the key derivation of session keys.
          This value, when used, MUST be random, but MAY be public. Use of
          master salt is strongly recommended, see Section 10.2. A 'NULL'-
          salt is treated as 00...0.
        * an integer in the set {1,2,4,...,2^16}, the 'key_derivation_rate',
          where an unspecified value is treated as zero,
        * if the MKI-indicator is one, the actual MKI value for which the
          master key is valid,
        * <'From', 'To'> values, specifying the lifetime for a master key,
          expressed in terms of the two 48-bit index values inside whose
          range (including the range end-points) the master key is valid.
          These values are absolute quantities, not relative.
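        As an illustration only, the parameters listed above could be held
        in a structure like the following. All class and field names are
        hypothetical; the replay list is omitted; and defaulting the
        <'From', 'To'> range to the whole 48-bit index space is an
        assumption of this sketch, not something stated above. The
        selection rule (by MKI when the MKI indicator is one, otherwise by
        index) follows the packet processing steps of Section 3.2.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_INDEX = 2**48 - 1  # largest 48-bit SRTP index


@dataclass
class MasterKey:
    key: bytes                     # MUST be random and kept secret
    salt: bytes = b""              # a 'NULL' salt is treated as 00...0
    key_derivation_rate: int = 0   # unspecified is treated as zero
    mki: Optional[bytes] = None    # only meaningful if the MKI indicator is 1
    valid_from: int = 0            # 'From', inclusive (assumed default)
    valid_to: int = MAX_INDEX      # 'To', inclusive (assumed default)
    packet_count: int = 0          # packets processed with this key

    def __post_init__(self) -> None:
        rate = self.key_derivation_rate
        power_of_two = rate > 0 and (rate & (rate - 1)) == 0
        if rate != 0 and not (power_of_two and rate <= 2**16):
            raise ValueError("key_derivation_rate must be 0 or in {1,2,4,...,2^16}")

    def covers(self, index: int) -> bool:
        return self.valid_from <= index <= self.valid_to


@dataclass
class SrtpContext:
    roc: int = 0                   # 32-bit rollover counter
    s_l: Optional[int] = None      # receiver only: last sequence number
    cipher_id: str = "NULL"        # encryption algorithm identifier
    auth_id: Optional[str] = None  # authentication algorithm identifier
    mki_present: bool = False      # MKI indicator (0/1)
    mki_length: int = 0            # in bytes, if the indicator is one
    n_e: int = 0                   # session encryption key length
    n_a: int = 0                   # session authentication key length
    master_keys: List[MasterKey] = field(default_factory=list)

    def select_master_key(self, index: int, mki: Optional[bytes] = None) -> MasterKey:
        """Pick the master key by MKI if one is in use, otherwise by index."""
        for master_key in self.master_keys:
            if self.mki_present:
                if master_key.mki == mki:
                    return master_key
            elif master_key.covers(index):
                return master_key
        raise LookupError("no valid master key for this packet")
```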
        SRTCP by default uses the same cryptographic context parameters,
        except that:
        * no rollover counter or s_l-value needs to be maintained as the
          RTCP index is explicitly carried in each SRTCP packet,
        * a separate replay list is maintained (when replay protection is
          provided),
        * SRTCP maintains a separate counter for its master key (even if
          the master key is the same as that for SRTP, see below) as a
          means to maintain a count of the number of SRTCP packets that
          have been processed with that key (cf. above).
        Note in particular that the master key(s) MAY be shared between SRTP
        and SRTCP if the pre-defined transforms (including the key
        derivation) are used, but the session key(s) MUST NOT be so shared.
     3.1.2 Transform-dependent parameters
        All encryption, authentication/integrity, and key derivation
        parameters are defined in the Transforms section dedicated to the
        particular encryption, authentication, or key derivation transform
        (see Section 4). Typical examples of such parameters are block size
        of ciphers, session keys, data for IV formation, etc. We note again
        (it cannot be stressed enough) that SRTP and SRTCP MUST use distinct
        (pseudo-)random session keys. Future SRTP transform specifications
        MUST include a section to list the additional cryptographic
        context's parameters for that transform, if any.
     3.1.3 Mapping SRTP Packets to Cryptographic Contexts
        Recall that an RTP session for each participant is defined [RFC1889]
        by a pair of destination transport addresses (one network address
        plus a port pair for RTP and RTCP), and that a multimedia session is
        defined as a collection of RTP sessions. For example, a particular
        multimedia session could include an audio RTP session, a video RTP
        session, and a text RTP session.
        A cryptographic context shall be uniquely identified by the triplet
        context identifier:
        context id = <SSRC, destination network address, destination
        transport port number>,
        where the destination network address and the destination transport
        port are the ones in the current RTP packet (for the sender) or SRTP
        packet (for the receiver). It is assumed that, when presented with
        this information, the key management returns a context with the
        information as described in Section 3.1.
        As noted above, SRTP and SRTCP by default share the bulk of the
        parameters in the cryptographic context. Thus, retrieving the crypto
        context parameters for an SRTCP stream in practice may imply a
        binding to the corresponding SRTP crypto context. It is up to the
        implementation to ensure such binding, since the RTCP port may not
        be directly deducible from the RTP port only. Alternatively, the key
        management MAY choose to provide separate SRTP- and SRTCP-contexts,
        duplicating the common parameters (such as master key(s)). The
        latter approach then also enables SRTP and SRTCP to use, e.g.,
        distinct transforms, if so desired.
        If no valid context can be found for a packet corresponding to a
        certain context identifier, that packet MUST be discarded from
        further processing.
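        A minimal sketch of the mapping described in this section, assuming
        a plain in-memory table keyed by the triplet. The names
        ContextTable, ssrc_of and dispatch are hypothetical, and the
        context object itself is left opaque. A lookup that returns None
        corresponds to the case where the packet MUST be discarded.

```python
from typing import Dict, Optional, Tuple

# <SSRC, destination network address, destination transport port number>
ContextId = Tuple[int, str, int]


class ContextTable:
    """Maps context identifiers to cryptographic contexts."""

    def __init__(self) -> None:
        self._contexts: Dict[ContextId, object] = {}

    def add(self, ssrc: int, dest_addr: str, dest_port: int, context: object) -> None:
        self._contexts[(ssrc, dest_addr, dest_port)] = context

    def lookup(self, ssrc: int, dest_addr: str, dest_port: int) -> Optional[object]:
        return self._contexts.get((ssrc, dest_addr, dest_port))


def ssrc_of(packet: bytes) -> int:
    """Read the 32-bit SSRC from bytes 8..11 of the RTP header."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    return int.from_bytes(packet[8:12], "big")


def dispatch(table: ContextTable, packet: bytes,
             dest_addr: str, dest_port: int) -> Optional[object]:
    """Return the context for this packet, or None if it must be discarded."""
    return table.lookup(ssrc_of(packet), dest_addr, dest_port)
```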
     3.2 SRTP Packet Processing
        The following applies to SRTP. SRTCP is described in Section 3.3.
        Assuming initialization of the cryptographic context(s) has taken
        place via key management, and as described in Section 3.2.1, to
        construct a proper SRTP packet, given an RTP packet, the sender has
        to do the following:
        1. Determine which cryptographic context to use as described in
        Section 3.1.3.
        2. Determine the index of the SRTP packet as described in Section
        3.2.1, using the rollover counter in the cryptographic context and
        the sequence number in the RTP packet.
        3. Determine the master key and master salt. If the MKI indicator in
        the context is set to one, this is done using the current MKI in the
        cryptographic context, otherwise, the index determined in the
        previous step is used.
        4. Determine the session keys and salt (if used by the transform) as
        described in Section 4.3, using master key, master salt,
        key_derivation_rate and session key-lengths in the cryptographic
        context and the index, determined in Steps 2 and 3.
        5. If encryption is provided, encrypt the RTP payload to produce the
        Encrypted Portion of the packet (see Section 4.1, for the defined
        ciphers), using the encryption algorithm indicated in the
        cryptographic context, the session encryption key and salt (if used)
        found in Step 4, and the index found in Step 2.
        6. If the MKI indicator is set to one, append the MKI to the packet.
        7. If message authentication is provided, compute the authentication
        tag for the Authenticated Portion of the packet, as described in
        Section 4.2, using the current rollover counter (if used by the
        transform), the authentication algorithm indicated in the
        cryptographic context, and the session authentication key found in
        Step 4. Append the authentication tag to the packet.
        8. If necessary, update the ROC as in Section 3.2.1, using the
        packet index determined in Step 2.
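        The ordering of sender Steps 5 to 7 can be sketched as follows.
        This is not the defined transform: the cipher is passed in as a
        callable, and the tag is computed as a plain HMAC-SHA1 over the
        Authenticated Portion truncated to tag_len bytes, which is a
        placeholder assumption (the exact authentication input and tag
        length are defined in Section 4.2). The names protect and
        null_cipher are hypothetical. Key selection, key derivation and
        the ROC update (Steps 1 to 4 and 8) are left out.

```python
import hashlib
import hmac
from typing import Callable, Optional


def null_cipher(payload: bytes, index: int) -> bytes:
    """Stand-in for the 'NULL' cipher: the payload is left unchanged."""
    return payload


def protect(header: bytes, payload: bytes, index: int,
            encrypt: Callable[[bytes, int], bytes],
            auth_key: Optional[bytes], tag_len: int,
            mki: bytes = b"") -> bytes:
    """Build an SRTP packet from an RTP header and payload."""
    # Step 5: encrypt the RTP payload to produce the Encrypted Portion.
    encrypted_portion = encrypt(payload, index)
    authenticated_portion = header + encrypted_portion

    # Step 6: append the MKI (empty if the MKI indicator is zero).
    packet = authenticated_portion + mki

    # Step 7: the tag covers the Authenticated Portion only, i.e. the
    # header and the Encrypted Portion, not the MKI.
    if auth_key is not None:
        digest = hmac.new(auth_key, authenticated_portion, hashlib.sha1).digest()
        packet += digest[:tag_len]
    return packet
```

        Encryption is applied before authentication, as required when both
        services are provided.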
        To authenticate and decrypt an SRTP packet, the receiver has to do
        the following:
        1. Determine which cryptographic context to use as described in
        Section 3.1.3.
        2. Estimate the index of the SRTP packet from the rollover counter
        in the cryptographic context and the sequence number in the SRTP
        packet, as described in Section 3.2.1.
        3. Determine the master key and master salt. If the MKI indicator in
        the context is set to one, this is done using the MKI in the SRTP
        packet, otherwise, the index from the previous step is used.
        4. Determine the session keys, and session salt (if used by the
        transform) as described in Section 4.3, using master key, master
        salt, key_derivation_rate and session key-lengths in the
        cryptographic context and the index, determined in Steps 2 and 3.
        5. If message authentication and replay protection are provided,
        first check if the packet has been replayed, as described in Section
        3.2.2, using the Replay List in the context and the index as
        determined in Step 2. If the packet is judged to be replayed, then
        the packet MUST be discarded, and the event SHOULD be logged.
        Next, perform verification of the authentication tag, using the
        index (rollover counter when used by the transform) from Step 2, the
        authentication algorithm indicated in the cryptographic context, and
        the session authentication key from Step 4. If the result is
        'AUTHENTICATION FAILURE' (see Section 4.2), the packet MUST be
        discarded from further processing and the event SHOULD be logged.
        6. If encryption is provided, decrypt the Encrypted Portion of the
        packet (see Section 4.1, for the defined ciphers), using the
        decryption algorithm indicated in the cryptographic context, the
        session encryption key and salt found in Step 4, and the index from
        Step 2.
        7. Update the rollover counter and last sequence number, s_l, in the
        cryptographic context as in Section 3.2.1, using the packet index
        estimated in Step 2. If replay protection is provided, also update
        the Replay List as described in Section 3.2.2.
        8. When applicable, delete the MKI and authentication tag fields from
        the packet.
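        Receiver Steps 5 to 8 can be sketched in the same spirit. The
        assumptions are the same as for any illustration of this kind:
        the names unprotect and Discard are hypothetical, the cipher is a
        callable, the tag check is a plain truncated HMAC-SHA1 used as a
        placeholder for the transform of Section 4.2, and the Replay List
        is modelled as a simple set of indices rather than a sliding
        window. The header length, index and keys are taken as inputs.

```python
import hashlib
import hmac
from typing import Callable, Optional, Set


class Discard(Exception):
    """The packet MUST be discarded from further processing."""


def unprotect(packet: bytes, header_len: int, index: int,
              decrypt: Callable[[bytes, int], bytes],
              auth_key: Optional[bytes], mki_len: int, tag_len: int,
              replay_list: Set[int]) -> bytes:
    """Verify and decrypt an SRTP packet, returning the RTP packet."""
    if auth_key is None:
        tag_len = 0  # no authentication tag is present
    end = len(packet) - mki_len - tag_len
    if end < header_len:
        raise Discard("packet too short")
    authenticated_portion = packet[:end]
    tag = packet[end + mki_len:]

    if auth_key is not None:
        # Step 5: replay check first, then tag verification.
        if index in replay_list:
            raise Discard("replayed packet")
        expected = hmac.new(auth_key, authenticated_portion,
                            hashlib.sha1).digest()[:tag_len]
        if not hmac.compare_digest(expected, tag):
            raise Discard("AUTHENTICATION FAILURE")

    # Step 6: decrypt the Encrypted Portion.
    payload = decrypt(authenticated_portion[header_len:], index)

    # Step 7 (replay part only): remember the authenticated index.
    if auth_key is not None:
        replay_list.add(index)

    # Step 8: the MKI and authentication tag are not passed up the stack.
    return packet[:header_len] + payload
```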
     3.2.1 Packet Index Determination, and ROC, s_l Update
        SRTP implementations use an 'implicit' packet index for sequencing,
        i.e., not all of the index is explicitly carried in the SRTP packet,
        as described below. For the pre-defined transforms, the index i is
        used in replay protection (Section 3.2.2), encryption and message
        authentication (Sections 4.1 and 4.2), and for the key derivation
        (Section 4.3). It MAY also be used to determine the correct master
        key as indicated above.
        When the session starts, the sender side MUST set the rollover
        counter, ROC, to zero. Each time the RTP sequence number, SEQ, wraps
        modulo 2^16, the sender side MUST increment ROC by one, modulo 2^32
        (see security aspects below). The sender's packet index is then
        defined as
           i = 2^16 * ROC + SEQ.
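        The sender-side rule can be sketched as below, assuming that the
        sender emits sequence numbers in increasing order modulo 2^16, so
        that a sequence number smaller than the previous one indicates a
        wrap. The class name SenderIndex is hypothetical.

```python
from typing import Optional


class SenderIndex:
    """Tracks the sender's ROC and computes i = 2^16 * ROC + SEQ."""

    def __init__(self) -> None:
        self.roc = 0  # MUST be zero when the session starts
        self._last_seq: Optional[int] = None

    def index_for(self, seq: int) -> int:
        if not 0 <= seq < 2**16:
            raise ValueError("SEQ must be a 16-bit value")
        # A smaller SEQ than the previous one is taken to mean that the
        # sequence number wrapped modulo 2^16 (assumes in-order sending).
        if self._last_seq is not None and seq < self._last_seq:
            self.roc = (self.roc + 1) % 2**32
        self._last_seq = seq
        return (self.roc << 16) | seq
```

        For example, the packets with SEQ 65535 and then SEQ 0 get the
        consecutive indices 65535 and 65536.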
        Receiver-side implementations use the RTP sequence number to
        estimate the correct index, that is, the packet's location in the
        sequence of all SRTP packets. Here, the index is defined as 2^16 * v
        + SEQ, where the RTP sequence number is SEQ, and v is an estimate
        for the current value of the rollover counter, ROC. This estimate is
        based on SEQ, a previous estimate for ROC and the value s_l. The
        latter two are maintained locally by the receiver as described
        below.
        A robust approach for the proper use of a rollover counter for the
        pre-defined transforms requires its handling and use to be well
        defined. In particular, out-of-order RTP packets with sequence
        numbers close to 2^16 or zero must be properly dealt with.
        Initially, the receiver MUST be given the current ROC value from the
        sender using out of band signaling (or ROC is zero at the beginning
        of the session), see Section 9. Furthermore, the receiver SHALL
        initialize s_l to the RTP sequence number (SEQ) of the first
        observed SRTP packet.
        On consecutive SRTP packets, the receiver MAY estimate the index as
              i = 2^16 * v + SEQ,
        where v is chosen from the set { ROC-1, ROC, ROC+1 } (modulo 2^32)
        such that i is closest (in modulo 2^48 sense) to the value 2^16 * ROC
        + s_l.
        After the packet has been processed using the estimated index, the
        receiver MUST decide if s_l and ROC should be updated. For instance,
        a simple (but not error robust) method is to simply set s_l to SEQ
        and, if the value v = ROC+1 was used, to update ROC to v.
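        The estimate and the simple update described above can be written
        out directly. This is a sketch, not the pseudocode of Appendix A:
        the function names are hypothetical, and resolving an exact tie
        in favour of the current ROC is an assumption of this sketch.

```python
from typing import Tuple

MOD32 = 1 << 32
MOD48 = 1 << 48


def estimate_index(roc: int, s_l: int, seq: int) -> Tuple[int, int]:
    """Return (v, i) with v in {ROC-1, ROC, ROC+1} and i = 2^16 * v + SEQ
    closest, modulo 2^48, to 2^16 * ROC + s_l."""
    reference = (roc << 16) + s_l
    best_v, best_i, best_distance = roc, 0, MOD48
    # ROC is tried first so that a tie keeps the current ROC (assumption).
    for v in (roc, (roc - 1) % MOD32, (roc + 1) % MOD32):
        i = (v << 16) + seq
        diff = (i - reference) % MOD48
        distance = min(diff, MOD48 - diff)
        if distance < best_distance:
            best_v, best_i, best_distance = v, i, distance
    return best_v, best_i


def simple_update(roc: int, seq: int, v: int) -> Tuple[int, int]:
    """The simple (but not error robust) update: returns the new (ROC, s_l)."""
    if v == (roc + 1) % MOD32:
        roc = v
    return roc, seq
```

        For example, with ROC = 0 and s_l = 65535, a packet with SEQ = 0
        is estimated to have v = 1 and index 65536, and the update then
        sets ROC to 1 and s_l to 0. A late packet with SEQ = 65534
        arriving after that is estimated with v = 0.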
        Caveat: if message authentication is not present, neither the
        initialization of s_l, nor the ROC update can be made completely
        robust on the receiving side.
        After a re-keying (changing to a new master key) occurs, the roll-
        over counter maintains its sequence of values, i.e., it MUST NOT be
        reset to zero, to avoid inconsistencies in key life-times.
        As the rollover counter is 32 bits long and the sequence number is
        16 bits long, the maximum number of packets that can be secured with
        the same key is 2^48 using the pre-defined transforms. After that
        number of SRTP packets have been sent with a given (master or
        session) key, the sender MUST NOT send any more packets with that
        key. (There exists a similar limit for SRTCP, which in practice may
        be more restrictive, see Section 3.3 and the summary in Section
        10.2.) This limitation enforces a security benefit by providing an
        upper bound on the amount of traffic that can pass before
        cryptographic keys are changed. Re-keying (see Section 9) MUST be
        triggered, before this amount of traffic, and MAY be triggered
        earlier, e.g., for increased security and access control to media.
        Re-occurring key derivation, as determined by a non-zero
        key_derivation_rate (see Section 4.3), also gives stronger security,
        but does not change the above absolute maximum value, i.e. the
        master key shall still be used for a maximum of 2^48 SRTP packets
        (or 2^31 SRTCP packets, see below).
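        One way to enforce these limits is an explicit per-key packet
        counter, as sketched below. The class name and the rekey_margin
        parameter (how many packets before the limit re-keying should be
        requested) are hypothetical; only the two limits are taken from
        the text.

```python
MAX_SRTP_PACKETS = 2**48   # per master key, pre-defined transforms
MAX_SRTCP_PACKETS = 2**31  # per master key, SRTCP


class KeyUsageCounter:
    """Counts packets processed with one master key."""

    def __init__(self, limit: int, rekey_margin: int = 0) -> None:
        self.limit = limit
        self.rekey_margin = rekey_margin
        self.count = 0

    def record_packet(self) -> None:
        # No more packets may be sent once the limit has been reached.
        if self.count >= self.limit:
            raise RuntimeError("key lifetime exhausted; re-keying is required")
        self.count += 1

    @property
    def needs_rekey(self) -> bool:
        """True once re-keying should be triggered."""
        return self.count >= self.limit - self.rekey_margin
```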
        The receiver's 'implicit index' approach works for the pre-defined
        transforms as long as the reordering and loss of packets are not
        too great and bit-errors do not occur in unfortunate ways. In
        particular, 2^15 packets would need to be lost, or a packet would
        need to be 2^15 packets out of sequence in order for synchronization
        to be lost. Such drastic loss or reordering is likely to disrupt the
        RTP application itself.
     3.2.2 Replay Protection
        Secure replay protection is only possible when integrity protection
        is present. It is RECOMMENDED to use replay protection, both for RTP
        and RTCP, as integrity protection alone cannot assure security
        against replay attacks.
        A packet is 'replayed' when it is stored by an adversary, and then
        re-injected into the network. SRTP provides protection against such
        attacks whenever message authentication is provided, through the
        storage of the indices of the most recently received and
        authenticated packets.
        Each SRTP receiver maintains a Replay List, which conceptually
        contains the indices of all of the packets which have been received
        and authenticated. In practice, the list can use a 'sliding window'
        approach, so that a fixed amount of storage suffices for replay
        protection. Packet indices which lag behind the packet index in the
        context by more than SRTP-WINDOW-SIZE can be assumed to have been
        received, where SRTP-WINDOW-SIZE is a receiver-side, implementation-
        dependent parameter and MUST be at least 64, but which MAY be set to
        a higher value.
        The receiver checks the index of an incoming packet against the
        replay list and the window. Only packets with index ahead of the
        window, or, inside the window but not already received, SHALL be
        accepted.
        After the packet has been (successfully) authenticated (if necessary
        the window is first moved ahead) the replay list SHALL be updated
        with the new index.
        The Replay List can be efficiently implemented by using a bitmap to
        represent which packets have been received, as described in the
        Security Architecture for IP [RFC2401].
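        The sliding-window Replay List described above can be sketched as
        follows. This is a non-normative illustration only: the class and
        method names are hypothetical, the bitmap is held in a Python
        integer, and the caller is assumed to invoke update() only after
        the packet has been authenticated.

```python
class ReplayWindow:
    """Sliding-window replay list kept as an integer bitmap (illustrative)."""

    def __init__(self, size=64):
        if size < 64:
            raise ValueError("SRTP-WINDOW-SIZE must be at least 64")
        self.size = size
        self.top = -1      # highest authenticated index so far (-1: none yet)
        self.bitmap = 0    # bit d set <=> index (top - d) has been received

    def check(self, index):
        """Return True if the packet may proceed to authentication."""
        if index > self.top:
            return True                      # ahead of the window
        delta = self.top - index
        if delta >= self.size:
            return False                     # behind the window: assume received
        return not (self.bitmap >> delta) & 1

    def update(self, index):
        """Record an index; call only after successful authentication."""
        if index > self.top:
            shift = index - self.top         # move the window ahead first
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.top = index
        else:
            delta = self.top - index
            if delta < self.size:
                self.bitmap |= 1 << delta
```

        A receiver would call check() before verifying the authentication
        tag and update() afterwards, so that forged packets cannot advance
        the window.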
     3.3 Secure RTCP
          0                   1                   2                   3
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     |   |V=2|P|    RC   |   PT=SR or RR   |             length          |
     |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   |                         SSRC of sender                        |
     | +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
     | | |                              ...                              |
     | | |                          sender info                          |
     | | |                              ...                              |
     | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | | |                              ...                              |
     | | |                         report block 1                        |
     | | |                              ...                              |
     | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | | |                              ...                              |
     | | |                         report block 2                        |
     | | |                              ...                              |
     | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | | |                                                               |
     | | |                              ...                              |
     | | |                                                               |
     | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | | |V=2|P|    SC   |  PT=SDES=202  |             length            |
     | | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
     | | |                          SSRC/CSRC_1                          |
     | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | | |                           SDES items                          |
     | | |                              ...                              |
     | | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
     | | |                                                               |
     | | |                              ...                              |
     | | |                                                               |
     | +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
     | | |E|                         SRTCP index                         |
     | | ~                     SRTP MKI (optional)                       ~
     | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | | ~                     authentication tag                        ~
     | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | |
     | +-- Encrypted Portion
     +---- Authenticated Portion
        Figure 2.  An example of the format of a Secure RTCP packet,
        consisting of an underlying RTCP compound packet with a Report and
        SDES packet.
        Secure RTCP follows the definition of Secure RTP. SRTCP adds three
        new fields to the RTCP packet definition, the SRTCP index, an
        'encrypt-flag', and the authentication tag. Those fields MUST be
        appended to an RTCP packet when at least integrity protection (which
        is mandatory) is applied to the RTCP packet, in order to form an
        equivalent SRTCP packet, so that the added fields follow any other
        profile specific extensions. SRTCP adds an optional fourth field,
        the MKI, which functions according to the MKI definition in Section
        3.  An SRTCP packet is illustrated in Figure 2.
        According to [RFC1889] there is a 'recommended' packet format for
        compound packets. SRTCP MUST be given packets according to that
        recommendation in the sense that the first part MUST be a
        send/receive report. However, the so-called encryption prefix
        (Section 6.1 of [RFC1889]), a random 32-bit quantity intended to
        deter known plaintext attacks, MUST NOT be used (see below).
        The Encrypted Portion of an SRTCP packet consists of the encryption
        of the RTCP payload of the equivalent compound RTCP packet, from the
        first RTCP packet, i.e., from the ninth (9) byte to the end of the
        compound packet. The Authenticated Portion of an SRTCP packet
        consists of the entire equivalent (possibly compound) RTCP packet,
        the E flag, and the SRTCP index (after any encryption has been
        applied to the payload).
        The added fields are:
        E-flag: 1 bit, mandatory
               The E-flag indicates if the current SRTCP packet is encrypted
               or unencrypted. Section 9.1 of [RFC1889] allows the split of
               a compound RTCP packet into two lower-layer packets, one to
               be encrypted and one to be sent in the clear. The E bit set
               to '1' indicates encrypted packet, and '0' indicates non-
               encrypted packet.
        SRTCP index: 31 bits, mandatory
               The SRTCP index is a 31-bit counter for the SRTCP packet. The
               index is explicitly included in each packet, in contrast to
               the 'implicit' index approach used for SRTP. The SRTCP index
               MUST be set to zero before the first SRTCP packet is sent,
               and MUST be incremented by one, modulo 2^31, after each SRTCP
               packet is sent. In particular, after a re-key, the SRTCP
               index MUST NOT be reset to zero again (cf. Section 3.2.1).
        Authentication Tag: variable length, mandatory
               The authentication tag shall be used to carry message
               authentication data.
        The optional field is the variable-length MKI (see Section 3).
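        As a non-normative illustration of the E-flag and SRTCP index
        fields, the sketch below packs them into one 32-bit word in network
        byte order. It assumes, following Figure 2, that the E-flag is the
        most significant bit of that word; the function names are
        hypothetical.

```python
import struct

def pack_srtcp_trailer(encrypted, index):
    """Return the 4 octets holding the E-flag and the 31-bit SRTCP index."""
    if not 0 <= index < 2**31:
        raise ValueError("SRTCP index must fit in 31 bits")
    word = (int(bool(encrypted)) << 31) | index
    return struct.pack("!I", word)

def unpack_srtcp_trailer(data):
    """Return (E-flag, SRTCP index) from the first 4 octets of data."""
    (word,) = struct.unpack("!I", data[:4])
    return bool(word >> 31), word & 0x7FFFFFFF

def next_srtcp_index(index):
    """Increment the sender's index by one, modulo 2^31."""
    return (index + 1) % 2**31
```

        The MKI and the authentication tag, when present, would follow this
        word in the packet.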
        SRTCP uses the cryptographic context parameters and packet
        processing of SRTP, with the following changes:
        * The receiver need not 'estimate' the index, as it is explicitly
        signaled in the packet.
        * If the MKI indicator in the cryptographic context is zero, the
        master key is determined by the current SRTP index, even though
        SRTCP has its own index. Since the SRTCP source as with any SSRC in
        an SRTP session has its own sequence number space, the master key
        <From, To> lifetime MUST be based on the SRTP master key lifetime.
        The concomitant re-keying issues are discussed in sections 9 and 10.
        * Pre-defined SRTCP encryption is as defined in Section 4, but using
        the definition of the SRTCP Encrypted Portion as defined in this
        section, and using the SRTCP index as the index i. The encryption
        transform and related parameters SHALL by default be the same
        selected for the protection of the associated SRTP stream(s), while
        the NULL algorithm shall be applied to the RTCP packets not to be
        encrypted. Note that the master key and salt are shared between SRTP
        and SRTCP, but the (encryption) session key and salt will be
        distinct due to the key derivation definition (Section 4.3).
        The E-flag is assigned values by the sender depending on whether the
        packet was encrypted or not.
        * SRTCP decryption is performed as in Section 4, but only if the E
        flag is equal to 1. If so, the Encrypted Portion is decrypted, using
        the SRTCP index as the index i. In case the E-flag is 0, the payload
        is simply left unmodified.
        * SRTCP replay protection is as defined in Section 3.2.2, but using
        the SRTCP index as the index i, and as noted maintains a separate
        replay list specific to SRTCP.
        * The pre-defined SRTCP authentication tag is defined as in Section
        4, but with the Authenticated Portion of the SRTCP packet defined in
        this section (which includes the index). The authentication
        transform and related parameters (e.g., key size) SHALL by default
        be the same as selected for the protection of the associated SRTP
        stream(s). (Exception: when SRTP is not authenticated, the default
        authentication transform MUST be used for SRTCP.) Note that the
        master key is shared between SRTP and SRTCP, but the
        (authentication) session key will be distinct due to the key
        derivation definition (Section 4.3).
        * In the last step of the processing, only the sender needs to
        update the value of the SRTCP index by incrementing it modulo 2^31
        (and for security reasons the sender MUST also check the number of
        RTCP packets processed, see below).
        There MAY also exist some minor transform specific changes, see
        Section 4 for the defined transforms.
        As noted, the encryption prefix (Section 6.1 of [RFC1889]) is not
        to be used: that mechanism exists only to support ciphers that are
        not secure against known-plaintext attacks. Ciphers that are not
        secure against known-plaintext attacks SHOULD NOT be used to encrypt
        RTP messages. The pre-defined SRTP encryption uses a secure, additive
        stream cipher, and thus the prefix offers no benefit at all.
        The maximum number of SRTCP packets with a given session or master
        key is limited to 2^31. Due to, for example, re-keying, reaching this
        limit may or may not coincide with wrapping of the SRTCP index, and
        thus the sender MUST be able to deduce the packet count, e.g., as
        indicated before.  Also, since the session keys for SRTP and SRTCP
        are by default derived from the same master key, new session and
        master keys for both protocols MUST be obtained before any of the
        two protocols reaches its maximum key-usage limit (cf. Section 3.2.1).
        Message authentication for RTCP is REQUIRED, as it is the control
        protocol (e.g., it has a BYE packet). Note also that the cost in
        total bandwidth for RTCP authentication is not as high as the one of
        RTP authentication, as the recommended session bandwidth allocated
        to RTCP is at most 5% and the RTCP packets are less frequent.
        However, when adding authentication to RTCP, the overhead in
        bandwidth SHOULD be considered (the bandwidth will be more than 5%).
        Note however, that large-scale multicast application of SRTCP might
        require careful consideration in the configuration and use, see
        Section 12. The security risks that arise wherever SRTCP is not
        used MUST be taken seriously into consideration.
     4. Pre-Defined Cryptographic Transforms
        While there are numerous encryption and message authentication
        algorithms that can be used in SRTP, we define below default
        algorithms in order to avoid the complexity of specifying the
        encodings for the signaling of algorithm and parameter identifiers.
        The defined algorithms have been chosen as they fulfill the goals
        listed in Section 2. Recommendations on how to extend SRTP with new
        transforms are given in Section 7.
     4.1 Encryption
        The following parameters are generic and common to all pre-defined,
        non-NULL, encryption transforms.
        * BLOCK CIPHER and mode are the block cipher used and its mode of
         operation (the default is AES in counter mode, see below)
        * n_b is the bit-size of the block for the block cipher
        * k_e is the session encrypting key
        * n_e is the bit-length of k_e (the default is 128 bits)
        * k_s is the so called session salting key
        * n_s is the bit-length of k_s. n_s is at most n_b - 16 bits, and
         the default value is the maximum (n_b - 16).
        * SRTP_PREFIX_LENGTH is the octet length of the keystream prefix, an
         (at least) 8-bit non-negative integer, inferred from the message
         authentication code in use.
        The distinct session keys and salts for SRTP/SRTCP are by default
        derived as specified in Section 4.3.
        The encryption transforms defined in SRTP use a 'seekable' segmented
        keystream generator (KG), which for each secret key maps the SRTP
        packet index into a pseudorandom keystream segment, used to encrypt
        a single RTP packet. The process of encrypting a packet consists of
        generating the keystream segment corresponding to the packet, and
        then bitwise exclusive-oring that keystream segment onto the payload
        of the RTP packet to produce the Encrypted Portion of the SRTP
        packet. Decryption is done the same way, but swapping the roles of
        the plaintext and ciphertext.
        The definition of how the keystream is generated, given the index,
        depends on the cipher and its mode of operation. Below, two such
        keystream generators are defined. The NULL cipher is also defined,
        to be used when encryption of RTP is not required.
        The initial octets of each keystream segment MAY be reserved for
        use in a message authentication code, in which case the keystream
        used for encryption starts immediately after the last reserved
        octet. The initial reserved octets are called the keystream prefix
        (not to be confused with the so-called 'encryption prefix' of
        [RFC1889, Section 6.1]), and the remaining octets are called the
        keystream suffix. This process is illustrated in Figure 3.
        +----+   +------------------+---------------------------------+
        | KG |-->| Keystream Prefix |          Keystream Suffix       |---+
        +----+   +------------------+---------------------------------+   |
                                                                          |
                                    +---------------------------------+   v
                                    | Encrypted Portion of RTP Packet |->(*)
                                    +---------------------------------+   |
                                                                          |
                                    +---------------------------------+   |
                                    | Encrypted Portion of SRTP Packet|<--+
                                    +---------------------------------+
        Figure 3: Default SRTP Encryption Processing. Here KG denotes the
        keystream generator, and (*) denotes bitwise exclusive-or.
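        The processing shown in Figure 3 can be sketched as below. This is
        a non-normative illustration: the keystream segment is assumed to
        be supplied by some keystream generator, and the function names are
        hypothetical.

```python
def split_keystream(segment, prefix_length):
    """Split a keystream segment into (keystream prefix, keystream suffix)."""
    return segment[:prefix_length], segment[prefix_length:]

def xor_encrypt(payload, segment, prefix_length=0):
    """XOR the payload with the keystream suffix.

    Returns (prefix, ciphertext). The prefix is not used for encryption;
    it is reserved for the message authentication transform.
    """
    prefix, suffix = split_keystream(segment, prefix_length)
    if len(suffix) < len(payload):
        raise ValueError("keystream segment too short for this payload")
    ciphertext = bytes(p ^ k for p, k in zip(payload, suffix))
    return prefix, ciphertext
```

        Because the operation is an exclusive-or, applying xor_encrypt() to
        the ciphertext with the same segment recovers the plaintext.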
        The number of octets in the keystream prefix is denoted as
        SRTP_PREFIX_LENGTH. The keystream prefix is reserved for use with
        certain message authentication transforms, such as the pre-defined
        TMMH transform (Section 4.2.2). The Prefix is indicated by a
        positive, non-zero value of this latter parameter. This means that,
        even if confidentiality is not to be provided, the keystream
        generator output MAY still need to be computed for packet
        authentication, in which case the default keystream generator (mode)
        SHALL be used.
        The default cipher is the Advanced Encryption Standard (AES), and we
        define two modes of running AES, Segmented Integer Counter Mode AES
        and AES in f8-mode. In the sequel, let E(k,x) be AES applied to key
        k and input block x. AES has (default) n_e = 128-bit key size and
        (always) n_b = 128-bit block size.
     4.1.1 AES in Counter Mode
        Conceptually, counter mode consists of encrypting successive
        integers. The actual definition is somewhat more complicated, in
        order to randomize the starting point of the integer sequence. Each
        packet is encrypted with a distinct keystream segment, which is
        computed as follows.
  Keystream Generation
        A keystream segment is the concatenation of the 128-bit output
        blocks of the AES cipher in the encrypt direction, using key k =
        k_e, in which the block indices are in increasing order.
        Symbolically, each keystream segment looks like
           E(k, IV) || E(k, IV + 1 mod 2^128) || E(k, IV + 2 mod 2^128) ...
        where the 128-bit integer value IV SHALL be defined by the SSRC, the
        SRTP packet index i, and the SRTP session salting key k_s, as below.
             IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16)
        The inclusion of the SSRC allows the use of the same key to protect
        distinct SRTP streams. Exploiting such features is conditioned by
        requirements, see the security caveats in Section 10.1. (In the case
        of SRTCP, the SSRC of the first header of the compound packet MUST
        be used, i SHALL be the 31-bit SRTCP index and k_s SHALL be replaced
        by the SRTCP session salt.)
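        The IV formula above can be sketched with plain integer arithmetic.
        This is a non-normative illustration: k_s, SSRC and i are assumed
        to be given as non-negative integers, the function names are
        hypothetical, and the AES step itself is omitted, so the second
        function only yields the counter blocks that would be fed to
        E(k, .).

```python
MASK128 = 2**128 - 1

def ctr_iv(k_s, ssrc, index):
    """IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16), as a 128-bit int."""
    return ((k_s << 16) ^ (ssrc << 64) ^ (index << 16)) & MASK128

def ctr_counter_blocks(iv, n_blocks):
    """Return the 16-octet inputs IV, IV + 1, ... (mod 2^128) for one packet."""
    if n_blocks > 2**16:
        raise ValueError("at most 2^16 keystream blocks per IV")
    return [((iv + j) % 2**128).to_bytes(16, "big") for j in range(n_blocks)]
```

        The low 16 bits of the IV are zero, which leaves room for the block
        counter within one packet.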
        Note that the initial value, IV, is fixed for each packet. The
        number of blocks of keystream generated for any fixed value of IV
        MUST NOT exceed 2^16. The AES has a block size of 128 bits, so 2^16
        output blocks are sufficient to generate the 2^23 bits of keystream
        needed to encrypt the largest possible RTP packet (except for IPv6
        'jumbograms' [RFC2675], which are not likely to be used for RTP-
        based multimedia traffic). This restriction on the maximum bit-size
        of the packet that can be encrypted ensures the security of the
        encryption method by limiting the effectiveness of probabilistic
        attacks [BDJR].
     4.1.2 AES in f8-mode
        To encrypt UMTS (Universal Mobile Telecommunications System, i.e., 3G
        networks) data, a solution (see [ES3D]) known as the f8-algorithm
        has been developed. On a high level, the proposed scheme is a
        variant of Output Feedback Mode (OFB) [HAC], with a more elaborate
        initialization and feedback function. As in normal OFB, the core
        consists of a block cipher. We also define here the use of AES as a
        block cipher to be used in f8-mode for RTP encryption, with default
        128-bit key and block size.
        Figure 4 shows the structure of block cipher, E, running in what we
        shall call 'f8-mode of operation'.
                     |      |
                +--->|  E   |
                |    |      |
                |    +------+
                |        |
          m -> (*)       +-----------+-------------+--  ...     ------+
                |    IV' |           |             |                  |
                |        |   j=1 -> (*)    j=2 -> (*)   ...  j=L-1 ->(*)
                |        |           |             |                  |
                |        |      +-> (*)       +-> (*)   ...      +-> (*)
                |        |      |    |        |    |             |    |
                |        v      |    v        |    v             |    v
                |    +------+   | +------+    | +------+         | +------+
                |    |      |   | |      |    | |      |         | |      |
         k_e ---+--->|  E   |   | |  E   |    | |  E   |         | |  E   |
                     |      |   | |      |    | |      |         | |      |
                     +------+   | +------+    | +------+         | +------+
                         |      |    |        |    |             |    |
                         +------+    +--------+    +--  ...  ----+    |
                         |           |             |                  |
                         v           v             v                  v
                        S(0)        S(1)          S(2)  . . .       S(L-1)
        Figure 4. f8-mode of operation (asterisk, (*), denotes bitwise XOR).
        The figure represents the KG in Figure 3, when AES-f8 is used.
  f8 Keystream Generation
        As above, let E(k_e,x) be the 128-bit output of AES in the encrypt
        direction when applied to the key k_e and n_b = 128-bit plaintext
        block x. The Initialization Vector (IV) is determined as described
        in Section
        Let IV', S(j), and m denote n_b-bit blocks, determined below. The
        keystream, S(0) || ... || S(L-1), for an N-bit message is defined by
        setting IV' = E(k_e XOR m, IV), and S(-1) = 00..0. For j = 0,1,..,L-1
        where L = N/n_b (rounded up to nearest integer) compute
                 S(j) = E(k_e, IV' XOR j XOR S(j-1))
        Notice that the IV is not used directly. Instead it is fed through E
        under another key to produce an internal, 'masked' value (denoted
        IV') to prevent an attacker from gaining known input/output pairs.
        The role of the internal counter is to prevent short keystream
        cycles. The value of the key mask m is defined to be
                m = k_s || 0x555..5,
        i.e., the session salting key followed by the binary pattern 0101..
        to fill out the entire desired key size, n_e.
        The maximum allowable packet size can be determined as follows. The
        AES has a block size of 128 bits, and assuming that AES behaves like
        a random function, it is (heuristically) secure to generate somewhat
        less than 2^64 output blocks. We suggest a maximum of 2^32 blocks,
        which is sufficient to generate 2^39 bits of keystream. For
        practical sizes of the RTP packets, far fewer blocks are required
        though, and the counter j above will often be sufficient if
        implemented as a 16-bit counter.
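        The f8 keystream recurrence can be sketched as follows. This is a
        non-normative illustration under stated assumptions: the block
        cipher E is passed in as a function, the stand-in used here is a
        hash-based toy and is NOT AES, the counter j is assumed to be
        encoded as a big-endian n_b-bit block, and all function names are
        hypothetical.

```python
import hashlib

def toy_block_cipher(key, block):
    """Stand-in for E(k, x). NOT AES, not a permutation; illustration only."""
    return hashlib.sha256(key + block).digest()[:16]

def xor_bytes(a, b):
    """Bitwise XOR of two equal-length octet strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def f8_keystream(E, k_e, k_s, iv, n_bits, n_b=128):
    """Return S(0) || ... || S(L-1) for an N-bit message."""
    block_len = n_b // 8
    # m = k_s || 0x55..55, filled out to the key length n_e
    m = k_s + b"\x55" * (len(k_e) - len(k_s))
    iv_masked = E(xor_bytes(k_e, m), iv)          # IV' = E(k_e XOR m, IV)
    n_blocks = -(-n_bits // n_b)                  # L = ceil(N / n_b)
    s_prev = bytes(block_len)                     # S(-1) = 00..0
    out = []
    for j in range(n_blocks):
        counter = j.to_bytes(block_len, "big")    # assumed encoding of j
        x = xor_bytes(xor_bytes(iv_masked, counter), s_prev)
        s_prev = E(k_e, x)                        # S(j)
        out.append(s_prev)
    return b"".join(out)
```

        A real implementation would pass an AES encryption function as E
        and would XOR the resulting keystream onto the payload.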
  f8 SRTP IV Formation
        The purpose of the following IV formation is to provide a feature
        which we call implicit header authentication (IHA), see Section
        The SRTP IV for 128-bit block AES-f8 is formed in the following way:
             IV = 0x00 || M || PT || SEQ || TS || SSRC || ROC
        M, PT, SEQ, TS, SSRC SHALL be taken from the RTP header; ROC is from
        the cryptographic context.
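        The field widths in this IV add up to 128 bits (8 + 1 + 7 + 16 + 32
        + 32 + 32), so the IV can be built from the fixed RTP header almost
        directly. The sketch below is non-normative; it assumes the
        standard 12-octet RTP fixed header layout of [RFC1889] and a 32-bit
        ROC, and the function name is hypothetical.

```python
import struct

def f8_srtp_iv(rtp_header, roc):
    """IV = 0x00 || M || PT || SEQ || TS || SSRC || ROC (16 octets)."""
    if len(rtp_header) < 12:
        raise ValueError("need the 12-octet fixed RTP header")
    # Octet 0 of the header (V, P, X, CC) is replaced by 0x00.
    # Octets 1..11 carry M || PT || SEQ || TS || SSRC unchanged.
    return b"\x00" + rtp_header[1:12] + struct.pack("!I", roc)
```

        For example, a header of 80 6e 5c ba 50 68 1d e5 5c 62 15 99 (hex)
        with ROC d462564a gives the IV 006e5cba50681de55c621599d462564a.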
        The presence of the SSRC as part of the IV allows AES-f8 to be used
        when a master key is shared between multiple streams, see Section
  f8 SRTCP IV Formation
        The SRTCP IV for 128-bit block AES-f8 is formed in the following
        way:
        IV = 0...0 || E || SRTCP index || V || P || RC || PT || length ||
             SSRC
        where V, P, RC, PT, length, SSRC SHALL be taken from the first
        header in the RTCP compound packet. E and SRTCP index are the 1- and
        31-bit fields added to the packet.
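        A non-normative sketch of the SRTCP IV formation follows. It
        assumes that the IV ends with the SSRC, as the list of header
        fields above indicates, so that 32 leading zero bits pad the value
        to 128 bits; it also assumes the E-flag is the most significant bit
        of the word carrying the SRTCP index. The function name is
        hypothetical.

```python
import struct

def f8_srtcp_iv(rtcp_packet, encrypted, srtcp_index):
    """IV = 0..0 || E || SRTCP index || first RTCP header word || SSRC."""
    if len(rtcp_packet) < 8:
        raise ValueError("need the first 8 octets of the compound packet")
    if not 0 <= srtcp_index < 2**31:
        raise ValueError("SRTCP index must fit in 31 bits")
    e_and_index = struct.pack("!I", (int(bool(encrypted)) << 31) | srtcp_index)
    # Octets 0..3: V, P, RC, PT, length. Octets 4..7: SSRC of sender.
    return bytes(4) + e_and_index + rtcp_packet[:8]
```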
     4.1.3 NULL Cipher
        The NULL cipher is used when no confidentiality for RTP/RTCP is
        requested. The keystream can be thought of as "000..0", i.e., the
        encryption simply copies the plaintext input into the ciphertext
        output.
     4.2 Message Authentication and Integrity
        Common parameters:
        * k_a is the session message authentication key
        * n_a is the bit-length of the authentication key (the default for
          the default transform is 128 bits)
        * n_tag is the bit-length of the output authentication tag (the
          default is 32 bits)
        * SRTP_PREFIX_LENGTH is the octet length of the keystream prefix as
          defined above
        * M is the Authenticated Portion as specified in Section 3 for RTP
          and in Section 3.3 for RTCP.
        The distinct session authentication keys for SRTP/SRTCP are by
        default derived as specified in Section 4.3.
        The values of n_a, n_tag, and SRTP_PREFIX_LENGTH MUST be fixed for
        any particular fixed value of the key.
        Below we describe the process of computing authentication tags. The
        sender computes the tag of the Authenticated Portion and appends it
        to the packet. The SRTP receiver verifies a message/authentication
        tag pair as follows. A new authentication tag is computed over the
        Authenticated Portion using the selected algorithm and key, and it
        is compared to the tag associated with the received message. If the
        two tags are equal, then the message/tag pair is valid; otherwise,
        it is invalid and the error audit message "AUTHENTICATION FAILURE"
        MUST be returned.
     4.2.1 HMAC/SHA1
        When HMAC/SHA1 is used, the SRTP_PREFIX_LENGTH is 0. For SRTP, the
        HMAC is applied to the concatenation of the Authenticated Portion of
        the packet (M) and the rollover counter in the cryptographic
        context, i.e. HMAC(k_a, M || ROC). For SRTCP, we apply HMAC to the
        corresponding M, only (as it already includes the SRTCP index). By
        default, the output shall be truncated to the n_tag left-most bits.
        Default value for n_a is 128 bits, and for n_tag it is 32 bits.
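        The HMAC/SHA1 tag computation and check can be sketched with the
        Python standard library as below. This is a non-normative
        illustration: the ROC is assumed to be appended as a 32-bit value
        in network byte order, and the function names are hypothetical.

```python
import hashlib
import hmac
import struct

def srtp_hmac_tag(k_a, m, roc, n_tag=32):
    """HMAC(k_a, M || ROC), truncated to the n_tag left-most bits."""
    mac = hmac.new(k_a, m + struct.pack("!I", roc), hashlib.sha1).digest()
    return mac[: n_tag // 8]

def srtcp_hmac_tag(k_a, m, n_tag=32):
    """HMAC(k_a, M) only; M already includes the SRTCP index."""
    return hmac.new(k_a, m, hashlib.sha1).digest()[: n_tag // 8]

def verify_tag(computed, received):
    """Compare two tags without leaking where they first differ."""
    return hmac.compare_digest(computed, received)
```

        A receiver would recompute the tag over the Authenticated Portion
        and report "AUTHENTICATION FAILURE" when verify_tag() returns
        False.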
     4.2.2 TMMH
        This document describes TMMH version two, which is not interoperable
        with the earlier version. In the following, the term TMMH
        refers to version two.
        TMMH is a simple function that maps a key and a message to a hash
        value. This hash value is encrypted by combining it with the
        keystream prefix to make the authentication tag, as described below.
        The key, message, and hash value are treated as sequences of
        unsigned sixteen bit integers in network byte order. In the
        following, we call such a 16-bit integer a word. The number of
        octets in the key and hash value MUST be a multiple of two (to be
        word-aligned). (Thus, n_tag and n_a MUST be multiples of 16.)
        Besides the above common parameters, we define the following
        parameters for TMMH:
        * TAG_WORDS, the number of words in the hash value, i.e., n_tag/16.
          The default is 2.
        The value of TAG_WORDS also defines the quantity SRTP_PREFIX_LENGTH
        to be TAG_WORDS * 2 (see Section 4.1).
        The number of words of the key, i.e. n_a/16, depends on the maximum
        length of any message to be authenticated as follows:
               MAX_MSG_LENGTH (octets)      Key size (16-bit words)
                    16                        7 + 2 * TAG_WORDS
                   128                       14 + 3 * TAG_WORDS
                  1024                       21 + 4 * TAG_WORDS
                  8192                       28 + 5 * TAG_WORDS
                 65536                       35 + 6 * TAG_WORDS
        For instance, an RTP packet of length 80 bytes to be authenticated
        with a 32-bit (2 word) tag, requires a 320-bit TMMH key. The default
        key-size is defined below. However, applications that know
        beforehand the size of the longest message they will ever encounter
        MAY choose a smaller key-size.
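        The key-size table above follows a regular pattern: row r (counting
        from 1) needs 7 * r + (r + 1) * TAG_WORDS words. The sketch below
        is a non-normative illustration of that lookup; the function name
        is hypothetical.

```python
def tmmh_key_words(max_msg_octets, tag_words=2):
    """Return the TMMH key size in 16-bit words for a maximum message length."""
    limits = (16, 128, 1024, 8192, 65536)        # MAX_MSG_LENGTH rows
    for row, limit in enumerate(limits, start=1):
        if max_msg_octets <= limit:
            return 7 * row + (row + 1) * tag_words
    raise ValueError("message longer than 65536 octets")
```

        With the default 2-word tag, an 80-octet packet falls into the
        128-octet row and needs 20 words (320 bits), while the largest row
        needs 47 words (94 octets, 752 bits).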
        * TAG is the authentication tag, which is the output of TMMH
        * PREFIX is the keystream prefix for the current packet as defined
          in Section 4.1.
        * K is the key, i.e., k_a as obtained by applying the key
          derivation. The default key-size is 94 octets, or 752 bits (to
          accommodate messages up to 65536 bytes with a 2-word tag).
        * MSG_LEN is the number of octets in the message before padding
          (all-zero padding is needed to align to a word boundary when the
          message contains an odd number of octets)
        * p is equal to the prime number 2^16 + 1.
        In the following, we use the symbol * to denote integer
        multiplication and the symbol +32 to denote integer addition modulo
        2^32.
        TMMH uses several key-dependent internal data structures: the length
        multiplier array L, and an array of subkeys A. The length multiplier
        array L is an array of words, the ith element of which is denoted as
        L[i], with i ranging from zero to (TAG_WORDS - 1). A subkey is an
        array consisting of (TAG_WORDS + 7) words, and the ith element of
        the subkey S is denoted as S[i]. Five subkeys are used in TMMH. The
        subkeys are stored in an array denoted A. The ith subkey is denoted
        as A[i], with i ranging from zero to 4.
         Key   |K[0]|K[1]|K[2]|K[3]|K[4]|K[5]|K[6]|K[7]|K[8]|K[9]|K[a]|...
         Field |L[0]|L[1]|                    A[0]                    |...
        Figure 5.  An illustration of how the arrays L and A are assigned
        from the words of the TMMH key K. In this example, TAG_WORDS is
        equal to two. Here K[i] denotes the ith word of the TMMH key (where
        i is a hexadecimal number). The field A[0] is the 0th subkey.
        The length multiplier array L and the subkey array A are taken from
        the TMMH key K as follows.
           1.  The value L[i] is set to K[i], for i = 0 to TAG_WORDS-1.
           2.  The value A[i][j] (the jth element of the ith subkey) is set
               to K[ TAG_WORDS + j + (TAG_WORDS + 7) * i ].
        This process is illustrated in Figure 5.
        We introduce the following notation: Let A_ij be the eight word
        vector A[i][j] ||  A[i][j + 1]  ||  ... ||  A[i][j + 7].
        The function V(S, M) defined below maps a subkey S and an eight-word
        data string M to a 32-bit unsigned integer.
          V(S,M) = S[0] * M[0] +32 S[1] * M[1] +32 ... +32 S[7] * M[7]
        Here +32 denotes integer addition modulo 2^32. The length of the
        subkey S may be greater than eight, but the excess words are ignored
        by the function V. (The definition of V such that the most
        significant words of the subkey may be ignored simplifies the
        exposition below.) If the message consists of fewer than 8 words, the
        remaining words are set to zero.
        The function U(S, M) is defined as
          U(S, M) = [ V(S, M) modulo p ] modulo 2^16.
        The core of TMMH is a 'compression' function C which maps a subkey
        value and an input string to an output string which is about eight
        times smaller than the input string. To compute C(S, D) for a given
        subkey value S and data string D of w words, do the following.
        1) Divide up D into blocks of eight words each (note that the last
           block may contain fewer than eight words), i.e.,
             D = D[0] || D[1] || ... || D[ ceil(w/8) - 1 ]
           where D[i] is the ith block, || denotes concatenation, and
           ceil(x) denotes the smallest integer not less than x.
        2) Apply the function U to each block, using the subkey value S
           each time, then concatenate the outputs as follows:
            C(S, D) = U(S, D[0]) || U(S, D[1]) || ...
                      || U(S, D[ceil(w/8) - 1]).
        The j:th word (j starting from zero), T[j], of the TMMH tag is
        computed using the following algorithm:
           set X to M and set i to zero
           while the number of words in X is greater than eight, do
              set X to C(A_ij, X)
              increment i
           end while
           return [ [ [ L[j] * MSG_LEN ] +32 V(A_ij, X) ] mod p ] mod 2^16
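        The algorithm above, together with the compression function C,
        can be sketched in Python as follows. This is a hedged
        illustration: the function names, the representation of words as
        Python integers, the prime value 65537, and the MSG_LEN value
        passed in the usage lines are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF  # reduction modulo 2^32


def V(S, M):
    # Restated so the sketch is self-contained. zip() stops at the shorter
    # input, which has the same effect as zero-padding a short block.
    return sum(s * m for s, m in zip(S[:8], M[:8])) & MASK32


def U(S, M, p):
    return (V(S, M) % p) % 2**16


def C(S, D, p):
    """Compress D: one output word per block of eight input words."""
    return [U(S, D[k:k + 8], p) for k in range(0, len(D), 8)]


def tmmh_word(L, A, M, j, msg_len, p):
    """Compute T[j]; A_ij is taken as the slice A[i][j:j + 8]."""
    X, i = list(M), 0
    while len(X) > 8:
        X = C(A[i][j:j + 8], X, p)
        i += 1
    return (((L[j] * msg_len) + V(A[i][j:j + 8], X)) & MASK32) % p % 2**16


# Toy inputs: all-ones key material and a 20-word message of ones.
L = [1, 1]
A = [[1] * 9, [1] * 9]
M = [1] * 20
print(C(A[0][0:8], M, 65537))
print(tmmh_word(L, A, M, j=0, msg_len=20, p=65537))
```

        In the toy run, one compression pass reduces the 20-word message
        to three words, so only the subkeys A[0] and A[1] are touched.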
        To use TMMH to compute the authentication tag TAG of a message, the
        TMMH hash value of that message is computed, then that value is
        combined with the keystream prefix defined in Section 4.1. The
        combining operation is word-wise addition modulo 2^16:
        TAG[j] = (T[j] + PREFIX[j]) mod 2^16, for j = 0 to TAG_WORDS-1.
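        The combining step is small enough to show directly. In this
        sketch the name combine_tag and the sample word values are
        hypothetical; T and PREFIX are lists of 16-bit words.

```python
def combine_tag(T, prefix):
    """TAG[j] = (T[j] + PREFIX[j]) mod 2^16 for each tag word j."""
    if len(T) != len(prefix):
        raise ValueError("T and PREFIX must both have TAG_WORDS words")
    return [(t + q) % 2**16 for t, q in zip(T, prefix)]


# The first word wraps around modulo 2^16, the second does not.
print(combine_tag([0xFFFF, 0x0001], [0x0002, 0x0003]))
```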
        Note that for RTP, where HMAC is applied to M || ROC, TMMH is
        applied to M only. This is because, for TMMH, the dependence on
        the ROC is inherent in the PREFIX quantity.
     4.3 Key Derivation
     4.3.1 Key Derivation Algorithm
        Regardless of the encryption or message authentication transform
        that is employed (it may be a defined transform or newly introduced
        according to Section 7), SRTP key derivation is the process of
        generating session keys, without extra communication between the
        parties and in a sender-receiver synchronized way.
                    master salt, packet index ---+
                       +-----------+         +--------+ session encr_key
                       | ext       | master  |        |---------->
                       | key mgmt  | key     |  key   | session auth_key
                       | (optional |-------->| deriv  |---------->
                       | rekey)    |         |        | session salt_key
                       |           |         |        |---------->
                       +-----------+         +--------+
        Figure 6: SRTP key derivation.
        At least one initial key derivation is always performed by SRTP,
        i.e., the first key derivation is mandatory. Further applications of
        the key derivation MAY be performed, according to the 'key
        derivation-rate' value in the cryptographic context. The key
        derivation function is defined to be initially invoked before the
        first packet and then, if derivation rate is r > 0, to be further
        invoked on every r:th packet, and produce session keys according to
        the non-zero key derivation rate. This can be thought of as
        'refreshing' the session keys. The value of 'key_derivation_rate'
        MUST be kept fixed for the lifetime of the associated master key.
        There is also a derivation of session salting keys for encryption
        transforms that so require, e.g., both of the pre-defined
        Let m and n be positive integers. A pseudo-random function family is
        a set of keyed functions {PRF_n(k,x)} such that for the (secret)
        random key k, given m-bit x, PRF_n(k,x) is an n-bit string,
        computationally indistinguishable from random n-bit strings, see
        [HAC]. For the purpose of key derivation in SRTP a secure PRF with m
        = 128 (or more) is needed, and a default PRF transform is defined in
        Section 4.3.3.
        Let a DIV t denote integer division of a by t, rounded down, and
        with the convention that a DIV 0 = 0 for all a. We also make the
        convention of treating a DIV t as a bit string of the same length as
        a, and thus 'a DIV t' will in general have leading zeros. Key
        derivation is defined as follows. To generate session key(s)(and
        session salt(s)) for the current packet, let the n-bit SRTP key (or
        salt) for this packet be
        PRF_n(k_master, ((<label> || (index DIV key_derivation_rate)) XOR
        where <label> is an 8-bit constant (see below), master_salt and
        key_derivation_rate are as determined in the cryptographic context,
        and index is the packet index (i.e., the 48-bit ROC || SEQ for
        The session keys and salt are now derived using:
        - k_e (SRTP encryption): <label> = 0x00, n = n_e.
        - k_a (SRTP message authentication): <label> = 0x01, n = n_a.
        - k_s (SRTP salting key): <label> = 0x02, n = n_s.
        where n_e, n_s, and n_a are also as determined in the cryptographic
        The master key and master salting key MUST be random, but the master
        salt MAY be public. The default size of the master key is 128 bits,
        and that of the master salt is 112 bits.
        Note that for a key_derivation_rate of 0, the initial application of
        the key derivation will take place once. The derivation operation is
        facilitated if the non-zero rates are chosen to be powers of 256.
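        The DIV convention, and the reason powers of 256 are convenient,
        can be illustrated with a short Python sketch. The helper names
        and the sample index value are hypothetical; the sketch covers
        only the DIV operation, not the complete key derivation.

```python
def div(a: int, t: int) -> int:
    """a DIV t: integer division rounded down, with a DIV 0 = 0."""
    return 0 if t == 0 else a // t


def div_as_bits(a: int, t: int, width_bytes: int = 6) -> bytes:
    """a DIV t as a string of the same length as a 48-bit index."""
    return div(a, t).to_bytes(width_bytes, "big")


index = 0x000000012345  # sample 48-bit packet index (ROC || SEQ)

print(div_as_bits(index, 0).hex())         # 000000000000
print(div_as_bits(index, 256).hex())       # 000000000123
print(div_as_bits(index, 256 ** 2).hex())  # 000000000001
```

        For a rate of 256^k, the division simply drops the k least
        significant bytes of the index, so no general-purpose division is
        needed.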
        The upper limit in the number of packets that can be secured using
        the same master key (see Section 10.2) is independent of the key
     4.3.2 SRTCP Key Derivation
        SRTCP uses the same master key as SRTP, i.e., it is shared between
        the two protocols. To do this securely, the following changes are
        done to Section 4.3.1 when applying session key derivation for
        Replace the SRTP index by the 32-bit quantity: 0 || SRTCP index
        (i.e. excluding the E-bit, replacing it with a fixed 0-bit), and use
        <label> = 0x03 for the SRTCP encryption key, <label> = 0x04 for the
        SRTCP authentication key, and, <label> = 0x05 for the SRTCP salting
     4.3.3 AES-CM PRF
        The currently defined PRF, keyed by 128 to 256 bit (master) keys,
        has input block size m = 128 and can produce n-bit outputs for n up
        to 2^23. We define PRF_n(k,x) to be AES in counter mode as described
        in Section 4.1.1, applied to key k, and IV equal to x, and with the
        output keystream truncated to the n first (left-most) bits.
        (Requiring n/128, rounded up, applications of AES.)
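        The number of AES applications needed for an n-bit output follows
        directly from the rounding rule above. A small Python sketch (the
        function name is hypothetical; the sample sizes are the default
        key and salt sizes mentioned in this document):

```python
def aes_blocks_for(n_bits: int, block_bits: int = 128) -> int:
    """Number of AES invocations needed for n_bits of output (n/128 rounded up)."""
    return -(-n_bits // block_bits)  # ceiling division on integers


for n in (112, 128, 752, 2 ** 23):
    print(n, aes_blocks_for(n))
```

        For example, a 112-bit salt or a 128-bit key needs one AES
        application, while a 752-bit key needs six.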
     5. Default and Mandatory Transforms
        The default transforms are also the mandatory-to-implement transforms in
        SRTP. Of course, 'mandatory-to-implement' does not imply 'mandatory-
     5.1 Encryption: AES-CM and NULL
        AES running in Segmented Integer Counter Mode, as defined in Section
        4.1.1, is the default and mandatory-to-implement encryption
        algorithm. The NULL cipher is mandatory to implement too.
     5.2 Message Authentication/Integrity: HMAC/SHA1
        HMAC/SHA1, as defined in Section 4.2.1, is the default and
        mandatory-to-implement message authentication code.
     5.3 Key Derivation: AES-CM PRF
        The key derivation of Sections 4.3.1 and 4.3.2, instantiated with
        the AES Counter Mode PRF defined in Section 4.3.3 and using a
        128-bit key, is the default and mandatory-to-implement method for
        generating keys.
     6. SRTP/SRTCP Parameters
        The parameters for SRTP are listed in the following. Unless
        otherwise stated, SRTCP by default applies the same transforms and
        parameters as the corresponding SRTP, though they MAY also be
        independently selected.
        The SRTP-WINDOW-SIZE is defined to be at least 64 (Section 3.2.3).
        The currently defined modes are Segmented Integer Counter Mode
        (default), f8-mode (Section 4), and the NULL Cipher. The default
        cipher is AES (Section 4), which has a block size of n_b = 128 bits
        and default encryption key size n_e = 128 bits.
        The currently defined message authentication functions are the
        HMAC/SHA1 and TMMH. Default is absence of authentication for SRTP
        and HMAC/SHA1 for SRTCP. For HMAC/SHA1, the default key-size is n_a
        = 128 bits and the output length is n_tag = 32 bits.
        SRTP_PREFIX_LENGTH is 0. For TMMH, default n_tag is also 32 bits,
        and default n_a is 752 bits.
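        The default TMMH sizes can be cross-checked against the key layout
        of the TMMH description. The sketch below is one possible reading,
        not a statement from the specification: it assumes 16-bit words,
        TAG_WORDS = 2 (matching n_tag = 32 bits), and five subkeys of
        TAG_WORDS + 7 words each.

```python
WORD_BITS = 16  # assumption: TMMH words are 16 bits wide


def tmmh_key_words(tag_words: int, num_subkeys: int) -> int:
    """Length multipliers plus num_subkeys subkeys of tag_words + 7 words."""
    return tag_words + num_subkeys * (tag_words + 7)


def max_message_words(num_subkeys: int) -> int:
    """Longest message that the given number of subkeys can hash.

    Each compression pass shrinks the input by a factor of eight, and the
    final step handles at most eight words with the last subkey.
    """
    return 8 ** num_subkeys


words = tmmh_key_words(tag_words=2, num_subkeys=5)
print(words, words * WORD_BITS)   # key length in words and in bits
print(max_message_words(5))       # message length limit in words
```

        Under these assumptions the key is 47 words, i.e. 752 bits, which
        agrees with the default n_a given above.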
        The default size of the master key shall be 128 bits, and the
        default size of the master- and session salting keys shall be n_s =
        112 bits.
        The default value for the session key_derivation_rate field in the
        cryptographic context is "0", meaning in practice that the first
        application of the key derivation is performed (as it is mandatory),
        but no further applications of it.
     7. Adding SRTP Transforms
        Section 4 provides examples of the level of details needed for
        defining transforms. Whenever a new transform is to be added to
        SRTP, a companion standards-track RFC MUST be written to exactly
        define how the new transform can be used with SRTP (and SRTCP). Such
        a companion RFC should avoid overlapping with the SRTP protocol
        document. Note however, that it might be necessary to extend the
        cryptographic context's definition with new parameters, or add steps
        to the packet processing. The companion RFC shall explain any known
        issues regarding interactions between the transform and other
        aspects of SRTP.
        Encryption and message authentication transforms require some set of
        optional parameters or have optional modes of operation. The
        companion RFC shall select fixed or default values for these
        parameters (whenever possible), to reduce key management complexity.
        The mode of operation of ciphers and related parameters (e.g. IV-
        formation for SRTP and SRTCP) shall be defined.
        Each new transform document should specify its key attributes, e.g.,
        size of keys (minimum, maximum, recommended), format of keys,
        recommended/required processing of input keying material,
        requirements/recommendations on re-keying and key derivation, etc.
     8. Rationale
     8.1 Key derivation
        Key derivation has been introduced to lighten the burden on the key
        exchange: the (up to) six different keys necessary to protect the
        RTP session (SRTP and SRTCP encryption keys and salts, SRTP and
        SRTCP authentication keys) are derived from a single master key in
        a cryptographically secure way. Note however that the key
        management protocol may provide SRTP with more than one master key,
        e.g., two distinct master keys with their respective lifetimes.
        The security stands (and falls) with the master key as the derived
        session keys are cryptographically independent (under reasonable
        assumptions on the PRF, here AES-based).
        Subsequent (after the first) applications of the key derivation are
        optional but will give security benefits when enabled. They prevent
        a cryptanalyst from obtaining large amounts of ciphertext produced
        by a single fixed session key. They provide backward and forward
        security in the sense that a compromised session key does not
        compromise other session keys derived from the same master key (but
        of course, a leaked master key reveals all session keys).
        Considerations arise with high-rate key-refresh, especially in large
        multi-cast settings, see Section 12.
        As the TMMH keys may be quite large, the key derivation provides a
        simple and secure way to obtain sufficient amount of keying
     8.2 Salting key
        The master salt is introduced to guarantee security against off-line
        key-collision attacks on the key derivation that might otherwise
        reduce the effective key size.
        The derived session salting key used in the encryption has been
        introduced to protect against some attacks on additive stream
        ciphers, see Section 10.2. The explicit inclusion method of the salt
        in the IV has been selected for ease of hardware implementation.
     8.3 TMMH: Message Integrity from Universal Hashing
        The Truncated Multi-Modular Hash Function (TMMH) is a so-called
        universal hash function family, suitable for message authentication
        in the Wegman-Carter paradigm [WC81]. It is simple, quick, and
        especially appropriate for Digital Signal Processors and other
        processors with a fast multiply operation, though a straightforward
        implementation requires storage equal in length to the largest
        message to be hashed.
        TMMH offers secure (provably secure under randomness assumptions on
        the added prefix) and very efficient MACs. For a given tag size
        (TAG_WORDS), the forgery probability can be shown to be upper
        bounded by approximately 2^(-11*TAG_WORDS).
        However, as this approach to message integrity is new (not
        conceptually, but within standardization), we have chosen to make
        HMAC the default transform as many devices already have an HMAC
        implementation used for other purposes. We envision a migration to
        TMMH so that HMAC may eventually be phased-out from SRTP.
     8.4 Data Origin Authentication Considerations
        Note that in unicast, integrity and data origin authentication are
        provided together. However, in group scenarios where the keys are
        shared between members, the MAC tag only proves that a member of the
        group sent the packet, but does not prove the actual sender. Data
        origin authentication (DOA) for multicast and group RTP sessions is
        a hard problem that needs a solution; while some promising proposals
        are being investigated [PCST1, PCST2], more work is needed to
        rigorously specify these technologies. Thus SRTP data origin
        authentication in groups is for further study.
        Alternatively, DOA can be achieved using signatures. However, this
        has a high impact in terms of bandwidth and processing time;
        therefore we do not offer this form of message authentication in
        the pre-defined packet-integrity transforms.
        The presence of mixers and translators does not allow data origin
        authentication in case the RTP payload and/or the RTP header are
        manipulated. Note that these types of middle entities also disrupt
        end-to-end confidentiality (as the IV formation depends e.g. on the
        RTP header preservation).
     9. Key Management Considerations
        For initialization, the key management needs to be given the SSRC
        and initial RTP sequence-number for the RTP stream, and thus has a
        dependency on RTP operational parameters.
        A particular key management system might allow different RTP
        sessions to share the same cryptographic master keys. The SRTP
        sender and receiver typically share a master key to derive session
        keys for encryption and decryption; SRTCP sources will typically
        derive keys from the same master key used by the SRTP session for
        which sender and receiver reports are sent. This is secure if the
        design of the synchronization mechanism, i.e., the IV, avoids
        keystream re-use (the two-time pad, Section 10.1). If this feature
        is used, the SSRCs MUST be unique between all the RTP streams
        sharing the same master key. In other words, when a master key is
        shared among RTP sessions, SRTP/SRTCP cryptographic transforms are
        vulnerable to unfortunate SSRC collisions owing to normal operation
        of a compliant RTP implementation. SRTCP implementations that share
        master keys introduce a non-standard constraint on RTP operation:
        SSRC values must be unique among RTP sessions that share an SRTP
        master key.  A secure key management system can mitigate this
        problem by assigning SSRC values to SSRC participants at the time of
        master key establishment.
        A particular key management system might choose to provide re-key by
        associating a master key for a crypto context with an MKI or a pair
        of index (sequence number and ROC) values, <From, To>. In the latter
        case, such values are always specified, or the default value, 'from
        the first observed packet' and 'until further notice', respectively,
        are used. The key management specification may therefore require the
        SRTP implementation to check the index of an incoming SRTP packet
        against the interval for the master key in the context before using
        the key. An SSRC in an RTP Session, however, defines its own
        sequence number space so knowledge of how many packets have used the
        same master key is dispersed among multiple RTP session
        participants.  SRTP senders can reasonably estimate the amount of
        SRTP and SRTCP traffic being used for a master key and invoke key
        management to re-key if needed. These interactions are defined by
        the key management interface to SRTP and are not defined by this
        protocol specification.
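        A check of an incoming packet index against a <From, To> interval
        could look like the Python sketch below. The class and function
        names are hypothetical, and treating both interval bounds as
        inclusive is an assumption; the actual behavior is defined by the
        key management interface.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MasterKeyInterval:
    """Hypothetical record tying a master key to a <From, To> index range."""
    key_id: str
    from_index: Optional[int] = None  # None: 'from the first observed packet'
    to_index: Optional[int] = None    # None: 'until further notice'

    def covers(self, index: int) -> bool:
        after_start = self.from_index is None or index >= self.from_index
        before_end = self.to_index is None or index <= self.to_index
        return after_start and before_end


def packet_index(roc: int, seq: int) -> int:
    """48-bit index ROC || SEQ (32-bit ROC, 16-bit sequence number)."""
    return (roc << 16) | seq


def select_master_key(intervals: Sequence[MasterKeyInterval], index: int):
    """Return the key_id of the first interval covering index, else None."""
    for interval in intervals:
        if interval.covers(index):
            return interval.key_id
    return None


intervals = [
    MasterKeyInterval("key-A", to_index=packet_index(0, 0xFFFF)),
    MasterKeyInterval("key-B", from_index=packet_index(1, 0)),
]
print(select_master_key(intervals, packet_index(0, 1234)))
print(select_master_key(intervals, packet_index(1, 5)))
```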
        Considerations arise with high-rate re-keying, especially in large
        multi-cast settings, see Section 12.
        The key management interface might use the defaults for the SRTP
        protocol or define values for any and all SRTP parameters such as
        the following:
        - cipher and related parameters, including mode of operation
        - key(s), i.e., master (and salting) key(s), and related
        - message authentication algorithm(s), and related parameter,
        - re-keying (key lifetime) and key derivation parameters,
        - MKI(s),
        - SSRC, network address, RTP port pair
        - Current value of ROC (or zeros prior to session
          commencement) and SEQ
        - Replay window size
     10. Security Considerations
     10.1 SSRC collision and two-time pad
        Any fixed keystream output, generated from the same key and index
        should only be used to encrypt once. Re-using such keystream
        (jokingly called a 'two-time pad' system by cryptographers), can
        seriously compromise security. The NSA's VENONA project [C99]
        provides a historical example of such a compromise. In SRTP, a 'two-
        time pad' is avoided by requiring the key, or some other parameter
        of cryptographic significance, to be unique per RTP stream and
        The pre-defined SRTP transforms accomplish packet-uniqueness by
        including the packet index. Stream-uniqueness requires distinct
        keys, or inclusion of the SSRC, which then (as noted) has to be
        unique to each RTP stream among the RTP sessions sharing the key.
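        The danger of keystream re-use can be demonstrated with a few
        lines of Python. The sketch uses random bytes as a stand-in for
        the keystream of an additive stream cipher; the plaintexts are
        arbitrary sample strings.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


keystream = os.urandom(16)       # stand-in for cipher output at one (key, index)
p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"
c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)    # same keystream used twice: a two-time pad

leak = xor_bytes(c1, c2)         # the keystream cancels out
recovered = xor_bytes(leak, p1)  # knowing (or guessing) p1 reveals p2

print(leak == xor_bytes(p1, p2))
print(recovered == p2)
```

        An eavesdropper who sees both ciphertexts learns the XOR of the
        two plaintexts without knowing the key, and any known plaintext
        immediately reveals the other.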
        It may in some cases be desirable that multiple crypto contexts
        applied to multiple RTP streams contain identical master keys. For
        instance, there could be a desire for a group to share a single key,
        or, a simple bi-directional flow might want to use the same key in
        both directions. A multi-media sender might desire to use the same
        master key to protect multiple streams. Issues as above (two-time
        pad) MUST then be considered. As discussed in Section 9, the pre-
        defined transforms (AES-CM and AES-f8) allow such sharing by the use
        of the SSRC in the IV. Unlike multiple streams in a single RTP
        session, however, sharing a key among RTP sessions requires the
        added constraint that SSRC values be unique across RTP sessions (see
        Section 9).
        Thus, the SSRC MUST be unique between all the RTP streams and
        sessions sharing the same master key. It is incumbent upon SRTP
        implementations to ensure SSRC uniqueness across RTP sessions that
        share a master key, to avoid unfortunate IV combinations that would
        result in a two-time pad. Even with distinct SSRCs, extensive use
        of the same key MAY improve the chances of probabilistic collision
        and time-memory-tradeoff attacks succeeding.
        Also, the effect of an eventual RTP SSRC collision detection MUST be
        taken into account, as a collision could duplicate the SSRC leading
        temporarily to a two-time pad before the collision is detected.  As
        discussed above in Section 9, this is a problem that key management
        can solve.
     10.2 Key Usage
        The effective key size is determined (upper bounded) by the size of
        the master key and, for encryption, the size of the salting key. Any
        additive stream cipher is vulnerable to attacks that use statistical
        knowledge about the plaintext source to enable key collision and
        time-memory tradeoff attacks [MF00,H80,Bi96]. These attacks take
        advantage of commonalities among plaintexts, and provide a way for a
        cryptanalyst to amortize the computational effort of decryption over
        many keys, thus reducing the effective key size of the cipher. A
        detailed analysis of these attacks and their applicability to the
        encryption of Internet traffic is provided in [MF00]. In summary,
        the effective key size of SRTP when used in a security system in
        which m distinct keys are used, is equal to the key size of the
        cipher less the logarithm (base two) of m. Protection against such
        attacks can be provided simply by increasing the size of the keys
        used, which here can be accomplished by the use of the salting key.
        Note that the salting key MUST be random, but MAY be public. A salt
        size of (the suggested) size 112 bits protects against attacks in
        scenarios where at most 2^112 keys are in use. This is sufficient
        for all practical purposes.
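        The summary rule above can be turned into a small calculation.
        The Python sketch below is a simplified model, not a security
        proof: it assumes the salt simply adds its length to the quantity
        from which log2(m) is subtracted, and caps the result at the key
        size. The function name is hypothetical.

```python
import math


def effective_key_bits(key_bits: int, num_keys: int, salt_bits: int = 0) -> float:
    """Key size less log2(m), with the salt offsetting the loss (simplified)."""
    raw = key_bits + salt_bits - math.log2(num_keys)
    return min(key_bits, raw)  # never more than the key size itself


print(effective_key_bits(128, 2 ** 32))                  # no salt
print(effective_key_bits(128, 2 ** 32, salt_bits=112))   # with default salt
print(effective_key_bits(128, 2 ** 112, salt_bits=112))  # at the stated bound
```

        Without a salt, 2^32 keys in use reduce a 128-bit key to an
        effective 96 bits; with the 112-bit salt the full 128 bits are
        retained in this model up to 2^112 keys.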
        Implementations SHOULD use keys that are as large as possible.
        Please note that in many cases increasing the key size of a cipher
        does not affect the throughput of that cipher.
        The use of the SRTP and SRTCP indexes in the pre-defined transforms
        fixes the maximum number of packets that can be secured with the
        same key. Such limit is fixed to 2^48 SRTP packets for SRTP, and
        2^31 SRTCP packets, when SRTP and SRTCP are considered
        independently. However, since the session keys for related SRTP and
        SRTCP are derived from the same master key (Section 4.3), the upper
        bound that has to be considered is in practice the minimum of the
        two quantities. That is, when 2^48 SRTP packets or 2^31 SRTCP
        packets have been secured with the same key (whichever occurs
        before), the key management MUST be called to provide new master
        key(s) (previously stored and used keys MUST NOT be used again), or
        the session MUST be terminated. Note: in most typical applications
        (assuming at least one RTCP packet for every 128,000 RTP packets) it
        will be the SRTCP index that first reaches the upper limit (although
        the time until this occurs is very long).
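        The arithmetic behind these limits can be checked with a short
        Python sketch. The helper names are hypothetical, and the rate of
        200 SRTCP packets per second is an illustrative figure.

```python
SRTP_INDEX_BITS = 48
SRTCP_INDEX_BITS = 31

srtp_limit = 2 ** SRTP_INDEX_BITS
srtcp_limit = 2 ** SRTCP_INDEX_BITS

# RTP-to-RTCP packet ratio at which both index spaces run out together.
break_even_ratio = srtp_limit // srtcp_limit


def days_until_exhausted(limit_packets: int, packets_per_second: float) -> float:
    """Time to use up an index space at a constant packet rate, in days."""
    return limit_packets / packets_per_second / 86_400


def first_exhausted(rtp_packets_per_rtcp_packet: float) -> str:
    """Which index space reaches its limit first (ties reported as SRTCP)."""
    if rtp_packets_per_rtcp_packet <= break_even_ratio:
        return "SRTCP"
    return "SRTP"


print(break_even_ratio)                                 # 131072, i.e. 128 * 1024
print(round(days_until_exhausted(srtcp_limit, 200)))    # days at 200 packets/s
print(first_exhausted(50))
```

        The break-even ratio of 2^17 = 131,072 corresponds to the figure
        of roughly 128,000 RTP packets per RTCP packet quoted above.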
        Still, note that even at 200 SRTCP packets/sec, the 2^31 index space
        of SRTCP is enough to secure approximately 4 months of
        Note that the purpose of key derivation only is to limit the amount
        of plaintext that is encrypted with a fixed session key, and made
        available to an attacker for analysis. It does not extend the master
        key's lifetime. To see this, simply consider our requirements to
        avoid two-time pad: two distinct packets must either be processed
        with distinct IVs, or, with distinct session keys, and both the
        distinctness of the IVs and of the session keys are (for the
        pre-defined transforms) dependent on the distinctness of the packet
        indices.
        For the TMMH-based message integrity, the keystream prefixes MUST
        NOT be correlated with each other, nor with the messages they
        protect in the sense that given the messages, the prefixes together
        with the TMMH key MUST be computationally indistinguishable from
        random bits. This is assured by our predefined keystream generators
        and key-derivation.
     10.3 Confidentiality of the RTP Payload
        By using 'seekable' stream ciphers, SRTP avoids the denial of
        service attacks that are possible on stream ciphers that lack this
        property (these attacks are described in Section 3.4 of [B96]). It
        is important to be aware that, as with any stream cipher, the exact
        length of the payload is revealed by the encryption. This means that
        it may be possible to deduce certain 'formatting bits' of the
        payload, as the length of the codec output might vary due to certain
        parameter settings etc. This, in turn, implies that the
        corresponding bit of the keystream can be deduced. However, if the
        stream cipher is secure (counter mode and f8 are provably secure
        under certain assumptions [BDJR,KSYH]), knowledge of a few bits of
        the keystream will not aid an attacker in predicting the following
        keystream bits. Thus, the payload length (and information deducible
        from this) will leak, but nothing else.
        As some RTP packets could contain highly predictable data, e.g. SID,
        it is important to use a cipher designed to resist known plaintext
        attacks (which is the current practice).
     10.4 Confidentiality of the RTP Header
        With the described proposal, RTP headers are sent in the clear to
        allow for header compression. This means that data such as payload
        type, synchronization source identifier, and timestamp are available
        to an eavesdropper. Moreover, since RTP allows for future extensions
        of headers, we cannot foresee what kind of possibly sensitive
        information might also be 'leaked'.
        The described proposal is a low-cost method, which allows header
        compression to reduce bandwidth. It is up to the endpoints' policies
        to decide about the security protocol to employ. If the header
        compression is omitted, other solutions might be applicable. In
        other words, we provide a solution that works in the most general
        scenario, even in the most demanding one (like conversational
        multimedia over low-bandwidth, unreliable media). Of course the
        solution will then also work in less restricted environments, but we
        suggest that if one really needs to protect headers, and is allowed
        to do so by the surrounding environment, then one should also look
        at alternatives, e.g., IPsec.
     10.5 Integrity of the RTP packet
        Additive ciphers do not provide any security service other than
        confidentiality. In particular, they do not provide message
        authentication (see [RK99] or [HAC] for a discussion of this
        security service).
        However, SRTP uses a message authentication code to provide that
        security service.
        With HMAC being a well-studied authentication scheme, based on a
        provably secure construction, the security against MAC forgery
        depends on the key-size and the size of the output tags (or for some
        attacks, half the size of the tag due to the 'birthday-paradox').
        The default tag size for HMAC has been fixed to 32 bits. Other size
        values may be chosen (via the key management protocol). The use of a
        truncated size is motivated by the fact that it may be desirable,
        e.g., in wireless environments, to save bandwidth. The choice of
        such a truncation MUST be weighed against the reduction in security
        it implies. The default 32-bit size is a compromise, offering a
        reasonable level of security, taking into account the real-time
        aspects of the protected protocol. High security applications SHOULD
        however use larger tags.
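        A truncated HMAC/SHA1 tag can be sketched with the Python standard
        library as follows. This is an illustration under stated
        assumptions: the tag is taken as the leftmost n_tag bits of the
        HMAC output, the ROC is encoded as four bytes in network byte
        order, and the function names, key, and message are hypothetical.

```python
import hashlib
import hmac


def srtp_hmac_tag(auth_key: bytes, message: bytes, roc: int, n_tag: int = 32) -> bytes:
    """HMAC/SHA1 over M || ROC, truncated to n_tag bits (assumed leftmost)."""
    data = message + roc.to_bytes(4, "big")
    digest = hmac.new(auth_key, data, hashlib.sha1).digest()  # 160 bits
    return digest[: n_tag // 8]


def verify_tag(auth_key: bytes, message: bytes, roc: int, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = srtp_hmac_tag(auth_key, message, roc, n_tag=8 * len(tag))
    return hmac.compare_digest(expected, tag)


key = bytes(range(16))             # sample 128-bit session auth key
packet = b"example RTP header and payload"
tag = srtp_hmac_tag(key, packet, roc=0)

print(len(tag) * 8)                        # tag length in bits
print(verify_tag(key, packet, 0, tag))     # accepted
print(verify_tag(key, packet, 1, tag))     # different ROC
```

        A 32-bit tag adds four bytes per packet, whereas the untruncated
        HMAC/SHA1 output would add twenty.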
        The fact that message authentication is optional (for SRTP) is
        motivated by the fact that, while the function is typically highly
        desired, there are certain cases (notably in cellular environments)
        where it has an impact in terms of cost, e.g. for bandwidth
        consumption. Also, independently of the tag length, a single
        transmission bit error in the protected part of the packet or in the
        tag itself forces the entire packet to be dropped. Given a fixed
        quality of service, it implies the necessity of higher protection of
        the transmitted unit, hence higher cost. In those cases, it is up to
        the user's security profile to request authentication.
        The use of error detection and protection mechanisms (e.g., Unequal
        Error Detection, UED, and Unequal Error Protection, UEP) is
        compatible with SRTP and the pre-defined encryption
        transforms, since stream ciphers maintain the position of the bits.
        However, the use of UED/UEP may be difficult to combine with
        authentication because any bit errors will cause authentication to
     10.5.1 Integrity of the RTP header: IHA
        The IV formation of the f8-mode gives implicit authentication (IHA)
        of the RTP header, even if no cryptographic integrity protection is
        present. This means that modifying bits of the RTP header will cause
        the decryption process at the receiver to produce essentially random
     11. Interaction with Forward Error Correction mechanisms
        Some considerations are due when Forward Error Correction mechanisms
        are performed, e.g., as specified in RFC 2733. In particular, the
        order in which SRTP processing and the error correction processing
        are applied, is of concern.
        The optimal order would be the following:
        - on the sender side, first encrypt the packet, then perform the FEC
          processing, finally authenticate
        - on the receiver side, first authenticate the packet, then perform
          the FEC processing, finally decrypt.
        The motivations for the above ordering are:
        - FEC expands the packet, so performing encryption after FEC would
          be more expensive
        - on the receiver side, authentication has to be verified before
          getting engaged in the FEC processing, to reduce effects of
          certain denial of service attacks
        - adding redundancy before encrypting slightly reduces the
          effective key-size and resistance to attacks.
        However, this ordering splits the security processing, since FEC
        processing occurs between encryption/decryption and authentication.
        Implementations could gain from keeping the security processing
        together; in that case, the security processing takes place after
        FEC on the sender's side and before FEC on the receiver's side.
        This incurs the cost of encrypting after FEC processing, as
        explained above, so the choice is left to the application. For
        the sake of interoperability, implementations are requested to
        place the security processing after FEC on the sender's side and
        before FEC on the receiver's side. This is also the default
        behavior; any other choice has to be agreed out-of-band.
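
        The default ordering can be illustrated with a small Python sketch.
        It is a toy model only: the FEC is a plain XOR parity over two
        equal-length payloads (not RFC 2733 packetization), the keystream is
        a SHA-256 stand-in rather than an SRTP transform, and the keys, the
        tag length, and all function names are assumptions made for the
        example.

```python
import hashlib
import hmac

ENC_KEY = b"toy-encryption-key"       # hypothetical session keys, illustration only
AUTH_KEY = b"toy-authentication-key"
TAG_LEN = 10                          # assumed tag length in octets


def keystream(index: int, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode); a stand-in, NOT an SRTP transform."""
    out = b""
    block = 0
    while len(out) < length:
        seed = ENC_KEY + index.to_bytes(6, "big") + block.to_bytes(4, "big")
        out += hashlib.sha256(seed).digest()
        block += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length octet strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def protect(index: int, payload: bytes) -> bytes:
    """Security processing kept together: encrypt, then append a tag."""
    ciphertext = xor(payload, keystream(index, len(payload)))
    mac = hmac.new(AUTH_KEY, index.to_bytes(6, "big") + ciphertext, hashlib.sha1)
    return ciphertext + mac.digest()[:TAG_LEN]


def unprotect(index: int, packet: bytes) -> bytes:
    """Verify the tag first, then decrypt; raises ValueError on a bad tag."""
    ciphertext, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    mac = hmac.new(AUTH_KEY, index.to_bytes(6, "big") + ciphertext, hashlib.sha1)
    if not hmac.compare_digest(tag, mac.digest()[:TAG_LEN]):
        raise ValueError("authentication failed")
    return xor(ciphertext, keystream(index, len(ciphertext)))


def sender(p1: bytes, p2: bytes) -> list:
    """Sender side: FEC first, then security processing on every packet."""
    parity = xor(p1, p2)              # toy FEC: XOR parity of the two payloads
    return [protect(i, p) for i, p in enumerate([p1, p2, parity])]


def receiver(packets: dict) -> bytes:
    """Receiver side: security processing first, then FEC.

    `packets` maps index -> protected packet; the packet with index 1 is
    assumed lost and is rebuilt from packet 0 and the parity packet 2.
    """
    p1 = unprotect(0, packets[0])
    parity = unprotect(2, packets[2])
    return xor(p1, parity)
```

        Because the receiver verifies every packet before it reaches the
        FEC stage, a forged packet is rejected before it can corrupt the
        recovered payload.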
     12. Scenarios
        SRTP can be used as a security protocol for RTP/RTCP traffic in
        different scenarios. SRTP has a number of configuration options, and
        its impact on the total performance of the application depends on
        the way it is used. Hence, the use of SRTP is very dependent on the
        kind of scenario and application it is used with. In the following,
        we briefly illustrate some use cases
        for SRTP, and give some guidelines for recommended setting of its
     12.1 Two-party Unicast
     12.1.1 One bi-directional RTP stream
        A typical example would be a voice call, or perhaps some streaming
        It is possible for the two parties to share the same master key in
        the two directions. The first round of the key derivation splits the
        master key into any or all of the following session keys (according
        to the provided security functions):
        SRTP_encr_key, SRTP_auth_key, and SRTCP_encr_key, SRTCP_auth_key.
        (For simplicity, we omit discussion of the salts, which are also
        derived.) In this scenario, it will in most cases suffice to have a
        single master key with unspecified lifetime (i.e. unrestricted key
        lifetime, not using explicit <From, To> values). This guarantees
        sufficiently long lifetime of the keys and a minimum set of keys in
        place for most practical purposes. Also, in this case RTCP
        protection can be applied without problems. Key derivation, in
        combination with a large difference in packet rate between the two
        directions, may require simultaneous storage of several session
        keys; if storage is an issue, we recommend using low-rate key
        derivation.
        The same considerations can be extended to the two-party unicast
        scenario with multiple RTP sessions sharing the master key if
        particular care is taken to guarantee unique SSRCs for the streams.
     12.1.2 One master key per party
        Here, each sender provides the security for its own RTP and RTCP
        streams, as well as for the RTCP receiver reports sent back to it.
        This results in two master keys, split (by key derivation)
        into a maximum of eight session keys (on each side of the
        communication link). The SSRC-uniqueness MUST be guaranteed for the
        streams on each side. It is recommended not to restrict the master
        key lifetimes using the <From, To> fields. In most cases this still
        gives a sufficient key lifetime, with the benefit of a minimum set
        of keys in place, and a smooth run of SRTCP. The same storage
        considerations as above apply for the optional key derivation.
        The same considerations can be extended to the two-party unicast
        scenario with multiple unidirectional RTP sessions. Unique SSRCs
        MUST be guaranteed for the streams.
     12.2 Multicast
        Just as with (unprotected) RTP, a scalability issue arises in big
        groups due to the possibly very large number of (S)RTCP receiver
        reports that the sender might need to process. In SRTP, the sender
        may have to keep state (the cryptographic context) for each
        receiver, or more precisely, for the SRTCP used to protect receiver
        reports. The problem increases proportionally to the size of the
        group. In particular, re-keying requires special concern, see below.
        In the following, we describe multicast for small groups and give
        guidelines for use with large-group multicast.
     12.2.1 Small conference with one sender
        The sender secures its RTP stream using one cryptographic context.
        The sender's RTP and RTCP are secured with the same master key. Key
        derivation gives the necessary session keys, i.e.
        SRTP_encr_key, SRTP_auth_key, and SRTCP_encr_key, SRTCP_auth_key.
        If there are multiple streams, the SSRCs MUST, as noted, be unique
        to avoid a two-time pad (see Section 9). Key derivation may (for
        increased security) be enabled for the sender's outgoing SRTP
        There are many possible setups with the distribution of the master
        One possibility is that the receivers share the same master key to
        secure their respective SRTCP (this requires the receivers to trust
        each other). This shared master key could be the same used by the
        sender to protect its outgoing traffic. Alternatively, it could be
        a master key shared only among the receivers and used solely for
        their SRTCP.
        Considering SRTCP and key storage, it is recommended to use low-rate
        (or zero) key_derivation (except the mandatory initial one), so that
        the sender does not need to store too many session keys (each SRTCP
        stream might otherwise have a different session key at a given point
        in time, as the SRTCP sources send at different times). Thus, in
        case key derivation is wanted for SRTP, the cryptographic context
        for SRTP can be kept separate from the SRTCP crypto context, so that
        it is possible to have a key_derivation_rate of 0 for SRTCP and a
        non-zero value for SRTP.
        Re-keying raises two problems: the number of master keys stored at
        the sender side, and the triggering of re-keying. Forcing re-keying
        using the
        <From, To> fields creates the problem that the sender needs to
        maintain multiple keys, as the re-keying will typically happen at
        different times on each SRTCP stream from the receivers (because
        each SSRC defines a sequence number space). Also, problems may occur
        in retrieving the current master key for the SRTCP packets in some
        cases, since that is done based on SRTP index, not SRTCP index.  Use
        of the MKI for re-keying is probably best for most applications (see
        Section 9).
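
        The difference between the two key-selection methods can be
        sketched as follows. The record layout, field names, and function
        names are hypothetical and chosen only for this illustration; the
        point is that an MKI lookup depends solely on a value carried in
        the packet, whereas a <From, To> lookup depends on an SRTP index
        that an SRTCP packet does not carry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MasterKeyEntry:
    """Hypothetical record; field names are illustrative only."""
    mki: bytes        # Master Key Identifier carried in the packet
    key: bytes        # master key material
    valid_from: int   # first SRTP index covered by <From, To>
    valid_to: int     # last SRTP index covered by <From, To>


def lookup_by_mki(keys: List[MasterKeyEntry], mki: bytes) -> Optional[MasterKeyEntry]:
    """The packet names its own key, so SRTP and SRTCP resolve the same way."""
    return next((k for k in keys if k.mki == mki), None)


def lookup_by_index(keys: List[MasterKeyEntry], srtp_index: int) -> Optional[MasterKeyEntry]:
    """<From, To> lookup needs an SRTP index, even when the packet is SRTCP."""
    return next((k for k in keys if k.valid_from <= srtp_index <= k.valid_to), None)


keys = [
    MasterKeyEntry(mki=b"\x01", key=b"old-key", valid_from=0, valid_to=999),
    MasterKeyEntry(mki=b"\x02", key=b"new-key", valid_from=1000, valid_to=2**48 - 1),
]

# An SRTCP packet protected under the new key arrives while the receiver's
# latest SRTP index is still 998: the index-based lookup picks the old key.
stale_choice = lookup_by_index(keys, 998)
mki_choice = lookup_by_mki(keys, b"\x02")
```

        In this example the index-based lookup returns the old key for a
        packet that was protected under the new one, while the MKI-based
        lookup is unaffected by the timing of the switch.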
        Moreover, the upper limit of 2^48 SRTP packets / 2^31 SRTCP packets
        means that, as soon as (or rather, shortly before) one of the
        streams reaches its maximum number of packets, re-keying MUST be
        triggered on ALL the streams. A possible solution is to keep the
        SRTP/SRTCP contexts separate, while still sharing the master key.
        The sender then has to estimate which stream (among the sender's
        SRTP/SRTCP streams and the receivers' SRTCP streams) will first
        reach the 2^48 / 2^31 limit, and force a re-keying well in advance.
        The MKI or <From, To> may be employed for key synchronization during
        changeover to a new key. Use of <From, To> fields to obtain key
        synchronization in such a case is described in Section 12.3.
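
        As a rough illustration of the estimate the sender has to make, the
        following Python sketch computes, for a set of streams sharing one
        master key, which stream would reach its limit first if each kept
        sending at a constant rate. The stream names, packet counts, and
        packet rates are made-up inputs, and the constant-rate assumption
        is a simplification.

```python
SRTP_LIMIT = 2**48    # index space for SRTP under one master key
SRTCP_LIMIT = 2**31   # index space for SRTCP under one master key


def seconds_until_limit(packets_sent: int, packets_per_second: float, limit: int) -> float:
    """Time left before a stream exhausts its index space at a constant rate."""
    return (limit - packets_sent) / packets_per_second


def first_to_exhaust(streams):
    """Return (name, seconds_left) for the stream that hits its limit first.

    `streams` is a list of (name, kind, packets_sent, packets_per_second),
    where kind is "srtp" or "srtcp". All four fields are illustrative.
    """
    limits = {"srtp": SRTP_LIMIT, "srtcp": SRTCP_LIMIT}
    estimates = [
        (name, seconds_until_limit(sent, rate, limits[kind]))
        for name, kind, sent, rate in streams
    ]
    return min(estimates, key=lambda item: item[1])


streams = [
    ("sender SRTP", "srtp", 0, 50.0),            # e.g. one packet every 20 ms
    ("sender SRTCP", "srtcp", 0, 0.2),           # one report every 5 seconds
    ("receiver-1 SRTCP", "srtcp", 10**9, 0.2),   # has already sent many reports
]
name, seconds_left = first_to_exhaust(streams)
```

        With these inputs the receiver's SRTCP stream is the first to
        approach its limit, even though the SRTP stream sends far more
        packets per second, because the SRTCP index space is much smaller.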
     12.2.2 Large multicast with one sender
        The same considerations as for small-group multicast hold. The
        biggest issue in this scenario is the additional load placed on the
        sender, due to the state (cryptographic contexts) that has to
        be maintained for each receiver sending back RTCP receiver reports.
        At minimum, a replay window might be maintained for each RTCP
        source. Therefore, with big groups and where the load at the sender
        is considered unacceptable, it might be an option to simply
        disable all security for RTCP. This is STRONGLY NOT RECOMMENDED from
        a security point of view, but may appear a reasonable compromise to
        have at least security guaranteed on the RTP traffic.
        Alternatively, an SRTCP receiver may choose not to authenticate or
        protect against replay for SRTCP messages or do so selectively
        (e.g., only messages containing sender reports are authenticated).
        Of course, security impacts of neglecting to authenticate certain
        packets MUST be carefully considered.
     12.3 Re-keying and access control
        Re-keying may occur due to access control (e.g., when a member is
        removed during a multicast RTP session), or for purely cryptographic
        reasons. As mentioned, the master key MUST be replaced before any of
        the index spaces (2^48 for SRTP, 2^31 for SRTCP) are exhausted for
        any of the streams protected by one and the same master key. Thus,
        there is always the necessity of keeping track of when the master
        key has to be replaced due to exhaustion of the index spaces. In
        addition to this, it is possible to control the master key lifetime
        using the <From, To> fields, which could mean that a key expires,
        and a new one is needed.
        One may choose to have key management provide, at the start, arrays
        of master keys with associated lifetimes. Alternatively, key
        management is called each time the master key has to be changed.
        In one-sender multicast, it is the responsibility of the sender to
        determine when a new key is needed. The sender is the only one that
        can keep track of when the maximum number of packets has been sent,
        as receivers may join and leave the session at any time, there may
        be packet loss and delay, etc. In scenarios other than one-sender
        multicast, it is recommended that the Initiator of the key
        management/session request new key material well before any stream
        reaches the maximum key lifetime. Here, one must take into
        consideration that key exchange can be a costly operation, taking
        several seconds for a single exchange. Hence, some time before the
        master key is exhausted/expires, out-of-band key management is
        initiated, resulting in a new master key shared with the
        receiver(s). To maintain synchronization when switching to the new
        key, one could use the MKI or assign the new master key a
        'valid-from' index, far enough into the future so that key
        management will
        be finished before that, but still before the current key is
        For access control purposes, the <From, To> periods are set at the
        desired granularity, dependent on the packet rate. High-rate
        re-keying SHOULD NOT be used in some large-group scenarios when
        SRTCP is enabled. This is an effect of using the SRTP index, rather
        than the SRTCP index, for determining the master key. In particular,
        for short periods during switching of master keys, it may be the
        case that SRTCP packets are not under the current master key of the
        corresponding SRTP stream. Therefore, using the MKI for re-keying
        in such scenarios is the recommended method.
        Note that even if the MKI is used to signal key-usage to the
        receiver, there might still be cases when the <From,To> fields are
        also in use at the same time. For instance, a From-value could be
        used to signal when a certain master key (MKI) is to be activated
        for the first time.
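
        The choice of a 'valid-from' index discussed in this section can be
        illustrated with a short Python sketch. The function name, the
        safety factor, and the example figures are assumptions made for
        illustration; a real application would derive them from its own
        packet rate and key management latency.

```python
import math

SRTP_LIMIT = 2**48   # index space for SRTP under one master key


def choose_valid_from(current_index: int,
                      packets_per_second: float,
                      key_mgmt_seconds: float,
                      safety_factor: float = 2.0,
                      current_key_last_index: int = SRTP_LIMIT - 1) -> int:
    """Pick a From index for the new master key.

    The index is placed far enough ahead that key management should have
    finished (with a safety margin), but not beyond the last index for
    which the current key may still be used.
    """
    margin = math.ceil(packets_per_second * key_mgmt_seconds * safety_factor)
    valid_from = current_index + margin
    if valid_from > current_key_last_index:
        raise ValueError("re-keying was started too late for this packet rate")
    return valid_from


# 50 packets per second, key exchange expected to take about 5 seconds.
new_key_from = choose_valid_from(current_index=1000,
                                 packets_per_second=50.0,
                                 key_mgmt_seconds=5.0)
```

        The error case corresponds to the situation the text warns
        against: key management was initiated so late that the current key
        would be exhausted or expired before the new key becomes valid.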
     12.4 Summary of basic scenarios
        The description of these scenarios highlights some recommendations
        on the use of SRTP, mainly related to re-keying and large scale
        - Do not use the <From,To> feature of SRTP for fast re-keying. It
          may, in particular, give problems in retrieving the correct SRTCP
          key, if an SRTCP packet arrives close to the re-keying time. The
          MKI SHOULD be used in this case.
        - If multiple SRTP streams share the same master key, even
          moderate-rate re-keying MAY have the same problems, and the MKI
          SHOULD be
        - Carefully consider the additional load at the sender side in
          multicast scenarios. Optionally, but NOT RECOMMENDED, SRTCP could
          be disabled altogether by the SRTCP receiver.
        - Though offering increased security, a non-zero key_derivation_rate
          is NOT RECOMMENDED when trying to minimize the number of keys in
          use with multiple streams.
     13. IANA Considerations
        The RTP specification establishes a registry of profile names for
        use by higher-level control protocols, such as the Session
        Description Protocol (SDP), to refer to transport methods. This
        profile registers the name "RTP/SAVP".
     14. Acknowledgements
        The authors would like to thank Magnus Westerlund, Brian Weis,
        Robert Fairlie-Cuninghame, and Adrian Perrig for their reviews and
     15. Authors' Addresses
        Questions and comments should be directed to the authors and
           Mark Baugher
           Cisco Systems, Inc.
           5510 SW Orchid Street     Phone:  +1 408-853-4418
           Portland, OR 97219 USA    Email:
           Rolf Blom
           Ericsson Research
           SE-16480 Stockholm     Phone:  +46 8 58531707
           Sweden                 EMail:
           Elisabetta Carrara
           Ericsson Research
           SE-16480 Stockholm     Phone:  +46 8 50877040
           Sweden                 EMail:
           David A. McGrew
           Cisco Systems, Inc.
           San Jose, CA 95134-1706   Phone:  +1 301-349-5815
           USA                       EMail:
           Mats Naslund
           Ericsson Research
           SE-16480 Stockholm     Phone:  +46 8 58533739
           Sweden                 EMail:
           Karl Norrman
           Ericsson Research
           SE-16480 Stockholm     Phone:  +46 8 4044502
           Sweden                 EMail:
           David Oran
           Cisco Systems, Inc.
           San Jose, CA 95134-1706
           USA                       EMail:
     16. References
        [AES] NIST, "Advanced Encryption Standard (AES)", FIPS PUB 197,
        [HMAC] Krawczyk, H., Bellare, M., and Canetti, R.: "HMAC: Keyed-
              hashing for message authentication". IETF RFC 2104, February
        [RFC1889] Schulzrinne, H., Casner, S., Frederick, R., Jacobson,V.,
                "RTP: A Transport Protocol for Real-Time Applications", IETF
                RFC 1889.
        [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", IETF RFC 2119, March 1997.
        [RFC2401] Kent, S., and R. Atkinson, "Security Architecture for IP",
               IETF RFC 2401, November 1998.
        [RFC2675] Borman, D., Deering, S., Hinden, R., "IPv6 Jumbograms",
               IETF RFC 2675, August 1999.
        [RFC2828] Shirey, R., "Internet Security Glossary", IETF RFC 2828,
                 May 2000.
        [BDJR] Bellare, M., Desai, A., Jokipii, E., and Rogaway, P.,
               "A Concrete Treatment of Symmetric Encryption: Analysis of
                 DES Modes of Operation", Proceedings 38th IEEE FOCS,
               pp. 394-403, 1997.
        [C99]  Crowell, W. P., "Introduction to the VENONA Project",
        [CTR] Morris Dworkin, NIST Special Publication 800-38A,
              "Recommendation for Block Cipher Modes of Operation: Methods
              and Techniques",  2001.  Online at
        [ES3D] ETSI SAGE 3GPP Standard Algorithms Task Force, "Security
               Algorithms Group of Experts (SAGE); General Report on the
               Design, Specification and Evaluation of 3GPP Standard
               Confidentiality and Integrity Algorithms", Public report,
               Draft Version 1.0, Dec 1999.
        [ES3E] ETSI SAGE 3GPP Standard Algorithms Task Force, "Security
               Algorithms Group of Experts (SAGE) Report on the Evaluation
               of 3GPP Standard Confidentiality and Integrity Algorithms",
               Public report, Draft Version 1.0, Dec 1999.
        [HAC]  Menezes, A., Van Oorschot, P., and Vanstone, S., "Handbook of
               Applied Cryptography", CRC Press, 1997, ISBN 0-8493-8523-7.
        [H80]  Hellman, M. E., "A cryptanalytic time-memory trade-off",
               IEEE Transactions on Information Theory, July 1980,
               pp. 401-406.
        [KSYH] Kang, J-S., Shin, S-U., Hong, D., and Yi, O., "Provable
               Security of KASUMI and 3GPP Encryption Mode f8",
               Proceedings Asiacrypt 2001, Springer Verlag LNCS 2248,
               pp. 255-271, 2001.
        [MF00] McGrew, D., and Fluhrer, S., "Attacks on Encryption of
              Redundant Plaintext and Implications on Internet Security",
              the Proceedings of the Seventh Annual Workshop on Selected
              Areas in Cryptography (SAC 2000), Springer-Verlag.
        [RK99] Rescorla, E., and Korver, B., "Guidelines for Writing RFC
              Text on Security Considerations," draft-rescorla-sec-cons-
        [PCST1] Perrig, A., Canetti, R., Tygar, D., Song, D., "Efficient and
              Secure Source Authentication for Multicast", in Proc. of
              Network and Distributed System Security Symposium NDSS 2001,
              pp. 35-46, 2001.
        [PCST2] Perrig, A., Canetti, R., Tygar, D., Song, D., "Efficient
                Authentication and Signing of Multicast Streams over Lossy
               Channels", in Proc. of IEEE Security and Privacy Symposium
               S&P2000, pp. 56-73, 2000.
        [WC81] M. N. Wegman and J. L. Carter, "New Hash Functions and Their
              Use in Authentication and Set Equality", JCSS 22, 265-279,
     Appendix A: Pseudocode for Index Determination
        The following is an example of pseudocode for the algorithm that
        processes an SRTP packet with sequence number SEQ and estimates its
        index i. In the following, signed arithmetic is assumed.
              if (s_l < 32,768)
                 if (SEQ - s_l > 32,768)
                    set v to (ROC-1) mod 2^32
                 else
                    set v to ROC
                 endif
              else
                 if (s_l - 32,768 > SEQ)
                    set v to (ROC+1) mod 2^32
                 else
                    set v to ROC
                 endif
              endif
              return SEQ + v*65,536
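
        For readers who prefer executable code, the same index estimation
        can be written in Python as follows, with both else branches spelled
        out. The function name and argument order are choices made for this
        sketch; s_l is the highest sequence number received so far and ROC
        the current rollover counter, as in the pseudocode.

```python
def estimate_index(seq: int, s_l: int, roc: int) -> int:
    """Estimate the packet index from a 16-bit sequence number.

    seq -- sequence number of the received packet (0..65535)
    s_l -- highest sequence number received so far (0..65535)
    roc -- current rollover counter (0..2**32 - 1)
    """
    if s_l < 32768:
        if seq - s_l > 32768:
            # seq belongs to the previous rollover period (late packet).
            v = (roc - 1) % 2**32
        else:
            v = roc
    else:
        if s_l - 32768 > seq:
            # seq has wrapped around into the next rollover period.
            v = (roc + 1) % 2**32
        else:
            v = roc
    return seq + v * 65536
```

        For example, with s_l = 65530 and ROC = 0, an incoming SEQ of 5 is
        taken to have wrapped, so the estimated index is 5 + 65,536.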
     Appendix B: Test Vectors
     B.1 AES-f8 Test Vectors
        All values are in hexadecimal.
        SRTP PREFIX LENGTH  :   0
        RTP packet header   :   806e5cba50681de55c621599
        RTP packet payload  :   70736575646f72616e646f6d6e657373
        ROC                 :   d462564a
        key                 :   234829008467be186c3de14aae72d62c
        salt key            :   32f2870d
        key-mask (m)        :   32f2870d555555555555555555555555
        key XOR key-mask    :   11baae0dd132eb4d3968b41ffb278379
        IV                  :   006e5cba50681de55c621599d462564a
        IV'                 :   595b699bbd3bc0df26062093c1ad8f73
        j                   :   0
        IV' XOR j           :   595b699bbd3bc0df26062093c1ad8f73
        S(-1)               :   00000000000000000000000000000000
        S(-1) XOR IV' XOR j :   595b699bbd3bc0df26062093c1ad8f73
        S(0)                :   71ef82d70a172660240709c7fbb19d8e
        plaintext           :   70736575646f72616e646f6d6e657373
        ciphertext          :   019ce7a26e7854014a6366aa95d4eefd
        j                   :   1
        IV' XOR j           :   595b699bbd3bc0df26062093c1ad8f72
        S(0)                :   71ef82d70a172660240709c7fbb19d8e
        S(0) XOR IV' XOR j  :   28b4eb4cb72ce6bf020129543a1c12fc
        S(1)                :   3abd640a60919fd43bd289a09649b5fc
        plaintext           :   20697320746865206e65787420626573
        ciphertext          :   1ad4172a14f9faf455b7f1d4b62bd08f
        j                   :   2
        IV' XOR j           :   595b699bbd3bc0df26062093c1ad8f70
        S(1)                :   3abd640a60919fd43bd289a09649b5fc
        S(1) XOR IV' XOR j  :   63e60d91ddaa5f0b1dd4a93357e43a8c
        S(2)                :   584d14a591acfca846b3aa3a0ab50fec
        plaintext           :   74207468696e67
        ciphertext          :   2c6d60cdf8c29b
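
        The steps of this test vector that do not involve the block cipher
        are plain XOR operations and can be checked without an AES
        implementation. The following Python sketch checks the masked key
        and the three ciphertext blocks against the values listed above. It
        does not verify IV' or the keystream blocks S(j), which require
        AES; the helper names are choices made for this sketch.

```python
def xor_hex(a: str, b: str) -> str:
    """XOR two hex strings octet by octet; output is as long as the shorter input."""
    return bytes(x ^ y for x, y in zip(bytes.fromhex(a), bytes.fromhex(b))).hex()


KEY = "234829008467be186c3de14aae72d62c"
KEY_MASK = "32f2870d555555555555555555555555"
MASKED_KEY = "11baae0dd132eb4d3968b41ffb278379"

# (plaintext block, keystream block S(j), expected ciphertext block)
BLOCKS = [
    ("70736575646f72616e646f6d6e657373",
     "71ef82d70a172660240709c7fbb19d8e",
     "019ce7a26e7854014a6366aa95d4eefd"),
    ("20697320746865206e65787420626573",
     "3abd640a60919fd43bd289a09649b5fc",
     "1ad4172a14f9faf455b7f1d4b62bd08f"),
    ("74207468696e67",                      # final, partial block
     "584d14a591acfca846b3aa3a0ab50fec",
     "2c6d60cdf8c29b"),
]


def check_f8_xor_steps() -> bool:
    """Return True if the XOR-only steps match the listed values."""
    if xor_hex(KEY, KEY_MASK) != MASKED_KEY:
        return False
    return all(xor_hex(p, s) == c for p, s, c in BLOCKS)
```

        Only the leading octets of S(2) are used for the last block, since
        the final plaintext segment is shorter than one cipher block.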
     B.2 AES-CM Test Vectors
        Keystream segment length: 1044512 octets (65282 AES blocks)
        Key:              2B7E151628AED2A6ABF7158809CF4F3C
        Rollover Counter: 00000000
        Sequence Number:  0000
        SSRC:             00000000
        Salt:             F0F1F2F3F4F5F6F7F8F9FAFBFCFD0000
        Offset:           F0F1F2F3F4F5F6F7F8F9FAFBFCFD0000
        Counter                            Keystream
        F0F1F2F3F4F5F6F7F8F9FAFBFCFD0000   E03EAD0935C95E80E166B16DD92B4EB4
        F0F1F2F3F4F5F6F7F8F9FAFBFCFD0001   D23513162B02D0F72A43A2FE4A5F97AB
        F0F1F2F3F4F5F6F7F8F9FAFBFCFD0002   41E95B3BB0A2E8DD477901E4FCA894C0
        ...                                ...
        F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF   EC8CDF7398607CB0F2D21675EA9EA1E4
        F0F1F2F3F4F5F6F7F8F9FAFBFCFDFF00   362B7C3C6773516318A077D7FC5073AE
        F0F1F2F3F4F5F6F7F8F9FAFBFCFDFF01   6A2CC3787889374FBEB4C81B17BA6C44
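
        The Counter column can be reproduced with simple integer
        arithmetic, assuming (as the table suggests) that the counter for
        block j is the offset plus j, taken modulo 2^128. The Python sketch
        below regenerates the listed counters and confirms the stated
        segment length; it does not compute the keystream, which requires
        AES. The names are choices made for this sketch.

```python
OFFSET = int("F0F1F2F3F4F5F6F7F8F9FAFBFCFD0000", 16)
BLOCKS = 65282          # number of AES blocks in the keystream segment
BLOCK_OCTETS = 16       # AES block size in octets


def counter(block_index: int) -> str:
    """Counter value for the given block, as 32 uppercase hex digits."""
    return format((OFFSET + block_index) % 2**128, "032X")


first = counter(0)
last = counter(BLOCKS - 1)
segment_octets = BLOCKS * BLOCK_OCTETS
```

        The last counter ends in FF01 (65,281 decimal), which is consistent
        with a segment of 65,282 blocks, i.e. 1,044,512 octets.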
       Nota Bene: this test case is contrived so that the latter part of the
       keystream segment coincides with the test case in Section F.5.1 of
     B.3 TMMH Test Vectors
        This section provides test vectors which can be used to test an
        implementation of TMMH. The key, message, and outputs are expressed
        as octet sequences, with each octet in hexadecimal.
           TAG_WORDS: 2
           key:     { e627 6a01 5ea7 f27a c536 2192 11be ea35
                      db9d 63d6 fa8a fc45 e08b d216 ced2 7853
                      1a82 22f5 90fb 1c29 708e d06f 82c3 bee6
                      4f21 6f33 65c0 d211 c25e 9138 4fa3 7c1f
                      61ac 3489 2976 8c19 8252 ddbf cad3 c28f
                      68d6 58dd 504f 2bbf 0278 70b7 cfca }
           L:       { e627 6a01 }
           A[0]:    { 5ea7 f27a c536 2192 11be ea35 db9d 63d6 fa8a }
           A[1]:    { fc45 e08b d216 ced2 7853 1a82 22f5 90fb 1c29 }
           A[2]:    { 708e d06f 82c3 bee6 4f21 6f33 65c0 d211 c25e }
           A[3]:    { 9138 4fa3 7c1f 61ac 3489 2976 8c19 8252 ddbf }
           A[4]:    { cad3 c28f 68d6 58dd 504f 2bbf 0278 70b7 cfca }
           message: { 6015 f141 5ba1 29a0 f604 0d1c 02d9 aa8a 7931 }
           tag:     { 8a82 4bb0 }
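
        The way the listed L and A[i] values line up with the key can be
        checked mechanically. The Python sketch below splits the key into
        16-bit words, takes the first two words as L, and cuts the
        remainder into rows of nine words, which is the row length seen in
        the listing and also the length of the message in 16-bit words. It
        only reproduces this partitioning and does not compute the tag; the
        function name and parameters are assumptions made for the sketch.

```python
KEY_WORDS = """
e627 6a01 5ea7 f27a c536 2192 11be ea35
db9d 63d6 fa8a fc45 e08b d216 ced2 7853
1a82 22f5 90fb 1c29 708e d06f 82c3 bee6
4f21 6f33 65c0 d211 c25e 9138 4fa3 7c1f
61ac 3489 2976 8c19 8252 ddbf cad3 c28f
68d6 58dd 504f 2bbf 0278 70b7 cfca
""".split()


def partition_key(words, l_words=2, row_len=9):
    """Split the key words into L and the rows A[0], A[1], ..."""
    length_key = words[:l_words]
    rest = words[l_words:]
    rows = [rest[i:i + row_len] for i in range(0, len(rest), row_len)]
    return length_key, rows


L, A = partition_key(KEY_WORDS)
```

        With the key above, this yields L = { e627 6a01 } and five rows
        A[0] through A[4], matching the values in the listing.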
        This Internet-Draft expires in July 2002.