AVT WG                                                     P. Zimmermann
Internet-Draft                                             Zfone Project
Intended status: Informational                          A. Johnston, Ed.
Expires: April 25, 2007                                            Avaya
                                                               J. Callas
                                                         PGP Corporation
                                                        October 22, 2006


   ZRTP: Extensions to RTP for Diffie-Hellman Key Agreement for SRTP
                      draft-zimmermann-avt-zrtp-02

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 25, 2007.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   This document defines ZRTP, RTP (Real-time Transport Protocol) header
   extensions for a Diffie-Hellman exchange to agree on a session key
   and parameters for establishing Secure RTP (SRTP) sessions.  The ZRTP
   protocol is completely self-contained in RTP and does not require
   support in the signaling protocol or assume a Public Key



Zimmermann, et al.       Expires April 25, 2007                 [Page 1]


Internet-Draft                    ZRTP                      October 2006


   Infrastructure (PKI).  For the media session, ZRTP
   provides confidentiality, protection against Man in the Middle (MitM)
   attacks, and, in cases where a secret is available from the signaling
   protocol, authentication.  ZRTP can utilize three Session Description
   Protocol (SDP) attributes to provide discovery and authentication
   through the signaling channel.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  8
   3.  ZRTP and RTP Keying Requirements . . . . . . . . . . . . . . .  8
   4.  Overview . . . . . . . . . . . . . . . . . . . . . . . . . . .  8
     4.1.  Key Agreement Modes  . . . . . . . . . . . . . . . . . . .  9
       4.1.1.  Diffie-Hellman Mode  . . . . . . . . . . . . . . . . . 10
       4.1.2.  Multistream Mode . . . . . . . . . . . . . . . . . . . 11
   5.  Protocol Description . . . . . . . . . . . . . . . . . . . . . 12
     5.1.  Key Agreement and Derivation Algorithm . . . . . . . . . . 12
       5.1.1.  Discovery  . . . . . . . . . . . . . . . . . . . . . . 12
       5.1.2.  Hash Commitment  . . . . . . . . . . . . . . . . . . . 13
       5.1.3.  Diffie-Hellman Exchange  . . . . . . . . . . . . . . . 14
       5.1.4.  Confirmation and Switch to SRTP  . . . . . . . . . . . 18
     5.2.  Multistream Mode . . . . . . . . . . . . . . . . . . . . . 20
     5.3.  Random Number Generation . . . . . . . . . . . . . . . . . 21
     5.4.  CRC Protection of Messages . . . . . . . . . . . . . . . . 22
     5.5.  ZID and Cache Operation  . . . . . . . . . . . . . . . . . 22
     5.6.  Terminating an SRTP Session or ZRTP Exchange . . . . . . . 23
   6.  RTP Header Extension . . . . . . . . . . . . . . . . . . . . . 24
     6.1.  ZRTP Message Formats . . . . . . . . . . . . . . . . . . . 24
       6.1.1.  Message Type Block . . . . . . . . . . . . . . . . . . 25
       6.1.2.  Hash Type Block  . . . . . . . . . . . . . . . . . . . 26
       6.1.3.  Cipher Type Block  . . . . . . . . . . . . . . . . . . 26
       6.1.4.  Auth Tag Length Block  . . . . . . . . . . . . . . . . 26
       6.1.5.  Key Agreement Type Block . . . . . . . . . . . . . . . 27
       6.1.6.  SAS Type Block . . . . . . . . . . . . . . . . . . . . 27
     6.2.  Hello message  . . . . . . . . . . . . . . . . . . . . . . 28
     6.3.  HelloACK message . . . . . . . . . . . . . . . . . . . . . 29
     6.4.  Commit message . . . . . . . . . . . . . . . . . . . . . . 30
     6.5.  DHPart1 message  . . . . . . . . . . . . . . . . . . . . . 31
     6.6.  DHPart2 message  . . . . . . . . . . . . . . . . . . . . . 32
     6.7.  Confirm1 message . . . . . . . . . . . . . . . . . . . . . 33
     6.8.  Confirm2 message . . . . . . . . . . . . . . . . . . . . . 35
     6.9.  Conf2ACK message . . . . . . . . . . . . . . . . . . . . . 36
     6.10. GoClear message  . . . . . . . . . . . . . . . . . . . . . 37
     6.11. ClearACK message . . . . . . . . . . . . . . . . . . . . . 37
   7.  Retransmissions  . . . . . . . . . . . . . . . . . . . . . . . 38
   8.  Short Authentication String  . . . . . . . . . . . . . . . . . 39
   9.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 41
   10. Security Considerations  . . . . . . . . . . . . . . . . . . . 42
   11. Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 43
   12. Appendix A - ZRTP, SIP, and SDP  . . . . . . . . . . . . . . . 43
   13. Appendix B - The ZRTP Disclosure flag  . . . . . . . . . . . . 46
   14. Appendix C - Intermediary ZRTP Devices . . . . . . . . . . . . 48
   15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 48
     15.1. Normative References . . . . . . . . . . . . . . . . . . . 48
     15.2. Informative References . . . . . . . . . . . . . . . . . . 49
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 50
   Intellectual Property and Copyright Statements . . . . . . . . . . 52


1.  Introduction

   ZRTP is a key agreement protocol which performs Diffie-Hellman key
   exchange during call setup in-band in the Real-time Transport
   Protocol (RTP) [2] media stream which has been established using a
   signaling protocol such as Session Initiation Protocol (SIP) [17].
   This generates a shared secret which is then used to generate keys
   and salt for a Secure RTP (SRTP) [3] session.  ZRTP borrows ideas
   from PGPfone [13].  A reference implementation of ZRTP is available
   as Zfone [14].

   The ZRTP protocol has some nice cryptographic features lacking in
   many other approaches to media session encryption.  Although it uses
   a public key algorithm, it does not rely on a public key
   infrastructure (PKI).  In fact, it does not use persistent public
   keys at all.  It uses ephemeral Diffie-Hellman (DH) with hash
   commitment, and allows the detection of Man in the Middle (MitM)
   attacks by displaying a short authentication string for the users to
   read and compare over the phone.  It has perfect forward secrecy,
   meaning the keys are destroyed at the end of the call, which
   precludes retroactively compromising the call by future disclosures
   of key material.  But even if the users are too lazy to bother with
   short authentication strings, we still get fairly decent
   authentication against a MitM attack, based on a form of key
   continuity.  It does this by caching some key material to use in the
   next call, to be mixed in with the next call's DH shared secret,
   giving it key continuity properties analogous to SSH.  All this is
   done without reliance on a PKI, key certification, trust models,
   certificate authorities, or key management complexity that bedevils
   the email encryption world.  It also does not rely on SIP signaling
   for the key management, and in fact does not rely on any servers at
   all.  It performs its key agreements and key management in a purely
   peer-to-peer manner over the RTP packet stream.

   Most secure phones rely on a Diffie-Hellman exchange to agree on a
   common session key.  But since DH is susceptible to a man-in-the-
   middle (MitM) attack, it is common practice to provide a way to
   authenticate the DH exchange.  In some military systems, this is done
   by depending on digital signatures backed by a centrally-managed PKI.
   A decade of industry experience has shown that deploying centrally
   managed PKIs can be a painful and often futile experience.  PKIs are
   just too messy, and require too much activation energy to get them
   started.  Setting up a PKI requires somebody to run it, which is not
   practical for an equipment provider.  A service provider like a
   carrier might venture down this path, but even then you have to deal
   with cross-carrier authentication, certificate revocation lists, and
   other complexities.  It is much simpler to avoid PKIs altogether,
   especially when developing secure commercial products.  It is
   therefore more common for commercial secure phones in the PSTN world
   to augment the DH exchange with a Short Authentication String (SAS)
   combined with a hash commitment at the start of the key exchange, to
   shorten the length of SAS material that must be read aloud.  No PKI
   is required for this approach to authenticating the DH exchange.  The
   AT&T 3600, Eric Blossom's COMSEC secure phones [15], PGPfone [13],
   and CryptoPhone [16] are all examples of products that took this
   simpler lightweight approach.

   The main problem with this approach is inattentive users who may not
   execute the voice authentication procedure, or unattended secure
   phone calls to answering machines that cannot execute it.
   Additionally, some people worry about voice spoofing (the "Rich
   Little" attack), and some worry about trying to use it between people
   who don't know each other's voices.  This is not as much of a problem
   as it seems, because it isn't necessary that they recognize each
   other by their voice, it's only necessary that they detect that the
   voice used for the SAS procedure matches the voice in the rest of the
   phone call.  These concerns are not enough reason to embrace PKIs as
   an alternative, in my opinion.

   A popular and field-proven approach is used by SSH (Secure Shell)
   [18], which Peter Gutmann likes to call the "baby duck" security
   model.  SSH establishes a relationship by exchanging public keys in
   the initial session, when we assume no attacker is present, and this
   makes it possible to authenticate all subsequent sessions.  A
   successful MitM attacker has to have been present in all sessions all
   the way back to the first one, which is assumed to be difficult for
   the attacker.  All this is accomplished without resorting to a
   centrally-managed PKI.

   We use an analogous baby duck security model to authenticate the DH
   exchange in ZRTP.  We don't need to exchange persistent public keys,
   we can simply cache a shared secret and re-use it to authenticate a
   long series of DH exchanges for secure phone calls over a long period
   of time.  If we read aloud just one SAS, and then cache a shared
   secret for later calls to use for authentication, no new voice
   authentication rituals need to be executed.  We just have to remember
   we did one already.

   If we ever lose this cached shared secret, it is no longer available
   for authentication of DH exchanges, so we would have to do a new SAS
   procedure and start over with a new cached shared secret.  Then we
   could go back to omitting the voice authentication on later calls.

   A particularly compelling reason why this approach is attractive is
   that SAS is easiest to implement when a GUI or some sort of display
   is available, which raises the question of what to do when no display
   is available.  We envision some products that implement secure VoIP
   via a local network proxy, which lacks a display in many cases.  If
   we take an approach that greatly reduces the need for a SAS in each
   and every call, we can operate in GUI-less products with greater
   ease.

   It's a good idea to force your opponent to have to solve multiple
   problems in order to mount a successful attack.  Some examples of
   widely differing problems we might like to present him with are:
   stealing a shared secret from one of the parties, being present on
   the very first session and every subsequent session to carry out an
   active MitM attack, and solving the discrete log problem.  We want to
   force the opponent to solve more than one of these problems to
   succeed.

   The protocol can make use of different kinds of shared secrets.
   Each type of shared secret is determined by a different method.  All
   of the shared secrets are hashed together to form a session key to
   encrypt the call.  An attacker must defeat all of the methods in
   order to determine the session key.

   First, there is the shared secret determined entirely by a Diffie-
   Hellman key agreement.  It changes with every call, based on random
   numbers.  An attacker may attempt a classic DH MitM attack on this
   secret, but we can protect against this by displaying and reading
   aloud a SAS, combined with adding a hash commitment at the beginning
   of the DH exchange.

   Second, there is an evolving shared secret, or ongoing shared secret
   that is automatically changed and refreshed and cached with every new
   session.  We will call this the cached shared secret, or sometimes
   the retained shared secret.  Each new image of this ongoing secret is
   a non-invertible function of its previous value and the new secret
   derived by the new DH agreement.  It's possible that no cached shared
   secret is available, because there were no previous sessions to
   inherit this value from, or because one side loses its cache.
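
   The refresh of the cached shared secret described above can be
   sketched as follows.  This is only an illustration: the use of
   HMAC-SHA-256, the label string, and the function name are our
   assumptions and are not the derivation defined by this protocol.

```python
import hashlib
import hmac


def next_retained_secret(previous: bytes, dh_secret: bytes) -> bytes:
    """Return the cached secret to store for the next call.

    Assumption: HMAC-SHA-256 keyed with the new DH secret stands in for
    the one-way function; the label below is hypothetical.
    """
    # A keyed hash is one-way: the stored value reveals neither input.
    message = previous + b"retained secret"
    return hmac.new(dh_secret, message, hashlib.sha256).digest()
```

   On a first call, when nothing is cached yet, the sketch can be
   called with an empty previous value.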

   There are other approaches for key agreement for SRTP that compute a
   shared secret using information in the signaling.  For example, [20]
   describes how to carry a MIKEY (Multimedia Internet KEYing) [21]
   payload in SDP [11].  Or [19] describes directly carrying SRTP keying
   and configuration information in SDP.  ZRTP does not rely on the
   signaling to compute a shared secret, but if a client does produce a
   shared secret via the signaling, and makes it available to the ZRTP
   protocol, ZRTP can make use of this shared secret to augment the list
   of shared secrets that will be hashed together to form a session key.
   This way, any security weaknesses that might compromise the shared
   secret contributed by the signaling will not harm the final resulting
   session key.

   There may also be a static shared secret that the two parties agree
   on out-of-band in advance.  A hashed passphrase would suffice.

   The shared secret provided by the signaling (if available), the
   shared secret computed by DH, and the cached shared secret are all
   hashed together to compute the session key for a call.  If the cached
   shared secret is not available, it is omitted from the hash
   computation.  If the signaling provides no shared secret, it is also
   omitted from the hash computation.
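
   A minimal sketch of this mixing step is shown below.  It assumes
   SHA-256 as the hash and adds a length prefix to each input so that
   the concatenation is unambiguous; both choices, and the function
   name, are ours for illustration only.  The normative derivation is
   given in the protocol description.

```python
import hashlib
from typing import Optional


def session_key(dh_secret: bytes,
                cached_secret: Optional[bytes] = None,
                signaling_secret: Optional[bytes] = None) -> bytes:
    """Hash the available shared secrets together into one key.

    Secrets that are not available (None) are omitted from the hash.
    Assumption: SHA-256 with 4-byte length prefixes per input.
    """
    h = hashlib.sha256()
    for secret in (signaling_secret, dh_secret, cached_secret):
        if secret is not None:
            h.update(len(secret).to_bytes(4, "big"))
            h.update(secret)
    return h.digest()
```

   In this sketch, an attacker who completes a DH exchange with one
   party but lacks the cached shared secret computes a different key
   from that party, which is the implicit authentication property
   discussed next.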

   No DH MitM attack can succeed if the ongoing shared secret is
   available to the two parties, but not to the attacker.  This is
   because the attacker cannot compute a common session key with either
   party without knowing the cached secret component, even if he
   correctly executes a classic DH MitM attack.  Mixing in the cached
   shared secret for the session key calculation allows it to act as an
   implicit authenticator to protect the DH exchange, without requiring
   additional explicit HMACs to be computed on the DH parameters.  If
   the cached shared secret is available, a MitM attack would be
   instantly detected by the failure to achieve a shared session key,
   resulting in undecryptable packets.  The protocol can easily detect
   this.  It would be more accurate to say that the MitM attack is not
   merely detected, but thwarted.

   When adding the complexity of additional shared secrets beyond the
   familiar DH key agreement, we must make sure the lack of availability
   of the cached shared secret cannot prevent a call from going through,
   and we must also prevent false alarms that claim an attack was
   detected.

   An added benefit of using these cached shared secrets to mix in with
   the session keys is that it augments the entropy of the session key.
   Even if limits on the size of the DH exchange produce a session key
   with less than 256 bits of real work factor, the added entropy from
   the cached shared secret can bring up all the subsequent session keys
   to the full 256-bit AES key strength, assuming no attacker was
   present in the first call.

   We could have authenticated the DH exchange the same way SSH does it,
   with digital signatures, caching public keys instead of shared
   secrets.  But this approach with caching shared secrets seemed a bit
   simpler, and has the added benefit of adding more entropy to the
   session keys.

   The following sections provide an overview of the ZRTP protocol and
   describe the key agreement algorithm and RTP header extensions.


2.  Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
   and "OPTIONAL" are to be interpreted as described in RFC 2119 and
   indicate requirement levels for compliant implementations [1].


3.  ZRTP and RTP Keying Requirements

   This section discusses how ZRTP meets the RTP keying requirements
   described in [12].  The section numbers referenced are those of
   [12], not of this document.

   Due to the in-band key management approach, ZRTP meets the following
   requirements: 4.1 Secure Retargeting and Secure Forking and 4.2
   Clipping Media Before SDP Answer.

   Due to the built-in in-band discovery mechanisms, ZRTP meets the 5.3
   Best Effort Encryption requirement.

   The use of Diffie-Hellman ensures that ZRTP meets the 5.2 Perfect
   Forward Secrecy requirement.

   Since the supported SRTP algorithms are not exchanged in the
   signaling but in the media path, there is no computational penalty in
   allowing additional supported algorithms as described in 5.4
   Upgrading Algorithms.

   ZRTP does not require that the SSRC or ROC be signaled, per
   requirement 4.4 SSRC and ROC.

   ZRTP does not currently use certificates for authentication so it
   does not meet requirement 5.1 Public Key Infrastructure.  However,
   ZRTP could be extended to utilize a certificate to perform a digital
   signature over the Diffie-Hellman values exchanged.

   ZRTP does not support 4.3 Centralized Keying due to its point-to-
   point design.


4.  Overview

   This section provides a description of how ZRTP works.  This
   description is non-normative in nature but is included to build
   understanding of the protocol.

   ZRTP is negotiated the same way a conventional RTP session is
   negotiated in an offer/answer exchange using the AVP/RTP profile.
   The ZRTP protocol begins after two endpoints have utilized a
   signaling protocol such as SIP and are ready to send or have already
   begun sending RTP packets.  This specification defines a new RTP
   extension header which is used to carry the ZRTP messages between the
   endpoints.  Since RTP endpoints ignore unknown extension headers, the
   protocol is fully backwards compatible - a ZRTP endpoint attempting
   to perform key agreement with a non-ZRTP endpoint will simply receive
   normal RTP responses and can then inform the user that a secure
   session is not possible and either continue with the insecure session
   or terminate the session depending on the user's security policy.

   The ZRTP exchange begins at the same time that the first RTP packets
   are exchanged between the endpoints.  A ZRTP message is transported
   in an RTP no-op packet.

   A ZRTP endpoint initiates the exchange by sending a ZRTP Hello
   message to the other endpoint.  The purpose of the Hello message is
   to discover if the other endpoint supports the protocol and to see
   what algorithms the two ZRTP endpoints have in common.  This
   discovery can also be achieved if the a=zrtp attribute is present in
   an SDP offer or answer, as described in Appendix A.

   The Hello message contains the SRTP configuration options, and the
   ZID.  Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID
   that is generated once at installation time.  ZIDs are discovered
   during the Hello message exchange.  The received ZID is used to look
   up, in a local cache, the shared secrets retained from previous ZRTP
   sessions with that endpoint.
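
   The ZID and cache lookup can be sketched as follows.  The class and
   function names are hypothetical, and the rotation of the two
   retained secrets (rs1 and rs2) on each update is our assumption
   about cache handling, not a normative rule.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

ZID_BYTES = 12  # 96 bits


def new_zid() -> bytes:
    """Generate a random 96-bit ZID; done once at installation time."""
    return secrets.token_bytes(ZID_BYTES)


@dataclass
class RetainedSecrets:
    rs1: Optional[bytes] = None  # most recent retained secret
    rs2: Optional[bytes] = None  # the one before it (assumed rotation)


class SecretCache:
    """In-memory stand-in for the persistent cache, keyed by peer ZID."""

    def __init__(self) -> None:
        self._entries: Dict[bytes, RetainedSecrets] = {}

    def lookup(self, peer_zid: bytes) -> RetainedSecrets:
        # An unknown peer yields an empty entry rather than an error.
        return self._entries.get(peer_zid, RetainedSecrets())

    def store(self, peer_zid: bytes, new_secret: bytes) -> None:
        old = self.lookup(peer_zid)
        self._entries[peer_zid] = RetainedSecrets(rs1=new_secret,
                                                  rs2=old.rs1)
```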

   A response to a ZRTP Hello message is a ZRTP HelloACK message.  The
   HelloACK message simply acknowledges receipt of the Hello message and
   indicates support for the ZRTP protocol.  Since RTP uses best effort
   UDP transport, ZRTP has retransmission timers in case of lost
   datagrams.  There are two timers, both with exponential backoff
   mechanisms.  One timer is used for retransmissions of Hello messages
   and the other is used for retransmissions of all other messages after
   receipt of a HelloACK which indicates support of ZRTP by the other
   endpoint.
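
   The exponential backoff behavior of such a timer can be sketched as
   below.  The initial interval, the cap, and the retry count used in
   the usage note are placeholder values chosen for illustration; the
   actual timer values are specified in Section 7.

```python
from typing import Iterator


def backoff_delays(initial_ms: int, cap_ms: int,
                   retries: int) -> Iterator[int]:
    """Yield the wait before each retransmission, doubling up to a cap.

    All three parameters are hypothetical inputs, not protocol values.
    """
    delay = initial_ms
    for _ in range(retries):
        yield delay
        delay = min(delay * 2, cap_ms)
```

   For example, list(backoff_delays(50, 200, 5)) yields the intervals
   50, 100, 200, 200, and 200 milliseconds.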

4.1.  Key Agreement Modes

   After both endpoints exchange Hello and HelloACK messages, the key
   agreement exchange can begin with the ZRTP Commit message.  ZRTP
   supports a number of key agreement modes including both Diffie-
   Hellman and non-Diffie-Hellman as described in the following
   sections.

4.1.1.  Diffie-Hellman Mode

   An example ZRTP call flow is shown in Figure 1 below.  Note that the
   order of the Hello/HelloACK exchanges in F1/F2 and F3/F4 may be
   reversed.  That is, either Alice or Bob might send the first Hello
   message.  Also, an endpoint that receives a Hello message and wishes
   to immediately begin the ZRTP key agreement can omit the HelloACK and
   send the Commit instead.  In Figure 1, this would result in messages
   F2, F3, and F4 being omitted.  Note that the endpoint which sends the
   Commit message is considered the initiator of the ZRTP session and
   drives the key agreement exchange.  The Diffie-Hellman public values
   are exchanged in the DHPart1 and DHPart2 messages.  SRTP keys and
   salts are then calculated along with a ZRTP Session key.

   Alice                                      Bob
     |                                         |
     | Alice and Bob establish a media session.|
     |                                         |
     |                   RTP                   |
     |<=======================================>|
     |                                         |
     | Hello (version, options, Alice's ZID) F1|
     |---------------------------------------->|
     |                             HelloACK F2 |
     |<----------------------------------------|
     | Hello (version, options, Bob's ZID) F3  |
     |<----------------------------------------|
     | HelloACK F4                             |
     |---------------------------------------->|
     |                                         |
     |        Bob acts as the initiator        |
     |                                         |
     | Commit (Bob's ZID, options, hvi or nonce) F5
     |<----------------------------------------|
     | DHPart1 (pvr, shared secret hashes) F6  |
     |---------------------------------------->|
     | DHPart2 (pvi, shared secret hashes) F7  |
     |<----------------------------------------|
     |                                         |
     | Alice and Bob generate SRTP session key.|
     |                                         |
     |               SRTP begins               |
     |<=======================================>|
     |                                         |
     | Confirm1 (plaintext, D,S,V flags, hmac) F8
     |---------------------------------------->|
     | Confirm2 (plaintext, D,S,V flags, hmac) F9
     |<----------------------------------------|
     | Conf2ACK F10                            |
     |---------------------------------------->|

   Figure 1. Establishment of an SRTP session using ZRTP


4.1.2.  Multistream Mode

   Multistream mode is an alternative key agreement method when two
   endpoints have an established SRTP media stream between them and
   hence an active ZRTP Session key.  ZRTP can derive multiple SRTP keys
   from a single DH exchange.  For example, an established secure voice
   call that adds a video stream could use Multistream mode to quickly
   initiate the video stream without a second DH exchange.

   When Multistream mode is indicated in the Commit message, a call flow
   similar to Figure 1 is used, but no DH calculation is performed by
   either endpoint and the DHPart1 and DHPart2 messages are omitted.  In
   this mode, multiple non-DH ZRTP exchanges can be performed in
   parallel between two endpoints.

   Alternatively, each stream can be handled independently using the
   call flow of Figure 1, resulting in a DH exchange per media stream.
   To keep the integrity of the retained shared secrets, only a single
   DH exchange can be processed at a time between two endpoints.


5.  Protocol Description

   ZRTP uses RTP [2] to transport discovery and key agreement messages.
   The messages are carried as RTP header extensions as defined in
   Section 6.  It is RECOMMENDED to use the no-op RTP/AVP payload type
   [7].  No-op packets are ideal for ZRTP transport as it is permissible
   to send no-op packets even for media streams marked 'recvonly' or
   'inactive'.  Also, no-op packets can be used with any media type.  An
   endpoint MAY use a different SSRC for ZRTP messages than for RTP
   media.

   Note: the use of separate SSRC numbers and hence separate sequence
   number space allows for very loose coupling between the ZRTP
   application and the RTP media application.

   To support best effort encryption [12], ZRTP uses normal RTP/AVP
   profile (AVP) media lines in the initial offer/answer exchange.  The
   ZRTP SDP attribute flag a=zrtp defined in Appendix A SHOULD be used
   in all offers and answers to indicate support for the ZRTP protocol.
   In subsequent offer/answer exchanges after a successful ZRTP exchange
   has resulted in an SRTP session, the Secure RTP/AVP (SAVP) profile
   MAY be used.

5.1.  Key Agreement and Derivation Algorithm

   The key agreement algorithm has four phases that are described
   normatively in the following sections.

5.1.1.  Discovery

   During the discovery phase, a ZRTP endpoint discovers if the other
   endpoint supports ZRTP and which ZRTP version, hash, cipher, auth tag
   length, key agreement type, and SAS algorithms are supported.  In
   addition, each endpoint sends and discovers ZIDs.  The received ZID
   is used to retrieve previous retained shared secrets, rs1 and rs2.
   If the endpoint has other secrets, then they are also collected.  The
   signaling secret (sigs) is passed from the signaling protocol used
   to establish the RTP session.  For SIP, it is the dialog identifier
   of a Secure SIP (sips) session: a string composed of Call-ID, to tag,
   and from tag.  From the definitions in RFC 3261 [17]:

   sigs = hash(call-id | tag1 | tag2)

   Note: the dialog identifier of a non-secure SIP session should not be
   considered a signaling secret as it has no confidentiality
   protection.

   For the SRTP secret (srtps), it is the SRTP master key and salt.
   This information may have been passed in the signaling using MIKEY or
   SDP Security Descriptions, for example:

   srtps = hash(SRTP master key | SRTP master salt)
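
   The two formulas above can be sketched as follows, reading "|" as
   byte concatenation and assuming SHA-256 as the negotiated hash.
   The function and parameter names are ours; how the Call-ID and tags
   are encoded as bytes is also an assumption.

```python
import hashlib


def signaling_secret(call_id: bytes, tag1: bytes, tag2: bytes) -> bytes:
    """sigs = hash(call-id | tag1 | tag2), with SHA-256 assumed."""
    return hashlib.sha256(call_id + tag1 + tag2).digest()


def srtp_secret(master_key: bytes, master_salt: bytes) -> bytes:
    """srtps = hash(SRTP master key | SRTP master salt), SHA-256 assumed."""
    return hashlib.sha256(master_key + master_salt).digest()
```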

   Additional shared secrets can be defined and used as other_secret.
   If no secret of a given type is available, a random value is
   generated and used for that secret to ensure a mismatch in the hash
   comparisons in the DHPart1 and DHPart2 messages.  This prevents an
   eavesdropper from knowing how many shared secrets are available
   between the endpoints.
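
   The substitution of random values for missing secrets can be
   sketched as below.  The 32-byte length and the function name are
   assumptions made for illustration.

```python
import secrets
from typing import Optional

SECRET_BYTES = 32  # assumed length of each secret slot


def secret_or_random(secret: Optional[bytes]) -> bytes:
    """Return the secret, or fresh random bytes if none is available.

    Each endpoint lacking the secret draws its own random value, so
    the hashes compared in DHPart1 and DHPart2 will not match.
    """
    if secret is not None:
        return secret
    return secrets.token_bytes(SECRET_BYTES)
```

   Because every slot is always filled, an observer sees the same
   number of values whether or not a given secret exists.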

   A Hello message can be sent at any time, but is usually sent at the
   start of an RTP session to determine if the other endpoint supports
   ZRTP, and also if the SRTP implementations are compatible.  A Hello
   message is retransmitted using timer T1 and an exponential backoff
   mechanism detailed in Section 7 until the receipt of a HelloACK
   message or a Commit message.

5.1.2.  Hash Commitment

   The hash commitment is performed by the initiator of the ZRTP
   exchange.  From the intersection of the algorithms in the sent and
   received Hello messages, the initiator chooses a hash, cipher, auth
   tag length, key agreement type, and SAS algorithm to be used.

   A Diffie-Hellman mode is selected by setting the Key Agreement Type
   to DH4096 or DH3072 in the Commit.  In this mode, the key agreement
   begins with the initiator choosing a fresh random Diffie-Hellman (DH)
   secret value (svi) based on the chosen key agreement type value, and
   computing the public value.  (Note that to speed up processing, this
   computation can be done in advance.)  For guidance on generating
   random numbers, see the section on Random Number Generation.  The
   Diffie-Hellman secret value, svi, SHOULD be twice as long as the AES
   key length.  This means, if AES 128 is used, the DH secret value
   SHOULD be 256 bits long.  If AES 256 is used, the secret value SHOULD
   be 512 bits long.

   pvi = g^svi mod p

   where g and p are determined by the key agreement type value.  The
   initiator then computes a hash, hvi, of the public value using the
   chosen hash algorithm.  The hvi includes the set of hash, cipher,
   atl, pkt, and sas types from the responder's Hello in the following
   order:

   hvi=hash(pvi | hashr1-5 | cipherr1-5 | atl1-5 | pktr1-5 | sasr1-5)

   The information from the responder's Hello message is included in the
   hash calculation to prevent a bid-down attack by modification of the
   responder's Hello message.
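
   The computation above can be sketched in Python as follows.  This is
   a minimal illustration, not a normative encoding: the modulus P_TOY,
   the helper names, the use of SHA-256, and the byte layout of the
   hash input are all assumptions made for the example.  A real
   endpoint would use the RFC 3526 prime selected by the key agreement
   type.

```python
import hashlib
import secrets

# ASSUMPTION: toy modulus for illustration only.  2^255 - 19 is prime,
# but it is NOT a ZRTP group; ZRTP uses the RFC 3526 MODP primes.
P_TOY = 2**255 - 19
G = 2  # generator required by Section 6.1.5


def dh_keypair(p, secret_bits):
    """Return (sv, pv): a fresh secret exponent and pv = g^sv mod p."""
    # Uniform in [2, 2^secret_bits - 1], following Section 5.3.
    sv = 2 + secrets.randbelow(2**secret_bits - 2)
    return sv, pow(G, sv, p)


def compute_hvi(pvi, p, hello_blocks):
    """hvi = hash(pvi | algorithm blocks from the responder's Hello).

    ASSUMPTION: pvi is serialized big-endian at the width of p, and
    hello_blocks is the already ordered list of byte strings
    (hash, cipher, atl, pkt, sas) taken from the responder's Hello.
    """
    width = (p.bit_length() + 7) // 8
    data = pvi.to_bytes(width, "big") + b"".join(hello_blocks)
    return hashlib.sha256(data).digest()


# 256-bit secret exponent, as recommended when AES 128 is used.
svi, pvi = dh_keypair(P_TOY, 256)
hvi = compute_hvi(
    pvi, P_TOY,
    [b"SHA256  ", b"AES128  ", b"32  ", b"DH3072  ", b"base32  "],
)
```

   Because the responder's Hello fields are part of the hash input, any
   change to them in transit produces a different hvi.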

   Note: If both sides send Commit messages initiating a secure session
   at the same time, the Commit message with the lowest hvi value is
   discarded and the other side is the initiator.  This breaks the tie,
   allowing the protocol to proceed from this point with a clear
   definition of who is the initiator and who is the responder.
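
   The tie-breaking rule can be sketched as a single comparison.  The
   function name is hypothetical, and comparing hvi values as
   big-endian unsigned integers is an assumption; this document does
   not state the comparison encoding, nor what happens if both hvi
   values are equal.

```python
def commit_to_keep(local_hvi: bytes, remote_hvi: bytes) -> str:
    """Return "local" or "remote": whose Commit survives a collision.

    The Commit with the lowest hvi is discarded, so the side with the
    higher hvi becomes the initiator.
    ASSUMPTION: hvi values compare as big-endian unsigned integers.
    """
    if local_hvi == remote_hvi:
        # Not covered by the text; treated as an error in this sketch.
        raise ValueError("identical hvi values; tie cannot be broken")
    local = int.from_bytes(local_hvi, "big")
    remote = int.from_bytes(remote_hvi, "big")
    return "local" if local > remote else "remote"
```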

   Because the DH exchange affects the state of the retained shared
   secret cache, only one in-process ZRTP DH exchange may occur at a
   time between two ZRTP endpoints.  Otherwise, race conditions and
   cache integrity problems will result.  When multiple media streams
   are established in parallel between the same pair of ZRTP endpoints
   (determined by the ZIDs in the Hello Messages), only one can be
   processed.  Once that exchange completes with Confirm2 and Conf2ACK
   messages, another ZRTP DH exchange can begin.  In the event that
   Commit messages are sent by both ZRTP endpoints at the same time, but
   are received in different media streams, the same resolution rules
   apply - the Commit message with the lowest hvi value is discarded and
   the other side is the initiator.  The media stream in which the
   Commit was sent will proceed through the ZRTP exchange while the
   media stream with the discarded Commit must wait for the completion
   of the other ZRTP exchange.

   Note: This paragraph does not apply when Multistream mode key
   agreement is used since the cached shared secrets are not affected.

5.1.3.  Diffie-Hellman Exchange

   The purpose of the Diffie-Hellman exchange is for the two ZRTP
   endpoints to generate a new shared secret, s0.  In addition, the
   endpoints discover if they have any shared secrets in common.  If
   they do, this exchange allows them to discover how many and agree on
   an ordering for them: s1, s2, etc.

5.1.3.1.  Responder Behavior

   Upon receipt of the Commit message, the responder generates its own
   fresh random DH secret value, svr, and computes the public value.
   (Note that to speed up processing, this computation can be done in
   advance.)  For guidance on random number generation, see the section
   on Random Number Generation.  The Diffie-Hellman secret value, svr,
   SHOULD be twice as long as the AES key length.  This means, if AES
   128 is used, the DH secret value SHOULD be 256 bits long.  If AES 256
   is used, the secret value SHOULD be 512 bits long.

   pvr = g^svr mod p

   The final shared secret, s0, is calculated by hashing the
   concatenation of the Diffie-Hellman shared secret (DHSS) followed by
   the (possibly empty) set of shared secrets that are actually shared
   between the initiator and responder.  For computing the hash, the
   shared secrets are sorted by the order of the initiator's
   corresponding shared secret IDs.  The remainder of this section
   describes an algorithm to accomplish this.

   First, an HMAC keyed hash is calculated using the first retained
   shared secret, rs1, as the key on the string "Responder".  This
   generates a retained secret ID, rs1IDr, which is truncated to 64
   bits.  HMACs are calculated in a similar way for additional shared
   secrets:

   rs1IDr = HMAC(rs1, "Responder")

   rs2IDr = HMAC(rs2, "Responder")

   sigsIDr = HMAC(sigs, "Responder")

   srtpsIDr = HMAC(srtps, "Responder")

   other_secretIDr = HMAC(other_secret, "Responder")

   A ZRTP DHPart1 message is generated containing pvr and the set of
   keyed hashes (HMACs) derived from the possibly shared secrets.

   Upon receipt of the DHPart2 message, the responder checks that the
   initiator's public DH value is not equal to 1 or p-1.  An attacker
   might inject a false DHPart2 packet with a value of 1 or p-1 for
   g^svi mod p, which would cause a disastrously weak final DH result to
   be computed.  If pvi is 1 or p-1, the user should be alerted of the
   attack and the protocol exchange must be terminated.  Otherwise, the
   responder computes hvi from the public DH value in the DHPart2
   message, as described in Section 5.1.2, and compares it with the hvi
   received in the Commit.  If they are different, a MitM attack is
   taking place; the user is alerted and the protocol exchange is
   terminated.

   The responder then calculates the Diffie-Hellman result:

   DHResult = pvi^svr mod p

   The responder then calculates the Diffie-Hellman shared secret:

   DHSS = hash(DHResult)

   The hmacs of the possible shared secrets received are compared
   against the hmacs of the local set of possible shared secrets.

   Note: When comparing the signaling secret sigs derived from SIP, both
   orderings of to-tag followed by from-tag, and from-tag followed by
   to-tag must be tried.

   The expected hmac values of the shared secrets are calculated (using
   the string "Initiator" instead of "Responder") and compared to the
   hmacs received in the DHPart2 message.  The secrets corresponding to
   matching hmacs are kept while the secrets corresponding to the non-
   matching ones are replaced with a null.  The set of up to five actual
   shared secrets are then s1, s2, s3, s4, and s5 - the order is that
   chosen by the initiator.  The final shared secret, s0, is calculated
   by hashing the concatenation of the DHSS and the set of non-null
   shared secrets.  As a result, the null secrets have no effect on the
   concatenation operation:

   s0 = hash(DHSS | s1 | s2 | s3 | s4 | s5)

   For example, consider two ZRTP endpoints that share secrets rs1, rs2,
   and a hash of a secret passphrase other_secret.  During the
   comparison, rs1ID, rs2ID, and other_secretID will match but sigsID
   and srtpsID will not.  As a result, s1 = rs1, s2 = rs2, s5 =
   other_secret, while s3 and s4 will be nulls.  The s0 for this
   exchange will be calculated as the hash of the concatenation of DHSS,
   rs1, rs2, and other_secret.
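
   The matching and hashing steps can be sketched as follows, seen from
   the responder.  The slot names, the dictionary layout, and the use
   of HMAC-SHA256 are assumptions for illustration; only the
   truncation to 64 bits, the role strings, and the concatenation
   order come from the text above.

```python
import hashlib
import hmac

# Slot order s1..s5, as in the example above.
SLOTS = ("rs1", "rs2", "sigs", "srtps", "other_secret")


def secret_id(secret: bytes, role: bytes) -> bytes:
    """HMAC of the role string keyed with the secret, cut to 64 bits."""
    return hmac.new(secret, role, hashlib.sha256).digest()[:8]


def compute_s0(dhss, local, received_ids, peer_role):
    """Hash DHSS followed by every secret whose ID matches."""
    data = dhss
    for slot in SLOTS:
        secret = local.get(slot)
        if secret is None:
            continue
        expected = secret_id(secret, peer_role)
        if hmac.compare_digest(expected, received_ids.get(slot, b"")):
            data += secret  # null (non-matching) secrets add nothing
    return hashlib.sha256(data).digest()


# The example from the text: rs1, rs2, other_secret are shared,
# while sigs and srtps differ between the two endpoints.
shared = {"rs1": b"A" * 32, "rs2": b"B" * 32, "other_secret": b"C" * 32}
initiator = dict(shared, sigs=b"i-sigs", srtps=b"i-srtps")
responder = dict(shared, sigs=b"r-sigs", srtps=b"r-srtps")
dhss = hashlib.sha256(b"example DH result").digest()
ids_in_dhpart2 = {
    name: secret_id(value, b"Initiator")
    for name, value in initiator.items()
}
s0 = compute_s0(dhss, responder, ids_in_dhpart2, b"Initiator")
```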

5.1.3.2.  Initiator Behavior

   Upon receipt of the DHPart1 message, the initiator checks that the
   responder's public DH value is not equal to 1 or p-1.  An attacker
   might inject a false DHPart1 packet with a value of 1 or p-1 for
   g^svr mod p, which would cause a disastrously weak final DH result to
   be computed.  If pvr is 1 or p-1, the user should be alerted of the
   attack and the protocol exchange must be terminated.

   If pvr is not 1 or p-1, the initiator looks up any retained shared
   secrets associated with the responder's ZID.  The final shared
   secret, s0, is calculated by hashing the concatenation of the DHSS
   followed by the (possibly empty) set of shared secrets that are
   actually shared between the initiator and responder.  For computing
   the hash, the shared secrets are sorted by the order of the
   initiator's corresponding shared secret IDs.  The remainder of this
   section describes an algorithm to accomplish this.

   First, an HMAC keyed hash is calculated using the first retained
   shared secret, rs1, as the key on the string "Initiator" which
   generates a retained secret ID, rs1IDi, which is truncated to 64
   bits.  HMACs are calculated in a similar way for additional shared
   secrets:

   rs1IDi = HMAC(rs1, "Initiator")

   rs2IDi = HMAC(rs2, "Initiator")

   sigsIDi = HMAC(sigs, "Initiator")

   srtpsIDi = HMAC(srtps, "Initiator")

   other_secretIDi = HMAC(other_secret, "Initiator")

   The initiator then sends a DHPart2 message containing the initiator's
   public DH value and the set of calculated retained secret IDs.

   The initiator calculates the same Diffie-Hellman result using:

   DHResult = pvr^svi mod p

   The initiator then calculates the DH shared secret using:

   DHSS = hash(DHResult)
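
   A minimal sketch of the public value check and the DHSS calculation
   follows.  The function names, the use of SHA-256, and the
   big-endian serialization of DHResult are assumptions, and the demo
   uses a tiny textbook group (p = 23, g = 5) rather than a ZRTP group.

```python
import hashlib


def check_public_value(pv: int, p: int) -> None:
    """Reject the degenerate public values 1 and p-1."""
    if pv == 1 or pv == p - 1:
        raise ValueError("invalid DH public value: possible attack")


def dh_shared_secret(peer_pv: int, own_sv: int, p: int) -> bytes:
    """DHSS = hash(DHResult), with DHResult = peer_pv^own_sv mod p."""
    check_public_value(peer_pv, p)
    dh_result = pow(peer_pv, own_sv, p)
    # ASSUMPTION: DHResult is serialized big-endian at the width of p.
    width = (p.bit_length() + 7) // 8
    return hashlib.sha256(dh_result.to_bytes(width, "big")).digest()


# Toy textbook parameters for illustration only; NOT a ZRTP group.
p, g = 23, 5
svi, svr = 6, 15
pvi, pvr = pow(g, svi, p), pow(g, svr, p)

# Initiator uses pvr^svi, responder uses pvi^svr; both agree.
dhss_initiator = dh_shared_secret(pvr, svi, p)
dhss_responder = dh_shared_secret(pvi, svr, p)
```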

   The initiator then calculates the set of secret IDs that are expected
   to be received from the responder in the DHPart1 message:

   rs1IDr = HMAC(rs1, "Responder")

   rs2IDr = HMAC(rs2, "Responder")

   sigsIDr = HMAC(sigs, "Responder")

   srtpsIDr = HMAC(srtps, "Responder")

   other_secretIDr = HMAC(other_secret, "Responder")

   The hmacs of the possible shared secrets received are compared
   against the hmacs of the local set of possible shared secrets.

   Note: When comparing the signaling secret sigs derived from SIP, both
   orderings of to-tag followed by from-tag, and from-tag followed by
   to-tag must be tried.

   The expected hmac values of the shared secrets are calculated (using
   the string "Responder" instead of "Initiator") and compared to the
   hmacs received in the DHPart1 message.  The secrets corresponding to
   matching hmacs are kept while the secrets corresponding to the non-
   matching ones are replaced with a null.  The set of up to five actual
   shared secrets are then s1, s2, s3, s4, and s5 - the order is that
   chosen by the initiator.  The final shared secret, s0, is calculated
   by hashing the concatenation of the DHSS and the set of non-null
   shared secrets.  As a result, the null secrets have no effect on the
   concatenation operation:

   s0 = hash(DHSS | s1 | s2 | s3 | s4 | s5)

5.1.4.  Confirmation and Switch to SRTP

   The SRTP master key and master salt are then generated using the
   shared secret.  Separate SRTP keys and salts are used in each
   direction for each media stream.  Unless otherwise specified, ZRTP
   uses SRTP with no MKI, 32 bit authentication using HMAC-SHA1, AES-CM
   128 or 256 bit key length, 112 bit session salt key length, 2^48 key
   derivation rate, and SRTP prefix length 0.

   The ZRTP initiator encrypts and the ZRTP responder decrypts packets
   by using srtpkeyi and srtpsalti, which are generated by:

   srtpkeyi = HMAC(s0,"Initiator SRTP master key")

   srtpsalti = HMAC(s0,"Initiator SRTP master salt")

   The key and salt values are truncated to the length determined by the
   chosen SRTP algorithm.  The ZRTP responder encrypts and the ZRTP
   initiator decrypts packets by using srtpkeyr and srtpsaltr, which are
   generated by:

   srtpkeyr = HMAC(s0,"Responder SRTP master key")

   srtpsaltr = HMAC(s0,"Responder SRTP master salt")

   A ZRTP Session Key is generated which then allows the ZRTP
   Multistream mode to be used to generate SRTP key and salt pairs for
   additional concurrent media streams between this pair of ZRTP
   endpoints.  If a ZRTP Session Key has already been generated between
   this pair of endpoints, no new ZRTP Session Key is calculated.

   ZRTPsess = HMAC(s0,"ZRTP Session Key")

   The ZRTPsess key is kept for the duration of the call signaling
   session between the two ZRTP endpoints.  That is, if there are two
   separate calls between the endpoints (in SIP terms, separate SIP
   dialogs), then a ZRTP Session Key MUST NOT be used across the two
   call signaling sessions.  At the end of the call signaling session,
   ZRTPsess is destroyed.

   The HMAC keys are generated by:

   hmackeyi = HMAC(s0,"Initiator HMAC key")

   hmackeyr = HMAC(s0,"Responder HMAC key")

   Note that these HMAC keys are used only by ZRTP and not by SRTP.  A
   new rs1 is calculated from s0:

   rs1 = HMAC(s0,"retained secret")

   The endpoints can now switch to SRTP and begin packet encryption.
   The ZRTP Initiator and Responder use their own keying material for
   the SRTP session.  No MKI is used and a 32 bit authentication tag is
   used.
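
   The derivations in this section can be collected into one sketch.
   The helper names and the use of HMAC-SHA256 are assumptions; the
   default lengths correspond to AES-CM with a 128 bit key and a 112
   bit salt, and the label strings are the ones given above.

```python
import hashlib
import hmac


def kdf(s0: bytes, label: str, length: int = 32) -> bytes:
    """HMAC(s0, label), truncated to `length` octets."""
    mac = hmac.new(s0, label.encode("ascii"), hashlib.sha256)
    return mac.digest()[:length]


def derive_keys(s0: bytes, key_len: int = 16, salt_len: int = 14) -> dict:
    """Derive all per-exchange keying material from s0."""
    return {
        "srtpkeyi": kdf(s0, "Initiator SRTP master key", key_len),
        "srtpsalti": kdf(s0, "Initiator SRTP master salt", salt_len),
        "srtpkeyr": kdf(s0, "Responder SRTP master key", key_len),
        "srtpsaltr": kdf(s0, "Responder SRTP master salt", salt_len),
        "hmackeyi": kdf(s0, "Initiator HMAC key"),
        "hmackeyr": kdf(s0, "Responder HMAC key"),
        "ZRTPsess": kdf(s0, "ZRTP Session Key"),
        "rs1": kdf(s0, "retained secret"),
    }


keys = derive_keys(b"\x00" * 32)  # placeholder s0 for illustration
```

   Each direction gets its own key and salt because the label strings
   differ, even though both sides start from the same s0.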

   The ZRTP Confirm1 and Confirm2 messages are sent for two reasons.
   First, they confirm that all the key agreement calculations were
   successful and the encryption is working, and they enable automatic
   detection of a DH MitM attack from a reckless attacker who does not
   know the retained shared secret.  Second, they enable us to transmit
   the SAS Verified flag (V) under cover of SRTP encryption, shielding
   it from a passive observer who would like to know if the human users
   are in the habit of diligently verifying the SAS.

   The Confirm1 and Confirm2 messages contain the cache expiration
   interval for the newly generated retained shared secret.  Based on
   this, both sides now discard the rs2 value and store rs1 as rs2.  The
   Confirm1 and Confirm2 messages also contain an HMAC of some known
   plaintext and the flagoctet.  The flagoctet is an 8 bit unsigned
   integer made up of the Disclosure flag (D), Stay secure flag (S), SAS
   Verified flag (V):

   flagoctet = D * 2^2 + S * 2^1 + V * 2^0

   The HMAC is explicitly included in the payload because we may not
   always be able to rely on the built-in authentication tag in SRTP,
   which might be configured to different sizes, including none.

   hmac = HMAC(hmackey, "known plaintext" | flagoctet)

   This information is not carried in the extension header but inserted
   at the start of the SRTP payload.
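
   The flagoctet and the Confirm HMAC can be sketched as follows.  The
   function names and the use of HMAC-SHA256 are assumptions, and the
   known plaintext is passed in as a parameter because its actual
   value is not given in this section.

```python
import hashlib
import hmac


def flag_octet(disclosure: bool, stay_secure: bool, verified: bool) -> int:
    """flagoctet = D * 2^2 + S * 2^1 + V * 2^0."""
    return (int(disclosure) << 2) | (int(stay_secure) << 1) | int(verified)


def confirm_hmac(hmackey: bytes, plaintext: bytes, flagoctet: int) -> bytes:
    """HMAC over the known plaintext followed by the flag octet."""
    data = plaintext + bytes([flagoctet])
    return hmac.new(hmackey, data, hashlib.sha256).digest()


def confirm_ok(hmackey, plaintext, flagoctet, received: bytes) -> bool:
    """Constant-time check of a received Confirm HMAC."""
    expected = confirm_hmac(hmackey, plaintext, flagoctet)
    return hmac.compare_digest(expected, received)
```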

   The Conf2ACK message completes the exchange.

5.2.  Multistream Mode

   The Multistream key agreement mode can be used to generate SRTP keys
   and salts for additional media streams established between a pair of
   endpoints.  Multistream mode cannot be used unless there is an active
   SRTP session established between the endpoints which means a ZRTP
   Session key is active.  This ZRTP Session key can be used to generate
   keys and salts without performing another DH calculation.  In this
   mode, the retained shared secret cache is not used or updated.  As a
   result, multiple ZRTP Multistream mode exchanges can be processed in
   parallel between two endpoints.

   This mode is selected by setting the Key Agreement Type to "Multistr"
   in the Commit message.  The Cipher Type and Auth Tag Length in
   Multistream mode MUST be the same as the values in the initial DH
   Mode Commit and MUST be ignored if different, making a bid-down
   attack impossible.  The SAS Type is ignored as there is no SAS
   authentication in this mode.  In place of hvi in the Commit, a
   random number, nonce, 32 octets long is chosen.  Its value MUST be
   unique for all nonce values chosen for active ZRTP sessions between a
   pair of endpoints.  If a Commit is received with a reused nonce
   value, the ZRTP exchange MUST be immediately terminated.

   Note: Since the nonce is used to calculate different SRTP key and
   salt pairs for each media stream, a duplication will result in the
   same key and salt being generated for the two media streams.

   If a Commit is received selecting Multistream mode, but the responder
   does not have a ZRTP Session Key available, the exchange MUST be
   terminated.

   In Multistream mode, the DHPart1 and DHPart2 messages are not
   sent.  After the Commit, SRTP begins and the responder sends the
   Confirm1 message.  The SRTP key and salt for the initiator and
   responder are calculated using the ZRTP Session Key and the nonce
   from the Commit message.  For the nth media stream:

   s0n = HMAC(ZRTPsess, nonce)

   The ZRTP initiator encrypts and the ZRTP responder decrypts packets
   for this nth session by using srtpkeyin and srtpsaltin, which are
   generated by:

   srtpkeyin = HMAC(s0n,"Initiator SRTP master key")

   srtpsaltin = HMAC(s0n,"Initiator SRTP master salt")

   The key and salt values are truncated to the length determined by the
   chosen SRTP algorithm.  The ZRTP responder encrypts and the ZRTP
   initiator decrypts packets for this nth stream by using srtpkeyrn and
   srtpsaltrn, which are generated by:

   srtpkeyrn = HMAC(s0n,"Responder SRTP master key")

   srtpsaltrn = HMAC(s0n,"Responder SRTP master salt")

   The HMAC keys are generated by:

   hmackeyin = HMAC(s0n,"Initiator HMAC key")

   hmackeyrn = HMAC(s0n,"Responder HMAC key")
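
   A sketch of the Multistream derivation, including the nonce reuse
   check, is given below.  The function names, the set used to track
   nonces, and the use of HMAC-SHA256 are assumptions for illustration.

```python
import hashlib
import hmac
import secrets


def new_nonce() -> bytes:
    """Fresh 32-octet nonce for a Multistream Commit."""
    return secrets.token_bytes(32)


def stream_s0(zrtp_sess: bytes, nonce: bytes, seen_nonces: set) -> bytes:
    """s0n = HMAC(ZRTPsess, nonce); a reused nonce ends the exchange."""
    if len(nonce) != 32:
        raise ValueError("nonce must be 32 octets")
    if nonce in seen_nonces:
        raise ValueError("Nonce Reuse")
    seen_nonces.add(nonce)
    return hmac.new(zrtp_sess, nonce, hashlib.sha256).digest()


def stream_keys(s0n: bytes, key_len: int = 16, salt_len: int = 14) -> dict:
    """Per-stream keys, using the same labels as in DH mode."""
    def prf(label: bytes, length: int) -> bytes:
        return hmac.new(s0n, label, hashlib.sha256).digest()[:length]

    return {
        "srtpkeyin": prf(b"Initiator SRTP master key", key_len),
        "srtpsaltin": prf(b"Initiator SRTP master salt", salt_len),
        "srtpkeyrn": prf(b"Responder SRTP master key", key_len),
        "srtpsaltrn": prf(b"Responder SRTP master salt", salt_len),
        "hmackeyin": prf(b"Initiator HMAC key", 32),
        "hmackeyrn": prf(b"Responder HMAC key", 32),
    }
```

   Two streams with different nonces obtain unrelated keys from the
   same ZRTP Session Key, without a second DH calculation.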

5.3.  Random Number Generation

   The ZRTP protocol uses random numbers for cryptographic key material,
   notably for the DH secret exponents and nonces, which must be freshly
   generated with each session.  Whenever a random number is needed, all
   of the following criteria must be satisfied:

   It MUST be derived from a physical entropy source, such as RF noise,
   acoustic noise, thermal noise, high resolution timings of
   environmental events, or other unpredictable physical sources of
   entropy.  Chapter 10 of [8] gives a detailed explanation of
   cryptographic grade random numbers and provides guidance for
   collecting suitable entropy.  The raw entropy must be distilled and
   processed through a deterministic random bit generator (DRBG).
   Examples of DRBGs may be found in NIST SP 800-90 [9], and in [8].

   It MUST be freshly generated, meaning that it must not have been used
   in a previous calculation.

   It MUST be greater than or equal to two, and less than or equal to
   2^L - 1, where L is the number of random bits required.

   It MUST be chosen with equal probability from the entire available
   number space, e.g., [2, 2^L - 1].
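
   The range and uniformity criteria can be illustrated with a short
   sketch.  It assumes that the operating system generator exposed by
   Python's "secrets" module stands in for the entropy source and
   DRBG described above; the function name is hypothetical.

```python
import secrets


def random_value(bits: int) -> int:
    """Return a value chosen uniformly from [2, 2^bits - 1]."""
    if bits < 2:
        raise ValueError("need at least 2 bits")
    # randbelow(n) is uniform over [0, n - 1]; shifting by 2 gives
    # a uniform value over [2, 2^bits - 1].
    return 2 + secrets.randbelow(2**bits - 2)


svi = random_value(256)  # e.g. a DH secret value for AES 128
```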




5.4.  CRC Protection of Messages

   The ZRTP protocol uses a 32 bit CRC checksum in each ZRTP message as
   defined in RFC 3309 [6] to detect transmission errors.  ZRTP packets
   are carried by UDP, which carries its own built-in 16-bit checksum
   for integrity, but ZRTP does not rely on it.  This is because of the
   effect of an undetected transmission error in a ZRTP message.  For
   example, an undetected error in the DH exchange could appear to be an
   active man-in-the-middle attack.  The psychological effects of a
   false announcement of this by ZRTP clients cannot be overstated.
   The probability of such a false alarm hinges on a mere 16-bit
   checksum that usually protects UDP packets, so more error detection
   is needed.  For these reasons, this belt-and-suspenders approach is
   used to minimize the chance of a transmission error affecting the
   ZRTP key agreement.

   The CRC is calculated across the ZRTP message only, including the RTP
   Header extension (0x505A) and length field, followed by the ZRTP
   message itself, but not including the CRC field.  The CRC does not
   include the normal RTP header (V, P, X, CC, M, PT, sequence number,
   timestamp, SSRC, CSRC) or payload.  In the Confirm1 and Confirm2
   messages, the CRC does not include the fields transported in the
   payload (plaintext, flags, hmac).  If a ZRTP message fails the CRC
   check, it is silently discarded.
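
   RFC 3309 specifies the CRC-32c (Castagnoli) checksum.  A bitwise
   sketch is shown below; it is written for clarity rather than speed,
   and it does not show how the CRC is laid out in the packet, which
   this section does not describe.  The function names are
   hypothetical.

```python
def crc32c(data: bytes) -> int:
    """CRC-32c over `data` (reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; apply the polynomial if the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def message_ok(covered: bytes, received_crc: int) -> bool:
    """True if the CRC matches; otherwise the message is dropped."""
    return crc32c(covered) == received_crc


# Standard check value for CRC-32c.
assert crc32c(b"123456789") == 0xE3069283
```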

5.5.  ZID and Cache Operation

   Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID that
   is generated once at installation time.  It is used to look up
   retained shared secrets in a local cache.  A single global ZID for a
   single installation is the simplest way to implement ZIDs.  However,
   it is specifically not precluded for an implementation to use
   multiple ZIDs, up to the limit of a separate one per callee.  This
   then turns it into a long-lived "association ID" that does not apply
   to any other associations between a different pair of parties.  It is
   a goal of this protocol to permit both options to interoperate
   freely.

   Each time a new s0 is calculated, a new retained shared secret rs1 is
   generated and stored in the cache, indexed by the ZID of the other
   endpoint.  The previous retained shared secret is then renamed rs2
   and also stored in the cache.  For the new retained shared secret,
   each endpoint chooses a cache expiration value which is an unsigned
   32 bit integer of the number of seconds that this secret should be
   retained in the cache.  The time interval is relative to when the
   Confirm1 message is sent or received.

   Note: The storage of two retained shared secrets ensures that even
   when a Commit is sent close to the expiration time of a retained
   shared secret, there is a high probability of the endpoints having at
   least one retained shared secret.  The exception to this is if both
   retained shared secrets have identical or near identical expiration
   times.

   The cache intervals are exchanged in the Confirm1 and Confirm2
   messages.  The actual cache interval used by both endpoints is the
   minimum of the values from the Confirm1 and Confirm2 messages.  A
   value of 0 seconds means the secret should not be cached and the
   current values of rs1 and rs2 MUST be maintained.  A value of
   0xFFFFFFFF means the secret should be cached indefinitely and is the
   recommended value.  If the ZRTP exchange results in no new shared
   secret generation (i.e., Multistream mode), the field in the Confirm1
   and Confirm2 is set to 0xFFFFFFFF and ignored.

   Retained shared secret expiration times are checked at the time of
   their inclusion in a DHPart1 or DHPart2 message.  Expired values are
   not included and are dropped from the cache.
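
   The cache rules above can be sketched as follows.  The entry layout
   (a dictionary holding (secret, expiry) pairs), the function names,
   and the use of None for "never expires" are assumptions made for
   the example.

```python
NO_CACHE = 0
FOREVER = 0xFFFFFFFF


def update_cache(entry, new_rs1, confirm1_interval, confirm2_interval, now):
    """Rotate rs1 into rs2 and store the new rs1 for one peer ZID."""
    interval = min(confirm1_interval, confirm2_interval)
    if interval == NO_CACHE:
        return entry  # current rs1 and rs2 are maintained
    expires = None if interval == FOREVER else now + interval
    return {"rs1": (new_rs1, expires), "rs2": entry.get("rs1")}


def live_secrets(entry, now):
    """Secrets still valid for inclusion in DHPart1 or DHPart2."""
    live = {}
    for name, item in entry.items():
        if item is not None and (item[1] is None or item[1] > now):
            live[name] = item
    return live


old = {"rs1": (b"old", None), "rs2": None}
new = update_cache(old, b"new", 3600, FOREVER, now=1000.0)
```

   In this example the effective interval is 3600 seconds, the minimum
   of the two advertised values, so the new rs1 expires at time 4600.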

5.6.  Terminating an SRTP Session or ZRTP Exchange

   The GoClear message is used to switch from SRTP to RTP or to
   terminate an in-progress ZRTP exchange.  The GoClear message contains
   a reason string for human purposes and a clear_hmac field.

   When used to switch from SRTP to RTP, ZRTP avoids relying on the
   optional SRTP authentication tag by using an HMAC of the string
   "GoClear" computed with the hmackey derived from the shared secret:

   clear_hmac = HMAC(hmackey, "GoClear")

   A GoClear message which does not receive a ClearACK response
   indicates that the GoClear has failed authentication (the clear_hmac
   does not validate) and that the session must stay in secure mode.
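
   A minimal sketch of the clear_hmac calculation and check follows.
   The function names and the use of an untruncated HMAC-SHA256 are
   assumptions; the all-zero clear_hmac used when terminating an
   in-progress exchange is not covered here.

```python
import hashlib
import hmac


def clear_hmac(hmackey: bytes) -> bytes:
    """clear_hmac = HMAC(hmackey, "GoClear")."""
    return hmac.new(hmackey, b"GoClear", hashlib.sha256).digest()


def goclear_authentic(hmackey: bytes, received: bytes) -> bool:
    """True if a ClearACK may be sent; False keeps the session secure."""
    return hmac.compare_digest(clear_hmac(hmackey), received)
```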

   When terminating an in-progress ZRTP exchange, no secret hmackey is
   available, so the clear_hmac field is set to all zeros and ignored.
   The reason string SHOULD indicate the reason for the failure (e.g.
   "No Session Key", "Nonce Reuse", "Invalid DH Value").  The
   termination of a ZRTP key agreement exchange results in no updates to
   the cached shared secrets and deletion of all crypto context.

   A ZRTP endpoint that receives a GoClear authenticates the message by
   checking the clear_hmac.  If the message authenticates, the endpoint
   stops sending SRTP packets, generates a ClearACK in response, and
   deletes the crypto context for the SRTP session.  Until confirmation
   from the user is received (e.g. clicking a button, pressing a DTMF
   key, etc.), the ZRTP endpoint MUST NOT resume sending RTP packets.
   The endpoint then renders the reason string and an indication that
   the media session has switched to clear mode to the user and waits
   for confirmation from the user.  To prevent pinholes from closing or
   NAT bindings from expiring, the ClearACK message MAY be resent at
   regular intervals (e.g. every 5 seconds) while waiting for
   confirmation from the user.  After confirmation of the notification
   is received from the user, the sending of RTP packets may begin.

   After sending a GoClear message, the ZRTP endpoint stops sending SRTP
   packets.  When a ClearACK is received, the ZRTP endpoint deletes the
   crypto context for the SRTP session and may then resume sending RTP
   packets.  However, the ZRTP Session key is not deleted unless the
   signaling session is terminated as well.

   A ZRTP endpoint MAY choose not to accept GoClear messages after the
   session has switched to SRTP.  This is indicated in the Confirm1 or
   Confirm2 messages by setting the Stay secure flag (S).


6.  RTP Header Extension

   This specification defines a new RTP header extension used for all
   ZRTP messages.  When used, the X bit is set in the RTP header to
   indicate the presence of the RTP header extension.

   Section 5.3.1 in RFC 3550 defines the format of an RTP Header
   extension.  The Header extension is appended to the RTP header.  The
   first 16 bits are an identifier for the header extension, and the
   following 16 bits are the length of the extension header in 32 bit
   words.  All word lengths referenced in this specification follow RFC
   3550 and are 32 bits or 4 octets.  All integer fields are carried in
   network byte order, that is, most significant byte (octet) first,
   commonly known as big-endian.  Each ZRTP message is carried in a
   single RTP header extension whose identifier has the value 0x505A.
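
   The extension framing can be sketched with Python's "struct"
   module.  The function names are hypothetical, and the sketch
   assumes the RFC 3550 convention that the length field counts the
   32 bit words that follow the 4-octet identifier and length.

```python
import struct

ZRTP_EXT_ID = 0x505A


def pack_extension(body: bytes) -> bytes:
    """Prefix `body` with the 16 bit identifier and word count."""
    if len(body) % 4:
        raise ValueError("body must be a whole number of 32 bit words")
    # "!HH": two unsigned 16 bit integers in network byte order.
    return struct.pack("!HH", ZRTP_EXT_ID, len(body) // 4) + body


def unpack_extension(data: bytes) -> bytes:
    """Return the extension body after checking identifier and length."""
    ext_id, words = struct.unpack_from("!HH", data)
    if ext_id != ZRTP_EXT_ID:
        raise ValueError("not a ZRTP header extension")
    body = data[4:4 + 4 * words]
    if len(body) != 4 * words:
        raise ValueError("truncated header extension")
    return body
```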

6.1.  ZRTP Message Formats

   ZRTP messages are designed to simplify endpoint parsing requirements
   and to reduce the opportunities for buffer overflow attacks (a good
   goal of any security extension should be to not introduce new attack
   vectors).

   ZRTP uses 8 octets (2 words) to encode many ZRTP parameters.  These
   fixed-length blocks are used for Message Type, Hash Type, Cipher
   Type, and Key Agreement Type.  For the Authentication Tag Length, 4
   octets are used.  The values in the blocks are ASCII strings which
   are extended with spaces (0x20) to make them 8 characters long.

   Currently defined block values are listed in Tables 1-6 below.
   Additional block values may be defined and used.
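
   The fixed-length block encoding can be sketched as follows.  The
   function names are hypothetical; the block sizes (8 octets, or 4
   octets for the Auth Tag Length) and the space padding are taken
   from the text above.

```python
def encode_block(value: str, size: int = 8) -> bytes:
    """Encode an ASCII value, padded with spaces (0x20) to `size`."""
    raw = value.encode("ascii")
    if len(raw) > size:
        raise ValueError("value does not fit in the block")
    return raw.ljust(size, b" ")


def decode_block(block: bytes) -> str:
    """Decode a block, dropping the trailing space padding."""
    return block.decode("ascii").rstrip(" ")


hello = encode_block("Hello")        # b"Hello   "
auth_tag = encode_block("32", 4)     # b"32  "
```

   Fixed-length blocks let a parser read each field at a known offset,
   without any length-prefixed or delimiter-based scanning.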

   ZRTP uses this ASCII encoding to simplify debugging and make it
   "ethereal friendly".

6.1.1.  Message Type Block

   Currently ten Message Type Blocks are defined - they represent the
   set of ZRTP message primitives.  ZRTP endpoints MUST support the
   Hello, HelloACK, Commit, DHPart1, DHPart2, Confirm1, Confirm2,
   Conf2ACK, GoClear and ClearACK block types.


    Message Type Block   |  Meaning
    ---------------------------------------------------
    "Hello   "           |  Hello Message
                         |  defined in Section 6.2
    ---------------------------------------------------
    "HelloACK"           |  HelloACK Message
                         |  defined in Section 6.3
    ---------------------------------------------------
    "Commit  "           |  Commit Message
                         |  defined in Section 6.4
    ---------------------------------------------------
    "DHPart1 "           |  DHPart1 Message
                         |  defined in Section 6.5
    ---------------------------------------------------
    "DHPart2 "           |  DHPart2 Message
                         |  defined in Section 6.6
    ---------------------------------------------------
    "Confirm1"           |  Confirm1 Message
                         |  defined in Section 6.7
    ---------------------------------------------------
    "Confirm2"           |  Confirm2 Message
                         |  defined in Section 6.8
    ---------------------------------------------------
    "Conf2ACK"           |  Conf2ACK Message
                         |  defined in Section 6.9
    ---------------------------------------------------
    "GoClear "           |  GoClear Message
                         |  defined in Section 6.10
    ---------------------------------------------------
    "ClearACK"           |  ClearACK Message
                         |  defined in Section 6.11
    ---------------------------------------------------

    Table 1. Message Block Type Values

6.1.2.  Hash Type Block

   Only one Hash Type is currently defined, SHA256, and all ZRTP
   endpoints MUST support this hash.  Additional Hash Types can be
   registered and used.


    Hash Type Block      |  Meaning
    ---------------------------------------------------
    "SHA256  "           |  SHA-256 Hash defined in [SHA-256]
    ---------------------------------------------------

    Table 2. Hash Block Type Values


6.1.3.  Cipher Type Block

   All ZRTP endpoints MUST support AES128 and MAY support AES256 [4] or
   other Cipher Types.  Also, if AES 128 is used, DH3072 should be used.
   If AES 256 is used, DH4096 should be used.


     Cipher Type Block    |  Meaning
    ---------------------------------------------------
    "AES128  "            |  AES-CM with 128 bit keys
                          |  as defined in RFC 3711
    ---------------------------------------------------
    "AES256  "            |  AES-CM with 256 bit keys
                          |  as defined in RFC 3711
    ---------------------------------------------------

    Table 3. Cipher Block Type Values

6.1.4.  Auth Tag Length Block

   The Auth Tag Length Block is 4 octets (1 word) long.  All ZRTP
   endpoints MUST support 32 bit and 80 bit authentication tags as
   defined in RFC 3711.


    Auth Tag Length Block |  Meaning
    ---------------------------------------------------
    "32  "                |  32 bit authentication tag
                          |  as defined in RFC 3711
    ---------------------------------------------------
    "80  "                |  80 bit authentication tag
                          |  as defined in RFC 3711
    ---------------------------------------------------

    Table 4. Auth Tag Length Values

6.1.5.  Key Agreement Type Block

   All ZRTP endpoints MUST support DH3072 and MAY support DH4096.  ZRTP
   endpoints MUST use the DH generator function g=2.  The choice of AES
   key length is coupled to the choice of key agreement type.  If AES
   128 is chosen, DH3072 SHOULD be used.  If AES 256 is chosen, DH4096
   SHOULD be used.  ZRTP also defines a non-DH mode, Multistream, which
   MUST be supported.  In Multistream mode, the SRTP key is derived from
   a ZRTP Session key and a nonce.


     Key Agreement Type Block | Meaning
    ---------------------------------------------------
    "DH3072  "                |  DH mode with p=3072 bit prime
                              |  as defined in RFC 3526
    ---------------------------------------------------
    "DH4096  "                |  DH mode with p=4096 bit prime
                              |  as defined in RFC 3526
    ---------------------------------------------------
    "Multistr"                |  Multistream Non-DH mode
                              |  uses ZRTP Session key
    ---------------------------------------------------

    Table 5. Key Agreement Block Type Values

6.1.6.  SAS Type Block

   All ZRTP endpoints SHOULD support the base32 and base256 Short
   Authentication String schemes or other SAS schemes.  The optional
   ZRTP SAS is described in Section 8.


     SAS Type Block       |  Meaning
    ---------------------------------------------------
    "base32  "            |  Short Authentication String using
                          |  base32 encoding defined in Section 8.
    ---------------------------------------------------
    "base256 "            |  Short Authentication String using
                          |  base 256 encoding defined in Section 8.
    ---------------------------------------------------

    Table 6. SAS Block Type Values







6.2.  Hello message

   The Hello message has the format shown in Figure 2 below.  The header
   extension payload contains the ZRTP version number and the lists of
   algorithms supported by the endpoint.

   The Hello ZRTP message begins with the ZRTP header extension field
   followed by the length of the header extension in 32 bit words.  Next
   is a word containing the version (ver) of ZRTP.  For this
   specification, the version is the string "0.03".  Next is the Client
   Identifier string (cid) which is 31 octets long and identifies the
   vendor and release of the ZRTP software.  The Passive bit (P) is a
   Boolean normally set to False.  A ZRTP endpoint which is configured
   to never initiate secure sessions is regarded as passive, and would
   set the P bit to True.  Next are the lists of supported Hash Types,
   Cipher Types, Auth Tag Lengths, Key Agreement Types, and SAS Types.
   Up to five algorithms are listed for each using the Blocks defined in
   Tables 2, 3, 4, 5, and 6.  If fewer than five algorithms are
   supported, spaces (0x20) are used to pad out each list to its full
   size: 5 words for the Auth Tag Lengths and 10 words for each of the
   other types.  The last parameter is the ZID, the 96 bit long unique
   identifier for the ZRTP endpoint.


        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=60 words        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Message Type Block="Hello   " (2 words)            |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                        version (1 word)                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                 Client Identifier (31 octets)                 |
       |                              . . .                            |
       |                                               +-+-+-+-+-+-+-+-+
       |                                               |0 0 0 0 0 0 0|P|
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                 Hash Type Blocks 1-5 (10 words)               |
       |                              . . .                            |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                Cipher Type Blocks 1-5 (10 words)              |
       |                              . . .                            |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |             Auth Tag Length Blocks 1-5 (5 words)              |
       |                              . . .                            |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |             Key Agreement Type Blocks 1-5 (10 words)          |
       |                              . . .                            |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                  SAS Type Blocks 1-5 (10 words)               |
       |                              . . .                            |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                         ZID  (3 words)                        |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                          CRC (1 word)                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2. Extension header format for Hello message
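
   The padding rule and the first word of Figure 2 can be sketched in
   Python as follows.  This is a minimal illustration, not a complete
   Hello encoder: the helper names are hypothetical, network byte order
   is assumed for the first word, and the CRC is not computed.

```python
import struct

ZRTP_MAGIC = 0x505A          # bit pattern 0101 0000 0101 1010 in Figure 2
HELLO_LENGTH_WORDS = 60      # length field value shown in Figure 2
SLOTS = 5                    # each algorithm list has five slots


def pad_algorithm_list(blocks, block_octets):
    """Concatenate up to five blocks and fill unused slots with spaces."""
    if len(blocks) > SLOTS:
        raise ValueError("at most five algorithms can be listed")
    for block in blocks:
        if len(block) != block_octets:
            raise ValueError("block %r is not %d octets" % (block, block_octets))
    padding = b" " * (block_octets * (SLOTS - len(blocks)))
    return b"".join(blocks) + padding


def hello_first_word():
    """Pack the first 32 bit word: 16 bit pattern, then 16 bit length."""
    return struct.pack("!HH", ZRTP_MAGIC, HELLO_LENGTH_WORDS)


# Key Agreement Type Blocks are 2 words (8 octets) each: 10 words total.
key_agreement_list = pad_algorithm_list([b"DH3072  ", b"Multistr"], 8)
# Auth Tag Length Blocks are 1 word (4 octets) each: 5 words total.
auth_tag_list = pad_algorithm_list([b"32  ", b"80  "], 4)
```

   The length of 60 words matches the sum of the fields in Figure 2:
   2 + 1 + 8 + 10 + 10 + 5 + 10 + 10 + 3 + 1.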


6.3.  HelloACK message

   The HelloACK message is used to stop retransmissions of a Hello
   message.  A HelloACK is sent regardless of whether the version number
   or the algorithm list in the Hello is supported.  The format is shown
   in Figure 3 below.  Note that a Commit message can be sent in place
   of a HelloACK by an initiator.


        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|         length=3 words        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Message Type Block="HelloACK" (2 words)          |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                          CRC (1 word)                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+




     Figure 3. Extension header format for HelloACK message
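
   All of the messages in this section share the same outer layout: the
   16 bit pattern, a 16 bit length, the Message Type Block, optional
   parameters, and a CRC word.  The Python sketch below splits a header
   extension along those lines.  The function name is hypothetical,
   network byte order is assumed, and the CRC is returned without being
   checked.  The length is taken to count the words that follow the
   first word, which is consistent with the totals in Figures 2 and 3.

```python
import struct

ZRTP_MAGIC = 0x505A  # bit pattern 0101 0000 0101 1010 from the figures


def parse_extension_header(data):
    """Split a ZRTP header extension into (message_type, body, crc).

    `data` starts at the first word of the header extension.
    """
    if len(data) < 4:
        raise ValueError("truncated header extension")
    magic, length_words = struct.unpack("!HH", data[:4])
    if magic != ZRTP_MAGIC:
        raise ValueError("not a ZRTP header extension")
    payload = data[4:4 + 4 * length_words]
    # The shortest message is a Message Type Block plus the CRC (3 words).
    if length_words < 3 or len(payload) != 4 * length_words:
        raise ValueError("truncated or malformed ZRTP message")
    message_type = payload[:8].decode("ascii")
    body = payload[8:-4]   # message-specific parameters, may be empty
    crc = payload[-4:]     # CRC word, not verified in this sketch
    return message_type, body, crc


# A HelloACK carries no parameters; the CRC here is a dummy value.
hello_ack = struct.pack("!HH", ZRTP_MAGIC, 3) + b"HelloACK" + b"\x00" * 4
```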


6.4.  Commit message

   The Commit message is sent to initiate the key agreement process
   after receiving a Hello message.  The format is shown in Figure 4
   below.  The Commit message contains the initiator's ZID, the selected
   algorithms (hash, cipher, atl, pkt, sas), the ZRTP mode, and hvi, a
   hash of the public DH value of the initiator and the algorithm list
   from the responder's Hello message.  If a non-DH mode is used, hvi is
   replaced by a random number, nonce.


        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=23 words        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Message Type Block="Commit  " (2 words)          |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                         ZID  (3 words)                        |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                    Hash Type Block (2 words)                  |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                   Cipher Type Block (2 words)                 |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                 Auth Tag Length Block (1 word)                |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                Key Agreement Type Block (2 words)             |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                    SAS Type Block (2 words)                   |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                        hvi or nonce (8 words)                 |
       |                               . . .                           |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                          CRC (1 word)                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+




    Figure 4. Extension header format for Commit message
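
   Because every field in the Commit message has a fixed size, the 23
   words that follow the first word can be unpacked with a single fixed
   layout.  The Python sketch below mirrors the field order of Figure 4.
   The type and field names are hypothetical, and neither the CRC nor
   the hvi is verified.

```python
import struct
from collections import namedtuple

# Field order and sizes follow Figure 4: Message Type Block (2 words),
# ZID (3), hash (2), cipher (2), auth tag length (1), key agreement (2),
# SAS (2), hvi or nonce (8), CRC (1), for a total of 23 words.
COMMIT_LAYOUT = struct.Struct("!8s12s8s8s4s8s8s32s4s")

Commit = namedtuple(
    "Commit",
    "message_type zid hash_type cipher_type auth_tag_length "
    "key_agreement_type sas_type hvi_or_nonce crc",
)


def unpack_commit(payload):
    """Unpack the 23 words that follow the first word of a Commit."""
    if len(payload) != COMMIT_LAYOUT.size:
        raise ValueError("Commit must be %d octets" % COMMIT_LAYOUT.size)
    commit = Commit._make(COMMIT_LAYOUT.unpack(payload))
    if commit.message_type != b"Commit  ":
        raise ValueError("not a Commit message")
    return commit
```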

6.5.  DHPart1 message

   The DHPart1 message begins the DH exchange.  The format is shown in
   Figure 5 below.  The DHPart1 message is sent if a valid Commit
   message is received.  The length of the pvr value depends on the Key
   Agreement Type chosen.  If DH4096 is used, the pvr will be 128 words
   (512 octets).  If DH3072 is used, it is 96 words (384 octets).

   The next five parameters are HMACs of potential shared secrets used
   in generating the ZRTP secret.  The first two, rs1IDr and rs2IDr, are
   the HMACs of the responder's two retained shared secrets, truncated
   to 64 bits.  Next is sigsIDr, the HMAC of the responder's signaling
   secret, truncated to 64 bits.  Next is srtpsIDr, the HMAC of the
   responder's SRTP secret, truncated to 64 bits.  The last parameter,
   other_secretIDr, is the HMAC of an additional shared secret.  For
   example, if multiple SRTP secrets are available or some other secret
   is used, it can be used as the other_secret.


        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|   length=depends on KA Type   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Message Type Block="DHPart1 " (2 words)          |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                 pvr (length depends on KA Type)               |
       |                               . . .                           |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                        rs1IDr (2 words)                       |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                        rs2IDr (2 words)                       |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                        sigsIDr (2 words)                      |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                       srtpsIDr (2 words)                      |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                    other_secretIDr (2 words)                  |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                          CRC (1 word)                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 5. Extension header format for DHPart1 message
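
   The length field of a DHPart1 message can be worked out from the
   field sizes in Figure 5.  The Python sketch below does that
   arithmetic; the constant and function names are hypothetical, and it
   assumes the length is counted the same way as in the fixed-length
   messages (all words after the first word, including the CRC).  The
   same arithmetic applies to the DHPart2 message, which has the same
   layout.

```python
# Public value size in 32 bit words for each DH Key Agreement Type.
PUBLIC_VALUE_WORDS = {
    "DH3072  ": 96,    # 384 octets
    "DH4096  ": 128,   # 512 octets
}

MESSAGE_TYPE_WORDS = 2
SECRET_ID_WORDS = 5 * 2   # rs1ID, rs2ID, sigsID, srtpsID, other_secretID
CRC_WORDS = 1


def dhpart_length_words(key_agreement_block):
    """Length field value for a DHPart1 or DHPart2 message."""
    pv_words = PUBLIC_VALUE_WORDS[key_agreement_block]
    return MESSAGE_TYPE_WORDS + pv_words + SECRET_ID_WORDS + CRC_WORDS
```

   Under these assumptions the length is 109 words for DH3072 and 141
   words for DH4096.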


6.6.  DHPart2 message

   The DHPart2 message completes the DH exchange.  The format is shown
   in Figure 6 below.  A DHPart2 message is sent if a valid DHPart1
   message is received.  The length of the pvi value depends on the Key
   Agreement Type chosen.  If DH4096 is used, the pvi will be 128 words
   (512 octets).  If DH3072 is used, it is 96 words (384 octets).

   The next five parameters are HMACs of potential shared secrets used
   in generating the ZRTP secret.  The first two, rs1IDi and rs2IDi, are
   the HMACs of the initiator's two retained shared secrets, truncated
   to 64 bits.  Next is sigsIDi, the HMAC of the initiator's signaling
   secret, truncated to 64 bits.  Next is srtpsIDi, the HMAC of the
   initiator's SRTP secret, truncated to 64 bits.  The last parameter,
   other_secretIDi, is the HMAC of an additional shared secret.  For
   example, if multiple SRTP secrets are available or some other secret
   is used, it can be included.


        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|   length=depends on KA Type   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Message Type Block="DHPart2 " (2 words)          |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                   pvi (length depends on KA Type)             |
       |                               . . .                           |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                        rs1IDi (2 words)                       |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                        rs2IDi (2 words)                       |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                        sigsIDi (2 words)                      |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                       srtpsIDi (2 words)                      |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                    other_secretIDi (2 words)                  |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                          CRC (1 word)                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 6. Extension header format for DHPart2 message

6.7.  Confirm1 message

   The Confirm1 message is sent in response to a valid DHPart2 message
   after the SRTP session key and parameters have been negotiated.  As a
   result, it is always sent in an SRTP packet.  The format is shown in
   Figure 7 below.  The header extension itself has no parameters
   besides the Message Type Block and the CRC.  The first 52 octets in
   the SRTP payload are used by ZRTP to securely exchange a number of
   parameters.  The plaintext parameter contains the 15 octet string
   "known plaintext".  The Disclosure Flag (D) is a Boolean bit defined
   in Appendix B.  The Stay secure flag (S) is a Boolean bit defined in
   Section 5.6.  The SAS Verified flag (V) is a Boolean bit defined in
   Section 8.

   The cache expiration interval is an unsigned 32 bit integer giving
   the number of seconds that the newly generated cached shared secret,
   rs1, should be stored.  The hmac is computed over the known plaintext
   "known plaintext" and the flagoctet.

   The parameters included in the SRTP payload MUST NOT be allowed to
   pass to the RTP stack or errors may occur with the media stream.


        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|         length=3 words        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Message Type Block="Confirm1" (2 words)          |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                          CRC (1 word)                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         At the start of the SRTP payload:

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                "known plaintext" (15 octets)                  |
       |                                               +-+-+-+-+-+-+-+-+
       |                                               |0 0 0 0 0|D|S|V|
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              cache expiration interval (1 word)               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                         hmac (8 words)                        |
       |                             . . .                             |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 7. Extension header format for Confirm1 message
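
   The 52 octet block at the start of the SRTP payload can be separated
   from the media with a fixed layout, which also makes it easy to keep
   the ZRTP parameters away from the RTP stack.  The Python sketch below
   is one way to do this.  The names are hypothetical, the payload is
   assumed to be already decrypted, the flag bit positions are read from
   Figure 7 with the rightmost bit as the least significant, and the
   hmac is returned without being verified.

```python
import struct
from collections import namedtuple

KNOWN_PLAINTEXT = b"known plaintext"          # 15 octets
CONFIRM_PREFIX = struct.Struct("!15sBI32s")   # 15 + 1 + 4 + 32 = 52 octets

FLAG_D = 0x04   # Disclosure flag
FLAG_S = 0x02   # Stay secure flag
FLAG_V = 0x01   # SAS Verified flag

ConfirmParams = namedtuple(
    "ConfirmParams", "disclosure stay_secure sas_verified cache_expiry hmac"
)


def split_confirm_payload(srtp_payload):
    """Return (ConfirmParams, media) from a decrypted Confirm payload.

    Only `media` should be handed on to the RTP stack.
    """
    if len(srtp_payload) < CONFIRM_PREFIX.size:
        raise ValueError("payload shorter than 52 octets")
    plaintext, flags, expiry, mac = CONFIRM_PREFIX.unpack_from(srtp_payload)
    if plaintext != KNOWN_PLAINTEXT:
        raise ValueError("known plaintext mismatch")
    params = ConfirmParams(
        disclosure=bool(flags & FLAG_D),
        stay_secure=bool(flags & FLAG_S),
        sas_verified=bool(flags & FLAG_V),
        cache_expiry=expiry,   # seconds, unsigned 32 bit integer
        hmac=mac,              # not verified in this sketch
    )
    return params, srtp_payload[CONFIRM_PREFIX.size:]
```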









6.8.  Confirm2 message

   The Confirm2 message is sent in response to a Confirm1 message after
   the SRTP session key and parameters have been negotiated.  As a
   result, it is always sent in an SRTP packet.  The format is shown in
   Figure 8 below.  The header extension itself has no parameters
   besides the Message Type Block and the CRC.  The first 52 octets in
   the SRTP payload are used by ZRTP to securely exchange a number of
   parameters.  The plaintext parameter contains the 15 octet string
   "known plaintext".  The Disclosure Flag (D) is a Boolean bit defined
   in Appendix B.  The Stay secure flag (S) is a Boolean bit defined in
   Section 5.6.  The SAS Verified flag (V) is a Boolean bit defined in
   Section 8.

   The cache expiration interval is an unsigned 32 bit integer giving
   the number of seconds that the newly generated cached shared secret,
   rs1, should be stored.  The hmac is computed over the known plaintext
   "known plaintext" and the flagoctet.

   The parameters included in the SRTP payload MUST NOT be allowed to
   pass to the RTP stack or errors may occur with the media stream.


        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|         length=3 words        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Message Type Block="Confirm2" (2 words)          |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                          CRC (1 word)                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        At the start of the SRTP payload:

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                "known plaintext" (15 octets)                  |
       |                                               +-+-+-+-+-+-+-+-+
       |                                               |0 0 0 0 0|D|S|V|
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              cache expiration interval (1 word)               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                         hmac (8 words)                        |
       |                             . . .                             |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 8. Extension header format for Confirm2 message


6.9.  Conf2ACK message

   The Conf2ACK message is sent in response to a valid Confirm2 message.
   The format is shown in Figure 9 below.  The receipt of a Conf2ACK
   stops retransmission of the Confirm2 message.


        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|         length=3 words        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Message Type Block="Conf2ACK" (2 words)          |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                          CRC (1 word)                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+




     Figure 9. Extension header format for Conf2ACK message

6.10.  GoClear message

   The GoClear message is sent to switch from SRTP back to RTP or to
   terminate an in-process ZRTP key agreement exchange.  The format is
   shown in Figure 11 below.  The Reason String is a 16 character string
   which contains the reason for the switch to clear.  If the GoClear is
   sent due to a user interface selection, the reason is "User Request".
   If the GoClear is sent due to a protocol error, the reason phrase is
   generated to describe the reason.  The Reason String can be logged or
   rendered for human consumption.

   If the GoClear is sent to switch from SRTP back to RTP, the
   clear_hmac is used to authenticate the GoClear message so that bogus
   GoClear messages introduced by an attacker can be detected and
   discarded.


        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=15 words        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Message Type Block="GoClear " (2 words)          |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                      Reason String  (4 words)                 |
       |                                                               |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                                                               |
       |                       clear_hmac (8 words)                    |
       |                             . . .                             |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                          CRC (1 word)                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 11. Extension header format for GoClear message


6.11.  ClearACK message

   The ClearACK message is sent to acknowledge receipt of a GoClear.  A
   ClearACK is only sent if the clear_hmac from the GoClear message is
   authenticated.  Otherwise, no response is returned.  The format is
   shown in Figure 12 below.


        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Message Type Block="ClearACK" (2 words)          |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                          CRC (1 word)                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 12. Extension header format for ClearACK message
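
   The rule that a ClearACK is sent only for an authenticated GoClear
   can be sketched in Python as below.  The function name is
   hypothetical, and the derivation of the expected clear_hmac is
   outside the scope of this example.  The constant-time comparison is a
   design choice of the sketch, not a requirement stated in this
   document.

```python
import hmac


def response_to_goclear(received_clear_hmac, expected_clear_hmac):
    """Return the Message Type Block to send back, or None for no reply.

    `expected_clear_hmac` is the value computed locally by the receiver;
    how it is derived is not shown here.
    """
    if hmac.compare_digest(received_clear_hmac, expected_clear_hmac):
        return b"ClearACK"
    # An unauthenticated GoClear is discarded and no response is sent.
    return None
```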



7.  Retransmissions

   ZRTP uses two retransmission timers T1 and T2.  T1 is used for
   retransmission of Hello messages, when the support of ZRTP by the
   other endpoint may not be known.  T2 is used in retransmissions of
   all the other ZRTP messages with the exception of GoClear.

   Practical experience has shown that RTP packet loss at the start of
   an RTP session can be extremely high.  Since the entire ZRTP message
   exchange occurs during this period, the retransmission scheme is
   defined to be aggressive.  Since ZRTP packets, with the exception of
   the DHPart1 and DHPart2 messages, are small, this should have minimal
   effect on overall bandwidth utilization of the media session.

   Hello ZRTP requests are retransmitted at an interval that starts at
   T1 and doubles after every retransmission, capping at 200 ms.  A
   Hello message is retransmitted 20 times before giving up.  T1 has a
   recommended value of 50 ms.  Retransmission of a Hello ends upon
   receipt of a HelloACK or Commit message.

   Non-Hello ZRTP requests are retransmitted only by the initiator; that
   is, only Commit, DHPart2, and Confirm2 are retransmitted if the
   corresponding messages from the responder (DHPart1, Confirm1, and
   Conf2ACK) are not received.  Non-Hello ZRTP messages are
   retransmitted at an interval that starts at T2 and doubles after
   every retransmission, capping at 600 ms.  Each message is
   retransmitted 10 times before giving up and resuming a normal RTP
   session.  T2 has a default value of 150 ms.  Each message has a
   response message that stops retransmissions, as shown in Table 7.
   The high value of T2 means that retransmissions will likely only
   occur with packet loss.

   A GoClear message is retransmitted at 500ms intervals until a
   ClearACK message is received.


       Message      Acknowledgement Message
       -------      -----------------------
       Hello        HelloACK or Commit
       Commit       DHPart1 or Confirm1
       DHPart2      Confirm1
       Confirm1     Confirm2
       Confirm2     Conf2ACK
       GoClear      ClearACK

      Table 7. Retransmitted ZRTP Messages and Responses
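
   The two backoff schedules described in this section can be generated
   with a few lines of Python.  The function name is hypothetical, and
   the sketch assumes that the first retransmission is sent one timer
   interval after the original message and that the cap applies to each
   doubled interval.

```python
def retransmit_intervals(start_ms, cap_ms, count):
    """Return the waits, in ms, before each of `count` retransmissions."""
    intervals = []
    interval = start_ms
    for _ in range(count):
        intervals.append(interval)
        # Double after every retransmission, but never exceed the cap.
        interval = min(interval * 2, cap_ms)
    return intervals


HELLO_SCHEDULE = retransmit_intervals(50, 200, 20)    # T1 = 50 ms
OTHER_SCHEDULE = retransmit_intervals(150, 600, 10)   # T2 = 150 ms
```

   Under these assumptions the Hello intervals are 50, 100, and then 200
   ms repeated, so the last Hello retransmission is sent 3750 ms after
   the original.  The non-Hello intervals are 150, 300, and then 600 ms
   repeated, for a total of 5250 ms.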



8.  Short Authentication String

   This section discusses the implementation of the Short Authentication
   String (SAS) in ZRTP.

   The Short Authentication String (SAS) value is calculated as the hash
   of both DH public values and the string "Short Authentication
   String".

   sasvalue = last 32 bits of hash(pvi | pvr | "Short Authentication
   String")

   The rendering of the SAS value depends on the SAS Type agreed upon in
   the Commit message.  For the SAS Type of base32, the last 20 bits of
   the sasvalue are rendered as a form of base32 encoding known as
   libbase32 [10].  The purpose of base32 is to represent arbitrary
   sequences of octets in a form that is as convenient as possible for
   human users to manipulate.  As a result, the choice of characters is
   slightly different from base32 as defined in RFC 3548.  The last 20
   bits of the sasvalue result in four base32 characters, which are
   rendered at both ZRTP endpoints.  Other SAS Types may be defined to
   render the SAS value in other ways.
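
   The sasvalue computation and the base32 rendering described above can
   be sketched in Python as follows.  Several details are assumptions
   made for the example and are not fixed by the text above: the hash is
   taken to be SHA-256 as a stand-in for the negotiated hash, pvi and
   pvr are passed as the octet strings carried in the DHPart messages,
   the alphabet is the z-base-32 alphabet on the assumption that it is
   the one used by libbase32 [10], and the 20 bits are split into 5 bit
   groups most significant first.

```python
import hashlib

# z-base-32 alphabet; assumed here to be the one used by libbase32 [10].
ZBASE32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"
SAS_LABEL = b"Short Authentication String"


def sas_value(pvi, pvr, hash_name="sha256"):
    """Last 32 bits of hash(pvi | pvr | label) as an unsigned integer."""
    digest = hashlib.new(hash_name, pvi + pvr + SAS_LABEL).digest()
    return int.from_bytes(digest[-4:], "big")


def render_base32(sasvalue):
    """Render the last 20 bits of sasvalue as four base32 characters."""
    bits = sasvalue & 0xFFFFF
    # Assumption: 5 bit groups are taken most significant first.
    return "".join(
        ZBASE32_ALPHABET[(bits >> shift) & 0x1F] for shift in (15, 10, 5, 0)
    )
```

   Both endpoints would call render_base32(sas_value(pvi, pvr)) on the
   same pair of public values and display the resulting four characters
   for the users to compare.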

   The SAS SHOULD be rendered to the user.  In addition, the SAS SHOULD
   be sent in a subsequent offer/answer exchange (a re-INVITE in SIP)
   after the completion of ZRTP exchange using the ZRTP SAS SDP
   attributes defined in Appendix A.

   The SAS Verified flag (V) is set based on the user indicating that
   SAS verification has been successfully performed.  The SAS Verified
   flag is exchanged securely in the Confirm1 and Confirm2 messages of
   the next session.  In other words, each party sends the SAS Verified
   flag from the previous session in the Confirm message of the current
   session.  It is perfectly reasonable to have a ZRTP endpoint that
   never sets the SAS Verified flag, because it would require adding
   complexity to the user interface to allow the user to set it.  The
   SAS Verified flag is not required to be set, but if it is available
   to the client software, it allows for the possibility that the client
   software could render to the user that the SAS verify procedure was
   carried out in a previous session.

   Regardless of whether there is a user interface element to allow the
   user to set the SAS Verified flag, it is worth caching a shared
   secret, because doing so reduces opportunities for an attacker in the
   next call.

   If at any time the users carry out the SAS procedure, and it actually
   fails to match, then this means there is a very resourceful man in
   the middle.  If this is the first call, the MitM was there on the
   first call, which is impressive enough.  If it happens in a later
   call, it means the MitM must also know your cached shared
   secret, because you could not have carried out any voice traffic at
   all unless the session key was correctly computed and is also known
   to the attacker.  This implies the MitM must have been present in all
   the previous sessions, since the initial establishment of the first
   shared secret.  This is indeed a resourceful attacker.  It also means
   that if at any time he ceases his participation as a MitM on one of
   your calls, the protocol will detect that the cached shared secret is
   no longer valid -- because it was really two different shared secrets
   all along, one of them between Alice and the attacker, and the other
   between the attacker and Bob.  The continuity of the cached shared
   secrets makes it possible for us to detect the MitM when he inserts
   himself into the ongoing relationship, as well as when he leaves.
   Also, if the attacker tries to stay with a long lineage of calls, but
   fails to execute a DH MitM attack for even one missed call, he is
   permanently excluded.  He can no longer resynchronize with the chain
   of cached shared secrets.
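
   The reasoning above can be illustrated with a toy model.  This is a
   sketch under loud assumptions, not the ZRTP cache algorithm: the
   update rule (_mix), the Caches structure, and the way the attacker
   is modeled are all hypothetical and exist only to show why the chain
   of cached secrets exposes an attacker who arrives late, leaves, or
   misses a call.

```python
import hashlib
import os
from dataclasses import dataclass


def _mix(prev: bytes, session_key: bytes) -> bytes:
    # Hypothetical update rule: each new cached secret depends on the old one.
    return hashlib.sha256(prev + session_key).digest()


@dataclass
class Caches:
    alice: bytes = b""              # Alice's cached secret for Bob
    bob: bytes = b""                # Bob's cached secret for Alice
    attacker_in_sync: bool = False  # attacker holds both current secrets


def run_call(c: Caches, mitm_present: bool) -> bool:
    """Simulate one call; return True if a cached-secret mismatch shows up."""
    first_call = c.alice == b"" and c.bob == b""
    if mitm_present:
        # The attacker can only present valid secrets to each side if he
        # was there on the first call and on every call since.
        mismatch = not (first_call or c.attacker_in_sync)
        c.alice = _mix(c.alice, os.urandom(32))  # Alice <-> attacker key
        c.bob = _mix(c.bob, os.urandom(32))      # attacker <-> Bob key
        c.attacker_in_sync = not mismatch
    else:
        # Alice and Bob talk directly, so their caches are compared as-is.
        mismatch = c.alice != c.bob
        key = os.urandom(32)
        base = b"" if mismatch else c.alice
        c.alice = c.bob = _mix(base, key)
        c.attacker_in_sync = False
    return mismatch
```

   In this model an attacker present from the very first call produces
   no mismatch while he stays, which is exactly the case that only the
   SAS comparison can catch.  Once he skips a call, the next direct call
   reports a mismatch, and any later attempt to rejoin reports one too.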

   Some sort of user interface element (maybe a checkbox) is needed to
   allow the user to tell the software the SAS verify was successful,
   causing the software to set the SAS Verified flag (V), which
   (together with our cached shared secret) obviates the need to perform
   the SAS procedure in the next call.  An additional user interface
   element can be provided to let the user tell the software he detected
   an actual SAS mismatch, which indicates a MitM attack.  The software
   can then take appropriate action, clearing the SAS Verified flag and
   erasing the cached shared secret from this session.  It is up to the
   implementer to decide if this added user interface complexity is
   warranted.
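
   The two user interface actions described above might map onto cache
   state roughly as follows.  This is a minimal sketch: the
   PeerCacheEntry structure and the function names are hypothetical and
   are not defined by this specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerCacheEntry:
    shared_secret: Optional[bytes] = None  # cached shared secret, if any
    sas_verified: bool = False             # the SAS Verified flag (V)


def user_confirmed_sas(entry: PeerCacheEntry) -> None:
    # User ticked the "SAS matched" element: remember it for the next call.
    entry.sas_verified = True


def user_reported_mismatch(entry: PeerCacheEntry) -> None:
    # User reported an actual mismatch: clear V and drop the cached secret.
    entry.sas_verified = False
    entry.shared_secret = None


def should_prompt_for_sas(entry: PeerCacheEntry) -> bool:
    # V together with a cached secret removes the need to repeat the check.
    return not (entry.sas_verified and entry.shared_secret is not None)
```

   An endpoint without either user interface element simply never calls
   the first two functions and always treats the SAS as unverified.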

   If the SAS matches, it means there is no MitM, which also implies it
   is now safe to trust a cached shared secret for later calls.  If
   inattentive users don't bother to check the SAS, it means we don't
   know whether there is or is not a MitM, so even if we do establish a
   new cached shared secret, there is a risk that our potential attacker
   may have a subsequent opportunity to continue inserting himself in
   the call, until we finally get around to checking the SAS.  If the
   SAS matches, it means no attacker was present for any previous
   session since we started propagating cached shared secrets, because
   this session and all the previous sessions were also authenticated
   with a continuous lineage of shared secrets.


9.  IANA Considerations

   This specification defines three new SDP [11] attributes in Appendix
   A. The IANA registrations would be as follows:

   Contact name:          Phil Zimmermann <prz@mit.edu>

   Attribute name:        "zrtp".

   Type of attribute:     Session level or Media level.

   Subject to charset:    No.

   Purpose of attribute:  The 'zrtp' flag indicates that a UA supports the
                          ZRTP protocol.

   Allowed attribute values:  None.

   IANA would register the ZRTP SAS SDP attribute:

   Contact name:          Phil Zimmermann <prz@mit.edu>

   Attribute name:        "zrtp-sas".

   Type of attribute:     Media level.

   Subject to charset:    Yes.

   Purpose of attribute:  The 'zrtp-sas' attribute is used to convey the
                          ZRTP SAS string that would be rendered to the
                          users.  The SAS is carried in the same format as
                          it would be rendered.

   Allowed attribute values:  String.

   IANA would register the ZRTP SASvalue SDP attribute:

   Contact name:          Phil Zimmermann <prz@mit.edu>

   Attribute name:        "zrtp-sasvalue".

   Type of attribute:     Media level.

   Subject to charset:    No.

   Purpose of attribute:  The 'zrtp-sasvalue' is used to convey the SASvalue
                          used for deriving the SAS string.  The SAS value is
                          encoded as hexadecimal.

   Allowed attribute values:  Hex.


10.  Security Considerations

   This document is all about securely keying SRTP sessions.  As such,
   security is discussed in every section.  The next version of this
   draft will have a summary of those security properties discussed
   throughout the document.

   The ZRTP SDP attributes convey information through the signaling that
   is already available in clear text through the media channel.  For
   example, the ZRTP flag is equivalent to sending a ZRTP Hello message.
   The SAS is calculated from the public Diffie-Hellman values exchanged
   in the DHPart1 and DHPart2 messages and a known string.  As a result,
   none of the ZRTP SDP attributes require confidentiality from the
   signaling.

   The ZRTP SAS attributes can use the signaling channel as an out-of-
   band authentication mechanism.  This authentication is only useful if
   the signaling channel has end-to-end integrity protection.  Note that
   the SIP Identity header field [23] provides middle-to-end integrity
   protection across SDP message bodies which provides useful protection
   for ZRTP SAS attributes.


11.  Acknowledgments

   The authors would like to thank Bryce Wilcox-O'Hearn for his
   contributions to the design of this protocol, and to thank Jon
   Peterson, Colin Plumb, and Hal Finney for their helpful comments and
   suggestions.  Also thanks to David McGrew, Roni Even, Viktor Krikun,
   Werner Dittmann, Allen Pulsifer, Klaus Peters, and Abhishek Arya for
   their feedback and comments.


12.  Appendix A - ZRTP, SIP, and SDP

   This section discusses how ZRTP, SIP, and SDP work together.

   Note that ZRTP may be implemented without coupling with the SIP
   signaling.  For example, ZRTP can be implemented as a "bump in the
   wire" or as a "bump in the stack" in which RTP sent by the SIP UA is
   converted to ZRTP.  In these cases, the SIP UA will have no knowledge
   of ZRTP.  As a result, the signaling path discovery mechanisms
   introduced in this section should not be treated as definitive; they
   are only a hint.  Even when an offer or answer carries no indication
   of ZRTP support, a ZRTP endpoint SHOULD still send Hello messages.

   ZRTP endpoints which have control over the signaling path include
   ZRTP SDP attributes in their SDP offers and answers.  The ZRTP
   attribute, a=zrtp, is a flag to indicate support for ZRTP.  There are
   a number of potential uses for this attribute.  It is useful when
   signaling elements would like to know when ZRTP may be utilized by
   endpoints.  It is also useful if endpoints support multiple methods
   of SRTP key management.  The ZRTP attribute can be used to ensure
   that these key management approaches work together instead of against
   each other.  For example, if only one endpoint supports ZRTP but both
   support another method to key SRTP, then the other method will be
   used instead.  When used in parallel, an SRTP secret carried in an
   a=keymgt [20] or a=crypto [19] attribute can be used as a shared
   secret for the srtp_secret.  The ZRTP attribute is also used to
   signal to an intermediary ZRTP device not to act as a ZRTP endpoint,
   as discussed in Appendix C.

   The a=zrtp attribute can be included at a media level or at the
   session level.  When used at the media level, it indicates that ZRTP
   is supported on this media stream.  When used at the session level,
   it indicates that ZRTP is supported in all media streams in the
   session described by the offer or answer.
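
   A signaling element that wants to apply the session-level and
   media-level rule above could do so along these lines.  This is a
   sketch only: the function name is hypothetical, and the parsing is
   deliberately simplistic (it looks only at m= and a=zrtp lines and
   does no other SDP validation).

```python
from typing import List


def zrtp_media_support(sdp: str) -> List[bool]:
    """Return one boolean per m= line: is ZRTP flagged for that stream?"""
    session_level = False
    per_media: List[bool] = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # A new media section inherits the session-level flag.
            per_media.append(session_level)
        elif line == "a=zrtp":
            if per_media:
                per_media[-1] = True   # media-level attribute
            else:
                session_level = True   # seen before any m= line
    return per_media
```

   Because the result is only a hint, a False entry does not mean that
   ZRTP will not be attempted on that stream.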

   In some scenarios, it is desirable for a signaling intermediary to be
   able to validate the SAS on behalf of the user.  This could be due to
   an endpoint which has a user interface unable to render the SAS.  Or,
   this could be a protection by an organization against lazy users who
   never check the SAS.  Using either the ZRTP SAS or ZRTP SASvalue
   attribute, the SAS check can be performed without requiring the human
   users to speak the SAS.  Note that this check can only be relied on
   if the signaling path has end-to-end integrity protection.

   The ZRTP SAS attribute a=zrtp-sas is a Media level SDP attribute that
   can be used to carry the SAS string which would be identical to that
   rendered to the user.  The value passed depends on the negotiated SAS
   Type.  Since the SAS is not known at the start of a session, the
   a=zrtp-sas attribute will never be present in the initial offer/
   answer exchange.  After the ZRTP exchange has completed, the SAS is
   known and can be exchanged over the signaling using a second offer/
   answer exchange (a re-INVITE in SIP terms).  Note that the SAS is not
   a secret and as such does not need confidentiality protection when
   sent over the signaling path.

   The ZRTP SASvalue attribute a=zrtp-sasvalue can be used to send the
   32-bit SAS value encoded as hex.  Note that this value is not the
   same as that rendered to the user and is independent of the
   negotiated SAS Type.  Since the SAS is not known at the start of a
   session, the a=zrtp-sasvalue attribute will never be present in the
   initial offer/answer exchange.  After the ZRTP exchange has
   completed, the SAS is known and can be exchanged over the signaling
   using a second offer/answer exchange (a re-INVITE in SIP terms).
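
   A signaling intermediary performing the SAS check on behalf of the
   users could compare the a=zrtp-sasvalue attributes seen in the two
   directions.  The sketch below assumes the intermediary already holds
   both SDP bodies as text and that the signaling path is integrity
   protected; the function names are hypothetical.

```python
from typing import List, Optional


def sas_values_by_media(sdp: str) -> List[Optional[int]]:
    """Collect the a=zrtp-sasvalue of each media section (None if absent)."""
    values: List[Optional[int]] = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            values.append(None)
        elif line.startswith("a=zrtp-sasvalue:") and values:
            # Hex is compared numerically, so letter case does not matter.
            values[-1] = int(line.split(":", 1)[1], 16)
    return values


def sas_values_agree(sdp_a: str, sdp_b: str) -> bool:
    """True only if every media stream carries the same value on both sides."""
    a = sas_values_by_media(sdp_a)
    b = sas_values_by_media(sdp_b)
    if not a or len(a) != len(b):
        return False
    return all(x is not None and x == y for x, y in zip(a, b))
```

   A missing attribute on either side is treated as a failed check
   rather than as agreement, which is the conservative choice for an
   element acting in place of the human comparison.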

   The ABNF for the ZRTP attribute is as follows:

        zrtp-attribute    = "a=zrtp"

   The ABNF for the ZRTP SAS attribute is as follows:

        zrtp-sas-attribute    = "a=zrtp-sas:" sas-string

        sas-string            = non-ws-string

        non-ws-string         = 1*(VCHAR/%x80-FF)
                               ;string of visible characters


   The ABNF for the ZRTP SASvalue attribute is as follows:

        zrtp-sasvalue-attribute = "a=zrtp-sasvalue:" sas-value

        sas-value               = 1*(HEXDIG)
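
   The ABNF above translates fairly directly into regular expressions.
   The following sketch works on octets so that the %x80-FF range can be
   matched literally; the function name and the shape of the return
   value are hypothetical.

```python
import re
from typing import Optional, Tuple, Union

# sas-string = 1*(VCHAR / %x80-FF), where VCHAR is %x21-7E
_SAS_RE = re.compile(rb"a=zrtp-sas:([\x21-\x7e\x80-\xff]+)")
# sas-value = 1*(HEXDIG); ABNF literals are case-insensitive
_SASVALUE_RE = re.compile(rb"a=zrtp-sasvalue:([0-9A-Fa-f]+)")


def parse_sas_line(line: bytes) -> Optional[Tuple[str, Union[bytes, int]]]:
    """Parse one SDP line; return (attribute name, value) or None."""
    m = _SASVALUE_RE.fullmatch(line)
    if m:
        return ("zrtp-sasvalue", int(m.group(1).decode("ascii"), 16))
    m = _SAS_RE.fullmatch(line)
    if m:
        return ("zrtp-sas", m.group(1))
    return None
```

   Using fullmatch means that a misspelled attribute name or trailing
   whitespace is rejected instead of being silently accepted.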


   Example of the ZRTP attribute in an initial SDP offer or answer used
   at the session level:

      v=0
      o=bob 2890844527 2890844527 IN IP4 client.biloxi.example.com
      s=
      c=IN IP4 client.biloxi.example.com
      t=0 0
      a=zrtp
      m=audio 3456 RTP/AVP 97 33
      a=rtpmap:97 iLBC/8000
      a=rtpmap:33 no-op/8000


   Example of the ZRTP SAS and SASvalue attributes in a subsequent SDP
   offer or answer used at the media level.  Note that the a=zrtp
   attribute doesn't provide any additional information when used with
   the SAS and SASvalue attributes but does not do any harm:

      v=0
      o=bob 2890844527 2890844528 IN IP4 client.biloxi.example.com
      s=
      c=IN IP4 client.biloxi.example.com
      t=0 0
      a=zrtp
      m=audio 3456 RTP/AVP 97 33
      a=rtpmap:97 iLBC/8000
      a=rtpmap:33 no-op/8000
      a=zrtp-sas:opz
      a=zrtp-sasvalue:45e387ff

   Another example showing a second media stream being added to the
   session.  A second DH exchange is performed (instead of using the
   Multistream mode) resulting in a second set of ZRTP SAS and SASvalue
   attributes.

      v=0
      o=bob 2890844527 2890844528 IN IP4 client.biloxi.example.com
      s=
      c=IN IP4 client.biloxi.example.com
      t=0 0
      a=zrtp
      m=audio 3456 RTP/AVP 97 33
      a=rtpmap:97 iLBC/8000
      a=rtpmap:33 no-op/8000
      a=zrtp-sas:opz
      a=zrtp-sasvalue:45e387ff
      m=video 51372 RTP/AVP 31 33
      a=rtpmap:31 H261/90000
      a=rtpmap:33 no-op/8000
      a=zrtp-sas:qvj
      a=zrtp-sasvalue:5e017f3a


13.  Appendix B - The ZRTP Disclosure flag

   There are no back doors defined in the ZRTP protocol specification.
   The designers of ZRTP would like to discourage back doors in ZRTP-
   enabled products.  However, despite the lack of back doors in the
   actual ZRTP protocol, it must be recognized that a ZRTP implementer
   might still deliberately create a rogue ZRTP-enabled product that
   implements a back door outside the scope of the ZRTP protocol.  For
   example, they could create a product that discloses the SRTP session
   key generated using ZRTP out-of-band to a third party.  They may even
   have a legitimate business reason to do this for some customers.

   For example, some environments have a need to monitor or record
   calls, such as stock brokerage houses who want to discourage insider
   trading, or special high security environments with special needs to
   monitor their own phone calls.  We've all experienced automated
   messages telling us that "This call may be monitored for quality
   assurance".  A ZRTP endpoint in such an environment might
   unilaterally disclose the session key to someone monitoring the call.
   ZRTP-enabled products that perform such out-of-band disclosures of
   the session key can undermine public confidence in the ZRTP protocol,
   unless we do everything we can in the protocol to alert the other
   user that this is happening.

   If one of the parties is using a product that is designed to disclose
   their session key, ZRTP requires them to confess this fact to the
   other party through a protocol message to the other party's ZRTP
   client, which can properly alert that user, perhaps by rendering it
   in a GUI.  The disclosing party does this by sending a Disclosure
   flag (D) in Confirm1 and Confirm2 messages as described in Sections
   6.7 and 6.8.

   Note that the intention here is to have the Disclosure flag identify
   products that are designed to disclose their session keys, not to
   identify which particular calls are compromised on a call-by-call
   basis.  This is an important legal distinction, because most
   government sanctioned wiretap regulations require a VoIP service
   provider to not reveal which particular calls are wiretapped.  But
   there is nothing illegal about revealing that a product is designed
   to be wiretap-friendly.  The ZRTP protocol mandates that such a
   product "out" itself.

   You might be using a ZRTP-enabled product with no back doors, but if
   your own GUI tells you the call is (mostly) secure, except that the
   other party is using a product that is designed in such a way that it
   may have disclosed the session key for monitoring purposes, you might
   ask him what brand of secure telephone he is using, and make a mental
   note not to purchase that brand yourself.  If we create a protocol
   environment that requires such back-doored phones to confess their
   nature, word will spread quickly, and the "unseen hand" of the free
   market will act.  The free market has effectively dealt with this in
   the past.

   Of course, a ZRTP implementer can lie about his product having a back
   door, but the ZRTP standard mandates that ZRTP-compliant products
   MUST adhere to the requirement that a back door be confessed by
   sending the Disclosure flag to the other party.

   There will be inevitable comparisons to Steve Bellovin's 2003 April
   fool's joke, when he submitted RFC 3514 [22] which defined the "Evil
   bit" in the IPv4 header, for packets with "evil intent".  But we
   submit that a similar idea can actually have some merit for securing
   VoIP.  Sure, one can always imagine that some implementer will not be
   fazed by the rules and will lie, but they would have lied anyway even
   without the Disclosure flag.  There are good reasons to believe that
   it will improve the overall percentage of implementations that at
   least tell us if they put a back door in their products, and may even
   get some of them to decide not to put in a back door at all.  From a
   civic hygiene perspective, we are better off with having the
   Disclosure flag in the protocol.

   If an endpoint stores or logs SRTP keys or information that can be
   used to reconstruct or recover SRTP keys after they are no longer in
   use (i.e., the session is no longer active), or otherwise discloses
   or passes SRTP keys or information that can be used to reconstruct or
   recover SRTP keys to another application or device, the Disclosure
   flag D MUST be set in the Confirm1 or Confirm2 message.


14.  Appendix C - Intermediary ZRTP Devices

   This section discusses the operation of a ZRTP endpoint which is
   actually an intermediary.  For example, consider a device which
   proxies both signaling and media between endpoints.  There are three
   possible ways in which such a device could support ZRTP.

   An intermediary device can act transparently to the ZRTP protocol.
   To do this, a device MUST pass RTP header extensions and payloads.
   This is the RECOMMENDED behavior for intermediaries as ZRTP and SRTP
   are best when done end-to-end.

   An intermediary device could implement the ZRTP protocol and act as a
   ZRTP endpoint on behalf of non-ZRTP endpoints behind the intermediary
   device.  The intermediary could determine on a call-by-call basis
   whether the endpoint behind it supports ZRTP based on the presence or
   absence of the ZRTP SDP attribute flag (a=zrtp).  For non-ZRTP
   endpoints, the intermediary device could act as the ZRTP endpoint
   using its own ZID and cache.  This approach MUST only be used when
   there is some other security method protecting the confidentiality of
   the media between the intermediary and the inside endpoint, such as
   IPsec or physical security.

   The third mode is for the intermediary device to attempt to
   back-to-back the ZRTP protocol.  In this mode, the intermediary would
   attempt to act as a ZRTP endpoint towards both endpoints of the media
   session.  This approach MUST NOT be used, as it will always result in
   a detected Man-in-the-Middle attack, will generate alarms on both
   endpoints, and will likely result in the immediate termination of the
   session.  It cannot be stated strongly enough that there are no
   workable back-to-back uses of the ZRTP protocol.

   It is possible that an intermediary device acting as a ZRTP endpoint
   might still receive ZRTP Hello and other messages from the inside
   endpoint.  This could occur if there is another inline ZRTP device
   which does not include the ZRTP SDP attribute flag.  If this occurs,
   the intermediary MUST NOT pass these ZRTP messages if it is acting as
   the ZRTP endpoint.
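
   The choice between the first two modes, and the forwarding rule just
   stated, can be sketched as a small policy function.  The policy shown
   (act as an endpoint whenever the inside endpoint does not signal ZRTP
   and the inside link is otherwise protected) is one possible reading
   of the text above, not a requirement of it; all names are
   hypothetical.

```python
from enum import Enum


class Mode(Enum):
    TRANSPARENT = "transparent"  # pass ZRTP through end-to-end
    ENDPOINT = "endpoint"        # terminate ZRTP on behalf of the inside UA


def choose_mode(inside_offers_zrtp: bool, inside_link_protected: bool) -> Mode:
    """Pick a mode for one call; back-to-back operation is never returned."""
    if inside_offers_zrtp or not inside_link_protected:
        # Either the inside endpoint speaks ZRTP itself (a=zrtp present),
        # or the inside leg lacks the protection that endpoint mode needs.
        return Mode.TRANSPARENT
    return Mode.ENDPOINT


def forward_inside_zrtp_message(mode: Mode) -> bool:
    """Should a ZRTP message arriving from the inside endpoint be passed on?"""
    # When acting as the ZRTP endpoint, such messages MUST NOT be passed.
    return mode is Mode.TRANSPARENT
```

   Transparent operation is the fallback in every case where endpoint
   mode is not clearly allowed, which matches its status as the
   RECOMMENDED behavior.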


15.  References

15.1.  Normative References

   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

   [2]   Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
         "RTP: A Transport Protocol for Real-Time Applications", STD 64,
         RFC 3550, July 2003.

   [3]   Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
         Norrman, "The Secure Real-time Transport Protocol (SRTP)",
         RFC 3711, March 2004.

   [4]   McGrew, D., "The use of AES-192 and AES-256 in Secure RTP",
         draft-mcgrew-srtp-big-aes-00 (work in progress), April 2006.

   [5]   Kivinen, T. and M. Kojo, "More Modular Exponential (MODP)
         Diffie-Hellman groups for Internet Key Exchange (IKE)",
         RFC 3526, May 2003.

   [6]   Stone, J., Stewart, R., and D. Otis, "Stream Control
         Transmission Protocol (SCTP) Checksum Change", RFC 3309,
         September 2002.

   [7]   Andreasen, F., "A No-Op Payload Format for RTP",
         draft-wing-avt-rtp-noop-03 (work in progress), May 2005.

   [8]   Ferguson, N. and B. Schneier, "Practical Cryptography", Wiley
         Publishing 2003.

   [9]   Barker, E. and J. Kelsey, "Recommendation for Random Number
         Generation Using Deterministic Random Bit Generators", NIST
         Special Publication 800-90 DRAFT (December 2005).

   [10]  Wilcox, B., "Human-oriented base-32 encoding", http://
         cvs.sourceforge.net/viewcvs.py/libbase32/libbase32/
         DESIGN?rev=HEAD .

   [11]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
         Description Protocol", RFC 4566, July 2006.

15.2.  Informative References

   [12]  Audet, F. and D. Wing, "Evaluation of SRTP Keying with SIP",
         draft-wing-rtpsec-keying-eval-01 (work in progress), June 2006.

   [13]  Zimmermann, P., "PGPfone",
         http://www.pgpi.org/products/pgpfone/ .

   [14]  Zimmermann, P., "Zfone", http://www.philzimmermann.com/zfone .

   [15]  Blossom, E., "The VP1 Protocol for Voice Privacy Devices
         Version 1.2", http://www.comsec.com/vp1-protocol.pdf .

   [16]  "CryptoPhone", http://www.cryptophone.de/ .

   [17]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
         Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
         Session Initiation Protocol", RFC 3261, June 2002.

   [18]  Ylonen, T. and C. Lonvick, "The Secure Shell (SSH) Protocol
         Architecture", RFC 4251, January 2006.

   [19]  Andreasen, F., Baugher, M., and D. Wing, "Session Description
         Protocol (SDP) Security Descriptions for Media Streams",
         RFC 4568, July 2006.

   [20]  Arkko, J., Lindholm, F., Naslund, M., Norrman, K., and E.
         Carrara, "Key Management Extensions for Session Description
         Protocol (SDP) and Real Time Streaming Protocol (RTSP)",
         RFC 4567, July 2006.

   [21]  Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K.
         Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830,
         August 2004.

   [22]  Bellovin, S., "The Security Flag in the IPv4 Header", RFC 3514,
         April 1 2003.

   [23]  Peterson, J. and C. Jennings, "Enhancements for Authenticated
         Identity Management in the Session Initiation Protocol (SIP)",
         RFC 4474, August 2006.


Authors' Addresses

   Philip Zimmermann
   Zfone Project

   Email: prz@mit.edu


   Alan Johnston (editor)
   Avaya
   St. Louis, MO  63124

   Email: alan@sipstation.com


   Jon Callas
   PGP Corporation

   Email: jon@pgp.com


Full Copyright Statement

   Copyright (C) The Internet Society (2006).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).