Early Review of draft-wing-sipping-srtp-key-

Request Review of draft-wing-sipping-srtp-key
Requested revision No specific revision (document currently at 04)
Type Early Review
Team Security Area Directorate (secdir)
Deadline 2009-02-19
Requested 2008-11-11
Authors Dan Wing , Francois Audet , Steffen Fries , Hannes Tschofenig , Alan Johnston
I-D last updated 2009-02-19
Completed reviews Secdir Early review of -?? by Eric Rescorla
Assignment Reviewer Eric Rescorla
State Completed
Request Early review on draft-wing-sipping-srtp-key by Security Area Directorate Assigned
Completed 2009-02-19
$Id: draft-wing-sipping-srtp-key-04-rev.txt,v 1.2 2009/02/15 15:41:29 ekr Exp $

End-to-end VoIP security mechanisms such as DTLS-SRTP represent a
threat to mechanisms in which a network element which is not a party
to the call wishes to monitor or modify the contents of the media
traffic. This document describes a mechanism for one of the parties
to the communication to provide a copy of the keying material
to such a third party subject to some set of authorization

I'm concerned that this document doesn't have a very clear
statement of requirements. Rather, it seems to be attempting
to fulfill a number of distinct use cases which don't have
much in common except that they represent violations of the
end-to-end security model of the SIP call.

This document describes two major use cases for this type of

- Monitoring (call recording)
- Transcoding

I don't think it's particularly useful to conflate these cases, which
are really quite different. Monitoring is fundamentally a passive
process: there is no need for the monitor to be able to modify the
traffic. By contrast, transcoding is an active process: the transcoder
is expected to modify the data. In reality, a transcoded call isn't
a call between two endpoints, but rather two calls, each from one
endpoint to the transcoder. I think it's a mistake to try to do
these with the same mechanism.

Similarly, this document fails to distinguish adequately between
real-time and non-real-time use cases. Many monitoring/call recording
applications are inherently non-real-time: you record the call
and some time in the future, the call may or may not be replayed.
This distinction has a number of implications, particularly since
capture of the keying material and media can be separated. In
particular, it may be desirable to deliver the keying material long after
the call has finished (for privacy reasons). It's not clear
to me how this is accomplished with this draft. It's possible
it could be initiated by the UA, but I don't see how it could
be initiated by the monitor. Even in a UA-initiated fashion,
I don't see that the information provided by the SDP in S 11
is sufficient to unambiguously identify the flow, in part
due to network parameter reuse.
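
As an illustration of the ambiguity described above, here is a minimal
sketch. It is not taken from the draft: it assumes a recorder that
indexes disclosed keys only by the connection address and media port
from the SDP, and the names `KeyDisclosure` and `keys_for_packet` are
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyDisclosure:
    """Hypothetical record of what a UA discloses for one media stream."""
    addr: str          # connection address from the SDP c= line
    port: int          # media port from the SDP m= line
    master_key: bytes  # placeholder bytes, not a real SRTP key


def keys_for_packet(disclosures, addr, port):
    """Return every disclosed key whose address/port match a captured packet."""
    return [d.master_key for d in disclosures if (d.addr, d.port) == (addr, port)]


# Two separate calls that happen to reuse the same address and port.
disclosures = [
    KeyDisclosure("192.0.2.10", 49170, b"key-for-call-1"),
    KeyDisclosure("192.0.2.10", 49170, b"key-for-call-2"),
]

candidates = keys_for_packet(disclosures, "192.0.2.10", 49170)

# More than one candidate: the address/port pair alone does not
# identify which call a stored packet belongs to.
ambiguous = len(candidates) > 1
```

Under these assumptions, a recorder that replays stored media later
would need some additional identifier to pick the right key.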

While I appreciate it's convenient to reuse the SDP parameters,
it's not clear to me that it's a good idea to hand over the SRTP
master key. If all you need to do is verify the call for 
quality assurance, you don't need the integrity check, at
least not initially. In fact, not having access to the integrity
key protects against accusations that the recording device
tampered. Similarly, it's not clear to me that it's desirable
to have the same level of protection for the connection parameters
as for the keys. Wouldn't it be useful for the monitoring application
to know what connections it *potentially* has the keys for 
but not have direct access to them until some future time?
Again, this seems like something that would be clearer with
a requirements analysis in terms of privacy requirements.
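
The point about withholding the integrity key can be sketched as
follows. SRTP (RFC 3711) derives separate session encryption,
authentication, and salting keys from the master key using distinct
labels. The code below is only an illustration under assumptions: it
uses HMAC-SHA256 as a stand-in for SRTP's AES-based key derivation,
the function `derive_session_key` and the `disclosed_to_recorder`
structure are hypothetical, and none of it comes from the draft.

```python
import hashlib
import hmac

# Key-derivation labels for SRTP session keys, as defined in RFC 3711.
LABEL_ENCRYPTION = 0x00
LABEL_AUTHENTICATION = 0x01
LABEL_SALT = 0x02


def derive_session_key(master_key: bytes, label: int, length: int) -> bytes:
    """Stand-in KDF (HMAC-SHA256 over the label); NOT SRTP's AES-CM PRF."""
    return hmac.new(master_key, bytes([label]), hashlib.sha256).digest()[:length]


master_key = bytes(range(16))  # placeholder value, for illustration only

enc_key = derive_session_key(master_key, LABEL_ENCRYPTION, 16)       # 128 bits
auth_key = derive_session_key(master_key, LABEL_AUTHENTICATION, 20)  # 160 bits
salt_key = derive_session_key(master_key, LABEL_SALT, 14)            # 112 bits

# Hypothetical disclosure: the recorder gets what it needs to decrypt
# (encryption key and salt) but not the authentication key, so it
# cannot produce a valid authentication tag over altered media.
disclosed_to_recorder = {"enc_key": enc_key, "salt_key": salt_key}
```

Handing over the master key instead would let the recipient derive all
three session keys, which is the concern raised above.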

Finally, the elephant under the covers here is lawful intercept.
The authors specifically disclaim it, but it's quite clear that
this is usable as an LI system. Indeed, many such systems
(e.g., FORTEZZA) involve cooperation from the endpoint being

Accordingly, I would recommend that rather than accepting this
mechanism as a WG document, the WG do a thorough requirements
analysis focusing on minimizing the privacy issues inherent in
mechanisms of this type. Once there is consensus on the requirements,
then it's possible to have a discussion of mechanisms.

If the requirement for recording is this strong, wouldn't it
be better not to rely on the UA doing the right thing, and instead
enforce it in a firewall or IDS?

   The signature of the SAML assertion should be produced using the
   private key of the domain certificate.  This certificate MUST have a
   SubjAltName which matches the domain of user agent's SIP proxy (that
   is, if the SIP proxy is, the SubjAltName of the
   domain certificate signing this SAML assertion MUST also be  Here, the main focus is placed on communication of
   clients with the ESC, which belongs to the client's home domain.

It's not clear to me why this is the correct authorizing certificate.

I don't really understand the need for the rcrypto thing.
Why not just pretend you have two streams with distinct
keys and use crypto= for both?

Actually, I don't really think it makes sense to use SDP
here at all: the semantics of the SDP really aren't the same,
since you're not offering to receive a media stream,
you're advertising what you're going to send. 

As noted above, I think it would be better to send the
traffic keys separately.

This whole SAML thing seems pretty underspecified.

I don't think using SIPS here is adequate, since it doesn't
provide any guarantee to the endpoint of the security treatment
of the keying material. In fact, as I noted earlier, I'm not clear
that S/MIME is good enough. I think you may want something

This Disclosure thing seems a bit confusing. Isn't what you
really need to inject the appropriate warnings in the media