
WebRTC Security Architecture


(Adam Roach)
(Spencer Dawkins)

No Objection

(Alvaro Retana)
(Deborah Brungard)
(Ignas Bagdonas)
(Martin Vigoureux)
(Suresh Krishnan)


Note: This ballot was opened for revision 18 and is now closed.

Adam Roach Former IESG member
Yes (for -18) Unknown

Alissa Cooper Former IESG member
Yes (2019-03-06 for -18) Sent
In Section 7 I would suggest using Alice and Bob (or some other name) rather than "I" and "you/your." is used there so the pronouns are mixed.

Same comment for 9.1 -- I would suggest replacing "your" with "the user's."

Ben Campbell Former IESG member
Yes (2019-03-04 for -18) Sent
Update to the Update: As Adam mentioned in email; while t= vs c=0 ordering is correct in 7.4.1, t= vs a= is not. (Capturing it here for the ballot record.)

Update: Ignore my comment about t= vs c=0. I had my order crossed; it is correct in the example.

I'm balloting "yes", but have a few minor comments and editorial comments:

- (nit) The first sentence will not age well; I gather RTCWEB will close before long. (Also, the WG acronym is RTCWEB, not WebRTC. Or are you talking about the W3C?)

- (nit) Figure 2: It seems a bit weird to have XMPP here, but never mentioned in the text. At least, please expand the abbreviation somewhere. (It also shows up in figure 4.)

(nit) §3.1, first bullet: While I don't normally object (beyond nose holding, anyway) to the use of first person in RFCs, it seems an odd choice for this sentence. I assume "we" in this sentence does not refer to the author or the WG.

(nit) - '... button next to Bob’s name which says "Call".':

- "The calling site will also provide some user interface
element (e.g., a button) to allow Bob to answer the call, though this
is most likely not part of the trusted UI."

This is the first mention of "trusted UI". It would be helpful to elaborate on that prior to this mention.

- "In this
case, the established identity SHOULD be applied to existing DTLS
connections as well as new connections established using one of those

Applied by the recipient? (consider active voice). Also, why not MUST? Don't unexpected things happen if the recipient doesn't do this?

- "Because HTTP origins cannot be securely established against network
attackers, implementations MUST NOT allow the setting of permanent
access permissions for HTTP origins. Implementations MUST refuse all
permissions grants for HTTP origins."

(nit-ish) - The MUST NOT seems non-constraining considering the last sentence. Or am I reading that sentence wrong?

(nit) - 'E.g., "Call"': sentence fragment.
(nit) - ".. unlikely that browsers would have an X.509 certificate..." : Plural disagreement (assuming the browsers do not share 1 cert).

(maybe nit) - "Clients MAY permit the formation of data channels without any direct
user approval."
Is the switch from talking about "the browser" to "Clients" intentional?

(nit) - "Note that these requirements are NOT intended..."
"NOT" in all caps is likely to be confused with 2119/8174 language.

 (nit) - "Implementations MUST implement SRTP [RFC3711]. Implementations MUST
implement DTLS [RFC6347] and DTLS-SRTP [RFC5763][RFC5764] for SRTP
keying. Implementations MUST implement [RFC8261]."

Thank you for using the citation style that doesn't assume everyone has memorized the RFC numbers. But why not do the same for 8261?

(nit) - First paragraph: Can there be a citation for the W3C API spec?

 (My bad, the draft is correct. Comment removed.) §7.1.4, SDP example:

(nit) §11: The first sentence is a fragment.

§13.1 (normative references)

(nit) - There's a reference to RFC 5234, but it is not cited in the text.

- Is there a reason to reference 5246 rather than 8446, which obsoleted it?

- Seems like the jsep reference should be normative.

Spencer Dawkins Former IESG member
Yes (for -18) Not sent

Alexey Melnikov Former IESG member
(was Discuss) No Objection
No Objection (2019-07-12 for -19) Sent for earlier
Thank you for addressing my DISCUSS.

Alvaro Retana Former IESG member
No Objection
No Objection (for -18) Not sent

Benjamin Kaduk Former IESG member
(was Discuss) No Objection
No Objection (2019-07-12 for -19) Sent
Thanks for addressing my Discuss points!
I'll leave the original Comment section below, as I note that at least one
issue remains (I spot-checked the SDP Offer/Answer reference, which still
points to RFC 6454, which is "the Web Origin Concept").  I didn't make any
attempt to trim points that did get addressed.

Section 3

(My comment about TCB and the other browser from the companion document
is probably relevant here, too.)

Section 4.1

   This message is sent to the signaling server, e.g., by XMLHttpRequest
   [XmlHttpRequest] or by WebSockets [RFC6455], preferably over TLS
   [RFC5246].  The signaling server processes the message from Alice's

This is the optimistic "best security" case, and we already say we're
talking to the signaling server over HTTPS, so it should be safe to just
say "over TLS" and drop the "preferably".  (Also, s/5246/8446.)

   call and to Alice's identity.  In this case, Alice has provided an
   identity assertion and so Bob's browser contacts Alice's identity
   provider (again, this is done in a generic way so the browser has no
   specific knowledge of the IdP) to verify the assertion.  This allows
   the browser to display a trusted element in the browser chrome
   indicating that a call is coming in from Alice.  [...]

I think I'm confused.  We're displaying trusted browser chrome based on
an assertion from some IdP that we have no relationship with and no
reason to trust?

Section 4.3

   Once the ICE checks have completed [more specifically, once some ICE
   checks have completed], [...]

nit: that's not really more specific.  Maybe "Once the requisite ICE
checks have completed"?

Section 5

I see that the 4566 <base64> includes the pad characters, though
sometimes we will mention explicitly whether they are or are not

   Note that long lines in the example are folded to meet the column
   width constraints of this document; the backslash ("\") at the end of
   a line and the carriage return that follows shall be ignored.

leading whitespace, too, right?
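To make the question concrete, here is a minimal sketch of the unfolding rule. The function name `unfold` and the `strip_leading_ws` flag are hypothetical, and dropping leading whitespace on continuation lines is my assumption about the intent; the quoted text only mentions the backslash and the line break.

```python
def unfold(text: str, strip_leading_ws: bool = True) -> str:
    """Join each line ending in a backslash with the line that follows it.

    strip_leading_ws is an assumption about the intended rule: when True,
    indentation on the continuation line is dropped as well.
    """
    out, buf, continuing = [], "", False
    for line in text.splitlines():
        if continuing and strip_leading_ws:
            line = line.lstrip()
        if line.endswith("\\"):
            # Drop the backslash and keep accumulating.
            buf += line[:-1]
            continuing = True
        else:
            out.append(buf + line)
            buf, continuing = "", False
    if continuing:
        # Trailing backslash on the last line: keep what was gathered.
        out.append(buf)
    return "\n".join(out)
```

With the flag off, the indentation of the folded example ends up inside the attribute value, which is presumably not what is wanted.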

Section 5.1

   This section defines the SDP Offer/Answer [RFC6454] considerations
   for the SDP 'identity' attribute.

6454 is "the Web Origin Concept"; presumably this is supposed to be
4566 (or 3264?).

Section 5.1.3

I feel like we need some text here about the (non?)trustworthiness of
the IdP.

Section 5.1.4

I'm a bit confused at what's going on here.  Is "MAY send the same"
supposed to prevent changing it?  If I don't send it, does that identity
continue to apply to the existing DTLS connections but not any new ones
generated by the session modification?  Am I allowed to send a different

                  Note that [I-D.ietf-rtcweb-jsep], Section 5.2.1
   requires that each media section use the same set of fingerprints for
   every media section.

nit: is this "each media section"/"every media section" redundant?

Section 6.1

   Also note that the security architecture depends on the keying
   material not being available to move between origins.  But, it is
   assumed that the identity assertion can be passed to anyone that the
   page cares to.

There may be some (weak) privacy considerations if this is literally
anyone, since it would allow some observers (with weird
abilities/restrictions) to associate "real" identities with keys in a
way that they couldn't otherwise do.

Section 6.2

   Because HTTP origins cannot be securely established against network
   attackers, implementations MUST NOT allow the setting of permanent
   access permissions for HTTP origins.  Implementations MUST refuse all
   permissions grants for HTTP origins.

Just to check: this last sentence applies for one-time requests, too?
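A sketch of how I read the quoted paragraph, assuming the second sentence is meant to cover every kind of grant. The function `may_grant_permission` and its `persistent` parameter are hypothetical and are not taken from the draft or from any browser API.

```python
from urllib.parse import urlsplit


def may_grant_permission(origin: str, persistent: bool) -> bool:
    """Return True if a media permission grant may be offered for origin.

    Under this reading, `persistent` never changes the outcome for HTTP
    origins: both one-time and permanent grants are refused.
    """
    scheme = urlsplit(origin).scheme.lower()
    if scheme != "https":
        # "Implementations MUST refuse all permissions grants for HTTP origins."
        return False
    # For HTTPS origins the quoted text imposes no restriction here.
    return True
```

If that reading is right, the separate MUST NOT on permanent permissions is implied by the last sentence and could be dropped or merged.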

                                           The semantics of this request
   are that the media stream from the camera and microphone will only be
   routed through a connection which has been cryptographically verified
   (through the IdP mechanism or an X.509 certificate in the DTLS-SRTP
   handshake) as being associated with the stated identity.  [...]

Does this need to be an exhaustive list or can we leave it open-ended?
Also, it may be appropriate to mention some concept of "IdP trusted to
authenticate the stated identity".

   API Requirement:  The API MUST provide a mechanism for the requesting
      JS to relinquish the ability to see or modify the media (e.g., via
      MediaStream.record()).  [...]

Do we need to say anything about that state transition being visible to
the peer, here?

   UI Requirement:  If the UI indication of camera/microphone use are
      camera and microphone input when the indication is hidden.  [Note:
      this may not be necessary in systems that are non-windows-based
      but that have good notifications support, such as phones.]

nit: s/windows/window/?

   Clients MAY permit the formation of data channels without any direct
   user approval.  Because sites can always tunnel data through the
   server, further restrictions on the data channel do not provide any
   additional security.  (though see Section 6.3 for a related issue).

Is there anything to say about why clients might not opt to do so (and
what such approval might look like)?

(My comments about "verified user" including the IdP in some way will
apply here as well.)

Section 6.3

   While continuing consent is required, the ICE [RFC8445]; Section 10
   keepalives use STUN Binding Indications which are one-way and
   therefore not sufficient.  The current WG consensus is to use ICE

Is the "the current WG consensus" language going to age well?

   Binding Requests for continuing consent freshness.  ICE already
   requires that implementations respond to such requests, so this
   approach is maximally compatible.  A separate document will profile
   the ICE timers to be used; see [RFC7675].

Is there a WIP draft for this separate document?

Section 6.4

   API Requirement:  The API MUST provide a mechanism to allow the JS to
      suppress ICE negotiation (though perhaps to allow candidate
      gathering) until the user has decided to answer the call [note:
      determining when the call has been answered is a question for the
      JS.]  This enables a user to prevent a peer from learning their IP
      address if they elect not to answer a call and also from learning
      whether the user is online.

nit: maybe make it more clear that this only applies for incoming calls?

Section 6.5

                                                           Media traffic
   MUST NOT be sent over plain (unencrypted) RTP or RTCP; that is,
   implementations MUST NOT negotiate cipher suites with NULL encryption
   modes.  [...]

It's not clear to me that the "that is" reflects a strict equivalence;
would "in particular" be more appropriate?
(Also, "cipher suite" is a DTLS term, but do we want to disambiguate

[obligatory "Perfect Forward Secrecy" vs. "Forward Secrecy" note]

   Implementations MUST NOT implement DTLS renegotiation and MUST reject
   it with a "no_renegotiation" alert if offered.

"MUST NOT implement" isn't really something that 2119 language can
enforce; "MUST NOT use" is the best we can get.

   Endpoints MUST NOT implement TLS False Start [RFC7918].

(7918 doesn't claim to be applicable to DTLS anyway)

   API Requirement:  Unless the user specifically configures an external
      key pair, different key pairs MUST be used for each origin.  (This
      avoids creating a super-cookie.)

nit: might be appropriate to note why we care about a super-cookie (and
what it is)

      *  The "security characteristics" MUST indicate the cryptographic
         algorithms in use (For example: "AES-CBC".)  However, if Null
         ciphers are used, that MUST be presented to the user at the
         top-level UI.

I'm not sure I see anywhere that we allow the usage of null ciphers.

Section 7

   Recently, a number of Web-based identity technologies (OAuth,
   Facebook Connect etc.) have been developed.  While the details vary,
   what these technologies share is that they have a Web-based (i.e.,
   HTTP/HTTPS) identity provider which attests to your identity.  For
   instance, if I have an account at, I could use the identity provider to prove to others that I was  [...]

I agree with Alissa that the first person is not needed here.

Section 7.1

   Third-Party:   IdPs which don't have control of their section of the
      identity space.  Probably the best-known example of a third-party
      identity provider is SSL/TLS certificates, where there are a large
      number of CAs all of whom can attest to any domain name.

This probably needs some qualifier, given recent developments with CAA
and similar mechanisms.

   If an AP is authenticating via an authoritative IdP, then the RP does
   not need to explicitly configure trust in the IdP at all.  The

The RP still needs to establish somehow that the IdP in use is in fact
an authoritative IdP, though!

Section 7.2

   In order to provide security without trusting the calling site, the
   PeerConnection component of the browser must interact directly with
   the IdP.  The details of the mechanism are described in the W3C API
   specification, but the general idea is that the PeerConnection

A reference to that W3C API spec might be handy.

Section 7.3

   There are two parts to this work:

   o  The precise information from the signaling message that must be
      cryptographically bound to the user's identity and a mechanism for
      carrying assertions in JSEP messages.  This is specified in
      Section 7.4.

nit: the grammar is a bit weird here, as the "information from the
signaling message" isn't really a part of this work, but rather the
specification for what information that is.

Section 7.4

The indentation of the line with "}, {" is a bit confusing.

   This object is encoded in a JSON [RFC8259] string for passing to the
   IdP.  The identity assertion returned by the IdP, which is encoded in

I'm a little confused what this "encoded in a JSON string" is supposed
to mean.

   This structure does not need to be interpreted by the IdP or the IdP
   proxy.  It is consumed solely by the RP's browser.  The IdP merely
   treats it as an opaque value to be attested to.  Thus, new parameters
   can be added to the assertion without modifying the IdP.

The IdP probably wants to know enough about its structure to not turn
into a signing oracle for other protocols, though.

Section 7.4.1

(RFC 8259 JSON inherently is UTF-8, so maybe we don't need to mention

It's a little surprising to see sha-1 fingerprint in use (since
"examples are recommendations"), though I didn't find anything that
would actually formally deprecate such usage yet.
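For reference, a sketch of how such a fingerprint value is typically computed: a hash over the DER-encoded certificate, written as colon-separated uppercase hex. The helper name `fingerprint` and the mapping table are my own; only the SDP names "sha-1" and "sha-256" are handled here.

```python
import hashlib

# Map SDP hash-function names to hashlib names (only the two discussed here).
_HASHLIB_NAME = {"sha-1": "sha1", "sha-256": "sha256"}


def fingerprint(der_cert: bytes, hash_func: str = "sha-256") -> str:
    """Return the colon-separated uppercase hex digest of der_cert."""
    digest = hashlib.new(_HASHLIB_NAME[hash_func], der_cert).digest()
    return ":".join(f"{b:02X}" for b in digest)
```

Switching the example to sha-256 only changes the label and lengthens the value from 20 to 32 octets, so the cost of updating the example looks small.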

   Note that long lines in the example are folded to meet the column
   width constraints of this document; the backslash ("\") at the end of
   a line and the carriage return that follows shall be ignored.

leading whitespace, too, right?

Section 7.5.2

(Still need to say how it's known that authoritative assertions are in
fact authoritative for what they claim.)

Section 7.6

   The input to identity assertion is the JSON-encoded object described
   in Section 7.4 that contains the set of certificate fingerprints the
   browser intends to use.  This string is treated as opaque from the
   perspective of the IdP.

(IdP still doesn't want to become a signing oracle.)

   For use in signaling, the assertion is serialized into JSON,
   Base64-encoded [RFC4648], and used as the value of the "identity"

nit: it's unclear that "serialized into JSON" adds any value, since the
thing is defined to be a JSON object.
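A rough sketch of the encoding step as I understand it. The field names in the sample object ("idp", "domain", "protocol", "assertion") and the function names are assumptions for illustration and do not appear in the text quoted above.

```python
import base64
import json


def encode_identity_attribute(assertion: dict) -> str:
    """Serialize the assertion object and wrap it in an SDP attribute line."""
    serialized = json.dumps(assertion, separators=(",", ":")).encode("utf-8")
    # b64encode emits the standard alphabet with "=" padding.
    return "a=identity:" + base64.b64encode(serialized).decode("ascii")


def decode_identity_attribute(line: str) -> dict:
    """Reverse encode_identity_attribute for a single attribute line."""
    prefix = "a=identity:"
    if not line.startswith(prefix):
        raise ValueError("not an identity attribute")
    return json.loads(base64.b64decode(line[len(prefix):]))


# Hypothetical sample assertion; values are placeholders.
sample = {
    "idp": {"domain": "idp.example", "protocol": "example-proto"},
    "assertion": "opaque-value-from-idp",
}
```

Seen this way, the JSON serialization is just the byte representation of the object, so the text could say "the assertion is Base64-encoded" without losing anything.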

Section 7.7

I think that the framing of HTTP Basic (7617) here is not great.
RFC 7235 might be a better link for HTTP Authentication in general, and
of course there are mechanisms that don't include sending the password
in plaintext, like SCRAM (RFC7804).

Section 8

   The IdP proxy verifies the assertion.  Depending on the identity
   protocol, the proxy might contact the IdP server or other servers.
   For instance, an OAuth-based protocol will likely require using the
   IdP as an oracle, whereas with a signature-based scheme might be able
   to verify the assertion without contacting the IdP, provided that it
   has cached the relevant public key.

IMPORTANT: Do we need a freshness property for the assertion?  Some of
these schemes do not provide freshness.

   Figure 6 shows an example response formatted as JSON for illustrative

(Doesn't the W3C API spec need to say how the response is formatted?  Is
the JSON formatting actually "illustrative" then, or is this just an
example output?)

Section 8.1

   2.  If the domain portion of the string is not equal to the domain
       name of the IdP proxy, then the PeerConnection object MUST reject
       the assertion unless:

Reading closely, I think this is supposed to be "unless either", but
it's easy to assume it should be read as "unless both", so I think
clarification is in order.

   Any "@" or "%" characters in the "user" portion of the identity MUST
   be escaped according to the "Percent-Encoding" rules defined in

We just said in the first paragraph that "user" has "any character
except '@'", so this is a bit redundant.
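A sketch of the escaping rule as I read it, assuming the identity has the form user@domain and that the split is made at the last "@". The function names are hypothetical.

```python
from urllib.parse import unquote


def format_identity(user: str, domain: str) -> str:
    """Percent-escape "%" and "@" in the user portion, then append the domain."""
    # Escape "%" first so the "%" introduced for "@" is not escaped again.
    escaped = user.replace("%", "%25").replace("@", "%40")
    return f"{escaped}@{domain}"


def parse_identity(identity: str) -> tuple[str, str]:
    """Split at the last "@" and undo the percent-escaping of the user part."""
    user, sep, domain = identity.rpartition("@")
    if not sep:
        raise ValueError("identity has no '@' separator")
    return unquote(user), domain
```

Once the escaping is applied, the encoded user portion can never contain a literal "@", which is why the two statements read as redundant; saying that the restriction applies to the encoded form would resolve it.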

Section 9.1

            Users who wish to assure themselves of security against a
   malicious identity provider can only do so by verifying peer
   credentials directly, e.g., by checking the peer's fingerprint
   against a value delivered out of band.

I suppose an "untrustworthy" IdP is basically a malicious one, though
there are perhaps some subtleties that could be distinguished here.

   In order to protect against malicious content JavaScript, that
   JavaScript MUST NOT be allowed to have direct access to---or perform
   computations with---DTLS keys.  For instance, if content JS were able
   to compute digital signatures, then it would be possible for content
   JS to get an identity assertion for a browser's generated key and
   then use that assertion plus a signature by the key to authenticate a
   call protected under an ephemeral Diffie-Hellman (DH) key controlled
   by the content JS, thus violating the security guarantees otherwise
   provided by the IdP mechanism.

I don't think I fully understand the scenario described in this last
sentence.  Is "compute digital signatures" supposed to be with some
specific secret key, and/or is "a browser's generated key" one that is
covered under the fingerprint in the IdP assertion?

Section 9.2

   Otherwise, the other side will learn linkable information.

nit: "linkable information that would allow them to correlate the
browser across multiple calls".

Section 9.3

   Consider the case of a call center which accepts calls via WebRTC.
   An attacker proxies the call center's front-end and arranges for
   multiple clients to initiate calls to the call center.  Note that
   this requires user consent in many cases but because the data channel
   does not need consent, he can use that directly.

I think I'm missing a step here.  How is the attacker using the data
channel directly when the point is to get the multiple browsers to send
the data on the data channel?

              Muxing multiple media flows over a single transport makes
   it harder to individually suppress a single flow by denying ICE
   keepalives.  Either media-level (RTCP) mechanisms must be used or the
   implementation must deny responses entirely, thus terminating the

nit: "must be used to suppress the misbehaving flow", I think.

Section 9.4.3

   The "origin" field of the signature request can be used to check that
   the user has agreed to disclose their identity to the calling site;
   because it is supplied by the PeerConnection it can be trusted to be

I don't see an "origin" field in the signature request; is this supposed
to be the "domain"?


nit: it might be friendlier to the reader to prefix this with "When
popup blocking is in use, ".

Section 13.2

It's perhaps debatable that JSEP is only an informative reference.

Deborah Brungard Former IESG member
No Objection
No Objection (for -18) Not sent

Ignas Bagdonas Former IESG member
No Objection
No Objection (for -18) Not sent

Martin Vigoureux Former IESG member
No Objection
No Objection (for -18) Not sent

Mirja Kühlewind Former IESG member
No Objection
No Objection (2019-02-28 for -18) Sent
1) This is related to my discuss on draft-ietf-rtcweb-security. I think I don't fully understand the split between those two documents, as section 4.2 seems to introduce a normative reference to draft-ietf-rtcweb-security:

  "As described in ([I-D.ietf-rtcweb-security]; Section 4.2) media
   consent verification is provided via ICE. "

However, given that section 6.3 actually normatively (re-)states the ICE requirements as well, I would maybe recommend to instead say:

  "As described in ([I-D.ietf-rtcweb-security]; Section 4.2) and stated in section 6.3 media
   consent verification is provided via ICE. "

and then move the reference to draft-ietf-rtcweb-security to informative.

2) I would have also expected some discussion in the security considerations sections about the risks to the user if the browser gets corrupted, as indicated by the trust model presented in sec 3.

3) In Sec 9.2: "Combined WebRTC/Tor
   implementations SHOULD arrange to route the media as well as the
   signaling through Tor. Currently this will produce very suboptimal
Maybe make these sentences a bit more general, e.g.
"Combined WebRTC/anonymity service
   implementations SHOULD arrange to route the media as well as the
   signaling through the anonymity network. Currently with e.g. Tor this will produce very suboptimal

Suresh Krishnan Former IESG member
No Objection
No Objection (for -18) Not sent

Eric Rescorla Former IESG member
Recuse (2019-03-06 for -18) Sent
I am an author