Security Considerations for WebRTC

Summary: Has enough positions to pass.

(Ben Campbell) Yes

Comment (2019-03-04)
I disagree that this should be informative. It does have sections with informational content, but it also has sections that serve as security considerations for WebRTC as a whole.

(nit) §4.2.1: Please expand ICE on first mention.

Alissa Cooper Yes

Comment (2019-03-04)
PS seems like the appropriate status for this document given its role in the WebRTC document suite.

= Section 4.1.4 =

"The attacker forges the response apparently to inject JS to initiate a call to himself." --> This doesn't read correctly.

= Section 4.2.4 =

It seems like this section should reference draft-ietf-rtcweb-ip-handling.

(Spencer Dawkins) Yes

Adam Roach Yes

Ignas Bagdonas No Objection

Deborah Brungard No Objection

Comment (2019-03-05)
I support PS. As the shepherd writeup says, this document will be the reference point for other work. To me, that says it is more than "informational".

Benjamin Kaduk (was Discuss) No Objection

Comment (2019-03-06)
====== Previous DISCUSS section =======
I'd like to have a brief discussion about a few points, though it's not clear that any change
to the document will be required (details in the COMMENT section for all of these):

Mutually-verifiable "secure mode" seems to require that the peer's browser be included in
the TCB, which is a bit hard to swallow.  Are we comfortable wrapping that in alongside
"we trust the peer to not be malicious"?

It's not clear how much benefit we can get from *optional* third-party identity providers;
won't the calling service have the ability to silently downgrade to their non-usage even if
both calling peers support it?

============= COMMENT section =============

I mostly only have editorial comments, though there are a few that are
more content-ful.

Section 1

   with any Web application, the Web server can move logic between the
   server and JavaScript in the browser, but regardless of where the
   code is executing, it is ultimately under control of the server.

The user can observe the JavaScript running in the browser, though maybe
this distinction is not necessary here.

Section 3

                                                               Huang et
   al. [huang-w2sp] summarize the core browser security guarantee as:

      Users can safely visit arbitrary web sites and execute scripts
      provided by those sites.

I note that the author of this document is listed as a coauthor on
huang-w2sp; does the self-cite really add much authority to the
summary of the guarantee?

The use of ALL-CAPS to call out new terms feels a bit dated.

   that for non-HTTPS traffic, a network attacker is also a Web
   attacker, since it can inject traffic as if it were any non-HTTPS Web
   site.  Thus, when analyzing HTTP connections, we must assume that
   traffic is going to the attacker.

nit: I know this is a web-centric document, but the privileging of https
as the only "secure" traffic reads a bit oddly to me; something like
"note that in some cases, a network attacker is also a web attacker,
since transport protocols that do not provide integrity protection allow
the network to inject traffic as if they were any communications peer.
TLS, and HTTPS in particular, prevent against these attacks, but when
analyzing HTTP connections, we must assume that traffic is going to the
attacker."  (A thought experiment might be to consider whether wss://
traffic counts as "HTTPS traffic".)

Section 3.1

It might be appropriate to provide some example references in place of
"extensive research".

Section 4.1

                                                In either case, all the
   browser is able to do is verify and check authorization for whoever
   is controlling where the media goes.  [...]

nit: the wording here is a bit odd, since in case (1) you're verifying
you're talking to A, but you still control where the media goes (in
terms of A or not-A; A can of course then forward on the media further).

   contrast, consent to send network traffic is about preventing the
   user's browser from being used to attack its local network.  [...]

nit: "local" is perhaps overly restricting, depending on interpretation

Section 4.1.1

Maybe note that the "result" of the cross-site requests that is leaked
is in the form of pixels and not structured data, but that does not
change the information content.

Section 4.1.3

   Now that we have seen another use case, we can start to reason about

nit: I'm confused by "another" here.

                                              While not suitable for all
   cases, this approach may be useful for some.  If we consider the case
   of advertising, it's not particularly convenient to require the
   advertiser to instantiate an iframe on the hosting site just to get
   permission; a more convenient approach is to cryptographically tie
   the advertiser's certificate to the communication directly.  We're

This seems to be relying on the reader to have some background knowledge
and make some leaps of reasoning that may not be reasonable to expect.

   Another case where media-level cryptographic identity makes sense is
   when a user really does not trust the calling site.  For instance, I
   might be worried that the calling service will attempt to bug my
   computer, but I also want to be able to conveniently call my friends.

This is especially challenging because if the site (and/or its
javascript) is in the path for binding a cryptographic identity to a
real-world identity, then a malicious site can still get whatever keys
it wants authorized.

Section 4.1.4

   3.  The attacker forges the response apparently http://calling- to inject JS to initiate a call to himself.

seem to be missing a word or two here.

   which contain untrusted content.  If a page from a given origin ever
   loads JavaScript from an attacker, then it is possible for that
   attacker to infect the browser's notion of that origin semi-

nit: "If any page" is more emphatic, I think.

Section 4.2

Do we want any discussion of the risks when metered bandwidth (pay per
byte) is in use?

Section 4.2.1

There's probably some room to tighten up the verbiage here; e.g., "the
site initiating ICE" is referring to a website that is using a browser
API to request ICE against some remote peer (right?).  And "ICE
keepalives are indications" is using Indication as the technical term
for a message that doesn't get an ACK response, not in its common
English usage.

Section 4.2.2

A one- or two-sentence summary of the impact of misinterpretation
attacks is probably in order, instead of making us follow the reference
(which isn't a section reference).

   Where TCP is used the risk is substantial due to the potential
   presence of transparent proxies and therefore if TCP is to be used,
   then WebSockets style masking MUST be employed.

nit: "employed" to obfuscate what, exactly?
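(For concreteness, a sketch of what "WebSockets style masking" amounts to; this is an illustration of the RFC 6455 Section 5.3 technique, not text from the draft. The goal is to obfuscate attacker-chosen payload bytes so a transparent proxy cannot misinterpret them as, e.g., HTTP.)

```python
import os

def mask_payload(payload: bytes, key: bytes = None):
    """XOR-mask a payload with a 4-byte key, WebSocket style (RFC 6455,
    Section 5.3).  A fresh random key per message means on-path
    transparent proxies never see attacker-controlled plaintext bytes
    on the wire, so they cannot misinterpret them as another protocol."""
    if key is None:
        key = os.urandom(4)
    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return key, masked

def unmask_payload(key: bytes, masked: bytes) -> bytes:
    # XOR masking is its own inverse.
    _, payload = mask_payload(masked, key)
    return payload
```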

Section 4.2.3

   refuses to send other traffic until that message has been replied to.
   The message/reply pair must be generated in such a way that an
   attacker who controls the Web application cannot forge them,
   generally by having the message contain some secret value that must
   be incorporated (e.g., echoed, hashed into, etc.).  Non-ICE

nit: "incorporated" into what?
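(A minimal sketch of the message/reply construction being described, assuming the secret value is incorporated into the reply via an HMAC; the function names are hypothetical, not from the draft.)

```python
import hashlib
import hmac
import os

def make_consent_nonce() -> bytes:
    """The verifier sends a fresh, unpredictable value to the peer and
    refuses to send other traffic until it is answered."""
    return os.urandom(20)

def consent_reply(shared_key: bytes, nonce: bytes) -> bytes:
    """A valid reply must incorporate the secret (here, hashed into an
    HMAC), so JS controlled by the Web application cannot forge
    consent for an address it does not actually control."""
    return hmac.new(shared_key, nonce, hashlib.sha256).digest()

def verify_consent(shared_key: bytes, nonce: bytes, reply: bytes) -> bool:
    expected = hmac.new(shared_key, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, reply)
```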

I think I'm a little confused about which legacy actors we're talking
about.  Are we still considering the broader situation a
webserver-mediated interaction between two browsers or browser-adjacent
applications?  (E.g., a WebRTC client calling some other sort of video
chat system?)

   leaves.  The appropriate technologies here are fairly similar to
   those for initial consent, though are perhaps weaker since the
   threats is less severe.

nit: "threat is"

Section 4.2.4

   Note that as soon as the callee sends their ICE candidates, the
   caller learns the callee's IP addresses.  The callee's server
   reflexive address reveals a lot of information about the callee's
   location.  In order to avoid tracking, implementations may wish to
   suppress the start of ICE negotiation until the callee has answered.

Is "answered" supposed to be some interaction with the controlling site?

   In ordinary operation, the site learns the browser's IP address,
   though it may be hidden via mechanisms like Tor
   [] or a VPN.  However, because sites can
   cause the browser to provide IP addresses, this provides a mechanism
   for sites to learn about the user's network environment even if the
   user is behind a VPN that masks their IP address.  [...]

Some rewording for clarity is probably in order; "ordinary operation" is of
a website without WebRTC; "sites can cause the browser to provide IP
addresses" is when the site uses the browser API to request ICE
initiation; etc.
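(To make the leak concrete: everything below is an illustrative sketch, not draft text. The candidate lines in an SDP offer/answer directly expose addresses; "host" candidates reveal local interface addresses and "srflx" candidates reveal the public address seen by the STUN server, even when the site itself only sees a VPN exit address.)

```python
import re

# Simplified candidate-attribute pattern (see RFC 5245 grammar);
# captures the connection address and the candidate type.
CANDIDATE_RE = re.compile(
    r"candidate:\S+ \d+ \S+ \d+ (?P<addr>\S+) \d+ typ (?P<typ>\S+)")

def leaked_addresses(sdp: str):
    """Return the (address, candidate-type) pairs an SDP body reveals
    to whoever receives it -- i.e., what the site learns merely by
    causing the browser to gather ICE candidates."""
    return [(m.group("addr"), m.group("typ"))
            for m in CANDIDATE_RE.finditer(sdp)]
```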

Section 4.3.1

[Obligatory note about "Forward Secrecy" vs. "Perfect Forward Secrecy"]

   to subsequent compromise.  It is this consideration that makes an
   automatic, public key-based key exchange mechanism imperative for
   WebRTC (this is a good idea for any communications security system)
   and this mechanism SHOULD provide perfect forward secrecy (PFS).  The
   signaling channel/calling service can be used to authenticate this

To be clear, the authentication that the calling service provides is a
binding between identity and the public keys that are input to the key
exchange mechanism?
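(As a sketch of what that binding looks like in practice: the signaling channel carries a hash of the peer's certificate, which the DTLS handshake then proves possession of. This follows the RFC 8122 `a=fingerprint` format; the function name is hypothetical and the certificate bytes below are a placeholder, not a real DER certificate.)

```python
import hashlib

def sdp_fingerprint(cert_der: bytes) -> str:
    """Format a certificate hash as an SDP fingerprint attribute
    (RFC 8122 style).  The calling service authenticates the call by
    binding the peer's identity to this value in signaling; DTLS then
    verifies the handshake certificate against it."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    hexpairs = ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
    return f"a=fingerprint:sha-256 {hexpairs}"
```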


                                                       Even if the user
   actually checks the other side's name (which all available evidence
   indicates is unlikely), this would require (a) the browser to trusted
   UI to provide the name and (b) the user to not be fooled by similar
   appearing names.

nit: "browser to use trusted UI"


It's not clear that third-party identity providers actually provide
downgrade-resistance -- can't the site mediating the calls just decline
to acknowledge that a third-party identity is/was available for the peer?


                                    I.e., I must be able to verify that
   the person I am calling has engaged a secure media mode (see
   Section 4.3.3).  In order to achieve this it will be necessary to
   cryptographically bind an indication of the local media access policy
   into the cryptographic authentication procedures detailed in the
   previous sections.

This seems to require extending the TCB from just the local browser to
the remote browser as well, which is ... a stretch.
(Also, do we really need the first person?)

Section 9.2

The coordinates for [OpenID] don't seem quite right.

Suresh Krishnan No Objection

Warren Kumari No Objection

Comment (2019-03-05)
I do not have strong views on the track, but if pressed, I lean towards PS.

Mirja Kühlewind (was Discuss) No Objection

Comment (2019-03-07)
Based on feedback provided by other ADs, I'm clearing my discuss that this should be informational.

I would have also expected some discussion about the risks to the user if the browser gets corrupted, as indicated by the trust model presented in draft-ietf-rtcweb-security-arch. Alternatively, this document could go in the appendix of draft-ietf-rtcweb-security-arch instead.

Alexey Melnikov No Objection

Comment (2019-03-03)
Thank you for this document. It made me more scared of using WebRTC, but I think it is Ok :-).

The document seem to sometimes state problems without suggesting any solutions, but I don't have specific suggestions how to improve it. It does read a bit Informational at times, but it also contains some RFC 2119 language, so I think PS designation is Ok.

Alvaro Retana No Objection

Martin Vigoureux No Objection

(Eric Rescorla) Recuse

Comment (2019-03-06)
I am an author

Roman Danyliw No Record

Barry Leiba No Record

Éric Vyncke No Record

Magnus Westerlund No Record