Ballot for draft-ietf-rtcweb-security
Yes
No Objection
Recuse
Note: This ballot was opened for revision 11 and is now closed.
I do not have strong views on the track, but if pressed, I lean towards PS.
PS seems like the appropriate status for this document given its role in the WebRTC document suite.

= Section 4.1.4 =

"The attacker forges the response apparently http://calling-service.example.com/ to inject JS to initiate a call to himself." --> This doesn't read correctly.

= Section 4.2.4 =

It seems like this section should reference draft-ietf-rtcweb-ip-handling.
I disagree that this should be Informational. It does have sections that have informational content, but it also has sections that serve as security considerations for WebRTC as a whole.

(nit) §4.2.1: Please expand ICE on first mention.
Thank you for this document. It made me more scared of using WebRTC, but I think it is Ok :-). The document seems to sometimes state problems without suggesting any solutions, but I don't have specific suggestions for how to improve it. It does read a bit Informational at times, but it also contains some RFC 2119 language, so I think the PS designation is Ok.
====== Previous DISCUSS section =======

I'd like to have a brief discussion about a few points, though it's not clear that any change to the document will be required (details in the COMMENT section for all of these):

Mutually-verifiable "secure mode" seems to require that the peer's browser be included in the TCB, which is a bit hard to swallow. Are we comfortable wrapping that in alongside "we trust the peer to not be malicious"?

It's not clear how much benefit we can get from *optional* third-party identity providers; won't the calling service have the ability to silently downgrade to their non-usage even if both calling peers support it?

============= COMMENT section =============

I mostly only have editorial comments, though there are a few that are more content-ful.

Section 1

   As with any Web application, the Web server can move logic between
   the server and JavaScript in the browser, but regardless of where the
   code is executing, it is ultimately under control of the server.

The user can observe the JavaScript running in the browser, though maybe this distinction is not necessary here.

Section 3

   Huang et al. [huang-w2sp] summarize the core browser security
   guarantee as:

      Users can safely visit arbitrary web sites and execute scripts
      provided by those sites.

I note that the author of this document is listed as a coauthor on huang-w2sp; does the self-cite really add much authority to the summary of the guarantee?

The use of ALL-CAPS to call out new terms feels a bit dated.

   Note that for non-HTTPS traffic, a network attacker is also a Web
   attacker, since it can inject traffic as if it were any non-HTTPS Web
   site.  Thus, when analyzing HTTP connections, we must assume that
   traffic is going to the attacker.
nit: I know this is a web-centric document, but privileging HTTPS as the only "secure" traffic reads a bit oddly to me; something like "note that in some cases, a network attacker is also a web attacker, since transport protocols that do not provide integrity protection allow the network to inject traffic as if it were any communications peer. TLS, and HTTPS in particular, protect against these attacks, but when analyzing HTTP connections, we must assume that traffic is going to the attacker." (A thought experiment might be to consider whether wss:// traffic counts as "HTTPS traffic".)

Section 3.1

It might be appropriate to provide some example references in place of "extensive research".

Section 4.1

   In either case, all the browser is able to do is verify and check
   authorization for whoever is controlling where the media goes.  [...]

nit: the wording here is a bit odd, since in case (1) you're verifying that you're talking to A, but you still control where the media goes (in terms of A or not-A; A can of course then forward the media on further).

   By contrast, consent to send network traffic is about preventing the
   user's browser from being used to attack its local network.  [...]

nit: "local" is perhaps overly restrictive, depending on interpretation.

Section 4.1.1

Maybe note that the "result" of the cross-site requests that is leaked is in the form of pixels and not structured data, but that does not change the information content.

Section 4.1.3

   Now that we have seen another use case, we can start to reason about

nit: I'm confused by "another" here.

   While not suitable for all cases, this approach may be useful for
   some.
   If we consider the case of advertising, it's not particularly
   convenient to require the advertiser to instantiate an iframe on the
   hosting site just to get permission; a more convenient approach is to
   cryptographically tie the advertiser's certificate to the
   communication directly.  We're [...]

This seems to be relying on the reader to have some background knowledge and to make some leaps of reasoning that may not be reasonable to expect.

   Another case where media-level cryptographic identity makes sense is
   when a user really does not trust the calling site.  For instance, I
   might be worried that the calling service will attempt to bug my
   computer, but I also want to be able to conveniently call my friends.

This is especially challenging because if the site (and/or its JavaScript) is in the path for binding a cryptographic identity to a real-world identity, then a malicious site can still get whatever keys it wants authorized.

Section 4.1.4

   3.  The attacker forges the response apparently http://calling-
       service.example.com/ to inject JS to initiate a call to himself.

We seem to be missing a word or two here.

   which contain untrusted content.  If a page from a given origin ever
   loads JavaScript from an attacker, then it is possible for that
   attacker to infect the browser's notion of that origin semi-
   permanently.

nit: "If any page" is more emphatic, I think.

Section 4.2

Do we want any discussion of the risks when metered bandwidth (pay per byte) is in use?

Section 4.2.1

There's probably some room to tighten up the verbiage here; e.g., "the site initiating ICE" is referring to a website that is using a browser API to request ICE against some remote peer (right?). And "ICE keepalives are indications" is using Indication as the technical term for a message that doesn't get an ACK response, not in its common English usage.
Section 4.2.2

A one- or two-sentence summary of the impact of misinterpretation attacks is probably in order, instead of making us follow the reference (which isn't a section reference).

   Where TCP is used the risk is substantial due to the potential
   presence of transparent proxies and therefore if TCP is to be used,
   then WebSockets style masking MUST be employed.

nit: "employed" to obfuscate what, exactly?

Section 4.2.3

   refuses to send other traffic until that message has been replied to.
   The message/reply pair must be generated in such a way that an
   attacker who controls the Web application cannot forge them,
   generally by having the message contain some secret value that must
   be incorporated (e.g., echoed, hashed into, etc.).  Non-ICE

nit: "incorporated" into what?

I think I'm a little confused about which legacy actors we're talking about. Are we still considering the broader situation of a webserver-mediated interaction between two browsers or browser-adjacent applications? (E.g., a WebRTC client calling some other sort of video chat system?)

   leaves.  The appropriate technologies here are fairly similar to
   those for initial consent, though are perhaps weaker since the
   threats is less severe.

nit: "threat is"

Section 4.2.4

   Note that as soon as the callee sends their ICE candidates, the
   caller learns the callee's IP addresses.  The callee's server
   reflexive address reveals a lot of information about the callee's
   location.  In order to avoid tracking, implementations may wish to
   suppress the start of ICE negotiation until the callee has answered.

Is "answered" supposed to be some interaction with the controlling site?

   In ordinary operation, the site learns the browser's IP address,
   though it may be hidden via mechanisms like Tor
   [http://www.torproject.org] or a VPN.
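To make the "incorporated into what?" question concrete, here is a toy sketch (mine, not from the draft) of a consent check in the style Section 4.2.3 describes: the browser mints a secret the page's JavaScript never sees, and the peer's reply must incorporate that secret (here by hashing it; echoing it verbatim would also qualify), so an attacker who controls only the Web application cannot forge a valid reply. All names are illustrative.

```python
import hashlib
import hmac
import os


def make_consent_check():
    """Browser-side: mint a fresh secret and challenge for one consent probe.

    The secret is generated inside the browser and never exposed to the
    page's JavaScript, which is what defeats a malicious Web application.
    """
    secret = os.urandom(32)
    challenge = os.urandom(16)
    return secret, challenge


def expected_reply(secret: bytes, challenge: bytes) -> bytes:
    """Peer-side: 'incorporate' the secret by hashing it into the reply."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def verify_reply(secret: bytes, challenge: bytes, reply: bytes) -> bool:
    """Browser-side: only a peer that actually saw the probe can answer."""
    return hmac.compare_digest(reply, expected_reply(secret, challenge))
```

The point of the sketch is only to show what "incorporated" plausibly means; the real mechanism in WebRTC is ICE/STUN connectivity checks, not this code.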
   However, because sites can cause the browser to provide IP addresses,
   this provides a mechanism for sites to learn about the user's network
   environment even if the user is behind a VPN that masks their IP
   address.  [...]

Some rewording for clarity is probably in order; "ordinary operation" is that of a website without WebRTC; "sites can cause the browser to provide IP addresses" is when the site uses the browser API to request ICE initiation; etc.

Section 4.3.1

[Obligatory note about "Forward Secrecy" vs. "Perfect Forward Secrecy"]

   to subsequent compromise.  It is this consideration that makes an
   automatic, public key-based key exchange mechanism imperative for
   WebRTC (this is a good idea for any communications security system)
   and this mechanism SHOULD provide perfect forward secrecy (PFS).  The
   signaling channel/calling service can be used to authenticate this
   mechanism.

To be clear, is the authentication that the calling service provides a binding between identity and the public keys that are input to the key exchange mechanism?

Section 4.3.2.1

   Even if the user actually checks the other side's name (which all
   available evidence indicates is unlikely), this would require (a) the
   browser to trusted UI to provide the name and (b) the user to not be
   fooled by similar appearing names.

nit: "browser to use trusted UI"

Section 4.3.2.3

It's not clear that third-party identity providers actually provide downgrade-resistance -- can't the site mediating the calls just decline to acknowledge that a third-party identity is/was available for the peer?

Section 4.3.2.4

   I.e., I must be able to verify that the person I am calling has
   engaged a secure media mode (see Section 4.3.3).  In order to achieve
   this it will be necessary to cryptographically bind an indication of
   the local media access policy into the cryptographic authentication
   procedures detailed in the previous sections.
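For readers unfamiliar with why Section 4.3.1's SHOULD matters: forward secrecy comes from using fresh, per-session key-exchange values, so that compromising a long-term credential later does not reveal past session keys. A toy finite-field Diffie-Hellman sketch (my illustration; the prime and generator are demo values, not a secure group — real WebRTC derives keys via DTLS with vetted groups) shows the shape:

```python
import hashlib
import secrets

# Toy parameters for demonstration only: 2**607 - 1 is prime, but this
# is NOT a secure DH group and must not be used for real traffic.
P = 2 ** 607 - 1
G = 5


def ephemeral_keypair():
    """Generate a fresh private/public pair per session.

    Because the private value is discarded after the call, a later
    compromise of long-term signing keys cannot recover this session's
    shared secret -- that is the forward-secrecy property.
    """
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)


def session_key(my_priv: int, their_pub: int) -> bytes:
    """Derive a symmetric session key from the DH shared secret."""
    shared = pow(their_pub, my_priv, P)
    return hashlib.sha256(
        shared.to_bytes((P.bit_length() + 7) // 8, "big")
    ).digest()
```

The calling service's role, per the draft, is then only to authenticate whose public values entered this exchange, which is exactly the binding my question above asks about.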
This seems to require extending the TCB from just the local browser to the remote browser as well, which is ... a stretch. (Also, do we really need the first person?)

Section 9.2

The coordinates for [OpenID] don't seem quite right.
I support PS. As the shepherd writeup says, this document will be the reference point for other work. To me, that says it is more than "informational".
Based on feedback provided by other ADs, I'm clearing my DISCUSS that this should be Informational. I would have also expected some discussion of the risks to the user if the browser gets corrupted, as indicated by the trust model presented in draft-ietf-rtcweb-security-arch. Alternatively, this document could go in an appendix of draft-ietf-rtcweb-security-arch instead.
I am an author