-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce 'webrtc-src'. #287
Conversation
The 'webrtc-src' directive is a proposal for handling WebRTC connections. This patch isn't exactly finished, but it should at least give a concrete proposal that we can discuss in #92.
@annevk, @alvestrand, @ckerschb, @martinthomson, @murillo128, @andypaicu, WDYT? |
<pre> | ||
directive-name = "webrtc-src" | ||
directive-value = <a grammar>serialized-source-list</a> | ||
</pre> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need some more language here explaining what "servers or peers to which connections may be established" means.
Suggested text off the top of my head:
The following situations are to be checked according to this directive:
- A host URL occurs in the list of RTCIceServers of an RTCConfiguration when a PeerConnection is created. In this case, the PeerConnection creation will fail.
- A host URL occurs in the list of RTCIceServers of an RTCConfiguration when a PeerConnection's setConfiguration method is called. In this case, setting the configuration will fail.
- An address occurs in the ip, protocol and port fields of an RTCIceCandidate created from SetRemoteDescription or AddIceCandidate. In this case, the call will be rejected.
And perhaps a note of caution, something like: "Due to the problem of listing all possible communication partners for a WebRTC application, the "*" value is likely to be the most useful value to set as the value of the "webrtc-src" directive".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note - in candidates, addresses are usually expressed as address literals (1.2.3.4 or 202b:47:9f::334) - matching a hostname against an address is a complex task, and we probably shouldn't require that it happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note - in candidates, addresses are usually expressed as address literals (
1.2.3.4
or202b:47:9f::334
) - matching a hostname against an address is a complex task, and we probably shouldn't require that it happens.
Hrm. The matching algorithm in https://w3c.github.io/webappsec-csp/#match-hosts actually specifically excludes IP addresses. We might need to rethink that if it's the way these things are typically specified in WebRTC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the SetRemoteDescription method fail if there are any non-whitelisted peers or filter them out and allow only the valid ones? I am thinking on a situation where you allow a TURN server ip address and want all the data go via it, but on the sdp you will have all the non-turn candidates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the SetRemoteDescription method fail if there are any non-whitelisted peers or filter them out and allow only the valid ones? I am thinking on a situation where you allow a TURN server ip address and want all the data go via it, but on the sdp you will have all the non-turn candidates.
Ideally, the WebRTC spec would answer those questions. I'd like CSP to give that feature hooks that express a page's policy decisions, but I just don't know WebRTC's API surface or developer expectations to make helpful suggestions about when you'd want to reject promises, or what kinds of points you'd want to guard with these hooks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that it is ok to require FQDN candidates when using something different than webrtc-src:*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer that this only define * in the first patch.
It is very common for WebRTC hosts to be specified via address literals to the point that this is the dominant method. Even for STUN and TURN. For this to fail on that make it considerably less useful.
Also, hosts are typically not authenticated by their name. That makes names a little less valuable.
If the intent is to prevent exfiltration, we can't allow application control over connections at all. CSP intentionally avoids address literals so that DNS queries don't leak information, but here we need a new strategy. It's possible that we want to govern STUN and TURN differently from candidates. Using serialized-source-list
seems fine for that purpose.
But using serialized-source-list
might prevent us from developing a sensible strategy for handling candidates in a sensible fashion.
Can we reserve values other than *
and leave this unspecified until we sort this out in more detail?
To the question of what setRemoteDescription
does, I think that there is a simple answer: nothing. The same thing we do if we get given SDP with bogus candidates now, or SDP with no candidates. Instead, we would have all connectivity checks toward that destination fail without even being attempted. Browsers might avoid putting those candidates in check lists to avoid gaps, but that would be up to them (the important thing is to not send the packet).
If WebRTC is disabled entirely (i.e., webrtc-src: none
or similar on the fallback path), then we might do something to indicate that sooner. That would be an amenity feature though, the security check should be when STUN messages go out.
How can we possibly get away with having |
@@ -1370,7 +1374,7 @@ <h4 id="should-block-navigation-response" algorithm> | |||
in |target| be blocked by Content Security Policy? | |||
</h4> | |||
|
|||
Given a <a for="/">request</a> (|navigation request|),, a string (|type|, either | |||
Given a <a for="/">request</a> (|navigation request|), a string (|type|, either |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated change?
Given a page with the following Content Security Policy: | ||
|
||
<pre> | ||
Content-Security-Policy: <a>webrtc-src</a> example.com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing colon.
<pre> | ||
directive-name = "webrtc-src" | ||
directive-value = <a grammar>serialized-source-list</a> | ||
</pre> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer that this only define * in the first patch.
It is very common for WebRTC hosts to be specified via address literals to the point that this is the dominant method. Even for STUN and TURN. For this to fail on that make it considerably less useful.
Also, hosts are typically not authenticated by their name. That makes names a little less valuable.
If the intent is to prevent exfiltration, we can't allow application control over connections at all. CSP intentionally avoids address literals so that DNS queries don't leak information, but here we need a new strategy. It's possible that we want to govern STUN and TURN differently from candidates. Using serialized-source-list
seems fine for that purpose.
But using serialized-source-list
might prevent us from developing a sensible strategy for handling candidates in a sensible fashion.
Can we reserve values other than *
and leave this unspecified until we sort this out in more detail?
To the question of what setRemoteDescription
does, I think that there is a simple answer: nothing. The same thing we do if we get given SDP with bogus candidates now, or SDP with no candidates. Instead, we would have all connectivity checks toward that destination fail without even being attempted. Browsers might avoid putting those candidates in check lists to avoid gaps, but that would be up to them (the important thing is to not send the packet).
If WebRTC is disabled entirely (i.e., webrtc-src: none
or similar on the fallback path), then we might do something to indicate that sooner. That would be an amenity feature though, the security check should be when STUN messages go out.
With regard to IP matching: In peer-to-peer WebRTC scenarios, candidates are discovered by inspecting the client's local interfaces. There is no reasonable mechanism for finding a FQDN that matches the IP address. (Reverse DNS lookup is not reasonable for this case.) For TURN servers, hostnames could be made to work. |
Candidates gathered by the browser won't work (either local, reflexive or server reflexive), but it is valid for an SDP generated by a gateway to add their FQDN instead of their ip address on the candidate. It won't cover the case in which you want to whitelist TURN servers tough, which, as you say, would require to resolve each host entry before doing the match. |
@alvestrand, doing a DNS lookup would allow the site to exfiltrate data. We can't allow the network to be triggered using site-provided inputs. @murillo128 is right that there are some cases where an IP address could work and it's not the case that we have the same constraints as other CSP directives. |
@martinthomson not sure if we are talking about the same. Let me explain myself with an example: Imagine that you have a conference-only scenario (i.e. no P2P) when all your media is going via your sfu. It could make sense to enable CSP to prevent any data going in/out from your service. You could add
What this scenario will not allow would be P2P calls with media going via the TURN server, as they remote candidates would have the turn server IP address and not the FQDN. |
@martinthomson I think we agree. The mechanism I was thinking of is too fragile and too complex.
but when A records can change between the loading of the page and the checking of a candidate, this is just Not Worth It - let's KISS. |
I need to ask: Does it actually make sense to provide anything more than an on/off switch for WebRTC? We not only have the problem that WebRTC leaks data to STUN/TURN servers. If WebRTC is turned on, it can leak data to arbitrary hosts via well-crafted ICE candidates even without any servers. So, if I use In case your answer would be Yes, I still think this is useful to the question above, then FWIW I also agree with @michaelficarra that inheriting from |
after speaking with a few teams that use webrtc within Microsoft, the consensus is this should just be a toggle for webrtc. You could trivially use a proxy to get around any kind of host restriction, not to mention the issues brought up on the most recent call that the current resolution doesn't currently cover the more commonly used (for webrtc) IP addresses. |
If we inherit from connect-src: and say that we only check for the single value '*', would that give the right level of granularity? What I want to have as properties is:
Separating the first 2 cases seems to indicate that we should be affected by connect-src; separating the last 2 cases seems to indicate that we need a webrtc-src: directive of some sort. I think just having webrtc-src: * or webrtc-allowed: yes is enough complexity for the benefit it gives, it's the interaction with connect-src / default-src that needs to be clear. |
I'm not a fan of a toggle for WebRTC in a page - namely it would be very problematic for people who are:
Basically, what would be beneficial for these stakeholders is the ability to indicate in the CSP which scripts are allowed to initiate WebRTC requests or not. Simply blacklisting WebRTC will make it unusable for them and they will benefit greatly from finer grained filtering. There is also adblockers who are a potential stakeholder of this feature and probably want/need fine grained control over what WebRTC to block. For example:
Alternatively:
Alternatively:
I also really want WebRTC blocking in the CSP to be a thing, it's something that could help large websites that utilize WebRTC and certainly those who don't who are fearful about third party scripts. I'd love to help in the spec process (I have a little experience with whatwg, but I haven't been involved in a w3c spec yet - at least not in the last few years). How can I get involved? @mikewest |
@benjamingr the main problem is that we cannot really distinguish between the different scripts. They all operate in the same global scope. Trying to assign blame to individual scripts gets complicated very quickly and there's basically no infrastructure for that (and no interest in solving it as far as I know). |
Thanks for clarifying, I realize this might be a bigger issue to solve - given WebRTC can always craft ICE candidates to bypass any non-global restriction and a global restriction pretty much ignores concerns of sites using WebRTC. If my site is not using WebRTC this might be useful for me, but if my site is using WebRTC then a blanket "block all" does not help improve the security guarantees of my site at all. As more sites will utilize P2P delivery in the future (in one way or another, the trend we're seeing) to benefit users in geographies with bad networking conditions (and users in general) this effectively means we're giving them a choice between security and performance (which is never a great thing to do). @lgrahl do you think there is a small(ish) set of changes that can be done to the RTCPeerConnection spec that might help us put a more reasonable limitation? |
@benjamingr I'm not saying it can't be done but everything my mind was able to come up with so far was overly complicated and I discarded it immediately. Token solutions for example are hard to get right and pretty much prevent any caching. |
@benjamingr I think the way you establish trust boundaries between scripts is to isolate the scripts you don't trust (the third party ones) inside an iframe. At the iframe boundary you can add arguments that stript the iframe of various capabilities, such as the ability to use webrtc. |
My conclusion from the discussion is that dealing with hostnames is too much cost and complexity for too little gain in actual benefit, and that the pure yes/no semantics is needed and needs to get published. |
In full honesty - IMO this is a bad technical solution that does not help the primary use case at all. I feel like you're choosing the easy solution over the correct one. |
What is the correct solution? |
@annevk something that would actually address WebRTC blocking use cases and isn't a kill-switch. I don't feel like the options for addressing that use case have been exhausted at all. Namely, being merged as is means everyone who wants to use WebRTC data channels for their own benefit on their own website would be unable to block untrusted third parties from opening WebRTC connections. For example:
Now to clarify this is a real use case, we happen to have a subset of our customers (>50) with a subset of our concurrent RTCPeerConnections (>100K, probably >1M) who would use this capability. I can probably easily reach out to smaller companies who work with WebRTC CDNs and get the number of interested parties to 100 (and this is websites delivering > TBs a month with WebRTC). This means we either don't have the capability at all (no WebRTC delivery) which is expensive and slow, or we have the capability and untrusted third parties can utilize it. |
We went through that scenario in #287 (comment) and down. The token-based solution @lgrahl mentions is reasonable, but also comes at quite a bit of additional complexity. It seems that such a solution could be persued on top of a simple on-off switch advocated for here. |
I feel like we haven't exhausted the options, I'm not very experienced with the spec and am unlikely to be able to contribute a more complete fix like @lgrahl or yourself. It is possible that this solution could pursued on top of a simple on-off switch, it's also possible it can't be or that the syntax will end up looking quirky with a correct solution because of the on-off switch. I really can't tell because we haven't had a concrete proposal for a "correct" solution yet - I'm only objecting to adding an API we know doesn't work from the get go. I don't have a lot of experience with web APIs but the ones I've participated in or observed had a very high standard before adding an API in. Typically use cases were discussed, consumers were talked with and edge cases were sorted out. Fetch even waited for two years for other platform APIs. I'd expect the same level of seriousness to be exercised here with the WebRTC CSP API - this might be an expectation problem on my had in which case I apologize for the mess I'm causing. I'm totally on board with any efforts you might have to gather use cases or discuss them. I can probably talk with potential users of the API and gather use cases or examples of scenarios users expect to work. I also don't mind reaching out to the community at large and gathering use cases there too if you think it would help. |
We can do token-based on top of on-off, just look at scripts and stuff. There's precedent for that in CSP so no need to worry and no need to block things on. We need on-off as a baseline because otherwise WebRTC is an exfiltration channel in an otherwise fully controlled environment. That's the thing that started this discussion. |
I guess the best kind of solution would be to have more fine granular control on a per-script level (as proposed earlier by @benjamingr) but I have no idea how feasible that is as they all operate in the same global scope. As an idealist, I'm puzzled that there isn't more interest in solving this because it seems to me as if this is one of the major issues of web environments since third party dependencies such as ad and analystic mixin scripts are a thing. |
I'd like to get this discussion moving again. Re: the use case of isolating different scripts within a single page, probably something like SES is needed for that to be tractable, as otherwise the scripts all live in the same global scope and can trample one another, so a CSP directive is useless for that level of granularity. If SES becomes widely available in the browser, then access can be mediated for code running in a compartment by providing that compartment with an attenuated API, with no need for headers to be involved. Some level of separation is possible now using iframes, and I'd like to see webrtc API objcts be transferable via So I'm of the opinion that we should add a simple on/off switch as far as CSP is concerned. |
2 years later I think a kill-switch is probably much better than having nothing if there is a way to add support for finer grained control in the future. |
What is needed to push this forward? A draft of another change just adding the on/off switch? |
https://whatwg.org/working-mode#changes has a rough list of requirements. (While this is not a WHATWG document, the requirements are high-level enough that it doesn't really matter.) |
To my mind, the critical part of pushing this forward is to ensure that there is someone who will ensure that the resources are available to implement it in browsers (and deal with the inevitable fallout of the effects of its introduction). |
This should probably be closed in favor of #457 |
The 'webrtc-src' directive is a proposal for handling WebRTC connections. This patch
isn't exactly finished, but it should at least give a concrete proposal that we can
discuss in #92.
Preview | Diff