fixed, per origin, device ID creates tracking risk #607
Comments
other relaxations on deviceId permanence would also be helpful, such as:
Apologies if I'm misreading the doc, but I can't find where some of the issues discussed with PING previously have been addressed the way that summary states. Thanks! |
#598 is tracking this approach. Note also #549, which partially mitigates this tracking (edited after @snyderp's next message). |
Hello @youennf: Not sure I follow the above messages. #598 seems to have concluded that no change is needed, and I don't see anything in the spec at all about double keying anything. "the spec is not forbidding [the fix] but is not enforcing it either" is not a solution to the problem introduced by the spec :) Re: "FWIW, that is what is implemented in Safari." Great! If this is the correct solution to the privacy harm introduced by the spec, it needs to be in the spec though. It's not sufficient to clearly define the privacy harming behavior in the spec, but leave the mitigations vague and unspecified. I don't understand how #549 mitigates tracking. Would you mind explaining further? |
I am supportive of fixing this issue as well. Ideally, there would be a spec somewhere describing how to do partitioning.
It helps mitigate the issue; it does not solve it. |
I agree, it would be good to have a single "storage partitioning, double-keying" definition / spec, but until that exists, we need to deal with the state of standards as is. If a future spec comes and makes specifying double-keying in this spec unnecessary, that's great and a perfect reason to revise this spec and remove it. But in the meantime, it seems we agree that double-keying is necessary, no? I take your point re #549, but unfortunately, if the sorry state of privacy on the web has shown anything, it's that we can't rely on sites to protect user privacy; it needs to be baked into clients and standards. So, the standard needs to protect the user from a site that would like to track, not just allow a site to protect the user. But either way, I'm not sure how #549 addresses what I imagine to be the typical scenario: two different sites have embedded the same origin (example.org) that would like to track the user AND allow the user to use the microphone / webcam / etc. Under the current spec, all embedded example.org's would be able to track the user, even if the user only wanted to use the webcam on one embedded instance. Under a double-keyed solution that wouldn't be possible. But, a bigger question: why do you need the keying at all? Why not just use something other than unique device IDs? What would be lost in the first suggestion in the issue? Worst case, the user would need to re-grant a permission when their media setup changes. Seems reasonable? :) |
The question is not really about permission. The web engine could also remember on its own the user selection and the website could opt in with a deviceId like 'same-as-last-visit'. If we envision scenarios with multiple cameras/microphones, this probably does not work. Websites might also want to set up multiple audio routes, say for notification and regular media playback. Another thing to bear in mind is that device ID values are not the only privacy threat. The number of cameras, microphones or output speakers can help identify users as well. Safari mitigates this issue by exposing device IDs only after getUserMedia is granted (through a prompt). So far, this seems to be web compatible, at least for microphones and cameras. |
Totally agree. I only meant to suggest a different way of having the ability for sites to say "I want x and y that I was allowed to access before, again", w/o consistent unique identifiers (which are, AFAIK, totally unique in the WebAPI).
This is a great point! :) This should also be worked into the spec as default behavior. Privacy preserving w/o breaking expected use cases. Would you accept a PR? |
@Snyder Isn't it the opposite? Future partitioning of cookies and localStorage wouldn't make this unnecessary. Rather, it would seem like a prerequisite for the suggested mitigation to matter.
They would be able to do that today without this spec using: localStorage.fingerprint = Math.random().toString(36); So where is the issue?
If the lifetime is tied to cookies, and Brave enforces a fixed lifetime for cookies, doesn't that mitigate it? The spec "recommends to treat the per-origin persistent identifier deviceId as other persistent storage (e.g. cookies) are treated." I think our intent here was to make this no worse than cookies. OTOH there doesn't seem to be much point in wasting effort making it better than cookies, because JS would just store the IDs in cookies then. Those are non-normative recommendations though, so the spec could perhaps be stronger normatively here. In Firefox, we're considering some mitigations for enumerateDevices pre-gUM-grant but those are motivated more by the actual user system bits exposed, like number of cameras and number of microphones, not the ID. |
We might be saying the same thing here, but my point is that "if double keying isn't standardized somewhere else" (as it currently isn't) then it needs to be specified here, even if that text ends up being made redundant by future work.
Some browsers do, and more seem to plan to, take steps to make the above non harmful (again double keying, blocking storage in 3p frames, etc.). So that problem is (in some cases) being addressed by ongoing work. So my goal / concern in this issue is to make sure getUserMedia doesn't make that work more difficult / nullified.
Brave currently blocks 3p cookies. By the text of the standard, that would mean that getUserMedia is broken, no? The broader concern is that I don't think cookie lifetime is the right lifetime to key off of, since vendors are getting increasingly "tricky" (for good) in managing cookie lifetimes, and not only clearing cookies when the user hits "clear cookies". Brave, for ex, blocks all 3p cookies. Brave and Safari treat the lifetimes of different cookies in the same frame differently (e.g. cap the lifetime of JS set cookies to 7 days) etc. In the latter case, would that mean that device IDs are cycled every 7 days? (and if cookie A is JS set on day one, and cookie B on day two, does that mean device IDs are reset on day 8 and 9?). All the above could be bypassed by just not using any long term unique IDs in the scheme at all, right? What's lost by the proposal in the issue? (I'm sure something better could be devised, but at least as an improvement)
That seems like a terrific idea! I would be very happy to help in strengthening the normative, mandatory protections in the spec :) |
The spec should probably describe this issue. It could recommend using partitioning and/or some other mitigations.
These mitigations might make partitioning less of an issue. |
I'm saying the opposite: if double keying of both cookies and localStorage isn't standardized (which I doubt will ever happen, as it would break lots of stuff), then double-keying deviceIds seems futile. E.g.:
    if (!localStorage.fingerprint) {
      // localStorage only holds strings, so serialize the device list before storing it.
      localStorage.fingerprint = JSON.stringify(await navigator.mediaDevices.enumerateDevices());
    }
My take on mediacapture's "treat ... deviceId as ... cookies ... are treated." is we're trying to avoid creating a new class of problems. I.e. whatever makes sense for cookies makes sense for deviceIds. If a browser clears or expires cookies after 7 days, then do the same for deviceIds. If third-party cookies are blocked, then block persisting third-party deviceIds.
If browsers treat deviceIds like cookies then it won't. If there's any language in the spec that contradicts this, I'd support removing it. If there's prose we can add to normatively enforce it, I'm for adding it. |
Put differently: deviceIds are useless if cookies and localStorage have been cleared. The whole point was for getUserMedia to recognize and accept ids the JS has stored from a previous visit. |
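One way to read "treat deviceIds like cookies" is as a pair of rules derived from whatever the browser already does for cookies. The following is only a rough sketch of that reading; the policy object shape and the names `canPersistDeviceIds` and `deviceIdExpiry` are hypothetical and do not come from the spec or any browser.

```javascript
// Hypothetical sketch: derive deviceId persistence rules from a browser's cookie rules.
// The policy shape and function names are illustrative assumptions, not spec text.

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed shape of a browser's cookie policy.
const examplePolicy = {
  blockThirdPartyCookies: true,
  maxCookieLifetimeDays: 7, // null would mean "no cap"
};

// If third-party cookies are blocked, do not persist third-party deviceIds either.
function canPersistDeviceIds(policy, isThirdPartyContext) {
  return !(policy.blockThirdPartyCookies && isThirdPartyContext);
}

// If cookie lifetime is capped to N days, expire deviceIds N days after creation.
// Returns a timestamp in milliseconds, or null when there is no cap.
function deviceIdExpiry(policy, createdAtMs) {
  if (policy.maxCookieLifetimeDays === null) return null;
  return createdAtMs + policy.maxCookieLifetimeDays * DAY_MS;
}
```

This only covers the two cases named in the comment above (a lifetime cap and third-party blocking); it says nothing about how per-cookie lifetimes that differ within one frame should map onto a single deviceId.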
Safari is implementing partitioning for localStorage, IDB, service workers without too much breakage. Cookies are indeed another story.
It seems like you are advocating for some form of partitioning. |
This is already underway in some vendors, and some deployments. Whether or not it makes its way up into a full recommendation (🤞), it would be good to make sure this standard doesn't make that work more difficult. E.g. double keying deviceIds will help on those platforms / configurations today, and will help everywhere if / when double keying goes "general".
Sorry, but I'm not following here. The point is that (to take Safari for example), there is no single point when "cookies are cleared". For example, one currently deployed cookie management system is to reduce the lifetime of individual cookies, explicitly so the user never has to explicitly clear cookies. Tying device ID to cookie lifetime is not clear, and maybe impossible, in these cases. Consider the following:
What is the state of the device IDs on day 7, 8 and 14?
Can you clarify if there is a functionality loss by one of the more privacy preserving alternatives? It seems like there are clear privacy wins, so it'd be useful to get a sense of the trade offs |
You mean only lifetime of C is capped here? Or does this extend B's lifetime? I'll assume the former.
Cleared on day 9 (7 days after first creation), if I got your cookie logic right.
The point of the deviceId is to let JS recognize a previously used device even if it's been disconnected and reinserted since last visit. Double-keying deviceIds without double-keying cookies would break #607 (comment). |
@jan-ivar I'm not quite following the suggestion, because other cookies don't have a capped lifetime (HTTP set cookies), and the cookie jar doesn't clear after 7 days, just the lifetime "at set" of each individual cookie is reduced, so it's not clear to me how any of this lines up with the text in the standard, which says
The larger point, though, is that "device IDs should be cleared when cookies are cleared" is a necessary, but not sufficient, condition for protecting user privacy. Particularly as vendors are becoming more aggressive / thoughtful about ways of protecting user privacy that have nothing to do with clearing other persistent storage.
I'm not following here. The narrow suggestion here isn't to drop the idea of a device ID; it's to make the device identifier not uniquely identifying. What functionality is lost by replacing UUIDs with, say, simple integers, and having the client keep track of
A privacy preserving application can just keep track of whether it has access to "3" (not privacy violating) instead of some globally unique device id (potentially privacy harming).
Sorry I let the above drop off. I think this is a fantastic idea, to the point that the standard is privacy harmful w/o it. Querying hardware capabilities w/o user permission is a hard line the standard can't allow. Is there a current issue tracking this concern, or would it be better to open a separate issue? |
The use of "3" instead of a unique ID requires that "3" be stable in the face of unplugging devices and plugging them back in. |
@alvestrand one option would be to just have the browser keep track of that per origin, double keyed. So if I plug in webcam1, it gets "1", if I unplug it and plug in webcam2, it gets "2", if I unplug webcam2 and plug webcam1 back in, it's still "1", etc. The browser would need to track the same amount of state as it would in the current proposal, but w/o unique IDs, no? |
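The numbering scheme described in that comment could look roughly like the sketch below. It is an illustration only: the class name `SequentialDeviceIds`, the `hardwareId` values, and the example origins are all hypothetical, and a real browser would also have to persist and expire this state.

```javascript
// Hypothetical sketch of per-partition sequential device handles.
// A partition is the (top-level origin, frame origin) pair, i.e. the double key.
class SequentialDeviceIds {
  constructor() {
    // partition key -> Map(hardwareId -> small integer handle)
    this.partitions = new Map();
  }

  // hardwareId stands in for a browser-internal identifier that is never
  // exposed to the page; only the returned integer is.
  idFor(topLevelOrigin, frameOrigin, hardwareId) {
    const key = JSON.stringify([topLevelOrigin, frameOrigin]);
    if (!this.partitions.has(key)) this.partitions.set(key, new Map());
    const handles = this.partitions.get(key);
    // The first device seen in a partition gets 1, the next gets 2, and so on.
    if (!handles.has(hardwareId)) handles.set(hardwareId, handles.size + 1);
    return handles.get(hardwareId);
  }
}

const handleStore = new SequentialDeviceIds();
const site = "https://a.example";
const firstPlug = handleStore.idFor(site, site, "webcam1");  // 1
const secondPlug = handleStore.idFor(site, site, "webcam2"); // 2
const replugged = handleStore.idFor(site, site, "webcam1");  // 1 again
```

Handles are only meaningful inside one partition: the same physical device can be "2" on one site and "1" on another, so the number alone does not link the two contexts.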
At least in WebKit, this proposal requires more state to store. |
Ah, i see, fair point, implementation specific, I over spoke :) So, more bytes, but same number of buckets (e.g. one bucket per origin per top level origin). But, anyway, surely the issue here isn't storage space on disk ;) |
Chromium follows a similar approach as WebKit and maintaining consistency for the 1,2,3... IDs entails more complexity than extra storage space on disk. |
@guidou I apologize, but I'm not certain I follow the suggestion (I'm sure I'm just being dense). If I understand, the same page, framed on two different top level domains, would be able to track a user through the same, recurring device IDs, once the user gives permission for the device, right? So I think that addresses one of the concerns (that sites can easily get unique, persistent IDs), but doesn't address the other, related concern (the same unique ID being reused in different contexts). |
@snyderp Once permission is given, pages can use the label field to do the tracking, regardless of device IDs. |
An extra measure that can be taken to prevent tracking when permission has not been given is to list at most one entry per device kind (in addition to showing empty IDs) |
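As a rough illustration of that measure, the filter below reduces a device list to at most one entry per kind with empty identifiers when permission has not been granted. The function name `filterBeforePermission` is hypothetical, and the input is an array of plain objects shaped like MediaDeviceInfo (`kind`, `deviceId`, `label`, `groupId`) rather than real browser objects.

```javascript
// Hypothetical sketch: what enumerateDevices() could expose before permission.
// `devices` is an array of plain objects shaped like MediaDeviceInfo.
function filterBeforePermission(devices, permissionGranted) {
  if (permissionGranted) return devices;
  const seenKinds = new Set();
  const result = [];
  for (const device of devices) {
    if (seenKinds.has(device.kind)) continue; // at most one entry per kind
    seenKinds.add(device.kind);
    // Keep only the kind; blank out everything that could identify the device.
    result.push({ kind: device.kind, deviceId: "", label: "", groupId: "" });
  }
  return result;
}

const attached = [
  { kind: "audioinput", deviceId: "abc", label: "Built-in mic", groupId: "g1" },
  { kind: "audioinput", deviceId: "def", label: "USB mic", groupId: "g2" },
  { kind: "videoinput", deviceId: "ghi", label: "Webcam", groupId: "g2" },
];
const visibleBeforeGrant = filterBeforePermission(attached, false);
// visibleBeforeGrant has two entries: one audioinput and one videoinput,
// both with empty deviceId, label and groupId.
```

With this shape a page can still learn whether at least one microphone or camera exists, but not how many there are.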
@guidou, is it something that could be implemented in Chrome? Some additional questions:
|
@youennf That can be implemented in Chrome if we all agree that this is the way to go. We would have to confirm that no important regressions occur in applications using enumerateDevices().
|
Good to know. We will discuss this at TPAC, hopefully we can get agreement there.
The fact that Safari implements this behaviour gives some confidence that this is shippable although websites, especially Chrome specific ones, might have to adapt. |
But these will not be unique to a client though, no? It would be ideal for the site never to learn the label either, just a handle to refer to the device with afterwards, but as long as the site learns the label only after permission is granted, and the label is not unique to the client, then I'm far less concerned.
What happens when the same 3p frame (i.e. third-party.com) appears in two different 1p domains? (i.e. are we still in a single-keyed world, or now a double-keyed world?)
This is a terrific idea! |
In Chromium, the labels are the same for all domains (provided it's the same devices) and (together with a cookie) are probably as good for tracking as device IDs.
In Chromium it is single keyed. The device IDs are the same for third-party.com in both cases and are different from the IDs seen by 1p. If permissions are given, the labels are the same everywhere. |
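The difference between the single-keyed behaviour described here and the double-keyed alternative can be sketched with a toy ID store. This is not how Chromium or any other browser implements it; `DeviceIdStore`, the origins, and the `"cam"` hardware identifier are hypothetical, and `Math.random()` is only a placeholder for a proper random generator.

```javascript
// Hypothetical sketch: opaque deviceIds keyed either by the embedded origin only
// (single keyed) or by (top-level origin, embedded origin) (double keyed).
class DeviceIdStore {
  constructor(doubleKeyed) {
    this.doubleKeyed = doubleKeyed;
    this.idsByKey = new Map(); // storage key -> opaque id
    this.usedIds = new Set();  // guarantees no two keys share an id
  }

  deviceId(topLevelOrigin, frameOrigin, hardwareId) {
    const parts = this.doubleKeyed
      ? [topLevelOrigin, frameOrigin, hardwareId]
      : [frameOrigin, hardwareId];
    const key = JSON.stringify(parts);
    if (!this.idsByKey.has(key)) {
      let id;
      do {
        // Placeholder randomness; a real browser would use a stronger source.
        id = Math.random().toString(36).slice(2);
      } while (id === "" || this.usedIds.has(id));
      this.usedIds.add(id);
      this.idsByKey.set(key, id);
    }
    return this.idsByKey.get(key);
  }
}

const thirdParty = "https://third-party.example";
const singleKeyed = new DeviceIdStore(false);
const doubleKeyed = new DeviceIdStore(true);

// Single keyed: the embedded frame sees the same id under both top-level sites.
const singleKeyedMatch =
  singleKeyed.deviceId("https://site-a.example", thirdParty, "cam") ===
  singleKeyed.deviceId("https://site-b.example", thirdParty, "cam"); // true

// Double keyed: the embedded frame sees a different id under each top-level site.
const doubleKeyedMatch =
  doubleKeyed.deviceId("https://site-a.example", thirdParty, "cam") ===
  doubleKeyed.deviceId("https://site-b.example", thirdParty, "cam"); // false
```

As noted in the comment above, labels exposed after permission would still be identical across sites, so double keying the IDs alone would not remove that signal.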
I think I'm getting lost in the conversation. Here is a summary, as I understand it, of where things stand:
The parts I'm still not following are whether we're getting closer to agreement on the following items:
Will look forward to having more of this conversation at TPAC in ~3 weeks, but happy to do anything I can in advance to try and come to agreement here / beforehand |
Problem:
The presence of fixed device IDs creates a significant tracking and privacy risk. This risk is somewhat mitigated by tying their lifetime to cookies, but in practice this is insufficient, since many privacy systems protect their users w/o clearing cookie stores (e.g. Safari's ITP, Brave enforcing a fixed lifetime on JS set cookies, etc.).
In general, fixed IDs are dangerous for privacy.
Possible solution:
One possible way of addressing this issue would be to not use unique IDs but to just number them (1, 2, 3). The browser could keep track of the current device set and re-prompt the user for permission if it has changed since the site last asked for access.
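The re-prompt check could be as simple as comparing the set of devices present at the last grant with the set present now. A minimal sketch, assuming the browser keeps such a set per site; `sameDeviceSet`, `needsReprompt`, and the device names are hypothetical.

```javascript
// Hypothetical sketch: re-prompt when the attached device set differs from the
// set that was present when the site was last granted access.
function sameDeviceSet(previous, current) {
  if (previous.size !== current.size) return false;
  for (const id of previous) {
    if (!current.has(id)) return false;
  }
  return true;
}

function needsReprompt(grantedSet, currentSet) {
  // No stored grant means the user has never been asked.
  if (grantedSet === null) return true;
  return !sameDeviceSet(grantedSet, currentSet);
}

const grantedWith = new Set(["webcam1", "mic1"]);
const unchanged = needsReprompt(grantedWith, new Set(["mic1", "webcam1"])); // false
const swapped = needsReprompt(grantedWith, new Set(["webcam2", "mic1"]));   // true
```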
Alt possible solution:
Double key deviceIds to local, top level frame. Prior discussion with PING states that this was completed but I don't see this anywhere in the spec (which says deviceIds must be unique by origin, not double keyed).