fixed, per origin, device ID creates tracking risk #607

Closed
pes10k opened this issue Jul 9, 2019 · 31 comments
Assignees
Labels
privacy-needs-resolution Issue the Privacy Group has raised and looks for a response on.

Comments

@pes10k

pes10k commented Jul 9, 2019

Problem:
The presence of fixed device IDs creates a significant tracking and privacy risk. This risk is somewhat mitigated by tying their lifetime to cookies, but this in practice is insufficient, since many privacy systems protect their users w/o clearing cookie stores (e.g. Safari's ITP, Brave enforces a fixed life time on JS set cookies, etc.).

In general, fixed IDs are dangerous for privacy.

Possible solution:
One possible way of addressing this issue would be to not use unique IDs but to just number the devices (1, 2, 3). The browser could keep track of the current device set and, if it has changed since the site last asked for access, re-prompt the user for permission.

Alt possible solution:
Double key deviceIds to local, top level frame. Prior discussion with PING states that this was completed but I don't see this anywhere in the spec (which says deviceIds must be unique by origin, not double keyed).

@pes10k
Author

pes10k commented Jul 9, 2019

other relaxations on deviceId permanence would also be helpful, such as:

  • deviceId = MUST -> MAY in terms of identifier being consistent between visits to the same origin
  • set a max life for deviceIds / permissions

Apologies if I'm misreading the doc, but I can't find where some of the issues discussed with PING previously have been addressed the way that summary states. Thanks!

@youennf
Contributor

youennf commented Jul 9, 2019

Double key deviceIds to local, top level frame.

#598 is tracking this approach.
As per my reading, the spec is not forbidding it but is not enforcing it either.
FWIW, that is what is implemented in Safari.
Device IDs in Safari are also only exposed after getUserMedia permission.

Note also #549, which partially mitigates this tracking (edited after @snyderp's next message).

@pes10k
Author

pes10k commented Jul 9, 2019

Hello @youennf:

Not sure I follow the above messages.

#598 seems to have concluded that no change is needed, and I don't see anything in the spec at all about double keying anything. "the spec is not forbidding [the fix] but is not enforcing it either" is not a solution to the problem introduced by the spec :)

Re: "FWIW, that is what is implemented in Safari." Great! If this is the correct solution to the privacy harm introduced by the spec, it needs to be in the spec though. It's not sufficient to clearly define the privacy harming behavior in the spec, but leave the mitigations vague and unspecified.

I don't understand how #549 mitigates tracking. Would you mind explaining further?

@youennf
Contributor

youennf commented Jul 9, 2019

#598 seems to have concluded that no change is needed, and I don't see anything in the spec at all about double keying anything. "the spec is not forbidding [the fix] but is not enforcing it either" is not a solution to the problem introduced by the spec :)

I am supportive of fixing this issue as well.
This should be done for all data types (IDB, service workers...).
If device IDs are partitioned but not IDB for instance, the tracking is still possible and this might also disrupt apps.

Ideally, there would be somewhere a spec describing how to do partitioning.
Then this spec would just refer to it for deviceIds.
Some work is being done in the fetch spec with regards to HTTP cache partitioning.

I don't understand how #549 mitigates tracking. Would you mind explaining further?

It helps mitigate the issue; it does not solve it.
With #549, a typical third party iframe will get an empty list of devices when calling enumerateDevices.
Only iframes which are allowed by the top level origin to capture through feature policy should be able to get a non empty list of devices.

@pes10k
Author

pes10k commented Jul 9, 2019

I agree, it would be good to have a single "storage partitioning, double-keying" definition / spec, but until that exists, we need to deal with the state of standards as is. If a future spec comes and makes specifying double-keying in this spec unnecessary, that's great and a perfect reason to revise this spec and remove it. But in the meantime, it seems we agree that double-keying is necessary, no?

I take your point re #549, but unfortunately, if the sorry state of privacy on the web has shown anything, it's that we can't rely on sites to protect user privacy; it needs to be baked into clients and standards. So, the standard needs to protect the user from a site that would like to track, not just allow a site to protect the user.

But either way, I'm not sure how #549 addresses what I imagine to be the typical scenario: two different sites have embedded the same origin (example.org) that would like to track the user AND allow the user to use the microphone / web cam / etc.

Under the current spec, all embedded example.org's would be able to track the user, even if the user only wanted to use the webcam on one embedded instance. Under a double key'ed solution that wouldn't be possible.

But, a bigger question: why do you need the keying at all? Why not just use something other than unique device IDs? What would be lost in the first suggestion in the issue? Worst case, the user would need to re-grant a permission when their media setup changes. Seems reasonable? :)

@youennf
Contributor

youennf commented Jul 10, 2019

But, a bigger question: why do you need the keying at all? Why not just use something other than unique device IDs? What would be lost in the first suggestion in the issue? Worst case, the user would need to re-grant a permission when their media setup changes. Seems reasonable? :)

The question is not really about permission.
The major usecase for deviceIds is for the user to do camera/microphone selection once. The website will be able to select again the same devices using deviceIds at next user visit.
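For illustration, here is a minimal sketch of that use case from the page's side. The storage key and the function name are hypothetical, and the `mediaDevices` and `storage` objects are passed in so the snippet can also be exercised outside a browser. It relies only on the standard `getUserMedia` constraint syntax and `MediaStreamTrack.getSettings()`.

```javascript
// Hypothetical storage key, not taken from the spec or any library.
const STORAGE_KEY = "preferredCameraId";

// Re-open the camera chosen on a previous visit, falling back to the default.
async function openPreferredCamera(mediaDevices, storage) {
  const savedId = storage.getItem(STORAGE_KEY);
  let stream = null;

  if (savedId) {
    try {
      // `exact` makes the request reject if that device is not available.
      stream = await mediaDevices.getUserMedia({
        video: { deviceId: { exact: savedId } },
      });
    } catch (err) {
      // The saved deviceId no longer matches (reset, unplugged, ...).
      stream = null;
    }
  }

  if (!stream) {
    stream = await mediaDevices.getUserMedia({ video: true });
  }

  // Remember whichever device was actually opened for the next visit.
  const settings = stream.getVideoTracks()[0].getSettings();
  if (settings.deviceId) {
    storage.setItem(STORAGE_KEY, settings.deviceId);
  }
  return stream;
}
```

In a page this would be called as `openPreferredCamera(navigator.mediaDevices, localStorage)`. The saved value is only useful while the browser keeps returning the same deviceId for the same device, which is exactly the persistence this issue is about.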

The web engine could also remember on its own the user selection and the website could opt-in with a deviceId like 'same-as-last-visit'. If we envision scenarios with multiple cameras/microphones, this probably does not work. Websites might also want to set up multiple audio routes, say for notification and regular media playback.

Another thing to bear in mind is that device ID values are not the only privacy threat. The number of cameras, microphones or output speakers can help identify users as well. Safari mitigates this issue by exposing device IDs after getUserMedia is granted (through a prompt). So far, this seems to be web compatible, at least for microphones and cameras.

@pes10k
Author

pes10k commented Jul 10, 2019

The question is not really about permission.
The major usecase for deviceIds is for the user to do camera/microphone selection once. The website will be able to select again the same devices using deviceIds at next user visit.

Totally agree. I only meant to suggest a different way of letting sites say "I want x and y, which I was allowed to access before, again", w/o consistent unique identifiers (which, AFAIK, are totally unique in the Web API).

Another thing to bear in mind is that device ID values are not the only privacy threat. The number of cameras, microphones or output speakers can help identify users as well. Safari mitigates this issue by exposing device IDs after getUserMedia is granted (through a prompt). So far, this seems to be web compatible, at least for microphones and cameras.

This is a great point! :) This should also be worked into the spec as default behavior. Privacy preserving w/o breaking expected use cases. Would you accept a PR?

@jan-ivar
Member

I agree, it would be good to have a single "storage partitioning, double-keying" definition / spec, but until that exists, we need to deal with the state of standards as is. If a future spec comes and makes specifying double-keying in this spec unnecessary, that's great and a perfect reason to revise this spec and remove it.

@Snyder Isn't it the opposite? Future partitioning of cookies and localStorage wouldn't make this unnecessary. Rather, it would seem like a prerequisite for the suggested mitigation to matter.

all embedded example.org's would be able to track the user,

They would be able to do that today without this spec using:

localStorage.fingerprint = Math.random().toString(36);

So where is the issue?

This risk is somewhat mitigated by tying their lifetime to cookies, but this in practice is insufficient, since many privacy systems protect their users w/o clearing cookie stores (e.g. Safari's ITP, Brave enforces a fixed life time on JS set cookies, etc.).

If the lifetime is tied to cookies, and Brave enforces fixed life time of cookies, doesn't that mitigate it?

The spec "recommends to treat the per-origin persistent identifier deviceId as other persistent storage (e.g. cookies) are treated."

I think our intent here was to make this no worse than cookies. OTOH there doesn't seem to be much point in wasting effort making it better than cookies, because JS would just store the ids in cookies then.

Those are non-normative recommendations though, so the spec could perhaps be stronger normatively here.

In Firefox, we're considering some mitigations for enumerateDevices pre-gUM-grant but those are motivated more by the actual user system bits exposed, like number of cameras and number of microphones, not the id.

@pes10k
Author

pes10k commented Jul 24, 2019

I agree, it would be good to have a single "storage partitioning, double-keying" definition / spec, but until that exists, we need to deal with the state of standards as is. If a future spec comes and makes specifying double-keying in this spec unnecessary, that's great and a perfect reason to revise this spec and remove it.

@Snyder Isn't it the opposite? Future partitioning of cookies and localStorage wouldn't make this unnecessary. Rather, it would seem like a prerequisite for the suggested mitigation to matter.

We might be saying the same thing here, but my point is that "if double keying isn't standardized somewhere else" (as it currently isn't) then it needs to be specified here, even if that text ends up being made redundant by future work.

localStorage.fingerprint = Math.random().toString(36);

Some browsers do, and more seem to plan to, take steps to make the above non harmful (again double keying, blocking storage in 3p frames, etc.). So that problem is (in some cases) being addressed by ongoing work. So my goal / concern in this issue is to make sure getUserMedia doesn't make that work more difficult / nullified.

If the lifetime is tied to cookies, and Brave enforces fixed life time of cookies, doesn't that mitigate it?

Brave currently blocks 3p cookies. By the text of the standard, that would mean that getUserMedia is broken, no?

The broader concern is that I don't think cookie lifetime is the right lifetime to key off of, since vendors are getting increasingly "tricky" (for good) in managing cookie lifetimes, and not only clearing cookies when the user hits "clear cookies".

Brave, for ex, blocks all 3p cookies. Brave and Safari treat the lifetimes of different cookies in the same frame differently (e.g. cap lifetime of JS set cookies to 7 days) etc. In the latter case, would that mean that device IDs are cycled every 7 days? (And if cookie A is JS set on day one, and cookie B on day two, does that mean device IDs are reset on day 8 and 9?)

All the above could be bypassed by just not using any long term unique IDs in the scheme at all, right? What's lost by the proposal in the issue? (I'm sure something better could be devised, but at least as an improvement.)

Those are non-normative recommendations though, so the spec could perhaps be stronger normatively here.

That seems like a terrific idea! I would be very happy to help in strengthening the normative, mandatory protections in the spec :)

@youennf
Contributor

youennf commented Jul 24, 2019

We might be saying the same thing here, but my point is that "if double keying isn't standardized somewhere else" (as it currently isn't) then it needs to be specified here, even if that text ends up being made redundant by future work.

The spec should probably describe this issue. It could recommend to use partitioning and/or some other mitigations.
It seems difficult right now to mandate partitioning if IndexedDB is not partitioned for instance.
This might not provide much benefit and might break valid websites using a WebRTC SDK iframe.

In Firefox, we're considering some mitigations for enumerateDevices pre-gUM-grant but those are motivated more by the actual user system bits exposed, like number of cameras and number of microphones, not the id.

These mitigations might make partitioning less of an issue.
Since there seems to be interest in these mitigations, it makes sense to describe these mitigations.
The spec could describe, recommend or even mandate them.

@jan-ivar
Member

We might be saying the same thing here, but my point is that "if double keying isn't standardized somewhere else" (as it currently isn't) then it needs to be specified here, even if that text ends up being made redundant by future work.

I'm saying the opposite: if double keying of both cookies and localStorage isn't standardized—which I doubt will ever happen, as it would break lots of stuff—then double-keying deviceIds seems futile. E.g.:

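if (!localStorage.fingerprint) {
  // Store the deviceIds as a string; assigning the raw array would save "[object MediaDeviceInfo],...".
  const devices = await navigator.mediaDevices.enumerateDevices();
  localStorage.fingerprint = JSON.stringify(devices.map(d => d.deviceId));
}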

My take on mediacapture's "treat ... deviceId as ... cookies ... are treated." is we're trying to avoid creating a new class of problems. I.e. whatever makes sense for cookies makes sense for deviceIds. If a browser clears or expires cookies after 7 days, then do the same for deviceIds. If third-party cookies are blocked, then block persisting third-party deviceIds.

So my goal / concern in this issue is to make sure getUserMedia doesn't make that work more difficult / nullified.

If browsers treat deviceIds like cookies then it won't.

If there's any language in the spec that contradicts this, I'd support removing it. If there's prose we can add to normatively enforce it, I'm for adding it.

@jan-ivar
Member

Put differently: deviceIds are useless if cookies and localStorage have been cleared. The whole point was for getUserMedia to recognize and accept ids the JS has stored from a previous visit.

@youennf
Contributor

youennf commented Jul 24, 2019

I'm saying the opposite: if double keying of both cookies and localStorage isn't standardized—which I doubt will ever happen.

Safari is implementing partitioning for localStorage, IDB and service workers without too much breakage. Cookies are indeed another story.

If third-party cookies are blocked, then block persisting third-party deviceIds.

It seems like you are advocating for some form of partitioning.
You might end up having different deviceIds for the same resource, be it loaded as a main frame (persistent ids) or loaded as a third-party iframe (non persistent ids).

@pes10k
Author

pes10k commented Jul 24, 2019

I'm saying the opposite: if double keying of both cookies and localStorage isn't standardized—which I doubt will ever happen, as it would break lots of stuff—then double-keying deviceIds seems futile.

This is already underway in some vendors, and some deployments. Whether or not it makes its way up into a full recommendation (🤞), it would be good to make sure this standard doesn't make that work more difficult. E.g. double keying deviceIds will help on those platforms / configurations today, and will help everywhere if / when double keying goes "general".

My take on mediacapture's "treat ... deviceId as ... cookies ... are treated." …

Sorry, but I'm not following here. The point is that (to take Safari for example) there is no single point when "cookies are cleared". For example, one currently deployed cookie management system is to reduce the lifetime of individual cookies, explicitly so the user never has to explicitly clear cookies. Tying device ID to cookie lifetime is not clear, and maybe impossible, in these cases. Consider the following:

  1. 3p frame A sets cookie B on day 1. Lifetime is capped by client to 7 days
  2. 3p frame A reads device IDs on day 2
  3. 3p frame A sets cookie C on day 3. Lifetime is capped by client to 7 days
  4. 3p frame A reads device IDs on day 4

What is the state of the device IDs on day 7, 8 and 14?

@snyderp All the above could be bypassed by just not using any long term unique IDs in the scheme at all, right? What's lost by the proposal in the issue? (I'm sure something better could be devised, but at least as an improvement.)

Can you clarify if there is a functionality loss by one of the more privacy preserving alternatives? It seems like there are clear privacy wins, so it'd be useful to get a sense of the trade offs

@jan-ivar
Member

frame A sets cookie C on day 3. Lifetime is capped by client to 7 days

You mean only lifetime of C is capped here? Or does this extend B's lifetime? I'll assume the former.

What is the state of the device IDs on day 7, 8 and 14?

Cleared on day 9 (7 days after first creation), if I got your cookie logic right.

Can you clarify if there is a functionality loss by one of the more privacy preserving alternatives?

The point of the deviceId is to let JS recognize a previously used device even if it's been disconnected and reinserted since last visit.

Double-keying deviceIds without double-keying cookies would break #607 (comment).

@pes10k
Author

pes10k commented Jul 26, 2019

@jan-ivar I'm not quite following the suggestion, because other cookies (HTTP set cookies) don't have a capped lifetime, and the cookie jar doesn't clear after 7 days; just the lifetime "at set" of each individual cookie is reduced. So it's not clear to me how any of this lines up with the text in the standard, which says to reset per-origin device identifiers when other persistent storage is cleared.

The larger point though, is that "device IDs should be cleared when cookies are cleared" is a necessary, but not sufficient, condition for protecting user privacy. Particularly as vendors are becoming more aggressive / thoughtful about ways of protecting user privacy that have nothing to do with clearing other persistent storage.

Double-keying deviceIds without double-keying cookies would break #607 (comment).

I'm not following here. The narrow suggestion here isn't to drop the idea of a device ID; it's to make the device identifier not uniquely identifying. What functionality is lost by replacing UUIDs with, say, simple integers, and having the client keep track of

  1. all observed devices (w/ a simple integer identifier for each one)
  2. whether site A has ever been given permission for device X (identified by a likely single digit int instead of a UUID)

A privacy preserving application can just keep track of whether it has access to "3" (not privacy violating) instead of some globally unique device id (potentially privacy harming).

In Firefox, we're considering some mitigations for enumerateDevices pre-gUM-grant but those are motivated more by the actual user system bits exposed, like number of cameras and number of microphones, not the id.

Sorry I let the above drop off. I think this is a fantastic idea, to the point that the standard is privacy harmful w/o it. Querying hardware capabilities w/o user permission is a hard line the standard can't allow. Is there a current issue tracking this concern, or would it be better to open a separate issue?

@alvestrand
Contributor

The use of "3" instead of a unique ID requires that "3" be stable in the face of unplugging devices and plugging them back in.

@pes10k
Author

pes10k commented Aug 15, 2019

@alvestrand one option would be to just have the browser keep track of that per origin, double keyed. So if I plug in webcam1, it gets "1"; if I unplug it and plug in webcam2, that gets "2"; if I unplug webcam2 and plug webcam1 back in, it's still "1", etc. The browser would need to track the same amount of state as it would in the current proposal, but w/o unique IDs, no?
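As a rough sketch of what that bookkeeping could look like inside a browser: the class name, method name and origins below are all hypothetical, and this is not how any engine is known to implement it. Each (top-level origin, frame origin) pair gets its own table mapping the OS-level device identifier to a small integer handle.

```javascript
// Hypothetical browser-internal registry for the "1, 2, 3" proposal.
class DeviceHandleRegistry {
  constructor() {
    // partition key -> { handles: Map<osDeviceId, number>, next: number }
    this.partitions = new Map();
  }

  // Returns a small integer that stays stable for this device within
  // one (top-level origin, frame origin) partition.
  handleFor(topLevelOrigin, frameOrigin, osDeviceId) {
    const key = JSON.stringify([topLevelOrigin, frameOrigin]);
    let partition = this.partitions.get(key);
    if (!partition) {
      partition = { handles: new Map(), next: 1 };
      this.partitions.set(key, partition);
    }
    if (!partition.handles.has(osDeviceId)) {
      partition.handles.set(osDeviceId, partition.next);
      partition.next += 1;
    }
    return partition.handles.get(osDeviceId);
  }
}

const registry = new DeviceHandleRegistry();
const site = "https://site.example";
console.log(registry.handleFor(site, site, "os-webcam1")); // 1
console.log(registry.handleFor(site, site, "os-webcam2")); // 2
console.log(registry.handleFor(site, site, "os-webcam1")); // 1 (replugged)
```

The handle is stable across unplugging and replugging, but the registry has to keep one table entry per device ever seen in each partition.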

@youennf
Contributor

youennf commented Aug 15, 2019

The browser would need to track the same amount of state as it would in the current proposal, but w/o unique ids, no?

At least in WebKit, this proposal requires more state to store.
WebKit is currently storing a single salt per partitioned origin.
This salt is used to generate an origin-unique deviceId based on the OS-provided deviceId.
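A minimal sketch of the salt-based idea, written for Node.js. The thread does not say how WebKit derives the ID from the salt, so the HMAC-SHA-256 construction here is an assumption chosen for illustration, and every function name is hypothetical.

```javascript
const { createHmac, randomBytes } = require("node:crypto");

// One random salt per partition (top-level origin + frame origin).
const salts = new Map();

function partitionKey(topLevelOrigin, frameOrigin) {
  return JSON.stringify([topLevelOrigin, frameOrigin]);
}

function saltFor(topLevelOrigin, frameOrigin) {
  const key = partitionKey(topLevelOrigin, frameOrigin);
  if (!salts.has(key)) {
    salts.set(key, randomBytes(32));
  }
  return salts.get(key);
}

// Derive the deviceId a page would see. HMAC-SHA-256 is an assumption;
// any keyed one-way function would illustrate the same point.
function deviceIdFor(topLevelOrigin, frameOrigin, osDeviceId) {
  return createHmac("sha256", saltFor(topLevelOrigin, frameOrigin))
    .update(osDeviceId)
    .digest("hex");
}

// Dropping the salt (e.g. when site data is cleared) rotates every
// deviceId in that partition at once.
function resetDeviceIds(topLevelOrigin, frameOrigin) {
  salts.delete(partitionKey(topLevelOrigin, frameOrigin));
}
```

With this design the only per-partition state is the salt itself, which is why a per-device table of integer handles means more state to store.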

@pes10k
Author

pes10k commented Aug 15, 2019

Ah, I see, fair point, implementation specific, I overspoke :)

So, more bytes, but same number of buckets (e.g. one bucket per origin per top level origin).

But, anyway, surely the issue here isn't storage space on disk ;)

@guidou
Contributor

guidou commented Aug 21, 2019

Chromium follows an approach similar to WebKit's, and maintaining consistency for the 1,2,3... IDs entails more complexity than just extra storage space on disk.
What about using empty IDs if a permission has not been given? It would still support the use case of detecting the addition/removal of a device to enable/disable features in the application and would mitigate the tracking concerns for users that have not given permission.

@pes10k
Author

pes10k commented Aug 22, 2019

@guidou I apologize, but I'm not certain I follow the suggestion (I'm sure I'm just being dense). If I understand, the same page, framed on two different top level domains, would be able to track a user through the same, recurring device IDs, once the user gives permission for the device, right?

So I think that addresses one of the concerns (that sites can easily get unique, persistent IDs), but doesn't address the other, related concern (the same unique ID being reused in different contexts).

@aboba aboba added privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. and removed TPAC 2019 labels Aug 22, 2019
@guidou
Contributor

guidou commented Aug 26, 2019

@snyderp Once permission is given, pages can use the label field to do the tracking, regardless of device IDs.

@guidou
Contributor

guidou commented Aug 26, 2019

An extra measure that can be taken to prevent tracking when permission has not been given is to list at most one entry per device kind (in addition to showing empty IDs).
This way, the number of devices becomes largely useless for tracking (only 8 possible outputs) but the use case of enabling and disabling features prior to authorization based on the presence of input devices can still be supported.
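To make the "8 possible outputs" concrete, here is a small sketch. The function names are hypothetical; the three kinds are the standard `MediaDeviceKind` values. Each kind is either present or absent in the redacted list, so there are 2 × 2 × 2 = 8 distinct results.

```javascript
const KINDS = ["audioinput", "audiooutput", "videoinput"];

// Collapse a full device list to at most one blank entry per kind,
// as proposed for the pre-permission case.
function redactBeforePermission(devices) {
  const present = new Set(devices.map((d) => d.kind));
  return KINDS.filter((kind) => present.has(kind)).map((kind) => ({
    kind,
    deviceId: "",
    label: "",
  }));
}

// Enumerate every combination of "kind present / absent" and count the
// distinct redacted lists a page could observe.
function countPossibleOutputs() {
  const seen = new Set();
  for (let mask = 0; mask < 2 ** KINDS.length; mask++) {
    const devices = KINDS.filter((_, i) => mask & (1 << i)).map((kind) => ({
      kind,
      deviceId: "some-id",
      label: "Some device",
    }));
    seen.add(JSON.stringify(redactBeforePermission(devices)));
  }
  return seen.size;
}

console.log(countPossibleOutputs()); // 8
```

A page can still tell whether any camera or microphone exists, which is enough to enable or disable UI, but it can no longer count devices or read their IDs.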

@youennf
Contributor

youennf commented Aug 26, 2019

@guidou, is it something that could be implemented in Chrome?
I believe that what you are proposing is in line with what Safari does and what Mozilla has in mind.
This could be the foundation to fix #612 and would greatly reduce the severity of this issue.

Some additional questions:

  1. Is there a use to expose audio output devices when device-info permission is not granted?
  2. If example.com main page is granted device-info permission, does it mean that any embedded example.com iframe is also granted device-info permission?

@guidou
Contributor

guidou commented Aug 26, 2019

@youennf That can be implemented in Chrome if we all agree that this is the way to go. We would have to confirm that no important regressions occur in applications using enumerateDevices().

  1. In Chrome, if no device permission has been granted, the output devices appear in the enumeration with an empty label field.
  2. If the embedded iframe is in the same domain, yes. If it's a cross-domain iframe, no.

@youennf
Contributor

youennf commented Aug 26, 2019

That can be implemented in Chrome if we all agree that this is the way to go.

Good to know. We will discuss this at TPAC, hopefully we can get agreement there.

We would have to confirm that no important regressions occur in applications using enumerateDevices().

The fact that Safari implements this behaviour gives some confidence that this is shippable although websites, especially Chrome specific ones, might have to adapt.
I guess some statistics could be gathered like pages doing an enumerateDevices call before doing a getUserMedia call.

@pes10k
Author

pes10k commented Aug 26, 2019

@guidou

@snyderp Once permission is given, pages can use the label field to do the tracking, regardless of device IDs.

But these will not be unique to a client though, no? It would be ideal for the site never to learn the label either, just a handle to refer to the device with, but as long as the site learns the label only after permission is granted, and the label is not unique to the client, then I'm far less concerned.

  1. If the embedded iframe is in the same domain, yes. If it's a cross-domain iframe, no.

What happens when the same 3p frame (i.e. third-party.com) appears in two different 1p domains? (I.e. are we still in a single key'ed world, or now a double key'ed world?)

An extra measure that can be taken to prevent tracking when permission has not been given is to list at most one entry per device kind (in addition to showing empty IDs).

This is a terrific idea!

@guidou
Contributor

guidou commented Aug 26, 2019

@guidou

@snyderp Once permission is given, pages can use the label field to do the tracking, regardless of device IDs.

But these will not be unique to a client though, no? It would be ideal for the site never to learn the label either, just a handle to refer to the device with, but as long as the site learns the label only after permission is granted, and the label is not unique to the client, then I'm far less concerned.

In Chromium, the labels are the same for all domains (provided it's the same devices) and (together with a cookie) are probably as good for tracking as device IDs.

  1. If the embedded iframe is in the same domain, yes. If it's a cross-domain iframe, no.

What happens when the same 3p frame (i.e. third-party.com) appears in two different 1p domains? (I.e. are we still in a single key'ed world, or now a double key'ed world?)

In Chromium it is single keyed. The device IDs are the same for third-party.com in both cases and are different from the IDs seen by 1p. If permissions are given, the labels are the same everywhere.
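The difference between the two keying schemes can be sketched as follows. The function names and origins are hypothetical, and this models only which contexts share a set of device IDs, not how any browser stores them.

```javascript
// Which "bucket" of device IDs a frame draws from under each scheme.
function idBucket(mode, topLevelOrigin, frameOrigin) {
  if (mode === "double") {
    // Double keyed: the top-level site is part of the key.
    return JSON.stringify([topLevelOrigin, frameOrigin]);
  }
  // Single keyed: only the frame's own origin matters.
  return JSON.stringify([frameOrigin]);
}

// Two embeds can be linked through device IDs if they share a bucket.
function canLinkViaDeviceIds(mode, embedA, embedB) {
  return (
    idBucket(mode, embedA.top, embedA.frame) ===
    idBucket(mode, embedB.top, embedB.frame)
  );
}

const onNews = { top: "https://news.example", frame: "https://third-party.example" };
const onShop = { top: "https://shop.example", frame: "https://third-party.example" };

console.log(canLinkViaDeviceIds("single", onNews, onShop)); // true
console.log(canLinkViaDeviceIds("double", onNews, onShop)); // false
```

Under single keying the embedded third party sees the same IDs on both sites; under double keying each embedding gets its own set.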

@pes10k
Author

pes10k commented Aug 30, 2019

In Chromium, the labels are the same for all domains (provided it's the same devices) and (together with a cookie) are probably as good for tracking as device IDs.

I think I'm getting lost in the conversation. Here is a summary, as I understand it, of where things stand:

  1. enumerateDevices will instead return one entry per device type (audio, video, etc.), and that entry will have neither a label nor a device ID
  2. Once a user gives a site access, the site can learn the true label for the device

The parts I'm still not following are whether we're getting closer to agreement on the following items:

  1. Modifying the standard to make the deviceID double keyed (note my understanding is that the WG already agreed to make this change during the previous HR)
  2. Coming up with some handle that a site can use to re-access a device the user has already given access to that isn't a persistent, unique global identifier. I've offered some suggestions (maybe naive) on how this could be done above. Having the browser maintain and present globally unique long term identifiers to sites would be a uniquely new privacy harm in the browser, so working out some other option here seems unavoidable.

Will look forward to having more of this conversation at TPAC in ~3 weeks, but happy to do anything I can in advance to try and come to agreement here / beforehand.

@w3cbot w3cbot added the privacy-needs-resolution Issue the Privacy Group has raised and looks for a response on. label May 7, 2020
@w3cbot w3cbot removed the privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. label May 15, 2020
@youennf
Contributor

youennf commented Jun 18, 2020

Closing as this largely overlaps with #682. #682 was fixed by #687

@youennf youennf closed this as completed Jun 18, 2020
@jan-ivar jan-ivar added the privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. label Oct 9, 2020
@w3cbot w3cbot removed the privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. label Oct 13, 2020