editorial: Add reading quantization and threshold check algorithms. #77
@anssiko @reillyeon @sandandsnow this is a companion to w3c/sensors#429 As mentioned there, AFAICS only specifying the granularity of the illuminance data is not enough, as the Chrome implementation also checks if the new reading differs from the latest one significantly enough, and IIRC @reillyeon mentioned only doing the rounding was not enough to avoid fingerprinting. Opens I can see immediately:
My initial impression is this looks good!
AFAICT, the "at least 50lx" threshold was informed by research conducted in this group with results collected using a setup described at #13 (comment). Optimally we'd link to this data in the spec so that privacy researchers can review the test setup and data easily and we can adjust this mitigation if new information is brought to our attention. Revise as needed. Rather than linking to a Google sheet, I'd prefer to see this data exported into an appendix in the spec, or alternatively convert the sheet into a markdown file stored in this repo. Again, I'll lean on @sandandsnow and other PING participants for privacy experts' perspective. |
@sandandsnow, how could the DAS WG help PING review this proposed privacy mitigation? This mitigation has already been implemented in Chromium. As proposed in this PR, the DAS WG would like to now normatively specify this mitigation so that other implementers could benefit from this and we are seeking PING review to capture your perspective. |
Thank you for bringing this to my attention, and thank you for addressing this vulnerability. I do not have the specialist expertise to determine if at least 50lx threshold is a sufficient mitigation, but I will confer with others for their views and revert shortly. |
@sandandsnow, thanks for your swift response. I put a reminder to check back the status of this RFC in a week. Please let us know if PING has a meeting cadence we should align with. We want to engage with PING as early as possible when there's a privacy-impacting concrete spec change proposal in review. Optimally, such proposals are not landed in the spec before PING has reviewed the proposed changes to minimise spec churn and to increase implementers' confidence. |
Hello, Thanks for not limiting to the frequency reduction which was not the central culprit of some past risks. I'm happy this gets formalised and I agree that this minimises the risks of such known attacks. Minimises, as it isn't clear if we're aware of the full risk potential. That said, this change helps, and likely fixes the most "reasonable" scenarios imaginable. I agree that "50 lx" is quite a strong limitation, unless for really specific circumstances (can't be ruled out but probably atypical anyway). Another approach could involve further reduction and possibly going from quantitative lux readout to qualitative description such as "bright", "dark", "very dark", etc. |
@lknik, thanks for your review. Also thanks for the earlier contributions as a WG participant that also helped improve the privacy properties of this API. You’re in acknowledgements. @sandandsnow, should we consider this to be PING’s official review or are we expecting more feedback? |
Always happy to help, @anssiko! Feel free to name the threshold check algorithm "Janc's algorithm" (of @arturjanc) :-) (j/k) |
@lknik thanks for weighing in |
@anssiko, we discussed this at our PING meeting on Thursday. As a consequence, there are a couple of follow-up questions. I was hoping to share them with you last week, but I'm waiting on colleagues to clarify those. |
Thank you. We discussed the proposed mitigations in the PING call today. As a result of that conversation we have a couple of follow-up questions:
And, a more general privacy question (not related to reducing granularity), how does the specification prevent or protect against cross-device tracking (e.g. the light equivalent of ultrasonic beacons)? More specifically, we have received these observations and comments:
@sandandsnow May I please ask why do you consider the 50lx thing through the lens of fingerprinting risks? The point was to minimise data leak risks (here, which also sums up your observations, too -- in other words, we know about this :)). The "more general question" about cross-device... I'd say it greatly lowers the risk, but it nonetheless remains (out of bands). |
1d1ad4d to f28ded8
(apologies in advance for the wall of text ahead)
Hi, @sandandsnow and @lknik. Thank you very much for all the time spent reviewing this PR (and special thanks to @lknik for being around and watching this API for years now). My apologies for the time it took me to get back to this change. At least I did spend some time working on and documenting the Generic Sensor implementation in Chromium, and I now have a better understanding of the mitigations I am trying to "upstream" here.
I've updated this PR as well as w3c/sensors#429 to address some of the feedback received here as well as to make the prose and algorithms better match what we have in Chromium. I strongly suggest looking at w3c/sensors#429 first and then reading this PR's diff. I'll go over the current solution and then try to address the concerns @sandandsnow has brought from PING.
Current change
Compared to the previous version from the end of 2021:
Things I'd like to discuss
PING's concerns
Done in the spec and also above, hopefully.
Please correct me if I'm wrong, but I'm under the impression that some of those concerns came up by looking at this spec in isolation without looking at the main Generic Sensor spec. https://w3c.github.io/sensors/#concepts-can-expose-sensor-readings and https://w3c.github.io/sensors/#abstract-operations mandate, for example, that:
It is up to each UA to implement "request permission to use", and it might involve prompting users, for example. At the moment, Chromium does not prompt users for access to motion sensors (e.g. accelerometer and gyroscope) but lets them allow or block access by default. We are also working on making this better by moving to prompting by default (and removing the "allow by default" option) as part of the work on implementing Device Orientation's
With the above in mind, let me try to get to the specific questions:
I believe the fingerprinting risk remains. Even though we reduce the granularity of the data exposed to API users, an attacker could still know that a user is e.g. at an office environment between certain hours (320-500lx per https://en.wikipedia.org/wiki/Lux#Illuminance), and walks under full daylight at a certain time of the day (1000 to 10000lx). The mitigations listed above help prevent that websites (including third-parties) have undetected and unprompted access to the data.
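To make the point above concrete, here is a minimal Python sketch (not taken from any implementation) of how a quantized reading can still be matched against the illuminance ranges quoted from Wikipedia. The names `quantize`, `guess_environment` and `ENVIRONMENTS` are hypothetical, the 50 lx multiple is the minimum proposed in this PR, and rounding ties upward is an assumption.

```python
import math

ROUNDING_MULTIPLE_LX = 50  # minimum rounding multiple proposed in this PR

# Ranges quoted in the comment above (office: 320-500 lx, daylight: 1000-10000 lx).
ENVIRONMENTS = {
    "office": (320, 500),
    "full daylight": (1000, 10000),
}


def quantize(lux):
    """Round to the closest multiple of 50 lx (ties round upward, an assumption)."""
    return math.floor(lux / ROUNDING_MULTIPLE_LX + 0.5) * ROUNDING_MULTIPLE_LX


def guess_environment(quantized_lux):
    """Return the first named range containing the reading, or None."""
    for name, (low, high) in ENVIRONMENTS.items():
        if low <= quantized_lux <= high:
            return name
    return None
```

For example, a raw reading of 412 lx is exposed as 400 lx, which still falls in the office range, and 5230 lx is exposed as 5250 lx, which still falls in the daylight range. The rounding hides small variations but not the coarse environment.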
The idea with the set of mitigations proposed here and in the Generic Sensor spec is to make the readings coarse enough to help prevent cross-device tracking while at the same time only making readings available to pages that fulfill the requirements above and which the user has authorized to gather data.
Do the mitigations above help make it more acceptable? I'm asking because this is also the case even for specs such as https://w3c.github.io/deviceorientation that are implemented by multiple engines: a
Are you referring to https://github.com/asankah/ephemeral-fingerprinting or is there another resource I could look at? That page lists several possible mitigations and we implement many of them, so I'm wondering if the Generic Sensor + ALS mitigations do address the concern at least partially?
Answered above: the permission side of things is handled in the main Generic Sensor spec, UAs are free to handle the permission request implementation, we want to add a prompt to the Chromium implementation. Additionally, when it comes to the covert channel attack, the bucketing idea also helps make it more difficult -- the idea looks similar to https://arturjanc.com/ls/ after all, which the bucketing idea is supposed to help address. |
Thanks! Unblocking my review, @sandandsnow @lknik & other PING folks' review is on the critical path.
still useful for API users. The value of 50 lux as a minimum for the
[=illuminance rounding multiple=] was determined in <a
href="https://github.com/w3c/ambient-light/issues/13#issuecomment-302393458">GitHub
issue #13</a> after different ambient light level measurements under different
It'd be indeed good to snapshot this table mentioned in #13 (comment) into this repo and link to it from the spec. Maybe https://www.npmjs.com/package/csv2md could help with the conversion.
I was considering doing that in a separate PR. Is there any other spec I could look at for inspiration? Should it be added to the spec itself or as a separate file? Wasn't there a different W3C process for handling this sort of data?
My recommendation would be to put this type of informational data in a separate file in the repo and link to it. Markdown happens to render nicely so that's an OK format. One example: https://github.com/webmachinelearning/webnn/blob/main/op_compatibility/first_wave_models.md
This data could also go to an appendix in the spec but that'd mean an HTML table that is less fun to maintain if needed.
The key idea here is to have a reference that won't rot in case that Google Sheet goes down. The exact format is not so important as long as it can be read without special software. See https://www.w3.org/Consortium/Persistence.html -- note this pledge predates GitHub, also w3c GH org is archived nowadays.
To be more fancy, we could have a local biblio entry for it.
Approved with a fix for a reference to an unknown definition.
31d469e to b4ebffe
@reillyeon while I have you here, could you take a look at the "threshold check algorithm" idea part of #77 (comment)? I'd like to double-check those items with you since you were around when this was discussed when reviewing the initial version of these mitigations in Chromium. |
It may be possible to defer the threshold checking to the hardware or operating system as long as the threshold is implemented as a delta from the previously reported value. It is important that this works correctly as it is critical to making rounding an effective mitigation when rounding to a value significantly higher than the noise in the system, as is the case with the ambient light sensor. |
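A minimal Python sketch of why the delta-based threshold matters, assuming the 50 lx rounding multiple and 25 lx threshold discussed in this PR. The function names are hypothetical, and comparing the raw reading against the last accepted raw reading is an assumption made for illustration, not a description of the Chromium code.

```python
import math

ROUNDING_MULTIPLE = 50           # lx, value discussed in this PR
THRESHOLD = ROUNDING_MULTIPLE / 2  # 25 lx


def quantize(lux):
    """Round to the closest multiple (ties round upward, an assumption)."""
    return math.floor(lux / ROUNDING_MULTIPLE + 0.5) * ROUNDING_MULTIPLE


def rounding_only(raw_readings):
    """Expose every reading after rounding, with no threshold check."""
    return [quantize(raw) for raw in raw_readings]


def threshold_then_round(raw_readings):
    """Accept a reading only if it differs enough from the last accepted one."""
    reported = None
    exposed = []
    for raw in raw_readings:
        if reported is None or abs(raw - reported) >= THRESHOLD:
            reported = raw  # the delta is measured from this value next time
        exposed.append(quantize(reported))
    return exposed


# Sensor noise of +/- 1 lx around the 75 lx rounding boundary, then a real change.
noisy = [74, 76, 74, 76, 130]
```

With rounding alone, `rounding_only(noisy)` yields `[50, 100, 50, 100, 150]`, so 2 lx of noise shows up as 50 lx jumps. With the threshold, `threshold_then_round(noisy)` yields `[50, 50, 50, 50, 150]`: only the real change gets through.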
Related to w3c#63, which says the granularity of the data exposed by Ambient Light Sensors should be specified normatively.
This commit goes a bit further and specifies the two anti-fingerprinting measures currently implemented by Chrome -- namely, not only are illuminance values rounded but there's also a threshold value check to avoid storing values that are too close to the latest reading.
w3c/sensors#429 defines the concepts of "reading quantization algorithm" and "threshold check algorithm" that concrete sensors can specify. We specify both here, along with some values used by them (based on the current Chromium values):
- An "illuminance rounding multiple" of at least 50lx.
- An "illuminance threshold value" of at least 25lx (half the illuminance rounding multiple, to be more precise).
These values are then used in the following algorithms:
- The "threshold check algorithm" checks that the difference between new and current illuminance values is above the illuminance threshold value.
- The "reading quantization algorithm" rounds readings to the closest multiple of the illuminance rounding multiple.
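The two algorithms described in this commit message can be sketched in Python as follows. This is an illustration under stated assumptions, not the spec text or the Chromium code: the function and constant names are hypothetical, the constants use the minimum values named above, rounding ties go upward, and a difference exactly equal to the threshold is treated as passing.

```python
import math

# Minimum values named in the commit message.
ILLUMINANCE_ROUNDING_MULTIPLE = 50.0                             # lx
ILLUMINANCE_THRESHOLD_VALUE = ILLUMINANCE_ROUNDING_MULTIPLE / 2  # 25 lx


def quantize_illuminance(lux):
    """Reading quantization: round to the closest multiple of the rounding multiple.

    Ties are rounded upward here; the tie-breaking rule is an assumption.
    """
    steps = math.floor(lux / ILLUMINANCE_ROUNDING_MULTIPLE + 0.5)
    return steps * ILLUMINANCE_ROUNDING_MULTIPLE


def passes_threshold_check(new_lux, latest_lux):
    """Threshold check: is the new reading far enough from the latest one?

    A missing latest reading always passes. Treating a difference equal to
    the threshold as passing is an assumption.
    """
    if latest_lux is None:
        return True
    return abs(new_lux - latest_lux) >= ILLUMINANCE_THRESHOLD_VALUE
```

With these values, a reading of 123.4 lx is exposed as 100 lx, and a new reading of 110 lx is discarded when the latest reading is 100 lx because the 10 lx difference is below the 25 lx threshold.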
This follows https://w3c.github.io/fingerprinting-guidance/#mark-fingerprinting and makes it clear that this API can increase the fingerprinting surface despite the proposed mitigations.
b4ebffe to 44e8b41
I've pushed a new version of this PR with a few changes:
@sandandsnow @lknik friendly ping, just wondering if any of you had time to take a look at the changes pushed to this PR as well as w3c/sensors#429 |
In my opinion, the threshold method helps mitigate the risk. Of course, some potential remains but it would be much more difficult to abuse in practice. The reason Fig 3/5 in the referenced PDFs vary so much may be due to the tested environment. In my tests, 50lx differences were also recorded routinely. However, in my view it is less likely (if mitigations are deployed) to abuse it in practice to e.g. exfiltrate data, as then the environmental changes would contribute less to a reliable abuse. So let's move forward. There's still a risk that some academic team will want to validate the boundary issues, but such is life :)
Thanks, @lknik, I really appreciate the review and that you've stuck around for so many years. For the record, you might also be interested in #79 where we discuss the permissions/permission prompt situation (and requiring the camera permission for this API), which is also part of the analysis you've written about ALS. |
Thanks @lknik for your privacy-focused suggestions and review throughout the years (plural). @sandandsnow we'd still be happy to hear PING's feedback for this proposed editorial improvement before we merge this PR. |
I'm happy to be guided by @lknik, but I have drawn this to the attention of my PING co-chairs in case they have anything further they wish to raise. |
@sandandsnow, it seems no concerns from the other PING co-chairs have been raised, could we merge this? If so, we'd appreciate it if you could submit your approval with the usual GH facility (Files changed > Review changes).
@anssiko We're happy for you to close the issue. |
Thanks @sandandsnow! |
These two operations have been referenced by the Ambient Light Sensor spec since w3c/ambient-light#77 but the `<dfn>`s in this spec were not properly exported.