Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

editorial: Add reading quantization and threshold check algorithms. #77

Merged
merged 2 commits into from
Aug 22, 2022

Conversation

rakuco
Copy link
Member

@rakuco rakuco commented Dec 15, 2021

Related to #63, which says the granularity of the data exposed by Ambient
Light Sensors should be specified normatively.

This commit goes a bit further and specifies the two anti-fingerprinting
measures currently implemented by Chrome -- namely, not only are illuminance
values rounded but there's also a threshold value check to avoid storing
values that are too close to the latest reading.

w3c/sensors#429 defines the concepts of "reading quantization algorithm" and
"threshold check algorithm" that concrete sensors can specify. We specify
both here, along with some values used by them (based on the current
Chromium values):

  • An "illuminance rounding multiple" of at least 50lx.
  • An "illuminance threshold value" of at least 25lx (half the illuminance
    roundig multiple, to be more precise).

These values are then used in the following algorithms:

  • The "threshold check algorithm" checks that the difference between new and
    current illuminance values is above the illuminance threshold value.
  • The "reading quantization algorithm" rounds up readings to the closest
    multiple of the illuminance rounding multiple.

Preview | Diff

@rakuco
Copy link
Member Author

rakuco commented Dec 15, 2021

@anssiko @reillyeon @sandandsnow this is a companion to w3c/sensors#429

As mentioned there, AFAICS only specifying the granularity of the illuminance data is not enough, as the Chrome implementation also checks if the new reading differs from the latest one significantly enough, and IIRC @reillyeon mentioned only doing the rounding was not enough to avoid fingerprinting.

Opens I can see immediately:

  • It is not clear if "at least 50lx" makes sense or if we should mandate 50lx specifically in the values above.
  • It is also not clear if implementations should have leeway to specify different values for the "illuminance threshold value" and the "illuminance rounding multiple" (or if the values should depend on each other).
  • The changes to the illuminance getter are probably too hand-wavy. I did not know whether the actual algorithm (this is what Chrome does at the moment) must be specified or not.
  • Chrome does the rounding before notifying Sensor instances (e.g. the getter just calls "get value from latest readings") by storing raw readings but sharing rounded readings with instances, but I guess this is irrelevant from a spec perspective?
  • The names for the new concepts and values are not great :-)

@rakuco rakuco changed the title Mandate that illuminance readings be rounded; require threshold value check. RFC: Mandate that illuminance readings be rounded; require threshold value check. Dec 15, 2021
@anssiko
Copy link
Member

anssiko commented Dec 16, 2021

My initial impression is this looks good!

It is not clear if "at least 50lx" makes sense or if we should mandate 50lx specifically in the values above.

AFAICT, the "at least 50lx" threshold was informed by research conducted in this group with results collected using a setup described at #13 (comment).

Optimally we'd link to this data in the spec so that privacy researchers can review the test setup and data easily and we can adjust this mitigation if new information is brought to our attention. Revise as needed.

Rather than linking to a Google sheet, I'd prefer to see this data exported into an appendix in the spec, or alternatively convert the sheet into a markdown file stored in this repo.

Again, I'll lean on @sandandsnow and other PING participants for privacy experts' perspective.

@anssiko
Copy link
Member

anssiko commented Feb 3, 2022

@sandandsnow, how could the DAS WG help PING review this proposed privacy mitigation?

This mitigation has already been implemented in Chromium.

As proposed in this PR, the DAS WG would like to now normatively specify this mitigation so that other implementers could benefit from this and we are seeking PING review to capture your perspective.

@sandandsnow
Copy link

Thank you for bringing this to my attention, and thank you for addressing this vulnerability. I do not have the specialist expertise to determine if at least 50lx threshold is a sufficient mitigation, but I will confer with others for their views and revert shortly.

@anssiko
Copy link
Member

anssiko commented Feb 7, 2022

@sandandsnow, thanks for your swift response. I put a reminder to check back the status of this RFC in a week. Please let us know if PING has a meeting cadence we should align with.

We want to engage with PING as early as possible when there's a privacy-impacting concrete spec change proposal in review. Optimally, such proposals are not landed in the spec before PING has reviewed the proposed changes to minimise spec churn and to increase implementers' confidence.

@lknik
Copy link

lknik commented Feb 11, 2022

Hello,

Thanks for not limiting to the frequency reduction which was not the central culprit of some past risks. I'm happy this gets formalised and I agree that this minimises the risks of such known attacks. Minimises, as it isn't clear if we're aware of the full risk potential. That said, this change helps, and likely fixes the most "reasonable" scenarios imaginable.

I agree that "50 lx" is quite a strong limitation, unless for really specific circumstances (can't be ruled out but probably atypical anyway). Another approach could involve further reduction and possibly going from quantitative lux readout to qualitative description such as "bright", "dark", "very dark", etc.

@anssiko
Copy link
Member

anssiko commented Feb 11, 2022

@lknik, thanks for your review. Also thanks for the earlier contributions as a WG participant that also helped improve the privacy properties of this API. You’re in acknowledgements.

@sandandsnow, should we consider this to be PING’s official review or are we expecting more feedback?

@lknik
Copy link

lknik commented Feb 11, 2022

Always happy to help, @anssiko!

Feel free to name the threshold check algorithm "Janc's algorithm" (of @arturjanc) :-) (j/k)

@sandandsnow
Copy link

@lknik thanks for weighing in
@anssiko I am happy to defer to @lknik's expertise in this area. I don't imagine there will be further feedback, but I'll put this on the agenda for the next PING call on 17 February 2022 in case anyone wants to raise any further (and final) input. Thanks for your patience.

@sandandsnow
Copy link

@anssiko, we discussed this at our PING meeting on Thursday. As a consequence, there are a couple of follow-up questions. I was hoping to share them with you last week, but I'm waiting on colleagues to clarify those.

@sandandsnow
Copy link

Thank you. We discussed the proposed mitigations in the PING call today. As a result of that conversation we have a couple of follow-up questions:

  • Could you clarify for us why the WG chose a 50lx threshold?
  • Also, notwithstanding the mitigations, is there still a fingerprinting risk (albeit a reduced risk)? More specifically, to what extent does reducing to a 50lx threshold (or any threshold) prevent fingerprinting on the basis of opening up the capacity to track a user through typical behavior patterns?

And, a more general privacy question (not related to reducing granularity), how does the specification prevent or protect against cross-device tracking (e.g. the light equivalent of ultrasonic beacons)?

More specifically, we have received these observations and comments:

  • Even bucketing by 50lux still seems to expose a lot of fingerprinting surface (>=4bits given the range here), which doesn’t seem acceptable
  • Bucketing doesn’t seem to address the “ephemeral fingerprinting” concern
  • This API seems like an extremely infrequently needed feature (as evidenced by most browsers not being interested in implementing); so, why not put it behind a permission prompt?
  • This seems to be easily exploitable as a covert channel (write to the channel by changing the brightness of the content on the page, read from the channel through the brightness sensor). The spec needs to address this (e.g. through permission prompt)

@lknik
Copy link

lknik commented Feb 23, 2022

@sandandsnow May I please ask why do you consider the 50lx thing through the lens of fingerprinting risks? The point was to minimise data leak risks (here, which also sums up your observations, too -- in other words, we know about this :)). The "more general question" about cross-device... I'd say it greatly lowers the risk, but it nonetheless remains (out of bands).

@rakuco rakuco force-pushed the add-privacy-terms-and-enhancements branch from 1d1ad4d to f28ded8 Compare June 3, 2022 12:24
@rakuco rakuco changed the title RFC: Mandate that illuminance readings be rounded; require threshold value check. RFC: editorial: Add reading quantization and threshold check algorithms. Jun 3, 2022
@rakuco
Copy link
Member Author

rakuco commented Jun 3, 2022

(apologies in advance for the wall of text ahead)

Hi, @sandandsnow and @lknik.

Thank you very much for all the time spent reviewing this PR (and special thanks to @lknik for being around and watching this API for years now). My apologies for the time it took me to get back to this change. At least I did spend some time working and documenting the Generic Sensor implementation in Chromium and have a better understanding of the mitigations I am trying to "upstream" here.

I've updated this PR as well as w3c/sensors#429 to address some of the feedback received here as well as to make the prose and algorithms better match what we have in Chromium. I strongly suggest looking at w3c/sensors#429 first and then reading this PR's diff. I'll go over the current solution and then try to address the concerns @sandandsnow has brought from PING.

Current change

Compared to the previous version from the end of 2021:

  • The illuminance getter here no longer does anything special. Instead, I've moved the (optional) rounding of values to the newly-added "reading quantization algorithm" called by "get value from latest reading" in the main Generic Sensor spec. Specifications such as this one then define a "reading quantization algorithm" to perform the actual rounding, but the mechanism is generic enough that other specs (such as Accelerometer) could do something similar without having to do everything from scratch.
  • The value of the "illuminance threshold value" has changed from 50lx to "half of the illuminance rounding multiple" (25lx by default then). This matches the current Chromium implementation, although I am not sure this is the best option (more on that below).
  • I've added informative references to works analyzing potential attacks using Ambient Light Sensors, including one of @lknik's blog posts, and how the proposed changed help mitigate some of the attacks. I've also mentioned issue Security and Privacy considerations for ALS #13 where the current proposed rounding value of 50lx was first derived, and I've linked to the spreadsheet with the measurements made back in 2017 that we've been using as a basis for the decisions (in the future we could follow @anssiko's idea and add the table to this repository).
  • I've also followed https://w3c.github.io/fingerprinting-guidance/#mark-fingerprinting and added a tracking-vector mark to the security and privacy section (not sure if it should be added to a different location instead though).

Things I'd like to discuss

  • I am not entirely sure about https://arxiv.org/abs/1405.3760 because the variation in lux in figures 3 and 5 differs quite a lot. The variation in figure 5 matches my informal tests using my own phone more closely, and the mitigations suggested here would address them more effectively than the variation in figure 3.
  • The "threshold check algorithm" idea
    • I am still not 100% sure the whole idea behind the it counts as a fingerprinting/data leak mitigation or if it's just a sort of high-pass filter for changes in illuminance. Some operating systems and hardware platforms even take care of this kind of check themselves, where new readings are made available only if they differ from the latest one by some percentage or absolute value. In Chromium, the idea was to avoid reporting too many changes when illuminance values hovered around the "edges" between rounding multiples, so that moving from 24lx to 26lx and back to 24lx wouldn't generate 3 "reading" events and update the readings, for example, and let an attacker know that the actual value is probably in that region. I can see how this helps avoid leaking data, but is it effective enough to count as a mitigation?
    • The actual "illuminance threshold value" suggested here (25lx) came up during code review years ago, but I don't think there's a strong motivation for it to be 25lx and not the same as the "illuminance rounding multiple" (50lx). Using 25lx means the API might end up reporting a new reading even if the rounded value has not changed. For example, going from 26lx to 53lx in raw readings would generate a "reading" event even though the illuminance attribute would return 50 in both cases. Cases like this could allow an attacker to know that the actual illuminance value is in the [25,75) half-open interval and that the two readings differ by at least 25lx and at most 49lx. If we the "illuminance rounding multiple", this case would not exist.

PING's concerns

Thank you. We discussed the proposed mitigations in the PING call today. As a result of that conversation we have a couple of follow-up questions:

  • Could you clarify for us why the WG chose a 50lx threshold?

Done in the spec and also above, hopefully.

[other questions]

Please correct me if I'm wrong, but I'm under the impression that some of those concerns came up by looking at this spec in isolation without looking at the main Generic Sensor spec. https://w3c.github.io/sensors/#concepts-can-expose-sensor-readings and https://w3c.github.io/sensors/#abstract-operations mandate, for example, that:

  • Sensor readings are only exposed under the following conditions:
    • The environment is a secure context.
    • Sensor usage is allowed via the Permissions Policy API.
    • The document's visibility is "visible".
    • The currently focused area belongs to a document whose origin is same origin-domain with document’s origin.
    • The call(s) to request permission to use in Sensor.start() have passed.

It is up to each UA to implement "request permission to use", and it might involve prompting users, for example. At the moment, Chromium does not prompt users for access to motion sensors (e.g. accelerometer and gyroscope) but lets them allow or block access by default. We are also working on making this better by moving to prompting by default (and removing the "allow by default" option) as part of the working on implementing Device Orientation's requestPermission() method. Additionally, in https://www.w3.org/2021/10/29-dap-minutes.html#t07 we also decided to also add a camera permission requirement to the spec to make the permission requirements stricter (I still have to address that one).

With the above in mind, let me try to get to the specific questions:

  • Also, notwithstanding the mitigations, is there still a fingerprinting risk (albeit a reduced risk)? More specifically, to what extent does reducing to a 50lx threshold (or any threshold) prevent fingerprinting on the basis of opening up the capacity to track a user through typical behavior patterns?

I believe the fingerprinting risk remains. Even though we reduce the granularity of the data exposed to API users, an attacker could still know that a user is e.g. at an office environment between certain hours (320-500lx per https://en.wikipedia.org/wiki/Lux#Illuminance), and walks under full daylight at a certain time of the day (1000 to 10000lx). The mitigations listed above help prevent that websites (including third-parties) have undetected and unprompted access to the data.

And, a more general privacy question (not related to reducing granularity), how does the specification prevent or protect against cross-device tracking (e.g. the light equivalent of ultrasonic beacons)?

The idea with the set of mitigations proposed here and in the Generic Sensor spec is to make the readings coarse enough to help prevent cross-device tracking while at the same time only making readings available to pages that fulfill the requirements above and which the user has authorized to gather data.

More specifically, we have received these observations and comments:

  • Even bucketing by 50lux still seems to expose a lot of fingerprinting surface (>=4bits given the range here), which doesn’t seem acceptable

Do the mitigations above help make it more acceptable? I'm asking because this is also the case even for specs such as https://w3c.github.io/deviceorientation that are implemented by multiple engines: a DeviceMotionEvent can include the output of two 3-axis accelerometers, a gyroscope and a double (interval), and isn't that a lot more bits? I'm not asking this rhetorically, as I'm basing my calculations on https://www.eff.org/deeplinks/2010/01/primer-information-theory-and-privacy and am not sure if this is right.

  • Bucketing doesn’t seem to address the “ephemeral fingerprinting” concern

Are you referring to https://github.com/asankah/ephemeral-fingerprinting or is there another resource I could look at? That page lists several possible mitigations and we implement many of them, so I'm wondering if the Generic Sensor + ALS mitigations do address the concern at least partially?

  • This API seems like an extremely infrequently needed feature (as evidenced by most browsers not being interested in implementing); so, why not put it behind a permission prompt?
  • This seems to be easily exploitable as a covert channel (write to the channel by changing the brightness of the content on the page, read from the channel through the brightness sensor). The spec needs to address this (e.g. through permission prompt)

Answered above: the permission side of things is handled in the main Generic Sensor spec, UAs are free to handle the permission request implementation, we want to add a prompt to the Chromium implementation. Additionally, when it comes to the covert channel attack, the bucketing idea also helps make it more difficult -- the idea looks similar to https://arturjanc.com/ls/ after all, which the bucketing idea is supposed to help address.

Copy link
Member

@anssiko anssiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Unblocking my review, @sandandsnow @lknik & other PING folks' review is on the critical path.

still useful for API users. The value of 50 lux as a minimum for the
[=illuminance rounding multiple=] was determined in <a
href="https://github.com/w3c/ambient-light/issues/13#issuecomment-302393458">GitHub
issue #13</a> after different ambient light level measurements under different
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be indeed good to snapshot this table mentioned in #13 (comment) into this repo and link to it from the spec. Maybe https://www.npmjs.com/package/csv2md could help with the conversion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering doing that in a separate PR. Is there any other spec I could look at for inspiration? Should it be added to the spec itself or as a separate file? Wasn't there a different W3C process for handling this sort of data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My recommendation would be to put this type of informational data in a separate file in the repo and link to it. Markdown happens to render nicely so that's an OK format. One example: https://github.com/webmachinelearning/webnn/blob/main/op_compatibility/first_wave_models.md

This data could also go to an appendix in the spec but that'd mean an HTML table that is less fun to maintain if needed.

The key idea here is to have a reference that won't rot in case that Google Sheet goes down. The exact format is not so important as long as it can be read without special software. See https://www.w3.org/Consortium/Persistence.html -- note this pledge predates GitHub, also w3c GH org is archived nowadays.

To be more fancy, we could have a local biblio entry for it.

@rakuco rakuco marked this pull request as ready for review June 3, 2022 14:04
Copy link
Member

@reillyeon reillyeon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with a fix for a reference to an unknown definition.

index.bs Outdated Show resolved Hide resolved
@rakuco rakuco force-pushed the add-privacy-terms-and-enhancements branch from 31d469e to b4ebffe Compare June 10, 2022 14:15
@rakuco
Copy link
Member Author

rakuco commented Jun 10, 2022

Approved with a fix for a reference to an unknown definition.

@reillyeon while I have you here, could you take a look at the "threshold check algorithm" idea part of #77 (comment)? I'd like to double-check those items with you since you were around when this was discussed when reviewing the initial version of these mitigations in Chromium.

@reillyeon
Copy link
Member

@reillyeon while I have you here, could you take a look at the "threshold check algorithm" idea part of #77 (comment)? I'd like to double-check those items with you since you were around when this was discussed when reviewing the initial version of these mitigations in Chromium.

It may be possible to defer the threshold checking to the hardware or operating system as long as the threshold is implemented as a delta from the previously reported value. It is important that this works correctly as it is critical to making rounding an effective mitigation when rounding to a value significantly higher than the noise in the system, as is the case with the ambient light sensor.

Related to w3c#63, which says the granularity of the data exposed by Ambient
Light Sensors should be specified normatively.

This commit goes a bit further and specifies the two anti-fingerprinting
measures currently implemented by Chrome -- namely, not only are illuminance
values rounded but there's also a threshold value check to avoid storing
values that are too close to the latest reading.

w3c/sensors#429 defines the concepts of "reading quantization algorithm" and
"threshold check algorithm" that concrete sensors can specify. We specify
both here, along with some values used by them (based on the current
Chromium values):

- An "illuminance rounding multiple" of at least 50lx.
- An "illuminance threshold value" of at least 25lx (half the illuminance
  roundig multiple, to be more precise).

These values are then used in the following algorithms:
- The "threshold check algorithm" checks that the difference between new and
  current illuminance values is above the illuminance threshold value.
- The "reading quantization algorithm" rounds up readings to the closest
  multiple of the illuminance rounding multiple.
This follows
https://w3c.github.io/fingerprinting-guidance/#mark-fingerprinting and makes
it clear that this API can increase the fingerprinting surface despite the
proposed mitigations.
@rakuco rakuco force-pushed the add-privacy-terms-and-enhancements branch from b4ebffe to 44e8b41 Compare June 15, 2022 12:50
@rakuco
Copy link
Member Author

rakuco commented Jun 15, 2022

I've pushed a new version of this PR with a few changes:

@rakuco
Copy link
Member Author

rakuco commented Jul 5, 2022

@sandandsnow @lknik friendly ping, just wondering if any of you had time to take a look at the changes pushed to this PR as well as w3c/sensors#429

@lknik
Copy link

lknik commented Jul 5, 2022

In my opinion, the threshold method helps mitigating the risk. Of course, some potential remains but it would be much more difficult to abuse in practice.

The reason Fig 3/5 in the referenced PDFs vary so much may be due to the tested environment. In my tests, 50lx differences were also recorded routinely. However, in my view it is less likely (if mitigations are deployed) to abuse it in practice to e.g. exfiltrate data, as then the environmental changes would contribute less to a reliable abuse.

So let's move forward. There's still a risk that some academic team will want to validate the boundary issues, but such is life :)

@rakuco
Copy link
Member Author

rakuco commented Jul 5, 2022

Thanks, @lknik, I really appreciate the review and that you've stuck around for so many years.

For the record, you might also be interested in #79 where we discuss the permissions/permission prompt situation (and requiring the camera permission for this API), which is also part of the analysis you've written about ALS.

@rakuco rakuco changed the title RFC: editorial: Add reading quantization and threshold check algorithms. editorial: Add reading quantization and threshold check algorithms. Aug 8, 2022
@anssiko
Copy link
Member

anssiko commented Aug 8, 2022

Thanks @lknik for your privacy-focused suggestions and review throughout the years (plural).

@sandandsnow we'd still be happy to hear PING's feedback for this proposed editorial improvement before we merge this PR.

@sandandsnow
Copy link

I'm happy to be guided by @lknik, but I have drawn this to the attention of my PING co-chairs in case they have anything further they wish to raise.

@anssiko
Copy link
Member

anssiko commented Aug 22, 2022

@sandandsnow, it seems no concerns from the other PING co-chairs have been raised, could we merge this? If so, we’d appreciate if you could your submit approval with the usual GH facility (Files changes > Review changes).

@sandandsnow
Copy link

@anssiko We're happy for you to close the issue.

@anssiko
Copy link
Member

anssiko commented Aug 22, 2022

Thanks @sandandsnow!

@anssiko anssiko merged commit 2fbaa74 into w3c:main Aug 22, 2022
github-actions bot added a commit that referenced this pull request Aug 22, 2022
SHA: 2fbaa74
Reason: push, by @anssiko

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@rakuco rakuco deleted the add-privacy-terms-and-enhancements branch August 23, 2022 09:57
@plehegar plehegar added privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. and removed privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. labels Aug 29, 2022
rakuco added a commit to rakuco/sensors that referenced this pull request Jan 17, 2023
These two operations have been referenced by the Ambient Light Sensor spec
since w3c/ambient-light#77 but the `<dfn>`s in this spec were not properly
exported.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants