Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flexible event backwards incompatibility: modulus operator on trigger data #765

Closed
csharrison opened this issue Apr 17, 2023 · 17 comments · Fixed by #1044
Closed

Flexible event backwards incompatibility: modulus operator on trigger data #765

csharrison opened this issue Apr 17, 2023 · 17 comments · Fixed by #1044
Labels
compat issue that may affect web compatibility flexible-event Regarding the flexible event configuration proposal

Comments

@csharrison
Copy link
Collaborator

https://wicg.github.io/attribution-reporting-api/#obtain-an-event-level-report specifies currently how we deal with the trigger_data associated with an event-level trigger configuration:

trigger data
The remainder when dividing config’s trigger data by triggerDataCardinality.

This is not really compatible with the flexible configuration because that positions trigger_data as another filtering criteria. I don't see a good way to support something like this unless we force sources to specify data in the range [0, max], but this breaks the use case illustrated in this example.

cc @avvarad @apasel422

@csharrison csharrison added the flexible-event Regarding the flexible event configuration proposal label Apr 17, 2023
@csharrison csharrison added the compat issue that may affect web compatibility label Jun 27, 2023
@johnivdel
Copy link
Collaborator

It would be if an implementation could measure how often this modulus operation is exercised, and use that to evaluate the amount of breakage if we forced trigger data to be within [0, triggerDataCardinality].

I don't think it's unreasonable to expect this is low, as the intention of this was to support registration between UAs with different cardinalities which is not the case today.

In either case, we could consider:

  1. Removing the graceful degradation which would not result in user visible breakage, but reports not being delivered

  2. Special casing the "default" configuration s.t. they continue to use the modulus operation, but any developer defined trigger specs would not (I don't think this would cause problems, as the use-case in the example requires a custom trigger spec). This introduces sufficient complexity we should try to avoid it in favor of (1)

@apasel422
Copy link
Collaborator

It would be if an implementation could measure how often this modulus operation is exercised, and use that to evaluate the amount of breakage if we forced trigger data to be within [0, triggerDataCardinality].

I can look into this for Chromium.

A related consideration is that in today's trigger registrations, the trigger_data field is a string, but the flexible configuration proposal uses JSON integers. Probably too subtle, but we could treat string-encoded ones as they are today by applying modulus, but treat integers as modulus-free for flexible-configuration purposes.

@apasel422
Copy link
Collaborator

@yanzhangnyc Could you comment on how this interacts with your flexible-event implementation work?

@yanzhangnyc
Copy link
Collaborator

In non-flex version, there is no pre-define eligible list of trigger_data, so I think it makes sense to use the modulus to enforce the cardinality. In flexible event API, the eligible list of trigger_data is pre-defined in the source and we can enforce the cardinality at source. All trigger_data at trigger that not on the eligible list will not be matched.

@apasel422
Copy link
Collaborator

Proposal:

Add a top-level field trigger_data_transform to source registrations. Its value is either "none" or "modulus", defaulting to "modulus". Other values may be added in the future.

When a trigger whose effective event-level configuration containing trigger_data t is attributed to a source, the source's trigger_data_transform field is consulted:

  • If its value is "none" and the source does not contain a trigger_spec whose trigger_data contains t, the trigger is dropped.
  • If its value is "modulus":
    1. Let list be a list containing, in ascending order, the union of the source's trigger_specs' trigger_data.
    2. Set t to the result of list[t % list.length].

This avoids breakage of existing reliance on the modulus operation, allows it to be used in Full Flex, and provides an opt out for reporting origins that wish to guarantee that the trigger data exactly matches their source configurations.

@apasel422
Copy link
Collaborator

Re #765 (comment) the field and value naming could also be reversed, so it's something like

{
  "trigger_data_matching": "exact" | "modulus"
}

@yanzhangnyc
Copy link
Collaborator

If we add a new field trigger_data_matching, we should also define if trigger_data_matching == modulus, the trigger_data should not exact match [0, trigger_data_cardinality and trigger_data_cardinality should be defined explicitly

@csharrison
Copy link
Collaborator Author

@yanzhangnyc can you elaborate, maybe with an example? I don't understand your suggestion.

@yanzhangnyc
Copy link
Collaborator

@csharrison, 1) we need explicitly define trigger_data_cardinality; 2) all of the trigger_data should between [0, trigger_data_cardinality). Here is an counter example. When the trigger contains trigger_data: 6, no clear expectation on the behavior.

{
  "trigger_data_matching": "modulus"
  "trigger_spec": [
    {
      "trigger_data": [1,2],
      // report windows end [2, 7]
    },
    {
      "trigger_data": [5],
      // report windows end [30]
    }
  ]
}

@apasel422
Copy link
Collaborator

In that example, the implicit cardinality is 3, since there are 3 trigger data values: [1, 2, 5]. 6 % 3 = 0, and therefore according to #765 (comment) we would arrange [1, 2, 5] in ascending order and select 1, which is at index 0.

@yanzhangnyc
Copy link
Collaborator

@apasel422 how about this example

{
  "trigger_data_matching": "modulus"
  "trigger_spec": [
    {
      "trigger_data": [1],
      // report windows end [1]
    },
    {
      "trigger_data": [2],
      // report windows end [2]
    },
    {
      "trigger_data": [4],
      // report windows end [4]
    }
  ]
}

How about the reporting window for trigger data 0, 3, 4, 5, 6, 7

@apasel422
Copy link
Collaborator

Per my original comment, since modulus is in effect, we take the union of all trigger_data sets for the source and create a list list sorted in ascending order from those, giving us [1, 2, 4].

If a trigger has trigger_data...

  • 0: list[0 % list.length = 0] = effective trigger_data 1
  • 3: list[3 % list.length = 0] = effective trigger_data 1
  • 4: list[4 % list.length = 1] = effective trigger_data 2
  • 5: list[5 % list.length = 2] = effective trigger_data 4
  • 6: list[6 % list.length = 0] = effective trigger_data 1
  • 7: list[7 % list.length = 1] = effective trigger_data 2

@apasel422
Copy link
Collaborator

I could also imagine there being clamp_min, clamp_max, and clamp.

apasel422 added a commit to apasel422/attribution-reporting-api that referenced this issue Sep 25, 2023
@yanzhangnyc
Copy link
Collaborator

@apasel422 I think this way of definition is a little bit trick. The trigger_data serves both index and value. I suggestion if modulus is chosen, the user should define a completely and explicitly way to avoid confusion. It is similar logic to flexible event API, that is, uses may not define it at all so it can fall back. But it the users provided trigger_specs, it must be a valid one or the source registration is void.

@apasel422
Copy link
Collaborator

The trigger_data serves both index and value.

Technically the definition above only uses the trigger data as index when modulus is in effect.

That said, I would be fine allowing modulus only when the trigger specs define a contiguous range starting at 0. For example, we would allow "trigger_data_matching": "modulus" with trigger specs whose trigger_data union equals [0, 1, 2, 3] but not with [1, 2, 3] or [0, 1, 4].

@apasel422
Copy link
Collaborator

#1044

@yanzhangnyc
Copy link
Collaborator

@apasel422 Yes. I understand. My concern is that it is confusing for 'trigger data as index'. For example, if trigger data 1, 2, 3, 5 is defined, and trigger data in trigger will mapping as 0->1, 1->2, 2->3, 3->5, 4->1, 5->2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compat issue that may affect web compatibility flexible-event Regarding the flexible event configuration proposal
Projects
None yet
4 participants