Proposing updates based on community feedback #91

Merged: 3 commits, Aug 22, 2022

Conversation

krgovind
Collaborator

No description provided.

krgovind added 3 commits July 27, 2022 11:51
Proposing changes to First-Party Sets based on community feedback.

@dmarti left a comment

Positive changes here, but I'm concerned about a possibly subjective and time-consuming review process for participants in public reviews.

</tr>
<tr>
<td>service</td>
<td>Reserved for utility or sandbox domains.<br>

This is likely to get tested by people setting up a domain to look like a service domain, and then switching it to do something different. Are there any other limitations on a "utility or sandbox" domain that would keep it functional for those purposes, but would create less temptation to switch it to something else? For example, a service domain could be barred from loading third-party resources from any domain that is not a member of its set.

Enforcement is going to be more of a challenge for this one than for the first two, since the first two are much easier to enforce automatically.

Is a utility or sandbox domain always a domain that is not intended to appear in search results? Another possible rule here would be to require that service domains block general-interest web search engines in their robots.txt.

Member

Yes, I think that this is a great approach to service/utility/sandbox domains, where we can make a few strong assumptions along the lines of "this is not a self-sufficient site that users would discover and interact with, except when integrating with other sites in the set", then build hard technical constraints that match these, such as requiring that crawling be blocked.

This is something we should work with developers to figure out, to understand potential use cases of "service" domains and which constraints would break valid use cases. I think this is partially tracked in #95 and we might have time to discuss it on the WICG call.
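
As a rough illustration of the robots.txt rule suggested in this thread, the following Python sketch checks whether a candidate service domain's robots.txt blocks a set of general-interest crawlers. It is only a sketch under stated assumptions: the crawler list, the function name `blocks_general_crawlers`, and the `service.example` host are hypothetical and are not part of the proposal, which has not settled on any such check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler list; the thread does not name specific search engines.
GENERAL_CRAWLERS = ("Googlebot", "Bingbot", "DuckDuckBot")


def blocks_general_crawlers(
    robots_txt: str, site_root: str = "https://service.example/"
) -> bool:
    """Return True if robots.txt disallows the site root for every listed crawler."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return not any(parser.can_fetch(agent, site_root) for agent in GENERAL_CRAWLERS)
```

A check like this could only confirm the state of robots.txt at review time; it would not by itself prevent a domain from changing its behavior after acceptance, which is the concern raised above.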

<tr>
<td>associated</td>
<td>Reserved for domains whose affiliation with the set primary is clearly presented to users (e.g., an About page, header or footer, shared branding or logo, or similar forms).</td>
<td>Limit of 3* domains. If greater than 3, auto-reject access.<br>

An additional rule might make this practical: a pull request that adds an "associated" set must include user research to substantiate that the set is clearly presented to users.

  • research should cover a representative sample of the audience for the sites that are members of the proposed set
  • research should document the understanding on the part of the users of the relationship among site members
  • results should be statistically significant
  • researchers should be available to answer questions on the PR.

Member

Thanks for the suggestion, Don! A few personal thoughts from my side:

I agree with your concern about long discussions/pile-ons with subjective opinions of what constitutes "clearly presented" and we should try to avoid that.

However, I feel like asking for dedicated user research is adding significant cost to submission. It's a barrier to smaller sites and sites from non-English speaking countries, and even mid-sized organizations may struggle coordinating this kind of project.

Conversely, if an attacker falsely claims two domains have a clear affiliation, faking or manipulating user research in order to pass doesn't seem unthinkable. So we would be handing the enforcement entity/community the challenge of disproving the presented results, which they would always strive to do if the association is not subjectively clear to them. As such, "clear presentation" seems like it would end up being the main criterion either way?

In the end, I'm not sure that user research would actually sway most people's subjective opinion about whether two domains are clearly affiliated.

Given that, it seems a bit excessive to ask of developers by default, though I think it's interesting to consider as an optional step.

A problem with having the public reviewers directly review the set is that they're not necessarily going to get the same experience as the real users who are using the set member sites. A set could do some pages that are "clearly presented" (home page, privacy policy) for public review, and then promote other pages that are not clearly presented or easily findable. (For example, a clinic site and an online pharmacy could make their home pages clearly common set members, then use social media ads to promote "independent doctor recommendations" and "product info" pages that are designed to appear independent).

FPS is an exchange of value for value -- the set members get additional information from users, and the users get additional checks on the practices and reputations of the set members. Set members are asking for something from users, so should offer something in exchange -- research that reflects people's real experience with the set members. And the research does not have to be a separate project -- it can be an expansion of the user studies that set members are already doing.

There is a risk that a user researcher could present a completely bogus study, but each researcher would only be able to get away with that once. New and unknown researchers would receive additional attention.

Member

I think #101 is somewhat picking up on this topic, so maybe we can continue to discuss over there.


Additionally, there are other enforcement strategies we could consider to further mitigate abuse. If there is a report regarding a domain specified under the "service" subset, potential reactive enforcement measures could be taken to validate that the domain in question is indeed a "service" subset.

For some subsets, like the "associated" subset, objective enforcement may be much more difficult and complex. In these situations, the browser's handling policy, such as a limit of three domains, should limit the scope of potential abuse. Additionally, we think that site authors will be beholden to the subset definition and avoid intentional miscategorization as their submissions would be entirely public and constitute an assertion

One likely risk here is that the repository gets filled with long discussions of people's subjective evaluation of whether a common context or branding is "clearly presented." Limiting the number of domains would reduce the payoff for winning one argument about whether a set is clear to the user, but could increase the number of such arguments.

Graphic design conventions, and of course text about a site's set membership, can also be different across languages and cultures. A rule based on posting user research to the repository (see my comment on line 252) could help to limit the number and length of arguments about "clear" sets.

Member

As mentioned, if the primary idea for user research is to discourage discussion, maybe it makes more sense specifically for organizations that expect such discussion to occur to prepare such research, and not for smaller sites or sites that are confident that their use case is clear enough that it won't come under public scrutiny.


### Abuse mitigation measures

We consider using a public submission process (like a GitHub repository) to be a valuable approach because it facilitates our goal to keep all set submissions public and submitters accountable to users, civil society, and the broader web ecosystem. For example, a mechanism to report potentially invalid sets may be provisioned. We expect public accountability to be a significant deterrent for intentionally miscategorized subsets.

Public accountability will be valuable as long as the number of sets requiring review is manageable and the review work is seen as meaningful. It would be useful to limit the number of proposed sets requiring human attention, especially for the "associated" type, by requiring associated research and through setting sensible waiting periods for re-submitting the same domain in a new set after a rejection of a previous one.

It is always important to consider the incentives for participating in public accountability. People will be less likely to participate if their efforts are seen as duplicating a task that could have been automated, or as joining an open-ended argument about whether a set is valid. There is likely to be more and higher-quality participation if it is presented as evaluating the existing user research about a set than if it is just an opportunity to express opinions.

There will also need to be appropriately designed and enforced anti-abuse and anti-harassment measures for public review participants. (Nobody wants to get SWATted because their comment kept some scammer's bogus set from getting in.)

Member

I think this is an important point. Besides your point about user research evaluation, we should think about how to ensure that all humans involved in the public process are appropriately protected and feel like their time is spent well. I can file an issue to that effect!

Member

Filed #105

<td>Reserved for domains whose affiliation with the set primary is clearly presented to users (e.g., an About page, header or footer, shared branding or logo, or similar forms).</td>
<td>Limit of 3* domains. If greater than 3, auto-reject access.<br>
<br>
<em>*[^1]exact number TBD</em></a></td>

Possibly more effective than a hard limit would be setting the delay for re-submitting a domain from a rejected set into a new set based on the number of members of the rejected set. (Forming a larger set would mean taking a greater risk of having the member domains kept out of future sets for longer, and so would create more incentive to do research that is more likely to hold up.)
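
To make the size-based delay idea concrete, here is a minimal Python sketch. The comment does not propose a formula, so the linear scaling, the `DAYS_PER_MEMBER` constant, and the function name `earliest_resubmission` are all hypothetical choices made purely for illustration.

```python
from datetime import date, timedelta

# Hypothetical constant: the thread does not specify any delay length.
DAYS_PER_MEMBER = 30


def earliest_resubmission(
    rejected_on: date,
    rejected_set_size: int,
    days_per_member: int = DAYS_PER_MEMBER,
) -> date:
    """Return the first date a domain from a rejected set could be re-submitted.

    Assumes the delay grows linearly with the size of the rejected set.
    """
    if rejected_set_size < 1:
        raise ValueError("a rejected set must contain at least one domain")
    return rejected_on + timedelta(days=days_per_member * rejected_set_size)
```

Under these assumed numbers, a domain from a rejected three-member set would wait three times as long as a domain from a rejected single-member set.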

Member

Sorry, catching up to this a bit late. I think this suggestion adds unnecessarily high stakes to the review process. Just because of the set structure (which is a technical detail), a rejection/removal could have catastrophic consequences for "legitimate" set owners. It seems very arbitrary and not like a great submission experience.

Collaborator Author

Thanks for the review, Don! The reasoning for the hard limit is based on previous discussions that suggest it is harder to ensure user understanding of associated sites for large sets, and that it is difficult to run technical checks or other enforcement at scale to ensure user understanding. So the numeric limit is aimed at reducing the scope of abuse. Enforcing a delay for re-submitting a domain from a rejected set into a new set doesn't address the user understanding issue.
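
For reference, the numeric limit on the "associated" subset can be sketched as a simple count check. This Python sketch assumes the check applies to the deduplicated list of associated domains as a whole; the proposal marks the exact number as TBD, and the names `ASSOCIATED_LIMIT` and `associated_within_limit` are hypothetical.

```python
from typing import Iterable

# The proposal lists 3 as the limit but notes the exact number is TBD.
ASSOCIATED_LIMIT = 3


def associated_within_limit(
    associated_domains: Iterable[str], limit: int = ASSOCIATED_LIMIT
) -> bool:
    """Return True if the number of distinct associated domains is within the limit."""
    # Normalize case and whitespace so the same domain is not counted twice.
    unique = {domain.strip().lower() for domain in associated_domains}
    return len(unique) <= limit
```

A count check like this is easy to automate, which is the contrast drawn in this thread with the harder question of whether an affiliation is "clearly presented" to users.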

Collaborator Author

I'm planning to merge this PR, but we can continue the discussion on #93.

Thank you--I also opened #101 to cover an example of a possible method of abuse that could be done with a small set.

Member

@johannhof left a comment

Given that we're now mostly discussing the how of the new proposal and related details, I think it's reasonable to merge this PR to reflect the most recent state of FPS in the README.

We can update the README later as we refine things further.

3 participants