Capability to skip deals for indexing: need config to mark content as not retrievable in market #689

Open
honghaoq opened this issue Aug 5, 2022 · 12 comments

@honghaoq

honghaoq commented Aug 5, 2022

For content that needs to remain private and non-retrievable, we need a config option to flag it so that it is not indexed. This requirement was raised by Patrick from Factor8, as they have deals with some clients that need to keep data private.

@LaurenSpiegel
Collaborator

After further discussion, to maintain flexibility we should separate the concept of "announced" from publicly retrievable.

**So, we should implement a deal option, AnnounceIndexes: true|false, with a default of true.**

**For prior deals, we should have a way for SPs to set it to false.**

By implementing this we should actually increase the amount of data indexed, since some SPs are disallowing indexing from full miners entirely just because they do not want to announce some of their deals.

Potential future uses for announcing indexes of "non-public" data:

  1. A client who is not very concerned about privacy and would opt for the ease of the indexer helping with discoverability (so data is encrypted but indexed).
  2. If we build authorization tokens into the request, we could use the public index and gate access to some data via the auth tokens.

@jacobheun @dirkmc @brendalee @willscott

@willscott
Collaborator

Note that the indexer expects data that is announced to be publicly retrievable.

If Boost providers announce data but do not make it retrievable, they risk being de-listed from the indexer. As we receive reports that data they announce is not retrievable, their reputation will suffer, and the indexer will stop providing records from that provider because we cannot be confident that a downstream client will find them useful.

Unless you provide an additional signal to the indexers that the data actually is retrievable, expect those providers to risk losing retrieval rewards or the ability to participate in systems like Saturn.

@LaurenSpiegel
Collaborator

  1. Where is this reputation dinging documented or decided?
  2. We could implement the IsPublic flag at the same time to clarify intent.

@LaurenSpiegel
Collaborator

Further discussion with @willscott :

  • We want the indexer to be as fast as possible. To do this, we need to query only nodes that are actually serving data publicly. If we do allow non-public data to be indexed, we would need additional metadata about when those nodes should be used. So, IsPublic:false + AnnounceIndex:true would need further development considerations.
  • Unless there is a compelling use case NOW for AnnounceIndex:true + IsPublic:false (or AnnounceIndex:false + IsPublic:true), we can start with just an IsPublic flag which will serve as AnnounceIndex:true as well. When designing the IsPublic flag we should keep in mind the future potential flexibility of adding the separate explicit AnnounceIndex flag.
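
The two bullets above can be read as a small decision table. The following is only a sketch of that reading: the `DealVisibility` type, the `Plan` values, and `classify` are hypothetical names chosen for illustration and are not part of Boost or the indexer.

```go
package main

import "fmt"

// DealVisibility is a hypothetical pair of flags, named after the
// IsPublic / AnnounceIndex options discussed in this thread.
type DealVisibility struct {
	IsPublic      bool
	AnnounceIndex bool
}

// Plan describes how the discussion treats a flag combination.
type Plan string

const (
	PlanAnnounce    Plan = "announce to indexer"
	PlanSkip        Plan = "do not announce"
	PlanNeedsDesign Plan = "needs further design"
)

// classify maps a flag combination to the outcome described in the thread.
func classify(v DealVisibility) Plan {
	switch {
	case v.IsPublic && v.AnnounceIndex:
		return PlanAnnounce
	case !v.IsPublic && !v.AnnounceIndex:
		return PlanSkip
	default:
		// Mixed combinations have no agreed semantics yet.
		return PlanNeedsDesign
	}
}

func main() {
	combos := []DealVisibility{{true, true}, {false, false}, {false, true}, {true, false}}
	for _, v := range combos {
		fmt.Printf("%+v -> %s\n", v, classify(v))
	}
}
```

With a single IsPublic flag standing in for both, only the first two cases can occur, which is why starting with one flag sidesteps the mixed combinations.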

Any further discussion should be added to this thread for resolution.

@honghaoq
Author

honghaoq commented Oct 21, 2022

Assuming we go with the 2nd point above, when the IsPublic flag is set to false (in the case of private data), is the plan to integrate Boost with ACL/auth tooling like CID gravity to provide that filtering protection on retrieval? Until that is fully implemented, the caveat is that it would give clients a false sense of safety, since IsPublic = false hints at privacy while the data is still publicly retrievable. I think it is fine to go with it; we would just need to communicate clearly with SPs about the expectation (and roadmap) there.

@LaurenSpiegel
Copy link
Collaborator

Good point. We should not equate IsPublic:false with "private", and we will need to make this very clear.

@willscott
Collaborator

A storage deal today has two boolean fields associated with it:

FastRetrieval bool
VerifiedDeal  bool

In understanding / moving towards a more nuanced permissioned retrieval story, I'd propose that the evolution of semantics would be:

  • Existing deals with both of these flags set are considered isPublic by default. Other deals are not.
  • SPs are able to update and view the isPublic state of deals
  • We introduce the concept of the isPublic state for a deal in the next version, allowing clients to specify it in new deals, and in the interface.
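
A minimal sketch of the proposed default, assuming an SP-set value takes precedence when present. Only `FastRetrieval` and `VerifiedDeal` come from the comment above; the `Deal` type, the `PublicOverride` field, and the `IsPublic` method are hypothetical and do not reflect Boost's actual data model.

```go
package main

import "fmt"

// Deal is a hypothetical, trimmed-down deal record.
type Deal struct {
	FastRetrieval bool
	VerifiedDeal  bool
	// PublicOverride is an assumed field holding a value set by the SP.
	// nil means the SP has not set anything.
	PublicOverride *bool
}

// IsPublic returns the SP override when present. Otherwise it falls back to
// the proposed default: public only if both existing flags are set.
func (d Deal) IsPublic() bool {
	if d.PublicOverride != nil {
		return *d.PublicOverride
	}
	return d.FastRetrieval && d.VerifiedDeal
}

func main() {
	no := false
	deals := []Deal{
		{FastRetrieval: true, VerifiedDeal: true},
		{FastRetrieval: true, VerifiedDeal: false},
		{FastRetrieval: true, VerifiedDeal: true, PublicOverride: &no},
	}
	for _, d := range deals {
		fmt.Printf("fast=%v verified=%v -> isPublic=%v\n", d.FastRetrieval, d.VerifiedDeal, d.IsPublic())
	}
}
```

Using a pointer for the override keeps "never set by the SP" distinct from "explicitly set to false", which a plain bool cannot express.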

Other notes

  • Evolving the API toward a more complex ACL story, e.g. with CID gravity, is a separate work item that will happen after this.
  • We can initially support isPublic in the deal proposal, which is not on chain. We will subsequently need to decide whether to propose a FIP to include this state on chain as well, for accountability to Fil+ notaries.

@LaurenSpiegel
Collaborator

@willscott, interesting idea. If a deal was supposed to have FastRetrieval and was a VerifiedDeal it makes sense to at least default to announcing to the indexer.

FastRetrieval is stored in the DAGstore of each SP and not on chain, so there is no easy way to see roughly what percentage of deals have these set to true, correct? (I don't see FastRetrieval or similar in Lilium's data model: https://lilium.sh/data/models/).

@willscott
Collaborator

Correct.
However, you can look at the percentage of Fil+ deals, or at the amount that aggregators with known deal-making configurations have sourced, to approximate what that rate would be.

@honghaoq changed the title from "Capacity to skip deals for indexing: need config to mark content as not retrievable in market" to "Capability to skip deals for indexing: need config to mark content as not retrievable in market" on Nov 12, 2022

@jacobheun
Contributor

Related discussion filecoin-project/notary-governance#666

@TorfinnOlsen

🎣 After reviewing with the team, we've concluded that we would like to proceed with a simple revision that allows SPs to let clients set the announce status for indexers.

tl;dr: Request to add a flag to Boost that allows storage clients to elect not to announce deal data to IPNI. The default behavior will be to announce, just as it is today.

🕐 When is a deal announced to the indexer?

  • When a deal is made, if AnnounceToIPNI=True or is not set at all, the CIDs are announced to the indexer.
  • If AnnounceToIPNI=False, there is no announcement.

🔧 How will this be implemented?

  • Currently, we are on Storage Deal Protocol v1.2 (/fil/storage/mk/1.2.0) where every deal is announced.
  • This new proposal is for Storage Deal Protocol v1.2.1 (/fil/storage/mk/1.2.1)
  • Boost will be updated to recognize this new flag and store it as metadata.
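
One way to picture how the two protocol versions and the announce rule fit together is sketched below. The protocol IDs are the ones quoted above; the `Proposal` type and `shouldAnnounce` function are hypothetical, and treating a v1.2.0 proposal as always announced is taken from the statement that every deal is announced on that version. This is not Boost's implementation.

```go
package main

import "fmt"

// Protocol IDs quoted in the proposal above.
const (
	ProtoV120 = "/fil/storage/mk/1.2.0"
	ProtoV121 = "/fil/storage/mk/1.2.1"
)

// Proposal is a hypothetical stand-in for a deal proposal. AnnounceToIPNI is
// a pointer so that "not set" (nil) can be told apart from an explicit false.
type Proposal struct {
	AnnounceToIPNI *bool
}

// shouldAnnounce applies the rules described above: v1.2.0 deals are always
// announced; v1.2.1 deals are announced unless the flag is explicitly false.
func shouldAnnounce(protocol string, p Proposal) (bool, error) {
	switch protocol {
	case ProtoV120:
		return true, nil
	case ProtoV121:
		return p.AnnounceToIPNI == nil || *p.AnnounceToIPNI, nil
	default:
		return false, fmt.Errorf("unsupported protocol %q", protocol)
	}
}

func main() {
	no := false
	cases := []struct {
		proto string
		p     Proposal
	}{
		{ProtoV120, Proposal{}},                    // old protocol: no flag in the schema
		{ProtoV121, Proposal{}},                    // flag not set
		{ProtoV121, Proposal{AnnounceToIPNI: &no}}, // explicit opt-out
	}
	for _, tc := range cases {
		ok, err := shouldAnnounce(tc.proto, tc.p)
		fmt.Println(tc.proto, ok, err)
	}
}
```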

🥡 How does this impact whether the data may be retrieved?

  • It doesn’t. No impact. ACLs must be set separately. This relates only to IPNI.

🛃 How can a client change the announce status?

  • No change currently permitted. The decision is made at the time the deal is made.

👴 What about legacy deals?

  • They stay announced as they are today.

ℹ️ More details including discussions leading up to this proposed course of action can be found here:

🌵

  • A note from @masih - "Zero-value for boolean in Go is false. To have the default to be true it needs to be set explicitly every time. That's why in go config is worded such that its zero-value in golang is the intended default."
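
To illustrate the zero-value point, here is a small sketch, assuming the flag is phrased as an opt-out so that an omitted field decodes to the intended default. The `DealParams` type, the `SkipIPNIAnnounce` field name, and its JSON tag are illustrative assumptions, not a confirmed Boost API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DealParams is a hypothetical parameter struct. The field is worded as an
// opt-out so that Go's zero value (false) means "announce", the intended default.
type DealParams struct {
	SkipIPNIAnnounce bool `json:"skipIPNIAnnounce,omitempty"`
}

// decode parses raw JSON deal parameters into DealParams.
func decode(raw string) (DealParams, error) {
	var p DealParams
	err := json.Unmarshal([]byte(raw), &p)
	return p, err
}

func main() {
	for _, raw := range []string{`{}`, `{"skipIPNIAnnounce": true}`} {
		p, err := decode(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> announce=%v\n", raw, !p.SkipIPNIAnnounce)
	}
}
```

Had the field been named after the positive case with a default of true, every caller and every decoder would need to set it explicitly, because an omitted bool always decodes to false.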

Related GitHub Discussion

Related GitHub issues/discussions being closed as a result of this update

@brendalee
Collaborator

brendalee commented Jan 18, 2023

I believe @LexLuthr completed this work and added the relevant flags to the storage deal proposal / Boost client code in #1051. @TorfinnOlsen and I have drafted an FRC as well; once that lands, we can close this issue.

(cc @dirkmc to keep me honest).


6 participants