Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rationalize parents cache key derivation. #1192

Open
porcuquine opened this issue Jun 29, 2020 · 9 comments
Open

Rationalize parents cache key derivation. #1192

porcuquine opened this issue Jun 29, 2020 · 9 comments
Labels
P2 Low priority

Comments

@porcuquine
Copy link
Collaborator

Description

The parents cache key is currently derived from an explicit reference to the Feistel keys, as well as to other information which identifies the DRG (sector size, hash, base and expansion degrees, etc.).

However, the DRG seed is not explicitly included in the derivation — and it should be: if the DRG seed changes, the cache will be invalid.

In practice, this is still safe because both Feistel Keys and DRG seed are derived from porep_id. Therefore, if DRG seed changes, Feistel keys must also. The current mechanism is somewhat elaborate and involves reuse of the DRG's identifier() method — which is also used to construct the circuit identifier. As noted in a comment, the DRG seed should not be included there, because by design the DRG seed could be modified without a change to circuits.

One 'correct' solution would be to separate the identifier into two methods, a circuit_identifier and a cache_identifier (or some other semantically consistent names). The latter could depend on the former but enhance it with the DRG seed. A similar split would need to also happen for the composite expander+drg graph, which already depends on the extant identifier. To get this normalized right, might be somewhat tricky (since the new circuit_identifier must not depend on the DRG's cache_identifier, but the new cache_identifier must do so while also depending on its own corresponding circuit_identifier.

Alternately, the cache could just be made to depend directly on the porep_id. This is reasonable, since that value is used to uniquely determine a graph. Graphs must change if porep_id does and not otherwise.

Acceptance criteria

  • Parents cache key includes either both DRG Seed and Feistel keys (less good) or neither but does contain porep_id (better).
  • Circuit identifiers must not change. The expected names of existing groth parameters should not be affected.
  • Actual graph structure must not change.

Risks + pitfalls

It might be easy to get this requirement wrong when refactoring the identifiers:

  • Circuit identifiers must not change. The expected names of existing groth parameters should not be affected.

NOTE: since the expected outcome is a change to parents cache, all currently-deployed caches will be invalidated by the change.

Where to begin

@porcuquine porcuquine added the P2 Low priority label Jun 30, 2020
@dignifiedquire
Copy link
Contributor

@porcuquine is this still needed?

@porcuquine
Copy link
Collaborator Author

Technically, yes. If not, should at least put a reference-to/explanation-of the issue in the parents cache key derivation (cache_path) and a reminder that derive_porep_domain_seed must not change without potentially silently breaking caches.

@hzsong123
Copy link

hello, my dear friend. I have no idea about the concept of DRG an DRG graphs. Which urls I can find them?
thank you very much.

@vmx
Copy link
Contributor

vmx commented Nov 23, 2020

@hzsong123 you can find more information about it in the Filecoin specification.

@cryptonemo
Copy link
Collaborator

This appears to have been resolved here: 332d0a9

And used here: https://github.com/filecoin-project/rust-fil-proofs/blob/master/storage-proofs-porep/src/stacked/vanilla/cache.rs#L414

@porcuquine If you agree, please close this issue. If not, the acceptable criteria is not clear.

@porcuquine
Copy link
Collaborator Author

Looking at your second link, I see that the parent cache key is derived from the Feistel keys only.

The relevant acceptance criterion was:

Parents cache key includes either both DRG Seed and Feistel keys (less good) or neither but does contain porep_id (better).

For this to be true, the cache_path function should be hashing the porep_id, not the Feistel keys (or — less good — additionally hashing the DRG seed).

Does that clarify the meaning?

@cryptonemo
Copy link
Collaborator

Looking at your second link, I see that the parent cache key is derived from the Feistel keys only.

The relevant acceptance criterion was:

Parents cache key includes either both DRG Seed and Feistel keys (less good) or neither but does contain porep_id (better).

For this to be true, the cache_path function should be hashing the porep_id, not the Feistel keys (or — less good — additionally hashing the DRG seed).

Does that clarify the meaning?

Not really. The feistel keys are derived from the porep_id, as is the drg_seed. It's not clear to me what's missing since the parent's cache_id is then derived from the feistel keys (we had an upgrade that proved this, no?)

The first link contains the following, but here are more direct links:

https://github.com/filecoin-project/rust-fil-proofs/blob/master/storage-proofs-porep/src/stacked/vanilla/graph.rs#L80
https://github.com/filecoin-project/rust-fil-proofs/blob/master/storage-proofs-core/src/crypto/mod.rs#L12
https://github.com/filecoin-project/rust-fil-proofs/blob/master/storage-proofs-core/src/drgraph.rs#L243

@cryptonemo
Copy link
Collaborator

Perhaps the confusion is that it appears correct, but without a comment on the derive_porep function, problems could potentially be introduced in the future if someone changed how that's hashed. If that's the case, I agree that a comment should suffice.

@porcuquine
Copy link
Collaborator Author

porcuquine commented Jan 21, 2021

The feistel keys are derived from the porep_id, as is the drg_seed. It's not clear to me what's missing since the parent's cache_id is then derived from the feistel keys (we had an upgrade that proved this, no?)

Your observation is the same thing I meant by this line in the original issue:

In practice, this is still safe because both Feistel Keys and DRG seed are derived from porep_id.

In other words, this is incidentally correct because Feistel keys happen to depend on porep_id. However, this issue is about a fix which would capture the actual requirement directly.

I am not saying fixing this is high priority. I am just clarifying what the issue is about. An example of how this could matter is that with this implementation, there is an implicit requirement that the DRG seed never change independent of the Feistel keys. While I don't think it is likely that we will accidentally or intentionally violate this requirement in the future, it would be more correct to not need to track the implicit requirement and just change the cache key to rely directly on the porep_id — since the assigned semantics of that identifier make it precisely the one on which the cache key should depend. The graph for any given set of parameters changes if and only if the porep_id changes.

Changing the implementation would fix the root problem so it won't ever manifest. You can also try to add defensive comments or just ignore the issue as more trouble than its worth. I would most like to eventually see a real fix only because these kinds of correctness issues are almost impossible to keep track of. The best way to encode correctness/security requirements is directly in code. Otherwise, we rely on contextual knowledge of the whole system — whether retained in a few people's heads or scattered in comments, or even in the spec. Even though this particular issue seems unlikely to ever come up, the only defensive mechanism I personally have actual confidence in is implementing the code in a way that is 'precisely correct by construction'.

Not having done this in the first place wasn't a huge deal, but it is technical debt. Resolve as you see fit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 Low priority
Projects
None yet
Development

No branches or pull requests

5 participants