Rationalize parents cache key derivation. #1192
Comments
@porcuquine is this still needed?
Technically, yes. If not, should at least put a reference-to/explanation-of the issue in the parents cache key derivation (
Hello, my dear friend. I am not familiar with the concept of DRGs and DRG graphs. Where can I find information about them?
@hzsong123 you can find more information about it in the Filecoin specification.
This appears to have been resolved here: 332d0a9 And used here: https://github.com/filecoin-project/rust-fil-proofs/blob/master/storage-proofs-porep/src/stacked/vanilla/cache.rs#L414 @porcuquine If you agree, please close this issue. If not, the acceptance criteria are not clear.
Looking at your second link, I see that the parent cache key is derived from the Feistel keys only. The relevant acceptance criterion was:
For this to be true, the
Does that clarify the meaning?
Not really. The Feistel keys are derived from the porep_id, as is the drg_seed. It's not clear to me what's missing, since the parent's cache_id is then derived from the Feistel keys (we had an upgrade that proved this, no?). The first link contains the following, but here are more direct links: https://github.com/filecoin-project/rust-fil-proofs/blob/master/storage-proofs-porep/src/stacked/vanilla/graph.rs#L80
Perhaps the confusion is that it appears correct, but without a comment on the derive_porep function, problems could potentially be introduced in the future if someone changed how that's hashed. If that's the case, I agree that a comment should suffice.
Your observation is the same thing I meant by this line in the original issue:
In other words, this is incidentally correct because Feistel keys happen to depend on
I am not saying fixing this is high priority. I am just clarifying what the issue is about. An example of how this could matter is that with this implementation, there is an implicit requirement that the DRG seed never change independent of the Feistel keys. While I don't think it is likely that we will accidentally or intentionally violate this requirement in the future, it would be more correct to not need to track the implicit requirement and just change the cache key to rely directly on the
Changing the implementation would fix the root problem so it won't ever manifest. You can also try to add defensive comments or just ignore the issue as more trouble than it's worth. I would most like to eventually see a real fix only because these kinds of correctness issues are almost impossible to keep track of. The best way to encode correctness/security requirements is directly in code. Otherwise, we rely on contextual knowledge of the whole system, whether retained in a few people's heads or scattered in comments, or even in the spec.
Even though this particular issue seems unlikely to ever come up, the only defensive mechanism I personally have actual confidence in is implementing the code in a way that is 'precisely correct by construction'. Not having done this in the first place wasn't a huge deal, but it is technical debt. Resolve as you see fit.
Description
The parents cache key is currently derived from an explicit reference to the Feistel keys, as well as to other information which identifies the DRG (sector size, hash, base and expansion degrees, etc.).
However, the DRG seed is not explicitly included in the derivation — and it should be: if the DRG seed changes, the cache will be invalid.
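To make the gap concrete, here is a minimal sketch rather than the project's actual code. `GraphParams`, its fields, and both functions are hypothetical names, and the standard library's `DefaultHasher` stands in for whatever digest the real derivation uses. It only illustrates that a key which omits the DRG seed cannot notice a seed change, while a key that includes it can.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the values that identify a graph.
/// Field names and sizes are illustrative assumptions.
#[derive(Clone)]
pub struct GraphParams {
    pub sector_nodes: u64,
    pub base_degree: u32,
    pub expansion_degree: u32,
    pub feistel_keys: [u64; 4],
    pub drg_seed: [u8; 32],
}

/// Key derived from the Feistel keys and graph shape only.
/// The DRG seed is deliberately left out, mirroring the situation described above.
pub fn cache_key_without_seed(p: &GraphParams) -> u64 {
    let mut h = DefaultHasher::new();
    p.sector_nodes.hash(&mut h);
    p.base_degree.hash(&mut h);
    p.expansion_degree.hash(&mut h);
    p.feistel_keys.hash(&mut h);
    h.finish()
}

/// Same inputs plus the DRG seed, so a seed change always yields a new key.
pub fn cache_key_with_seed(p: &GraphParams) -> u64 {
    let mut h = DefaultHasher::new();
    cache_key_without_seed(p).hash(&mut h);
    p.drg_seed.hash(&mut h);
    h.finish()
}
```

With two parameter sets that differ only in `drg_seed`, `cache_key_without_seed` returns the same value for both, while `cache_key_with_seed` is expected to differ.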
In practice, this is still safe because both Feistel keys and DRG seed are derived from `porep_id`. Therefore, if the DRG seed changes, the Feistel keys must also. The current mechanism is somewhat elaborate and involves reuse of the DRG's `identifier()` method, which is also used to construct the circuit identifier. As noted in a comment, the DRG seed should not be included there, because by design the DRG seed could be modified without a change to circuits.
One 'correct' solution would be to separate the `identifier` into two methods, a `circuit_identifier` and a `cache_identifier` (or some other semantically consistent names). The latter could depend on the former but enhance it with the DRG seed. A similar split would need to also happen for the composite expander+drg graph, which already depends on the extant `identifier`. To get this normalized right might be somewhat tricky, since the new `circuit_identifier` must not depend on the DRG's `cache_identifier`, but the new `cache_identifier` must do so while also depending on its own corresponding `circuit_identifier`.
Alternately, the cache could just be made to depend directly on the `porep_id`. This is reasonable, since that value is used to uniquely determine a graph. Graphs must change if `porep_id` does and not otherwise.
Acceptance criteria
`porep_id` (better).
Risks + pitfalls
It might be easy to get this requirement wrong when refactoring the identifiers:
NOTE: since the expected outcome is a change to parents cache, all currently-deployed caches will be invalidated by the change.
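The dependency rule for the identifier split proposed in the description can be sketched as follows. This is an illustration under assumptions, not the repository's code: the trait `GraphIdentifiers`, the structs `Drg` and `Stacked`, their fields, and the string formats are all hypothetical. Placing the Feistel keys in the cache identifier rather than the circuit identifier is also an assumption made for the example.

```rust
/// Hypothetical trait that splits a single `identifier` into two.
pub trait GraphIdentifiers {
    /// Identifies the circuit. Must not include the DRG seed.
    fn circuit_identifier(&self) -> String;
    /// Identifies cached parents. Extends the circuit identifier with seed material.
    fn cache_identifier(&self) -> String;
}

/// Illustrative base graph (field names and sizes are assumptions).
pub struct Drg {
    pub nodes: usize,
    pub degree: usize,
    pub seed: [u8; 4],
}

impl GraphIdentifiers for Drg {
    fn circuit_identifier(&self) -> String {
        format!("drg::nodes={}::degree={}", self.nodes, self.degree)
    }

    fn cache_identifier(&self) -> String {
        // Circuit identifier plus the seed.
        format!("{}::seed={:?}", self.circuit_identifier(), self.seed)
    }
}

/// Illustrative composite expander+drg graph.
pub struct Stacked {
    pub base: Drg,
    pub expansion_degree: usize,
    pub feistel_keys: [u64; 4],
}

impl GraphIdentifiers for Stacked {
    fn circuit_identifier(&self) -> String {
        // Uses only the base graph's circuit identifier, never its cache identifier.
        format!(
            "stacked::exp={}::base=[{}]",
            self.expansion_degree,
            self.base.circuit_identifier()
        )
    }

    fn cache_identifier(&self) -> String {
        // Own circuit identifier, plus the base graph's cache identifier
        // (which carries the seed), plus the Feistel keys.
        format!(
            "{}::base_cache=[{}]::feistel={:?}",
            self.circuit_identifier(),
            self.base.cache_identifier(),
            self.feistel_keys
        )
    }
}
```

In this arrangement, two `Stacked` values that differ only in `base.seed` share a circuit identifier but have different cache identifiers, which is the property the refactoring has to preserve.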
Where to begin
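One possible starting point is the simpler alternative from the description: key the cache directly on `porep_id`. The sketch below is a hedged illustration only. The `PoRepId` alias, the function name, the extra `sector_nodes` and `hash_name` inputs, and the `parents-cache-` prefix are hypothetical, and `DefaultHasher` is a non-cryptographic stand-in for the digest a real implementation would use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical alias for the PoRep identifier bytes.
pub type PoRepId = [u8; 32];

/// Derives a cache key that depends directly on `porep_id`.
/// Because the Feistel keys and the DRG seed are both derived from `porep_id`,
/// keying on it covers changes to either without tracking them separately.
pub fn parents_cache_key(porep_id: &PoRepId, sector_nodes: u64, hash_name: &str) -> String {
    let mut h = DefaultHasher::new();
    porep_id.hash(&mut h);
    sector_nodes.hash(&mut h);
    hash_name.hash(&mut h);
    // Illustrative key format: a fixed prefix followed by 16 hex digits.
    format!("parents-cache-{:016x}", h.finish())
}
```

Calling `parents_cache_key` twice with the same inputs returns the same string, and changing any byte of `porep_id` is expected to produce a different one.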