Feature: move partition #1326
Conversation
Force-pushed from 23d838a to 181dc2e
Thanks for this. I haven't looked at the tests much; I'll hold off until we align on the implementation.
The big outstanding question is whether this should more literally just move the partition, intact, to a new deadline, or remove a partition and add the sectors individually to the new deadline. You've done the latter, but I think the former will be better, definitely cheaper. But it does require adding a new method on Deadline state to accept a new partition.
Also there seems to be some confusion over which partitions to move. Only prove and move those the caller specified.
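To make the first option concrete, here is a hypothetical, heavily simplified model of the "move intact" shape being discussed. These are not the actor's real state types or method names; the real partition record also carries bitfields, power, and expiration queues.

```rust
// Simplified stand-ins for the miner actor's state types (illustrative only).
#[derive(Default)]
struct Partition {
    sectors: Vec<u64>, // sector numbers; the real type uses bitfields and AMTs
}

#[derive(Default)]
struct Deadline {
    partitions: Vec<Partition>,
}

impl Deadline {
    // Accept an intact partition moved from another deadline and return its
    // new index, instead of deleting it and re-adding its sectors one by one.
    fn add_partition(&mut self, partition: Partition) -> usize {
        self.partitions.push(partition);
        self.partitions.len() - 1
    }
}
```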
deadline_distance(policy, current_deadline, to_deadline)
    < deadline_distance(policy, current_deadline, from_deadline)
}
These look correct, but please add tests to demonstrate all the cases.
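For reference, a minimal sketch of the kind of cases worth covering, using a simplified `Policy` struct rather than the actor types; the forward-walking, wrap-around distance logic is restated here as an assumption about what `deadline_distance` does.

```rust
// Simplified policy; 48 is the mainnet number of deadlines per proving period.
struct Policy {
    wpost_period_deadlines: u64,
}

/// Distance from `current` to `target`, walking forward around the proving period.
fn deadline_distance(policy: &Policy, current: u64, target: u64) -> u64 {
    if target >= current {
        target - current
    } else {
        policy.wpost_period_deadlines - current + target
    }
}

#[test]
fn deadline_distance_cases() {
    let policy = Policy { wpost_period_deadlines: 48 };
    assert_eq!(deadline_distance(&policy, 10, 10), 0); // same deadline
    assert_eq!(deadline_distance(&policy, 10, 20), 10); // simple forward hop
    assert_eq!(deadline_distance(&policy, 40, 5), 13); // wraps around the period
}
```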
actors/miner/src/lib.rs (Outdated)
state.delete_sectors(store, &dead).map_err(|e| {
    e.downcast_default(ExitCode::USR_ILLEGAL_STATE, "failed to delete dead sectors")
})?;
We don't want to do this if keeping the partition intact (including its reference to terminated sectors) while moving. I can see below that's not what you're doing, but please leave this comment until we resolve that.
I'm looking into the "moving partitions intact" approach, but `Deadline.expirations_epochs` seems quite complicated to maintain while moving. I guess that's the main reason the current `compact_partitions` was implemented by removing and then re-adding sectors.
@anorth I'm not sure whether it's worth loading sector infos just to calculate an accurate epoch for `Deadline.expirations_epochs`. Does it matter if the epoch is a period later? If so, we can simply use the source epoch to calculate the target epoch with a different `quant`.
Hmm, I see. I think we should avoid loading sector info if at all possible.
If we say that a partition with faulty sectors can't be moved, then I think we only need to worry about on-time expirations. In that case I suggest rounding up, i.e. finding the next epoch, quantized to the new deadline, that is not sooner than the epoch in the queue. Then the SP might need to maintain a sector for up to one proving period longer than committed.
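A minimal sketch of what that rounding up could look like, assuming a simplified `QuantSpec` rather than the runtime's real quantization type; the `unit`/`offset` values below are illustrative only.

```rust
// Simplified quantization spec (illustrative, not the runtime's type).
#[derive(Clone, Copy)]
struct QuantSpec {
    unit: i64,   // quantization period, e.g. the proving period
    offset: i64, // phase, e.g. the destination deadline's last epoch
}

impl QuantSpec {
    // Round `epoch` up to the next epoch congruent to `offset` modulo `unit`.
    fn quantize_up(&self, epoch: i64) -> i64 {
        let remainder = (epoch - self.offset).rem_euclid(self.unit);
        if remainder == 0 {
            epoch
        } else {
            epoch + (self.unit - remainder)
        }
    }
}

fn main() {
    // With unit = 60 and offset = 0, an on-time expiration queued at epoch 130
    // in the source deadline lands at 180 in the destination: never sooner than
    // the queued epoch, and at most one period later.
    let to_quant = QuantSpec { unit: 60, offset: 0 };
    assert_eq!(to_quant.quantize_up(130), 180);
    assert_eq!(to_quant.quantize_up(180), 180);
}
```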
Don't worry about removing anything from the old deadline's queue. It will have redundant entries, but they're harmless.
What about other sectors: faulty, unproven, terminated? I was considering allowing all sectors for this "moving partitions intact" approach.
Also, I found there is a test case which requires the epoch to be exactly `quant.quantize_up(sector.expiration)`; should we modify this check?
I've temporarily pushed the latest commit here in case you want to try the test cases like `ok_to_move_partitions_from_safe_epoch`.
I would like to consider allowing any partition to be moved too, but at the moment I don't think it's worth the work/risk of changing how remove_partitions works. Let's require non-faulty, fully proven sectors for now.
Yes, I think we should probably remove that test case. The quantisation is mentioned elsewhere too, but I think we can remove it and update the associated code to handle arbitrary quantisation. This can be an independent change ahead of what you're doing here; for now, just comment out that state assertion.
Please do push that commit to this branch - I think it's the right approach.
> Also, I found there is a test case which requires the epoch to be exactly `quant.quantize_up(sector.expiration)`, should we modify this check?

I looked into this and I think we can just make a simple change to make it `epoch >= target`.
I changed it to be a bit more restrictive: `epoch >= target && (epoch - target) % quant.unit == 0`.
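For illustration, a hedged sketch of that stricter check as a standalone predicate; the function name and values here are illustrative, not the actual assertion in the test helper.

```rust
// Accept any later epoch that still sits on the destination quantization grid,
// rather than requiring exactly quantize_up(sector.expiration).
fn on_time_expiration_ok(epoch: i64, target: i64, quant_unit: i64) -> bool {
    epoch >= target && (epoch - target) % quant_unit == 0
}

fn main() {
    assert!(on_time_expiration_ok(180, 120, 60));  // one period later, on grid
    assert!(!on_time_expiration_ok(150, 120, 60)); // later, but off grid
    assert!(!on_time_expiration_ok(60, 120, 60));  // earlier than the target
}
```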
Could you please point me to where we are checking the number of partitions after the move?
Can we add some tests for dispute scenarios as well, please?
That's a good point, I don't think we're doing that.
> 2. add check for `max_partitions_per_deadline`
Good point, just added this check!
OK, this may take a bit longer though.
Can be added in a follow-up PR too!
I'll add it in a follow-up PR then, and we'll also manually test this case!
LGTM with more tests to come in follow-up
actors/miner/src/lib.rs (Outdated)
let mut orig_deadline =
    deadlines.load_deadline(store, params.orig_deadline).context_code(
        ExitCode::USR_ILLEGAL_STATE,
        format!("failed to load deadline {}", params.orig_deadline),
Nit: use `with_context_code` when doing a `format!` operation for the message.
Done, I guess this is to avoid alloc when no error happens?
Yes.
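A hedged sketch of why the closure form matters; these are simplified free functions standing in for the actual runtime trait methods, not their real signatures.

```rust
// Eager form: the message string (and the format! allocation behind it) is
// built even when the result is Ok.
fn context_code<T, E>(res: Result<T, E>, msg: String) -> Result<T, String> {
    res.map_err(|_| msg)
}

// Lazy form: the message is only formatted if the result is actually an Err.
fn with_context_code<T, E, F>(res: Result<T, E>, f: F) -> Result<T, String>
where
    F: FnOnce() -> String,
{
    res.map_err(|_| f())
}

fn main() {
    let ok: Result<u64, ()> = Ok(7);
    // Eager: the string is allocated even though nothing failed.
    let _ = context_code(ok, format!("failed to load deadline {}", 3));

    let ok2: Result<u64, ()> = Ok(7);
    // Lazy: the closure is never called on the success path.
    let _ = with_context_code(ok2, || format!("failed to load deadline {}", 3));
}
```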
|| {
    let current_deadline = h.current_deadline(&rt);

    let from_deadline = new_deadline_info(
        rt.policy(),
        if current_deadline.index < orig_deadline_id {
            current_deadline.period_start - rt.policy().wpost_proving_period
        } else {
            current_deadline.period_start
        },
        orig_deadline_id,
        *rt.epoch.borrow(),
    );

    let from_ddl = h.get_deadline(&rt, orig_deadline_id);

    let entropy = RawBytes::serialize(h.receiver).unwrap();
    rt.expect_get_randomness_from_beacon(
        DomainSeparationTag::WindowedPoStChallengeSeed,
        from_deadline.challenge,
        entropy.to_vec(),
        TEST_RANDOMNESS_ARRAY_FROM_ONE,
    );

    let post = h.get_submitted_proof(&rt, &from_ddl, 0);

    let all_ignored = BitField::new();
    let vi = h.make_window_post_verify_info(
        &sectors_info,
        &all_ignored,
        sectors_info[1].clone(),
        Randomness(TEST_RANDOMNESS_ARRAY_FROM_ONE.into()),
        post.proofs,
    );
    rt.expect_verify_post(vi, ExitCode::OK);
},
This window post expectation-setting is long and duplicated in other tests. Can you follow up to factor it out into either a flag on move_partitions, or another method on the harness that returns this closure?
Just saw the comments here, I'll do it tomorrow.
use vm_api::util::{apply_ok, get_state, DynBlockstore};
use vm_api::VM;

#[test]
In order to export tests to be run in other frameworks, test entry points no longer live here. Instead, they go in the test_vm package with a simple wrapper that invokes a method here with a single VM parameter. Please follow up to use the same pattern as other integration tests.
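For reference, a hedged sketch of that wrapper pattern; the module path, `move_partitions_test`, and `TestVM::new_with_singletons` are assumed names used for illustration and not verified against the crates.

```rust
// test_vm/tests/move_partitions_test.rs (illustrative location)
// Assumed imports: the real test body would live in the shared
// integration-tests package and take a single VM parameter.
use fil_actors_integration_tests::tests::move_partitions_test;
use fvm_ipld_blockstore::MemoryBlockstore;
use test_vm::TestVM;

#[test]
fn move_partitions() {
    // Build a concrete VM in the test_vm package and delegate to the shared
    // test body, which only sees the VM trait.
    let store = MemoryBlockstore::new();
    let vm = TestVM::new_with_singletons(&store);
    move_partitions_test(&vm);
}
```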
This reverts commit 5ec2a6b.
This PR implements this FIP.