Atree storage migration #4633
Conversation
Codecov Report

    @@            Coverage Diff             @@
    ##           master    #4633      +/-   ##
    ==========================================
    - Coverage   56.02%   55.62%    -0.41%
    ==========================================
      Files         965     1003      +38
      Lines       89671    96785    +7114
    ==========================================
    + Hits        50241    53836    +3595
    - Misses      35678    38881    +3203
    - Partials     3752     4068     +316
Nice! Thanks for doing this! 👍 I mostly looked at payload migration code.
Just an idea: maybe we can use 2 storages for migration:
- Original payloads can be loaded into the first storage and migrated payloads can be stored in the second storage.
- Two storages let us test migration results more completely by comparing all Cadence values before and after migration.
- Not sure, but there may also be a performance benefit because we don't need to recursively remove migrated payloads from the first storage (StorageMap.SetValue).
- Since it will use more memory, we can enable the 2-storage migration with a flag.
Thoughts? (rough sketch of the idea below)
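For illustration, a rough sketch of the two-storage flow; all names here (sourceStorage, targetStorage, migrateValue, valuesEqual) are placeholders rather than the actual migration or Cadence APIs:

```go
package migrations

import "fmt"

// migrateWithTwoStorages sketches the idea: read from the original storage,
// write migrated values into a second storage, then compare everything.
func migrateWithTwoStorages(
	sourceStorage map[string]any, // stand-in for the storage holding original payloads
	targetStorage map[string]any, // stand-in for the storage receiving migrated payloads
	migrateValue func(any) (any, error),
	valuesEqual func(a, b any) bool,
) error {
	// Migrate every value into the second storage. Nothing is removed from
	// the first storage, so no recursive removal via StorageMap.SetValue is needed.
	for key, oldValue := range sourceStorage {
		newValue, err := migrateValue(oldValue)
		if err != nil {
			return fmt.Errorf("migrating %s: %w", key, err)
		}
		targetStorage[key] = newValue
	}

	// Compare all values before and after migration.
	for key, oldValue := range sourceStorage {
		if !valuesEqual(oldValue, targetStorage[key]) {
			return fmt.Errorf("value mismatch for key %s", key)
		}
	}

	return nil
}
```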
I was considering that, but I opted for a single storage for simplicity. When we move values to a new storage, we also have to move the non-atree registers (account status register, account key registers, contract names register) and also fix the storageUsed field. If you think there is a possibility that it will make it faster, I can try.
I think if 2 storages are used (and considering reliability > speed), one more stage to check for orphan slabs would be good. After comparing Cadence values (storage2 == storage1), we can delete entries from storage 1, and at the end we can assert that nothing is left in storage 1.
Yes. All non-atree registers can be moved as-is, except for storage key registers. How is the storageUsed field computed? Does it require the data migration to complete first?
Yes, I think using 2 storages might speed up the process by not recursively removing all atree registers after cloning. The main benefit is being able to fully compare the migration result during testing, and also during the spork (if the speed is acceptable). So even if it is the same speed or slightly worse, it would be worthwhile.
Good point! This check for orphan slabs would be great to have during data migration dry runs, etc. 👍 Maybe this check can be enabled by a flag. There is already a function in Atree to check for orphan slabs, so I think we can use it during data migration as an option.
Those are good points. I'll switch to having two storages. The storage used field updates automatically if you go through the environment.StatefulAccounts object, which the migration does. But if there are two storages, the new storage used will be recorded incorrectly (storageUsed is X), but this is an issue I can fix.
I took a look (not a full PR review) at the skipped registers problem. Nothing obvious jumped out related to skipped registers, but I left a few comments.
Maybe it would be useful to have an atree register dump for a small account with known skipped registers:
- before migration
- after migration
Thoughts?
@janezpodhostnik Thanks for sharing the latest findings today!
You also mentioned needing more memory on the test machine, so I took a look for quick memory savings.
@janezpodhostnik sharding?
@bluesign Yeah, something like that might be necessary. I was also thinking of just dumping intermediate results to disk.
@janezpodhostnik Maybe we can eliminate the map of account groups. Instead of creating a map, we can sort the input payload slice by address in place.
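For illustration, a rough sketch of the sort-in-place grouping; the owner function here is a placeholder for however the owner address is actually extracted from a payload key in the migration code:

```go
package migrations

import (
	"sort"

	"github.com/onflow/flow-go/ledger"
)

// groupPayloadsByAccount groups payloads without building a map of account
// groups: sort the slice in place by owner address, then slice out
// contiguous runs of payloads with the same owner.
func groupPayloadsByAccount(
	payloads []*ledger.Payload,
	owner func(*ledger.Payload) string, // placeholder for owner-address extraction
) [][]*ledger.Payload {
	sort.Slice(payloads, func(i, j int) bool {
		return owner(payloads[i]) < owner(payloads[j])
	})

	var groups [][]*ledger.Payload
	start := 0
	for i := 1; i <= len(payloads); i++ {
		// close a group whenever the owner changes (or at the end of the slice)
		if i == len(payloads) || owner(payloads[i]) != owner(payloads[start]) {
			groups = append(groups, payloads[start:i])
			start = i
		}
	}
	return groups
}
```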
I just took a look at this from a Cadence perspective, but had a couple of thoughts.
cmd/util/ledger/util/util.go (outdated)

    // NopMemoryGauge is a no-op implementation of the MemoryGauge interface
    type NopMemoryGauge struct{}
I am not 100% sure, but I don't think it's necessary to define a new no-op type here for a memory gauge that doesn't do anything; the Cadence code should be set up such that `nil` can be passed anywhere that expects a `MemoryGauge` if you don't care about tracking memory usage.
    // Attachments are enabled everywhere except for Mainnet
    AttachmentsEnabled: true,
    // Capability Controllers are enabled everywhere except for Mainnet
    CapabilityControllersEnabled: true,
I assume these are enabled so this migration can be run on testnet too?
Thanks for sharing results yesterday of the dry run for the migration without atree inlining. The largest account took hours longer than we'd like, so we can try to parallelize the migration within large accounts in:
- flow-go/cmd/util/ledger/migrations/atree_register_migration.go, lines 217 to 223 (at ab9077a)
- flow-go/cmd/util/ledger/migrations/atree_register_migration.go, lines 225 to 231 (at ab9077a)
Thoughts? (rough sketch below)
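For illustration, a rough sketch of the fan-out, with migrateKey standing in for the bodies of the loops linked above; whether the underlying atree storage can safely be shared between goroutines would still need to be verified:

```go
package migrations

import (
	"golang.org/x/sync/errgroup"
)

// migrateAccountKeysInParallel fans the per-key work for one large account
// out over at most nWorker goroutines.
func migrateAccountKeysInParallel(
	keys []string,
	nWorker int,
	migrateKey func(key string) error, // placeholder for the per-key migration
) error {
	g := new(errgroup.Group)
	g.SetLimit(nWorker)

	for _, key := range keys {
		key := key // capture loop variable for the goroutine
		g.Go(func() error {
			return migrateKey(key)
		})
	}

	return g.Wait()
}
```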
This is a temporary (and inefficient) workaround to check number of fields in Cadence CompositeValue. In the future, CompositeValue.FieldCount() will be added to Cadence, so that function can be used to improve this workaround.
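For illustration, a rough sketch of the workaround described above; the ForEachField signature used here is an assumption and may not match the Cadence version this PR builds against:

```go
package migrations

import (
	"github.com/onflow/cadence/runtime/interpreter"
)

// compositeFieldCount counts the fields of a CompositeValue by iterating
// over them, as a stand-in until CompositeValue.FieldCount() exists.
func compositeFieldCount(
	inter *interpreter.Interpreter,
	composite *interpreter.CompositeValue,
) int {
	count := 0
	// Assumed iterator shape; the real Cadence API may differ.
	composite.ForEachField(inter, func(_ string, _ interpreter.Value) {
		count++
	})
	return count
}
```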
Replace hash-based validation of migrated Cadence values to use `Equal()`
I took another look and left some minor comments. I also ran some dry runs this week with and without the non-hash validation (PR 5204) and will take a closer look at the dry-run logs.
The dry runs used this PR updated with follow-up PRs 5122, 5128, 5143, and 5204. The total dry run took 5 hours with the new validation enabled; without validation it took about 4 hours, using the mainnet24 root. Peak memory use (max RSS) was 1.95 TB - 2.05 TB with/without validation.
Nice improvements on the migration duration!
Migration dry-run with/without domain
Nice! LGTM!
Remove cricket moments references from atree migration
    // Iterate through all domains and compare cadence values.
    for _, domain := range domains {
Can we validate different storage domains in parallel?
We can, but most of the data is in one domain, so it is likely to remain a bottleneck (not much to gain from parallelism).
Could we validate the storage map keys in parallel though?
> Could we validate the storage map keys in parallel though?

Good question. That would allow parallelizing within the same domain, but we would need to determine whether updating the atree read cache needs to support parallelism for this.
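For illustration, a rough sketch of splitting one storage map's keys across validator goroutines, with validateKey as a placeholder; it assumes the concurrent-read question about the atree read cache above is resolved (e.g. by a per-worker storage):

```go
package migrations

import "sync"

// validateKeysInParallel distributes the keys of one storage map across
// nWorker goroutines and collects any validation errors.
func validateKeysInParallel(
	keys []string,
	nWorker int,
	validateKey func(key string) error, // placeholder for the per-key validation
) []error {
	keyCh := make(chan string)
	errCh := make(chan error, len(keys))

	var wg sync.WaitGroup
	for i := 0; i < nWorker; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range keyCh {
				if err := validateKey(key); err != nil {
					errCh <- err
				}
			}
		}()
	}

	for _, key := range keys {
		keyCh <- key
	}
	close(keyCh)
	wg.Wait()
	close(errCh)

	var errs []error
	for err := range errCh {
		errs = append(errs, err)
	}
	return errs
}
```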
        return true
    })

    if len(vFieldNames) != len(otherFieldNames) {
Do we need to compare whether they contain the same items?
If so, what about building a map, removing each field from the map while iterating through otherComposite, and verifying whether any field is left?
> Do we need to compare if they contain the same items?

Not needed here because:
- we compare the number of fields of both composite values
- we compare the fields of the 2 composite values by iterating over the fields of the 1st composite value and comparing the field values (of the matching field name) in the 2nd composite value
- if the composite values had different fields, we would fail to get a field value by the same field name in the previous step.
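For illustration, a rough sketch of that reasoning, with plain maps standing in for the two composite values' fields and equal standing in for the recursive value comparison:

```go
package migrations

// compositeFieldsEqual shows why a count check plus per-field lookup is
// sufficient: a missing or renamed field fails the lookup, so no separate
// "leftover fields" check is needed.
func compositeFieldsEqual(
	a, b map[string]any, // stand-ins for the two composite values' fields
	equal func(x, y any) bool,
) bool {
	// Same number of fields on both sides.
	if len(a) != len(b) {
		return false
	}
	// Look up each field of the first composite by name in the second.
	for name, av := range a {
		bv, ok := b[name]
		if !ok || !equal(av, bv) {
			return false
		}
	}
	return true
}
```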
    nWorker int, // number of concurrent workers to migrate payloads
    runMigrations bool,
@zhangchiqing as discussed, re-adding this
    var s string
    err := capturePanic(
        func() {
            s = value.RecursiveString(interpreter.SeenReferences{})
Not 100% sure whether the string description of all values includes all the data of the value.
That file (cadence_data_validation.go) isn't used any more.
So maybe the file can be removed since we're using cadence_value_validation.go instead.
    // Iterate through all domains and compare cadence values.
    for _, domain := range domains {
Could we validate the storage map keys in parallel though?
    var domainsLookupMap = map[string]struct{}{
        common.PathDomainStorage.Identifier(): {},
        common.PathDomainPrivate.Identifier(): {},
        common.PathDomainPublic.Identifier():  {},
        runtime.StorageDomainContract:         {},
        stdlib.InboxStorageDomain:             {},
    }
Given that Testnet already had Cap Cons enabled, those also have to be migrated, i.e. stdlib.CapabilityControllerStorageDomain needs to be added:
Suggested change:

        stdlib.CapabilityControllerStorageDomain,
    }

    var domainsLookupMap = map[string]struct{}{
        common.PathDomainStorage.Identifier():    {},
        common.PathDomainPrivate.Identifier():    {},
        common.PathDomainPublic.Identifier():     {},
        runtime.StorageDomainContract:            {},
        stdlib.InboxStorageDomain:                {},
        stdlib.CapabilityControllerStorageDomain: {},
    }
This is the migration that was planned for atree register inlining.
Changes and Migrations
The migration has 4 steps:
Other changes:
Current state of the code
    top_longest_migrations=[
      "30cf5dcf6ea8d379: 12m27.007314972s",
      "1e3c78c6d580273b: 12m37.598222755s",
      "e9a85b21a91040b1: 14m34.614031384s",
      "e2e1689b53e92a82: 14m14.566103677s",
      "b6f2481eba4df97b: 17m11.768844197s",
      "321d8fcde05f6e8c: 28m7.706118076s",
      "87ca73a41bb50ad5: 26m22.046354959s",
      "481914259cb9174e: 15m39.537646583s",
      "8528d8dd8a586f0d: 16m17.15586826s",
      "20187093790b9aef: 20m31.733136401s",
      "ecfad18ba9582d4f: 2h33m22.364125919s",
      "bd56134d152d0ae2: 1h1m3.288347443s",
      "fa57101aa0d55954: 47m38.798327044s",
      "5eb12ad3d5a99945: 1h32m59.467484076s",
      "81e95660ab5308e1: 41m8.044854272s",
      "e4cf4bdc1751c65d: 59m55.150660767s",
      "cecd8c81c050eef1: 19m44.674823294s",
      "5cd62ecc83793917: 21m14.581682407s",
      "e1f2a091f7bb5245: 54m24.543389847s",
      "4eded0de73020ca5: 4h2m13.039145307s"
    ]