Run full Cadence 1.0 migration on TN state #3096
For investigating the concurrency issues, I tried building with the race detector enabled:

```sh
CGO_ENABLED=1 \
GOOS=linux \
GOARCH=amd64 \
CC="zig cc -target x86_64-linux-musl" \
CXX="zig c++ -target x86_64-linux-musl" \
go build -v --tags "netgo,osusergo" -race -ldflags "-extldflags=-static"
```
For the empty intersection type loading issue, I opened #3138, which so far has a reproducer for one particular case where the type loading occurs. I'm not really sure how to proceed here. One option would be disabling type checks. The best idea I came up with so far is rewriting the static type of a container value (e.g. a dictionary or array) before migrating the nested values inside of it.
I wonder if we can integrate the intersection type rewriting and the entitlements migration code directly into that step. I tried just performing the intersection type rewriting there, and none of the tests break, except for the entitlements migration, which loses the necessary information it needs (the legacy intersection type). Maybe also adding the entitlements migration there will fix that last issue?
About the errors when worker count > 1: I'm not 100% sure they are all related to the same thing (it's likely), but one of the problems is that the migration code currently doesn't support creating slab indexes from multiple goroutines. This can be fixed either by adding a locking mechanism or with some channel magic. I will take a look.
Unless I'm mistaken, slab indexes are per account. I think we might only have a problem generating slab indexes if multiple goroutines migrate the same account.
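If that is the case, a per-account lock would be enough to serialize slab index generation. Below is a minimal sketch assuming hypothetical names (`Address`, `slabIndexLocks`, `generate`), not the actual migration code:

```go
package migrationsketch

import "sync"

// Address stands in for an account address; a hypothetical placeholder type.
type Address [8]byte

// slabIndexLocks serializes slab index generation per account, so that
// concurrent migrations of the *same* account don't race.
type slabIndexLocks struct {
	mu    sync.Mutex
	locks map[Address]*sync.Mutex
}

func newSlabIndexLocks() *slabIndexLocks {
	return &slabIndexLocks{
		locks: map[Address]*sync.Mutex{},
	}
}

// forAccount returns the lock for the given account, creating it on demand.
func (l *slabIndexLocks) forAccount(address Address) *sync.Mutex {
	l.mu.Lock()
	defer l.mu.Unlock()
	lock, ok := l.locks[address]
	if !ok {
		lock = &sync.Mutex{}
		l.locks[address] = lock
	}
	return lock
}

// nextSlabIndex generates a slab index while holding the account's lock.
// generate is a placeholder for the real index generation.
func (l *slabIndexLocks) nextSlabIndex(address Address, generate func() uint64) uint64 {
	lock := l.forAccount(address)
	lock.Lock()
	defer lock.Unlock()
	return generate()
}
```

With something like this, migrations of different accounts never contend, and two goroutines working on the same account simply take turns.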
I had a look at
Re: Burner deployment is too expensive: I think @fxamacker you mentioned that creating a snapshot for all payloads is potentially intractable. I opened onflow/flow-go#5470 to show that for the deployment migration, we can filter the payloads to just those of the accounts that are needed for the deployment. However, how can we merge the resulting write set into the original payloads without having to construct a full payload snapshot? cc @onflow/flow-cadence-execution
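One possible approach, sketched below with hypothetical `Payload`/`Key` types rather than flow-go's actual ledger API: index only the (small) write set and make a single pass over the original payloads, replacing or dropping the ones that were written to and appending any new registers.

```go
package migrationsketch

// Key and Payload are hypothetical stand-ins for flow-go's ledger types.
type Key string

type Payload struct {
	Key   Key
	Value []byte
}

// mergeWriteSet applies writes to payloads without building a snapshot of
// all payloads: only the (much smaller) write set is indexed.
// It assumes deletions are represented as empty values in the write set.
func mergeWriteSet(payloads []Payload, writes map[Key][]byte) []Payload {
	result := make([]Payload, 0, len(payloads)+len(writes))

	// Replace existing payloads that were written to.
	updated := make(map[Key]struct{}, len(writes))
	for _, payload := range payloads {
		if newValue, ok := writes[payload.Key]; ok {
			updated[payload.Key] = struct{}{}
			if len(newValue) == 0 {
				// Empty value: the register was deleted, drop the payload.
				continue
			}
			result = append(result, Payload{Key: payload.Key, Value: newValue})
			continue
		}
		result = append(result, payload)
	}

	// Append writes to keys that did not exist before.
	for key, value := range writes {
		if _, ok := updated[key]; ok || len(value) == 0 {
			continue
		}
		result = append(result, Payload{Key: key, Value: value})
	}

	return result
}
```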
Cross-posting from Discord. Re: Does the migration pipeline properly migrate nested containers? The example I thought of is the code in cadence/migrations/entitlements/migration.go, lines 256 to 285 (at 2dd12d3).
From what I can see (but I might be missing something), the "outer" storage migration (StorageMigration) only mutates the existing value to be migrated (see cadence/migrations/migration.go, lines 245 to 336, at 2dd12d3).
I'd assume that the outer storage migration code either performs a post-order "traversal"/migration, i.e. migrates the child values of the container first and then applies the outer replacement (if any); or performs a pre-order "traversal"/migration, i.e. migrates the value (in this case the container) first, and then the children. I wasn't quite sure if we handle this properly, and it would be good to have a test case for it (maybe we already have one).
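For illustration, the two orders for a generic value tree might look like the following; `Value` and `migrate` are hypothetical placeholders, not the actual `StorageMigration` code:

```go
package migrationsketch

// Value is a hypothetical value tree node; containers have children.
type Value struct {
	Children []*Value
}

// migrate is a placeholder for a single value migration; it may return a
// replacement value, or nil to keep the existing one.
type migrate func(*Value) *Value

// postOrder migrates the children first, then the parent, so a replacement
// of the parent sees already-migrated children.
func postOrder(value *Value, m migrate) *Value {
	for i, child := range value.Children {
		value.Children[i] = postOrder(child, m)
	}
	if replacement := m(value); replacement != nil {
		return replacement
	}
	return value
}

// preOrder migrates the parent first, then recurses into the children of
// whichever value (original or replacement) survived.
func preOrder(value *Value, m migrate) *Value {
	if replacement := m(value); replacement != nil {
		value = replacement
	}
	for i, child := range value.Children {
		value.Children[i] = preOrder(child, m)
	}
	return value
}
```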
Added some tests and improved the comments here: #3142
@janezpodhostnik @fxamacker I reran the util with the race detector enabled on the machine, and this time it reported data races while grouping accounts. Opened onflow/flow-go#5485. Could you please have a look?
Adding atree storage health checks to the tests of the migrations in #3144 shows errors. I also opened onflow/flow-go#5486 in flow-go, and it reports problems as well. My hunch is that the container re-creation, as done by the static type and entitlements migrations, is incorrect: we're just creating a new dictionary and inserting the elements of the old container directly into it. We probably need to properly remove them from the old container, which transfers them to the "stack", and then insert them into the new container.
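In other words, the re-creation should probably be remove-then-insert rather than read-then-insert. A rough sketch of that pattern, using a hypothetical `Container` interface instead of atree's actual API:

```go
package migrationsketch

// Container is a hypothetical stand-in for an atree dictionary/array.
type Container interface {
	Keys() []string
	// Remove detaches the element from the container's slabs and returns it,
	// i.e. transfers ownership back to the caller (the "stack").
	Remove(key string) (interface{}, bool)
	Insert(key string, value interface{})
}

// rebuild re-creates a container by removing each element from the old
// container before inserting it into the new one, so the new container
// never references slabs that still belong to the old container.
func rebuild(old Container, newContainer Container) {
	for _, key := range old.Keys() {
		value, ok := old.Remove(key)
		if !ok {
			continue
		}
		newContainer.Insert(key, value)
	}
}
```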
@turbolent I opened a PR to fix the data race and added a test to reproduce it. The PR targets the main branch, since it is a bugfix.
Running the migration with the latest fixes also reports a "Go panic" for an out-of-bounds index:
@SupunS Could we maybe improve the storage migration to include a stack trace when we recover from panics, so we can pinpoint the problem? The migration also runs into "slab not found" errors, but those are probably due to the bugs already reported by the atree storage health checks:
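For reference, a recovered panic value in Go does not carry a stack trace by itself, but `runtime/debug.Stack()` called inside the deferred recover captures it. A minimal sketch, with `migrateValue` as a hypothetical placeholder:

```go
package migrationsketch

import (
	"fmt"
	"runtime/debug"
)

// migrateValue is a hypothetical placeholder for a single value migration.
func migrateValue() error {
	panic("index out of range")
}

// migrateWithRecovery converts a panic into an error that includes the
// stack trace, so the failing code path can be pinpointed from the logs.
func migrateWithRecovery() (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("migration panicked: %v\nstack:\n%s", r, debug.Stack())
		}
	}()
	return migrateValue()
}
```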
Finally, the progress logs for storing the final trie seemed odd, cc @onflow/flow-cadence-execution:
@fxamacker In the migrations, we often need to update the static types of values, including containers (arrays, dictionaries, composites), i.e. the type info of atree arrays and ordered maps. Do you think we could add support for changing the type info of atree arrays and ordered maps? I saw that
@turbolent Yes, I can add those. Are these functions needed for both the atree-inlining branch and the master branch in the atree repo?
@fxamacker Awesome! That would be extremely useful 🙏
Yes, those functions will be needed both for the current atree version used on master (higher priority) and for the register-inlining version of atree used in the atree register-inlining branch (we can port the functions there later).
@turbolent I opened PRs at onflow/atree to add these functions.
Since the Cadence master branch is using atree
The same functions probably can't be ported to the atree-inlining version, because an inlined container needs to notify its parent container so that the parent container slab can be persisted with the new inlined static type information. I will open new PRs to add equivalents for the atree-inlining version.
@fxamacker Awesome! Thank you for adding these 👏 👌
The TN state service account is massive (5GB!) because of the random beacon history; see https://discord.com/channels/613813861610684416/1215015019260158022. I upgraded my machine at home to 64GB of memory, and that still doesn't seem to be enough. I'll try to add a migration to the beginning of the migration pipeline that prunes the random beacon history.
All items so far have been addressed, and this issue is getting long. Closing this issue; I opened #3162 for the follow-up work that was discovered.
Nice! 👏