State migration fails in some cases #3192

Closed
turbolent opened this issue Mar 25, 2024 · 23 comments
Assignees: turbolent
Labels: Bug, Feedback

Comments

@turbolent
Member

turbolent commented Mar 25, 2024

Current Behavior

The state migration still runs into some errors. In particular, the log for running the migration on the full TN state (https://www.notion.so/flowfoundation/State-Migration-for-Crescendo-Release-56fe231eb56a4a5383f919ffe444d109?pvs=4#679a28687bbf49e9b3b52697820bc7ae) revealed the following problems:


  • Problem: During the static type migration, values are encountered which have nil types, even though they should not.

    For example:

    6:40AM ERR failed to run StaticTypeMigration in account b87e00279f5bc9c6, domain storage, key ChildAccountTag: internal error: unexpected static type: <nil>
    goroutine 1685 [running]:
    runtime/debug.Stack()
    	/opt/homebrew/Cellar/go@1.20/1.20.14/libexec/src/runtime/debug/stack.go:24 +0x65
    github.com/onflow/cadence/runtime/errors.NewUnexpectedError({0x6880db?, 0x1f0c61b?}, {0xf3fc43aaf8?, 0xab6ce0?, 0x2d94e60?})
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/runtime/errors/errors.go:156 +0x4b
    github.com/onflow/cadence/migrations/statictypes.(*StaticTypeMigration).maybeConvertStaticType(0xf3fc43abc0?, {0x0?, 0x0?}, {0xaa4008?, 0xff1bbabe00?})
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/migrations/statictypes/statictype_migration.go:450 +0x889
    github.com/onflow/cadence/migrations/statictypes.(*StaticTypeMigration).maybeConvertStaticType(0xff1bbab840, {0xaa4008?, 0xff1bbabe00?}, {0xaa4110?, 0xff99da3180?})
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/migrations/statictypes/statictype_migration.go:185 +0x265
    github.com/onflow/cadence/migrations/statictypes.(*StaticTypeMigration).maybeConvertStaticType(0xff1bbab840, {0xaa4110?, 0xff99da3180?}, {0x0?, 0x0?})
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/migrations/statictypes/statictype_migration.go:173 +0x179
    github.com/onflow/cadence/migrations/statictypes.(*StaticTypeMigration).Migrate(0xff99de6cf0?, {{0x668884, 0x7}, {0xb8, 0x7e, 0x0, 0x27, 0x9f, 0x5b, 0xc9, ...}}, ...)
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/migrations/statictypes/statictype_migration.go:144 +0x5b0
    github.com/onflow/cadence/migrations.(*StorageMigration).migrate(0x16?, {0xa98a60?, 0xff1bbab840?}, {{0x668884, 0x7}, {0xb8, 0x7e, 0x0, 0x27, 0x9f, ...}}, ...)
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/migrations/migration.go:533 +0x13b
    github.com/onflow/cadence/migrations.(*StorageMigration).MigrateNestedValue(0xff1bbab830, {{0x668884, 0x7}, {0xb8, 0x7e, 0x0, 0x27, 0x9f, 0x5b, 0xc9, ...}}, ...)
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/migrations/migration.go:422 +0x65f
    github.com/onflow/cadence/migrations.(*StorageMigration).MigrateNestedValue(0xff1bbab830, {{0x668884, 0x7}, {0xb8, 0x7e, 0x0, 0x27, 0x9f, 0x5b, 0xc9, ...}}, ...)
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/migrations/migration.go:281 +0x115e
    github.com/onflow/cadence/migrations.(*StorageMigration).NewValueMigrationsPathMigrator.func1({{0x668884, 0x7}, {0xb8, 0x7e, 0x0, 0x27, 0x9f, 0x5b, 0xc9, 0xc6}}, ...)
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/migrations/migration.go:139 +0x85
    github.com/onflow/cadence/migrations.ValueConverterPathMigrator.Migrate({0x0?, 0xff6bf31a00?}, 0x2b7a29d?, {{0x668884, 0x7}, {0xb8, 0x7e, 0x0, 0x27, 0x9f, ...}}, ...)
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/migrations/account_storage.go:83 +0x90
    github.com/onflow/cadence/migrations.(*AccountStorage).MigrateStorageMap(0xf3fc43bb78, 0x18198cb?, {0x668884, 0x7}, {0xa90b10, 0xff1bbab850}, 0x8934f0)
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/migrations/account_storage.go:162 +0x34d
    github.com/onflow/cadence/migrations.(*AccountStorage).MigrateStringKeys(0x4dabe0?, 0xff6bf37860?, {0x668884?, 0x30?}, {0xa90b10?, 0xff1bbab850?})
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/migrations/account_storage.go:104 +0x2f
    github.com/onflow/cadence/migrations.(*StorageMigration).MigrateAccount(0xff1bbab830, {0xb8, 0x7e, 0x0, 0x27, 0x9f, 0x5b, 0xc9, 0xc6}, {0xa90b10, ...})
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/migrations/migration.go:77 +0xa6
    github.com/onflow/flow-go/cmd/util/ledger/migrations.(*CadenceBaseMigrator).MigrateAccount(0xc0003e8600, {0xc6c95b9f27007eb8?, 0x0?}, {0xb8, 0x7e, 0x0, 0x27, 0x9f, 0x5b, 0xc9, ...}, ...)
    	/Users/bastian/Documents/work/flow-go/cmd/util/ledger/migrations/cadence_values_migration.go:119 +0x38c
    github.com/onflow/flow-go/cmd/util/ledger/migrations.MigrateGroupConcurrently.func1()
    	/Users/bastian/Documents/work/flow-go/cmd/util/ledger/migrations/account_based_migration.go:215 +0x4df
    created by github.com/onflow/flow-go/cmd/util/ledger/migrations.MigrateGroupConcurrently
    	/Users/bastian/Documents/work/flow-go/cmd/util/ledger/migrations/account_based_migration.go:163 +0x217
    

    In particular, the following cases are encountered (in v1.0.0-preview.14):

    • statictypes/statictype_migration.go:144: Dictionary
    • statictypes/statictype_migration.go:173
    • statictypes/statictype_migration.go:185: Capability
    • statictypes/statictype_migration.go:450

    Analysis: Capability's type parameter is optional, but the static type migration assumes it is always set / never nil.

    Fix: Handle unparameterized Capability static types #3196


  • Problem: A particular dictionary key fails to migrate, because the old value for the migrated key could not be removed:

    10:58AM ERR failed to run StorageMigration in account 75318eaf00edbc9d, domain storage, key ChildCapabilityProxy0x4308dccb4c835b75: internal error: failed to remove old value for migrated key: Type<Capability<auth(A.631e88ae7f1d7c20.NonFungibleToken.Withdraw, A.631e88ae7f1d7c20.NonFungibleToken.Owner) &{A.631e88ae7f1d7c20.NonFungibleToken.Provider}>>()
    

    Fix: Only encode reference static type in legacy format if it was decoded as such #3199


  • Problem: The atree storage health check fails, as storage refers to zero address slabs, which are not stored in accounts. For example:

    6:39AM ERR storage health check failed error="slab (0x0.48) not found: slab not found during slab iteration" migration=cadence-value-migration migration_index=0
    6:57AM ERR storage health check failed error="slab (0x0.35) not found: slab not found during slab iteration" migration=cadence-value-migration migration_index=0
    7:06AM ERR storage health check failed error="slab (0x0.45) not found: slab not found during slab iteration" migration=cadence-value-migration migration_index=0
    

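    Analysis: The storage was already unhealthy before the migration. The health check before the migration was not run because of a parameter-order mistake, fixed in onflow/flow-go#5618.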


  • Problem: The atree storage health check fails, because storage contains slabs which are never referenced. For example:

    7:24AM ERR storage health check failed error="internal error: slabs not referenced from account Storage: [0x289f9fe703f5a8ec.8]\ngoroutine 1685 [running]:\nruntime/debug.Stack()\n\t/opt/homebrew/Cellar/go@1.20/1.20.14/libexec/src/runtime/debug/stack.go:24 +0x65\ngithub.com/onflow/cadence/runtime/errors.NewUnexpectedError({0x6b72e6?, 0x1?}, {0xef131f79f0?, 0x9022971ae68fe337?, 0xef131f7840?})\n\t/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/runtime/errors/errors.go:156 +0x4b\ngithub.com/onflow/cadence/runtime.(*Storage).CheckHealth(0xfa9bc79bc0)\n\t/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/runtime/storage.go:350 +0x725\ngithub.com/onflow/flow-go/cmd/util/ledger/migrations.(*CadenceBaseMigrator).MigrateAccount(0xc0003e8600, {0xeca8f503e79f9f28?, 0x0?}, {0x28, 0x9f, 0x9f, 0xe7, 0x3, 0xf5, 0xa8, ...}, ...)\n\t/Users/bastian/Documents/work/flow-go/cmd/util/ledger/migrations/cadence_values_migration.go:132 +0x414\ngithub.com/onflow/flow-go/cmd/util/ledger/migrations.MigrateGroupConcurrently.func1()\n\t/Users/bastian/Documents/work/flow-go/cmd/util/ledger/migrations/account_based_migration.go:215 +0x4df\ncreated by github.com/onflow/flow-go/cmd/util/ledger/migrations.MigrateGroupConcurrently\n\t/Users/bastian/Documents/work/flow-go/cmd/util/ledger/migrations/account_based_migration.go:163 +0x217\n" migration=cadence-value-migration migration_index=0
    7:37AM ERR storage health check failed error="internal error: slabs not referenced from account Storage: [0x3115a3260214571.14]\ngoroutine 1814 [running]:\nruntime/debug.Stack()\n\t/opt/homebrew/Cellar/go@1.20/1.20.14/libexec/src/runtime/debug/stack.go:24 +0x65\ngithub.com/onflow/cadence/runtime/errors.NewUnexpectedError({0x6b72e6?, 0x1?}, {0xe3555c99f0?, 0x2a5d21396466d4fd?, 0xe3555c9840?})\n\t/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/runtime/errors/errors.go:156 +0x4b\ngithub.com/onflow/cadence/runtime.(*Storage).CheckHealth(0x114f213e380)\n\t/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.14/runtime/storage.go:350 +0x725\ngithub.com/onflow/flow-go/cmd/util/ledger/migrations.(*CadenceBaseMigrator).MigrateAccount(0xc0003e8600, {0x71452160325a1103?, 0x0?}, {0x3, 0x11, 0x5a, 0x32, 0x60, 0x21, 0x45, ...}, ...)\n\t/Users/bastian/Documents/work/flow-go/cmd/util/ledger/migrations/cadence_values_migration.go:132 +0x414\ngithub.com/onflow/flow-go/cmd/util/ledger/migrations.MigrateGroupConcurrently.func1()\n\t/Users/bastian/Documents/work/flow-go/cmd/util/ledger/migrations/account_based_migration.go:215 +0x4df\ncreated by github.com/onflow/flow-go/cmd/util/ledger/migrations.MigrateGroupConcurrently\n\t/Users/bastian/Documents/work/flow-go/cmd/util/ledger/migrations/account_based_migration.go:163 +0x217\n" migration=cadence-value-migration migration_index=0
    

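    Analysis: The static type migration normalizes the ordering of interfaces in intersection types without checking whether the new key already stores a value. The existing value is overwritten, leaving its slabs unreferenced (see #3211).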


  • Problem: The entitlements migration encounters AuthAccount and fails to load the type (as expected, since the type no longer exists).

    For example:

    5:36PM ERR failed to run EntitlementsMigration in account 200d11a868731c48, domain inbox, key AuthAccountCapability: failed to load type: AuthAccount
    goroutine 1650 [running]:
    runtime/debug.Stack()
    	/opt/homebrew/Cellar/go@1.20/1.20.14/libexec/src/runtime/debug/stack.go:24 +0x65
    github.com/onflow/cadence/migrations.(*StorageMigration).migrate.func1()
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/migrations/migration.go:531 +0x14b
    panic({0x495ae0, 0x11a24f9dd80})
    	/opt/homebrew/Cellar/go@1.20/1.20.14/libexec/src/runtime/panic.go:884 +0x213
    github.com/onflow/cadence/runtime/interpreter.(*Interpreter).MustConvertStaticToSemaType(...)
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/runtime/interpreter/interpreter.go:4563
    github.com/onflow/cadence/migrations/entitlements.ConvertToEntitledType(0x0?, {0xaa3ed0, 0x12706a93a70?})
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/migrations/entitlements/migration.go:104 +0x905
    github.com/onflow/cadence/migrations/entitlements.ConvertValueToEntitlements(0x18196ad?, {0xaac110?, 0x12706a93aa0?})
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/migrations/entitlements/migration.go:288 +0x20c
    github.com/onflow/cadence/migrations/entitlements.EntitlementsMigration.Migrate({0xf1a437afb8?}, {{0x66530c, 0x5}, {0x20, 0xd, 0x11, 0xa8, 0x68, 0x73, 0x1c, ...}}, ...)
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/migrations/entitlements/migration.go:374 +0x29
    github.com/onflow/cadence/migrations.(*StorageMigration).migrate(0xf1a437b0c0?, {0xa996b8?, 0x119f596e240?}, {{0x66530c, 0x5}, {0x20, 0xd, 0x11, 0xa8, 0x68, ...}}, ...)
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/migrations/migration.go:536 +0x13b
    github.com/onflow/cadence/migrations.(*StorageMigration).MigrateNestedValue(0x119f596a920, {{0x66530c, 0x5}, {0x20, 0xd, 0x11, 0xa8, 0x68, 0x73, 0x1c, ...}}, ...)
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/migrations/migration.go:425 +0x66a
    github.com/onflow/cadence/migrations.(*StorageMigration).MigrateNestedValue(0x119f596a920, {{0x66530c, 0x5}, {0x20, 0xd, 0x11, 0xa8, 0x68, 0x73, 0x1c, ...}}, ...)
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/migrations/migration.go:400 +0x4eb
    github.com/onflow/cadence/migrations.(*StorageMigration).NewValueMigrationsPathMigrator.func1({{0x66530c, 0x5}, {0x20, 0xd, 0x11, 0xa8, 0x68, 0x73, 0x1c, 0x48}}, ...)
    
    5:09PM ERR failed to run EntitlementsMigration in account 75c72302038d2736, domain contract, key LonelyCorn: failed to load type: AuthAccount
    goroutine 1658 [running]:
    runtime/debug.Stack()
    	/opt/homebrew/Cellar/go@1.20/1.20.14/libexec/src/runtime/debug/stack.go:24 +0x65
    github.com/onflow/cadence/migrations.(*StorageMigration).migrate.func1()
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/migrations/migration.go:531 +0x14b
    panic({0x495ae0, 0x1210a69daf0})
    	/opt/homebrew/Cellar/go@1.20/1.20.14/libexec/src/runtime/panic.go:884 +0x213
    github.com/onflow/cadence/runtime/interpreter.(*Interpreter).MustConvertStaticToSemaType(...)
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/runtime/interpreter/interpreter.go:4563
    github.com/onflow/cadence/migrations/entitlements.ConvertToEntitledType(0x8?, {0xaa3ed0, 0xf643ee9620?})
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/migrations/entitlements/migration.go:104 +0x905
    github.com/onflow/cadence/migrations/entitlements.ConvertToEntitledType(0xeff723d780?, {0xaa3c68, 0xeff7225a50?})
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/migrations/entitlements/migration.go:150 +0x572
    github.com/onflow/cadence/migrations/entitlements.ConvertToEntitledType(0x0?, {0xaa3d70, 0xeff723d780?})
    	/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/migrations/entitlements/migration.go:193 +0x752
    

    Analysis: Built-in types may not always be represented as PrimitiveStaticTypes, but also as CompositeStaticTypes.

    Fix: Fix migration of built-in types #3205


Expected Behavior

The migration should succeed and not report errors.

Steps To Reproduce

Run the latest state migration on a recent, full TN state.

Environment

- Cadence version: v1.0.0-preview.14
- Network: TN
turbolent added the Bug and Feedback labels on Mar 25, 2024
turbolent self-assigned this on Mar 25, 2024
@fxamacker
Member

Was storage healthy before the migration? Could the data to be migrated already have these problems?

@turbolent Maybe we should add a debug flag to the state migration program that reports the health status of the input data without migrating it. That would confirm whether the "slab not found" and "slab not referenced" errors already exist in the current state.
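
A check-only mode like the one suggested above could be sketched roughly as follows. This is only an illustration: the flag name `check-health-only`, the `account` type, and `checkHealth` are assumptions made for this sketch and are not part of flow-go or Cadence.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// account is a hypothetical stand-in for an account's storage.
// It is not a flow-go or Cadence type.
type account struct {
	address string
	healthy bool
}

// checkHealth is a placeholder for a real storage health check.
func checkHealth(a account) error {
	if !a.healthy {
		return errors.New("slab not found")
	}
	return nil
}

// reportHealth checks every account without migrating anything
// and returns the addresses that are already unhealthy.
func reportHealth(accounts []account) []string {
	var unhealthy []string
	for _, a := range accounts {
		if err := checkHealth(a); err != nil {
			fmt.Printf("WRN storage unhealthy before migration account=%s error=%q\n", a.address, err)
			unhealthy = append(unhealthy, a.address)
		}
	}
	return unhealthy
}

func main() {
	// Hypothetical flag name, chosen for illustration only.
	checkOnly := flag.Bool("check-health-only", false,
		"report storage health of the input state and exit without migrating")
	flag.Parse()

	// Placeholder input; a real tool would load accounts from the state.
	accounts := []account{
		{address: "0x01", healthy: false},
		{address: "0x02", healthy: true},
	}

	unhealthy := reportHealth(accounts)
	fmt.Printf("%d of %d accounts unhealthy\n", len(unhealthy), len(accounts))
	if *checkOnly {
		return
	}
	fmt.Println("migration would run here (omitted in this sketch)")
}
```

Running the report before any value is touched separates pre-existing damage from damage introduced by the migration.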

@turbolent
Member Author

I think what happens for the dictionary key problem is that we always perform the key removal with the "legacy form" of the key. For example, for reference types, we encode the key with the old boolean authorization flag (LegacyReferenceType.Encode).

However, if this happens after the entitlements migration, e.g. in the capability migration, then the key is already in the "new form", with an authorization that is an entitlement set.

I'm currently trying to write a reproduction test case for this hypothesis.

@SupunS Do you think this could be the cause?
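
The hypothesis above can be illustrated with a small model. This is a sketch under simplifying assumptions: `refType` and its string encodings are invented for illustration and do not match Cadence's actual `ReferenceStaticType` or its binary encoding. The `decodedLegacy` field mirrors the idea of only encoding in the legacy format if the value was decoded as such.

```go
package main

import (
	"fmt"
	"strings"
)

// refType is a simplified, hypothetical model of a reference static type.
type refType struct {
	referenced    string
	legacyAuth    bool     // old boolean authorization flag
	entitlements  []string // new entitlement-set authorization
	decodedLegacy bool     // true only if the value was decoded from the legacy format
}

// encodeLegacy encodes the type with the old boolean authorization flag.
func (r refType) encodeLegacy() string {
	if r.legacyAuth {
		return "auth&" + r.referenced
	}
	return "&" + r.referenced
}

// encodeNew encodes the type with its entitlement set.
func (r refType) encodeNew() string {
	if len(r.entitlements) == 0 {
		return "&" + r.referenced
	}
	return "auth(" + strings.Join(r.entitlements, ",") + ")&" + r.referenced
}

// encodeKey uses the legacy form only if the value was decoded as legacy.
func (r refType) encodeKey() string {
	if r.decodedLegacy {
		return r.encodeLegacy()
	}
	return r.encodeNew()
}

func main() {
	// The key was already rewritten by an earlier migration,
	// so the dictionary stores it in the new form.
	key := refType{
		referenced:   "Provider",
		legacyAuth:   true,
		entitlements: []string{"Withdraw"},
	}
	dict := map[string]string{key.encodeNew(): "value"}

	_, foundLegacy := dict[key.encodeLegacy()] // always-legacy lookup misses
	_, foundKey := dict[key.encodeKey()]       // format-aware lookup hits
	fmt.Println(foundLegacy, foundKey)         // prints "false true"
}
```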

@turbolent
Member Author

turbolent commented Mar 28, 2024

While analyzing the logs of the latest run of the migration against a more recent TN state snapshot, I realized I had missed an error: the entitlements migration encounters AuthAccount and fails to load the type. I updated the description with details.

@turbolent
Member Author

In the latest run of the migration against a more recent TN state snapshot, the first two errors are not encountered anymore (capability static type without borrow type is handled properly, and key removal succeeds)
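
For reference, handling a capability static type without a borrow type amounts to a nil guard before recursing into the type parameter. The following is a minimal sketch with invented types (`staticType`, `primitiveType`, `capabilityType`) and a made-up rename from `OldType` to `NewType`; it is not the code from #3196.

```go
package main

import "fmt"

// staticType is a hypothetical, minimal stand-in for Cadence's static types.
type staticType interface {
	String() string
}

type primitiveType string

func (p primitiveType) String() string { return string(p) }

// capabilityType models Capability<T>.
// borrowType is nil for an unparameterized Capability.
type capabilityType struct {
	borrowType staticType
}

func (c capabilityType) String() string {
	if c.borrowType == nil {
		return "Capability"
	}
	return "Capability<" + c.borrowType.String() + ">"
}

// convert returns the migrated type, or nil if the type needs no change.
func convert(t staticType) staticType {
	switch t := t.(type) {
	case capabilityType:
		// Guard: the type parameter is optional, so do not recurse into nil.
		if t.borrowType == nil {
			return nil
		}
		converted := convert(t.borrowType)
		if converted == nil {
			return nil
		}
		return capabilityType{borrowType: converted}
	case primitiveType:
		// Made-up rename, for illustration only.
		if t == "OldType" {
			return primitiveType("NewType")
		}
		return nil
	default:
		// A nil static type ends up here, as in the original error.
		panic(fmt.Sprintf("unexpected static type: %v", t))
	}
}

func main() {
	for _, t := range []staticType{
		capabilityType{},
		capabilityType{borrowType: primitiveType("OldType")},
	} {
		if converted := convert(t); converted != nil {
			fmt.Printf("%s -> %s\n", t, converted)
		} else {
			fmt.Printf("%s unchanged\n", t)
		}
	}
}
```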

@turbolent
Member Author

turbolent commented Apr 1, 2024

Unfortunately, the atree related errors are not due to prior state issues, but seem to be caused by the migration itself.

For example (note the "after migration", which is only reported if the health check succeeded before the migration):

5:13AM ERR storage health check after migration failed error="slab (0x0.48) not found: slab not found during slab iteration" account=ba53f16ede01972d migration=cadence-value-migration migration_index=0
6:02AM ERR storage health check after migration failed error="internal error: slabs not referenced from account Storage: [0x289f9fe703f5a8ec.8]\ngoroutine 1618 [running]:\nruntime/debug.Stack()\n\t/opt/homebrew/Cellar/go@1.20/1.20.14/libexec/src/runtime/debug/stack.go:24 +0x65\ngithub.com/onflow/cadence/runtime/errors.NewUnexpectedError({0x6b75d0?, 0x1?}, {0xf3869ff950?, 0x9022971ae68fe337?, 0xf3869ff7a0?})\n\t/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/runtime/errors/errors.go:156 +0x4b\ngithub.com/onflow/cadence/runtime.(*Storage).CheckHealth(0x13bfbf77ac0)\n\t/Users/bastian/go/pkg/mod/github.com/onflow/cadence@v1.0.0-preview.18/runtime/storage.go:350 +0x725\ngithub.com/onflow/flow-go/cmd/util/ledger/migrations.(*CadenceBaseMigrator).MigrateAccount(0xc0003de180, {0xeca8f503e79f9f28?, 0x0?}, {0x28, 0x9f, 0x9f, 0xe7, 0x3, 0xf5, 0xa8, ...}, ...)\n\t/Users/bastian/Documents/work/flow-go/cmd/util/ledger/migrations/cadence_values_migration.go:175 +0x4a5\ngithub.com/onflow/flow-go/cmd/util/ledger/migrations.MigrateGroupConcurrently.func1()\n\t/Users/bastian/Documents/work/flow-go/cmd/util/ledger/migrations/account_based_migration.go:215 +0x4df\ncreated by github.com/onflow/flow-go/cmd/util/ledger/migrations.MigrateGroupConcurrently\n\t/Users/bastian/Documents/work/flow-go/cmd/util/ledger/migrations/account_based_migration.go:163 +0x217\n" account=289f9fe703f5a8ec migration=cadence-value-migration migration_index=0

I'll continue to investigate why / how these occur.
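
The two health check failures quoted above ("slab not found" and "slabs not referenced") correspond to two different graph conditions. The sketch below models them on a toy slab graph; the `storage` type and `checkHealth` function are assumptions for illustration and are much simpler than the actual atree and Cadence health checks.

```go
package main

import (
	"fmt"
	"sort"
)

type slabID string

// storage is a toy model: root slab IDs plus a map from each stored slab
// to the slab IDs it references.
type storage struct {
	roots []slabID
	slabs map[slabID][]slabID
}

// checkHealth walks the slab graph from the roots and reports
//   - missing: slabs that are referenced but not stored ("slab not found")
//   - unreferenced: slabs that are stored but unreachable ("slabs not referenced")
func checkHealth(s storage) (missing, unreferenced []slabID) {
	visited := map[slabID]bool{}

	var visit func(id slabID)
	visit = func(id slabID) {
		if visited[id] {
			return
		}
		visited[id] = true
		children, ok := s.slabs[id]
		if !ok {
			missing = append(missing, id)
			return
		}
		for _, child := range children {
			visit(child)
		}
	}
	for _, root := range s.roots {
		visit(root)
	}

	for id := range s.slabs {
		if !visited[id] {
			unreferenced = append(unreferenced, id)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(unreferenced, func(i, j int) bool { return unreferenced[i] < unreferenced[j] })
	return missing, unreferenced
}

func main() {
	s := storage{
		roots: []slabID{"0xa.1"},
		slabs: map[slabID][]slabID{
			"0xa.1": {"0xa.2", "0x0.48"}, // references a zero-address slab that is not stored
			"0xa.2": nil,
			"0xa.8": nil, // stored, but nothing references it
		},
	}
	missing, unreferenced := checkHealth(s)
	fmt.Println("missing:", missing)
	fmt.Println("unreferenced:", unreferenced)
}
```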

@SupunS
Member

SupunS commented Apr 1, 2024

@turbolent were you able to identify why AuthAccount is still present at the entitlements migration?

@turbolent
Member Author

turbolent commented Apr 1, 2024

@SupunS No, I wasn't able to identify yet why this happens. I tried to reproduce the issue by adding a test case in https://github.com/onflow/cadence/pull/3205/files, but it succeeds as expected.

I'll create a reduced state snapshot with just the core accounts + the problematic accounts, which cause the remaining errors (AuthAccount fails to load, atree health check errors), and share it, so we can locally run the migration and debug it.

@bluesign
Contributor

bluesign commented Apr 1, 2024

@turbolent one of the accounts has a key collision from interface ordering, I guess.

https://f.dnz.dev/0x289f9fe703f5a8ec/raw/storage/CapabilityFactory0x96b15ff6dfde11fe

@turbolent
Member Author

@bluesign Oh wow, good catch, thank you for looking into this! Great lead 👍

The static type migration is normalizing the ordering of the interfaces in intersection types and it's not accounting for the possibility that the new key already stores something. The old value is simply overwritten, leaving the slabs of the old data unreferenced.

I'll add a check and error for when the new key already exists. I don't think we can resolve this situation automatically and probably need to ask the owner of the account to resolve this manually before the migration.
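
A rough model of the collision and the proposed check is sketched below. The string-based keys, `normalizedKey`, and `migrateKey` are assumptions made for illustration; the real migration works on Cadence dictionary values, and the check in #3211 may be placed differently (its error message mentions detecting the conflict after removal of the old key).

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// intersectionKey renders an intersection type key in the given order.
func intersectionKey(interfaces []string) string {
	return "{" + strings.Join(interfaces, ", ") + "}"
}

// normalizedKey renders the key with the interfaces sorted,
// modelling the normalization done by the static type migration.
func normalizedKey(interfaces []string) string {
	sorted := append([]string(nil), interfaces...)
	sort.Strings(sorted)
	return intersectionKey(sorted)
}

var errKeyConflict = errors.New("dictionary already contains the new key (conflict)")

// migrateKey moves dict[oldKey] to dict[newKey] and refuses to overwrite
// an existing value, which would otherwise leave its slabs unreferenced.
func migrateKey(dict map[string]string, oldKey, newKey string) error {
	if oldKey == newKey {
		return nil
	}
	value, ok := dict[oldKey]
	if !ok {
		return nil
	}
	if _, exists := dict[newKey]; exists {
		return errKeyConflict
	}
	delete(dict, oldKey)
	dict[newKey] = value
	return nil
}

func main() {
	old := []string{"Provider", "Balance", "Receiver"}
	dict := map[string]string{
		intersectionKey(old): "old value",
		normalizedKey(old):   "existing value", // same set, different order
	}
	err := migrateKey(dict, intersectionKey(old), normalizedKey(old))
	fmt.Println(err, len(dict))
}
```

Returning an error and leaving both entries in place keeps the decision with the account owner, in line with the point above that this case cannot be resolved automatically.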

@bluesign
Contributor

bluesign commented Apr 1, 2024

Yeah, not much can be done there automatically.

@turbolent
Member Author

@SupunS mentioned that the value migrations do not fail on the first error, so if the static type migration itself failed, the entitlements migration may encounter a deprecated type that the static type migration should have migrated earlier.

I'll check if there are any prior errors in the logs.

@turbolent
Member Author

turbolent commented Apr 1, 2024

@SupunS Regarding the AuthAccount type loading error: looking at the logs for just one particular account for which the error occurs, there are no prior static type migration errors, e.g.

5:53AM ERR failed to run EntitlementsMigration in account 41029c7f2c9a39fd, domain storage, key NFTStorefrontV2: error getting program migration=cadence-value-migration migration_index=0
5:53AM ERR failed to run EntitlementsMigration in account 41029c7f2c9a39fd, domain storage, key NFTStorefrontV2: error getting program migration=cadence-value-migration migration_index=0
5:53AM ERR failed to run EntitlementsMigration in account 41029c7f2c9a39fd, domain storage, key NFTStorefrontV2: error getting program migration=cadence-value-migration migration_index=0
5:53AM ERR failed to run EntitlementsMigration in account 41029c7f2c9a39fd, domain storage, key NFTStorefrontV2: error getting program migration=cadence-value-migration migration_index=0
5:53AM ERR failed to run EntitlementsMigration in account 41029c7f2c9a39fd, domain storage, key NFTStorefrontV2: error getting program migration=cadence-value-migration migration_index=0
5:53AM ERR failed to run EntitlementsMigration in account 41029c7f2c9a39fd, domain private, key kittyItemsCollectionProviderV14: error getting program migration=cadence-value-migration migration_index=0
5:53AM ERR failed to run EntitlementsMigration in account 41029c7f2c9a39fd, domain private, key TestHandler: error getting program migration=cadence-value-migration migration_index=0
5:53AM ERR failed to run EntitlementsMigration in account 41029c7f2c9a39fd, domain public, key kittyItemsCollectionV14: error getting program migration=cadence-value-migration migration_index=0
5:53AM ERR failed to run EntitlementsMigration in account 41029c7f2c9a39fd, domain public, key NFTStorefrontV2: error getting program migration=cadence-value-migration migration_index=0
5:53AM ERR failed to run EntitlementsMigration in account 41029c7f2c9a39fd, domain public, key TestHandler: error getting program migration=cadence-value-migration migration_index=0
5:53AM ERR failed to run EntitlementsMigration in account 41029c7f2c9a39fd, domain inbox, key AuthAccountCapability: failed to load type: AuthAccount
5:53AM ERR failed to run EntitlementsMigration in account 41029c7f2c9a39fd, domain inbox, key kitty0: failed to load type: AuthAccount
9:14AM ERR failed to run LinkValueMigration in account 41029c7f2c9a39fd, domain private, key kittyItemsCollectionProviderV14: error getting program migration=cadence-link-value-migration migration_index=0
9:14AM ERR failed to run LinkValueMigration in account 41029c7f2c9a39fd, domain private, key TestHandler: error getting program migration=cadence-link-value-migration migration_index=0
9:14AM ERR failed to run LinkValueMigration in account 41029c7f2c9a39fd, domain public, key kittyItemsCollectionV14: error getting program migration=cadence-link-value-migration migration_index=0
9:14AM ERR failed to run LinkValueMigration in account 41029c7f2c9a39fd, domain public, key NFTStorefrontV2: error getting program migration=cadence-link-value-migration migration_index=0
9:14AM ERR failed to run LinkValueMigration in account 41029c7f2c9a39fd, domain public, key TestHandler: error getting program migration=cadence-link-value-migration migration_index=0

@turbolent
Member Author

@SupunS Created cadence-migration-test.us-west1-b.flow-benchmark:/var/flow/tn-issues.payloads, which has accounts 8c5303eaa26202d6,95e019a17d0e23d7,7aad92e5a0715d21,912d5440f7e3769e,7e60df042a9c0868,9a0766d93b6608b7,631e88ae7f1d7c20,9eca2b38b18b5dfe,289f9fe703f5a8ec,03115a3260214571,ba53f16ede01972d,7d8c7e050c694eaa,5e3448b3cffb97f2,5d63c34d7f05e5a4,48d3be92e6e4a973,48602d8056ff9d93,454c9991c2b8d947,434a1f199a7ae3ba,c843c1f5a4805c3a,48d3be92e6e4a973,7d8c7e050c694eaa,75c72302038d2736,41029c7f2c9a39fd,41029c7f2c9a39fd,200d11a868731c48,18d1bf2f68f9cb9e,a0b2fec2bb2e5510,fbf878df362a03c4

@turbolent
Member Author

Managed to debug the AuthAccount type issue locally by extracting only the payloads for 200d11a868731c48, which only has 44 payloads.

The problem with AuthAccount (and likely other built-in types) is that it seems to be stored as a CompositeStaticType (location = nil, qualifiedIdentifier = "AuthAccount", typeID = "AuthAccount").

We need to handle these forms and also re-write them if they are deprecated.

In general we should maybe use this chance and clean up the state and rewrite all such occurrences (deprecated or not).

I wonder how the type ended up being stored like this. Maybe during import of the type?
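
Detecting this form could look roughly like the sketch below. The `compositeType` struct and the `deprecatedBuiltins` set are assumptions for illustration: the set is not Cadence's actual list of built-in types, and this is not the code from #3205.

```go
package main

import "fmt"

// compositeType is a simplified stand-in for a composite static type.
// A nil location models a type that is not declared in any contract.
type compositeType struct {
	location            *string
	qualifiedIdentifier string
}

// deprecatedBuiltins is an illustrative set, not Cadence's actual list.
var deprecatedBuiltins = map[string]bool{
	"AuthAccount":   true,
	"PublicAccount": true,
}

// asBuiltin reports whether the composite type is really a deprecated
// built-in type stored in composite form, and returns its name if so.
func asBuiltin(c compositeType) (string, bool) {
	if c.location != nil {
		// User-defined types always have a location.
		return "", false
	}
	if !deprecatedBuiltins[c.qualifiedIdentifier] {
		return "", false
	}
	return c.qualifiedIdentifier, true
}

func main() {
	contract := "A.0000000000000001.Example" // hypothetical contract location

	for _, c := range []compositeType{
		{location: nil, qualifiedIdentifier: "AuthAccount"},
		{location: &contract, qualifiedIdentifier: "AuthAccount"},
		{location: nil, qualifiedIdentifier: "PublicKey"},
	} {
		name, ok := asBuiltin(c)
		fmt.Printf("%q builtin=%v name=%q\n", c.qualifiedIdentifier, ok, name)
	}
}
```

Once such a value is recognized, it can be passed through the same conversion as the primitive form of the type.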

@SupunS
Member

SupunS commented Apr 2, 2024

> I wonder how the type ended up being stored like this. Maybe during import of the type?

Probably from the type constructors, e.g. var v = CompositeType("AuthAccount")

e.g.:

    assert.Equal(t,
    	interpreter.TypeValue{
    		Type: interpreter.NewCompositeStaticTypeComputeTypeID(nil, nil, "PublicKey"),
    	},
    	inter.Globals.Get("g").GetValue(),
    )

@SupunS
Member

SupunS commented Apr 2, 2024

nice find btw!

@SupunS
Member

SupunS commented Apr 2, 2024

We should maybe also fix the type constructor to return the pre-defined type if the type is built-in. But I'm not sure what the implications are, e.g. will it still be considered a "CompositeType"? Does that really matter?

@turbolent
Member Author

@SupunS Good find! I'll create a separate issue to clean this up.

Also confirmed that #3205 fixes the migration issue: there are no longer any occurrences of AuthAccount type loading errors in the state subset with problematic accounts.

@turbolent
Member Author

Account 5d63c34d7f05e5a4 nicely reproduces the "slab not found" issue with address 0x0 (stack); it only has 27 payloads.

@turbolent
Member Author

turbolent commented Apr 2, 2024

Account 289f9fe703f5a8ec nicely reproduces the "slab not referenced" issue. With #3211 it reports:

2:19PM ERR failed to run cadence-value-migration in account 289f9fe703f5a8ec, domain storage, key CapabilityFactory0x96b15ff6dfde11fe: internal error: dictionary contains new key after removal of old key (conflict): Type<&{A.9a0766d93b6608b7.FungibleToken.Provider, A.9a0766d93b6608b7.FungibleToken.Balance, A.9a0766d93b6608b7.FungibleToken.Receiver}>()

after which the health check fails (as expected):

2:19PM ERR storage health check after migration failed error="internal error: slabs not referenced from account Storage: [0x289f9fe703f5a8ec.10]\ngoroutine 88 [running]:\nruntime/debug.Stack()\n\t/usr/lib/go-1.20/src/runtime/debug/stack.go:24 +0x65\ngithub.com/onflow/cadence/runtime/errors.NewUnexpectedError({0x1d65d1a?, 0x1?}, {0xc0003d5950?, 0x4b9d5f7bc17f43d8?, 0xc0003d57a0?})\n\t/home/bastian/Documents/cadence/runtime/errors/errors.go:156 +0x4b\ngithub.com/onflow/cadence/runtime.(*Storage).CheckHealth(0xc0005e9900)\n\t/home/bastian/Documents/cadence/runtime/storage.go:350 +0x725\ngithub.com/onflow/flow-go/cmd/util/ledger/migrations.(*CadenceBaseMigrator).MigrateAccount(0xc000000000, {0xeca8f503e79f9f28?, 0x0?}, {0x28, 0x9f, 0x9f, 0xe7, 0x3, 0xf5, 0xa8, ...}, ...)\n\t/home/bastian/Documents/flow-go/cmd/util/ledger/migrations/cadence_values_migration.go:175 +0x4a5\ngithub.com/onflow/flow-go/cmd/util/ledger/migrations.MigrateGroupConcurrently.func1()\n\t/home/bastian/Documents/flow-go/cmd/util/ledger/migrations/account_based_migration.go:215 +0x4df\ncreated by github.com/onflow/flow-go/cmd/util/ledger/migrations.MigrateGroupConcurrently\n\t/home/bastian/Documents/flow-go/cmd/util/ledger/migrations/account_based_migration.go:163 +0x217\n" account=289f9fe703f5a8ec migration=cadence-value-migration migration_index=0

This confirms the hint from @bluesign and the fix in #3211.

@turbolent
Member Author

Further investigation of the last remaining issue, slabs not being found, showed that I made a mistake when adding the health check in onflow/flow-go#5591: I mixed up the parameter order, so the health check before the migration was not run.
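
The thread does not show the signature involved, so the following is only a generic sketch of how this kind of mix-up can happen and one way to make it harder. The functions `steps` and `stepsWithOptions` and the `healthChecks` struct are hypothetical and are not flow-go code.

```go
package main

import "fmt"

// steps takes two adjacent parameters of the same type.
// Swapping them at a call site still compiles.
func steps(checkBefore, checkAfter bool) []string {
	var s []string
	if checkBefore {
		s = append(s, "health check before")
	}
	s = append(s, "migrate")
	if checkAfter {
		s = append(s, "health check after")
	}
	return s
}

// healthChecks names each option, so a call site states its intent.
type healthChecks struct {
	Before bool
	After  bool
}

func stepsWithOptions(o healthChecks) []string {
	return steps(o.Before, o.After)
}

func main() {
	// Intended: check before only. The arguments are swapped by mistake,
	// and the compiler cannot notice.
	fmt.Println(steps(false, true))

	// With named fields, the intent is visible at the call site.
	fmt.Println(stepsWithOptions(healthChecks{Before: true}))
}
```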

When running with the fix, onflow/flow-go#5618, the unhealthy storage is correctly reported:

4:45PM WRN storage health check before migration failed error="slab (0x0.49) not found: slab not found during slab iteration" account=5d63c34d7f05e5a4 migration=cadence-value-migration migration_index=0

This resolves the last outstanding issue for the migration :-)

However, we might still want to do something about this case. The data is gone, but for example, when this occurs for a dictionary or array, we could remove the broken slab references / elements, to at least "repair" the value and make it accessible again.

The unhealthy storage / broken value seems to be related to https://www.notion.so/dapperlabs/Mainnet-contracts-affected-by-data-loss-bug-fb22d83d0c9c4704a74f2cd859b0f05c
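
The repair idea mentioned above, dropping elements whose slabs no longer exist, could be modelled as follows. This is a toy sketch: `slabID` and `repairElements` are hypothetical, and a real repair would have to operate on atree arrays and dictionaries.

```go
package main

import "fmt"

type slabID string

// repairElements splits the element references of a container into those
// whose slabs are still stored and those that are broken.
// The missing data is not recovered; the container just becomes usable again.
func repairElements(elements []slabID, stored map[slabID]bool) (kept, removed []slabID) {
	for _, id := range elements {
		if stored[id] {
			kept = append(kept, id)
		} else {
			removed = append(removed, id)
		}
	}
	return kept, removed
}

func main() {
	stored := map[slabID]bool{"0xa.2": true, "0xa.3": true}
	elements := []slabID{"0xa.2", "0x0.49", "0xa.3"}

	kept, removed := repairElements(elements, stored)
	fmt.Println("kept:", kept)
	fmt.Println("removed:", removed)
}
```

Reporting the removed references alongside the repaired value would keep a record of what was lost.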

@turbolent
Member Author

All known issues for healthy data are now resolved. I'm going to open a separate issue for the remaining issues with unhealthy data.

@turbolent
Member Author

Opened #3220 and onflow/flow-go#5634
