Run full Cadence 1.0 migration on TN state #3096

turbolent · 2024-02-12T22:45:04Z

Tasks

Give feedback

Cadence 1.0 migration: Errors when worker count >1
Transfer, called by remove, insert, etc., loads types, which might be unmigrated/unsupported: empty intersection types are invalid: Fix migrating values with empty intersection type #3138
Static types of containers need to get migrated too: Migrate static types of arrays and dictionaries #3141
Ensure nested containers are properly (recursively) migrated: Migrate static types of arrays and dictionaries #3141 and Add test for nested container value migration #3142
Payload grouping has data races flow-go#5485

Bug
Burner contract deployment is too expensive, creating full snapshot of all payloads is intractable #3153

1 of 2
Add Atree storage health checks to migrations #3149

2 of 2
Clean up TN service account: Prune random beacon history
Improve static type and entitlements migrations: Update array/dictionary type directly #3158

Improvement
Optimize storage migration: Allow skipping of values #3157

Improvement
Options

turbolent · 2024-02-26T23:15:07Z

For investigating the concurrency issues, I tried building the util binary with the race detector on, but it reported no issues:

CGO_ENABLED=1 \
GOOS=linux \
GOARCH=amd64 \
CC="zig cc -target x86_64-linux-musl" \
CXX="zig c++ -target x86_64-linux-musl" \
go build -v --tags "netgo,osusergo" -race -ldflags "-extldflags=-static"

turbolent · 2024-02-26T23:45:53Z

For the empty intersection type loading issue, I opened #3138, which so far has a reproducer for one particular case where the type loading occurs (DictionaryValue.Insert in StorageMigration.MigrateNestedValue).

I'm not really sure how to proceed here. Disabling type checks, like checkContainerMutation isn't sufficient, because e.g. CompositeValue.Transfer calls CompositeValue.IsResourceKinded, which isn't just a run-time check, but required to distinguish between the struct vs resource code path of transferring (copy vs move).

The best idea I came up so far is rewriting the static type of a container value (e.g. dictionary, array), before migrating the nested value inside of it, i.e. before calling Remove/Insert, and before the related Transfer calls occur, which convert the static type.

turbolent · 2024-02-27T01:08:26Z

I wonder if we can integrate the intersection type rewriting and entitlements migration code directly into interpreter.ConvertStaticToSemaType.

I tried just performing the intersection type rewriting there, and none of the tests break, except for the entitlements migration, which looses the necessary information it needs (legacy intersection type). Maybe also adding the entitlements migration there will fix that last issue?

janezpodhostnik · 2024-02-27T15:59:08Z

About Errors when worker count >1

not 100% sure they are all related to the same thing (its likely), but one of the problems is that the migration code currently doesn't support creating slab indexes from multiple goroutines. This can be fixed by either adding a locking mechanism or some channel magic.

I will take a look.

fxamacker · 2024-02-27T16:38:42Z

About Errors when worker count >1

not 100% sure they are all related to the same thing (its likely), but one of the problems is that the migration code currently doesn't support creating slab indexes from multiple goroutines. This can be fixed by either adding a locking mechanism or some channel magic.

Unless I'm mistaken, worker count > 1 means n workers processing n accounts concurrently.

Slab indexes are per account. I think we might only have a problem to generate slab index if multiple goroutines migrate the same account.

turbolent · 2024-02-27T23:46:12Z

I had a look at AtreeRegisterMigrator, and the function that generates the new payloads from the snapshot's, validateChangesAndCreateNewRegisters, looks different from MergeRegisterChanges in the Cadence 1.0 migration. Are there maybe some cases we're not properly handling yet?

turbolent · 2024-02-28T01:52:15Z

Re: Burner deployment is too expensive: I think @fxamacker you mentioned that creating a snapshot for all payloads is potentially intractable.

I opened onflow/flow-go#5470, to show that for the deployment migration, we can filter the payloads to the payloads of just the accounts that are needed for the deployment.

However, how can we merge the resulting write set into the original payloads without having to construct a full payload snapshot?

cc @onflow/flow-cadence-execution

turbolent · 2024-02-28T05:21:17Z

Cross posting from Discord, Re: Does the migration pipeline properly migrate nested containers?

The example I thought of was:
Given an outer container (e.g. dictionary) and and inner one, both their static types must be updated (by the static type migration, and again by the entitlements migration).
Both the static type migration (https://github.com/onflow/cadence/pull/3141/files#diff-1b50282851963b6af7e8c0baf33a448c9b6fdb4d405de6182626484310c8b5ffR151-R168) and the entitlements migration (

cadence/migrations/entitlements/migration.go

Lines 256 to 285 in 2dd12d3

    
           case *interpreter.DictionaryValue: 
        
           	elementType := v.Type 
        
           	entitledElementType, err := ConvertToEntitledType(inter, elementType) 
        
           	if err != nil { 
        
           		return nil, err 
        
           	} 
        
           	if entitledElementType != nil { 
        
           		var keysAndValues []interpreter.Value 
        
           		iterator := v.Iterator() 
        
           		for { 
        
           			keyValue, value := iterator.Next(inter) 
        
           			if keyValue == nil { 
        
           				break 
        
           			} 
        
           			keysAndValues = append(keysAndValues, keyValue) 
        
           			keysAndValues = append(keysAndValues, value) 
        
           		} 
        
           		return interpreter.NewDictionaryValueWithAddress( 
        
           			inter, 
        
           			interpreter.EmptyLocationRange, 
        
           			entitledElementType.(*interpreter.DictionaryStaticType), 
        
           			v.GetOwner(), 
        
           			keysAndValues..., 
        
           		), nil 
        
           	}

) create/return a new/replacement value, and do not recursively call the migration.

From what I can see (but I might be missing something), the "outer" storage migration (StorageMigration) only mutates the existing value to be migrated (

cadence/migrations/migration.go

Lines 245 to 336 in 2dd12d3

    
           case *interpreter.DictionaryValue: 
        
           	dictionary := value 
        
           	type keyValuePair struct { 
        
           		key, value interpreter.Value 
        
           	} 
        
           	// Read the keys first, so the iteration wouldn't be affected 
        
           	// by the modification of the nested values. 
        
           	var existingKeysAndValues []keyValuePair 
        
           	iterator := dictionary.Iterator() 
        
           	for { 
        
           		key, value := iterator.Next(nil) 
        
           		if key == nil { 
        
           			break 
        
           		} 
        
           		existingKeysAndValues = append( 
        
           			existingKeysAndValues, 
        
           			keyValuePair{ 
        
           				key:   key, 
        
           				value: value, 
        
           			}, 
        
           		) 
        
           	} 
        
           	for _, existingKeyAndValue := range existingKeysAndValues { 
        
           		existingKey := existingKeyAndValue.key 
        
           		existingValue := existingKeyAndValue.value 
        
           		newKey := m.MigrateNestedValue( 
        
           			storageKey, 
        
           			storageMapKey, 
        
           			existingKey, 
        
           			valueMigrations, 
        
           			reporter, 
        
           		) 
        
           		newValue := m.MigrateNestedValue( 
        
           			storageKey, 
        
           			storageMapKey, 
        
           			existingValue, 
        
           			valueMigrations, 
        
           			reporter, 
        
           		) 
        
           		if newKey == nil && newValue == nil { 
        
           			continue 
        
           		} 
        
           		// We only reach here at least one of key or value has been migrated. 
        
           		var keyToSet, valueToSet interpreter.Value 
        
           		if newKey == nil { 
        
           			keyToSet = existingKey 
        
           		} else { 
        
           			// Key was migrated. 
        
           			// Remove the old value at the old key. 
        
           			// This old value will be inserted again with the new key, unless the value is also migrated. 
        
           			existingKey = legacyKey(existingKey) 
        
           			oldValue := dictionary.Remove( 
        
           				m.interpreter, 
        
           				emptyLocationRange, 
        
           				existingKey, 
        
           			) 
        
           			if _, ok := oldValue.(*interpreter.SomeValue); !ok { 
        
           				panic(errors.NewUnexpectedError( 
        
           					"failed to remove old value for migrated key: %s", 
        
           					existingKey, 
        
           				)) 
        
           			} 
        
           			keyToSet = newKey 
        
           		} 
        
           		if newValue == nil { 
        
           			valueToSet = existingValue 
        
           		} else { 
        
           			// Value was migrated 
        
           			valueToSet = newValue 
        
           		} 
        
           		dictionary.Insert( 
        
           			m.interpreter, 
        
           			emptyLocationRange, 
        
           			keyToSet, 
        
           			valueToSet, 
        
           		) 
        
           	}

).
I'd assume to be that the outer storage migration code either
performs a post-order "traversal"/migration, i.e. migrates the child values of the container first, and then updates the outer replacement (if any); or
performs a pre-order "traversal"/migration, i.e. migrates the value (in this case container) first, then the children

I wasn't quite sure if we properly handle it and it would be good to have a test case for it (maybe we have one already)

SupunS · 2024-02-28T12:04:34Z

Added some tests and improved the comments here: #3142

turbolent · 2024-02-29T22:52:04Z

@janezpodhostnik @fxamacker I reran the util with race detector enabled on the machine and this time it reported data races while grouping accounts. Opened onflow/flow-go#5485. Could you please have a look?

turbolent · 2024-02-29T23:04:42Z

Adding atree storage health checks to the tests of the migrations in #3144 shows errors. Also opened onflow/flow-go#5486 in flow-go and it reports problems as well.

My hunch is that the container re-creation, as done by the static type and entitlement migration, is incorrect: We're just creating a new dictionary, and insert the elements of the container directly into it – we probably need to properly remove them from the old container, which transfers them to the "stack", and then insert them into the new container

fxamacker · 2024-03-01T01:30:24Z

@janezpodhostnik @fxamacker I reran the util with race detector enabled on the machine and this time it reported data races while grouping accounts. Opened onflow/flow-go#5485. Could you please have a look?

@turbolent I opened PR to fix the data race and added a test to reproduce it:

Fix data race in account grouping for account based migration flow-go#5488

The PR targets main branch since it is a bugfix.

turbolent · 2024-03-04T19:57:33Z

Running the migration with the latest fixes also reports a "Go panic" for an out-of-bound index:

ERR failed to run StorageMigration in account 9eca2b38b18b5dfe, domain contract, key FlowDKG: runtime error: index out of range [0] with length 0 migration=cadence-value-migration migration_index=0

@SupunS Could we maybe improve the storage migration to include a stack trace when we recover from panics, so we can pinpoint the problem?

The migration also runs in to "slab not found" errors, but those are probably due to the bugs we have already reported by the atree storage health checks:

ERR failed to run StorageMigration in account 9eca2b38b18b5dfe, domain contract, key FlowIDTableStaking: slab (0x9eca2b38b18b5dfe.49614) not found: slab not found for stored value migration=cadence-capability-value-migration migration_index=0

Finally, the progress logs for storing the final trie seemed odd, cc @onflow/flow-cadence-execution:

INF storing 9-th sub trie roots progress 6906506/5755421 (120.0%) elapsed: 4m17s, eta 0s checkpoint_file=/var/flow/c1/root.checkpoint trie_count=1 version=6

turbolent · 2024-03-04T22:29:53Z

@fxamacker In the migrations, we need to often update the static types of values, including containers (array, dictionaries, composites), i.e. the type info of atree arrays and ordered maps.

Do you think we could add support for changing the type info of atree arrays and ordered maps?

I saw that OrderedMap has Type() to get the TypeInfo, which in turn uses MapSlab.ExtraData(). There is also a MapSlab.SetExtraData(), but it's not (yet) exposed via a OrderedMap.SetType().

fxamacker · 2024-03-04T23:36:16Z

Do you think we could add support for changing the type info of atree arrays and ordered maps?

@turbolent Yes, I can add Array.SetType() and OrderedMap.SetType() in atree.

Are these functions needed for both atree-inlining branch and master branch in atree repo?

turbolent · 2024-03-05T01:04:19Z

@fxamacker Awesome! That would be extremely useful 🙏

Are these functions needed for both atree-inlining branch and master branch in atree repo?

Yes, those functions will be needed for both the current atree version used on master (higher priority), and in the register-inlining version of atree used in the atree register-inlining branch (we can port the functions there later)

fxamacker · 2024-03-05T20:28:15Z

Do you think we could add support for changing the type info of atree arrays and ordered maps?

@turbolent I opened PRs at onflow/atree to add SetType for atree containers (without inlining):

Yes, those functions will be needed for both the current atree version used on master (higher priority)

Since Cadence master branch is using atree feature/stable-cadence branch, I made the new PRs target feature/stable-cadence branch (instead of master).

github.com/onflow/atree v0.6.1-0.20230711151834-86040b30171f

in the register-inlining version of atree used in the atree register-inlining branch (we can port the functions there later)

The same functions probably can't be ported to atree-inlining version because inlined container needs to notify parent container so parent container slab can be persisted with new inlined static type information.

I will open new PRs to add SetType implemented differently for atree-inlining version.

turbolent · 2024-03-05T23:16:06Z

@fxamacker Awesome! Thank you for adding these 👏 👌

turbolent · 2024-03-07T16:56:32Z

The TN state service account is massive (5GB!), because of the random beacon history, see https://discord.com/channels/613813861610684416/1215015019260158022. I upgraded my machine at home to 64GB of memory and that still doesn't seem to be enough.

I'll try to add a migration to the beginning of the migration pipeline which prunes the random beacon history.

turbolent · 2024-03-10T00:26:55Z

All items so far have been addressed, and this issue is getting long.

Closing and opened #3162 for follow-up work that was discovered

fxamacker · 2024-03-10T00:53:58Z

Nice! 👏

turbolent mentioned this issue Feb 12, 2024

Cadence 1.0 (formerly known as "Stable Cadence") #2642

Open

j1010001 modified the milestones: OKR_C2-KR2-C1.0-Preview_launch, OKR_C2_KR2-C1.0_Preview_launch, OKR_C2_KR1-C1.0_Upgrade_testing Feb 13, 2024

j1010001 assigned turbolent Feb 15, 2024

This was referenced Feb 23, 2024

Improve Cadence 1.0 migration onflow/flow-go#5388

Merged

Fix migrating values with empty intersection type #3138

Merged

turbolent mentioned this issue Feb 27, 2024

[Cadence 1.0] Make staged contract code update migration aware of updated system contracts onflow/flow-go#5461

Closed

turbolent mentioned this issue Feb 28, 2024

Optimize deployment migration onflow/flow-go#5470

Merged

turbolent mentioned this issue Feb 28, 2024

Migrate static types of arrays and dictionaries #3141

Merged

6 tasks

SupunS mentioned this issue Feb 28, 2024

Add test for nested container value migration #3142

Merged

6 tasks

turbolent mentioned this issue Feb 29, 2024

Improve migrations #3144

Merged

6 tasks

This was referenced Feb 29, 2024

[Cadence 1.0] Perform atree storage health check after migrating account onflow/flow-go#5486

Closed

Add util command to generate payloads for bootsrapped execution state onflow/flow-go#5487

Merged

fxamacker mentioned this issue Mar 1, 2024

Fix data race in account grouping for account based migration onflow/flow-go#5488

Merged

This was referenced Mar 5, 2024

Allow updating static types of Array onflow/atree#372

Closed

Allow updating static types of OrderedMap onflow/atree#373

Closed

turbolent mentioned this issue Mar 6, 2024

Improve errors in state migration #3154

Merged

6 tasks

This was referenced Mar 8, 2024

Optimize storage migration: Allow skipping of values #3157

Merged

Improve static type and entitlements migrations: Update array/dictionary type directly #3158

Merged

turbolent closed this as completed Mar 10, 2024

fxamacker mentioned this issue Mar 11, 2024

Reduce RAM and persistent storage by deduplicating inlined dict type info onflow/atree#369

Merged

6 tasks

This was referenced Mar 12, 2024

Improve migration #3169

Merged

Optimize string migration: only return new value if needed #3173

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Run full Cadence 1.0 migration on TN state #3096

Run full Cadence 1.0 migration on TN state #3096

turbolent commented Feb 12, 2024 •

edited

Loading

Tasks

turbolent commented Feb 26, 2024

turbolent commented Feb 26, 2024

turbolent commented Feb 27, 2024

janezpodhostnik commented Feb 27, 2024

fxamacker commented Feb 27, 2024

turbolent commented Feb 27, 2024

turbolent commented Feb 28, 2024 •

edited

Loading

turbolent commented Feb 28, 2024

SupunS commented Feb 28, 2024

turbolent commented Feb 29, 2024

turbolent commented Feb 29, 2024 •

edited

Loading

fxamacker commented Mar 1, 2024

turbolent commented Mar 4, 2024

turbolent commented Mar 4, 2024 •

edited

Loading

fxamacker commented Mar 4, 2024

turbolent commented Mar 5, 2024 •

edited

Loading

fxamacker commented Mar 5, 2024

turbolent commented Mar 5, 2024

turbolent commented Mar 7, 2024

turbolent commented Mar 10, 2024 •

edited

Loading

fxamacker commented Mar 10, 2024

Run full Cadence 1.0 migration on TN state #3096

Run full Cadence 1.0 migration on TN state #3096

Comments

turbolent commented Feb 12, 2024 • edited Loading

Tasks

turbolent commented Feb 26, 2024

turbolent commented Feb 26, 2024

turbolent commented Feb 27, 2024

janezpodhostnik commented Feb 27, 2024

fxamacker commented Feb 27, 2024

turbolent commented Feb 27, 2024

turbolent commented Feb 28, 2024 • edited Loading

turbolent commented Feb 28, 2024

SupunS commented Feb 28, 2024

turbolent commented Feb 29, 2024

turbolent commented Feb 29, 2024 • edited Loading

fxamacker commented Mar 1, 2024

turbolent commented Mar 4, 2024

turbolent commented Mar 4, 2024 • edited Loading

fxamacker commented Mar 4, 2024

turbolent commented Mar 5, 2024 • edited Loading

fxamacker commented Mar 5, 2024

turbolent commented Mar 5, 2024

turbolent commented Mar 7, 2024

turbolent commented Mar 10, 2024 • edited Loading

fxamacker commented Mar 10, 2024

turbolent commented Feb 12, 2024 •

edited

Loading

turbolent commented Feb 28, 2024 •

edited

Loading

turbolent commented Feb 29, 2024 •

edited

Loading

turbolent commented Mar 4, 2024 •

edited

Loading

turbolent commented Mar 5, 2024 •

edited

Loading

turbolent commented Mar 10, 2024 •

edited

Loading