Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge feature/array-map-inlining (atree inlining feature branch) to main #429

Merged
merged 151 commits into from
Jul 29, 2024

Conversation

fxamacker
Copy link
Member

@fxamacker fxamacker commented Jul 26, 2024

Updates #292

Merge feature/array-map-inlining branch to main.

This is a fast-forward merge.

See also:

Estimated Atree Inlining Results with Cadence 0.42

Results are data dependent for memory use reduction on execution nodes. We may see very roughly:

  • 50% reduction for testnet peak memory use
  • 40% reduction for mainnet peak memory use
    • peak RAM use reduced by several hundred GB on each execution node

In addition to operational RAM reduction:

  • faster block execution speed
  • SSD storage use will also be reduced (e.g. checkpoint file sizes, etc.)
  • the growth rate of memory use will also be reduced to help prevent OOM crashes
  • databases storing payloads benefit from fewer payloads (smaller db, index, and cache)

Impact on memory and storage size

image

image

Impact on payload count and node count

image

image


  • Targeted PR against main branch
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work
  • Code follows the standards mentioned here
  • Updated relevant documentation
  • Re-reviewed Files changed in the Github PR explorer
  • Added appropriate labels

fxamacker and others added 30 commits September 14, 2023 11:47
Currently, every array or map are stored in its own slab and
parent slab refers to child array or map by SlabID.

The current approach can lead to many small slabs, especially for
Cadence data structures with multiple nested levels.

This commit inlines child array/map in parent slab when:
- child array/map fits in one slab (root slab is data slab)
- encoded size of inlined child array/map is less than the
  max inline size limit enforced by parent

This commit optimizes encoding size by:
- reusing inlined array types
- reusing seed, digests, and field names of inlined composite types

Also update debugging code to handle inlined array/map element.
Currently, in parentUpdater callback, parent array/map resets same
child value.

Child value ID should match overwritten SlabIDStorable
or Slab.SlabID().

This commit adds check to make sure same child value is being reset.
This commit adds callback notification in the set or inserted child
element in array.  So if child element is modified after Array.Set()
or Array.Insert(), changes to child element is properly reflected
in the parent array.

This commit doesn't appear to be needed by current version of Cadence
but it helps reduce Atree's dependence on an implementation detail
of Cadence.
This commit adds callback notification in the set child element in map.
So if child element is modified after OrderedMap.Set() , changes to
child element is properly reflected in the parent map.

This commit doesn't appear to be needed by current version of Cadence
but it helps reduce Atree's dependence on an implementation detail
of Cadence.
Currently, deduplication feature in PR 342 (not merged yet)
does not support Cadence attachments.

Add support for Cadence attachments and other future
use cases by using type ID and sorted field names
instead of field count.

While at it, also refactor encoding composite type to
reduce traversal and type assertion.
Decoded ExtraData (including TypeInfo) can be shared by all
inlined slabs.  This commit creates a new ExtraData with copied
TypeInfo for inlined slabs to prevent accidental mutation.
Decoded ExtraData (including keys) can be shared by all
inlined composite referring to the same type.

This commit copies key for inlined composite to prevent
accidental mutation.
Some of the comments were taken from Atree's README
which was originally authored by Ramtin.

Co-authored-by: Ramtin M. Seraj <ramtinms@users.noreply.github.com>
While at it, also refactored validMapSlab().
While at it, also refactored validArraySlab().
Currently, ValidArray() doesn't verify that inlined array slabs
are in not storage.

It is called by tests in Atree and Cadence, so update it
to check if inlined array slabs are in storage.
fxamacker added 19 commits May 13, 2024 14:18
This change reduces number of lines in the function but is
not expected to yield significant speed improvements.
…st-commit

Add NonderterministicFastCommit to speed up migrations when ordering isn't required
Add BatchPreload to decode slabs in parallel and cache
This commit returns ReadOnlyIteratorElementMutationError when
elements of readonly map iterator are mutated. Also, a
callback can be provided by the caller to log or debug
such mutations with more context.

As always, mutation of elements from readonly iterators are
not guaranteed to persist.

Instead of relying on other projects not to mutate readonly elements,
this commit returns ReadOnlyIteratorElementMutationError when
elements from readonly iterators are mutated.

This commit also adds readonly iterator functions that receive
mutation callbacks.  Callbacks are useful for logging, etc. with
more context when mutation occurs.  Mutation handling is the same
with or without callbacks.  If needed, other projects using atree
can choose to panic in the callback when mutation is detected.

If elements from readonly iterators are mutated:
- those changes are not guaranteed to persist.
- mutation functions of child containers return
  ReadOnlyIteratorElementMutationError.
- ReadOnlyMapIteratorMutationCallback are called if provided
This commit returns ReadOnlyIteratorElementMutationError when
elements of readonly array iterator are mutated. Also, a
callback can be provided by the caller to log or debug
such mutations with more context.

As always, mutation of elements from readonly iterators are
not guaranteed to persist.

Instead of relying on other projects not to mutate readonly elements,
this commit returns ReadOnlyIteratorElementMutationError when
elements from readonly iterators are mutated.

This commit also adds readonly iterator functions that receive
mutation callbacks.  Callbacks are useful for logging, etc. with
more context when mutation occurs.  Mutation handling is the same
with or without callbacks.  If needed, other projects using atree
can choose to panic in the callback when mutation is detected.

If elements from readonly iterators are mutated:
- those changes are not guaranteed to persist.
- mutation functions of child containers return
  ReadOnlyIteratorElementMutationError.
- ReadOnlyMapIteratorMutationCallback are called if provided
…o-readonly-map-iterator

Check mutation of elements from readonly map iterator
…o-readonly-array-iterator

Check mutation of elements from readonly array iterator
Make it clearer that encoded slabs are still deterministic, so
array and map iterations will remain deterministic.

Only the sequence of changed slabs getting committed is
nondeterministic.

This is useful for migration programs that don't require
commit sequence of slabs to be deterministic while
still preserving deterministic encoding of slabs
(e.g. iteration of arrays and maps remain deterministic).
…eterministicfastcommit

Update comment for NondeterministicFastCommit
Update copyright notice to Flow Foundation
Update feature/array-map-inlining (atree inlining feature branch)
@fxamacker fxamacker self-assigned this Jul 26, 2024
@fxamacker fxamacker changed the title Merge feature/array-map-inlining (atree inlining feature branch) to master Merge feature/array-map-inlining (atree inlining feature branch) to main Jul 26, 2024
Copy link
Member

@turbolent turbolent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 🎉 🎉

Awesome to see this get merged, fantastic work Faye! 👏 👏 👏

@fxamacker
Copy link
Member Author

I started long-running smoke tests yesterday and no problems were detected.

I'll continue checking the running smoke tests (on n1-standard-4) from time to time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants