Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MTG-12] merging the asset data into a single column with change to flatbuffer for serialization #298

Open
wants to merge 41 commits into
base: main
Choose a base branch
from

Conversation

StanChe
Copy link
Contributor

@StanChe StanChe commented Oct 30, 2024

This PR restructures the asset data to store it within a single column, transitioning from the previous format to Flatbuffers for serialization. Key enhancements include:

•	Performance Gains: Major optimizations in batch dumping, asset retrieval (both batch and individual), and asset index fetching, cutting times by up to 50% in certain cases.
•	New Merging Functionality: Introduces Flatbuffer-based merging, with slight trade-offs in merge speed but added flexibility.

This rework aims for more efficient data handling and improved asset synchronization across large datasets.

This PR introduces performance improvements to asset retrieval and database operations, as well as new merging capabilities. Here is a breakdown of key benchmarks comparing the old code (pre-optimization) and the updated code (post-optimization):

1.	Dumping Group (2k batch size) with a database of 1M assets:
•	Before: 10.282–10.430 seconds
•	After: 5.323–5.431 seconds
•	Outcome: Approximately a 50% reduction in time, indicating significant optimization in handling large batch operations.
2.	Batch Retrieval of Assets (1000 assets):
•	Before: 9.0335–9.7307 ms
•	After: 5.1584–5.4180 ms
•	Outcome: Improved performance with a reduction of around 40-45% in retrieval time, showing enhanced efficiency in batch asset retrieval.
3.	Individual Asset Retrieval (1000 individual gets):
•	Before: 82.344–90.889 ms
•	After: 29.979–36.333 ms
•	Outcome: Over a 50% decrease in retrieval time for individual assets, streamlining single asset requests substantially.
4.	Asset Index Retrieval:
•	Before: 5.6599–5.7292 ms
•	After: 4.4278–4.5293 ms
•	Outcome: Around 20% improvement, but still shows a slight gain in retrieving asset indexes.

New Merging Functionality and Performance

The PR also adds a merging function:

•	Merging with a Bincode Object:1.6429-1.6471 ms, similar to previous performance as no merge existed for identical objects.
•	Merging with a Flatbuffer Object (using a simpler merger): 4.1065–4.1171 ms.

Note on Downgrade: There is a slight performance downgrade in merge performance when using the Flatbuffer object due to the simpler merger’s handling complexity. However, this adds functionality not previously available.

In summary, this PR yields substantial performance gains in batch and individual asset retrieval while introducing new merging options, particularly useful for scenarios requiring optimized data processing and merging flexibility within large datasets.

Some benchmarks to include:
old code before performing:
Dumping Group/2k batch size (db with 1M assets)
time: [10.282 s 10.352 s 10.430 s]
get_assets (1000 assets batch get)
time: [9.0335 ms 9.4210 ms 9.7307 ms]
get_assets_individually (1000 gets of asset)
time: [82.344 ms 85.425 ms 90.889 ms]
get_asset_indexes
time: [5.6599 ms 5.7022 ms 5.7292 ms]
after the refactoring:
Dumping Group/2k batch size
time: [5.2559 s 5.3231 s 5.3912 s]
get_assets
time: [5.1584 ms 5.2776 ms 5.4180 ms]
get_assets_individually
time: [29.979 ms 33.406 ms 36.333 ms]
get_asset_indexes
time: [4.4278 ms 4.4707 ms 4.5293 ms]

There is a downgrade in merge performance - 1000 merges for the same object:
merge with a bincode object - similar to previous as the old code didn't have a merge for the same objects
time: [1.6429 ms 1.6450 ms 1.6471 ms]
merge with a flatbuffer object via a simpler merger - the new code
time: [4.1065 ms 4.1116 ms 4.1171 ms]
image
image
Note: percent values on screenshots should be ignored, only the time measurements are relevant

StanChe and others added 26 commits October 10, 2024 13:32
* feat: add more metrics to the synchronizer

* feat: more metrics
* feat: move fung dump to different thread

* feat: add more metrics
* feat: save fungible updates in regular sync

* feat: change migration file and add indexes drop/create during dump load

* fix: fungible tokens

* chore: code style
Now all the assets-related data is persisted in a single column
…ment over the reference, or over 50% over the initial merge
Current benchmarking results for 1M assets:
Dumping Group/2k batch size
                        time:   [4.8857 s 4.9288 s 4.9785 s]
Dumping Group/get_assets
                        time:   [5.0407 ms 5.0694 ms 5.0968 ms]
Dumping Group/get_assets_individually (taking 1000 assets in a loop one after another)
                        time:   [31.137 ms 33.730 ms 38.050 ms]
…MTG-12-flatbuffer-merge

# Conflicts:
#	nft_ingester/src/api/dapi/change_logs.rs
#	rocks-db/src/batch_client.rs
#	rocks-db/src/dump_client.rs
#	rocks-db/src/storage_traits.rs
@@ -41,6 +41,7 @@ usecase = { path = "../usecase" }
tempfile = { workspace = true }
bubblegum-batch-sdk = { git = "https://github.com/metaplex-foundation/bubblegum-batch-sdk.git", rev = "0d529f5" }
num-traits = { workspace = true }
flatbuffers = { version="24.3.25", features = ["serialize"]}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version is used as the generated code seems to be compatible with it, but not with 23... used in the project. It's either the code may be regenerated with 23, or existing updated to 24, which I'm not aware will be easy/possible.

@@ -377,6 +377,18 @@ where
.collect::<Vec<_>>()
}

pub fn pairs_iterator<'a>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a handy method that's unused by this PR, it was used for iterations with a previous refactor

@@ -74,6 +94,48 @@ impl Storage {
.await?;
Ok(())
}

pub async fn apply_migration_merge(&self) -> Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This migration is not fully integrated and is not called by default.

Added slot to data generation in tests
added a valid onchain data in test generation
fixed a no longer valid test
@StanChe StanChe marked this pull request as ready for review November 4, 2024 15:17
…MTG-12-flatbuffer-merge

# Conflicts:
#	rocks-db/Cargo.toml
…MTG-12-flatbuffer-merge

# Conflicts:
#	nft_ingester/tests/process_accounts.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants