[MTG-12] merging the asset data into a single column with change to flatbuffer for serialization #298

StanChe · 2024-10-30T18:34:33Z

This PR restructures the asset data to store it within a single column, transitioning from the previous format to Flatbuffers for serialization. Key enhancements include:

•	Performance Gains: Major optimizations in batch dumping, asset retrieval (both batch and individual), and asset index fetching, cutting times by roughly 20% to 60% depending on the operation.
•	New Merging Functionality: Introduces Flatbuffer-based merging, with slight trade-offs in merge speed but added flexibility.

This rework aims for more efficient data handling and improved asset synchronization across large datasets.
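To illustrate why a single column can speed up individual reads, here is a minimal sketch, not the PR's code. It uses in-memory `HashMap`s as stand-ins for RocksDB columns; the type names (`SplitColumns`, `SingleColumn`), the three example columns, and the byte payloads are all assumptions made for illustration only.

```rust
use std::collections::HashMap;

type AssetKey = [u8; 32];

/// Hypothetical "before" layout: one map per kind of asset data,
/// standing in for separate columns.
#[derive(Default)]
struct SplitColumns {
    static_details: HashMap<AssetKey, Vec<u8>>,
    dynamic_details: HashMap<AssetKey, Vec<u8>>,
    owner: HashMap<AssetKey, Vec<u8>>,
}

/// Hypothetical "after" layout: one map holding a complete record per asset.
#[derive(Default)]
struct SingleColumn {
    assets: HashMap<AssetKey, Vec<u8>>,
}

impl SplitColumns {
    /// Returns the parts found for `key` and the number of lookups performed.
    fn get(&self, key: &AssetKey) -> (Vec<&[u8]>, usize) {
        let mut parts = Vec::new();
        let mut lookups = 0;
        for column in [&self.static_details, &self.dynamic_details, &self.owner] {
            lookups += 1;
            if let Some(bytes) = column.get(key) {
                parts.push(bytes.as_slice());
            }
        }
        (parts, lookups)
    }
}

impl SingleColumn {
    /// Returns the whole record for `key`; always exactly one lookup.
    fn get(&self, key: &AssetKey) -> (Option<&[u8]>, usize) {
        (self.assets.get(key).map(Vec::as_slice), 1)
    }
}

fn main() {
    let key: AssetKey = [7u8; 32];

    let mut split = SplitColumns::default();
    split.static_details.insert(key, b"static".to_vec());
    split.dynamic_details.insert(key, b"dynamic".to_vec());
    split.owner.insert(key, b"owner".to_vec());

    let mut single = SingleColumn::default();
    single.assets.insert(key, b"static|dynamic|owner".to_vec());

    let (parts, split_lookups) = split.get(&key);
    let (record, single_lookups) = single.get(&key);
    println!("split layout: {} parts, {} lookups", parts.len(), split_lookups);
    println!("single column: found = {}, {} lookup", record.is_some(), single_lookups);
}
```

In the sketch, the split layout pays one lookup per column for every asset, while the single-column layout pays one lookup in total. The real storage cost model is more involved, so this only shows the shape of the trade-off.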

This PR introduces performance improvements to asset retrieval and database operations, as well as new merging capabilities. Here is a breakdown of key benchmarks comparing the old code (pre-optimization) and the updated code (post-optimization):

1.	Dumping Group (2k batch size) with a database of 1M assets:
•	Before: 10.282–10.430 seconds
•	After: 5.256–5.391 seconds
•	Outcome: Approximately a 50% reduction in time, indicating significant optimization in handling large batch operations.
2.	Batch Retrieval of Assets (1000 assets):
•	Before: 9.0335–9.7307 ms
•	After: 5.1584–5.4180 ms
•	Outcome: Improved performance with a reduction of around 40-45% in retrieval time, showing enhanced efficiency in batch asset retrieval.
3.	Individual Asset Retrieval (1000 individual gets):
•	Before: 82.344–90.889 ms
•	After: 29.979–36.333 ms
•	Outcome: Roughly a 60% decrease in retrieval time for individual assets (median 85.425 ms down to 33.406 ms), streamlining single asset requests substantially.
4.	Asset Index Retrieval:
•	Before: 5.6599–5.7292 ms
•	After: 4.4278–4.5293 ms
•	Outcome: Around a 20% improvement, the smallest of the four gains but still measurable when retrieving asset indexes.

New Merging Functionality and Performance

The PR also adds a merging function:

•	Merging with a Bincode Object: 1.6429–1.6471 ms, similar to previous performance, since the old code had no merge for identical objects.
•	Merging with a Flatbuffer Object (using a simpler merger): 4.1065–4.1171 ms.

Note on Downgrade: Merging with the Flatbuffer object is roughly 2.5x slower than merging with the Bincode object (about 4.11 ms versus 1.65 ms for 1000 merges), due to the simpler merger’s handling complexity. However, this adds functionality not previously available.
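For readers unfamiliar with record merging, here is a minimal sketch of one common approach. The rule used here (per field, the value written at the higher slot wins) is an assumption and is not confirmed to be what this PR's merger does. `Updated`, `AssetRecord`, `newer`, and `merge` are hypothetical names, and plain Rust structs stand in for the Flatbuffer types.

```rust
/// A value tagged with the slot at which it was last written (hypothetical type).
#[derive(Clone, Debug, PartialEq)]
struct Updated<T> {
    slot: u64,
    value: T,
}

/// A tiny stand-in for an asset record; every field is optional.
#[derive(Clone, Debug, Default, PartialEq)]
struct AssetRecord {
    owner: Option<Updated<String>>,
    supply: Option<Updated<u64>>,
}

/// Picks the field value with the higher slot; on a tie the existing value is kept.
fn newer<T: Clone>(
    existing: &Option<Updated<T>>,
    incoming: &Option<Updated<T>>,
) -> Option<Updated<T>> {
    match (existing, incoming) {
        (Some(a), Some(b)) => Some(if b.slot > a.slot { b.clone() } else { a.clone() }),
        (Some(a), None) => Some(a.clone()),
        (None, b) => b.clone(),
    }
}

/// Merges two records field by field (assumed rule, see the note above).
fn merge(existing: &AssetRecord, incoming: &AssetRecord) -> AssetRecord {
    AssetRecord {
        owner: newer(&existing.owner, &incoming.owner),
        supply: newer(&existing.supply, &incoming.supply),
    }
}

fn main() {
    let existing = AssetRecord {
        owner: Some(Updated { slot: 10, value: "alice".to_string() }),
        supply: Some(Updated { slot: 12, value: 1 }),
    };
    // A partial update: only the owner changed, at a later slot.
    let incoming = AssetRecord {
        owner: Some(Updated { slot: 11, value: "bob".to_string() }),
        supply: None,
    };
    println!("{:?}", merge(&existing, &incoming));
}
```

A field-wise rule like this lets a partial update be written without first reading the stored record, at the cost of extra work per merge.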

In summary, this PR yields substantial performance gains in batch and individual asset retrieval while introducing new merging options, particularly useful for scenarios requiring optimized data processing and merging flexibility within large datasets.

Some benchmarks to include:
old code, before the refactoring:
Dumping Group/2k batch size (db with 1M assets)
time: [10.282 s 10.352 s 10.430 s]
get_assets (1000 assets batch get)
time: [9.0335 ms 9.4210 ms 9.7307 ms]
get_assets_individually (1000 gets of asset)
time: [82.344 ms 85.425 ms 90.889 ms]
get_asset_indexes
time: [5.6599 ms 5.7022 ms 5.7292 ms]
after the refactoring:
Dumping Group/2k batch size
time: [5.2559 s 5.3231 s 5.3912 s]
get_assets
time: [5.1584 ms 5.2776 ms 5.4180 ms]
get_assets_individually
time: [29.979 ms 33.406 ms 36.333 ms]
get_asset_indexes
time: [4.4278 ms 4.4707 ms 4.5293 ms]

There is a downgrade in merge performance - 1000 merges for the same object:
merge with a bincode object - similar to previous as the old code didn't have a merge for the same objects
time: [1.6429 ms 1.6450 ms 1.6471 ms]
merge with a flatbuffer object via a simpler merger - the new code
time: [4.1065 ms 4.1116 ms 4.1171 ms]

Note: percent values on screenshots should be ignored, only the time measurements are relevant
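Since only the time measurements are relevant, the relative changes can be recomputed from the listings above. The sketch below assumes the middle value of each `time: [...]` triple is the benchmark's point estimate and uses only those numbers; the helper names `reduction_percent` and `slowdown_factor` are made up for this example.

```rust
/// Percentage reduction going from `before` to `after` (same unit for both).
fn reduction_percent(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// How many times slower `slow` is compared with `fast` (same unit for both).
fn slowdown_factor(fast: f64, slow: f64) -> f64 {
    slow / fast
}

fn main() {
    // (benchmark, middle value before, middle value after), copied from the listings.
    let medians = [
        ("Dumping Group/2k batch size (s)", 10.352, 5.3231),
        ("get_assets (ms)", 9.4210, 5.2776),
        ("get_assets_individually (ms)", 85.425, 33.406),
        ("get_asset_indexes (ms)", 5.7022, 4.4707),
    ];
    for (name, before, after) in medians {
        println!("{}: {:.1}% less time", name, reduction_percent(before, after));
    }

    // Merge benchmarks: bincode object vs flatbuffer object, 1000 merges each (ms).
    println!(
        "flatbuffer merge: {:.2}x the bincode merge time",
        slowdown_factor(1.6450, 4.1116)
    );
}
```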

Current benchmarking results for 1M assets:
Dumping Group/2k batch size
time: [4.8857 s 4.9288 s 4.9785 s]
Dumping Group/get_assets
time: [5.0407 ms 5.0694 ms 5.0968 ms]
Dumping Group/get_assets_individually (taking 1000 assets in a loop one after another)
time: [31.137 ms 33.730 ms 38.050 ms]

StanChe · 2024-10-31T09:32:07Z

rocks-db/Cargo.toml

@@ -41,6 +41,7 @@ usecase = { path = "../usecase" }
 tempfile = { workspace = true }
 bubblegum-batch-sdk = { git = "https://github.com/metaplex-foundation/bubblegum-batch-sdk.git", rev = "0d529f5" }
 num-traits = { workspace = true }
+flatbuffers = { version="24.3.25", features = ["serialize"]}


This version is used because the generated code seems to be compatible with it, but not with the 23... version used elsewhere in the project. The options are to regenerate the code with 23 or to update the existing usage to 24; I'm not sure either will be easy or even possible.

rocks-db/src/asset_streaming_client.rs

StanChe · 2024-10-31T10:07:36Z

rocks-db/src/column.rs

@@ -377,6 +377,18 @@ where
            .collect::<Vec<_>>()
    }

+    pub fn pairs_iterator<'a>(


A handy method that is unused by this PR; it was used for iteration in a previous refactor.

StanChe · 2024-10-31T10:11:06Z

rocks-db/src/migrator.rs

@@ -74,6 +94,48 @@ impl Storage {
            .await?;
        Ok(())
    }
+
+    pub async fn apply_migration_merge(&self) -> Result<()> {


This migration is not fully integrated and is not called by default.

StanChe and others added 26 commits October 10, 2024 13:32

drop the mutex as soon as needed

e79ccd4

spawn_blocking instead of the regular spawn for blocking code

9e63ab2

removed parallel processing from synchronizer reading

91bfdee

More metrics for synchronizer (#281)

b629102

* feat: add more metrics to the synchronizer * feat: more metrics

MTG-742 Move fungible assets dump to different thread (#282)

52e511a

* feat: move fung dump to different thread * feat: add more metrics

MTG-747 Save updates for fungible assets during regular sync (#283)

f61d79d

* feat: save fungible updates in regular sync * feat: change migration file and add indexes drop/create during dump load * fix: fungible tokens * chore: code style

feat: pass optional hashMap to get_asset_indexes

06d85a8

fix: optimize memory usage

fbc242d

a complete rework of the rocks storage

78b291f

Now all the assets-related data is persisted in a single column

renamed the merge function

d1963f0

flatbuffer types + merged types + optimized dump client = 30% improve…

824fc1f

…ment over the reference, or over 50% over the initial merge

batch client using flatbuffer

a05855d

insert_gaped_data updated in batch client

d7d9b6b

minor refactor

31aca0f

more benchmarks

edcff2f

get_asset_indexes optimized

936ebda

full refactor to use flatbuffer

ff7e94e

code cleanup

89a6571

crazy merge function

73531ff

moved the code around

4221d94

merge function tests

522cb1a

more randomized assets generation in tests

885fbba

another attempt to the merge function

3ea1df2

using the recent iteration of merge function

de2e42f

Merge commit '88e5d12b86483126a5aa571072dca93b5d4df52f' into feature/…

98eb391

…MTG-12-flatbuffer-merge # Conflicts: # nft_ingester/src/api/dapi/change_logs.rs # rocks-db/src/batch_client.rs # rocks-db/src/dump_client.rs # rocks-db/src/storage_traits.rs

updated streaming client to use flatbuffers

44df86e

StanChe added 4 commits October 31, 2024 11:18

fixed tests

53c4a68

fixed a test and dropped a debug line from another test

ce73e95

Added slot to data generation in tests added a valid onchain data in test generation fixed a no longer valid test

cleanup benches

1e25381

fixed the imports

5247299

StanChe mentioned this pull request Oct 31, 2024

Asset data in RocksDB merged into a single column #290

Closed

StanChe added 7 commits November 1, 2024 12:49

fixed migrator, some tests

08062da

fixed an issue with mpl core collections getting into response with m…

ab77ec4

…etadata. fixed another test setup

dump client fixed for fungibles

7342535

fix for tests setup accounting for authority being optional in some t…

4123d05

…ests

Merge commit '962eaf9560de83ae1f917ec168b461666c3d7000' into feature/…

fd0ca32

…MTG-12-flatbuffer-merge + updated the data read to get dynamic details # Conflicts: # rocks-db/src/bin/leaf_checker/main.rs

making fmt and clippy happier

cae65de

less randomness in tests

620d842

StanChe marked this pull request as ready for review November 4, 2024 15:17

StanChe added 3 commits November 4, 2024 15:33

dropped some commented out code

3474b40

Merge commit 'b410996aaf3584ed572b2dea699d2a66f94359b3' into feature/…

d8855c4

…MTG-12-flatbuffer-merge # Conflicts: # rocks-db/Cargo.toml

Merge commit 'd2f3e0af6b2cb438a77f45e640e2e5731c1f0977' into feature/…

d4baf7d

…MTG-12-flatbuffer-merge # Conflicts: # nft_ingester/tests/process_accounts.rs
