Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flat storage MVP #7327

Closed
26 tasks done
exalate-issue-sync bot opened this issue Aug 3, 2022 · 7 comments
Closed
26 tasks done

Flat storage MVP #7327

exalate-issue-sync bot opened this issue Aug 3, 2022 · 7 comments
Assignees
Labels
A-storage Area: storage and databases C-tracking-issue Category: a tracking issue

Comments

@exalate-issue-sync
Copy link

exalate-issue-sync bot commented Aug 3, 2022

Owner: Alex L

Requires completion of READ only flat storage. Tracking issue for flat storage #7327

LoE for Q1 includes:

Picture: https://miro.com/app/board/uXjVP3ziycI=/

Implementation roadmap for Flat Storage (FS)

This section is kept up-to-date to provide a high-level progress overview for anyone not directly involved in the implementation.
We used it when we didn't have github tracking issues for subtask. Detailed subtasks are also tracked on Jira.

  1. Basic flat storage (DONE)

    • implement state lookup that reads from flat state
    • keep flat state in sync when we write to state
    • implement deltas to deal with forks
  2. Data migration (DONE)

  3. Fully operational read-only FS (DONE)

  4. Add docs and update existing documentation to include flat storage (DONE)

There is also a blocker for shipping: #8032. Without it, undercharging coming from non-charged TTNs can't be controlled, but TTN parameter = 10 would help.

Links

Existing documentation:

https://nearinc.atlassian.net/wiki/spaces/EAP/pages/58556568/Flat+Storage

https://docs.google.com/document/d/1Tla2JtHHc6rJwZPUOF0skoaLiExrcV1PW2F65llfHG8/edit#heading=h.2r15ioo581ig

Related works and discussions:

Main flat storage discussion channel

https://nearinc.atlassian.net/browse/CRT-8 and subsequently linked tasks

https://nearinc.atlassian.net/browse/CRT-86

Picture of flat storage structure

https://near.zulipchat.com/#narrow/stream/329092-pagoda.2Fstorage.2Fprivate/topic/Action.20receipt.20undercharging

@exalate-issue-sync exalate-issue-sync bot added T-core Team: issues relevant to the core team T-storage labels Aug 3, 2022
@exalate-issue-sync exalate-issue-sync bot reopened this Aug 3, 2022
@Longarithm Longarithm removed the T-core Team: issues relevant to the core team label Aug 3, 2022
This was referenced Aug 4, 2022
near-bulldozer bot pushed a commit that referenced this issue Aug 9, 2022
First step towards flat storage: #7327
Continue work here because #7295 was broken.

Here we start building it under protocol feature without adding it to nightly. The goal is to support it for localnet with one node which never sends skip approvals, so later we could extend it to more nodes and eventually implement a migration to mainnet.

Now, everyone should be able to build neard with this feature, set `max_block_production_delay` to something big and run localnet.

I currently couldn't make all tests work with enabled feature, e.g. `test_care_about_shard`, but I open it for review because I don't think it affects the idea.

## Testing

* Check that encoding and decoding value reference gives expected results. `ValueRef` can be reused in other places in code
* Check that flat state is used in regular trie handler and not used in view trie handler
* Check that after block processing, getting value from regular trie and view trie gives the same result
nikurt pushed a commit that referenced this issue Aug 11, 2022
First step towards flat storage: #7327
Continue work here because #7295 was broken.

Here we start building it under protocol feature without adding it to nightly. The goal is to support it for localnet with one node which never sends skip approvals, so later we could extend it to more nodes and eventually implement a migration to mainnet.

Now, everyone should be able to build neard with this feature, set `max_block_production_delay` to something big and run localnet.

I currently couldn't make all tests work with enabled feature, e.g. `test_care_about_shard`, but I open it for review because I don't think it affects the idea.

## Testing

* Check that encoding and decoding value reference gives expected results. `ValueRef` can be reused in other places in code
* Check that flat state is used in regular trie handler and not used in view trie handler
* Check that after block processing, getting value from regular trie and view trie gives the same result
@jakmeier jakmeier added the C-tracking-issue Category: a tracking issue label Nov 9, 2022
mina86 added a commit to mina86/nearcore that referenced this issue Dec 9, 2022
As per the TODO, take `&[u8; 36]` in ValueRef::decode and stop using
Cursor.  This makes the method considerably simpler without making the
call site more complex.  While dealing with ValueRef, stop using
Cursor in account_id_to_shard_id as well.

Issue: near#7327
near-bulldozer bot pushed a commit that referenced this issue Dec 14, 2022
As per the TODO, take `&[u8; 36]` in ValueRef::decode and stop using
Cursor.  This makes the method considerably simpler without making the
call site more complex.  While dealing with ValueRef, stop using
Cursor in account_id_to_shard_id as well.

Issue: #7327
@bowenwang1996
Copy link
Collaborator

@Longarithm is the progress in this tracking issue up to date?

@Longarithm
Copy link
Member

Yes. The PR for catchups needs some polishing to be merged #8193, tests for migration and proper flat state removal are TODO, other items weren't started yet.

nikurt pushed a commit to nikurt/nearcore that referenced this issue Dec 19, 2022
As per the TODO, take `&[u8; 36]` in ValueRef::decode and stop using
Cursor.  This makes the method considerably simpler without making the
call site more complex.  While dealing with ValueRef, stop using
Cursor in account_id_to_shard_id as well.

Issue: near#7327
@exalate-issue-sync exalate-issue-sync bot changed the title Build and ship flat storage Flat storage MVP Dec 20, 2022
@walnut-the-cat walnut-the-cat assigned pugachAG and unassigned pugachAG Feb 13, 2023
@pugachAG
Copy link
Contributor

pugachAG commented Feb 22, 2023

Progress update (created together with @Longarithm):

Total progress for Q1: 45%

cc @walnut-the-cat

@walnut-the-cat
Copy link
Contributor

#8531 is now merged. Does it complete #8398?

@Longarithm
Copy link
Member

Longarithm commented Mar 6, 2023

Yes, closing

@jakmeier
Copy link
Contributor

A new issue related to FS: #8791

I don't think we should block FS on it, since it functionally does not depend on it. We just have to make sure #8791 also gets done before the next release cut, which should be more than enough time.

@Longarithm
Copy link
Member

We completed the work for flat storage reads. There are some remaining tasks to release it in 1.34, which I listed in #8827. We plan to spend some time on it in Q2, as 1.34 is expected to happen in Q2.

near-bulldozer bot pushed a commit that referenced this issue Apr 21, 2023
The code is literally removing `protocol_feature_flat_state` and moving feature to stable protocol. We also disable `test_state_sync` as this is part of refactor we can do in Q2.

## Feature to stabilize

Here we stabilize Flat Storage for reads, which means that all state reads in the client during block processing will query flat storage instead of Trie. Flat Storage is another index for blockchain state, reducing number of DB accesses for state read from `2 * key.len()` in the worst case to 2.

This will trigger background creation of flat storage, using 8 threads and finishing in 15h for RPC node and 2d for archival node. After that all non-contract reads will go through flat storage, for which special "chunk views" will be created. When protocol upgrade happens, contracts reads will go through flat storage as well. Also compute costs will change as Option 3 suggests [here](#8006 (comment)). It is to be merged separately, but we need to ensure that both costs change and flat storage go into next release together.

## Context

Find more details in:
- Overview: https://near.github.io/nearcore/architecture/storage/flat_storage.html
- Approved NEP: https://github.com/near/NEPs/blob/master/neps/nep-0339.md
- Tracking issue: #7327

## Testing and QA

* Flat storage structs are covered by unit tests;
* Integration tests check that chain behaviour is preserved and costs are changed as expected;
* Flat storage spent ~2 months in betanet with assertion that flat and trie `ValueRef`s are the same;
* We were running testnet/mainnet nodes for ~2 months with the same assertion. We checked that performance is not degraded, see e.g. https://nearinc.grafana.net/d/Vg9SREA4k/flat-storage-test?orgId=1&var-chain_id=mainnet&var-node_id=logunov-mainnet-fs-1&from=1677804289279&to=1678088806154 checking that even with finality lag of 50 blocks performance is not impacted. Small exception is that we updated data layout several times during development, but we checked that results are unchanged.

## Checklist
- [x] Include compute costs after they are merged - #8924
- [x] https://nayduck.near.org/#/run/2916
- [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
nikurt pushed a commit that referenced this issue Apr 25, 2023
The code is literally removing `protocol_feature_flat_state` and moving feature to stable protocol. We also disable `test_state_sync` as this is part of refactor we can do in Q2.

## Feature to stabilize

Here we stabilize Flat Storage for reads, which means that all state reads in the client during block processing will query flat storage instead of Trie. Flat Storage is another index for blockchain state, reducing number of DB accesses for state read from `2 * key.len()` in the worst case to 2.

This will trigger background creation of flat storage, using 8 threads and finishing in 15h for RPC node and 2d for archival node. After that all non-contract reads will go through flat storage, for which special "chunk views" will be created. When protocol upgrade happens, contracts reads will go through flat storage as well. Also compute costs will change as Option 3 suggests [here](#8006 (comment)). It is to be merged separately, but we need to ensure that both costs change and flat storage go into next release together.

## Context

Find more details in:
- Overview: https://near.github.io/nearcore/architecture/storage/flat_storage.html
- Approved NEP: https://github.com/near/NEPs/blob/master/neps/nep-0339.md
- Tracking issue: #7327

## Testing and QA

* Flat storage structs are covered by unit tests;
* Integration tests check that chain behaviour is preserved and costs are changed as expected;
* Flat storage spent ~2 months in betanet with assertion that flat and trie `ValueRef`s are the same;
* We were running testnet/mainnet nodes for ~2 months with the same assertion. We checked that performance is not degraded, see e.g. https://nearinc.grafana.net/d/Vg9SREA4k/flat-storage-test?orgId=1&var-chain_id=mainnet&var-node_id=logunov-mainnet-fs-1&from=1677804289279&to=1678088806154 checking that even with finality lag of 50 blocks performance is not impacted. Small exception is that we updated data layout several times during development, but we checked that results are unchanged.

## Checklist
- [x] Include compute costs after they are merged - #8924
- [x] https://nayduck.near.org/#/run/2916
- [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
nikurt pushed a commit that referenced this issue Apr 25, 2023
The code is literally removing `protocol_feature_flat_state` and moving feature to stable protocol. We also disable `test_state_sync` as this is part of refactor we can do in Q2.

## Feature to stabilize

Here we stabilize Flat Storage for reads, which means that all state reads in the client during block processing will query flat storage instead of Trie. Flat Storage is another index for blockchain state, reducing number of DB accesses for state read from `2 * key.len()` in the worst case to 2.

This will trigger background creation of flat storage, using 8 threads and finishing in 15h for RPC node and 2d for archival node. After that all non-contract reads will go through flat storage, for which special "chunk views" will be created. When protocol upgrade happens, contracts reads will go through flat storage as well. Also compute costs will change as Option 3 suggests [here](#8006 (comment)). It is to be merged separately, but we need to ensure that both costs change and flat storage go into next release together.

## Context

Find more details in:
- Overview: https://near.github.io/nearcore/architecture/storage/flat_storage.html
- Approved NEP: https://github.com/near/NEPs/blob/master/neps/nep-0339.md
- Tracking issue: #7327

## Testing and QA

* Flat storage structs are covered by unit tests;
* Integration tests check that chain behaviour is preserved and costs are changed as expected;
* Flat storage spent ~2 months in betanet with assertion that flat and trie `ValueRef`s are the same;
* We were running testnet/mainnet nodes for ~2 months with the same assertion. We checked that performance is not degraded, see e.g. https://nearinc.grafana.net/d/Vg9SREA4k/flat-storage-test?orgId=1&var-chain_id=mainnet&var-node_id=logunov-mainnet-fs-1&from=1677804289279&to=1678088806154 checking that even with finality lag of 50 blocks performance is not impacted. Small exception is that we updated data layout several times during development, but we checked that results are unchanged.

## Checklist
- [x] Include compute costs after they are merged - #8924
- [x] https://nayduck.near.org/#/run/2916
- [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
nikurt pushed a commit that referenced this issue Apr 28, 2023
The code is literally removing `protocol_feature_flat_state` and moving feature to stable protocol. We also disable `test_state_sync` as this is part of refactor we can do in Q2.

## Feature to stabilize

Here we stabilize Flat Storage for reads, which means that all state reads in the client during block processing will query flat storage instead of Trie. Flat Storage is another index for blockchain state, reducing number of DB accesses for state read from `2 * key.len()` in the worst case to 2.

This will trigger background creation of flat storage, using 8 threads and finishing in 15h for RPC node and 2d for archival node. After that all non-contract reads will go through flat storage, for which special "chunk views" will be created. When protocol upgrade happens, contracts reads will go through flat storage as well. Also compute costs will change as Option 3 suggests [here](#8006 (comment)). It is to be merged separately, but we need to ensure that both costs change and flat storage go into next release together.

## Context

Find more details in:
- Overview: https://near.github.io/nearcore/architecture/storage/flat_storage.html
- Approved NEP: https://github.com/near/NEPs/blob/master/neps/nep-0339.md
- Tracking issue: #7327

## Testing and QA

* Flat storage structs are covered by unit tests;
* Integration tests check that chain behaviour is preserved and costs are changed as expected;
* Flat storage spent ~2 months in betanet with assertion that flat and trie `ValueRef`s are the same;
* We were running testnet/mainnet nodes for ~2 months with the same assertion. We checked that performance is not degraded, see e.g. https://nearinc.grafana.net/d/Vg9SREA4k/flat-storage-test?orgId=1&var-chain_id=mainnet&var-node_id=logunov-mainnet-fs-1&from=1677804289279&to=1678088806154 checking that even with finality lag of 50 blocks performance is not impacted. Small exception is that we updated data layout several times during development, but we checked that results are unchanged.

## Checklist
- [x] Include compute costs after they are merged - #8924
- [x] https://nayduck.near.org/#/run/2916
- [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage Area: storage and databases C-tracking-issue Category: a tracking issue
Projects
None yet
Development

No branches or pull requests

5 participants