Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[indexer] index protocol configs and feature flags #18450

Merged
merged 7 commits into from
Aug 21, 2024

Conversation

emmazzz
Copy link
Contributor

@emmazzz emmazzz commented Jun 28, 2024

Description

This PR adds two tables protocol_configs and feature_flags that get populated at indexer startup time and every epoch change time if protocol version has changed.

Test plan

Tested locally against devnet.


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer: adds two new indexer tables that stores protocol configs and features flags of different versions.
  • JSON-RPC:
  • GraphQL: uses the stored data to query for protocol configs instead of native configs stored in the binary.
  • CLI:
  • Rust SDK:

Copy link

vercel bot commented Jun 28, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 21, 2024 6:04pm
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Aug 21, 2024 6:04pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Aug 21, 2024 6:04pm
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Aug 21, 2024 6:04pm

Copy link
Member

@amnn amnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! My main pending question is about the relationship between the task that's responsible for filling the protocol config table and the checkpoint watermark -- it seemed to me like they were decoupled (which we don't want, if that's the case), but I am by no means the expert there.

Comment on lines 1807 to 1810
while {
chain_id_opt = self.get_chain_identifier()?;
chain_id_opt.is_none()
} {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The busy wait here is a little heavyweight, can we add some small sleep duration to avoid the service hammering the DB?


// We only store protocol configs until the version for the latest epoch, because config
// values for future epochs, even when they already exist, are subject to change.
let end_version = self.get_latest_epoch_protocol_version()?.unwrap_or(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The two reads from the DB could be coalesced into one, that returns both latest protocol versions (that may also address the potential confusion between the two that you mention in the doc comments).

@@ -108,6 +108,10 @@ impl Indexer {
)
.await?;

// Index protocol configs for protocol versions not yet in the db
let store_clone = store.clone();
spawn_monitored_task!(store_clone.persist_protocol_configs_and_feature_flags());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that protocol configs are filled by a parallel process (i.e. not adhering/contributing to the same watermark as the "main" indexing task? That may pose an issue for consistency (where someone issues a query for an epoch and its protocol config and gets one but not the other).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your concern. This issue primarily arises during the startup of an indexer instance. Specifically, the persist_protocol_configs_and_feature_flags call at epoch change time is not a parallel process. However, if a user makes a query right after the indexer restarts after being down for more than an epoch, they could encounter this problem.

I agree that the process should be blocking and completed before we start indexing everything else. I have attempted to implement this, but ran into an issue. In my setup, we read the chain identifier from the database in persist_protocol_configs_and_feature_flags. This identifier is the digest of the first checkpoint, which requires the indexing of the first checkpoint to be completed in the database. That's why we wait in that while loop for the other process to finish indexing checkpoint 0.

Ideally, the process should follow this sequential order:

  • Index the first checkpoint, if it has not been indexed yet.
  • Call persist_protocol_configs_and_feature_flags.
  • Index the rest of the checkpoints.

One potential solution to alleviate this issue is to perform the process in parallel only if the watermark is 0. This way, we only encounter the consistency issue at the beginning with an empty database, which no user will ever encounter. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to pass the chain identifier to the catch-up process in memory rather than having the job fetch it from the DB? That's one way I can think of to keep the reads consistent always while maintaining this read order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative is to add a configuration parameter to the indexer that sets which chain it is supposed to be indexing for, which is then used to initialise the protocol config table. That is simpler to implement but then we'll need to update all the deployment configs, and it adds a layer of complexity for starting up the indexer as well, I suppose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a user makes a query right after the indexer restarts after being down for more than an epoch, they could encounter this problem

is the problem about protocol config being outdated on indexer, isn't that expected as all other tables are outdated too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is not so much about it being out-of-date, it's about it being out-of-sync: They could query for an epoch and its protocol config and the protocol config not exist in the DB yet because it's being filled in parallel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the code to pass in the chain identifier, and to index protocol configs at the same time we index checkpoint 0, so no busy waiting or separate process anymore.

@@ -108,6 +108,10 @@ impl Indexer {
)
.await?;

// Index protocol configs for protocol versions not yet in the db
let store_clone = store.clone();
spawn_monitored_task!(store_clone.persist_protocol_configs_and_feature_flags());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a user makes a query right after the indexer restarts after being down for more than an epoch, they could encounter this problem

is the problem about protocol config being outdated on indexer, isn't that expected as all other tables are outdated too?

read_only_blocking!(&self.blocking_cp, |conn| {
checkpoints::dsl::checkpoints
.select(checkpoints::checkpoint_digest)
.filter(checkpoints::sequence_number.eq(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this always reads the sequence_number.eq(0), what if we have pruned this or restore only the latest 30D data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is indeed an issue. We read from checkpoint 0 for chain identifier in other queries too. I think the right thing to do here is to have a separate chain_identifier singleton table that stores just the chain identifier and never gets pruned away. Wdyt? @gegaowp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in #18825


// We only store protocol configs until the version for the latest epoch, because config
// values for future epochs, even when they already exist, are subject to change.
let end_version = self.get_latest_epoch_protocol_version()?.unwrap_or(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the end_version can be derived from epoch_data and passed to this function in committer, so that we can avoid counting on the DB read here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to epoch change time, we also call persist_protocol_configs_and_feature_flags at indexer startup time, where epoch_data is not available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we can avoid this call at indexer startup time then it looks like we can pass the value in?

);

// Gather all protocol configs and feature flags for all versions between start and end.
for version in start_version..=end_version {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this for for backfill, in other words, if indexer starts along with genesis, here each time we will always run only once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if indexer starts along with genesis, here each time we will always run only once?

Yes. If indexer is in sync, this line will be run once per epoch so each time the loop will iterate at most once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i see, i suppose if we don't do this at indexer startup, then the first time we hit this loop we may be doing this for a while

@emmazzz emmazzz force-pushed the idx-breaking-change-park branch from ef8a189 to aa76e71 Compare August 3, 2024 00:54
@emmazzz emmazzz force-pushed the emma/add-protocol-configs branch from 61ddce9 to 53a0188 Compare August 7, 2024 13:13
@emmazzz emmazzz requested review from a team, ronny-mysten and mystenmark as code owners August 7, 2024 13:13
@emmazzz emmazzz changed the base branch from idx-breaking-change-park to main August 7, 2024 13:13
@emmazzz emmazzz removed request for a team, mystenmark and ronny-mysten August 7, 2024 13:13
Comment on lines +125 to +132
// If we already have chain identifier indexed (i.e. the first checkpoint has been indexed),
// then we persist protocol configs for protocol versions not yet in the db.
// Otherwise, we would do the persisting in `commit_checkpoint` while the first cp is
// being indexed.
if let Some(chain_id) = store.get_chain_identifier().await? {
store.persist_protocol_configs_and_feature_flags(chain_id)?;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really necessary since it'll get picked up on the committer side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The target use case for this one is for when an up-to-date db (like our backfill dbs) wants to add protocol configs tables for the first time. If we don't do it here, we'll need to wait until epoch change.


// We only store protocol configs until the version for the latest epoch, because config
// values for future epochs, even when they already exist, are subject to change.
let end_version = self.get_latest_epoch_protocol_version()?.unwrap_or(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we can avoid this call at indexer startup time then it looks like we can pass the value in?


// We only store protocol configs until the version for the latest epoch, because config
// values for future epochs, even when they already exist, are subject to change.
let end_version = self.get_latest_epoch_protocol_version()?.unwrap_or(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a comment explaining the diff between this and start_version?

);

// Gather all protocol configs and feature flags for all versions between start and end.
for version in start_version..=end_version {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i see, i suppose if we don't do this at indexer startup, then the first time we hit this loop we may be doing this for a while

Comment on lines +182 to +198
if is_epoch_end {
// The epoch has advanced so we update the configs for the new protocol version, if it has changed.
let chain_id = state
.get_chain_identifier()
.await
.expect("Failed to get chain identifier")
.expect("Chain identifier should have been indexed at this point");
let _ = state.persist_protocol_configs_and_feature_flags(chain_id);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do this before we persist_checkpoints, maybe right after state.advance_epoch under if let Some(epoch_data) = epoch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually have to do this after persist_checkpoints, in the event where the first epoch is committed before the first checkpoints. In that scenario, chain identifier doesn't exist yet in the db when we advance epoch.

Copy link
Member

@amnn amnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits, but looks good!

@@ -30,7 +30,9 @@ pub(crate) struct ProtocolConfigFeatureFlag {

#[derive(Clone, Debug)]
pub(crate) struct ProtocolConfigs {
native: NativeProtocolConfig,
version: u64,
configs: BTreeMap<String, Option<String>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the Option here? (Maybe the rest of the PR will answer this question for me).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Attempting to answer this for myself -- our GraphQL schema differentiates between a config that doesn't exist at any version, and a config that exists but hasn't been set, so that's why we need this Option).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close. But it's not due to GraphQL differentiating that - protocol config values themselves are Options, likely due to the reason you mentioned.

))
})?;

let version = protocol_version.unwrap_or(latest_version as u64);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code always issues the read to get the latest protocol version, even if you don't use it, you can avoid that with:

let version = if let Some(version) = protocol_version {
    version
} else {
    let latest: i64 = db
        .execute(move |conn| { /* ... */ })
        .await
        .map_err(|e| { /* ... */ })?;
    latest as u64
}


let version = protocol_version.unwrap_or(latest_version as u64);

let configs: BTreeMap<String, Option<String>> = db
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a neat way to get all this data in a single DB query? We should at least re-use the connection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could implement a DataLoader for this, but that's not a blocker for this PR.

@emmazzz emmazzz merged commit b1976e6 into main Aug 21, 2024
48 checks passed
@emmazzz emmazzz deleted the emma/add-protocol-configs branch August 21, 2024 20:34
tx-tomcat pushed a commit to tx-tomcat/sui-network that referenced this pull request Aug 27, 2024
## Description 

This PR adds two tables `protocol_configs` and `feature_flags` that get
populated at indexer startup time and every epoch change time if
protocol version has changed.

## Test plan 

Tested locally against devnet.

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [x] Indexer: adds two new indexer tables that stores protocol configs
and features flags of different versions.
- [ ] JSON-RPC: 
- [x] GraphQL: uses the stored data to query for protocol configs
instead of native configs stored in the binary.
- [ ] CLI: 
- [ ] Rust SDK:
suiwombat pushed a commit that referenced this pull request Sep 16, 2024
## Description 

This PR adds two tables `protocol_configs` and `feature_flags` that get
populated at indexer startup time and every epoch change time if
protocol version has changed.

## Test plan 

Tested locally against devnet.

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [x] Indexer: adds two new indexer tables that stores protocol configs
and features flags of different versions.
- [ ] JSON-RPC: 
- [x] GraphQL: uses the stored data to query for protocol configs
instead of native configs stored in the binary.
- [ ] CLI: 
- [ ] Rust SDK:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants