-
Notifications
You must be signed in to change notification settings - Fork 11.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[indexer] index protocol configs and feature flags #18450
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
3 Skipped Deployments
|
45218c2
to
61ddce9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! My main pending question is about the relationship between the task that's responsible for filling the protocol config table and the checkpoint watermark -- it seemed to me like they were decoupled (which we don't want, if that's the case), but I am by no means the expert there.
while { | ||
chain_id_opt = self.get_chain_identifier()?; | ||
chain_id_opt.is_none() | ||
} {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The busy wait here is a little heavyweight, can we add some small sleep duration to avoid the service hammering the DB?
|
||
// We only store protocol configs until the version for the latest epoch, because config | ||
// values for future epochs, even when they already exist, are subject to change. | ||
let end_version = self.get_latest_epoch_protocol_version()?.unwrap_or(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The two reads from the DB could be coalesced into one, that returns both latest protocol versions (that may also address the potential confusion between the two that you mention in the doc comments).
crates/sui-indexer/src/indexer.rs
Outdated
@@ -108,6 +108,10 @@ impl Indexer { | |||
) | |||
.await?; | |||
|
|||
// Index protocol configs for protocol versions not yet in the db | |||
let store_clone = store.clone(); | |||
spawn_monitored_task!(store_clone.persist_protocol_configs_and_feature_flags()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that protocol configs are filled by a parallel process (i.e. not adhering/contributing to the same watermark as the "main" indexing task? That may pose an issue for consistency (where someone issues a query for an epoch and its protocol config and gets one but not the other).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your concern. This issue primarily arises during the startup of an indexer instance. Specifically, the persist_protocol_configs_and_feature_flags
call at epoch change time is not a parallel process. However, if a user makes a query right after the indexer restarts after being down for more than an epoch, they could encounter this problem.
I agree that the process should be blocking and completed before we start indexing everything else. I have attempted to implement this, but ran into an issue. In my setup, we read the chain identifier from the database in persist_protocol_configs_and_feature_flags
. This identifier is the digest of the first checkpoint, which requires the indexing of the first checkpoint to be completed in the database. That's why we wait in that while loop for the other process to finish indexing checkpoint 0.
Ideally, the process should follow this sequential order:
- Index the first checkpoint, if it has not been indexed yet.
- Call
persist_protocol_configs_and_feature_flags
. - Index the rest of the checkpoints.
One potential solution to alleviate this issue is to perform the process in parallel only if the watermark is 0. This way, we only encounter the consistency issue at the beginning with an empty database, which no user will ever encounter. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to pass the chain identifier to the catch-up process in memory rather than having the job fetch it from the DB? That's one way I can think of to keep the reads consistent always while maintaining this read order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative is to add a configuration parameter to the indexer that sets which chain it is supposed to be indexing for, which is then used to initialise the protocol config table. That is simpler to implement but then we'll need to update all the deployment configs, and it adds a layer of complexity for starting up the indexer as well, I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a user makes a query right after the indexer restarts after being down for more than an epoch, they could encounter this problem
is the problem about protocol config being outdated on indexer, isn't that expected as all other tables are outdated too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is not so much about it being out-of-date, it's about it being out-of-sync: They could query for an epoch and its protocol config and the protocol config not exist in the DB yet because it's being filled in parallel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the code to pass in the chain identifier, and to index protocol configs at the same time we index checkpoint 0, so no busy waiting or separate process anymore.
crates/sui-indexer/src/indexer.rs
Outdated
@@ -108,6 +108,10 @@ impl Indexer { | |||
) | |||
.await?; | |||
|
|||
// Index protocol configs for protocol versions not yet in the db | |||
let store_clone = store.clone(); | |||
spawn_monitored_task!(store_clone.persist_protocol_configs_and_feature_flags()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a user makes a query right after the indexer restarts after being down for more than an epoch, they could encounter this problem
is the problem about protocol config being outdated on indexer, isn't that expected as all other tables are outdated too?
read_only_blocking!(&self.blocking_cp, |conn| { | ||
checkpoints::dsl::checkpoints | ||
.select(checkpoints::checkpoint_digest) | ||
.filter(checkpoints::sequence_number.eq(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this always reads the sequence_number.eq(0)
, what if we have pruned this or restore only the latest 30D data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is indeed an issue. We read from checkpoint 0 for chain identifier in other queries too. I think the right thing to do here is to have a separate chain_identifier
singleton table that stores just the chain identifier and never gets pruned away. Wdyt? @gegaowp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in #18825
|
||
// We only store protocol configs until the version for the latest epoch, because config | ||
// values for future epochs, even when they already exist, are subject to change. | ||
let end_version = self.get_latest_epoch_protocol_version()?.unwrap_or(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the end_version
can be derived from epoch_data
and passed to this function in committer
, so that we can avoid counting on the DB read here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to epoch change time, we also call persist_protocol_configs_and_feature_flags
at indexer startup time, where epoch_data
is not available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we can avoid this call at indexer startup time then it looks like we can pass the value in?
); | ||
|
||
// Gather all protocol configs and feature flags for all versions between start and end. | ||
for version in start_version..=end_version { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this for
for backfill, in other words, if indexer starts along with genesis, here each time we will always run only once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if indexer starts along with genesis, here each time we will always run only once?
Yes. If indexer is in sync, this line will be run once per epoch so each time the loop will iterate at most once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok i see, i suppose if we don't do this at indexer startup, then the first time we hit this loop we may be doing this for a while
ef8a189
to
aa76e71
Compare
61ddce9
to
53a0188
Compare
// If we already have chain identifier indexed (i.e. the first checkpoint has been indexed), | ||
// then we persist protocol configs for protocol versions not yet in the db. | ||
// Otherwise, we would do the persisting in `commit_checkpoint` while the first cp is | ||
// being indexed. | ||
if let Some(chain_id) = store.get_chain_identifier().await? { | ||
store.persist_protocol_configs_and_feature_flags(chain_id)?; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this really necessary since it'll get picked up on the committer side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The target use case for this one is for when an up-to-date db (like our backfill dbs) wants to add protocol configs tables for the first time. If we don't do it here, we'll need to wait until epoch change.
|
||
// We only store protocol configs until the version for the latest epoch, because config | ||
// values for future epochs, even when they already exist, are subject to change. | ||
let end_version = self.get_latest_epoch_protocol_version()?.unwrap_or(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we can avoid this call at indexer startup time then it looks like we can pass the value in?
|
||
// We only store protocol configs until the version for the latest epoch, because config | ||
// values for future epochs, even when they already exist, are subject to change. | ||
let end_version = self.get_latest_epoch_protocol_version()?.unwrap_or(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a comment explaining the diff between this and start_version
?
); | ||
|
||
// Gather all protocol configs and feature flags for all versions between start and end. | ||
for version in start_version..=end_version { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok i see, i suppose if we don't do this at indexer startup, then the first time we hit this loop we may be doing this for a while
if is_epoch_end { | ||
// The epoch has advanced so we update the configs for the new protocol version, if it has changed. | ||
let chain_id = state | ||
.get_chain_identifier() | ||
.await | ||
.expect("Failed to get chain identifier") | ||
.expect("Chain identifier should have been indexed at this point"); | ||
let _ = state.persist_protocol_configs_and_feature_flags(chain_id); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we do this before we persist_checkpoints
, maybe right after state.advance_epoch
under if let Some(epoch_data) = epoch
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually have to do this after persist_checkpoints
, in the event where the first epoch is committed before the first checkpoints. In that scenario, chain identifier doesn't exist yet in the db when we advance epoch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits, but looks good!
@@ -30,7 +30,9 @@ pub(crate) struct ProtocolConfigFeatureFlag { | |||
|
|||
#[derive(Clone, Debug)] | |||
pub(crate) struct ProtocolConfigs { | |||
native: NativeProtocolConfig, | |||
version: u64, | |||
configs: BTreeMap<String, Option<String>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the Option
here? (Maybe the rest of the PR will answer this question for me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Attempting to answer this for myself -- our GraphQL schema differentiates between a config that doesn't exist at any version, and a config that exists but hasn't been set, so that's why we need this Option
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close. But it's not due to GraphQL differentiating that - protocol config values themselves are Option
s, likely due to the reason you mentioned.
)) | ||
})?; | ||
|
||
let version = protocol_version.unwrap_or(latest_version as u64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code always issues the read to get the latest protocol version, even if you don't use it, you can avoid that with:
let version = if let Some(version) = protocol_version {
version
} else {
let latest: i64 = db
.execute(move |conn| { /* ... */ })
.await
.map_err(|e| { /* ... */ })?;
latest as u64
}
|
||
let version = protocol_version.unwrap_or(latest_version as u64); | ||
|
||
let configs: BTreeMap<String, Option<String>> = db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a neat way to get all this data in a single DB query? We should at least re-use the connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could implement a DataLoader for this, but that's not a blocker for this PR.
97ee85e
to
838381a
Compare
0e02187
to
b39c45c
Compare
b39c45c
to
d4b6fee
Compare
## Description This PR adds two tables `protocol_configs` and `feature_flags` that get populated at indexer startup time and every epoch change time if protocol version has changed. ## Test plan Tested locally against devnet. --- ## Release notes Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. - [ ] Protocol: - [ ] Nodes (Validators and Full nodes): - [x] Indexer: adds two new indexer tables that stores protocol configs and features flags of different versions. - [ ] JSON-RPC: - [x] GraphQL: uses the stored data to query for protocol configs instead of native configs stored in the binary. - [ ] CLI: - [ ] Rust SDK:
## Description This PR adds two tables `protocol_configs` and `feature_flags` that get populated at indexer startup time and every epoch change time if protocol version has changed. ## Test plan Tested locally against devnet. --- ## Release notes Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. - [ ] Protocol: - [ ] Nodes (Validators and Full nodes): - [x] Indexer: adds two new indexer tables that stores protocol configs and features flags of different versions. - [ ] JSON-RPC: - [x] GraphQL: uses the stored data to query for protocol configs instead of native configs stored in the binary. - [ ] CLI: - [ ] Rust SDK:
Description
This PR adds two tables
protocol_configs
andfeature_flags
that get populated at indexer startup time and every epoch change time if protocol version has changed.Test plan
Tested locally against devnet.
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.