services/horizon/ingest/filtering: Backfill low level design #4267
Comments
Is this really necessary? Can't we piggyback on the retention count cmdline flag?
Here's an idea of how we can implement backfilling:
(1) could be replaced by a txmeta archive + index once we have it. Note that (1) needs to happen even when filtering isn't enabled; otherwise, backfilling won't work if filtering is enabled later on.
@bartekn thoughts?
Yes, definitely. I updated the description to be clear about that usage.
@2opremio, @bartekn, wondering if we could remove the
Yes it would, but I am not sure we can afford that (performance-wise). I also think that a stateless design is cleaner. BTW, we may also need to think about garbage collection: if a filter is updated to no longer include certain accounts, should we care about the data left behind? In this case I think you do need state; otherwise you won't know what to garbage-collect if Horizon is restarted.
Actually, it doesn't need to be enabled when filtering is disabled, because then everything is reingested and thus there is nothing to backfill.
@jcx120, I think we discussed this 'stale' or 'garbage' state as part of the filter lifecycle earlier, but I can't find the reference. Can you confirm the product expectation for when a filter rule is changed in a way that reduces its scope? Any already-ingested history data that no longer fits within the rule becomes 'stale' but is still left in the history db. Is that acceptable, or does the 'stale' data need to be purged?
@sreuland Purging of 'garbage' data (data left behind due to a change in filter rules) can, for the current phase, be treated as low priority / nice to have (something we can come back to after all top-priority features are implemented). The assumption is that for the majority of users who apply filters, the dataset will be sufficiently small that doing a clean re-ingest from scratch would be fast and cheap.
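To make the 'stale' set discussed above concrete, here is a minimal Go sketch, assuming an account-based filter whose previous rule was persisted somewhere (the state that the garbage-collection comment says is needed after a restart). The function name `staleAccounts` and the placeholder account IDs are hypothetical and not part of Horizon.

```go
package main

import (
	"fmt"
	"sort"
)

// staleAccounts returns the accounts that were covered by the previous
// filter rule but are no longer covered by the new one. History rows for
// these accounts are the "garbage" left behind when a rule's scope shrinks.
// Hypothetical helper for illustration only; not a Horizon API.
func staleAccounts(oldRule, newRule []string) []string {
	keep := make(map[string]struct{}, len(newRule))
	for _, a := range newRule {
		keep[a] = struct{}{}
	}

	seen := make(map[string]struct{})
	var stale []string
	for _, a := range oldRule {
		if _, ok := keep[a]; ok {
			continue // still in scope, nothing to purge
		}
		if _, dup := seen[a]; dup {
			continue // ignore duplicates in the old rule
		}
		seen[a] = struct{}{}
		stale = append(stale, a)
	}
	sort.Strings(stale) // deterministic order for logging or purging
	return stale
}

func main() {
	// Placeholder account IDs, not real Stellar addresses.
	oldRule := []string{"GCCC", "GAAA", "GBBB"}
	newRule := []string{"GBBB"}
	fmt.Println(staleAccounts(oldRule, newRule))
}
```

Without the persisted `oldRule`, a restarted process only sees `newRule` and cannot compute this difference, which is why purging implies some stored state.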
This will be a very similar structure to the indexes we've been working on with @paulbellamy: they determine in which checkpoints a given account was active (also with respect to payment/all operations and successful/all txs). So this is less granular than what you propose (ledger vs checkpoint), but from the Captive Core perspective, if you want to ingest a specific ledger you need to catch up to the last checkpoint before it and then apply ledgers, so it should take the same amount of time to get to the ledger. So when (or if ever) these indexes are ready in production (they are implemented but not benchmarked, tested, or checked for correctness), you won't need extra tables and the process will be simpler.
This is good, in the sense that we can share the solution. However, I also think this may be a showstopper for the approach I proposed, since it may make backfilling too slow (due to the slowness of Captive Core when jumping around). So, we may need to wait for the implementation of a txmeta archive before backfilling can be sufficiently performant.
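The checkpoint granularity mentioned in this exchange can be illustrated with a small Go sketch. It assumes the default history-archive checkpoint frequency of 64 ledgers, where a ledger is a checkpoint when `(seq+1) % 64 == 0`. The helper names are hypothetical and are not taken from Horizon or Captive Core.

```go
package main

import "fmt"

// checkpointFrequency is assumed to be the default of 64 ledgers.
const checkpointFrequency = 64

// isCheckpoint reports whether seq is a checkpoint ledger (63, 127, 191, ...).
func isCheckpoint(seq uint32) bool {
	return (seq+1)%checkpointFrequency == 0
}

// prevCheckpoint returns the last checkpoint ledger strictly before seq.
// The boolean is false when no checkpoint precedes seq.
func prevCheckpoint(seq uint32) (uint32, bool) {
	if seq < checkpointFrequency {
		return 0, false // the first checkpoint is ledger 63
	}
	return (seq/checkpointFrequency)*checkpointFrequency - 1, true
}

// ledgersToApply returns how many ledgers must be applied after catching up
// to the preceding checkpoint in order to reach seq.
func ledgersToApply(seq uint32) (uint32, bool) {
	cp, ok := prevCheckpoint(seq)
	if !ok {
		return 0, false
	}
	return seq - cp, true
}

func main() {
	for _, seq := range []uint32{64, 100, 127, 128} {
		cp, _ := prevCheckpoint(seq)
		n, _ := ledgersToApply(seq)
		fmt.Printf("ledger %d: catch up to checkpoint %d, then apply %d ledger(s)\n", seq, cp, n)
	}
}
```

Under these assumptions, backfilling scattered ledgers costs one catch-up per distinct checkpoint touched, which is the "jumping around" cost raised above.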
What problem does your feature solve?
Design proposal for backfill behavior of filters.
What would you like to see?
A design that defines user experience with filter rule changes and backfill behaviors in horizon.
Low level design writeup for filtered data range behavior.
Acceptance Criteria:
--history-retention-count
Pre-requisites:
--history-retention-count
for backfill range.