
Sync'ing a large chain #26

Closed
abourget opened this issue May 14, 2020 · 4 comments

Comments

@abourget
Contributor

abourget commented May 14, 2020

See https://docs.dfuse.io/eosio/admin-guide/large-chains-preparation/

@shaqk
Contributor

shaqk commented May 29, 2020

If there is a hole in the merged-blocks, it's not reported as an error or warning, and it looks like nothing is working anymore: the indexing apps are waiting for a 100-blocks file that does not exist (and never will). Doing a tail dfuse-data/dfuse.log.json shows that the bstream filesource is waiting for a bundle that does not exist. We should probably think about some way to improve that: after a while, when the file is not found, could we perform a file listing to look for any "future" files and, as such, detect the hole here?

A good improvement would be to auto-detect holes (based on "future" files) in the merged-blocks directory, either after encountering a filesource error in phase 2 (after a few retries) and/or at the start of phase 2.

Once the missing merged-blocks ranges are identified, dfuseeos can stop and re-launch phase 1 with the appropriate flags to create the missing merged-blocks (might need sequential/parallel runs if more than one range is missing).

Phase 2 should kick off again after missing merged-block ranges are created.
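The hole detection suggested above could be sketched roughly like this. This is a minimal illustration, assuming bundle files are named after the zero-padded base block number of each 100-block file; findHoles and its exact behavior are assumptions for this sketch, not the actual dfuseeos code:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// findHoles returns the base block numbers of missing 100-block bundles,
// given the bundle filenames present in the merged-blocks directory.
// Filenames are assumed to be zero-padded base block numbers (e.g. "0000099900").
func findHoles(bundleNames []string) []uint64 {
	bases := make([]uint64, 0, len(bundleNames))
	for _, name := range bundleNames {
		n, err := strconv.ParseUint(name, 10, 64)
		if err != nil {
			continue // skip files that are not bundles
		}
		bases = append(bases, n)
	}
	sort.Slice(bases, func(i, j int) bool { return bases[i] < bases[j] })

	var holes []uint64
	for i := 1; i < len(bases); i++ {
		// Any gap larger than one bundle (100 blocks) is a hole; the
		// presence of a "future" file after the gap is what reveals it.
		for missing := bases[i-1] + 100; missing < bases[i]; missing += 100 {
			holes = append(holes, missing)
		}
	}
	return holes
}

func main() {
	// Bundle 0000000200 is missing between 100 and 300.
	names := []string{"0000000000", "0000000100", "0000000300"}
	fmt.Println(findHoles(names)) // [200]
}
```

The key point is that a hole is only detectable relative to a "future" file: a filesource waiting on the next bundle cannot distinguish "not produced yet" from "skipped", but a directory listing can.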

@abourget
Contributor Author

The phase producing merged-blocks could simply try to be more resilient and have retries (but that could get hairy: starting, say, 100,000 blocks earlier because of a failed merge, in a loop, if there's an issue).

I think consumers (FileSources) shouldn't bother with warnings, because they are powerless to fix the issue, and those warnings would be all over the place.

Currently, dfuseeos is not aware of phases, they are operator concepts.

The thing is, in theory, there should be no holes in merged blocks once a pass has completed. Otherwise, there's either a bug that needs to be fixed (in mindreader with --...-store-directly, or in the merger), or an operational error, like forgetting to run a certain block range.

@sduchesneau
Contributor

@abourget @shaqk when phase 1 is running with "mindreader-merge-and-store-directly=true", we usually do this in parallel with a bunch of VMs, and we discard those afterwards.

But for a user who does this on a smaller scale, stopping mindreader after that phase will create a hole between the last "directly-merged" file (ex: 99,900->99,999) and the next few blocks that passed during phase 1 (block 100,000 and following are never written anywhere with 'merge-and-store-directly').

I'm working on a "cleanup()" call on the Archiver interface, where a reproc-archiver could dump its buffer to the DefaultArchiver, creating one-block files for those few blocks close to the head. This way, restarting mindreader in "normal mode" afterwards would not create a hole, as the merger could take over at that point.
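A rough sketch of what such a cleanup call could look like. All names here (Archiver, Cleanup, reprocArchiver, the filename format) are hypothetical stand-ins for illustration, not the actual dfuseeos interfaces:

```go
package main

import "fmt"

type block struct{ num uint64 }

// Archiver stores incoming blocks; Cleanup flushes anything still buffered
// before shutdown. (Hypothetical interface, for illustration only.)
type Archiver interface {
	StoreBlock(b block) error
	Cleanup() error
}

// reprocArchiver merges blocks into 100-block bundles as they arrive; blocks
// that do not yet complete a bundle sit in a buffer. Without a final flush,
// those buffered blocks are lost when the process stops, leaving a hole.
type reprocArchiver struct {
	buffer        []block
	oneBlockFiles []string // stands in for files handed to a default archiver
}

func (a *reprocArchiver) StoreBlock(b block) error {
	a.buffer = append(a.buffer, b)
	return nil
}

// Cleanup dumps the remaining buffered blocks as one-block files, so a
// merger can take over from that point and no hole is created.
func (a *reprocArchiver) Cleanup() error {
	for _, b := range a.buffer {
		a.oneBlockFiles = append(a.oneBlockFiles, fmt.Sprintf("%010d.dat", b.num))
	}
	a.buffer = nil
	return nil
}

var _ Archiver = (*reprocArchiver)(nil)

func main() {
	a := &reprocArchiver{}
	a.StoreBlock(block{100000})
	a.StoreBlock(block{100001})
	a.Cleanup()
	fmt.Println(a.oneBlockFiles) // [0000100000.dat 0000100001.dat]
}
```

The design point is that the hole-avoidance responsibility lives in the archiver's shutdown path rather than in the consumers, consistent with the earlier comment that FileSources are powerless to fix holes.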

@abourget
Contributor Author
