
Sync'ing a large chain #26

Closed
abourget opened this issue May 14, 2020 · 4 comments

Comments

@abourget
Contributor

abourget commented May 14, 2020

See https://docs.dfuse.io/eosio/admin-guide/large-chains-preparation/

@shaqk
Contributor

shaqk commented May 29, 2020

If there is a hole in the merged-blocks, it's not reported as an error or warning, and it looks like nothing is working anymore: the indexing apps are waiting for a 100-blocks file that does not exist (and never will). Doing a tail dfuse-data/dfuse.log.json shows that the bstream filesource is waiting for a bundle that does not exist. We should probably think about some way to improve that: after a while, when the file is not found, could we perform a file listing to look for any "future" files and, as such, detect the hole here?

A good improvement would be to auto-detect holes (based on "future" files) in the merged-blocks directory, either after encountering a filesource error in phase 2 (after a few retries) and/or at the start of phase 2.

Once the missing merged-blocks ranges are identified, dfuseeos can stop and re-launch phase 1 with the appropriate flags to create the missing merged-blocks (might need sequential/parallel runs if more than one range is missing).

Phase 2 should kick off again after missing merged-block ranges are created.
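The hole detection suggested above could be sketched roughly like this. This is a minimal illustration, assuming bundle files are named after the zero-padded base block number of each 100-block file; findHoles and its exact behavior are assumptions for this sketch, not the actual dfuseeos code:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// findHoles returns the base block numbers of missing 100-block bundles,
// given the bundle filenames present in the merged-blocks directory.
// Filenames are assumed to be zero-padded base block numbers (e.g. "0000099900").
func findHoles(bundleNames []string) []uint64 {
	bases := make([]uint64, 0, len(bundleNames))
	for _, name := range bundleNames {
		n, err := strconv.ParseUint(name, 10, 64)
		if err != nil {
			continue // skip files that are not bundles
		}
		bases = append(bases, n)
	}
	sort.Slice(bases, func(i, j int) bool { return bases[i] < bases[j] })

	var holes []uint64
	for i := 1; i < len(bases); i++ {
		// Any gap larger than one bundle (100 blocks) is a hole; the
		// presence of a "future" file after the gap is what reveals it.
		for missing := bases[i-1] + 100; missing < bases[i]; missing += 100 {
			holes = append(holes, missing)
		}
	}
	return holes
}

func main() {
	// Bundle 0000000200 is missing between 100 and 300.
	names := []string{"0000000000", "0000000100", "0000000300"}
	fmt.Println(findHoles(names)) // [200]
}
```

The key point is that a hole is only detectable relative to a "future" file: a filesource waiting on the next bundle cannot distinguish "not produced yet" from "skipped", but a directory listing can.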

@abourget
Contributor Author

The phase producing merged-blocks could simply try to be more resilient and have retries (but that could get hairy: starting, say, 100,000 blocks earlier because of a failed merge, in a loop, if there's an issue).

I think consumers (FileSources) shouldn't bother with warnings, because they are powerless to fix the issue, and those warnings would be all over the place.

Currently, dfuseeos is not aware of phases, they are operator concepts.

The thing is, in theory, there should be no holes in merged blocks once a pass has completed. Otherwise, there's either a bug that needs to be fixed (in mindreader with --...-store-directly, or in the merger), or an operational error, like forgetting to run a certain block range.

@sduchesneau
Contributor

@abourget @shaqk when phase 1 is running with "mindreader-merge-and-store-directly=true", we usually do this in parallel with a bunch of VMs, and we discard those afterwards.

But for a user who does this on a smaller scale, stopping mindreader after that phase will create a hole between the last "directly-merged" file (ex: 99,900->99,999) and the next few blocks that passed during phase 1 (block 100,000 and following are never written anywhere with 'merge-and-store-directly').

I'm working on a "cleanup()" call on the Archiver interface, where a reproc-archiver could dump its buffer to the DefaultArchiver, creating one-block files for those few blocks close to the head. This way, restarting mindreader in "normal mode" afterwards would not create a hole, as the merger could take over at that point.
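A rough sketch of what such a cleanup call could look like. All names here (Archiver, Cleanup, reprocArchiver, the filename format) are hypothetical stand-ins for illustration, not the actual dfuseeos interfaces:

```go
package main

import "fmt"

type block struct{ num uint64 }

// Archiver stores incoming blocks; Cleanup flushes anything still buffered
// before shutdown. (Hypothetical interface, for illustration only.)
type Archiver interface {
	StoreBlock(b block) error
	Cleanup() error
}

// reprocArchiver merges blocks into 100-block bundles as they arrive; blocks
// that do not yet complete a bundle sit in a buffer. Without a final flush,
// those buffered blocks are lost when the process stops, leaving a hole.
type reprocArchiver struct {
	buffer        []block
	oneBlockFiles []string // stands in for files handed to a default archiver
}

func (a *reprocArchiver) StoreBlock(b block) error {
	a.buffer = append(a.buffer, b)
	return nil
}

// Cleanup dumps the remaining buffered blocks as one-block files, so a
// merger can take over from that point and no hole is created.
func (a *reprocArchiver) Cleanup() error {
	for _, b := range a.buffer {
		a.oneBlockFiles = append(a.oneBlockFiles, fmt.Sprintf("%010d.dat", b.num))
	}
	a.buffer = nil
	return nil
}

var _ Archiver = (*reprocArchiver)(nil)

func main() {
	a := &reprocArchiver{}
	a.StoreBlock(block{100000})
	a.StoreBlock(block{100001})
	a.Cleanup()
	fmt.Println(a.oneBlockFiles) // [0000100000.dat 0000100001.dat]
}
```

The design point is that the hole-avoidance responsibility lives in the archiver's shutdown path rather than in the consumers, consistent with the earlier comment that FileSources are powerless to fix holes.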

@abourget
Contributor Author
