Enable DynamicHoneyBadgers to rejoin after connection loss #366
I need to think about the problem more. The part that I understood could be done at the networking layer, as I wrote in #365.
Did you try out whether this change would actually work, even after some nodes joined and left? My concern is this: |
Regarding the tests, our plans in the long term are to replace |
Oh, I didn't notice that. I always thought of them as two different things (era for key sets, epoch for rounds in an era). Actually, my use case stays at era 0 forever, so I never tried to get the current era 😄 I just wanted to use the queuing HB. Since that's probably not the standard use case, there should be two "getters" imo: one for the era and one for the epoch. If that's too much of a change for you, I'm OK with just forking and maintaining that diff myself, and maybe trying to build my own queuing HB on top of HB instead of dynamic HB later.
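To make the "two getters" idea concrete, here is a minimal sketch. `ConsensusState`, its fields, and its methods are hypothetical stand-ins and not the hbbft types; it also assumes that the era simply counts key-set changes and that the epoch restarts at 0 in each new era, which may not match how the library numbers them.

```rust
/// Hypothetical stand-in for a dynamic consensus instance (not an hbbft type).
#[derive(Debug, Default)]
pub struct ConsensusState {
    /// Assumption: counts how many times the key set has changed.
    era: u64,
    /// Assumption: rounds completed since the current era began.
    epoch: u64,
}

impl ConsensusState {
    /// Getter for the era (which key set is active).
    pub fn era(&self) -> u64 {
        self.era
    }

    /// Getter for the epoch (round within the current era).
    pub fn epoch(&self) -> u64 {
        self.epoch
    }

    /// Marks one round as finished within the current era.
    pub fn finish_round(&mut self) {
        self.epoch += 1;
    }

    /// Switches to a new key set: next era, epoch counter restarts.
    pub fn change_key_set(&mut self) {
        self.era += 1;
        self.epoch = 0;
    }
}
```

With separate getters, a caller that never leaves era 0 can still read the round counter directly, without decoding a combined value.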
The PR is good to be merged. Let's however make it self-contained by not committing to a specific design in a comment.
I can't see at the moment how manually setting the epoch on construction could result in an invalid state (you may end up with a confused honey badger, but not an invalid one). So I guess this is okay? When rejoining after a connection loss, would it not be prudent to keep the old algorithm instance around? After all, it's built to be asynchronous; as @vkomenda said, it sounds like a networking problem (the honey badger would ideally sort itself out). Crashing the program is a different matter, however, so there's definitely merit in recovering from that by storing state and picking up where one left off. I'll leave it up to @afck and @vkomenda to judge if these changes make sense; purely from an implementation standpoint, they look good to me =).
@sgeisler: We recently (just now!) changed the API regarding the passing of random number generators. Those lines are in close proximity to your changes, so you will have to rebase. Sorry. Also, the |
Implementing an epoch setter for the `DynamicHoneyBadgerBuilder` enables the creation of a `DynamicHoneyBadger` that will join the consensus at a given epoch.
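As a rough illustration of what an epoch setter on a builder looks like, here is a self-contained sketch. `Node` and `NodeBuilder` are hypothetical names and do not reproduce the real `DynamicHoneyBadgerBuilder` signatures; the sketch only shows the pattern of defaulting to epoch 0 and letting a late joiner override it.

```rust
/// Hypothetical consensus participant; stands in for the built instance.
#[derive(Debug, PartialEq)]
pub struct Node {
    epoch: u64,
}

impl Node {
    /// The epoch this node currently considers active.
    pub fn epoch(&self) -> u64 {
        self.epoch
    }
}

/// Hypothetical builder; the default start epoch is 0.
#[derive(Debug, Default)]
pub struct NodeBuilder {
    epoch: u64,
}

impl NodeBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Sets the epoch at which the built node joins the consensus.
    pub fn epoch(mut self, epoch: u64) -> Self {
        self.epoch = epoch;
        self
    }

    /// Consumes the builder and creates the node.
    pub fn build(self) -> Node {
        Node { epoch: self.epoch }
    }
}
```

A fresh network would call `NodeBuilder::new().build()`, while a node that rejoins later would chain `.epoch(n)` with the epoch it learned from its peers before calling `build()`.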
@mbr: Yes, in case of a short-term connection loss, the Honey Badger should just keep going! The application logic must make sure that all messages are delivered after reconnecting. I'm fine with merging this, if @vkomenda is.
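One way the application layer could meet that delivery requirement is to queue outgoing messages while the link is down and replay them in order on reconnect. The sketch below is only an assumption about how such a layer might look: `Outbox` is a hypothetical type, and it ignores messages that were already in flight when the connection dropped, which a real implementation would need to handle with acknowledgements.

```rust
use std::collections::VecDeque;

/// Hypothetical per-peer outbox that holds messages while disconnected.
pub struct Outbox<M> {
    connected: bool,
    pending: VecDeque<M>,
}

impl<M> Outbox<M> {
    /// Creates an outbox that starts in the connected state.
    pub fn new() -> Self {
        Outbox {
            connected: true,
            pending: VecDeque::new(),
        }
    }

    /// Returns the message if it can be sent now; otherwise queues it.
    pub fn send(&mut self, msg: M) -> Option<M> {
        if self.connected {
            Some(msg)
        } else {
            self.pending.push_back(msg);
            None
        }
    }

    /// Marks the link as down; later messages are queued.
    pub fn disconnect(&mut self) {
        self.connected = false;
    }

    /// Marks the link as up and returns the backlog in original order.
    pub fn reconnect(&mut self) -> Vec<M> {
        self.connected = true;
        self.pending.drain(..).collect()
    }
}
```

Because the algorithm instance keeps running during the outage, the backlog returned by `reconnect` contains exactly the messages it produced in the meantime, in the order it produced them.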
Good to be merged.
Implementing an epoch setter for the `DynamicHoneyBadgerBuilder` enables the creation of a `DynamicHoneyBadger` that will join the consensus at a given epoch (fixes #365).

I'm not yet familiar enough with your code base to know what testing would be expected for that change and where to put the tests. You seem to mainly rely on integration tests, so I guess I could patch `TestNetwork` to support late joiners and then add a test with one?

TODO: