-
-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Max retries support for existing restart strategies #157
Conversation
Thanks for your pull request! It looks great overall, And I'm really pleased to see the new example and the docs update. |
This PR is still in progress and restricted by the following issue: match self.strategy {
SupervisionStrategy::OneForOne => {
// expected call
self.restart(id..id+1).await;
}
SupervisionStrategy::OneForAll => {
self.restart(0..).await;
self.stopped.shrink_to_fit();
self.killed.shrink_to_fit();
}
SupervisionStrategy::RestForOne => {
let (start, _, _) = self.launched.get(&id).ok_or(())?;
let start = *start;
self.restart(start..).await;
}
} |
That’s a pretty tricky question! Off the top of my head the function could take an into iterator over bastionId so we could invoke it with std::iter::once(id) or with a range, but I’m not quite sure it’s worth the hassle. (Sorry I’m not at the computer right now so I can’t develop much more on that -pun intended-) Edit https://doc.rust-lang.org/std/iter/index.html#implementing-iterator would allow us to use a bastionId or a range seamlessly. Not sure if worth exploring though :) |
One more idea would be:
Will take a look @ home again. |
I'm also thinking about the following solution: async fn recover(&mut self, id: BastionId) -> Result<(), ()> {
debug!(
"Supervisor({}): Recovering using strategy: {:?}",
self.id(),
self.strategy
);
match self.strategy {
SupervisionStrategy::OneForOne => {
let (index, _, _) = self.launched.remove(&id).ok_or(())?;
let value = self.order.remove(index);
self.order.push(value);
let start = self.order.len();
self.restart(start..).await;
}
SupervisionStrategy::OneForAll => {
self.restart(0..).await;
// TODO: should be empty
self.stopped.shrink_to_fit();
self.killed.shrink_to_fit();
}
SupervisionStrategy::RestForOne => {
let (start, _, _) = self.launched.get(&id).ok_or(())?;
let start = *start;
self.restart(start..).await;
}
}
Ok(())
} It will work, however, I'm not sure that will be correct because we're doing pop from the some place in the vector tracked by the supervisor and then pushing in the end. Is it ok for the Another approach which is more or less reliable is to replace the |
I have decided to change the signature of the restart/kill/stop methods, so that it will use the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good, just I have suggested couple of small nits may be useful for the user point of view.
bastion/src/supervisor.rs
Outdated
} | ||
|
||
/// Sets the limit of attempts for restoring failed actors. | ||
pub fn with_max_restarts(mut self, max_restarts: Option<usize>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have surface API improvement:
- Let's make this isize
- isize = !0 means that unlimited restarts.
- isize > 0 means that use given restart count.
- isize = 0 means that don't restart.
What do you think @o0Ignition0o and @Relrin ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it worth to reorganize it into enum (for better checks in match expressions), like this:
pub enum MaxRestartsPolicy {
Unlimited,
Limit(usize)
DontRestart,
}
It looks better and easier to understand intentions of other devs from code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An enum sounds great
Maybe this naming would convey the meaning more accurately:
pub enum RestartPolicy {
Always,
Never,
Tries(usize)
}
pub fn with_restarts(mut self, restarts: RestartPolicy) -> Self {}
What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. I will refactor code with this enum
use std::time::Duration; | ||
|
||
use bastion::prelude::*; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We mostly write a prelude intro to our examples to explain where an example can be used for and what's the intent. I would like to have that. You might want to take a gaze to this one:
e.g. https://github.com/bastion-rs/bastion/blob/master/bastion/examples/parallel_computation.rs#L4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/// # timeout: Duration::from_secs(5) | ||
/// # }; | ||
/// # let restart_strategy = RestartStrategy::default() | ||
/// # .with_actor_restart_strategy(actor_restart_strategy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just tests the initiation, that's nice but we need an actual test for this case in tests/
I think:
https://github.com/bastion-rs/bastion/tree/master/bastion/tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you and welcome to our max retries implementation! 🎉
The following pull request contains the changes for supporting max retries for the failed actors. It covers the #105 and the #146 issues for the bastion project.
By default the restart strategy doesn't have any limits in restating the failed actor, but can be overridden via the special call (which is the
.with_max_retries
).Checklist
cargo test
.