Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Max retries support for existing restart strategies #157

Merged
merged 10 commits into from
Jan 22, 2020
Merged

Max retries support for existing restart strategies #157

merged 10 commits into from
Jan 22, 2020

Conversation

Relrin
Copy link
Member

@Relrin Relrin commented Jan 21, 2020

The following pull request contains the changes for supporting max retries for the failed actors. It covers the #105 and the #146 issues for the bastion project.

By default the restart strategy doesn't have any limits in restating the failed actor, but can be overridden via the special call (which is the .with_max_retries).

Checklist
  • tests are passing with cargo test.
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message is clear
  • added an example of the usage

@o0Ignition0o
Copy link
Contributor

Thanks for your pull request!

It looks great overall, And I'm really pleased to see the new example and the docs update.
I won't be able to look at it before tomorrow, but maybe @vertexclique will :)

@Relrin
Copy link
Member Author

Relrin commented Jan 21, 2020

This PR is still in progress and restricted by the following issue:
The following piece of code contains the separate logic for restoring the single actor and the range. I'd like to use the single call for restoring a failed actor. Any ideas how to reorganize signature of the async fn restart(&mut self, range: RangeFrom<usize>) method, so that I could write as in the following snippet in async fn recover(&mut self, id: BastionId) method:

match self.strategy {
    SupervisionStrategy::OneForOne => {
        // expected call 
        self.restart(id..id+1).await; 
    }
    SupervisionStrategy::OneForAll => {
        self.restart(0..).await;


        self.stopped.shrink_to_fit();
        self.killed.shrink_to_fit();
    }
    SupervisionStrategy::RestForOne => {
        let (start, _, _) = self.launched.get(&id).ok_or(())?;
        let start = *start;

        self.restart(start..).await;
    }
}

@o0Ignition0o
Copy link
Contributor

o0Ignition0o commented Jan 21, 2020

That’s a pretty tricky question! Off the top of my head the function could take an into iterator over bastionId so we could invoke it with std::iter::once(id) or with a range, but I’m not quite sure it’s worth the hassle.

(Sorry I’m not at the computer right now so I can’t develop much more on that -pun intended-)

Edit https://doc.rust-lang.org/std/iter/index.html#implementing-iterator would allow us to use a bastionId or a range seamlessly. Not sure if worth exploring though :)

@vertexclique
Copy link
Member

One more idea would be:

  • Seek in the range
  • Use splice to extract the chunk that contains our one and the only.
  • Yield to your method with dereffed chunk and use 0 on that (or directly use first).

Will take a look @ home again.

@Relrin
Copy link
Member Author

Relrin commented Jan 21, 2020

One more idea would be:

  • Seek in the range
  • Use splice to extract the chunk that contains our one and the only.
  • Yield to your method with dereffed chunk and use 0 on that (or directly use first).

I'm also thinking about the following solution:

async fn recover(&mut self, id: BastionId) -> Result<(), ()> {
        debug!(
            "Supervisor({}): Recovering using strategy: {:?}",
            self.id(),
            self.strategy
        );
        match self.strategy {
            SupervisionStrategy::OneForOne => {
                let (index, _, _) = self.launched.remove(&id).ok_or(())?;
                let value = self.order.remove(index);
                self.order.push(value);
                let start = self.order.len();
                self.restart(start..).await;
            }
            SupervisionStrategy::OneForAll => {
                self.restart(0..).await;

                // TODO: should be empty
                self.stopped.shrink_to_fit();
                self.killed.shrink_to_fit();
            }
            SupervisionStrategy::RestForOne => {
                let (start, _, _) = self.launched.get(&id).ok_or(())?;
                let start = *start;

                self.restart(start..).await;
            }
        }

        Ok(())
    }

It will work, however, I'm not sure that will be correct because we're doing pop from the some place in the vector tracked by the supervisor and then pushing in the end. Is it ok for the SupervisionStrategy::RestForOne strategy? I guess for covering this case, we will need to update all the rest of the indexes (by the minus one) for each value added after this particular index.

Another approach which is more or less reliable is to replace the RangeFrom<T> type onto the std::ops::Range<T> type. And for any of restart policies we actually know what to update (e.g. (start..self.order.len()) for the OneForAll or the RestForOne strategies). The only one disadvantage of this approach is having potential breaking changes in existing private API.

@Relrin
Copy link
Member Author

Relrin commented Jan 22, 2020

I have decided to change the signature of the restart/kill/stop methods, so that it will use the Range<T> instead of the RangeFrom<T> type. After these changes, everything works as expected. :)

Copy link
Member

@vertexclique vertexclique left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good, just I have suggested couple of small nits may be useful for the user point of view.

}

/// Sets the limit of attempts for restoring failed actors.
pub fn with_max_restarts(mut self, max_restarts: Option<usize>) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have surface API improvement:

  • Let's make this isize
  • isize = !0 means that unlimited restarts.
  • isize > 0 means that use given restart count.
  • isize = 0 means that don't restart.

What do you think @o0Ignition0o and @Relrin ?

Copy link
Member Author

@Relrin Relrin Jan 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it worth to reorganize it into enum (for better checks in match expressions), like this:

pub enum MaxRestartsPolicy {
    Unlimited,
    Limit(usize)
    DontRestart,
}

It looks better and easier to understand intentions of other devs from code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An enum sounds great
Maybe this naming would convey the meaning more accurately:

pub enum RestartPolicy {
    Always,
    Never,
    Tries(usize)
}

pub fn with_restarts(mut self, restarts: RestartPolicy) -> Self {}

What do you think ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. I will refactor code with this enum

use std::time::Duration;

use bastion::prelude::*;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We mostly write a prelude intro to our examples to explain where an example can be used for and what's the intent. I would like to have that. You might want to take a gaze to this one:
e.g. https://github.com/bastion-rs/bastion/blob/master/bastion/examples/parallel_computation.rs#L4

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// # timeout: Duration::from_secs(5)
/// # };
/// # let restart_strategy = RestartStrategy::default()
/// # .with_actor_restart_strategy(actor_restart_strategy);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just tests the initiation, that's nice but we need an actual test for this case in tests/ I think:
https://github.com/bastion-rs/bastion/tree/master/bastion/tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@vertexclique vertexclique left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you and welcome to our max retries implementation! 🎉

@vertexclique vertexclique merged commit 14c51d6 into bastion-rs:master Jan 22, 2020
@Relrin Relrin deleted the feature-max-retries branch January 22, 2020 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants