Max retries support for existing restart strategies #157

Relrin · 2020-01-21T16:47:30Z

The following pull request contains the changes for supporting max retries for the failed actors. It covers the #105 and the #146 issues for the bastion project.

By default the restart strategy doesn't have any limits in restating the failed actor, but can be overridden via the special call (which is the .with_max_retries).

Checklist

tests are passing with cargo test.
tests and/or benchmarks are included
documentation is changed or added
commit message is clear
added an example of the usage

o0Ignition0o · 2020-01-21T16:51:30Z

Thanks for your pull request!

It looks great overall, And I'm really pleased to see the new example and the docs update.
I won't be able to look at it before tomorrow, but maybe @vertexclique will :)

Relrin · 2020-01-21T16:53:00Z

This PR is still in progress and restricted by the following issue:
The following piece of code contains the separate logic for restoring the single actor and the range. I'd like to use the single call for restoring a failed actor. Any ideas how to reorganize signature of the async fn restart(&mut self, range: RangeFrom<usize>) method, so that I could write as in the following snippet in async fn recover(&mut self, id: BastionId) method:

match self.strategy {
    SupervisionStrategy::OneForOne => {
        // expected call 
        self.restart(id..id+1).await; 
    }
    SupervisionStrategy::OneForAll => {
        self.restart(0..).await;


        self.stopped.shrink_to_fit();
        self.killed.shrink_to_fit();
    }
    SupervisionStrategy::RestForOne => {
        let (start, _, _) = self.launched.get(&id).ok_or(())?;
        let start = *start;

        self.restart(start..).await;
    }
}

o0Ignition0o · 2020-01-21T17:22:40Z

That’s a pretty tricky question! Off the top of my head the function could take an into iterator over bastionId so we could invoke it with std::iter::once(id) or with a range, but I’m not quite sure it’s worth the hassle.

(Sorry I’m not at the computer right now so I can’t develop much more on that -pun intended-)

Edit https://doc.rust-lang.org/std/iter/index.html#implementing-iterator would allow us to use a bastionId or a range seamlessly. Not sure if worth exploring though :)

vertexclique · 2020-01-21T18:19:15Z

One more idea would be:

Seek in the range
Use splice to extract the chunk that contains our one and the only.
Yield to your method with dereffed chunk and use 0 on that (or directly use first).

Will take a look @ home again.

Relrin · 2020-01-21T19:21:05Z

One more idea would be:

Seek in the range

Use splice to extract the chunk that contains our one and the only.

Yield to your method with dereffed chunk and use 0 on that (or directly use first).

I'm also thinking about the following solution:

async fn recover(&mut self, id: BastionId) -> Result<(), ()> {
        debug!(
            "Supervisor({}): Recovering using strategy: {:?}",
            self.id(),
            self.strategy
        );
        match self.strategy {
            SupervisionStrategy::OneForOne => {
                let (index, _, _) = self.launched.remove(&id).ok_or(())?;
                let value = self.order.remove(index);
                self.order.push(value);
                let start = self.order.len();
                self.restart(start..).await;
            }
            SupervisionStrategy::OneForAll => {
                self.restart(0..).await;

                // TODO: should be empty
                self.stopped.shrink_to_fit();
                self.killed.shrink_to_fit();
            }
            SupervisionStrategy::RestForOne => {
                let (start, _, _) = self.launched.get(&id).ok_or(())?;
                let start = *start;

                self.restart(start..).await;
            }
        }

        Ok(())
    }

It will work, however, I'm not sure that will be correct because we're doing pop from the some place in the vector tracked by the supervisor and then pushing in the end. Is it ok for the SupervisionStrategy::RestForOne strategy? I guess for covering this case, we will need to update all the rest of the indexes (by the minus one) for each value added after this particular index.

Another approach which is more or less reliable is to replace the RangeFrom<T> type onto the std::ops::Range<T> type. And for any of restart policies we actually know what to update (e.g. (start..self.order.len()) for the OneForAll or the RestForOne strategies). The only one disadvantage of this approach is having potential breaking changes in existing private API.

Relrin · 2020-01-22T08:14:59Z

I have decided to change the signature of the restart/kill/stop methods, so that it will use the Range<T> instead of the RangeFrom<T> type. After these changes, everything works as expected. :)

vertexclique

Overall this looks good, just I have suggested couple of small nits may be useful for the user point of view.

vertexclique · 2020-01-22T12:08:15Z

bastion/src/supervisor.rs

+    }
+
+    /// Sets the limit of attempts for restoring failed actors.
+    pub fn with_max_restarts(mut self, max_restarts: Option<usize>) -> Self {


I have surface API improvement:

Let's make this isize

isize = !0 means that unlimited restarts.

isize > 0 means that use given restart count.

isize = 0 means that don't restart.

What do you think @o0Ignition0o and @Relrin ?

Perhaps it worth to reorganize it into enum (for better checks in match expressions), like this:

pub enum MaxRestartsPolicy { Unlimited, Limit(usize) DontRestart, }

It looks better and easier to understand intentions of other devs from code.

An enum sounds great
Maybe this naming would convey the meaning more accurately:

pub enum RestartPolicy { Always, Never, Tries(usize) } pub fn with_restarts(mut self, restarts: RestartPolicy) -> Self {}

What do you think ?

Sounds good to me. I will refactor code with this enum

vertexclique · 2020-01-22T12:10:48Z

bastion/examples/restart_strategy.rs

+use std::time::Duration;
+
+use bastion::prelude::*;
+


We mostly write a prelude intro to our examples to explain where an example can be used for and what's the intent. I would like to have that. You might want to take a gaze to this one:
e.g. https://github.com/bastion-rs/bastion/blob/master/bastion/examples/parallel_computation.rs#L4

vertexclique · 2020-01-22T12:12:10Z

bastion/src/supervisor.rs

+    /// #         timeout: Duration::from_secs(5)
+    /// #     };
+    /// #     let restart_strategy = RestartStrategy::default()
+    /// #        .with_actor_restart_strategy(actor_restart_strategy);


This just tests the initiation, that's nice but we need an actual test for this case in tests/ I think:
https://github.com/bastion-rs/bastion/tree/master/bastion/tests

vertexclique

Thank you and welcome to our max retries implementation! 🎉

Implemented max retries for restoring actors

d2b3d5a

Added example for the supervisor with recovery strategy

c7b64b7

Relrin added 4 commits January 21, 2020 23:55

Changed signature of the Supervisor::restart method

4c79d7a

Changed signature of the Supervisor::kill method

f0e4354

Changed signature of the Supervisor::stop method

af58a40

Added saving old counters for the failed actors

1cb4b48

vertexclique reviewed Jan 22, 2020

View reviewed changes

Relrin and others added 4 commits January 22, 2020 18:26

Replaced max_retries on the RestartPolicy enum

cffa1cf

Updated example with restart_strategy

a006c89

Added tests for the RestartStrategy struct

638fefd

Fixed formatting warnings

87b4082

vertexclique approved these changes Jan 22, 2020

View reviewed changes

vertexclique merged commit 14c51d6 into bastion-rs:master Jan 22, 2020

vertexclique mentioned this pull request Jan 22, 2020

Supervisor restart should allow for exponential backoff on retry #146

Closed

Relrin deleted the feature-max-retries branch January 22, 2020 21:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Max retries support for existing restart strategies #157

Max retries support for existing restart strategies #157

Relrin commented Jan 21, 2020

o0Ignition0o commented Jan 21, 2020

Relrin commented Jan 21, 2020 •

edited

Loading

o0Ignition0o commented Jan 21, 2020 •

edited

Loading

vertexclique commented Jan 21, 2020

Relrin commented Jan 21, 2020 •

edited

Loading

Relrin commented Jan 22, 2020 •

edited

Loading

vertexclique left a comment

vertexclique Jan 22, 2020

Relrin Jan 22, 2020 •

edited

Loading

o0Ignition0o Jan 22, 2020

Relrin Jan 22, 2020

vertexclique Jan 22, 2020

Relrin Jan 22, 2020

vertexclique Jan 22, 2020

Relrin Jan 22, 2020

vertexclique left a comment

Max retries support for existing restart strategies #157

Max retries support for existing restart strategies #157

Conversation

Relrin commented Jan 21, 2020

Checklist

o0Ignition0o commented Jan 21, 2020

Relrin commented Jan 21, 2020 • edited Loading

o0Ignition0o commented Jan 21, 2020 • edited Loading

vertexclique commented Jan 21, 2020

Relrin commented Jan 21, 2020 • edited Loading

Relrin commented Jan 22, 2020 • edited Loading

vertexclique left a comment

Choose a reason for hiding this comment

vertexclique Jan 22, 2020

Choose a reason for hiding this comment

Relrin Jan 22, 2020 • edited Loading

Choose a reason for hiding this comment

o0Ignition0o Jan 22, 2020

Choose a reason for hiding this comment

Relrin Jan 22, 2020

Choose a reason for hiding this comment

vertexclique Jan 22, 2020

Choose a reason for hiding this comment

Relrin Jan 22, 2020

Choose a reason for hiding this comment

vertexclique Jan 22, 2020

Choose a reason for hiding this comment

Relrin Jan 22, 2020

Choose a reason for hiding this comment

vertexclique left a comment

Choose a reason for hiding this comment

Relrin commented Jan 21, 2020 •

edited

Loading

o0Ignition0o commented Jan 21, 2020 •

edited

Loading

Relrin commented Jan 21, 2020 •

edited

Loading

Relrin commented Jan 22, 2020 •

edited

Loading

Relrin Jan 22, 2020 •

edited

Loading