Support eventual consistency #298

windsource · 2024-06-19T14:40:38Z

Description

When a new manifest is applied and a workload fails to start, Ankaios currently retries a fixed number of times and then gives up. The workload remains in state Pending, subState StartingFailed. A start can fail for different reasons, for example:

cannot pull image (registry not available, image not found, not authorized, ...)
invalid options passed to either commandOptions or commandArgs
folder does not exist when mounting volumes
etc.

While some of these problems cannot be solved without changing the manifest (e.g. invalid options), others might disappear after some time (e.g. registry not available or folder not existing).

Some users expect that Ankaios constantly tries to reach the desired state and also that Ankaios provides the result of the latest try (e.g. Podman error message).

Goals

Ankaios should constantly try to reach the desired state.
The interval between the retries shall be increased over time with a jitter (maybe same strategy as K8S).
ank get workloads shall provide the latest result.
ank apply shall return after the first attempt.

Final result

Summary

To be filled when the final solution is sketched.

Tasks

Task 1
Task 2
...


windsource · 2024-06-21T12:01:30Z

Maybe we can also have an optional maximum time after which Ankaios stops trying to reach the desired state. The parameter could be part of a config file (see #302).

inf17101 · 2024-06-26T09:13:53Z

Builds upon #67 (PR #137)

krucod3 · 2024-12-04T12:12:56Z

Kubernetes uses a backoff strategy capped at 300 seconds (5 minutes):
https://kubernetes.io/docs/concepts/containers/images/

There are different capped exponential backoff algorithms that can be used. A small comparison can be found here:
https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
Additionally, general information can be found here:
https://en.wikipedia.org/wiki/Exponential_backoff

According to the AWS comparison, we can just use some sort of full or decorrelated jitter, e.g.

delay = min(cap, random_between(base, 3*last_delay)) // note that this is not exponential, but relatively fast growing

where cap is 300 seconds and base could be something between 100 and 500 milliseconds.
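The formula above can be sketched as follows. This is only an illustration of decorrelated jitter, not Ankaios code: the function names `next_delay` and `delays` are hypothetical, the base of 100 ms is one assumed value from the suggested 100 to 500 ms range, and the first delay is seeded with the base.

```python
import random

BASE_S = 0.1   # assumed base delay: 100 ms (suggested range is 100 to 500 ms)
CAP_S = 300.0  # cap of 300 seconds, as in the Kubernetes strategy quoted above


def next_delay(last_delay, base=BASE_S, cap=CAP_S, rng=random):
    """Decorrelated jitter: min(cap, random_between(base, 3 * last_delay))."""
    # Guard against last_delay < base so the random interval is never inverted.
    upper = max(base, 3.0 * last_delay)
    return min(cap, rng.uniform(base, upper))


def delays(count, seed=None):
    """Return the first `count` delays, starting from last_delay = base."""
    rng = random.Random(seed)
    delay = BASE_S
    result = []
    for _ in range(count):
        delay = next_delay(delay, rng=rng)
        result.append(delay)
    return result


if __name__ == "__main__":
    # Values differ per run; each one stays between BASE_S and CAP_S.
    for i, d in enumerate(delays(10), start=1):
        print(f"retry {i}: wait {d:.2f} s")
```

Because each delay is drawn from an interval that depends on the previous delay, the sequence is not strictly increasing, but it tends to grow quickly until it reaches the cap.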

krucod3 · 2024-12-04T12:29:51Z

We should probably think about adding a new substate to the pending state, as the backoff could be quite long and we should signal to users that the workload is waiting for the next retry. This must be done carefully, as we already had problems with quickly changing substates for the retry.

krucod3 · 2024-12-04T12:31:17Z

As a start, an optional maximum time for the retries can be configured centrally at the server. We already added the possibility to distribute config options to agents using the server hello message, so the workflow is already prepared.

windsource · 2024-12-06T07:39:33Z

Just to make sure: the capping at 5 minutes in K8S does not mean that all attempts are stopped after 5 minutes, but that the maximum time between two attempts is 5 minutes.

krucod3 · 2024-12-06T12:00:05Z

Yes, only the backoff delay is capped (see proposed formula above).
Stopping further retry attempts altogether would be covered by your proposal of an optional configuration option, which can be implemented on top of this high-level idea.
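To make the distinction in this exchange concrete, here is a minimal sketch of a retry loop with both limits. It is not Ankaios code: `retry_until_started`, `max_total` and the injected `sleep`/`now`/`rng` parameters are hypothetical names chosen for illustration, and treating every exception as retryable is an assumption.

```python
import random
import time


def retry_until_started(start, base=0.1, cap=300.0, max_total=None,
                        sleep=time.sleep, now=time.monotonic, rng=random):
    """Call start() until it succeeds.

    cap       limits the pause between two attempts (never stops retrying).
    max_total optionally limits the overall retry time; None retries forever.
    Returns (succeeded, attempts, last_error).
    """
    began = now()
    delay = base
    attempts = 0
    last_error = None
    while True:
        attempts += 1
        try:
            start()
            return True, attempts, None
        except Exception as err:  # assumption: every start failure is retryable
            last_error = str(err)  # latest result, e.g. a Podman error message
        # Capped backoff delay, same formula as proposed above.
        delay = min(cap, rng.uniform(base, 3.0 * delay))
        # Give up only if the optional overall limit would be exceeded.
        if max_total is not None and (now() - began) + delay > max_total:
            return False, attempts, last_error
        sleep(delay)
```

With `max_total=None` the loop keeps retrying with pauses of at most `cap` seconds, which matches the Kubernetes behaviour described above; setting `max_total` corresponds to the optional configuration option.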

windsource added the "enhancement" label (New feature or request. Issue will appear in the change log "Features") on Jun 19, 2024

krucod3 added this to the v0.6 milestone Jun 26, 2024
