Beat might not be able to start after update #3485

Closed
david-kow opened this issue Jul 20, 2020 · 6 comments · Fixed by #3633
Assignees: pebrc
Labels: >bug Something isn't working, v1.3.0

Comments

@david-kow (Contributor)

When:

  • Beat is already running, and
  • Beat is deployed as a Deployment, and
  • Beat is updated, and
  • the new Pod lands on the same Node as the old Pod, and
  • the timing is right, then

the new Beat will keep crashing with the error below.

ERROR   instance/beat.go:958    Exiting: data path already locked by another beat. Please make sure that multiple beats are not sharing the same data path (path.data).

(We want the new Pod to use the same path to preserve Beat identity.)

One possible solution would be to allow modifying maxUnavailable in the Deployment update strategy, but we don't currently expose that. Right now, the ways to unblock are:

  • remove the ReplicaSet that contains the old Pods (requires manual intervention), or
  • use emptyDir for the beat-data volume (Beat won't preserve its identity; a fuller example follows the snippet below):
volumes:
- name: beat-data
  emptyDir: {}
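
For context, that override sits under the deployment podTemplate of the Beat resource. A minimal sketch, assuming the beat.k8s.elastic.co/v1beta1 API and a Filebeat deployment (resource name and version are illustrative):

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat            # illustrative name
spec:
  type: filebeat
  version: 7.8.0            # illustrative version
  deployment:
    replicas: 1
    podTemplate:
      spec:
        volumes:
        - name: beat-data
          emptyDir: {}      # replaces the default hostPath-backed data volume
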
david-kow added the >bug Something isn't working and :beats labels on Jul 20, 2020
@sebgl (Contributor) commented Jul 20, 2020

> What could be a solution is to allow modifying maxUnavailable for deployment update strategy, but we don't expose that currently

I think I'd be +1 for adding a Strategy appsv1.DeploymentStrategy field in the CRD:

type DeploymentSpec struct {
    PodTemplate corev1.PodTemplateSpec    `json:"podTemplate,omitempty"`
    Replicas    *int32                    `json:"replicas,omitempty"`
    Strategy    appsv1.DeploymentStrategy `json:"strategy,omitempty"` // proposed field; json tag name assumed
}
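
If that field were exposed, users could tune the rollout per Beat, for example to avoid surging a second Pod onto the same Node. A rough sketch of how it might look on the resource, assuming the new field surfaces the standard appsv1.DeploymentStrategy under spec.deployment (the strategy key name is an assumption):

spec:
  deployment:
    replicas: 1
    strategy:               # assumed key, mirroring appsv1.DeploymentStrategy
      type: RollingUpdate
      rollingUpdate:
        maxSurge: 0         # do not create an extra Pod alongside the old one
        maxUnavailable: 1   # take the old Pod down first, then bring up its replacement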

david-kow added the v1.3.0 label and removed the :beats label on Jul 28, 2020
pebrc self-assigned this on Aug 14, 2020
@pebrc (Collaborator) commented Aug 14, 2020

> What could be a solution is to allow modifying maxUnavailable for deployment update strategy, but we don't expose that currently.

Would you not rather want to change the type of the DeploymentStrategy to Recreate than changing maxUnavailable?

@david-kow (Contributor, Author)

> What could be a solution is to allow modifying maxUnavailable for deployment update strategy, but we don't expose that currently.
>
> Would you not rather want to change the type of the DeploymentStrategy to Recreate than changing maxUnavailable?

Wouldn't that cause all Pods to be deleted before any new ones appear? It seems we lose a lot of rollout safety with that approach: if the config is bad, the user is left with no Beats until a correct config is supplied (versus the current behavior, where only a single Beat is affected). But when I think about it, this seems to be the only way to guarantee the issue is avoided, since even with maxUnavailable the next Pod might hit the same problem.
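
For comparison, the Recreate variant discussed above would look roughly like this, under the same assumption about an exposed strategy field. Per the comments above, it trades rollout safety for a guarantee that no old Pod is still holding the data path when the new one starts:

spec:
  deployment:
    strategy:
      type: Recreate        # all old Pods are terminated before any new Pod is created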

@ghost commented Aug 20, 2020

Any chance we can also have an option to remove/override the host mount? Deployments are not bound to nodes, so storing state in a host mount doesn't always make sense. What if multiple replicas are scheduled on the same node?

@pebrc (Collaborator) commented Aug 20, 2020

@anders-cognite you already have full control over the podTemplate and are able to change mounts to your liking.

@ghost commented Aug 20, 2020

I tried overriding volumeMounts but that gave duplication errors. I see now that you do de-duplication of the volumes, so that works. Thanks!

I guess it would be nice to be able to override the volumeMount options too, but that's not as important.
