For failed deploys, add the information about the k8s node #762

ayatsynych · 2020-10-27T18:58:02Z

Feature request

Proposal: If possible, when a deploy fails, include information about the k8s node. Some deployment failures are caused by issues on the underlying node. Those causes would be much easier to identify if the node information (name) were output for each of the failed resources.
Having this information in the logs also helps with audits and debugging.

dturn · 2020-10-27T19:38:06Z

In general, this sounds like it would be helpful. Though, can you be a bit more specific about the types of failures you're seeing? E.g. DS pods not scheduling, container problems, ...

The other big question: are you interested in PRing this, or is this just a request?

ayatsynych · 2020-10-27T20:04:11Z

> The other big question: are you interested in PRing this, or is this just a request?

This is just a request, but if we find the time, we will consider putting in the work to implement it.

> Though, can you be a bit more specific about the types of failures you're seeing?

A few specific examples we have seen in the past:

The underlying node has Docker daemon issues, and all the pods that get scheduled on that node end up in a bad state (stuck in "Terminating" or "Initializing").
The underlying node has performance issues that cause a timeout. If it helps, in the specific case we saw, image pulls on one of the nodes were extremely slow. Surfacing the node name on all the failed resources would have pointed to the node-specific problem right away.
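To make the request concrete, here is a minimal sketch of the kind of output we have in mind. It is not how the deploy tool works internally; the function name `failed_pods_with_nodes` and the sample data are hypothetical. It assumes the input is the parsed JSON from `kubectl get pods -o json`, and relies only on standard Pod fields (`spec.nodeName`, `metadata.deletionTimestamp`, `status.phase`, `status.conditions`).

```python
def failed_pods_with_nodes(pod_list):
    """Return (pod name, node name, reason) for every pod that is not healthy.

    `pod_list` is assumed to be the parsed output of `kubectl get pods -o json`.
    """
    report = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        # A pod that is being deleted has metadata.deletionTimestamp set;
        # kubectl displays such pods as "Terminating".
        terminating = "deletionTimestamp" in meta
        ready = any(
            cond.get("type") == "Ready" and cond.get("status") == "True"
            for cond in status.get("conditions", [])
        )
        # Skip pods that completed successfully or are ready and not being deleted.
        if phase == "Succeeded" or (ready and not terminating):
            continue
        # spec.nodeName is empty until the scheduler assigns the pod to a node.
        node = pod.get("spec", {}).get("nodeName") or "<unscheduled>"
        reason = "Terminating" if terminating else phase
        report.append((meta.get("name", "<unknown>"), node, reason))
    return report


# Hypothetical sample data shaped like a PodList.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "web-1"},
            "spec": {"nodeName": "node-a"},
            "status": {"phase": "Running",
                       "conditions": [{"type": "Ready", "status": "True"}]},
        },
        {
            "metadata": {"name": "web-2"},
            "spec": {"nodeName": "node-b"},
            "status": {"phase": "Pending",
                       "conditions": [{"type": "Ready", "status": "False"}]},
        },
        {
            "metadata": {"name": "web-3",
                         "deletionTimestamp": "2020-10-27T18:00:00Z"},
            "spec": {"nodeName": "node-b"},
            "status": {"phase": "Running",
                       "conditions": [{"type": "Ready", "status": "True"}]},
        },
    ]
}

if __name__ == "__main__":
    for name, node, reason in failed_pods_with_nodes(SAMPLE):
        print(f"{name} on node {node}: {reason}")
```

With output like this, two failed pods that share `node-b` would immediately suggest a node-specific problem rather than an application problem.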

ajshepley · 2020-10-30T17:50:55Z

cc @Shopify/pipeline
