[BUG]: CSI-PowerFlex entering boot loop when array has long response times #1639
Labels
area/csi-powerflex
Issue pertains to the CSI Driver for Dell EMC PowerFlex
type/bug
Something isn't working. This is the default label associated with a bug issue.
Bug Description
When installing csi-powerflex driver (v2.12.0), if multiple powerflex arrays are provided in the secret, and one of the arrays is unreachable and takes a long time to respond, the driver controller is stuck in a boot loop trying to authenticate with the unreachable powerflex array.
Specifically, if one of the arrays does not respond before the timeout specified by the kubernetes sidecar in the driver deployment workload (.spec.template.spec.container[].args["--timeout=120s"]), this is when the issue is encountered. If the timeout is not specified, the default is 15s.
Logs
You can see here, there are 27
Probe
requests before the first reply.vxflexos-controller - driver container logs:
Screenshots
No response
Additional Environment Information
No response
Steps to Reproduce
Install csi-powerflex with >=2 storage systems configured in the secret.
One of the storage systems should be unreachable and take longer than 2 minutes to respond.
Expected Behavior
Ideally, if one of the arrays responds within the given timeout, the driver should continue with initialization, eventually entering a running state and allowing the user to perform storage maintenance actions against all powerflex arrays that are online.
CSM Driver(s)
csi-powerflex:v2.12.0
Installation Type
Operator:v1.7.0
Container Storage Modules Enabled
N/A
Container Orchestrator
OpenShift v4.17.7
Operating System
RHEL 9.4
The text was updated successfully, but these errors were encountered: