Thanks for opening an issue for the M3DB Operator! We'd love to help you, but we need the following information included
with any issue:
What version of the operator are you running? Please include the docker tag. If using master, please include the git
SHA logged when the operator first starts.
v0.10.0
What version of Kubernetes are you running? Please include the output of kubectl version.
What we did: increase the instances per isolation group of our m3db cluster by 1, i.e. adding 3 nodes to the cluster, one for each replica (roughly the spec change sketched below). We expected the operator to detect that it needs to begin adding the nodes; instead, the operator doesn't scale up the cluster, and we see "statefulset already exists" logs.

We previously saw the same issue when using v0.7.0 of the operator, with the same logs. At that time, we chatted with @robskillington, who suggested we upgrade to 0.8.0 or newer, where better state syncing in large k8s clusters might reduce issues caused by stale views of objects, such as statefulsets not being seen as existing.

We thought the upgrade to v0.10.0 might resolve it, but the same issue persists, though the "statefulset already exists" log is now at info level rather than error.

We're trying to understand how "statefulset already exists" relates to the operator not beginning to scale up the cluster. We're still unsure whether this is an issue on our k8s cluster's side or a bug in the operator.
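For reference, the scale-up was just an edit to the M3DBCluster spec along these lines (a minimal sketch, not our real manifest; group names and instance counts are illustrative, and only the fields relevant to the change are shown):

```yaml
apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
metadata:
  name: m3db
spec:
  replicationFactor: 3
  isolationGroups:
    - name: group1
      numInstances: 3   # bumped by 1 in each group, i.e. one new node per replica
    - name: group2
      numInstances: 3
    - name: group3
      numInstances: 3
```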
Other things we've tried:
- [didn't work] Edit the m3dbcluster back to the original number of instances, restart the operator, then edit the m3dbcluster back up to the desired number of instances.
- [worked] Delete the m3db-rep0 statefulset (the operator doesn't recreate the sts on its own yet), then restart the operator; after that we saw the operator create a new statefulset with the desired number of instances and start scaling up the cluster (see the sketch after this list).
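In case it helps reproduce, the sequence that worked was roughly the following (a sketch; the operator's resource names assume the default install and may differ in your setup):

```sh
# Delete the stuck StatefulSet for the first replica group;
# the operator does not recreate it on its own yet.
kubectl delete statefulset m3db-rep0

# Restart the operator by deleting its pod (assumes the operator runs as a
# single-replica StatefulSet named "m3db-operator"; adjust if yours differs).
kubectl delete pod m3db-operator-0

# After the restart, the operator created a new StatefulSet with the desired
# number of instances and began scaling up the cluster.
kubectl get statefulsets -w
```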
Hi @yywandb! Sorry for the delay in following up on this issue. Based on your description, it seems like the operator doesn't become aware that the cluster spec has been updated unless it's restarted. Does that sound right?

If so, this might be similar to a previous issue we ran into, #268, where the operator would update a StatefulSet without waiting for a previously updated StatefulSet to become healthy. The root cause there was that the operator was working with stale copies of the StatefulSets in the cluster, and it was addressed in #271. That fix was included in the most recent release, v0.13.0. While it's concerned with StatefulSets rather than M3DB cluster CRDs like this issue, it would be interesting to see whether the problem still occurs with the latest release.

To that end, would it be possible to update your operator to v0.13.0? One tricky thing to be aware of before upgrading: v0.12.0 contained breaking changes to the default ConfigMap that the operator uses for M3DB, so if you are relying on the default ConfigMap you'll need to provide the old default as a custom ConfigMap.
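If you installed via the bundled manifests, bumping the operator is roughly the following (a sketch; it assumes the operator runs as a StatefulSet named "m3db-operator" with a container of the same name, and that you've first handled the ConfigMap caveat above, e.g. by pointing spec.configMapName at your own copy of the old default):

```sh
# Point the operator at the v0.13.0 image; the StatefulSet, container, and
# image names here are assumptions based on the default install.
kubectl set image statefulset/m3db-operator m3db-operator=quay.io/m3db/m3db-operator:v0.13.0

# Watch the operator come back up on the new version.
kubectl rollout status statefulset/m3db-operator
```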