
[controller] Support multi instance placement add #275

Merged
notbdu merged 4 commits into master from bdu/placement-multi-add on Mar 2, 2021

Conversation

notbdu (Collaborator) commented Feb 25, 2021

Modifies the placement client Add() API to accept multiple instances.
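For context, a minimal sketch of the shape of that change, not the exact operator interface (the prior single-instance form is assumed from the description, and the package/interface names here are illustrative):

package placement

import placementpb "github.com/m3db/m3/src/cluster/generated/proto/placementpb"

// Client sketches the updated API: Add now accepts a slice of instances,
// so an entire scale-up can be applied as one placement update instead of
// one call per pod (previously Add presumably took a single instance).
type Client interface {
	// Add adds the given instances to the placement in a single call.
	Add(instances []*placementpb.Instance) error
}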

notbdu force-pushed the bdu/placement-multi-add branch 2 times, most recently from ee06603 to 05f3c48, on February 25, 2021 at 15:52
codecov bot commented Feb 25, 2021

Codecov Report

Merging #275 (9447ac0) into master (7a6466e) will increase coverage by 0.00%.
The diff coverage is 63.82%.


@@           Coverage Diff           @@
##           master     #275   +/-   ##
=======================================
  Coverage   76.01%   76.02%           
=======================================
  Files          32       32           
  Lines        2381     2394   +13     
=======================================
+ Hits         1810     1820   +10     
- Misses        427      429    +2     
- Partials      144      145    +1     


schallert (Collaborator) left a comment

LGTM. A few minor nits.

One general question: do our current "ready replica" checks guarantee that we'll add all pods at once, rather than accidentally adding one at a time?

For example, say we scale a set from 2 to 4 pods. At time T1, 1 of the 2 new pods is up. Are our readiness checks structured in such a way that we won't immediately add the 3rd pod, and will wait until we can evaluate the two new ones at once?

Comment on lines 373 to 374
err := fmt.Errorf("error creating instance for pod %s", pod.Name)
c.logger.Error(err.Error())

Nit: take advantage of structured logging here, e.g. logger.Error("error creating instance for pod", zap.String("pod", ...)).
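A rough sketch of that structured form (assuming pod, c.logger, and the fmt / go.uber.org/zap imports are already in scope in this file):

err := fmt.Errorf("error creating instance for pod %s", pod.Name)
c.logger.Error("error creating instance for pod",
	zap.String("pod", pod.Name), // pod name as a field rather than in the message
	zap.Error(err),
)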

func (c *M3DBController) addPodsToPlacement(cluster *myspec.M3DBCluster, pods []*corev1.Pod) error {
	var (
		instances = make([]*placementpb.Instance, 0, len(pods))
		reasonBuf = bytes.NewBufferString("adding pods to placement (")

This is a pretty different convention from how we usually build up logs, and this isn't very perf-sensitive code.

I would vote for just adding to an array of strings, which are the names of the instances / pods, and then calling zap.Strings.
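A sketch of that alternative; instanceForPod here is a hypothetical stand-in for the existing per-pod conversion logic:

podNames := make([]string, 0, len(pods))
instances := make([]*placementpb.Instance, 0, len(pods))
for _, pod := range pods {
	inst, err := instanceForPod(pod) // hypothetical helper: build a placementpb.Instance from the pod
	if err != nil {
		return err
	}
	instances = append(instances, inst)
	podNames = append(podNames, pod.Name)
}
c.logger.Info("adding pods to placement", zap.Strings("pods", podNames))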

if err != nil {
err := fmt.Errorf("error adding pod to placement: %s", pod.Name)
err := fmt.Errorf("error: %s", reason)

Same thing here re: taking the chance to structure the .Error log below.
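For instance, a hedged sketch of the structured version, reusing the podNames slice from the sketch above:

c.logger.Error("error adding pods to placement",
	zap.Strings("pods", podNames),
	zap.Error(err),
)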

"github.com/m3db/m3/src/cluster/placement"
dbns "github.com/m3db/m3/src/dbnode/generated/proto/namespace"
"github.com/m3db/m3/src/query/generated/proto/admin"

appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
v1 "k8s.io/api/core/v1"

Nit: this is already imported as corev1 on the line above.
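i.e. the quoted import block with the duplicate alias dropped would simply be:

import (
	"github.com/m3db/m3/src/cluster/placement"
	dbns "github.com/m3db/m3/src/dbnode/generated/proto/namespace"
	"github.com/m3db/m3/src/query/generated/proto/admin"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)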

notbdu (Collaborator, Author) commented Mar 2, 2021

In practice, I observed the operator bringing up one pod at a time (waiting for each to be ready before proceeding) and then expanding the placement only once all requested pods were up/ready/bootstrapped (with no shards). I observed this pattern in RC testing, scaling from 2 -> 4 and from 2 -> 6 nodes.

In the code it looks like we add a pod at a time here:

var newCount int32
if current < desired {
	newCount = current + 1
} else {
	newCount = current - 1
}
setLogger.Info("resizing set, desired != current", zap.Int32("newSize", newCount))

And then, once the number of desired pods equals the number of current pods, we expand the placement here (the placement shouldn't be expanded before this point):

if desired == current {
	// If the set is at its desired size, and all pods in the set are in the
	// placement, there's nothing we need to do for this set.
	if current == inPlacement {
		continue
	}
	// If the set is at its desired size but there's pods in the set that are
	// absent from the placement, add pods to placement.
	if inPlacement < current {
		setLogger.Info("expanding placement for set")
		return c.expandPlacementForSet(cluster, set, group, placement)
	}
}

This can be observed in the bootstrap metrics as well (2 -> 4 and then 2 -> 6):
[Screenshot: bootstrap metrics during the 2 -> 4 and 2 -> 6 scale-ups, 2021-03-01]

notbdu merged commit 66671e8 into master on Mar 2, 2021
notbdu deleted the bdu/placement-multi-add branch on March 2, 2021 at 23:09
soundvibe added a commit that referenced this pull request Apr 1, 2021
* master:
  Backwards compatibility when using the original update annotation with an OnDelete update strategy (#284)
  Add support for parallel node updates within a statefulset (#283)
  Support namespace ExtendedOptions in cluster spec (#282)
  [controller] Support multi instance placement add (#275)
  [gomod] Update M3DB dependency (#277)
  [cmd] Fix instrument package name (#280)

# Conflicts:
#	pkg/k8sops/annotations/annotations.go