Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HAProxy config template: Fix weight logic for A/B testing #19893

Merged
merged 1 commit into from
Jul 13, 2018
Merged

HAProxy config template: Fix weight logic for A/B testing #19893

merged 1 commit into from
Jul 13, 2018

Conversation

fcami
Copy link
Contributor

@fcami fcami commented May 31, 2018

Using an OpenShift route with several services setup for A/B testing,
and the "oc set route-backends", setting a zero weigth to a service
removes the endpoints from the haproxy backend.
This results in a loss of all existing connections, and a loss of
associated end-users sessions.

Changing:
{{ if ne $weight 0 }}
to
{{ if ge $weight 0 }}
fixes the issue.
Removing older services is then done whith the "set route-backend" command.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1584701

@openshift-ci-robot openshift-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 31, 2018
@knobunc
Copy link
Contributor

knobunc commented May 31, 2018

/ok-to-test

@knobunc
Copy link
Contributor

knobunc commented May 31, 2018

/hold
We don't want this until 3.11

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 31, 2018
@knobunc knobunc self-assigned this May 31, 2018
Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is reasonable for the ones where we can see the HTTP headers and dispatch the existing connections to the endpoints with weight 0 using cookies.

We need to think more about the pass-through case. It probably should drop the connections with weight 0 since those will only be able to be dispatched by round-robin (where the weight will be skipped) or src-IP (where the weight will be ignored). So if r-r there is no point to add them, and if src-ip it does the wrong thing.

Here's more info on the weights for the servers:
https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4.2-balance
https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#5.2-weight
https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#hash-type

source The source IP address is hashed and divided by the total
weight of the running servers to designate which server will
receive the request. This ensures that the same client IP
address will always reach the same server as long as no
server goes down or up. If the hash result changes due to the
number of running servers changing, many clients will be
directed to a different server. This algorithm is generally
used in TCP mode where no cookie may be inserted. It may also
be used on the Internet to provide a best-effort stickiness
to clients which refuse session cookies. This algorithm is
static by default, which means that changing a server's
weight on the fly will have no effect, but this can be
changed using "hash-type".

We use the consistent hash type for tcp connections... and that supports weights.

BUT I don't see what you would gain because new tcp connections should not use the ones with weight 0, and ongoing connections are left until they close.

So, I think all we should do is put a good comment on the second weight case (for the passthrough) explaining that we are deliberately different, and why.

@knobunc
Copy link
Contributor

knobunc commented May 31, 2018

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 31, 2018
Copy link
Contributor

@pecameron pecameron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"ge 0" effectively removes the test, so remove the test completely. The problem is for the backend server to permit existing endpoints to complete when weight goes to 0 while not allowing new connections. Have you verified that setting up server backends with weight 0 works as needed. (I am not sure how haproxy handles this.)

@fcami
Copy link
Contributor Author

fcami commented May 31, 2018

@pecameron do you mean setting up all server backends with weight 0, or just some of them? What was verified is that if existing servers have weight=0, and new servers have weight>0, existing connections are not dropped and new connections all go to the new servers.
Reading https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#5.2-weight it does not make sense to have all servers with weight=0 - at least not to me.

@knobunc
Copy link
Contributor

knobunc commented Jun 8, 2018

@pecameron: ge 0 is fine in case someone managed to get a negative weight in. We do validate the range in the api server, but it doesn't hurt here.

@fcami: I would like to have a comment added explaining why this section allows it but tcp passthrough does not (otherwise someone may think it is a mistake that they are the same). I would like to merge this once master opens for features.

@fcami
Copy link
Contributor Author

fcami commented Jun 8, 2018

@knobunc @pecameron thanks for the review!

@knobunc comments added to both sections.

@knobunc
Copy link
Contributor

knobunc commented Jun 8, 2018

/lgtm
/ok-to-test

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 8, 2018
@knobunc
Copy link
Contributor

knobunc commented Jun 8, 2018

I'll pull the hold as soon as the gates open.

and the "oc set route-backends", setting a zero weigth to a service
removes the endpoints from the haproxy backend.
This results in a loss of all existing connections, and a loss of
associated end-users sessions.

Changing:
      {{ if ne $weight 0 }}
to
      {{ if ge $weight 0 }}
fixes the issue.
Removing older services is then done whith the  "set route-backend" command.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1584701
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jun 8, 2018
@fcami
Copy link
Contributor Author

fcami commented Jun 8, 2018

@knobunc there was a typo in the previous commit, now everything should be fine.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 18, 2018
Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fcami, knobunc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ironcladlou
Copy link
Contributor

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 3, 2018
@stevekuznetsov
Copy link
Contributor

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit e689dd3 into openshift:master Jul 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants