Live Nginx configuration update without reloading #2174
/assign @aledbf |
@ElvinEfendi @valeriano-manassero can we get consensus about which version we are going to push? |
@aledbf I’d love to hear what @valeriano-manassero thinks about going forward with this PR and collaboratively extending it later. I am going to work on this feature consistently in the coming weeks, and if you want I can put together a small RFC as well to discuss the future of this work in a more organized way. I have already commented on the other PR and raised my concerns.
In my humble opinion the controller change in this PR is more concise and cleaner, and the Lua implementation is more performant (no ‘json’ decoding in the request path), correct (I use lua-resty-lock to avoid a possible race condition between different Nginx workers accessing load balancing state in the shared dictionary) and easier to extend (configuration.lua has the clear responsibility of accepting a raw config piece per component and storing it in a general shared dictionary, which is consumed by the respective middleware as needed). For example, as of now ‘balancer.lua’ needs ‘backends’ only, and it alone is responsible for what is done with that data. In the future we can POST ‘servers’ config as well, which will be consumed in a similar way by, e.g., ‘certificate.lua’, which serves certs dynamically, etc.
We are also already running this code in our test cluster without any issue and are starting to migrate some less critical services to it (I will update with more data later). Given the above compelling reasons I suggest we move forward with this PR. Please let me know what you think.
@aledbf @ElvinEfendi Sorry for the delay, I had a couple of very busy days. |
The CI should be fixed once #2172 gets merged and new Nginx image gets published. |
Thanks @valeriano-manassero! I would be happy to discuss that with you. |
@ElvinEfendi today I will publish the new nginx image |
@ElvinEfendi please squash the commits (no more than three or four) |
8b550d0
to
efb3f13
Compare
Codecov Report
@@ Coverage Diff @@
## master #2174 +/- ##
=========================================
Coverage ? 36.85%
=========================================
Files ? 70
Lines ? 4963
Branches ? 0
=========================================
Hits ? 1829
Misses ? 2855
Partials ? 279
efb3f13
to
b0e5e5a
Compare
@aledbf looks like you've published the new Nginx image already, thanks! CI is 🍏 now. Please let me know if you think this PR requires some more changes. |
b0e5e5a
to
1d4f09f
Compare
@aledbf is there anything blocking this PR from merging? |
@ElvinEfendi only testing. If I don't find any issues I will merge this in two days. |
@aledbf thanks for the update! |
if isSticky(host, location, backend.SessionAffinity.CookieSessionAffinity.Locations) {
	upstreamName = fmt.Sprintf("sticky-%v", upstreamName)
}
if !dynamicConfigurationEnabled {
This is currently buggy: it will ignore the https proto when dynamic configuration is enabled. I have fixed it at https://github.com/Shopify/ingress/pull/29/files. @aledbf let me know if you want me to include the fix (and the corresponding regression test) in this PR or in a separate PR after this gets merged.
Please include the fix in this PR
internal/ingress/controller/nginx.go
Outdated
	newBackends = append(newBackends, pcfg.Backends[i].DeepCopy())
}

n.runningConfig.Backends = []*ingress.Backend{}
Please do not change the state of n.runningConfig. Please make a copy and use that to make the comparison.
@aledbf notice that I'm changing only n.runningConfig.Backends, and I keep a copy of its original value at https://github.com/kubernetes/ingress-nginx/pull/2174/files#diff-cde3fffe2425ad7efaa8add1d05ae2c0R756, which I restore at https://github.com/kubernetes/ingress-nginx/pull/2174/files#diff-cde3fffe2425ad7efaa8add1d05ae2c0R772.
The idea is to make sure that the only difference between the new and the running config is a change in Backends. Please let me know if you think there's a better way of doing this.
You should not change n.runningConfig. I prefer a full copy that can be disposed of later.
Is there already a utility function to deep copy runningConfig?
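One possible way to meet the request in this thread, sketched in Go under stated assumptions: compare the two configurations on throwaway copies instead of blanking and restoring n.runningConfig.Backends. The types below are hypothetical, heavily simplified stand-ins for the controller's real ones, and onlyBackendsChanged is an illustrative name, not a function from the PR.

```go
package main

import (
	"fmt"
	"reflect"
)

// Backend, Server and Configuration are hypothetical, simplified stand-ins
// for the controller types discussed in the review.
type Backend struct {
	Name      string
	Endpoints []string
}

type Server struct {
	Hostname string
}

type Configuration struct {
	Backends []*Backend
	Servers  []*Server
}

// onlyBackendsChanged reports whether running and next differ in Backends
// and nowhere else. It works on copies of the two structs, so neither
// argument is modified.
func onlyBackendsChanged(running, next *Configuration) bool {
	if reflect.DeepEqual(running.Backends, next.Backends) {
		return false // nothing changed in Backends at all
	}
	// Struct copies: the slices inside are shared but never written to.
	r, n := *running, *next
	r.Backends, n.Backends = nil, nil
	return reflect.DeepEqual(r, n)
}

func main() {
	running := &Configuration{
		Backends: []*Backend{{Name: "default-web-80", Endpoints: []string{"10.0.0.1:8080"}}},
		Servers:  []*Server{{Hostname: "example.com"}},
	}
	next := &Configuration{
		Backends: []*Backend{{Name: "default-web-80", Endpoints: []string{"10.0.0.2:8080"}}},
		Servers:  []*Server{{Hostname: "example.com"}},
	}
	fmt.Println(onlyBackendsChanged(running, next))
}
```

Because the copies are local variables, there is no restore step to forget. With the real types, the copy would have to cover every field that the comparison touches.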
@ElvinEfendi please rebase and squash for the last review and merge |
7d1a560
to
d855a5e
Compare
@aledbf done! |
end

function _M.call()
  if ngx.var.request_method ~= "POST" or ngx.var.request_uri ~= "/configuration/backends" then
It would be better to also have a GET API to retrieve all configurations. Since all configurations are now stored in memory rather than in files, it will be difficult to check whether the configuration is correct when something goes wrong.
@ElvinEfendi I just finished testing the PR and I just have two comments:
|
Thanks @aledbf!
I implemented it that way intentionally thinking that it would be more resilient to failures. But now that I think more about it is probably better to completely skip
👍 I'll do both. |
Please keep in mind you only need to comment the upstream generation. We still need the reload of nginx when an Ingress is added/removed or a new annotation is configured. To be clear, https://github.com/kubernetes/ingress-nginx/blob/master/rootfs/etc/nginx/template/nginx.tmpl#L311-L338 should be inside an if section that checks whether dynamic configuration is disabled (so this is not generated when it is enabled).
If someone else can help to test this please use |
@ElvinEfendi this is working just fine. Let's wait until tomorrow and I will merge this. |
@ElvinEfendi after this PR we need to clean up the JSON that is being sent to Lua, because there's a lot of information we are not going to use (and I don't want to use more memory than necessary in nginx).
ab02f3e
to
af6ae14
Compare
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aledbf, ElvinEfendi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@ElvinEfendi thanks! |
Requires #2172 to be merged first and the new Nginx image to be published.
What this PR does / why we need it:
The PR implements live configuration update for Nginx upstreams/k8s endpoints. That means Nginx won't be reloaded every time you deploy your app. This is very important when you are front-ending many applications behind a single deployment of ingress-nginx and those apps get deployed frequently and have significant traffic.
I've mainly focused on laying out a proper foundation for further development of the live Nginx (re)configuration feature in ingress-nginx. The PR implements only the Round Robin algorithm but can easily be extended to support more. I have another PR at #2167 to configure the load balancing algorithm per ingress; that configuration will be used by the Lua code as well to decide which LB algorithm to use for a given app (namespace/service).
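As a rough illustration of the Round Robin selection mentioned above, the following Go sketch rotates through the endpoints of a single backend and lets the endpoint list be swapped without restarting anything. It is a model only: the PR's balancer is written in Lua, and the Endpoint and RoundRobin names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Endpoint is a hypothetical stand-in for one upstream address.
type Endpoint struct {
	Address string
	Port    string
}

// RoundRobin hands out the endpoints of a single backend in rotation.
type RoundRobin struct {
	mu        sync.Mutex
	endpoints []Endpoint
	next      int
}

// Sync replaces the endpoint list, for example after a deployment
// changed the set of pods behind the service.
func (r *RoundRobin) Sync(endpoints []Endpoint) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.endpoints = append([]Endpoint(nil), endpoints...)
	if r.next >= len(r.endpoints) {
		r.next = 0
	}
}

// Pick returns the next endpoint, or false when the backend has none.
func (r *RoundRobin) Pick() (Endpoint, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.endpoints) == 0 {
		return Endpoint{}, false
	}
	ep := r.endpoints[r.next]
	r.next = (r.next + 1) % len(r.endpoints)
	return ep, true
}

func main() {
	rr := &RoundRobin{}
	rr.Sync([]Endpoint{
		{Address: "10.0.0.1", Port: "8080"},
		{Address: "10.0.0.2", Port: "8080"},
	})
	for i := 0; i < 3; i++ {
		ep, _ := rr.Pick()
		fmt.Println(ep.Address + ":" + ep.Port)
	}
}
```

Supporting another algorithm would mean providing a different Pick behind the same Sync entry point.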
The feature can be enabled with the --enable-dynamic-configuration command line flag. If you want to see more Lua logs, you can also set error-log-level to info to see what it does on endpoint changes.
In order to avoid JSON decoding in the request path, for every Nginx worker I spin up a periodic function that reads the raw configuration from a shared Lua dictionary and decodes it into a per-worker local cache. The balancer then uses its local cache to get the list of endpoints and other backend configuration for a given namespace/service.
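The per-worker sync described here can be modelled in Go as follows. This is a sketch of the pattern only, not the Lua code from the PR: SharedStore plays the role of the shared dictionary that holds raw JSON, Worker plays the role of one Nginx worker with its own decoded cache, and all type, field and method names are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Backend is a hypothetical, trimmed-down backend description.
type Backend struct {
	Name      string   `json:"name"`
	Endpoints []string `json:"endpoints"`
}

// SharedStore models the shared dictionary: it only ever holds raw JSON.
type SharedStore struct {
	mu  sync.RWMutex
	raw []byte
}

func (s *SharedStore) Set(raw []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.raw = raw
}

func (s *SharedStore) Get() []byte {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.raw
}

// Worker keeps a decoded copy so the request path never parses JSON.
type Worker struct {
	mu    sync.RWMutex
	cache map[string]Backend
}

// SyncOnce is what the periodic function would run on every tick.
func (w *Worker) SyncOnce(store *SharedStore) error {
	raw := store.Get()
	if len(raw) == 0 {
		return nil // nothing has been posted yet
	}
	var backends []Backend
	if err := json.Unmarshal(raw, &backends); err != nil {
		return err // keep serving from the previous cache
	}
	fresh := make(map[string]Backend, len(backends))
	for _, b := range backends {
		fresh[b.Name] = b
	}
	w.mu.Lock()
	w.cache = fresh
	w.mu.Unlock()
	return nil
}

// Lookup is the request-path read: a map access with no decoding.
func (w *Worker) Lookup(name string) (Backend, bool) {
	w.mu.RLock()
	defer w.mu.RUnlock()
	b, ok := w.cache[name]
	return b, ok
}

func main() {
	store := &SharedStore{}
	worker := &Worker{}
	store.Set([]byte(`[{"name":"default-web-80","endpoints":["10.0.0.1:8080"]}]`))
	if err := worker.SyncOnce(store); err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	b, _ := worker.Lookup("default-web-80")
	fmt.Println(b.Endpoints)
}
```

The trade-off of this design is that each worker sees a new configuration only after its next tick, in exchange for a request path that does no parsing.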
If you want to test you can use index.docker.io/elvinefendi/nginx-ingress-controller:0.0.1
Next steps: set up a testing framework for the Lua code, add support for EWMA and more LB algorithms, and add support for live update of secrets by using certificate_by_lua.
Which issue this PR fixes: n/a
Special notes for your reviewer:
I'm aware that there's another similar PR being worked on at #2152. I was told about that only after I started working on this feature at Shopify#16 (comment). There are significant differences in the implementation between the two PRs and I thought there could be value in creating this PR as well.