Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(config_etcd): use a single long http connection to watch all resources #9456

Merged
merged 12 commits into from
May 23, 2023

Conversation

kingluo
Copy link
Contributor

@kingluo kingluo commented May 10, 2023

Description

  1. only one http connection to watch the prefix for all resources
  2. use chunked streaming to receive events in the connection, timeout=50 second

Sequence Diagram:

/routes watcher->/routes watcher: first sync_data()
/routes watcher->main watcher: start
/routes watcher->/routes watcher: wait main watcher
/upstreams watcher->/upstreams watcher: first sync_data(), wait main watcher
main watcher->etcd: get current etcd revision and start watch there
main watcher-->/routes watcher: started
main watcher-->/upstreams watcher: started
/routes watcher->etcd: readdir() if need_reloaded
/routes watcher->main watcher: iterate history watch events, found
/upstreams watcher->main watcher: no history event, wait...
etcd-->main watcher: watch event
main watcher->main watcher: save event
main watcher->/upstreams watcher: notify
/upstreams watcher->main watcher: iterate history watch events, found
etcd-->main watcher: timeout
main watcher->etcd: reconnect and restart watch from max received revision
etcd-->main watcher: compacted
main watcher->etcd: reconnect and restart watch from compacted revision

sequence_diagram (21)

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@kingluo kingluo changed the title feat(config_etcd): use only one http connection to watch all resources feat: config_etcd: use only one http connection to watch all resources May 10, 2023
@kingluo kingluo changed the title feat: config_etcd: use only one http connection to watch all resources feat(config_etcd): use only one http connection to watch all resources May 11, 2023
@kingluo kingluo changed the title feat(config_etcd): use only one http connection to watch all resources feat(config_etcd): use only one long http connection to watch all resources May 11, 2023
@kingluo kingluo changed the title feat(config_etcd): use only one long http connection to watch all resources feat(config_etcd): use a single long http connection to watch all resources May 11, 2023
@@ -118,8 +118,8 @@ qr/Batch Processor\[error-log-logger\] failed to process entries: error while se
--- request
GET /tg
--- response_body
--- error_log eval
qr/.*\[\{\"body\":\{\"text\":\{\"text\":\".*this is an error message for test.*\"\}\},\"endpoint\":\"\",\"service\":\"APISIX\",\"serviceInstance\":\"instance\".*/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new watch subsystem may print new logs.
All in all, strict log assumption is not a good practice.

--- response_body
prev_index updated
--- error_log eval
qr/(create watch stream for key|cancel watch connection success)/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deprecated test. etcd watch is a single long connection now.

@@ -18,5 +18,5 @@


export OPENRESTY_VERSION=source
export TEST_CI_USE_GRPC=true
#export TEST_CI_USE_GRPC=true
Copy link
Contributor Author

@kingluo kingluo May 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to start focusing on etcd's http tests.

@kingluo kingluo marked this pull request as ready for review May 13, 2023 16:08
watch_ctx.wait_init = nil

local opts = {}
opts.timeout = 50 -- second
Copy link
Contributor Author

@kingluo kingluo May 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The choice of 50 seconds is to make it smaller than the default proxy_read_timeout value, 60 seconds, so that nginx will not print error logs, such as:

2023/05/10 23:43:44 [error] 3668019#3668019: *809 upstream timed out (110: Connection timed out) while reading upstream, client: unix:, server: , request: "POST /v3/watch HTTP/1.1", upstream: "http://127.0.0.1:2379/v3/watch", host: "127.0.0.1"

Comment on lines 190 to 202
if err ~= "closed" and
err ~= "timeout" and
err ~= "broken pipe" then
log.error("wait watch event: ", err)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err ~= "closed" and
err ~= "timeout" and
err ~= "broken pipe" then
log.error("wait watch event: ", err)
end
if err ~= "closed" and
err ~= "timeout" and
err ~= "broken pipe"
then
log.error("wait watch event: ", err)
end

end
end

for i = 1,min_idx-1 do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for i = 1,min_idx-1 do
for i = 1, min_idx-1 do


if min_idx > 100 then
for k, idx in pairs(watch_ctx.idx) do
watch_ctx.idx[k] = idx-min_idx+1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
watch_ctx.idx[k] = idx-min_idx+1
watch_ctx.idx[k] = idx - min_idx + 1

watch_ctx.idx[k] = idx-min_idx+1
end
-- trim the res table
for i = 1,min_idx-1 do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for i = 1,min_idx-1 do
for i = 1, min_idx - 1 do

res, err = res_func()
end
::iterate_events::
for i = watch_ctx.idx[key],#watch_ctx.res do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for i = watch_ctx.idx[key],#watch_ctx.res do
for i = watch_ctx.idx[key], #watch_ctx.res do

end
::iterate_events::
for i = watch_ctx.idx[key],#watch_ctx.res do
watch_ctx.idx[key] = i+1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
watch_ctx.idx[key] = i+1
watch_ctx.idx[key] = i + 1

@monkeyDluffy6017
Copy link
Contributor

Why do you call this "long http connection", the http_cli will be recreated every time: https://github.com/api7/lua-resty-etcd/blob/master/lib/resty/etcd/v3.lua#L803?

@kingluo
Copy link
Contributor Author

kingluo commented May 19, 2023

Why do you call this "long http connection", the http_cli will be recreated every time: https://github.com/api7/lua-resty-etcd/blob/master/lib/resty/etcd/v3.lua#L803?

        ::watch_event::
        while true do
            local res, err = res_func()
            log.info("res_func: ", inspect(res))

Please read the code carefully.
Once the http connection for watching is created, it keeps receiving the watch event chunks, unless it is timeout or compacted.
So is it long or not?


::watch_event::
while true do
local res, err = res_func()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very good design here, the server response is a stream, if we don't close the connection here, we could get the response event one by one

Comment on lines +116 to +118
if log_level >= NGX_INFO then
log.info("append res: ", inspect(res), ", err: ", inspect(err))
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use json.delay_encode here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the delay stuff has bug:

apisix/apisix/core/json.lua

Lines 116 to 120 in a943c03

function _M.delay_encode(data, force)
delay_tab.data = data
delay_tab.force = force
return delay_tab
end

It only uses a singleton table to log, but here I need two vars to log.
And inspect is more informational than json for debugging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants