BB-466 Dealing with expired zookeeper sessions #2471
Hello nicolas2bert,
My role is to assist you with the merge of this
Status report is not available.
ping
Request integration branches
Waiting for integration branch creation to be requested by the user. To request integration branches, please comment on this pull request with the following command:
Alternatively, the
/create_integration_branches
Conflict
A conflict has been raised during the creation of
I have not created the integration branch. Here are the steps to resolve this conflict:
$ git fetch
$ git checkout -B w/7.70/bugfix/BB-466/zookeeper-expired-session origin/development/7.70
$ git merge origin/bugfix/BB-466/zookeeper-expired-session
$ # <intense conflict resolution>
$ git commit
$ git push -u origin w/7.70/bugfix/BB-466/zookeeper-expired-session
The following options are set: create_integration_branches
const logger = new werelogs.Logger('NotificationConfigManager:test');
const zkConfigParentNode = 'config';
const concurrency = 10;
const bucketPrefix = 'bucket';
const timeoutMs = 100;

function mockZookeeperManager() {
Not a big deal, feel free to ignore, but this helper could be factored out as it's used in more than 1 place.
zkClient.close();
zkClient.on('disconnected', () => done());
Not sure how often this would happen, but I think there's a risk of test flakiness when attaching to an event after triggering it, depending on how async `close` is.
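The ordering concern can be illustrated with a minimal sketch. `FakeClient` is a hypothetical stand-in, not the real ZooKeeper client: it emits `disconnected` synchronously from `close()`, which is the worst case for a listener attached afterwards. How the real client schedules the event is not shown in this thread.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for a ZooKeeper client (assumption): close()
// emits 'disconnected' synchronously, the worst case for listener ordering.
class FakeClient extends EventEmitter {
    close() {
        this.emit('disconnected');
    }
}

// Attach the listener first, then trigger the event.
// Returns true if the listener observed 'disconnected'.
function listenThenClose(client) {
    let seen = false;
    client.once('disconnected', () => { seen = true; });
    client.close();
    return seen;
}

// Trigger the event first, then attach the listener (the order used in
// the test above). With a synchronous emit the event is missed.
function closeThenListen(client) {
    let seen = false;
    client.close();
    client.once('disconnected', () => { seen = true; });
    return seen;
}
```

Attaching the listener before calling `close()` works whether the emit is synchronous or deferred, so it removes the dependency on the client's internals.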
_connect() {
    // clean up exists client before reconnect
    if (this.client) {
        this.client.removeAllListeners();
Should we call `this.client.close()` before removing the listeners?
We already close it here: https://github.com/scality/backbeat/pull/2471/files#diff-ab8f069d2d46a96dbf066330d2185a0e4705edacbc2cb222cfc40abf4511b9d3R91
but moving `this.client.close()` here might make more sense, to make it more generic.
lib/clients/ZookeeperManager.js
Outdated
}
// NO_NODE error and autoCreateNamespace is enabled
if (err) {
    const nsIndex = this.connectionString.indexOf('/');
The connection string may not always contain a namespace. Maybe in our products we always have one, but to be more generic it would be better to handle the case where there's no namespace.
    this.client.removeAllListeners();
}

this.client = zookeeper.createClient(this.connectionString, this.options);
Structurally, I think we should turn the whole construct into an async series or waterfall to avoid nesting too many levels. We would then also be able to handle success/error in a single place at the end of the series and emit the appropriate `ready` or `error` event.
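A rough sketch of the shape suggested here, under stated assumptions: `series` is a small hand-rolled stand-in for a series helper such as the one in the `async` package, and `connectFlow` and its step functions are hypothetical names, not code from this PR.

```javascript
const { EventEmitter } = require('events');

// Minimal stand-in for a "series" helper (assumption): run callback-style
// tasks in order and stop at the first error.
function series(tasks, done) {
    let i = 0;
    const next = err => {
        if (err || i === tasks.length) {
            return done(err || null);
        }
        const task = tasks[i++];
        return task(next);
    };
    next();
}

// Hypothetical connect flow: each step is a function taking a callback.
// Success and failure are handled in one place, at the end of the series.
function connectFlow(emitter, steps) {
    series(steps, err => {
        if (err) {
            emitter.emit('error', err);
        } else {
            emitter.emit('ready');
        }
    });
}
```

With this layout, adding a step (for instance, creating the namespace before connecting) means adding one function to the list rather than one more level of nesting.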
 * @param {Function} callback The callback function.
 * @return {undefined}
 */
create(path, data, acls, mode, callback) {
I'm not too fond of having to reproduce the original API of the zookeeper client: it adds extra code and a maintenance burden, notably if we have to use new functions later.
Would it be better to extend the zookeeper client itself in this case, so that we still manipulate a zookeeper client but with extra logic inside to deal with session expiration etc.? It could cause issues of its own, so I'm not sure. Not blocking for me, just mentioning that this doesn't look like an ideal solution.
I'm not too sure about extending the zk client itself; the client we use is pretty antiquated and not super well maintained, so we may want to migrate at some point to a more modern client (`async` based?), and having `ZookeeperManager` be an abstraction layer can help make the transition smoother.
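To make the trade-off in this thread concrete, here is a minimal sketch of the wrapper approach. `ClientWrapper`, `renew` and the `createClient` factory are hypothetical names for illustration, not the PR's actual `ZookeeperManager`; only the `create(path, data, acls, mode, callback)` signature comes from the diff above.

```javascript
// Hypothetical wrapper (assumption, not the PR's ZookeeperManager):
// callers hold the wrapper, so the underlying client can be replaced
// without them noticing.
class ClientWrapper {
    constructor(createClient) {
        // createClient is an assumed factory returning a fresh client.
        this._createClient = createClient;
        this.client = createClient();
    }

    // Replace the underlying client, e.g. after a session expiration.
    renew() {
        this.client = this._createClient();
    }

    // Forward one method of the wrapped API. Every method callers need
    // has to be forwarded by hand like this, which is the maintenance
    // cost raised in the review.
    create(path, data, acls, mode, callback) {
        return this.client.create(path, data, acls, mode, callback);
    }
}
```

The cost is one forwarding method per call; the benefit is that swapping the client implementation later only touches the wrapper.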
if (this.options && this.options.autoCreateNamespace) {
    nsIndex = this.connectionString.indexOf('/');
    namespace = this.connectionString.slice(nsIndex);
    if (nsIndex > -1 && namespace !== '/') {
The connection string could look like `1.2.3.4:2181`; I don't think we would even see a `/` in this case.
This has been tested already: https://github.com/scality/backbeat/pull/2471/files#diff-5fc9cf9601779e975b71eba0b498d453274fe5807e82c8ad97850b43eb3e2098R33
Not sure what you meant.
Ah yes, I missed the `nsIndex > -1` check. We precompute the `namespace` without checking whether there is one, but in this case calling `slice(-1)` should not cause issues: it returns "garbage" in `namespace`, which we then don't use. It might be slightly cleaner to compute `namespace` only if `nsIndex > -1`, but it's minor and not blocking for me.
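The "slightly cleaner" variant mentioned here could look like the sketch below. `splitConnectionString` is a hypothetical helper name, not a function from the PR. For reference, `'1.2.3.4:2181'.slice(-1)` evaluates to the last character of the string, which is the "garbage" value discussed above.

```javascript
// Hypothetical helper (assumption): split a ZooKeeper connection string
// into its host list and optional namespace (chroot path).
function splitConnectionString(connectionString) {
    const nsIndex = connectionString.indexOf('/');
    if (nsIndex === -1) {
        // No '/' at all, e.g. '1.2.3.4:2181': there is no namespace.
        return { hosts: connectionString, namespace: null };
    }
    // Only slice once we know a '/' is present.
    const namespace = connectionString.slice(nsIndex);
    return {
        hosts: connectionString.slice(0, nsIndex),
        // A bare trailing '/' is treated as "no namespace".
        namespace: namespace === '/' ? null : namespace,
    };
}
```

Callers can then test `namespace !== null` instead of combining an index check with a string comparison.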
if (err && err.name !== 'NODE_EXISTS') {
    return next({ event: 'error', err });
}
return next({ event: 'ready' });
Do you need to disconnect/close the `rootZkClient` once the path exists?
Good catch. Thanks!
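One possible shape for that fix, as a hedged sketch: `ensureNamespace` is a hypothetical helper, and `rootClient` is assumed to expose `mkdirp(path, callback)` and `close()`. The `NODE_EXISTS` check mirrors the snippet above; the fix actually applied in the PR may differ.

```javascript
// Hypothetical helper (assumption): create the namespace path using a
// temporary client connected at the root, and always close that client
// once the attempt is over, whether it succeeded or not.
function ensureNamespace(rootClient, namespace, callback) {
    rootClient.mkdirp(namespace, err => {
        // The temporary root client is no longer needed past this point.
        rootClient.close();
        if (err && err.name !== 'NODE_EXISTS') {
            return callback(err);
        }
        // Path was created, or it already existed.
        return callback(null);
    });
}
```

Closing in the callback rather than after each branch keeps the cleanup in one place, so an error path cannot leak the connection.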
Conflict
A conflict has been raised during the creation of
I have not created the integration branch. Here are the steps to resolve this conflict:
$ git fetch
$ git checkout -B w/8.5/bugfix/BB-466/zookeeper-expired-session origin/development/8.5
$ git merge origin/w/7.70/bugfix/BB-466/zookeeper-expired-session
$ # <intense conflict resolution>
$ git commit
$ git push -u origin w/8.5/bugfix/BB-466/zookeeper-expired-session
The following options are set: create_integration_branches
Integration data created
I have created the integration data for the additional destination branches.
The following branches will NOT be impacted:
You can set option
The following options are set: create_integration_branches
Waiting for approval
The following approvals are needed before I can proceed with the merge:
The following options are set: create_integration_branches
@bert-e approve
In the queue
The changeset has received all authorizations and has been added to the
The changeset will be merged in:
The following branches will NOT be impacted:
There is no action required on your side. You will be notified here once
IMPORTANT: Please do not attempt to modify this pull request.
If you need this pull request to be removed from the queue, please contact a
The following options are set: approve, create_integration_branches
I have successfully merged the changeset of this pull request
The following branches have NOT changed:
Please check the status of the associated issue BB-466. Goodbye nicolas2bert.
Introducing the ZookeeperManager Class, which offers:
Connection Management: This class efficiently manages connections to the ZooKeeper server. It ensures continuity by handling reconnections whenever the session expires.
Error and State Management: The class handles errors and meticulously logs state changes. It is essential for debugging.
Abstraction Layer: It creates an abstraction over the database-specific code, simplifying the integration of application logic. This layer not only facilitates an easier transition to different databases in the future but also aids in simplifying the mocking and testing processes.
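As an illustration of the reconnection behaviour described above, here is a minimal sketch. `ReconnectingManager` and the `createClient` factory are hypothetical names, not the PR's implementation. The `connected` and `expired` event names are assumed to match those emitted by the underlying ZooKeeper client, and the client is assumed to expose `connect()` and `close()`.

```javascript
const { EventEmitter } = require('events');

// Hypothetical sketch (assumption, not the PR's ZookeeperManager):
// recreate the underlying client whenever its session expires.
class ReconnectingManager extends EventEmitter {
    constructor(createClient) {
        super();
        // createClient is an assumed factory returning a fresh client.
        this._createClient = createClient;
        this.client = null;
    }

    connect() {
        // Clean up the existing client before reconnecting, so stale
        // listeners cannot fire and the old connection is released.
        if (this.client) {
            this.client.removeAllListeners();
            this.client.close();
        }
        this.client = this._createClient();
        this.client.once('connected', () => this.emit('ready'));
        // An expired session cannot be resumed: start over with a new client.
        this.client.once('expired', () => this.connect());
        this.client.connect();
    }
}
```

Callers listen for `ready` on the manager rather than on the client, so they do not need to re-attach anything after a reconnection.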
Why isn't this issue present in Artesca?
Artesca uses Kubernetes as its orchestrator, which uses liveness probes to check if containers are operating correctly. Zookeeper state is part of the liveness probe of the Backbeat services. If a liveness probe fails, Kubernetes will understand that the application in the container is not functioning properly. In response, Kubernetes typically restarts the problematic container to try to resolve the issue.
This logic is not present in S3C/Federation.
However, to ensure consistency between the two projects, this change will be forward-ported to Artesca.
TODO: while forward-porting, make sure all Zookeeper clients use the new ZookeeperManager class.