fix(discovery): observed containers should be checked with persisted nodes #423
Branch updated from b44e7d0 to 5c94bb8.
Does the
Ahh, I was about to ask! Thanks, I will try that!
Ahh, I am seeing an odd violation during startup:
Looks good to me! It was very helpful. I tested this PR with these steps:
Okay, this should be fixed now thanks to the
Yeah, that was semi-intentional back when I first put all that together, just so I could test that Cryostat/the database would properly reject the definitions due to the duplicate URLs. I think by now that's already well-proven, so we can change it to do something different: give the containers different hostnames to be referenced by, or just turn off one of the discovery mechanisms in the smoketest by default so they don't overlap/collide.
It's a good question. I think from a user's perspective it would be much preferable for it to work a bit more slowly but discover as many of the targets as possible, versus being very fast but leaving me with no discovered targets at all if it runs into any errors. Anyway, if we do determine there are performance issues due to database accesses and the associated latency, Quarkus/Hibernate/JPA has a lot of powerful caching capabilities: https://quarkus.io/guides/hibernate-orm#caching so it seems like some judicious application of the
This means that if the informer invokes many worker threads, all of those threads will block waiting for the lock to perform their database transaction, which will be relatively slow because they're waiting on database I/O. I suggest another approach: have the callback (invoked by worker threads?) just add the event to a blocking queue, and have a dedicated thread take items from the queue as they become available and process each one in a transaction. This way only one thread spends much of its time waiting for database I/O before moving on to the next event, and the other threads are free to do other work. It would be a similar pattern to how I wrote the WebSocket MessagingServer:

- "event" (notification to be sent): https://github.com/cryostatio/cryostat3/blob/12b97bf7746a72a965dc168c23c4fae69ec19953/src/main/java/io/cryostat/ws/MessagingServer.java#L144
- task queue processor: https://github.com/cryostatio/cryostat3/blob/12b97bf7746a72a965dc168c23c4fae69ec19953/src/main/java/io/cryostat/ws/MessagingServer.java#L95

It is probably also possible to achieve this using the EventBus alone, without an explicit ExecutorService, with something like this pseudocode:
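For the first suggestion (callbacks only enqueue; one dedicated thread drains a blocking queue), here is a minimal sketch in plain Java. It is not the EventBus variant and not Cryostat's actual code: `DiscoveryEvent`, `DiscoveryEventProcessor`, and the `handler` callback are hypothetical names, and in the real service the handler would be where the database transaction runs.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical event type standing in for whatever the informer callback receives.
record DiscoveryEvent(String kind, String target) {}

// Hypothetical name: funnels events from many callback threads into one consumer thread.
class DiscoveryEventProcessor implements AutoCloseable {
    private final BlockingQueue<DiscoveryEvent> queue = new LinkedBlockingQueue<>();
    // Exactly one thread performs the slow, I/O-bound persistence work.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<DiscoveryEvent> handler;
    private volatile boolean running = true;

    DiscoveryEventProcessor(Consumer<DiscoveryEvent> handler) {
        this.handler = handler;
        worker.execute(this::drain);
    }

    // Called on informer/worker threads: enqueue and return immediately, no lock held.
    void onEvent(DiscoveryEvent event) {
        queue.add(event);
    }

    // Runs on the single worker thread, handling events in arrival (FIFO) order.
    private void drain() {
        while (running || !queue.isEmpty()) {
            DiscoveryEvent event;
            try {
                // Poll with a timeout so the loop can notice shutdown.
                event = queue.poll(100, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (event == null) {
                continue;
            }
            try {
                // In the real service this would be the transactional database write.
                handler.accept(event);
            } catch (RuntimeException e) {
                // Keep the consumer alive if one event fails.
                System.err.println("Failed to process " + event + ": " + e);
            }
        }
    }

    // Signals the worker to exit once the queue is empty, then waits for it to finish.
    @Override
    public void close() {
        running = false;
        worker.shutdown();
        try {
            if (!worker.awaitTermination(5, TimeUnit.SECONDS)) {
                worker.shutdownNow();
            }
        } catch (InterruptedException e) {
            worker.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

Callback threads only pay for a `queue.add`, and because a single thread consumes the queue, events for the same target are persisted in the order they were observed.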
Sounds good to me! I just need to look into it a little deeper, so I am marking this as a draft for now. I suppose we also want to apply these changes to kubeAPI too?
I think that makes sense.
Branch updated from 1f58984 to 6d7594e.
/build_test
Workflow started at 5/2/2024, 12:57:34 AM.
No OpenAPI schema changes detected.
No GraphQL schema changes detected.
CI build and push: All tests pass ✅ (JDK17)
Unfortunately, for this case, we still need the
Phew, it took me a while to realize that the
I think this should be working now? The changes are such that:
But it seems like when using
Another thing I notice with this eventBus approach is that the same message might be emitted multiple times. I think it's because
/build_test
Workflow started at 5/3/2024, 12:26:09 PM.
No OpenAPI schema changes detected.
No GraphQL schema changes detected.
CI build and push: All tests pass ✅ (JDK17)
/build_test
Workflow started at 5/3/2024, 2:37:27 PM.
No GraphQL schema changes detected.
No OpenAPI schema changes detected.
CI build and push: All tests pass ✅ (JDK17)
Welcome to Cryostat3! 👋
Before contributing, make sure you have:
main branch
[chore, ci, docs, feat, fix, test]
To recreate commits with GPG signature:
`git fetch upstream && git rebase --force --gpg-sign upstream/main`
Related to #420
Description of the change:
Motivation for the change:
See #420 (comment). This fixes a problem where, after Cryostat is restarted, observed targets cause a duplicate key violation and removed targets remain in the database (stale).
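Checking observed containers against persisted nodes boils down to two set differences. The sketch below is not the PR's code (which isn't shown here). It assumes each target is identified by a unique connect URL string, and `DiscoveryReconciler` and `ReconcilePlan` are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical result type: which targets to insert and which to delete.
record ReconcilePlan(Set<String> toPersist, Set<String> toDelete) {}

// Hypothetical helper: compares targets observed at startup with those already persisted.
final class DiscoveryReconciler {
    private DiscoveryReconciler() {}

    // Assumes each target is identified by a unique connect URL string.
    static ReconcilePlan plan(Set<String> observed, Set<String> persisted) {
        // Observed but not yet in the database: safe to insert without a duplicate key.
        Set<String> toPersist = new HashSet<>(observed);
        toPersist.removeAll(persisted);

        // In the database but no longer observed: stale, should be removed.
        Set<String> toDelete = new HashSet<>(persisted);
        toDelete.removeAll(observed);

        // Targets present in both sets are left untouched.
        return new ReconcilePlan(Set.copyOf(toPersist), Set.copyOf(toDelete));
    }
}
```

On restart, only the `toPersist` entries would be inserted, so a target that is already in the database no longer triggers a duplicate key violation, and the `toDelete` entries cover the stale targets.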
How to manually test