feat(csv): enable AllNamespaces install mode #514
Conversation
Looks great to me! Worked as expected :D Just left some comments about the tests.

I also noticed the issue of Cryostat missing some targets. Looking at the logs, Cryostat seems to detect the new target, but no notification is sent to the web console (deployed using `make sample_app`).

I also observe an odd pattern: if I deploy a sample app (e.g. vertx), Cryostat might not send notifications. But if I then deploy another one (e.g. quarkus), notifications are sent correctly and both targets show up correctly...
INFO: Observing new target: io.cryostat.platform.ServiceRef@2ccba08d[alias=vertx-fib-demo-97487cc5-56zpx,annotations=io.cryostat.platform.ServiceRef$Annotations@77912c1c[cryostat={HOST=10.128.2.48, PORT=9093, NAMESPACE=cryostat-test2, POD_NAME=vertx-fib-demo-97487cc5-56zpx, REALM=KubernetesApi},platform={k8s.v1.cni.cncf.io/networks-status=[{
"name": "openshift-sdn",
"interface": "eth0",
"ips": [
"10.128.2.48"
],
"default": true,
"dns": {}
}], openshift.io/generated-by=OpenShiftNewApp, openshift.io/scc=restricted-v2, seccomp.security.alpha.kubernetes.io/pod=runtime/default, k8s.v1.cni.cncf.io/network-status=[{
"name": "openshift-sdn",
"interface": "eth0",
"ips": [
"10.128.2.48"
],
"default": true,
"dns": {}
}]}],jvmId=<null>,labels={pod-template-hash=97487cc5, deployment=vertx-fib-demo},serviceUri=service:jmx:rmi:///jndi/rmi://10.128.2.48:9093/jmxrmi]
Jan 16, 2023 4:02:10 AM io.cryostat.core.log.Logger info
INFO: Creating connection for service:jmx:rmi:///jndi/rmi://10.128.2.48:9093/jmxrmi
Jan 16, 2023 4:02:10 AM io.cryostat.core.log.Logger info
INFO: connection attempt failed.
Jan 16, 2023 4:02:10 AM io.cryostat.core.log.Logger info
INFO: Connection for service:jmx:rmi:///jndi/rmi://10.128.2.48:9093/jmxrmi closed
Jan 16, 2023 4:02:10 AM io.cryostat.core.log.Logger error
SEVERE: Exception thrown
org.openjdk.jmc.rjmx.ConnectionException caused by java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 10.128.2.48; nested exception is:
java.net.ConnectException: Connection refused]
at org.openjdk.jmc.rjmx.internal.RJMXConnection.connect(RJMXConnection.java:441)
at io.cryostat.core.net.JFRJMXConnection.attemptConnect(JFRJMXConnection.java:314)
at io.cryostat.core.net.JFRJMXConnection.connect(JFRJMXConnection.java:270)
at io.cryostat.core.net.JFRJMXConnection.getJvmId(JFRJMXConnection.java:176)
at io.cryostat.net.TargetConnectionManager.lambda$executeConnectedTaskAsync$2(TargetConnectionManager.java:153)
at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
at java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Caused by: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 10.128.2.48; nested exception is:
java.net.ConnectException: Connection refused]
at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:370)
at org.openjdk.jmc.rjmx.internal.RJMXConnection.connectJmxConnector(RJMXConnection.java:487)
at org.openjdk.jmc.rjmx.internal.RJMXConnection.establishConnection(RJMXConnection.java:464)
at org.openjdk.jmc.rjmx.internal.RJMXConnection.connect(RJMXConnection.java:434)
... 11 more
Caused by: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 10.128.2.48; nested exception is:
java.net.ConnectException: Connection refused]
at jdk.naming.rmi/com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:138)
at java.naming/com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:220)
at java.naming/javax.naming.InitialContext.lookup(InitialContext.java:409)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1839)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1813)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:302)
... 14 more
Caused by: java.rmi.ConnectException: Connection refused to host: 10.128.2.48; nested exception is:
java.net.ConnectException: Connection refused
at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:626)
at java.rmi/sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:217)
at java.rmi/sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:204)
at java.rmi/sun.rmi.server.UnicastRef.newCall(UnicastRef.java:344)
at java.rmi/sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:116)
at jdk.naming.rmi/com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:134)
... 19 more
Caused by: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.Net.connect0(Native Method)
at java.base/sun.nio.ch.Net.connect(Net.java:579)
at java.base/sun.nio.ch.Net.connect(Net.java:568)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:588)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
at java.base/java.net.Socket.connect(Socket.java:633)
at java.base/java.net.Socket.connect(Socket.java:583)
at java.base/java.net.Socket.<init>(Socket.java:507)
at java.base/java.net.Socket.<init>(Socket.java:287)
at java.rmi/sun.rmi.transport.tcp.TCPDirectSocketFactory.createSocket(TCPDirectSocketFactory.java:40)
at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:620)
... 24 more
I wonder if that could be a timing issue. Cryostat uses an Informer to get push notifications from the k8s API server when Endpoints objects appear, which the discovery mechanism correlates to target JVMs. When a target appears (from any discovery plugin), Cryostat tries to connect to it immediately in order to verify that it's really there/reachable and to collect its JVM hash ID. Maybe k8s is informing Cryostat about the Endpoints object sooner than the sample app JVM is actually ready to accept incoming JMX connections. Later, when you add a second app, Cryostat discovers the new one and re-discovers the old one, neither of which it has a JVM hash ID for, so it tries to connect to both and this time succeeds?

I wonder if this could be tested by setting up liveness/readiness probes on the sample apps. In that case, k8s may not create and advertise the corresponding Endpoints objects until the container is actually ready to accept connections. If the vertx/quarkus webserver is up and responding to probes, then presumably the JVM is already ready to accept JMX connections as well.
Ah, that seems to be the problem... This issue wasn't reliably reproducible, so that makes sense. So, does this mean JVMs must set up/customize their readiness probes? Or is there anything Cryostat can do (maybe some fallback to connect again later)?
It's good practice for containers to have liveness and readiness probes set up regardless, so in the documentation where we explain how users should set up their applications for Cryostat compatibility in k8s, we should emphasize that probes should be configured to make sure they don't fall into this timing problem. This way it's handled by standard platform mechanisms and doesn't add extra complexity to the Cryostat application layer, particularly because this fault would be specific to only some discovery mechanisms.
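As a sketch of the suggestion above, a readiness probe on a sample app's Deployment might look like the following. The container name, HTTP port, and probe path are placeholders (only the 9093 JMX port appears in the logs in this thread); the exact values depend on the application.

```yaml
# Hypothetical readiness probe for a sample app Deployment.
# Port 8080 and path / are assumptions; 9093 is the JMX port from the logs.
spec:
  template:
    spec:
      containers:
        - name: vertx-fib-demo   # placeholder container name
          ports:
            - containerPort: 8080
            - containerPort: 9093
          readinessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
```

With a probe like this, k8s should only publish the pod's address in the Endpoints object once the webserver responds, by which point the JVM presumably also accepts JMX connections.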
Looks great to me! I think the target-missing issue is not related to this PR, then.
The implementation change is just setting the AllNamespaces install mode's `supported` flag to `true`. I've updated the tests to create two Cryostat instances in different namespaces and check that the proper resources are created for each.

I created a catalog image to test this change. You can use it by creating the following catalog source.
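The actual catalog image referenced in the comment isn't shown in this excerpt, but such a CatalogSource would look roughly like this (the name and image below are placeholders, not the real ones from the PR):

```yaml
# Sketch of a test CatalogSource; metadata.name and spec.image are placeholders.
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: cryostat-operator-test        # placeholder name
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/example/cryostat-operator-catalog:latest  # placeholder image
  displayName: Cryostat Operator Test Catalog
```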
The AllNamespaces install mode should be selected as the default:
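In OLM terms, this corresponds to the `installModes` list in the ClusterServiceVersion marking `AllNamespaces` as supported. A sketch of the relevant excerpt (surrounding CSV fields omitted; the `supported` values for the other modes are assumptions, not taken from this PR):

```yaml
# Excerpt of the ClusterServiceVersion after this change (sketch).
spec:
  installModes:
    - type: OwnNamespace
      supported: true     # assumption
    - type: SingleNamespace
      supported: true     # assumption
    - type: MultiNamespace
      supported: false    # assumption
    - type: AllNamespaces
      supported: true     # enabled by this PR
```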
Try creating Cryostat CRs in different namespaces:
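For example, two minimal Cryostat CRs in separate namespaces might look like the following sketch (the API version and spec fields are assumptions based on typical Cryostat CRs, not copied from this PR):

```yaml
# Hypothetical Cryostat CRs in two different namespaces.
apiVersion: operator.cryostat.io/v1beta1   # assumed API version
kind: Cryostat
metadata:
  name: cryostat-sample
  namespace: cryostat-test1
spec:
  minimal: false   # assumed spec field
---
apiVersion: operator.cryostat.io/v1beta1
kind: Cryostat
metadata:
  name: cryostat-sample
  namespace: cryostat-test2
spec:
  minimal: false
```

With the AllNamespaces install mode, a single operator instance should reconcile both CRs and create the proper resources in each namespace.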
I noticed some difficulty with Cryostat missing targets. I'm not sure if that's related to this PR or not.
Fixes: #502