Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(csv): enable AllNamespaces install mode #514

Merged
merged 1 commit into from
Jan 16, 2023

Conversation

ebaron
Copy link
Member

@ebaron ebaron commented Jan 13, 2023

The implementation change is just setting the install mode to true. I've updated the tests to create two Cryostat instances in different namespaces and check that the proper resources are created for each.

I created a catalog image to test this change. You can use it by creating the following catalog source.

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: cryostat-all-ns
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/ebaron/cryostat-operator-catalog:all-ns-01
  displayName: Cryostat Test
  publisher: grpc

The all namespaces install mode should be set as default:
Screenshot 2023-01-13 at 18-06-42 Operator Installation · Red Hat OpenShift

Try creating Cryostat CRs in different namespaces:
Screenshot 2023-01-13 at 18-05-53 cryostat-operator v2 3 0-dev · Details · Red Hat OpenShift

I noticed some difficulty with Cryostat missing targets. I'm not sure if that's related to this PR or not.

Fixes: #502

@ebaron ebaron added the feat New feature or request label Jan 13, 2023
@ebaron ebaron requested review from andrewazores and tthvo January 13, 2023 23:10
Copy link
Member

@tthvo tthvo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! Worked as expected :D Just some comments with tests.

I also noticed the issue that Cryostat was missing some targets. But by looking at the logs, Cryostat seems to detect new target but no notification was sent to web-console (using make sample_app).

I also observe an odd pattern: If I deploy a sample app (e.g. vertx), Cryostat might not send notifications. But if I deploy another one (e.g. quarkus), notifications are sent correctly and both targets are shown correctly...

INFO: Observing new target: io.cryostat.platform.ServiceRef@2ccba08d[alias=vertx-fib-demo-97487cc5-56zpx,annotations=io.cryostat.platform.ServiceRef$Annotations@77912c1c[cryostat={HOST=10.128.2.48, PORT=9093, NAMESPACE=cryostat-test2, POD_NAME=vertx-fib-demo-97487cc5-56zpx, REALM=KubernetesApi},platform={k8s.v1.cni.cncf.io/networks-status=[{
    "name": "openshift-sdn",
    "interface": "eth0",
    "ips": [
        "10.128.2.48"
    ],
    "default": true,
    "dns": {}
}], openshift.io/generated-by=OpenShiftNewApp, openshift.io/scc=restricted-v2, seccomp.security.alpha.kubernetes.io/pod=runtime/default, k8s.v1.cni.cncf.io/network-status=[{
    "name": "openshift-sdn",
    "interface": "eth0",
    "ips": [
        "10.128.2.48"
    ],
    "default": true,
    "dns": {}
}]}],jvmId=<null>,labels={pod-template-hash=97487cc5, deployment=vertx-fib-demo},serviceUri=service:jmx:rmi:///jndi/rmi://10.128.2.48:9093/jmxrmi]
Jan 16, 2023 4:02:10 AM io.cryostat.core.log.Logger info
INFO: Creating connection for service:jmx:rmi:///jndi/rmi://10.128.2.48:9093/jmxrmi
Jan 16, 2023 4:02:10 AM io.cryostat.core.log.Logger info
INFO: connection attempt failed.

Jan 16, 2023 4:02:10 AM io.cryostat.core.log.Logger info
INFO: Connection for service:jmx:rmi:///jndi/rmi://10.128.2.48:9093/jmxrmi closed
Jan 16, 2023 4:02:10 AM io.cryostat.core.log.Logger error
SEVERE: Exception thrown
org.openjdk.jmc.rjmx.ConnectionException caused by java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 10.128.2.48; nested exception is: 
	java.net.ConnectException: Connection refused]
	at org.openjdk.jmc.rjmx.internal.RJMXConnection.connect(RJMXConnection.java:441)
	at io.cryostat.core.net.JFRJMXConnection.attemptConnect(JFRJMXConnection.java:314)
	at io.cryostat.core.net.JFRJMXConnection.connect(JFRJMXConnection.java:270)
	at io.cryostat.core.net.JFRJMXConnection.getJvmId(JFRJMXConnection.java:176)
	at io.cryostat.net.TargetConnectionManager.lambda$executeConnectedTaskAsync$2(TargetConnectionManager.java:153)
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
	at java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Caused by: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 10.128.2.48; nested exception is: 
	java.net.ConnectException: Connection refused]
	at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:370)
	at org.openjdk.jmc.rjmx.internal.RJMXConnection.connectJmxConnector(RJMXConnection.java:487)
	at org.openjdk.jmc.rjmx.internal.RJMXConnection.establishConnection(RJMXConnection.java:464)
	at org.openjdk.jmc.rjmx.internal.RJMXConnection.connect(RJMXConnection.java:434)
	... 11 more
Caused by: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 10.128.2.48; nested exception is: 
	java.net.ConnectException: Connection refused]
	at jdk.naming.rmi/com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:138)
	at java.naming/com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:220)
	at java.naming/javax.naming.InitialContext.lookup(InitialContext.java:409)
	at java.management.rmi/javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1839)
	at java.management.rmi/javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1813)
	at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:302)
	... 14 more
Caused by: java.rmi.ConnectException: Connection refused to host: 10.128.2.48; nested exception is: 
	java.net.ConnectException: Connection refused
	at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:626)
	at java.rmi/sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:217)
	at java.rmi/sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:204)
	at java.rmi/sun.rmi.server.UnicastRef.newCall(UnicastRef.java:344)
	at java.rmi/sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:116)
	at jdk.naming.rmi/com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:134)
	... 19 more
Caused by: java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.Net.connect0(Native Method)
	at java.base/sun.nio.ch.Net.connect(Net.java:579)
	at java.base/sun.nio.ch.Net.connect(Net.java:568)
	at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:588)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
	at java.base/java.net.Socket.connect(Socket.java:633)
	at java.base/java.net.Socket.connect(Socket.java:583)
	at java.base/java.net.Socket.<init>(Socket.java:507)
	at java.base/java.net.Socket.<init>(Socket.java:287)
	at java.rmi/sun.rmi.transport.tcp.TCPDirectSocketFactory.createSocket(TCPDirectSocketFactory.java:40)
	at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:620)
	... 24 more

@andrewazores
Copy link
Member

I wonder if that could be a timing issue. Cryostat is using an Informer to get push notiifcations from the k8s API server when Endpoints objects appear, which the discovery mechanism uses to correlate to target JVMs. When a target appears (from any discovery plugin) Cryostat tries to connect to it immediately in order to verify that it's really there/reachable and to collect its JVM hash ID. Maybe k8s is informing Cryostat about the Endpoints object appearing sooner than the sample app JVM is actually ready to accept incoming JMX connections. Later, when you add a second one, Cryostat discovers the new one as well as re-discovers the old one, both of which it has no JVM hash ID for, so it tries to connect and this time succeeds?

I wonder if this could be tested by setting up liveness/readiness probes on the sample apps. Maybe k8s won't create and advertise the corresponding Endpoints objects until the container is actually ready to accept connections in that case. If the vertx/quarkus webserver is up and responding to probes then presumably the JVM is already ready to accept JMX connections as well.

@tthvo
Copy link
Member

tthvo commented Jan 16, 2023

Ahh seems to be the problem...This issue wasn't reliably reproduced so that it makes sense. So, this means jvms must set up/customize their readiness probes? Or is there anything Cryostat can do (maybe some fallback to connect again later?)

@andrewazores
Copy link
Member

It's good practice for containers to have liveness and readiness probes set up regardless, so I think in documentation where we explain to users how to set up their applications for Cryostat compatibility in k8s we should just emphasize the point that the probes should be set up in order to make sure they don't fall into this timing problem. This way it's handled by standard platform mechanisms and doesn't add extra complexity to the Cryostat application layer, particularly because this fault would be specific to only some discovery mechanisms.

Copy link
Member

@tthvo tthvo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! I think target-missing issue is not related to this PR then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat New feature or request
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Support AllNamespaces install mode
3 participants