HMC 3.12.12 keeps dying with OOM Java Heap Space after some time when Hazelcast works with large number of topics #44

Open
spliakos opened this issue Oct 14, 2020 · 3 comments

@spliakos

kctl describe pod infra-hmc-57f77bd495-czr54
Name: infra-hmc-57f77bd495-czr54
Namespace: default
Priority: 0
Node:
Start Time: Wed, 14 Oct 2020 09:21:35 +0200
Labels: appName=hmc
pod-template-hash=57f77bd495
version=3.12.12
Annotations: kubectl.kubernetes.io/restartedAt: 2020-10-12T16:00:29+02:00
kubernetes.io/psp: restricted
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Running
IP:
IPs:
IP:
Controlled By: ReplicaSet/infra-hmc-57f77bd495
Containers:
hmc:
Container ID: docker://0a3a41bde2236a357c305485fbdf30811d395917b388b8fd2235c2171574cef7
Image: hazelcast/management-center:3.12.12
Image ID: docker-pullable://hazelcast/management-center@sha256:bebce8775ec86718a7a4adef330254b63fd8c94d3becbeca34038b9b17341712
Ports: 8080/TCP, 8081/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Wed, 14 Oct 2020 13:31:52 +0200
Last State: Terminated
Reason: Error
Exit Code: 3
Started: Wed, 14 Oct 2020 13:14:17 +0200
Finished: Wed, 14 Oct 2020 13:31:50 +0200
Ready: True
Restart Count: 14
Requests:
memory: 4Gi
Environment:
JAVA_OPTS: -Dhazelcast.mc.healthCheck.enable=true -Dhazelcast.mc.allowMultipleLogin=true -XX:+ExitOnOutOfMemoryError
MC_ADMIN_USER:
MC_ADMIN_PASSWORD:
CONTAINER_SUPPORT: false
MIN_HEAP_SIZE: 1024m
MAX_HEAP_SIZE: 4096m
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-4glt8 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-4glt8:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-4glt8
Optional: false
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Normal Pulling 15m (x15 over 4h25m) kubelet Pulling image "hazelcast/management-center:3.12.12"
Normal Pulled 15m (x15 over 4h25m) kubelet Successfully pulled image "hazelcast/management-center:3.12.12"
Normal Created 15m (x15 over 4h25m) kubelet Created container hmc
Normal Started 15m (x15 over 4h25m) kubelet Started container hmc

Logs:
kctl logs -f infra-hmc-57f77bd495-czr54
########################################

JAVA_OPTS=-Dhazelcast.mancenter.home=/data -Djava.net.preferIPv4Stack=true -Dhazelcast.mc.healthCheck.enable=true -Dhazelcast.mc.allowMultipleLogin=true -XX:+ExitOnOutOfMemoryError -Xms1024m -Xmx4096m

MC_CLASSPATH=/opt/hazelcast/mancenter/hazelcast-mancenter-3.12.12.war

starting now....

########################################

+ exec java --add-opens java.base/java.lang=ALL-UNNAMED -server -Dhazelcast.mancenter.home=/data -Djava.net.preferIPv4Stack=true -Dhazelcast.mc.healthCheck.enable=true -Dhazelcast.mc.allowMultipleLogin=true -XX:+ExitOnOutOfMemoryError -Xms1024m -Xmx4096m -cp /opt/hazelcast/mancenter/hazelcast-mancenter-3.12.12.war Launcher 8080 8443 hazelcast-mancenter
2020-10-14 11:31:52 [main] INFO c.h.webmonitor.config.BuildInfo - Management Center 3.12.12
2020-10-14 11:31:52 [main] INFO Launcher - Health check is enabled and available at http://localhost:8081/hazelcast-mancenter/health
2020-10-14 11:31:56 [main] INFO c.h.w.storage.DiskUsageMonitor - Monitoring /data [mode=purge, interval=1000ms, limit=512 MB]
2020-10-14 11:31:56 [main] INFO c.h.webmonitor.config.SqlDbConfig - Checking DB for required migrations.
2020-10-14 11:31:56 [main] INFO c.h.webmonitor.config.SqlDbConfig - Number of applied DB migrations: 2.
2020-10-14 11:31:56 [main] INFO c.h.w.s.s.impl.DisableLoginStrategy - Login will be disabled for 5 seconds after 3 failed login attempts. For every 3 consecutive failed login attempts, disable period will be multiplied by 10.
2020-10-14 11:31:57 [main] INFO c.h.webmonitor.config.AppConfig - Creating cache with maxSize=768
2020-10-14 11:31:58 [main] INFO Launcher - Hazelcast Management Center successfully started at http://localhost:8080/hazelcast-mancenter
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.gson.internal.ConstructorConstructor (file:/opt/hazelcast/mancenter/hazelcast-mancenter-3.12.12.war) to constructor java.util.Collections$EmptyMap()
WARNING: Please consider reporting this to the maintainers of com.google.gson.internal.ConstructorConstructor
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Terminating due to java.lang.OutOfMemoryError: Java heap space

3 Hazelcast nodes, 8 maps, ~85000 topics
In other environments with a smaller number of topics, HMC keeps working and does not crash. But once the number of topics reaches 10k+, we hit the same situation over and over.

@erosb
Contributor

erosb commented Oct 14, 2020

Hello,

I suggest setting the hazelcast.mc.cache.max.size system property to a value lower than the default of 768. It limits the number of timestamped cluster states stored in memory. I can't advise an exact setting, because it depends on cluster size and we have never stress-tested Management Center with a high number of topics, but we do have some reference data for clusters with many maps as a starting point. Topic states are expected to take much less space than map stats, though.
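
A minimal sketch of how the property could be passed in this deployment, assuming the container's JAVA_OPTS environment variable is the place to set it (it already carries the other -D flags in the pod description above). The value 128 is purely illustrative, not a recommendation; the remaining flags are copied from the issue.

```yaml
# Fragment of the HMC container spec; only JAVA_OPTS changes.
env:
  - name: JAVA_OPTS
    # ">-" folds the lines below into one space-separated string.
    value: >-
      -Dhazelcast.mc.healthCheck.enable=true
      -Dhazelcast.mc.allowMultipleLogin=true
      -XX:+ExitOnOutOfMemoryError
      -Dhazelcast.mc.cache.max.size=128
```

If the property is picked up, the startup log line "Creating cache with maxSize=..." should presumably report the new value instead of 768.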

@emre-aydin
Contributor

@spliakos it might make sense to disable statistics for some of your topics so that they don't flood Management Center with all their metrics. Note that you can also use regular expressions to apply the same config to more than one topic, or even change the default config while applying a specialized config to the topics you care about.
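
A sketch of what this could look like on the cluster members, assuming they run Hazelcast 3.12 or later (which accepts a YAML config file) and that topic statistics are controlled through statistics-enabled in the topic config. The topic name pattern "orders.*" is hypothetical. As far as I know, the default name matching treats * as a simple wildcard rather than as a full regular expression.

```yaml
# hazelcast.yaml on the members (sketch; topic names are hypothetical)
hazelcast:
  topic:
    # Default config: applies to every topic without a more specific match.
    default:
      statistics-enabled: false
    # Specialized config: keep statistics only for the topics you want to see in HMC.
    "orders.*":
      statistics-enabled: true
```

This is member-side configuration, so it would likely need a member restart (or rolling restart) to take effect.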

@spliakos
Author

Hey @erosb, I thought about changing the cache size, but according to Hazelcast:
"It is not recommended to change the cache size unless the cluster has a large number of maps which may cause Management Center to run out of heap memory. Setting too low a value for hazelcast.mc.cache.max.size can be detrimental to the level of detail shown within Management Center, especially when it comes to graphs."
We only have 8 maps, but a lot of topics. I will try it however and see how it goes.

@emre-aydin: This actually makes sense, we will try this and update :)
