HMC 3.12.12 keeps dying with OOM Java Heap Space after some time when Hazelcast works with large number of topics #44

spliakos · 2020-10-14T12:00:51Z

kctl describe pod infra-hmc-57f77bd495-czr54
Name: infra-hmc-57f77bd495-czr54
Namespace: default
Priority: 0
Node:
Start Time: Wed, 14 Oct 2020 09:21:35 +0200
Labels: appName=hmc
pod-template-hash=57f77bd495
version=3.12.12
Annotations: kubectl.kubernetes.io/restartedAt: 2020-10-12T16:00:29+02:00
kubernetes.io/psp: restricted
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Running
IP:
IPs:
IP:
Controlled By: ReplicaSet/infra-hmc-57f77bd495
Containers:
hmc:
Container ID: docker://0a3a41bde2236a357c305485fbdf30811d395917b388b8fd2235c2171574cef7
Image: hazelcast/management-center:3.12.12
Image ID: docker-pullable://hazelcast/management-center@sha256:bebce8775ec86718a7a4adef330254b63fd8c94d3becbeca34038b9b17341712
Ports: 8080/TCP, 8081/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Wed, 14 Oct 2020 13:31:52 +0200
Last State: Terminated
Reason: Error
Exit Code: 3
Started: Wed, 14 Oct 2020 13:14:17 +0200
Finished: Wed, 14 Oct 2020 13:31:50 +0200
Ready: True
Restart Count: 14
Requests:
memory: 4Gi
Environment:
JAVA_OPTS: -Dhazelcast.mc.healthCheck.enable=true -Dhazelcast.mc.allowMultipleLogin=true -XX:+ExitOnOutOfMemoryError
MC_ADMIN_USER:
MC_ADMIN_PASSWORD:
CONTAINER_SUPPORT: false
MIN_HEAP_SIZE: 1024m
MAX_HEAP_SIZE: 4096m
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-4glt8 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-4glt8:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-4glt8
Optional: false
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message

Normal Pulling 15m (x15 over 4h25m) kubelet Pulling image "hazelcast/management-center:3.12.12"
Normal Pulled 15m (x15 over 4h25m) kubelet Successfully pulled image "hazelcast/management-center:3.12.12"
Normal Created 15m (x15 over 4h25m) kubelet Created container hmc
Normal Started 15m (x15 over 4h25m) kubelet Started container hmc

Logs:
kctl logs -f infra-hmc-57f77bd495-czr54
########################################

JAVA_OPTS=-Dhazelcast.mancenter.home=/data -Djava.net.preferIPv4Stack=true -Dhazelcast.mc.healthCheck.enable=true -Dhazelcast.mc.allowMultipleLogin=true -XX:+ExitOnOutOfMemoryError -Xms1024m -Xmx4096m

MC_CLASSPATH=/opt/hazelcast/mancenter/hazelcast-mancenter-3.12.12.war

starting now....

########################################

exec java --add-opens java.base/java.lang=ALL-UNNAMED -server -Dhazelcast.mancenter.home=/data -Djava.net.preferIPv4Stack=true -Dhazelcast.mc.healthCheck.enable=true -Dhazelcast.mc.allowMultipleLogin=true -XX:+ExitOnOutOfMemoryError -Xms1024m -Xmx4096m -cp /opt/hazelcast/mancenter/hazelcast-mancenter-3.12.12.war Launcher 8080 8443 hazelcast-mancenter
2020-10-14 11:31:52 [main] INFO c.h.webmonitor.config.BuildInfo - Management Center 3.12.12
2020-10-14 11:31:52 [main] INFO Launcher - Health check is enabled and available at http://localhost:8081/hazelcast-mancenter/health
2020-10-14 11:31:56 [main] INFO c.h.w.storage.DiskUsageMonitor - Monitoring /data [mode=purge, interval=1000ms, limit=512 MB]
2020-10-14 11:31:56 [main] INFO c.h.webmonitor.config.SqlDbConfig - Checking DB for required migrations.
2020-10-14 11:31:56 [main] INFO c.h.webmonitor.config.SqlDbConfig - Number of applied DB migrations: 2.
2020-10-14 11:31:56 [main] INFO c.h.w.s.s.impl.DisableLoginStrategy - Login will be disabled for 5 seconds after 3 failed login attempts. For every 3 consecutive failed login attempts, disable period will be multiplied by 10.
2020-10-14 11:31:57 [main] INFO c.h.webmonitor.config.AppConfig - Creating cache with maxSize=768
2020-10-14 11:31:58 [main] INFO Launcher - Hazelcast Management Center successfully started at http://localhost:8080/hazelcast-mancenter
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.gson.internal.ConstructorConstructor (file:/opt/hazelcast/mancenter/hazelcast-mancenter-3.12.12.war) to constructor java.util.Collections$EmptyMap()
WARNING: Please consider reporting this to the maintainers of com.google.gson.internal.ConstructorConstructor
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Terminating due to java.lang.OutOfMemoryError: Java heap space

3 Hazelcast nodes, 8 maps, ~85000 topics
In other environments with fewer topics, HMC seems to work without crashing. But once the number of topics reaches 10k+, we hit the same situation over and over.

erosb · 2020-10-14T13:58:50Z

Hello,

I suggest lowering the hazelcast.mc.cache.max.size system property below its default of 768. It limits the number of timestamped cluster states stored in memory. I can't advise an exact value, because it depends on cluster size and we have never stress-tested Management Center with a high number of topics, but we have some reference data for clusters with many maps as a starting point. Topic states are expected to take much less space than map stats, though.
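As a rough sketch of that suggestion, the property could be appended to the JAVA_OPTS environment variable the pod already sets. The other options are copied from the pod description above; the value 256 is purely illustrative and not a tested recommendation, and the surrounding Deployment fields are omitted.

```yaml
# Fragment of the container spec for the hmc container (other fields omitted).
env:
  - name: JAVA_OPTS
    # Existing options from the pod description, plus a lowered cache size.
    # 256 is an assumed example value; tune it for your cluster.
    value: >-
      -Dhazelcast.mc.healthCheck.enable=true
      -Dhazelcast.mc.allowMultipleLogin=true
      -XX:+ExitOnOutOfMemoryError
      -Dhazelcast.mc.cache.max.size=256
```

If the setting is picked up, the startup log line "Creating cache with maxSize=..." should presumably report the new value instead of 768.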

emre-aydin · 2020-10-15T05:49:58Z

@spliakos it might make sense to disable statistics for some of your topics so they don't flood Management Center with all their metrics. Note that you can also use regular expressions to apply the same config to more than one topic, or even change the default config while applying specialized config to the ones you like.
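A minimal sketch of what this could look like on the cluster members, assuming they are configured declaratively through hazelcast.yaml (available from Hazelcast 3.12; the XML configuration has an equivalent statistics-enabled element under topic). The name pattern "events.*" is hypothetical and relies on wildcard matching of config names; substitute whatever prefix your topics share.

```yaml
# hazelcast.yaml on the cluster members (fragment).
hazelcast:
  topic:
    # "events.*" is a hypothetical pattern; replace it with your own topic prefix.
    "events.*":
      # Stop collecting statistics for matching topics so they are not
      # reported to Management Center.
      statistics-enabled: false
```

Topics that should stay visible in Management Center can keep a more specific entry with statistics-enabled set to true.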

spliakos · 2020-10-15T09:39:07Z

Hey @erosb, I thought about changing the cache size, but according to the Hazelcast documentation:
"It is not recommended to change the cache size unless the cluster has a large number of maps which may cause Management Center to run out of heap memory. Setting too low a value for hazelcast.mc.cache.max.size can be detrimental to the level of detail shown within Management Center, especially when it comes to graphs."
We only have 8 maps, but a lot of topics. I will try it however and see how it goes.

@emre-aydin: This actually makes sense, we will try this and update :)
