-
Hello and thanks for sharing this experience with us. How did you install the Kubernetes components on Flatcar vs Ubuntu? First thing that comes to my mind is that ...
-
I could find a difference in Flatcar vs Ubuntu (in my setups; it would be great to get a confirmation):
See: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
It might be that Kafka/Java knows that Ubuntu might not let it overcommit, and its memory cleanup is therefore less active?
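To compare the two setups, something like this could be run on a node of each OS or from inside a pod. It's only an illustrative sketch reading the standard procfs paths; nothing distribution-specific is assumed:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class OvercommitCheck {
    public static void main(String[] args) throws Exception {
        // 0 = heuristic overcommit, 1 = always overcommit, 2 = strict accounting (never overcommit)
        System.out.println("vm.overcommit_memory = "
                + Files.readString(Path.of("/proc/sys/vm/overcommit_memory")).trim());
        System.out.println("vm.overcommit_ratio  = "
                + Files.readString(Path.of("/proc/sys/vm/overcommit_ratio")).trim());

        // CommitLimit / Committed_AS show the accounting state that matters in mode 2
        try (Stream<String> lines = Files.lines(Path.of("/proc/meminfo"))) {
            lines.filter(l -> l.startsWith("CommitLimit") || l.startsWith("Committed_AS"))
                 .forEach(System.out::println);
        }
    }
}
```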
-
We are in the process of establishing a CDC pipeline involving Kafka, Kafka Connect, and a bunch of Kafka Streams applications on top of a (managed) Kubernetes 1.29 cluster with Flatcar-operated nodes (version 3815.2.5).
For a long time we've had lots of issues with Kafka Streams applications running out of memory, eventually being OOMKilled when hitting the configured Pod limits. This happened even with very conservative JVM / off-heap settings with lots of headroom, yet somehow none of our configurations ever solved the issue. Memory consumption of the Pods kept slowly increasing (over days) until hitting the limits.
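(For a sense of what we mean by headroom: a minimal, illustrative check comparing the JVM's own heap ceiling with the pod limit. The class name is just for this sketch, and it assumes cgroup v2 on the node; cgroup v1 uses a different path, as noted in the comment.)

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class MemoryHeadroom {
    public static void main(String[] args) throws Exception {
        // What the JVM thinks it may use for heap (respects -Xmx / container awareness)
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("JVM max heap: " + maxHeapMiB + " MiB");

        // The pod/container limit the kernel enforces (cgroup v2 path;
        // cgroup v1 exposes /sys/fs/cgroup/memory/memory.limit_in_bytes instead)
        Path limit = Path.of("/sys/fs/cgroup/memory.max");
        if (Files.exists(limit)) {
            System.out.println("cgroup memory.max: " + Files.readString(limit).trim());
        }
    }
}
```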
Now, configuring Kafka Streams applications can at times be a tricky task, as it involves JNI components (RocksDB) that are not managed by the JVM and use their own configuration. But even with RocksDB off-heap memory strictly bounded and an alternative malloc implementation (jemalloc) in place to avoid fragmentation, the issue persisted...
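For context, bounding RocksDB off-heap memory in Kafka Streams is typically done with a RocksDBConfigSetter that routes all stores through one shared LRUCache and WriteBufferManager, registered via rocksdb.config.setter. A sketch along those lines (the sizes are placeholders, not our actual values):

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

    // Placeholder sizes: one shared off-heap budget for all state stores.
    private static final long TOTAL_OFF_HEAP_BYTES = 256L * 1024 * 1024;
    private static final long TOTAL_MEMTABLE_BYTES = 64L * 1024 * 1024;

    private static final Cache CACHE = new LRUCache(TOTAL_OFF_HEAP_BYTES, -1, false, 0.1);
    private static final WriteBufferManager WBM =
            new WriteBufferManager(TOTAL_MEMTABLE_BYTES, CACHE);

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig =
                (BlockBasedTableConfig) options.tableFormatConfig();
        // Route block cache, index/filter blocks and memtables through the shared budget.
        tableConfig.setBlockCache(CACHE);
        tableConfig.setCacheIndexAndFilterBlocks(true);
        options.setWriteBufferManager(WBM);
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Cache and WriteBufferManager are shared across stores, so they are not closed per store.
    }
}
```

The point of the shared cache and write buffer manager is that the total off-heap budget stays fixed no matter how many state stores a topology creates.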
... until just recently, when we changed the node OS to Ubuntu. Suddenly the applications run much more stably, and we have had no OOMKilled events since the change.
Now I wonder whether there is anything in the underlying Flatcar architecture that might contribute to this situation, and whether this is the right place to track potential issues down. For example, are there circumstances regarding memory management that could lead to incompatibilities with jemalloc (or tcmalloc) and render those allocators non-functional? The behaviour might look like some kind of memory leak, yet there are no such known issues with either the Kafka Streams libraries or RocksDB, and I tend to think it is more an issue of memory fragmentation not being taken care of sufficiently, especially since on an alternative OS the issue seems to be gone magically.
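To rule out the allocator silently not being picked up at all, a quick illustrative check from inside the container could look like this (nothing Flatcar-specific assumed; it only inspects LD_PRELOAD and the process memory map):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class AllocatorCheck {
    public static void main(String[] args) throws Exception {
        // jemalloc is usually injected via LD_PRELOAD; an empty value means glibc malloc is in use.
        System.out.println("LD_PRELOAD = " + System.getenv("LD_PRELOAD"));

        // If jemalloc is actually active, its shared object shows up in the process memory map.
        try (Stream<String> maps = Files.lines(Path.of("/proc/self/maps"))) {
            boolean jemallocMapped = maps.anyMatch(l -> l.contains("libjemalloc"));
            System.out.println("libjemalloc mapped: " + jemallocMapped);
        }
    }
}
```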
I'd be happy to provide additional information if needed. Otherwise we are perfectly satisfied with Flatcar as the driver of our worker nodes (and are even using it on other worker groups not related to the Kafka workload).