[memory_monitoring] Enhance monitoring the memory usage of containers #19179
…ic-buildimage into memory_checker
/azpw ms_conflict -f
/azpw ms_checker -f
/azpw ms_checker
/azpw ms_conflict
Cherry-pick PR to 202311: #20234
Hi @FengPan-Frank, publish_events is now deleted in this PR, which is not expected.
@zbud-msft Sorry, it seems publish_events was missed previously. May I ask which test case failed, so that I can check it further when I update this part of the code later?
### Why I did it
#19179 removed the call to publish_events that is made when a container's memory usage exceeds the threshold, causing test_events to fail.
### How I did it
Add back the call to publish_events.
#### How to verify it
Manual test
Why I did it
We need to restrict the memory usage of each container individually. The reliable way to measure that usage is to read the cgroup subsystem files rather than run "docker stats", because the command stops responding once a container hits its hard memory limit.
How I did it
Instead of depending on the output of docker stats, the background script memory_checker now calculates the memory usage of a container from values read from the cgroup subsystem files /sys/fs/cgroup/memory/docker/<container_name>/memory.usage_in_bytes and /sys/fs/cgroup/memory/docker/<container_name>/memory.stat.
According to the official Docker documentation (https://docs.docker.com/engine/reference/commandline/stats/#extended-description), the memory usage that docker stats reports for a container equals its total memory usage minus its cache usage, so the value computed from the cgroup files should match the command output.
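The calculation described above can be sketched roughly as follows. This is a minimal illustration, not the actual memory_checker code: the function name `read_container_memory_usage` and the `cgroup_root` parameter are hypothetical, the statistics file is assumed to be the cgroup v1 `memory.stat`, and the `cache` field is assumed to be the cache usage that Docker subtracts.

```python
from pathlib import Path

# cgroup v1 directory named in the PR description.
CGROUP_ROOT = Path("/sys/fs/cgroup/memory/docker")


def read_container_memory_usage(container, cgroup_root=CGROUP_ROOT):
    """Return total usage minus cache usage, in bytes, for one container.

    Hypothetical helper: `container` is the directory name under
    `cgroup_root` (the <container_name> placeholder in the description).
    """
    container_dir = Path(cgroup_root) / container

    # Total memory charged to the cgroup, including page cache.
    total = int((container_dir / "memory.usage_in_bytes").read_text().strip())

    # memory.stat holds one "<key> <value>" pair per line.
    # Assumption: the "cache" key is the cache usage to subtract.
    cache = 0
    for line in (container_dir / "memory.stat").read_text().splitlines():
        key, _, value = line.partition(" ")
        if key == "cache":
            cache = int(value)
            break

    return total - cache
```

Passing `cgroup_root` explicitly makes the helper testable against a temporary directory. Because it only reads files, it does not rely on the Docker daemon responding.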
How to verify it
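Verified locally. Because this is only an internal change to how container memory usage is obtained, the screenshot below compares the new memory_checker with the previous implementation based on "docker stats --no-stream --format {{.MemUsage}} telemetry":
<img width="799" alt="image" src="https://github.com/sonic-net/sonic-buildimage/assets/97083744/3807fc7f-cfc2-4e2f-a078-eaf08b68f803">
Added unit test code. Because this repo currently has no build-time unit tests, the tests were run manually, as shown below:
<img width="1121" alt="image" src="https://github.com/user-attachments/assets/2c7ce241-7967-41ee-a2e9-4bdb2e43f8c2">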
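To repeat this kind of comparison numerically, the string printed by the docker stats command needs to be converted to bytes first. The sketch below assumes the output has the form "used / limit" with binary unit suffixes (for example "85.5MiB / 7.6GiB"); the helper name `parse_mem_usage` is hypothetical and is not part of memory_checker.

```python
import re

# Binary unit suffixes assumed to appear in the MemUsage column.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_mem_usage(mem_usage):
    """Convert the 'used' part of a 'used / limit' string to bytes."""
    used = mem_usage.split("/")[0].strip()
    match = re.fullmatch(r"([0-9.]+)\s*([A-Za-z]+)", used)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognized memory usage string: {mem_usage!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])
```

The command prints a rounded value, so a comparison against the cgroup-based number is better done within a tolerance than with strict equality.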