- This Helm chart deploys a CronJob that deletes all unused images on Kubernetes nodes on a defined schedule. This helps to avoid node disk pressure issues (especially with frequent deployments) and improves overall disk usage.
- The CronJob relies on a script, which does the following:
  - gets all nodes in the cluster;
  - runs kubectl node-shell on each node in turn:
    - ctr (the containerd CLI) lists all images that match the defined pattern (using grep);
    - if an image isn't associated with any container, it is deleted; otherwise nothing is done;
  - formats the output to make it more readable and informative.
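The per-node step described above can be sketched roughly as follows. This is a minimal illustration, not the chart's actual script: the function names `unused_images` and `cleanup_node_images` are hypothetical, and the sketch assumes the images live in containerd's `k8s.io` namespace and that `ctr containers ls` prints the image reference in its second column.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not taken from the chart's script.

# Print every image listed in $1 that is absent from $2.
# Both arguments are newline-separated lists of image references.
unused_images() {
  all_images=$1
  used_images=$2
  [ -z "$all_images" ] && return 0
  if [ -z "$used_images" ]; then
    printf '%s\n' "$all_images"
    return 0
  fi
  # -v: invert match, -x: whole-line match, -F: fixed strings (no regex).
  # grep exits 1 when nothing is selected, hence the "|| true".
  printf '%s\n' "$all_images" | grep -vxF -e "$used_images" || true
}

# Meant to run on a node, for example inside a kubectl node-shell session.
# $1 is the grep pattern that selects candidate images.
# Assumption: images are stored in containerd's k8s.io namespace.
cleanup_node_images() {
  pattern=$1
  all_images=$(ctr -n k8s.io images ls -q | grep -e "$pattern" || true)
  # Skip the header row and keep the IMAGE column of each container.
  used_images=$(ctr -n k8s.io containers ls | awk 'NR > 1 { print $2 }' | sort -u)
  unused_images "$all_images" "$used_images" | while IFS= read -r image; do
    echo "deleting unused image: $image"
    ctr -n k8s.io images rm "$image"
  done
}
```

A driver would call something like `kubectl node-shell <node> -- sh -c '...'` for each name returned by `kubectl get nodes`. Note that this plain string comparison does not reconcile tag and digest references to the same image, which a real script may need to handle.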
- When the CronJob is triggered, a Job is created. New pods are then spawned one by one (by kubectl node-shell). The Job collects the logs from those pods and writes them to its own log. As soon as a pod finishes its task, it is deleted automatically. This continues until the script has gone through all the nodes. As a result, there is one Job with all the needed logs.
- What the process looks like: [image]
- CronJob execution results: [image]
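To watch this flow without waiting for the schedule, a Job can be created from the CronJob by hand. The sketch below assumes the CronJob is named after the release and lives in the namespace used by the helm commands in this README; both are assumptions, so check the real names with `kubectl get cronjobs --all-namespaces`. The function names and the `KUBECTL` override are illustrative only.

```shell
#!/bin/sh
# Assumed names: taken from the release name / namespace used in this README.
NAMESPACE="${NAMESPACE:-nodes-unused-images-cleanup}"
CRONJOB="${CRONJOB:-nodes-unused-images-cleanup}"
# Set KUBECTL=echo to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# Create a one-off Job from the CronJob template.
trigger_cleanup_job() {
  job_name=$1
  $KUBECTL create job "$job_name" --from="cronjob/$CRONJOB" --namespace "$NAMESPACE"
}

# Stream the Job's log, which aggregates the output of the spawned pods.
follow_cleanup_logs() {
  job_name=$1
  $KUBECTL logs "job/$job_name" --namespace "$NAMESPACE" --follow
}
```

For example, `trigger_cleanup_job manual-run` followed by `follow_cleanup_logs manual-run` should show the same log a scheduled run would produce.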
- The Helm chart contains the following Kubernetes resources:
  - CronJob;
  - ServiceAccount;
  - ClusterRole;
  - ClusterRoleBinding.
- Clone the repository or download the Release you need.
- Set the desired container_cpu & container_memory in the kube-shell (these are the resources for the pods spawned by kubectl node-shell).
- Build the Docker image:
docker build -t ${NAME}:${TAG} .
- Push your image to the desired repository (AWS ECR, Docker Hub, etc.).
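The build and push steps can be combined in a small helper. This is only a sketch: the registry, image name and tag are placeholders you supply, the function names are hypothetical, and it assumes you are already logged in to the target registry.

```shell
#!/bin/sh
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Compose "<registry>/<name>:<tag>" from its three parts.
image_ref() {
  printf '%s/%s:%s\n' "$1" "$2" "$3"
}

# Build the image from the current directory and push it.
# Assumes you have already authenticated against the registry.
build_and_push() {
  ref=$(image_ref "$1" "$2" "$3")
  $DOCKER build -t "$ref" . && $DOCKER push "$ref"
}
```

For instance, `build_and_push registry.example.com/team cleanup 1.0.0` (all three values are made up) builds and pushes `registry.example.com/team/cleanup:1.0.0`, which is then the image to reference in values.yaml.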
- Adjust values.yaml. You must set the following:
  - the CronJob image;
  - the grep pattern for the all_images variable. For example, if you keep images in AWS ECR, you can grep them by account ID.
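Before putting a pattern into values.yaml, it can help to check what it would select. The sketch below uses a made-up AWS account ID (`123456789012`) and made-up image names; `select_images` is a hypothetical helper that simply wraps grep.

```shell
#!/bin/sh
# Keep only the image references on stdin that match the pattern in $1.
# "|| true" keeps the exit status at 0 when nothing matches.
select_images() {
  grep -e "$1" || true
}

# Hypothetical sample data: one ECR image and one Docker Hub image.
sample_images() {
  printf '%s\n' \
    "123456789012.dkr.ecr.eu-west-1.amazonaws.com/app:1.0" \
    "docker.io/library/nginx:1.25"
}
```

Running `sample_images | select_images "123456789012"` keeps only the ECR image. On a real node the same filter could be fed from `ctr -n k8s.io images ls -q`, assuming the images are stored in containerd's `k8s.io` namespace.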
- We also recommend setting the desired schedule, ttlSecondsAfterFinished & resources for the CronJob. Configure the rest of values.yaml as you wish.
- Generate the Helm template before deployment (set the release name / namespace as you wish):
helm template nodes-unused-images-cleanup ./nodes-unused-images-cleanup/ --create-namespace --namespace nodes-unused-images-cleanup --debug
- Deploy the release (set the release name / namespace as you wish):
helm upgrade --install nodes-unused-images-cleanup ./nodes-unused-images-cleanup/ --create-namespace --namespace nodes-unused-images-cleanup --debug
Note
This block will help you investigate issues with CronJob execution.
- The most likely cause of the error below is "Node didn't have enough resource". The missing resource can be cpu, memory or pods.
Error from server (BadRequest): container "nodes-unused-images-cleanup" in pod "nodes-unused-images-cleanup-sxvmq5" is not available
- No output after the spawning text means that no images matching the defined grep criteria were found. Check the image below: [image]
- The most likely cause of the error below is that the container had already completed and exited, or the pod had been deleted or restarted, between the time the command was issued and the time the connection attempt was made. This means that the pod did its task, but the Job wasn't able to get the logs from it.
warning: couldn't attach to pod/nodes-unused-images-cleanup-sc9qph, falling back to streaming logs: unable to upgrade connection: container nodes-unused-images-cleanup not found in pod nodes-unused-images-cleanup-sc9qph_nodes-unused-images-cleanup
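When chasing either of the errors above, recent events and node capacity are usually the first things to look at. The helpers below are a sketch with hypothetical function names; the namespace is assumed to be the one used by the helm commands in this README.

```shell
#!/bin/sh
# Assumed namespace: the one used in this README's helm commands.
NAMESPACE="${NAMESPACE:-nodes-unused-images-cleanup}"
# Set KUBECTL=echo to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# List events in the namespace, oldest first, to spot scheduling failures.
recent_events() {
  $KUBECTL get events --namespace "$NAMESPACE" --sort-by=.lastTimestamp
}

# Show a node's capacity and allocated resources (cpu, memory, pods).
node_capacity() {
  node_name=$1
  $KUBECTL describe node "$node_name"
}
```

If `node_capacity <node>` shows the node close to its cpu, memory or pod limits, lowering container_cpu / container_memory for the spawned pods is one thing to try.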
Important
You can temporarily remove the delete rule for pods from the ClusterRole and run the CronJob. This helps to investigate the root cause of errors in the Job logs, because the pods won't be deleted automatically. Check the image below: [image]
Don't forget to restore the ClusterRole and delete the pods after testing.
- We'll be very thankful for any ideas, improvements, bug reports, etc. Feel free to open Issues 🤗
Apache 2 Licensed. See LICENSE for full details.