Recently we found that some jobs write large data into the container file system. Docker maps a container's file system onto a path on the local host, so if a finished job's data isn't cleaned up, it can consume a lot of disk space. Two cleanup targets:

1. The container file system data that Docker leaves on the host.
2. The yarn local data in the path /tmp/pai-root/.

In our test bed, we found a lot of historical data under this path, and some of it is very large. We believe we need policies to remove this data, but we are not sure whether yarn or pai has implemented any. For example:
root@xxxxxxxxxxxxxx:/# du -h --max-depth=1 /tmp/pai-root/code | grep G
1.3G /tmp/pai-root/code/application_1534124332808_1913
1.3G /tmp/pai-root/code/application_1534124332808_1918
1.3G /tmp/pai-root/code/application_1534124332808_1920
103G /tmp/pai-root/code/application_1535954247490_2193
1.3G /tmp/pai-root/code/application_1534124332808_1929
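Until an official policy lands, a cleanup job along the following lines could reclaim the space. This is only a sketch: it assumes the application_* directories under /tmp/pai-root/code are safe to delete once the corresponding job has finished, and the 7-day retention window is an illustrative value, not a pai default.

#!/bin/bash
# Sketch: remove per-application code directories that have been idle
# for more than RETENTION_DAYS. Directory mtime is a rough staleness
# heuristic; a real policy should check the job's state first.
RETENTION_DAYS=7
CODE_DIR=/tmp/pai-root/code

find "$CODE_DIR" -mindepth 1 -maxdepth 1 -type d -name 'application_*' \
    -mtime +"$RETENTION_DAYS" -print -exec rm -rf {} +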
ydye changed the title from "Are there any polices for yarn to clean the completed or failed job's data?" to "Are there any policies for yarn to clean the completed or failed job's data?" on Nov 27, 2018.
For 1: We add --rm when launching the job; shouldn't the container be cleaned up by the Docker daemon?
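For reference, --rm asks the Docker daemon to delete the container, including its writable layer on the host, as soon as the container exits. A minimal illustration (the image and command are placeholders):

# With --rm the writable layer is removed on exit, so nothing
# accumulates under Docker's storage directory on the host.
docker run --rm ubuntu:16.04 /bin/true

# Without --rm, exited containers and their layers linger until
# someone removes them explicitly:
docker container prune -f

Note that --rm only covers the container's own layer; data written to bind-mounted host paths such as /tmp/pai-root/ is untouched.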
For 2: In the current release we download code to the yarn local directory, and yarn should clean it up regardless of the job's result. Unfortunately, under disk pressure k8s might kill yarn before the cleanup runs, which is a severe conflict between the two systems. Especially since we enable fancy retry by default, a job that writes large data might keep retrying on every node until all nodes are down. This needs a better design.
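If the code really is localized through YARN's NodeManager, the NodeManager's deletion service and localizer cache settings in yarn-site.xml are the usual knobs. A sketch assuming stock Hadoop behavior; the values are illustrative, not pai defaults:

<!-- Delete an application's localized directories as soon as it
     finishes (a positive value delays deletion, for debugging). -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>0</value>
</property>
<!-- Cap the public localizer cache and sweep it periodically. -->
<property>
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <value>10240</value>
</property>
<property>
  <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
  <value>600000</value>
</property>

None of this helps, though, if k8s evicts the NodeManager before its deletion service runs, which is exactly the conflict described above.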
Back to this issue: these dirs belong to an old version of pai, so I think we can delete them directly.
@mzmssg @ydye - is this issue trying to automatically clean up failed jobs? If so, I have concerns, as lots of customers are asking to keep those logs for digging out the root cause.