When I submit a pytorchjob with arena, I couldn't find any parameter related to the shared memory size, which is very important for PyTorch training. The size is fixed at 2Gi:
```yaml
...
- mountPath: /dev/shm
  name: dshm
...
...
- emptyDir:
    medium: Memory
    sizeLimit: 2Gi
  name: dshm
...
```
Does anyone know how to set the dshm size?
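For context, the pod spec arena generates presumably looks something like the sketch below. Only the `dshm` volume, the `/dev/shm` mount, and the `2Gi` limit come from the fragments above; the surrounding pod structure, names, and image are illustrative assumptions:

```yaml
# Minimal sketch of the kind of pod spec arena presumably generates.
# Everything except dshm, /dev/shm, medium: Memory, and 2Gi is hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: pytorch-worker-example      # hypothetical name
spec:
  containers:
    - name: pytorch
      image: pytorch/pytorch:latest # illustrative image
      volumeMounts:
        - mountPath: /dev/shm       # PyTorch DataLoader workers pass tensors through here
          name: dshm
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory              # tmpfs-backed, counts against the pod's memory usage
        sizeLimit: 2Gi              # the fixed size this issue is about
```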
OK, I found a workaround. Modify the file /charts/pytorchjob/values.yaml, changing:
```yaml
shmSize: 2Gi
```
to
```yaml
shmSize: 64Gi  # or any value you want
```
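This works because the chart's pod template presumably substitutes `shmSize` into the `emptyDir` volume, along the lines of the sketch below. The template path and surrounding fields are assumptions, not copied from arena's source:

```yaml
# Hypothetical excerpt from the chart's pod template, e.g.
# /charts/pytorchjob/templates/pytorchjob.yaml (path is an assumption):
volumes:
  - name: dshm
    emptyDir:
      medium: Memory
      sizeLimit: {{ .Values.shmSize }}  # picks up the value edited above
```

Note that editing values.yaml changes the default for every job submitted from that chart, so a per-job flag would still be preferable.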
Same issue
/assign