Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[k8s] Allow configuring /dev/shm size limit #3244

Merged
merged 12 commits into from
Mar 1, 2024
Merged

Conversation

romilbhardwaj
Copy link
Collaborator

@romilbhardwaj romilbhardwaj commented Feb 28, 2024

Many production environments require limits on the size of /dev/shm partition allocated to a pod. This PR adds support for this configuration through pod_config in ~/.sky/config.yaml.

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Manual tests - sky launch with config.yaml set and inspect pod with kubectl describe pods

config.yaml used for testing:

kubernetes:
  ports: ingress
  pod_config:
    spec:
      containers:
        - volumeMounts: # Custom volume mounts for the pod
            - mountPath: /foo
              name: example-volume
              readOnly: true
      volumes:
        - name: example-volume
          hostPath:
            path: /tmp
            type: Directory
        - name: dshm          # SkyPilot creates a /dev/shm volume for each pod called dshm. You can override the sizeLimits/volume type here.
          emptyDir:
            medium: Memory
            sizeLimit: 3Gi

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this @romilbhardwaj! Question: could we mention why we the use case of this and the best practice for a user to specify this field, e.g. if they encounter some specific errors, they should change this field?

Comment on lines 290 to 294
# Size of the /dev/shm shared memory for the pod (optional).
#
# Defaults to None, which means no size limits are set. If set, the value
# must be a string that is a valid Kubernetes quantity, e.g., "3Gi".
# https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we mention the use case for specifying this in the comment?

Also, it seems a user could specify this directly in the pod_config above, is it necessary to have a separate field in the config for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment. This is required by one of our users, since their production environment uses an AdmissionController that denies any pods which do not have a specific limit on /dev/shm size.

Using pod_config is possible, but requires the user to understand our pod template and use the exact fields ( (name: dshm, ...) when overriding the pod_config.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! How frequent this field will be used? I am wondering if it is not that frequently used, we can, instead, just have the full config in the pod_config in this doc, and have the comment there.
I kind of feeling including two ways to change the same config might be a bit confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh good point. I thought more about it and indeed it likely won't be used that frequently. I added a comment to our config.yaml docs and updated the PR description.

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this PR @romilbhardwaj! LGTM.

- name: dshm # Use this to modify the /dev/shm volume mounted by SkyPilot
emptyDir:
medium: Memory
sizeLimit: 3Gi # Set a size limit for the /dev/shm volume
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is /dev/shm a convention name that people would know? Just wondering if we should elaborate a bit for what is this for.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/dev/shm is generally well understood as a shared memory, and explaining it here might add more clutter. It's usage depends on the application (e.g., Ray stores GCS state there, pytorch uses it for IPC), so it's hard to generalize. Folks running docker containers would usually be familiar with this, so might be fine to leave it be?

Comment on lines +1130 to +1132
destination_volume = next(
(v for v in destination[key]
if v.get('name') == new_volume_name), None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: using filter might be easier to read. not feeling strongly, please feel free to ignore.

Suggested change
destination_volume = next(
(v for v in destination[key]
if v.get('name') == new_volume_name), None)
destination_volume = next(filter(lambda v: v.get('name') == new_volume_name, destination[key]), None)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the generator might be more pythonic :)

@romilbhardwaj romilbhardwaj merged commit 2f78a6a into master Mar 1, 2024
19 checks passed
@romilbhardwaj romilbhardwaj deleted the k8s_dev_shm branch March 1, 2024 02:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants