cmd/initContainer, test/system: Handle NVIDIA's create-symlinks CDI hook #1545

debarshiray (Member) commented on Sep 17, 2024

NVIDIA Container Toolkit 0.16.0 started using create-symlinks hooks in
the Container Device Interface specification generated by it [1]. For
example:

  "hookName": "createContainer",
  "path": "/usr/bin/nvidia-cdi-hook",
  "args": [
    "nvidia-cdi-hook",
    "create-symlinks",
    "--link",
    "libnvidia-allocator.so.560.35.03::/usr/lib64/libnvidia-allocator.so.1",
    "--link",
    "../libnvidia-allocator.so.1::/usr/lib64/gbm/nvidia-drm_gbm.so"
  ]

Fallout from 649d02f

[1] NVIDIA Container Toolkit commit aae3da88c33d9cf2
    NVIDIA/nvidia-container-toolkit@aae3da88c33d9cf2
    NVIDIA/nvidia-container-toolkit#548
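
To make the hook arguments above concrete, here is a minimal Python sketch of how such a create-symlinks argument list could be interpreted. It is not the code from this pull request (Toolbx itself is written in Go), and it assumes, based only on the example above, that each --link value has the form target::link-path. The names parse_links, create_symlinks and HOOK_ARGS are hypothetical, and the links are created under a scratch directory rather than a real container root.

```python
import os
import tempfile

# Argument list copied from the example hook in the description above.
HOOK_ARGS = [
    "nvidia-cdi-hook",
    "create-symlinks",
    "--link",
    "libnvidia-allocator.so.560.35.03::/usr/lib64/libnvidia-allocator.so.1",
    "--link",
    "../libnvidia-allocator.so.1::/usr/lib64/gbm/nvidia-drm_gbm.so",
]


def parse_links(args):
    """Return (target, link_path) pairs for every '--link target::link_path'.

    Assumption: the text before '::' is the symlink target and the text
    after it is the path of the link, as the example suggests.
    """
    pairs = []
    i = 0
    while i < len(args):
        if args[i] == "--link" and i + 1 < len(args):
            target, sep, link_path = args[i + 1].partition("::")
            # Skip malformed values instead of guessing what they mean.
            if sep and target and link_path:
                pairs.append((target, link_path))
            i += 2
        else:
            i += 1
    return pairs


def create_symlinks(pairs, root):
    """Create each link below root, a stand-in for the container's '/'."""
    created = []
    for target, link_path in pairs:
        full_path = os.path.join(root, link_path.lstrip("/"))
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        # Replace a stale file or link; a directory here would raise.
        if os.path.lexists(full_path):
            os.remove(full_path)
        os.symlink(target, full_path)
        created.append(full_path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for path in create_symlinks(parse_links(HOOK_ARGS), root):
            print(path, "->", os.readlink(path))
```

The targets are stored exactly as given, so a relative target such as ../libnvidia-allocator.so.1 is resolved relative to the directory that contains the link, not relative to the scratch root.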

debarshiray added a commit to debarshiray/toolbox that referenced this pull request Sep 17, 2024
The following commit will handle create-symlinks hooks in the Container
Device Interface specification for the proprietary NVIDIA driver,
because NVIDIA Container Toolkit 0.16.0 started using those [1].  So,
make some space for the new code.

This will make the following commit easier to read.

Fallout from 649d02f

[1] NVIDIA Container Toolkit commit aae3da88c33d9cf2
    NVIDIA/nvidia-container-toolkit@aae3da88c33d9cf2
    NVIDIA/nvidia-container-toolkit#548

containers#1545
debarshiray force-pushed the wip/rishi/cmd-initContainer-nvidia-create-symlinks branch from c9d2daf to a79e427 on September 17, 2024 19:23

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/ea99da3dd5a243e2a2e4ce1294fb9950

✔️ unit-test SUCCESS in 5m 33s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 34s
✔️ unit-test-restricted SUCCESS in 5m 44s
✔️ system-test-fedora-rawhide SUCCESS in 1h 48m 26s
✔️ system-test-fedora-40 SUCCESS in 1h 45m 46s
✔️ system-test-fedora-39 SUCCESS in 1h 48m 59s

debarshiray changed the title from "[WIP] handle create-symlinks" to "[WIP] cmd/initContainer, test/system: Handle NVIDIA's create-symlinks CDI hook" on Sep 17, 2024
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Sep 17, 2024
NVIDIA Container Toolkit 0.16.0 started using create-symlinks hooks in
the Container Device Interface specification generated by it [1].  For
example:
  "hookName": "createContainer",
  "path": "/usr/bin/nvidia-cdi-hook",
  "args": [
    "nvidia-cdi-hook",
    "create-symlinks",
    "--link",
    "libnvidia-allocator.so.560.35.03::/usr/lib64/libnvidia-allocator.so.1",
    "--link",
    "../libnvidia-allocator.so.1::/usr/lib64/gbm/nvidia-drm_gbm.so"
  ]

Fallout from 649d02f

[1] NVIDIA Container Toolkit commit aae3da88c33d9cf2
    NVIDIA/nvidia-container-toolkit@aae3da88c33d9cf2
    NVIDIA/nvidia-container-toolkit#548

containers#1545

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/1db1890908c94635bb2c16710605328d

✔️ unit-test SUCCESS in 6m 03s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 38s
✔️ unit-test-restricted SUCCESS in 6m 09s
✔️ system-test-fedora-rawhide SUCCESS in 1h 48m 20s
✔️ system-test-fedora-40 SUCCESS in 1h 47m 57s
✔️ system-test-fedora-39 SUCCESS in 1h 47m 17s

debarshiray added a commit to debarshiray/toolbox that referenced this pull request Sep 18, 2024
containers#1545
debarshiray force-pushed the wip/rishi/cmd-initContainer-nvidia-create-symlinks branch from deb48cd to c5a0651 on September 18, 2024 20:40

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/4ca203e56dc84031821b686e7a8fc6d5

unit-test RETRY_LIMIT in 3m 28s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 37s
unit-test-restricted RETRY_LIMIT in 2m 34s
system-test-fedora-rawhide RETRY_LIMIT in 5m 18s
system-test-fedora-40 RETRY_LIMIT in 3m 20s
system-test-fedora-39 RETRY_LIMIT in 3m 24s

debarshiray (Member, Author) commented:

recheck


Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/5c1effb76aed426382ec82a949bc4e5a

✔️ unit-test SUCCESS in 5m 37s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 05s
✔️ unit-test-restricted SUCCESS in 5m 40s
✔️ system-test-fedora-rawhide SUCCESS in 1h 48m 21s
✔️ system-test-fedora-40 SUCCESS in 1h 50m 11s
system-test-fedora-39 TIMED_OUT in 1h 50m 25s

debarshiray (Member, Author) commented:

recheck


Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/56b3320861a2433891c5a1cf244bdc05

✔️ unit-test SUCCESS in 7m 42s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 07s
✔️ unit-test-restricted SUCCESS in 5m 31s
system-test-fedora-rawhide TIMED_OUT in 2h 10m 23s
✔️ system-test-fedora-40 SUCCESS in 1h 40m 36s
system-test-fedora-39 TIMED_OUT in 1h 50m 21s

Commit 87eaeea already added a dependency on Bats >= 1.10.0, which is
present on Fedora >= 39.  Therefore, its features should be used
wherever possible to simplify things.

The CI has been timing out frequently on stable Fedora nodes.  So,
increase their timeout from 1 hour 50 minutes to 2 hours to avoid that.

For what it's worth, the timeout for Fedora Rawhide nodes is 2 hours 10
minutes, and that seems to be enough.

containers#1546
debarshiray force-pushed the wip/rishi/cmd-initContainer-nvidia-create-symlinks branch from c5a0651 to c399243 on September 19, 2024 12:00

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/c20e0a90f8594a1b93b45ce0cefe10a4

✔️ unit-test SUCCESS in 5m 11s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 12s
✔️ unit-test-restricted SUCCESS in 5m 16s
✔️ system-test-fedora-rawhide SUCCESS in 1h 49m 03s
✔️ system-test-fedora-40 SUCCESS in 1h 47m 55s
✔️ system-test-fedora-39 SUCCESS in 1h 50m 26s

debarshiray added a commit to debarshiray/toolbox that referenced this pull request Sep 20, 2024
NVIDIA Container Toolkit 0.16.0 started using create-symlinks hooks in
the Container Device Interface specification generated by it [1].  For
example:
  "hookName": "createContainer",
  "path": "/usr/bin/nvidia-cdi-hook",
  "args": [
    "nvidia-cdi-hook",
    "create-symlinks",
    "--link",
    "libnvidia-allocator.so.560.35.03::/usr/lib64/libnvidia-allocator.so.1",
    "--link",
    "../libnvidia-allocator.so.1::/usr/lib64/gbm/nvidia-drm_gbm.so"
  ]

Fallout from 649d02f

[1] NVIDIA Container Toolkit commit aae3da88c33d9cf2
    NVIDIA/nvidia-container-toolkit@aae3da88c33d9cf2
    NVIDIA/nvidia-container-toolkit#548

containers#1545

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/fef897c836004a6f884ed0c9e7d539a6

✔️ unit-test SUCCESS in 5m 52s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 07s
✔️ unit-test-restricted SUCCESS in 5m 50s
✔️ system-test-fedora-rawhide SUCCESS in 1h 55m 30s
✔️ system-test-fedora-40 SUCCESS in 1h 56m 20s
✔️ system-test-fedora-39 SUCCESS in 1h 57m 52s

debarshiray merged commit dd23baa into containers:main on Sep 20, 2024
3 checks passed
debarshiray deleted the wip/rishi/cmd-initContainer-nvidia-create-symlinks branch on September 20, 2024 20:01
debarshiray changed the title from "[WIP] cmd/initContainer, test/system: Handle NVIDIA's create-symlinks CDI hook" to "cmd/initContainer, test/system: Handle NVIDIA's create-symlinks CDI hook" on Sep 20, 2024