Fix bug where libcuda.so is not found in ldcache #755

Merged: 5 commits merged into NVIDIA:main on Oct 24, 2024


elezar
Member

@elezar elezar commented Oct 21, 2024

This change fixes a bug where libcuda.so cannot be located even if it is present in the ldcache. This is relevant on systems where the library is not present in one of the "standard" paths.

@elezar elezar force-pushed the fix-libcuda-so branch 2 times, most recently from 068a888 to bda19fb Compare October 24, 2024 10:15
@elezar elezar force-pushed the fix-libcuda-so branch 3 times, most recently from b9cff9c to 6fe8e98 Compare October 24, 2024 11:29
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Because a map is used to keep track of the elements of a symlink chain,
the order of the final list of located elements is not stable.
This change constructs the output as the elements are discovered, which
preserves the original ordering.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
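
The ordering fix described in the commit message above can be illustrated with a minimal sketch. It is not the code from this PR: `resolveChain` and the in-memory `links` table are hypothetical stand-ins for the real symlink handling. The map is kept only for cycle detection, while a slice records elements as they are discovered, so the result does not depend on Go's randomized map iteration order.

```go
package main

import "fmt"

// resolveChain follows a symlink chain starting at start and returns every
// element in the order it was discovered. links is a hypothetical in-memory
// stand-in for the filesystem: it maps a symlink path to its target.
func resolveChain(links map[string]string, start string) []string {
	seen := make(map[string]bool) // used only to detect cycles
	var chain []string            // preserves discovery order

	current := start
	for !seen[current] {
		seen[current] = true
		chain = append(chain, current)

		target, isLink := links[current]
		if !isLink {
			break // reached a regular file, the end of the chain
		}
		current = target
	}
	return chain
}

func main() {
	links := map[string]string{
		"libcuda.so":   "libcuda.so.1",
		"libcuda.so.1": "libcuda.so.VERSION",
	}
	// Prints: [libcuda.so libcuda.so.1 libcuda.so.VERSION]
	fmt.Println(resolveChain(links, "libcuda.so"))
}
```

Building the slice inside the loop, rather than collecting the keys of `seen` afterwards, is what keeps the output stable across runs.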
This change adds a test for locating libcuda as a driver library.
This includes a failing test on a system where libcuda.so.1 is in
the ldcache, but not at one of the predefined library search paths.

A testdata folder with sample root filesystems is included to test
various combinations.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar force-pushed the fix-libcuda-so branch 2 times, most recently from 0bd1900 to 338b9d2 Compare October 24, 2024 13:59
Contributor

@klueska klueska left a comment

I didn't do a thorough code review of every line, but in general the new approach looks good and is in line with what we discussed.

This change updates the ldcache locator to read the ldcache at construction
and to perform subsequent lookups against those contents. Each cache
entry is resolved, and lookups return the resolved target.

Assuming a symlink chain libcuda.so -> libcuda.so.1 -> libcuda.so.VERSION, this
means that libcuda.so.VERSION is returned for any of the following inputs:
libcuda.so, libcuda.so.1, libcuda.so.*.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
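
The lookup behaviour described in the commit message above might look roughly like the sketch below. This is an illustration under stated assumptions, not the toolkit's implementation: `cacheLocator`, `newCacheLocator`, `resolve`, and `Locate` are hypothetical names, the cache contents and symlinks are modelled as in-memory maps instead of being read from the real ldcache, and the `/opt/nvidia/lib` directory is made up to stand in for a non-standard library path. Pattern matching uses the standard library's `filepath.Match`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// cacheLocator is a hypothetical stand-in for an ldcache-backed locator.
// It maps each library name listed in the cache to its resolved target.
type cacheLocator struct {
	resolved map[string]string
}

// resolve follows symlinks in the in-memory links table until it reaches
// a path that is not a symlink (or detects a cycle).
func resolve(links map[string]string, path string) string {
	seen := make(map[string]bool)
	for !seen[path] {
		seen[path] = true
		target, isLink := links[path]
		if !isLink {
			break
		}
		path = target
	}
	return path
}

// newCacheLocator resolves every cache entry once, at construction.
// entries maps a library name to the path recorded in the cache.
func newCacheLocator(entries, links map[string]string) *cacheLocator {
	l := &cacheLocator{resolved: make(map[string]string, len(entries))}
	for name, path := range entries {
		l.resolved[name] = resolve(links, path)
	}
	return l
}

// Locate returns the sorted, de-duplicated resolved targets of all cache
// entries whose name matches pattern.
func (l *cacheLocator) Locate(pattern string) ([]string, error) {
	unique := make(map[string]bool)
	for name, target := range l.resolved {
		matched, err := filepath.Match(pattern, name)
		if err != nil {
			return nil, err
		}
		if matched {
			unique[target] = true
		}
	}
	if len(unique) == 0 {
		return nil, fmt.Errorf("%s: not found in cache", pattern)
	}
	targets := make([]string, 0, len(unique))
	for target := range unique {
		targets = append(targets, target)
	}
	sort.Strings(targets) // map iteration order is random; sort for stability
	return targets, nil
}

func main() {
	dir := "/opt/nvidia/lib/" // made-up non-standard library directory
	entries := map[string]string{
		"libcuda.so":   dir + "libcuda.so",
		"libcuda.so.1": dir + "libcuda.so.1",
	}
	links := map[string]string{
		dir + "libcuda.so":   dir + "libcuda.so.1",
		dir + "libcuda.so.1": dir + "libcuda.so.VERSION",
	}
	locator := newCacheLocator(entries, links)
	for _, pattern := range []string{"libcuda.so", "libcuda.so.1", "libcuda.so.*"} {
		targets, err := locator.Locate(pattern)
		fmt.Println(pattern, targets, err)
	}
}
```

In this sketch all three patterns yield the same single target, `/opt/nvidia/lib/libcuda.so.VERSION`, because matching is done against the cached names while the returned value is always the resolved end of the chain.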
@elezar elezar self-assigned this Oct 24, 2024
@elezar elezar added the must-backport The changes in PR need to be backported to at least one stable release branch. label Oct 24, 2024
@elezar elezar merged commit 8860878 into NVIDIA:main Oct 24, 2024
10 checks passed
@elezar elezar deleted the fix-libcuda-so branch October 24, 2024 21:33
Development

Successfully merging this pull request may close these issues.

Allow LDCache to be used to discover libcuda.so in CDI spec generation
3 participants