Migrate from RAFT to CUVS #3549

Closed
wants to merge 218 commits into from

@tarang-jain tarang-jain commented Jun 25, 2024

Remove the dependency on raft::compiled and modify GPU implementations to use cuVS backend in place of RAFT.

A deeper insight into the dependency:
FAISS gets its ANN algorithm implementations, such as IVF-Flat and IVF-PQ, from cuVS. RAFT is meant to be a lightweight, header-only C++ template library that cuVS relies on for more fundamental, low-level utilities. Examples include RAFT's device mdarray and mdspan objects; the RAFT resources object (raft::resources), which takes care of the stream ordering of device functions; and linear algebra primitives such as map, reduction, and BLAS routines. Many cuVS functions take RAFT mdspan objects as arguments (for example, raft::device_matrix_view), so FAISS relies on both cuVS and RAFT: it gets the RAFT headers through cuVS and uses them to build the arguments that cuVS consumes. Note that we are not explicitly linking FAISS against raft::raft or raft::compiled. Only the required headers are included and compiled, rather than the whole RAFT shared library. This is why raft is still mentioned in FAISS.
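
To make the "view as function argument" pattern above concrete, here is a minimal host-only sketch. It does not use RAFT or cuVS: `matrix_view`, `make_matrix_view`, and `row_sq_norms` are hypothetical stand-ins invented for illustration, and the real raft::device_matrix_view is an mdspan over device memory with a richer interface. The sketch only shows the division of labor described above, where the caller wraps a buffer it already owns in a non-owning view and the library routine accepts that view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a non-owning 2-D view (the role played by
// raft::device_matrix_view). It records a pointer and a shape only;
// it never allocates or frees memory.
template <typename T>
struct matrix_view {
    T* data;
    std::size_t rows;
    std::size_t cols;

    // Row-major element access with a debug-time bounds check.
    T& operator()(std::size_t r, std::size_t c) const {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
};

// Hypothetical helper for the caller side (the FAISS role): wrap an
// existing buffer without copying it.
template <typename T>
matrix_view<T> make_matrix_view(T* data, std::size_t rows, std::size_t cols) {
    return matrix_view<T>{data, rows, cols};
}

// Hypothetical stand-in for a library routine (the cuVS role): it takes a
// view instead of a raw pointer plus separate size arguments.
// Writes the squared L2 norm of each row of `in` into `out`.
inline void row_sq_norms(matrix_view<const float> in, float* out) {
    for (std::size_t r = 0; r < in.rows; ++r) {
        float acc = 0.0f;
        for (std::size_t c = 0; c < in.cols; ++c) {
            const float v = in(r, c);
            acc += v * v;
        }
        out[r] = acc;
    }
}
```

A caller would build the view from its own storage, for example `make_matrix_view<const float>(buf.data(), 2, 3)` for a `std::vector<float> buf` holding six elements, and pass it straight to `row_sq_norms`. Because the view carries its shape, the callee does not need separate dimension parameters.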

@tarang-jain
Contributor Author

@asadoughi @cjnolet, due to the segmentation fault in the CAGRA tests with cuVS 24.10, I have reverted to cuVS 24.08. I will continue working on getting 24.10 in on the side, but it is crucial to get this PR merged quickly.

@asadoughi
Contributor

@tarang-jain Did you mean to push those latest 4 commits to this PR? If we're rolling forward with 24.08 we should probably go back to 3e056ed.

@tarang-jain
Contributor Author

tarang-jain commented Nov 7, 2024

@asadoughi I have fixed the segmentation fault that occurred with 24.10 in the CAGRA tests, so we are very close to getting cuVS 24.10 in. There are still some std::bad_cast errors in the torch CPU tests. If I am not able to fix those by EOD today, I will open a follow-up PR with 24.10 and revert this PR to 3e056ed.

@tarang-jain
Contributor Author

I have downgraded cuVS to 24.08 again. I'm working on #4021 to get 24.10 in after this PR is merged.

@facebook-github-bot
Contributor

@asadoughi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@asadoughi asadoughi left a comment

Hi @tarang-jain! Thanks for updating. I have just a couple of suggestions inline.

faiss/gpu/GpuDistance.cu (resolved)
.github/actions/build_cmake/action.yml (outdated, resolved)
faiss/gpu/GpuIndex.cu (resolved)

@facebook-github-bot
Contributor

@asadoughi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@tarang-jain
Contributor Author

@asadoughi Which Facebook-internal checks are failing?

@facebook-github-bot
Contributor

@asadoughi merged this pull request in 1349220.
