add codebook passing and pq/opq dim overwrite. #288

Merged
merged 27 commits into from
Apr 3, 2023

Conversation

jinwei14
Contributor

@jinwei14 jinwei14 commented Mar 29, 2023

  • Does this PR have a descriptive title that could go in our release notes?
  • Does this PR add any new dependencies?
  • Does this PR modify any existing APIs?
    • Is the change to the API backwards compatible?
  • Should this result in any changes to our documentation, either updating existing docs or adding new ones?

Reference Issues/PRs

What does this implement/fix? Briefly explain your changes.

  1. This change adds a pretrained codebook prefix parameter. When it is provided, the PQ and OPQ paths load the pretrained rotation matrix and pivots.
  2. The default max PQ dim changes from 256 to 384 to accommodate richer embeddings.
  3. The README will need to be updated for the new parameters if this PR is accepted.
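
The first two points can be sketched roughly as follows. This is a minimal illustration and not the code merged in this PR: `CodebookPaths`, `has_pretrained_codebook`, `resolve_codebook_paths`, `checked_pq_dim`, `kMaxPqDim`, and both file suffixes are hypothetical names chosen for the example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical cap mirroring the new default maximum PQ dimension (was 256).
constexpr std::size_t kMaxPqDim = 384;

// Hypothetical holder for the files a pretrained codebook would supply.
struct CodebookPaths {
    std::string pivots;    // PQ pivots (centroids)
    std::string rotation;  // OPQ rotation matrix; left empty when OPQ is off
};

// An empty prefix is taken to mean "no pretrained codebook was passed in".
inline bool has_pretrained_codebook(const std::string& prefix) {
    return !prefix.empty();
}

// Derive file names from the prefix. The suffixes are assumptions made for
// this sketch, not the names used by the repository.
inline CodebookPaths resolve_codebook_paths(const std::string& prefix, bool use_opq) {
    if (prefix.empty()) {
        throw std::invalid_argument("codebook prefix is empty");
    }
    CodebookPaths paths;
    paths.pivots = prefix + "_pq_pivots.bin";
    if (use_opq) {
        paths.rotation = prefix + "_rotation_matrix.bin";
    }
    return paths;
}

// Reject a requested PQ dimension that is zero or exceeds the cap.
inline std::size_t checked_pq_dim(std::size_t requested) {
    if (requested == 0 || requested > kMaxPqDim) {
        throw std::invalid_argument("PQ dim must be in [1, kMaxPqDim]");
    }
    return requested;
}
```

A caller would check `has_pretrained_codebook(prefix)` first, load from the resolved paths when it returns true, and otherwise fall back to whatever training path already exists.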

Any other comments?

jinwei14 and others added 14 commits March 29, 2023 10:59
…osoft#225)

- add code for two variants of filtered index, readme and CI tests

- add utils for synthetic label generation and CI tests.

* Add co-authors

Co-authored-by: ravishankar <rakri@microsoft.com>
Co-authored-by: Varun Sivashankar <t-varunsi@microsoft.com>

---------

Co-authored-by: ravishankar <rakri@microsoft.com>
Co-authored-by: David Kaczynski <dkaczynski@microsoft.com>
Co-authored-by: Siddharth Gollapudi <t-gollapudis@microsoft.com>
Co-authored-by: Neelam Mahapatro <nmahapatro@microsoft.com>
Co-authored-by: Harsha Vardhan Simhadri <harshasi@microsoft.com>
Co-authored-by: Harsha Vardhan Simhadri <harsha-simhadri@users.noreply.github.com>
Co-authored-by: REDMOND\patelyash <patelyash@microsoft.com>
Co-authored-by: Varun Sivashankar <t-varunsi@microsoft.com>
…crosoft#236)

* Rather than sift through all the *.cpp and *.h in the root directory, we're looking for only the sources in our main repository for formatting. Git submodules are excluded

* Removing the --Werror flag only until we actually format all of the code in a future commit

* We're choosing to base our style on the Microsoft style guide and not make any changes

* Running format action on source code.  Settling on Google styling.  Settled on '.clang-format' instead of '_clang-format'. Fixed instructions such that only clang-format 12 is installed (13 changes SortIncludes options from true/false to a trinary set of options, none of which include the word 'false')

* Enabling error on malformatted file

* Revert "Enabling error on malformatted file"

This reverts commit fa33e82.

* Revert "Running format action on source code.  Settling on Google styling.  Settled on '.clang-format' instead of '_clang-format'. Fixed instructions such that only clang-format 12 is installed (13 changes SortIncludes options from true/false to a trinary set of options, none of which include the word 'false')"

This reverts commit e0281be.

* Trying again; formatting rules based on Google rules, disables sorting includes as that breaks us, and enabling check on build.

* Somehow this was missed in the mass format.  Formatting include/distance.h.

* Manually fixing the formatting because clang-format wouldn't, but WOULD flag it as invalid
Fix typo in SSD index readme
Remove warnings affecting internal build pipelines

---------

Co-authored-by: Yiyong Lin <yiyolin@microsoft.com>
* Add support for multiple frozen points

* Add the missing parameters to the constructor.
* Added filtered disk index readme
* Transferring Varun's changes from external fork with squash merge

* generating multiple gt's for each filter label + search with multiple filter labels (code cleanup)

* supporting no-filter + one filter label + filter label file (multiple filters) while computing GT

* generating multiple gt's + refactoring code for readability & cleanliness

* adding more tests for filtered search

* updating pr-test to test filtered cases

* lowering recall requirement for disk index

* transferred functions to filter_utils 

* adding more test for build and search without universal label

* adding one_per_point distribution to generate_synthetic_labels + cleaning up artifacts after compute gt + removing minor errors

* refactoring search_disk_index to use a query filter vector
---------

Co-authored-by: patelyash <patelyash@microsoft.com>
Co-authored-by: Varun Sivashankar <t-varunsi@microsoft.com>
src/disk_utils.cpp (outdated, resolved)
@PhilipBAdams
Contributor

One other thing to consider, do you want to support overriding codebook for disk PQ too? We don't currently use disk PQ in production

@jinwei14
Contributor Author

> One other thing to consider, do you want to support overriding codebook for disk PQ too? We don't currently use disk PQ in production

Idk, what do you think? shall I also do it in diskpq?

include/pq.h (outdated, resolved)
include/pq.h (outdated, resolved)
src/disk_utils.cpp (resolved)
src/disk_utils.cpp (outdated, resolved)
src/disk_utils.cpp (resolved)
src/disk_utils.cpp (resolved)
src/disk_utils.cpp (outdated, resolved)
tests/build_disk_index.cpp (resolved)
@harsha-simhadri
Contributor

> > One other thing to consider, do you want to support overriding codebook for disk PQ too? We don't currently use disk PQ in production
>
> Idk, what do you think? shall I also do it in diskpq?

I am OK with leaving this out unless you need this too.

@jinwei14 jinwei14 requested a review from gopalrs April 3, 2023 01:15
src/index.cpp (outdated, resolved)
include/pq.h (resolved)
src/disk_utils.cpp (resolved)
@gopalrs gopalrs merged commit 4c8041b into microsoft:main Apr 3, 2023
@jinwei14 jinwei14 deleted the codebookPassin branch April 3, 2023 06:32