Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterator for diskann #874

Merged
merged 3 commits into from
Oct 10, 2024

Conversation

alwayslove2013
Copy link
Collaborator

@alwayslove2013 alwayslove2013 commented Sep 27, 2024

  • add iterator support for knowhere diskann
  • remove diskann-range_search, use index_node-range_search instead. (implemented by diskann-iterator)

issue: milvus-io/milvus#34816
co-author: @xxxlzhxxx

report: https://zilliverse.feishu.cn/docx/SEnCdj247oMxnXxIrp2cNH5knqd?from=from_copylink

Copy link

mergify bot commented Sep 27, 2024

@alwayslove2013 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

  1. If you're fixing a bug, label it as kind/bug.
  2. For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
  3. Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
  4. Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!.

aligned_query_float[q_dim] = 0;
}
for (uint32_t i = 0; i < q_dim; i++) {
aligned_query_T[i] = (T) ((float) aligned_query_T[i] / query_norm);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no check for query_norm == 0

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alwayslove2013

    aligned_query_T[q_dim]
    aligned_query_float[q_dim]

For the fixed version, this two elements do not get populated for query_norm == 0 use case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically,

    if (query_norm != 0) {
        query_norm = std::sqrt(query_norm);
        if (metric == diskann::Metric::INNER_PRODUCT) {
          aligned_query_T[q_dim] = 0;
          aligned_query_float[q_dim] = 0;
        }
        for (uint32_t i = 0; i < q_dim; i++) {
          aligned_query_T[i] = (T) ((float) aligned_query_T[i] / query_norm);
          aligned_query_float[i] /= query_norm;
        }
      }
    else {
      .. do what.. ?
    }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to do anything.
When the metric is ip or cosine and query_norm==0, workspace next batch will be skipped. iterator's results will not be updated and will always be empty, which means Iterator.HasNext() will be false.

void PQFlashIndex<T>::getIteratorNextBatch(IteratorWorkspace<T> *workspace) {
  if (metric == diskann::Metric::INNER_PRODUCT ||
      metric == diskann::Metric::COSINE) {
    if (workspace->query_norm == 0) {
      return;
    }
  }
  ...
}

I will modify the logic to make it clearer.

thirdparty/DiskANN/src/pq_flash_index.cpp Outdated Show resolved Hide resolved
Copy link

codecov bot commented Oct 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.67%. Comparing base (3c46f4c) to head (b79b1ee).
Report is 214 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           main     #874       +/-   ##
=========================================
+ Coverage      0   77.67%   +77.67%     
=========================================
  Files         0       84       +84     
  Lines         0     6364     +6364     
=========================================
+ Hits          0     4943     +4943     
- Misses        0     1421     +1421     

see 84 files with indirect coverage changes

@alwayslove2013
Copy link
Collaborator Author

/kind improvement

diskann iterator for knowhere (opensource)

Signed-off-by: min.tian <min.tian.cn@gmail.com>

remove diskann-range_search, use index_node-range_search instead

Signed-off-by: min.tian <min.tian.cn@gmail.com>

pre-commit style

Signed-off-by: min.tian <min.tian.cn@gmail.com>

knowhere-diskann iterator fix: skip if query_norm == 0; remove range_search test - min_k < max_k

Signed-off-by: min.tian <min.tian.cn@gmail.com>
Signed-off-by: min.tian <min.tian.cn@gmail.com>
@mergify mergify bot added ci-passed and removed ci-passed labels Oct 9, 2024
@alexanderguzhva
Copy link
Collaborator

/lgtm

auto filter_ratio = static_cast<float>(search_conf.filter_threshold.value());

bool use_reorder_data = false;
bool for_tuning = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two default to false? Possibly pass them in from config?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both removed.

Signed-off-by: min.tian <min.tian.cn@gmail.com>
@sre-ci-robot sre-ci-robot removed the lgtm label Oct 10, 2024
@mergify mergify bot removed the ci-passed label Oct 10, 2024
@PwzXxm
Copy link
Collaborator

PwzXxm commented Oct 10, 2024

/lgtm
/approve

@sre-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alwayslove2013, PwzXxm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@PwzXxm
Copy link
Collaborator

PwzXxm commented Oct 10, 2024

/hold

@PwzXxm
Copy link
Collaborator

PwzXxm commented Oct 10, 2024

/unhold

@mergify mergify bot added the ci-passed label Oct 10, 2024
@sre-ci-robot sre-ci-robot merged commit 690ade0 into zilliztech:main Oct 10, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants