fine tune connector process function #1954

Merged: ylwu-amzn merged 3 commits into opensearch-project:main on Jan 30, 2024

ylwu-amzn (Collaborator) commented Jan 29, 2024

Description

  1. Add default pre- and post-process functions for the Cohere rerank model
  2. Support pre- and post-process functions for all input data types, not just text embedding
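To illustrate what a rerank post-process function has to do, here is a minimal Java sketch. It assumes the Cohere rerank response carries a `results` array whose entries have an `index` (the document's position in the request) and a `relevance_score`, already parsed into Java maps. The class and method names are hypothetical and are not the ml-commons `CohereRerankPostProcessFunction` API. The sketch puts the scores back into the original document order so each score lines up with its input document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a rerank post-process step (not the ml-commons API).
 * Assumes the model response was already parsed into a list of result maps,
 * each with an "index" (position in the request's document list) and a
 * "relevance_score".
 */
public final class RerankPostProcessSketch {

    private RerankPostProcessSketch() {}

    /** Returns one score per input document, in the original document order. */
    public static List<Double> scoresInDocumentOrder(List<Map<String, Object>> results, int documentCount) {
        Double[] scores = new Double[documentCount];
        for (Map<String, Object> result : results) {
            int index = ((Number) result.get("index")).intValue();
            double score = ((Number) result.get("relevance_score")).doubleValue();
            if (index < 0 || index >= documentCount) {
                throw new IllegalArgumentException("Result index out of range: " + index);
            }
            scores[index] = score;
        }
        // Documents absent from the results (e.g. when top_n < documentCount) stay null.
        return new ArrayList<>(Arrays.asList(scores));
    }

    public static void main(String[] args) {
        // Results arrive sorted by relevance; document 1 scored highest.
        List<Map<String, Object>> results = List.of(
                Map.of("index", 1, "relevance_score", 0.9),
                Map.of("index", 0, "relevance_score", 0.2));
        System.out.println(scoresInDocumentOrder(results, 2)); // [0.2, 0.9]
    }
}
```

Returning scores by document index, rather than in relevance order, keeps the output aligned with the caller's input list; sorting can then happen wherever the scores are consumed.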

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

codecov bot commented Jan 29, 2024

Codecov Report

Attention: 42 lines in your changes are missing coverage. Please review.

Comparison is base (152c5e2) 82.83% compared to head (8746b23) 82.93%.
Report is 4 commits behind head on main.

| File | Patch % | Lines |
| --- | --- | --- |
| ...ch/ml/engine/algorithms/remote/ConnectorUtils.java | 46.66% | 21 Missing and 3 partials ⚠️ |
| ...n/java/org/opensearch/ml/common/input/MLInput.java | 44.44% | 4 Missing and 1 partial ⚠️ |
| ...s/postprocess/CohereRerankPostProcessFunction.java | 88.00% | 0 Missing and 3 partials ⚠️ |
| ...va/org/opensearch/ml/common/utils/StringUtils.java | 80.00% | 3 Missing ⚠️ |
| ...ions/postprocess/EmbeddingPostProcessFunction.java | 90.00% | 0 Missing and 2 partials ⚠️ |
| ...ctions/preprocess/ConnectorPreProcessFunction.java | 90.00% | 1 Missing and 1 partial ⚠️ |
| ...unctions/preprocess/DefaultPreProcessFunction.java | 91.30% | 2 Missing ⚠️ |
| ...stprocess/BedrockEmbeddingPostProcessFunction.java | 93.75% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1954      +/-   ##
============================================
+ Coverage     82.83%   82.93%   +0.10%     
- Complexity     5447     5506      +59     
============================================
  Files           522      533      +11     
  Lines         21912    22085     +173     
  Branches       2228     2244      +16     
============================================
+ Hits          18150    18316     +166     
- Misses         2851     2853       +2     
- Partials        911      916       +5     
| Flag | Coverage Δ |
| --- | --- |
| ml-commons | 82.93% <83.90%> (+0.10%) ⬆️ |

Flags with carried forward coverage won't be shown.

Signed-off-by: Yaliang Wu <ylwu@amazon.com>
Signed-off-by: Yaliang Wu <ylwu@amazon.com>
Zhangxunmt (Collaborator) left a comment

LGTM. Some thoughts here:
We will likely need to add more and more pre/post-process functions to support different remote models, such as Triton. We should watch for processing steps that are common across models and check whether some models can share these functions to avoid duplication. Ideally, supporting the next models will only require configuring existing pre/post-process functions, without adding code.

ylwu-amzn (Collaborator Author)

Good point. We can keep fine-tuning the code as we add more models. Creating some common process functions will save effort.
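One way to act on this suggestion is to register shared process functions under names, so that a new model's connector only references an existing name in its configuration instead of adding code. The Java sketch below assumes a simple in-memory registry; `ProcessFunctionRegistry` and the name `"example.post_process.float_list"` are hypothetical and not part of ml-commons.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical registry that lets several connectors share one process function by name. */
public final class ProcessFunctionRegistry {

    private final Map<String, Function<Object, Object>> functions = new HashMap<>();

    /** Registers a function; rejects duplicate names so a shared function is never silently replaced. */
    public void register(String name, Function<Object, Object> function) {
        if (functions.putIfAbsent(name, function) != null) {
            throw new IllegalStateException("Process function already registered: " + name);
        }
    }

    /** Looks up a function by the name a connector configuration would refer to. */
    public Optional<Function<Object, Object>> lookup(String name) {
        return Optional.ofNullable(functions.get(name));
    }

    public static void main(String[] args) {
        ProcessFunctionRegistry registry = new ProcessFunctionRegistry();
        // A shared post-process function: convert a list of numbers into a list of floats.
        registry.register("example.post_process.float_list", input -> {
            List<?> values = (List<?>) input;
            return values.stream().map(v -> ((Number) v).floatValue()).collect(Collectors.toList());
        });
        // In this sketch, every caller that looks up this name gets the same implementation.
        Function<Object, Object> shared = registry.lookup("example.post_process.float_list").orElseThrow();
        System.out.println(shared.apply(List.of(1, 2.5))); // [1.0, 2.5]
    }
}
```

Rejecting duplicate registrations is a deliberate choice here: if two models need slightly different behavior, they get two names rather than one function quietly overwriting the other.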

HenryL27 (Collaborator) left a comment

Cohere rerank connector functions look good to me.
Does this let us use pre-process functions on the generic remote inference dataset now? If so, awesome!

ylwu-amzn (Collaborator Author)

Yes, with this PR, users can use pre-process functions on remote inference data.
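To illustrate a pre-process function applied to input other than text embedding, the Java sketch below turns a query plus a list of documents (the shape of a rerank request) into string parameters that a connector's request body template could substitute. It assumes the template expects a `query` parameter (a JSON-escaped string) and a `documents` parameter (a JSON array of strings); the class and parameter names are hypothetical and not taken from the ml-commons source.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical pre-process sketch: query + documents -> string parameters for a request template. */
public final class RerankPreProcessSketch {

    private RerankPreProcessSketch() {}

    /** Escapes a string so it can be embedded inside a JSON string literal. */
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters use the JSON four-hex-digit escape form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds parameters: "query" as an escaped string, "documents" as a JSON array of strings. */
    public static Map<String, String> toParameters(String query, List<String> documents) {
        Map<String, String> parameters = new HashMap<>();
        parameters.put("query", escapeJson(query));
        String documentsJson = documents.stream()
                .map(d -> "\"" + escapeJson(d) + "\"")
                .collect(Collectors.joining(",", "[", "]"));
        parameters.put("documents", documentsJson);
        return parameters;
    }

    public static void main(String[] args) {
        Map<String, String> params = toParameters("what is \"ml\"?", List.of("doc one", "doc two"));
        System.out.println(params.get("query"));     // what is \"ml\"?
        System.out.println(params.get("documents")); // ["doc one","doc two"]
    }
}
```

Escaping inside the pre-process step means the request template can splice the parameters into its JSON body verbatim without producing malformed JSON when the input contains quotes or newlines.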

ylwu-amzn merged commit c5225de into opensearch-project:main on Jan 30, 2024
14 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 30, 2024
* fine tune connector process function

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* add unit test for process function

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* add license header

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

---------

Signed-off-by: Yaliang Wu <ylwu@amazon.com>
(cherry picked from commit c5225de)
ylwu-amzn added a commit that referenced this pull request Jan 31, 2024
* fine tune connector process function

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* add unit test for process function

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* add license header

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

---------

Signed-off-by: Yaliang Wu <ylwu@amazon.com>
(cherry picked from commit c5225de)

Co-authored-by: Yaliang Wu <ylwu@amazon.com>
austintlee pushed a commit to austintlee/ml-commons that referenced this pull request Mar 19, 2024
* fine tune connector process function

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* add unit test for process function

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

* add license header

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

---------

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

4 participants