fine tune connector process function #1954

ylwu-amzn · 2024-01-29T22:54:09Z

Description

Add default pre/post process functions for the Cohere rerank model
Support pre/post process functions for all input data, not just text embedding
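
As an illustration of what a pre/post process pair for a rerank model could look like, here is a minimal sketch. It is not the code from this PR: the class and method names are hypothetical, and it assumes the request and response shapes of Cohere's public rerank API (`query`, `documents`, `top_n` in the request; `results` entries with `index` and `relevance_score` in the response). Setting `top_n` to the number of documents and returning scores in original document order are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the classes added in this PR.
class CohereRerankSketch {

    // Pre-process: turn a query and its candidate documents into request parameters.
    static Map<String, Object> toRequestParameters(String query, List<String> documents) {
        if (query == null || documents == null || documents.isEmpty()) {
            throw new IllegalArgumentException("query and documents are required");
        }
        Map<String, Object> parameters = new LinkedHashMap<>();
        parameters.put("query", query);
        parameters.put("documents", documents);
        // Assumption: ask for a score for every document, not only the top few.
        parameters.put("top_n", documents.size());
        return parameters;
    }

    // Post-process: sort the response entries by document index and return the scores.
    static List<Double> scoresInDocumentOrder(List<Map<String, Object>> results) {
        List<Map<String, Object>> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingInt(r -> ((Number) r.get("index")).intValue()));
        List<Double> scores = new ArrayList<>();
        for (Map<String, Object> r : sorted) {
            scores.add(((Number) r.get("relevance_score")).doubleValue());
        }
        return scores;
    }

    // Helper that builds one response entry for the demo below.
    static Map<String, Object> result(int index, double score) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("index", index);
        entry.put("relevance_score", score);
        return entry;
    }

    public static void main(String[] args) {
        System.out.println(toRequestParameters("what is a connector?", List.of("doc a", "doc b", "doc c")));
        // The service returns entries ranked by relevance; the sketch restores document order.
        List<Map<String, Object>> results = List.of(result(2, 0.98), result(0, 0.35), result(1, 0.07));
        System.out.println(scoresInDocumentOrder(results)); // [0.35, 0.07, 0.98]
    }
}
```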

Issues Resolved

[List any issues this PR will resolve]

Check List

- New functionality includes testing.
  - All tests pass
- New functionality has been documented.
  - New functionality has javadoc added
- Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

codecov · 2024-01-29T23:19:59Z

Codecov Report

Attention: 42 lines in your changes are missing coverage. Please review.

Comparison is base (152c5e2) 82.83% compared to head (8746b23) 82.93%.
Report is 4 commits behind head on main.

Files	Patch %	Lines
...ch/ml/engine/algorithms/remote/ConnectorUtils.java	46.66%	21 Missing and 3 partials ⚠️
...n/java/org/opensearch/ml/common/input/MLInput.java	44.44%	4 Missing and 1 partial ⚠️
...s/postprocess/CohereRerankPostProcessFunction.java	88.00%	0 Missing and 3 partials ⚠️
...va/org/opensearch/ml/common/utils/StringUtils.java	80.00%	3 Missing ⚠️
...ions/postprocess/EmbeddingPostProcessFunction.java	90.00%	0 Missing and 2 partials ⚠️
...ctions/preprocess/ConnectorPreProcessFunction.java	90.00%	1 Missing and 1 partial ⚠️
...unctions/preprocess/DefaultPreProcessFunction.java	91.30%	2 Missing ⚠️
...stprocess/BedrockEmbeddingPostProcessFunction.java	93.75%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1954      +/-   ##
============================================
+ Coverage     82.83%   82.93%   +0.10%     
- Complexity     5447     5506      +59     
============================================
  Files           522      533      +11     
  Lines         21912    22085     +173     
  Branches       2228     2244      +16     
============================================
+ Hits          18150    18316     +166     
- Misses         2851     2853       +2     
- Partials        911      916       +5

Flag	Coverage Δ
ml-commons	`82.93% <83.90%> (+0.10%)`	⬆️

Zhangxunmt

LGTM. Some thoughts here:
We will likely need to add more and more pre/post functions to support different remote models such as Triton. We should pay attention to common processing steps and see whether some models can share these functions to avoid duplication. Hopefully, for the next models we support, we will only need to configure the existing pre/post functions without adding code.

ylwu-amzn · 2024-01-30T21:33:10Z

Good point. We can keep fine-tuning the code as we add more models. Creating some common process functions will save effort.
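
One way to picture the "configure instead of code" idea discussed above is a small registry that maps a configured name to a shared function. This is only a sketch under stated assumptions: the class, the function signature, and the name `example.pre_process.trim` are all hypothetical and are not taken from ml-commons.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of a name-to-function registry; not the ml-commons implementation.
class ProcessFunctionRegistrySketch {

    private final Map<String, Function<List<String>, List<String>>> functions = new HashMap<>();

    // Register a shared function once under a stable name.
    void register(String name, Function<List<String>, List<String>> function) {
        functions.put(name, function);
    }

    // Resolve the function a connector names in its configuration.
    Function<List<String>, List<String>> lookup(String name) {
        Function<List<String>, List<String>> function = functions.get(name);
        if (function == null) {
            throw new IllegalArgumentException("Unknown process function: " + name);
        }
        return function;
    }

    public static void main(String[] args) {
        ProcessFunctionRegistrySketch registry = new ProcessFunctionRegistrySketch();
        registry.register(
            "example.pre_process.trim",
            docs -> docs.stream().map(String::trim).collect(Collectors.toList()));

        // Two different connectors could reuse the same function by naming it in config.
        String configuredName = "example.pre_process.trim";
        System.out.println(registry.lookup(configuredName).apply(List.of("  a ", "b  "))); // [a, b]
    }
}
```

With this shape, supporting a new model that needs the same processing would mean adding a configuration entry rather than a new class.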

HenryL27

Cohere rerank connector functions look good to me.
Does this let us use pre-process functions on the generic remote inference dataset now? If so, awesome!

ylwu-amzn · 2024-01-30T23:58:14Z

Yes, with this PR, users can use pre-process functions on remote inference data.

* fine tune connector process function
* add unit test for process function
* add license header

Signed-off-by: Yaliang Wu <ylwu@amazon.com>
Co-authored-by: Yaliang Wu <ylwu@amazon.com>
(cherry picked from commit c5225de)

fine tune connector process function

8519cc4

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

ylwu-amzn had a problem deploying to ml-commons-cicd-env January 29, 2024 22:54 — with GitHub Actions Failure

ylwu-amzn had a problem deploying to ml-commons-cicd-env January 29, 2024 22:54 — with GitHub Actions Error

ylwu-amzn temporarily deployed to ml-commons-cicd-env January 29, 2024 22:54 — with GitHub Actions Inactive

ylwu-amzn temporarily deployed to ml-commons-cicd-env January 29, 2024 23:22 — with GitHub Actions Inactive

add unit test for process function

664d715

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

ylwu-amzn had a problem deploying to ml-commons-cicd-env January 30, 2024 01:25 — with GitHub Actions Error

ylwu-amzn had a problem deploying to ml-commons-cicd-env January 30, 2024 01:25 — with GitHub Actions Failure

ylwu-amzn temporarily deployed to ml-commons-cicd-env January 30, 2024 02:37 — with GitHub Actions Inactive

ylwu-amzn temporarily deployed to ml-commons-cicd-env January 30, 2024 03:04 — with GitHub Actions Inactive

add license header

8746b23

Signed-off-by: Yaliang Wu <ylwu@amazon.com>

ylwu-amzn had a problem deploying to ml-commons-cicd-env January 30, 2024 11:44 — with GitHub Actions Failure

ylwu-amzn temporarily deployed to ml-commons-cicd-env January 30, 2024 11:44 — with GitHub Actions Inactive

ylwu-amzn requested review from rbhavna, zane-neo, Zhangxunmt, austintlee and HenryL27 as code owners January 30, 2024 19:27

Zhangxunmt approved these changes Jan 30, 2024

ylwu-amzn temporarily deployed to ml-commons-cicd-env January 30, 2024 22:46 — with GitHub Actions Inactive

b4sjoo approved these changes Jan 30, 2024

HenryL27 approved these changes Jan 30, 2024

ylwu-amzn merged commit c5225de into opensearch-project:main Jan 30, 2024
14 checks passed

ylwu-amzn added the backport 2.x label Jan 30, 2024

opensearch-trigger-bot bot mentioned this pull request Jan 30, 2024

[Backport 2.x] fine tune connector process function #1963

Merged

SuZhou-Joe mentioned this pull request Jan 31, 2024

[BUG] connector throws error when doing predict. #1971

Closed

This was referenced Feb 3, 2024

[FEATURE] Support local cross-encoder model #1589

Open

[FEATURE] Support for HttpConnector request and response body transformations through scripts #1475

Open

HenryL27 mentioned this pull request Feb 7, 2024

[DOC] Documentation for new reranking feature opensearch-project/documentation-website#6359

Closed

4 tasks

ylwu-amzn mentioned this pull request Feb 8, 2024

[FEATURE] Add default process functions for bedrock embedding and cohere rerank model #1543

Closed
