Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open Source Embedding + Contrastive Code #1615

Merged
merged 52 commits into from
Oct 29, 2024
Merged

Open Source Embedding + Contrastive Code #1615

merged 52 commits into from
Oct 29, 2024

Conversation

KuuCi
Copy link
Contributor

@KuuCi KuuCi commented Oct 24, 2024

Note: Scroll down to the bottom to see additional testing

Summary

Since this PR is extremely long, I will instead outline some important to hopefully make the review painless.

llmfoundry/models/llm_embed/ was taken copy paste from plugins + pre-commit.

contrastive_pairs folder was also moved directly from plugins

All files in test were also moved from plugins. Namely:

  1. test_contrastive_dataloader
  2. test_embedding_finetune
  3. test_llm_embedding
  4. test_contrastive_dataloader

The file data_utils from plugins was concatenated to foundry.

Small change to text_data because it was causing circular imports with the dataloader.

As you can tell, most commits were just to solve circular imports that were popping up.

Manual Testing
Finetune Embedding old run from private code (https://github.com/databricks-mosaic/runtime-private-plugins/pull/165):
embedding-ft-FIMVdd
Mapping to:
https://eng-ml-inference-team-us-east-1.cloud.databricks.com/ml/experiments/2327133678740256/runs/4100b272b4384cf79b6a140910db79cb

image

Finetune Embedding new run using foundry branch and no private
embedding-ft-g2XSux
Mapping to:
https://eng-ml-inference-team-us-east-1.cloud.databricks.com/ml/experiments/3702773029215793/runs/30c4a0bd415e42e09ccebf62aeb75568/model-metrics?o=1669080675700484

image

Test UC model registration given mlflow dev
embedding-ft-nsJDiD
https://eng-ml-inference-team-us-east-1.cloud.databricks.com/explore/data/models/ft_embedding/data/vinchenzo_ft_embedding/version/3?o=1669080675700484
image

Contrastive LM old run from private code
mpt-small-contrastive-test-Bxf1uU
https://dbc-559ffd80-2bfc.cloud.databricks.com/ml/experiments/2656944668850176/runs/50cbda7a569e4176ab81b8cf5681fc0b/model-metrics?o=7395834863327820
image

Contrasive LM new run with only foundry
mpt-small-contrastive-test-qiW9gx
https://eng-ml-inference-team-us-east-1.cloud.databricks.com/ml/experiments/3702773029215793/runs/344fd1fc372c4581a050a49d80ca5400

image

And there's also this regression test here ran on this branch:

https://databricks.slack.com/archives/C05T1A4UMT8/p1729864541067169
https://github.com/databricks-mosaic/regression-testing/actions/runs/11518415501/job/32065131952
image

@KuuCi KuuCi changed the title Opoen Source Embedding + Contrastive Code Open Source Embedding + Contrastive Code Oct 24, 2024
@KuuCi KuuCi requested a review from milocress October 25, 2024 12:31
Copy link
Contributor

@milocress milocress left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass, mostly naming nits, good work!

llmfoundry/data/text_data.py Show resolved Hide resolved
llmfoundry/models/llm_embed/README.md Outdated Show resolved Hide resolved
llmfoundry/models/llm_embed/README.md Outdated Show resolved Hide resolved
llmfoundry/models/llm_embed/gte_finetune_lm.py Outdated Show resolved Hide resolved
tests/data/test_contrastive_dataloader.py Outdated Show resolved Hide resolved
tests/data_utils.py Outdated Show resolved Hide resolved
tests/data_utils.py Outdated Show resolved Hide resolved
tests/models/llm_embed/test_llm_embedding.py Outdated Show resolved Hide resolved
@KuuCi KuuCi requested review from milocress and dakinggg October 25, 2024 19:45
@KuuCi KuuCi marked this pull request as ready for review October 25, 2024 20:05
@KuuCi KuuCi requested a review from a team as a code owner October 25, 2024 20:05
Copy link
Contributor

@milocress milocress left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Good work!

@KuuCi KuuCi requested a review from irenedea October 25, 2024 20:39
llmfoundry/models/llm_embed/README.md Outdated Show resolved Hide resolved
llmfoundry/models/llm_embed/README.md Outdated Show resolved Hide resolved
llmfoundry/models/llm_embed/README.md Outdated Show resolved Hide resolved
llmfoundry/models/llm_embed/README.md Outdated Show resolved Hide resolved
llmfoundry/models/llm_embed/modeling_llm_embed.py Outdated Show resolved Hide resolved
tests/models/llm_embed/test_llm_embedding.py Outdated Show resolved Hide resolved
tests/models/llm_embed/test_llm_embedding.py Show resolved Hide resolved
tests/models/llm_embed/test_llm_embedding.py Outdated Show resolved Hide resolved
tests/models/llm_embed/test_llm_embedding.py Outdated Show resolved Hide resolved
tests/models/llm_embed/test_llm_embedding.py Show resolved Hide resolved
@KuuCi
Copy link
Contributor Author

KuuCi commented Oct 29, 2024

Additional Round of Testing:

run mcli logs embedding-ft-jzojKA gives:
https://eng-ml-inference-team-us-east-1.cloud.databricks.com/ml/experiments/3702773029215793/runs/8ffde6a453f34bb59d62bef2a6c55a0e/model-metrics?o=1669080675700484
image
Which looks consistent as above.

It also registers to UC here:
https://eng-ml-inference-team-us-east-1.cloud.databricks.com/explore/data/models/ft_embedding/data/vinchenzo_ft_embedding/version/8?o=1669080675700484

and can be served with provisioned throughput here:
image

run mcli logs mpt-small-contrastive-test-LZopCr gives:
https://eng-ml-inference-team-us-east-1.cloud.databricks.com/ml/experiments/3702773029215793/runs/3f0ce0787c114421b24f6e70f849d771
image
Which also looks consistent with above.

@KuuCi KuuCi requested a review from dakinggg October 29, 2024 02:26
@KuuCi KuuCi merged commit e7dddd2 into main Oct 29, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants