Open Source Embedding + Contrastive Code #1615
First pass, mostly naming nits, good work!
🚀 Good work!
🚀 yolo
Additional round of testing: run. It also registers to UC here, and can be served with provisioned throughput here: run.
Note: scroll down to the bottom to see additional testing.
Summary
Since this PR is extremely long, I will instead outline the most important changes to make the review as painless as possible.
llmfoundry/models/llm_embed/ was copy-pasted from plugins and then run through pre-commit.
The contrastive_pairs folder was also moved directly from plugins.
All files in test were also moved from plugins. Namely:
The data_utils file from plugins was concatenated into foundry.
A small change was made to text_data because it was causing circular imports with the dataloader.
As you can tell, most commits just resolve circular imports that kept popping up.
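For reviewers less familiar with this failure mode: one common way to break an import cycle between two modules is to defer one side's import into the function that needs it. The sketch below is a hypothetical illustration, not necessarily the change made to text_data in this PR. The module names `text_data_demo` and `dataloader_demo` and their functions are invented stand-ins, written to a temporary directory so the example is self-contained.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for the dataloader module: it imports from the
# text-data module at top level.
DATALOADER_SRC = '''
from text_data_demo import build_text_dataset

def build_dataloader(name):
    return {"loader_for": build_text_dataset(name)}
'''

# Hypothetical stand-in for text_data. A top-level
# `from dataloader_demo import build_dataloader` here would fail with an
# ImportError whenever this module is imported first, because dataloader_demo
# would try to read build_text_dataset from a partially initialized module.
TEXT_DATA_SRC = '''
def build_text_dataset(name):
    return f"dataset:{name}"

def build_default_loader():
    # Deferred import: resolved at call time, after both modules are loaded.
    from dataloader_demo import build_dataloader
    return build_dataloader("default")
'''


def load_demo_modules():
    """Write both demo modules to a temp dir and import them."""
    tmp = Path(tempfile.mkdtemp())
    (tmp / "dataloader_demo.py").write_text(textwrap.dedent(DATALOADER_SRC))
    (tmp / "text_data_demo.py").write_text(textwrap.dedent(TEXT_DATA_SRC))
    sys.path.insert(0, str(tmp))
    importlib.invalidate_caches()
    # Importing text_data first is the order that would break with a
    # top-level import; with the deferred import it succeeds.
    text_data = importlib.import_module("text_data_demo")
    dataloader = importlib.import_module("dataloader_demo")
    return text_data, dataloader


text_data, dataloader = load_demo_modules()
print(text_data.build_default_loader())  # {'loader_for': 'dataset:default'}
```

Moving shared helpers into a third module that both sides import is an alternative fix, but it touches more files than a deferred import does.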
Manual Testing
Finetune Embedding old run from private code (https://github.com/databricks-mosaic/runtime-private-plugins/pull/165):
embedding-ft-FIMVdd
Mapping to:
https://eng-ml-inference-team-us-east-1.cloud.databricks.com/ml/experiments/2327133678740256/runs/4100b272b4384cf79b6a140910db79cb
Finetune Embedding new run using the foundry branch and no private code:
embedding-ft-g2XSux
Mapping to:
https://eng-ml-inference-team-us-east-1.cloud.databricks.com/ml/experiments/3702773029215793/runs/30c4a0bd415e42e09ccebf62aeb75568/model-metrics?o=1669080675700484
Test UC model registration with mlflow dev:
embedding-ft-nsJDiD
https://eng-ml-inference-team-us-east-1.cloud.databricks.com/explore/data/models/ft_embedding/data/vinchenzo_ft_embedding/version/3?o=1669080675700484
Contrastive LM old run from private code
mpt-small-contrastive-test-Bxf1uU
https://dbc-559ffd80-2bfc.cloud.databricks.com/ml/experiments/2656944668850176/runs/50cbda7a569e4176ab81b8cf5681fc0b/model-metrics?o=7395834863327820
Contrastive LM new run with only foundry:
mpt-small-contrastive-test-qiW9gx
https://eng-ml-inference-team-us-east-1.cloud.databricks.com/ml/experiments/3702773029215793/runs/344fd1fc372c4581a050a49d80ca5400
There is also a regression test run on this branch:
https://databricks.slack.com/archives/C05T1A4UMT8/p1729864541067169
https://github.com/databricks-mosaic/regression-testing/actions/runs/11518415501/job/32065131952