Fetch the cached weights for Mistral-7B-Instruct-v0.1 from GCS bucket #606
/gcbrun
From the pod log of deployment
Which indicated it went past the download weights step
Looks like the test failed before the shard was ready. Saw a bunch of
The deployment
Can we wait longer before making the test prompt call to confirm?
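One way to wait longer without hard-coding a fixed sleep is to poll until the service answers, up to a deadline. The following is only a minimal sketch of that idea, not the test code used in this repository: the helper names `wait_until_ready` and `http_ok`, the default timeout and interval, and the URL in the usage comment are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(probe, timeout_s=900.0, interval_s=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or timeout_s elapses.

    Returns True if the probe succeeded, False if the deadline passed.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        # Stop if another full interval would overshoot the deadline.
        if clock() + interval_s > deadline:
            return False
        sleep(interval_s)


def http_ok(url, timeout_s=5.0):
    """Return True if a GET to url answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and 4xx/5xx responses all land here.
        return False


# Hypothetical usage (the URL is a placeholder, not the real frontend address):
# ready = wait_until_ready(lambda: http_ok("http://localhost:8080/"))
```

A loop like this only masks the missing readiness signal from the client side; it does not tell Kubernetes when the pod can actually serve traffic.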
This seems really long. I don't think this was the previous start-up time. The cloudbuild step has a
Feel free to change the cloudbuild.yaml directly in this PR to change the behavior of the test.
Increased the timeout. @andrewsykim please kick off the CI once you get a chance. Thanks!
@gongmax you should be able to run it with "/gcbrun"
/gcbrun
@gongmax from the build logs it looks like the mistral pod is "ready"
Can you try reproducing the whole RAG quick start solution to see why it's still returning 500? It's possible the pod is "ready" but not actually ready to serve, because the mistral pod has no readiness probe.
I followed the "Installation" and "Launch the frontend chat interface" parts of the README, tested via the frontend, and all the requests returned 200.
Did you send a prompt to the frontend? That's the part of the test that is failing.
How long did you wait for the frontend to be up and running before you queried it? It seems likely we're trying to query the model before it's ready to serve because there's no readiness probe. But this is kind of weird, because I would expect the cached weights to load faster than what we had before.
Yes, that's what I mentioned before. From my local log, I can see
There's an error at the bottom of that response: "missing 2 required positional arguments: params and orig". Is that related? I believe @blackzlq made a change recently to show the prompt response and the error code for easier debuggability.
This is a really long startup time; is it expected? Either way, if we expect a longer start-up time we should add a readiness probe so that the
Ignore this, the error is probably because you didn't run the notebook to generate vector embeddings locally.
/gcbrun
@@ -72,6 +79,15 @@ resource "kubernetes_deployment" "inference_deployment" {
    }

    spec {
      init_container {
        name  = "download-model"
        image = "google/cloud-sdk:473.0.0-alpine"
@gongmax if possible it would be good to use an image hosted on GCR or AR.
Will investigate and address in a follow-up PR.
…#606) Fetch the cached weights for Mistral-7B-Instruct-v0.1 from a GCS bucket in an init container. Also increase ephemeral storage (boot disk size).
Mistral is now a gated model, which breaks our RAG QSS. As a short-term mitigation, we now fetch the cached weights of the model from a GCS bucket.
Tested by:
and