Clarification Needed: Deepgram Self-Hosted Transcriptions Functioning Without Local Models #995
-
We’ve been testing Deepgram's self-hosted environment and built a simple client-server streaming app, which operates as follows:
Questions:
Please clarify; as per our discussion with the sales team, we cannot proceed with client testing. Thank you.
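For reference, here is a minimal sketch of the kind of streaming client we mean. The endpoint (`ws://localhost:8080/v1/listen`), the model name, and the audio parameters are placeholders, not our exact setup:

```python
# Placeholder sketch of a streaming client against a self-hosted Deepgram
# API container. Endpoint, model, and audio format are illustrative only.
import asyncio
import json

import websockets


async def stream(path: str = "audio.raw") -> None:
    uri = (
        "ws://localhost:8080/v1/listen"
        "?model=nova-2-medical&encoding=linear16&sample_rate=16000"
    )
    async with websockets.connect(uri) as ws:

        async def sender() -> None:
            with open(path, "rb") as f:
                # 3200 bytes is roughly 100 ms of 16 kHz, 16-bit, mono audio.
                while chunk := f.read(3200):
                    await ws.send(chunk)
                    await asyncio.sleep(0.1)  # pace roughly in real time
            await ws.send(json.dumps({"type": "CloseStream"}))

        async def receiver() -> None:
            async for message in ws:
                print(json.loads(message))

        await asyncio.gather(sender(), receiver())


asyncio.run(stream())
```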
-
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.
-
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
-
Hi @devalbham, I see that the request ID you provided was served by the formatted, streaming Nova-2 Medical model. Models are loaded into memory when the containers are launched, so, as you've found, a container can continue to serve requests after the model files are deleted from disk. This is precarious and not a recommended approach: when serving a larger number of models, they may be loaded and unloaded from memory, which would cause a failure once a model is unloaded. And if the containers were restarted, the models would no longer be available in container memory.

Traffic does not get rerouted to our hosted API when a request to a self-hosted model fails. Typically, if a model is not available (neither in your models directory nor in memory), you would get a 400 Bad Request error.
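As a quick way to see that behavior for yourself, here's a minimal sketch that probes a self-hosted API container for a model and surfaces the 400. The container address, model name, and audio file are assumptions you'd adjust for your deployment:

```python
# Sketch: probe a self-hosted Deepgram API container for a given model.
# The container address, model name, and audio file are assumptions.
import requests

with open("audio.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/v1/listen",
        params={"model": "nova-2-medical"},
        headers={"Content-Type": "audio/wav"},
        data=f,
    )

if resp.status_code == 400:
    # Expected when the model is neither on disk nor loaded in memory.
    print("400 Bad Request:", resp.text)
else:
    resp.raise_for_status()
    print(resp.json())
```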
-
Hi Julia, thanks, that helps a lot. Coming back to the mechanism: yes, I served both successfully before I deleted the models.
@devalbham, that's an interesting observation. Did you successfully serve both batch and streaming requests before deleting those models, so that both were loaded in memory at the time you deleted them?
Setting aside the exact mechanism, it sounds like the concern for your clients is that somehow Deepgram is using another cloud transcription endpoint as a fallback, is that right? I'll offer a few ways you can establish confidence that this isn't the case.
First, in the Deepgram Console, you can view Usage -> Logs, enter your request ID, and verify that the request is still listed as `Deployment: Self-Hosted`.
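If you'd rather script that check than use the Console UI, a sketch along these lines could work; the request-lookup endpoint and the exact shape of the response (the deployment field in particular) are assumptions to verify against the API reference:

```python
# Sketch: look up a request by ID via Deepgram's usage API to confirm it
# was served by your self-hosted deployment. PROJECT_ID, REQUEST_ID, and
# the "deployment" field in the response are assumptions; verify against
# the current API reference.
import os

import requests

PROJECT_ID = "your-project-id"  # placeholder
REQUEST_ID = "your-request-id"  # placeholder

resp = requests.get(
    f"https://api.deepgram.com/v1/projects/{PROJECT_ID}/requests/{REQUEST_ID}",
    headers={"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"},
)
resp.raise_for_status()
print(resp.json().get("deployment"))  # expect something like "self-hosted"
```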
Second, the only outbound access the containers need is to `license.deepgram.com`, …