Clarification Needed: Deepgram Self-Hosted Transcriptions Functioning Without Local Models #995
-
We’ve been testing Deepgram's self-hosted environment and built a simple client-server streaming app, which operates as follows:
Questions:
Please clarify; as per our discussion with the sales team, we cannot proceed with client testing. Thank you.
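For reference, here is a minimal sketch of the kind of streaming client we mean. The endpoint (`ws://localhost:8080/v1/listen`), the model name, and the audio parameters are placeholders, not our exact setup:

```python
# Placeholder sketch of a streaming client against a self-hosted Deepgram
# API container. Endpoint, model, and audio format are illustrative only.
import asyncio
import json

import websockets


async def stream(path: str = "audio.raw") -> None:
    uri = (
        "ws://localhost:8080/v1/listen"
        "?model=nova-2-medical&encoding=linear16&sample_rate=16000"
    )
    async with websockets.connect(uri) as ws:

        async def sender() -> None:
            with open(path, "rb") as f:
                # 3200 bytes is roughly 100 ms of 16 kHz, 16-bit, mono audio.
                while chunk := f.read(3200):
                    await ws.send(chunk)
                    await asyncio.sleep(0.1)  # pace roughly in real time
            await ws.send(json.dumps({"type": "CloseStream"}))

        async def receiver() -> None:
            async for message in ws:
                print(json.loads(message))

        await asyncio.gather(sender(), receiver())


asyncio.run(stream())
```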
-
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.
-
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
-
Hi @devalbham, I see that the request ID you provided was served by the formatted, streaming Nova-2 Medical model. Models are loaded into memory when the containers are launched, so, as you've found, a container can continue to serve requests after the model files are deleted from disk. This is precarious and not a recommended approach: when serving a larger number of models, they may be loaded and unloaded from memory, which would cause a failure once a model is unloaded. And if the containers were restarted, the models would no longer be available in container memory.

Traffic does not get rerouted to our hosted API when a request to a self-hosted model fails. Typically, if a model is not available (neither in your models directory nor in memory), you would get a 400 Bad Request error.
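As a quick way to see that behavior for yourself, here's a minimal sketch that probes a self-hosted API container for a model and surfaces the 400. The container address, model name, and audio file are assumptions you'd adjust for your deployment:

```python
# Sketch: probe a self-hosted Deepgram API container for a given model.
# The container address, model name, and audio file are assumptions.
import requests

with open("audio.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/v1/listen",
        params={"model": "nova-2-medical"},
        headers={"Content-Type": "audio/wav"},
        data=f,
    )

if resp.status_code == 400:
    # Expected when the model is neither on disk nor loaded in memory.
    print("400 Bad Request:", resp.text)
else:
    resp.raise_for_status()
    print(resp.json())
```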
-
Hi Julia, thanks, that helps a lot. Coming back to the mechanism: yes, I served both successfully before I deleted the models.
@devalbham, that's an interesting observation. Did you successfully serve both batch and streaming requests before deleting those models, so that both were loaded in memory at the time you deleted them?
Setting aside the exact mechanism, it sounds like the concern for your clients is that somehow Deepgram is using another cloud transcription endpoint as a fallback, is that right? I'll offer a few ways you can establish confidence that this isn't the case.
First, in the Deepgram Console, you can view Usage -> Logs, enter your request ID, and verify that the request is still listed as `Deployment: Self-Hosted`.
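If you'd rather script that check than use the Console UI, a sketch along these lines could work; the request-lookup endpoint and the exact shape of the response (the deployment field in particular) are assumptions to verify against the API reference:

```python
# Sketch: look up a request by ID via Deepgram's usage API to confirm it
# was served by your self-hosted deployment. PROJECT_ID, REQUEST_ID, and
# the "deployment" field in the response are assumptions; verify against
# the current API reference.
import os

import requests

PROJECT_ID = "your-project-id"  # placeholder
REQUEST_ID = "your-request-id"  # placeholder

resp = requests.get(
    f"https://api.deepgram.com/v1/projects/{PROJECT_ID}/requests/{REQUEST_ID}",
    headers={"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"},
)
resp.raise_for_status()
print(resp.json().get("deployment"))  # expect something like "self-hosted"
```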
Second, the only outbound access the containers need is to `license.deepgram.com`, …