Zero length transcriptions #2249

PhiTiet · 2024-06-27T08:04:02Z

I'm using the transcriptions API and I am noticing that in my responses I get words which have effectively 0 duration.
I'm using these parameters in the request:

POST -> https://api.openai.com/v1/audio/transcriptions
response_format = verbose_json
timestamp_granularities[] = word
model = whisper-1

For the most part the transcription is accurate, but sometimes it will have a word whose start and end are separated by less than a millisecond. That timing is also wrong when I listen to the mp3 (which was itself generated by OpenAI).

Has anyone else had trouble with this, and what can I do about it?

Example (I'm using Java); see the word "stairs".

After looking at the average duration per letter, I suspect the words with zero duration 'belong' to the preceding word.
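
A minimal sketch of that check, assuming the verbose_json response has already been parsed into a list of words with start and end times in seconds. The `Word` record, the `ZeroDurationWords` class, and the 1 ms `EPSILON` threshold are illustrative names and values, not part of any OpenAI SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flags words with (near) zero duration and
// computes seconds per letter so neighbors can be compared.
final class ZeroDurationWords {
    // Assumed shape of one entry in the response's word list.
    record Word(String text, double start, double end) {
        double duration() { return end - start; }
    }

    // Assumed threshold in seconds below which a word counts as zero length.
    static final double EPSILON = 0.001;

    /** Returns the indices of words whose duration is below EPSILON. */
    static List<Integer> findZeroDuration(List<Word> words) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            if (words.get(i).duration() < EPSILON) {
                hits.add(i);
            }
        }
        return hits;
    }

    /** Seconds per letter for one word; 0 for an empty word. */
    static double secondsPerLetter(Word w) {
        return w.text().isEmpty() ? 0.0 : w.duration() / w.text().length();
    }
}
```

If the word just before a flagged index has an unusually high seconds-per-letter value compared with the rest of the transcript, that would be consistent with it having absorbed the missing time.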

daskalou · 2024-08-21T12:44:25Z

I can confirm we are seeing the same thing.

Words are regularly returned with 0 duration.

There's no consistency in whether they belong to the prior word's timing or the next word's.

This is in relatively short audio files, approximately 20-40 seconds long.
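
Since the missing time could sit in either neighbor, one possible client-side workaround is to borrow from whichever adjacent word has the higher seconds-per-letter value and split that span in proportion to letter counts. This is only a heuristic sketch, not a fix confirmed by anyone in this thread. The `ZeroDurationRepair` class, the `Word` record, and the `EPSILON` threshold are all hypothetical, and the code assumes a zero-duration word sits directly against the neighbor it borrows from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing step for word-level timestamps.
final class ZeroDurationRepair {
    record Word(String text, double start, double end) {
        double duration() { return end - start; }
        double secondsPerLetter() {
            return text.isEmpty() ? 0.0 : duration() / text.length();
        }
    }

    // Assumed threshold in seconds below which a word counts as zero length.
    static final double EPSILON = 0.001;

    /** Returns a copy in which each zero-duration word shares a neighbor's span. */
    static List<Word> repair(List<Word> words) {
        List<Word> out = new ArrayList<>(words);
        for (int i = 0; i < out.size(); i++) {
            Word w = out.get(i);
            if (w.duration() >= EPSILON || w.text().isEmpty()) continue;
            Word prev = i > 0 ? out.get(i - 1) : null;
            Word next = i + 1 < out.size() ? out.get(i + 1) : null;
            double prevRate = prev == null ? 0.0 : prev.secondsPerLetter();
            double nextRate = next == null ? 0.0 : next.secondsPerLetter();
            if (prevRate <= 0.0 && nextRate <= 0.0) continue; // nothing to borrow from
            if (prevRate >= nextRate) {
                // Prior word keeps the first part of its span; w takes the rest.
                double letters = prev.text().length() + w.text().length();
                double cut = prev.start() + prev.duration() * prev.text().length() / letters;
                out.set(i - 1, new Word(prev.text(), prev.start(), cut));
                out.set(i, new Word(w.text(), cut, prev.end()));
            } else {
                // w takes the first part of the next word's span.
                double letters = w.text().length() + next.text().length();
                double cut = next.start() + next.duration() * w.text().length() / letters;
                out.set(i, new Word(w.text(), next.start(), cut));
                out.set(i + 1, new Word(next.text(), cut, next.end()));
            }
        }
        return out;
    }
}
```

The result is still an estimate, so it is worth spot-checking repaired words against the audio before relying on them.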
