-
The BBC requests line length not exceed 37 characters and offers guidance on how and when to segment lines.
-
So have you managed to find a way to break up the lines of the srt file?
-
Hi,

Interestingly, your SRT sample is showing values other than ,000 ms in the timing. What command-line options are you using to transcribe and create the SRT files? I am assuming you are using the CLI tool?

Subtitle accuracy is something I am interested in. Also, in my tests the first registered word is logged at 00:00 even though it's 5 seconds into the audio file.

Many thanks,
Darren B.
-
Hi,

Here's a fragment of what I did (in JS). r[2] just contains the text as a string, and data contains the growing content of my VTT output. It only allows for breaking the text once. Rough, but it works for testing.

if (r[2].toString().length > 42)
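For anyone who wants to try the "break once" idea outside of JS, here is a minimal Python sketch. It is not the code from the comment above: the function name `break_once` and the choice of breaking at the space nearest the midpoint are assumptions made for illustration.

```python
def break_once(text: str, limit: int = 42) -> str:
    """Break `text` into two lines at the space nearest its midpoint.

    Hypothetical helper: it breaks only once, so either half may still
    exceed `limit` when the cue is very long.
    """
    text = text.strip()
    if len(text) <= limit:
        return text
    # Candidate break points are the positions of the spaces.
    spaces = [i for i, ch in enumerate(text) if ch == " "]
    if not spaces:
        return text  # a single long word; nothing sensible to split
    mid = len(text) / 2
    cut = min(spaces, key=lambda i: abs(i - mid))
    return text[:cut] + "\n" + text[cut + 1:]
```

Breaking near the middle rather than at the limit tends to give two lines of similar length, which avoids a long first line followed by a one-word second line.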
-
It would be nice to have this be either built into the .srt output function or allow another variable that would define how many characters in a line and how many lines to display at once.
-
Took a stab at implementing this on a fork here: rBrenick@6f2e2aa

Added an optional parameter called […]. Did not do any clever grammatical analysis to keep groups of words together, just a simple split by length.

However, as I realised after writing this, it might be better suited as a post-processing step for existing .srt files, since it might be considered outside the scope of this repository. So I've ported it to a quick standalone script as well.
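As a rough idea of what such a post-processing step can look like, here is a small sketch using only the Python standard library. It is not the script from the fork; `wrap_srt` and its `width` parameter are names made up for this example, and it does a plain split by length with no grammatical analysis.

```python
import re
import textwrap

# Matches an SRT timing line such as "00:00:00,000 --> 00:00:09,600".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def wrap_srt(srt_text: str, width: int = 42) -> str:
    """Re-wrap the text of every cue in an .srt string to `width` characters."""
    out_blocks = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Everything after the timing line is subtitle text.
        t = next((i for i, ln in enumerate(lines) if TIMING.match(ln.strip())), None)
        if t is None:
            out_blocks.append(block)  # not a cue; leave it untouched
            continue
        text = " ".join(ln.strip() for ln in lines[t + 1:])
        wrapped = textwrap.wrap(
            text, width=width, break_long_words=False, break_on_hyphens=False
        )
        out_blocks.append("\n".join(lines[: t + 1] + wrapped))
    return "\n\n".join(out_blocks) + "\n"
```

Note that this only caps line length. A long cue can end up with more than two lines, because the timing is left untouched and no new cues are created.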
-
With current […] When word-level timestamps are released, this is a much easier problem to solve (apart from the "don't split up meaningful phrases" guidance) and I think it would make sense to include in […]
-
Any news of accomplishing this without doing a line break?
-
Hello OpenAI,
-
Have resorted to creating an additional utility to process the […]

Turns this:

00:00:00,000 --> 00:00:09,600
For the next week, Mount Maunganui will be home to New Zealand's first ever Wingfoil

To this:

1
00:00:00,000 --> 00:00:09,600
For the next week, Mount Maunganui will be
home to New Zealand's first ever Wingfoil

/// Takes an SRT `String`, caps line length at `lineCharLimit` and returns a list of `Subtitle`s.
/// `parseDuration`, `splitText` and `Subtitle` are defined elsewhere in the utility.
List<Subtitle> fromSRT(String srtContent, {int lineCharLimit = 50}) {
  // One match per cue: index, start time, end time, then the text lines.
  final regex = RegExp(
    r'(\d+)\n'
    r'(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\n'
    r'((?:[^\n]+\n?)+)',
    dotAll: true,
  );
  final matches = regex.allMatches(srtContent);
  return matches.map((match) {
    final index = int.parse(match.group(1)!);
    final startTime = parseDuration(match.group(2)!);
    final endTime = parseDuration(match.group(3)!);
    var text = match.group(4)!.trim();
    // Only re-wrap cues whose text exceeds the limit.
    if (text.length > lineCharLimit) {
      text = splitText(lineCharLimit, text).join('\n');
    }
    return Subtitle(
      index: index,
      startTime: startTime,
      endTime: endTime,
      text: text,
    );
  }).toList();
}

One consideration, though: before we even consider line breaks, it would be ideal to be able to specify a limit on the maximum duration of a subtitle before whisper generates the next subtitle. From there, the ability to also specify the maximum number of line breaks, followed by the character limit as this topic discussed.

Not directly related to this discussion, but additionally, the ability to join gaps that are shorter than a specified duration, for cases where the time between subtitles is so short that it's not worth allowing a gap between them and they should rather be "butted up" to each other. Currently we're seeing continuous (gapless) segments, which I assume is a work in progress.
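The gap-joining idea above is easy to prototype as a post-processing pass. The sketch below is only an illustration under stated assumptions: the `Cue` dataclass, the function name `close_small_gaps` and the default threshold of 0.2 seconds are all invented here and are not part of whisper.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Cue:
    """Hypothetical subtitle entry; times are in seconds."""
    start: float
    end: float
    text: str


def close_small_gaps(cues: List[Cue], min_gap: float = 0.2) -> List[Cue]:
    """Extend a cue's end to the next cue's start when the gap is under `min_gap`."""
    result = []
    # Pair each cue with its successor; the last cue has none.
    for cue, nxt in zip(cues, cues[1:] + [None]):
        if nxt is not None and 0 < nxt.start - cue.end < min_gap:
            cue = replace(cue, end=nxt.start)
        result.append(cue)
    return result
```

Gaps at or above the threshold are left alone, so deliberate pauses in the audio still show as an empty screen.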
-
Unfortunately my channel closed :(
On Thu, 23 Mar 2023 at 20:06, mayeaux wrote:
> Now that this is shipped do you think the idea of adding a maximum number of characters per line while still outputting accurate timestamps is more feasible?
-
Following the approach whisper takes in defining a writer for text, subtitles, etc., here is a class for creating subtitles with a maximum line length and number of lines. I'd appreciate any comments if you are able to try it out. @jongwook would you consider something like this in a pull request?

Example usage, using a .json file generated with […]:

import json

writer = SubtitlesWriterTimed(max_line_count=2, max_line_length=42)
with open("test.json") as js:
    jsdata = json.load(js)
with open("test.srt", "w") as f:
    writer.write_result(jsdata, f)

The class:

from typing import TextIO

from whisper.utils import SubtitlesWriter


class SubtitleBlock:
    """Collects timed words into one subtitle entry of limited size."""

    def __init__(self, max_line_count, max_line_length):
        self.max_line_count = max_line_count
        self.max_line_length = max_line_length
        self.block_start = 0
        self.block_end = 0
        self.block = [""]
        self.line = 0

    def add_word(self, word_timed):
        word = word_timed["word"]
        if not any(self.block):
            # First word of the block. Testing the text rather than
            # block_start keeps this correct when the first word starts at 0.0.
            self.block_start = word_timed["start"]
        self.block_end = word_timed["end"]  # provisional end time
        # Start a new line when the word won't fit, but don't split
        # hyphenated words over lines.
        if (
            len(self.block[self.line]) + len(word) > self.max_line_length
            and not word.startswith("-")
        ):
            self.line += 1
            self.block.append("")
        self.block[self.line] += word

    def is_complete_before(self, word_timed) -> bool:
        """Indicate if the upcoming word won't fit and a new block should be started"""
        word = word_timed["word"]
        max_length = len(self.block[self.line]) + len(word) > self.max_line_length
        no_tail_hyphenation = not word.startswith("-")
        max_lines = self.line + 1 >= self.max_line_count
        return max_length and no_tail_hyphenation and max_lines

    def do_yield(self, formatter):
        text = "\n".join(line.strip() for line in self.block)
        yield formatter(self.block_start), formatter(self.block_end), text


class SubtitlesWriterTimed(SubtitlesWriter):
    """Write an .srt file after transcribing with word_timestamps enabled,
    imposing a maximum line length and number of lines per entry.
    """

    always_include_hours = True
    decimal_marker = ","

    def __init__(self, max_line_count=1, max_line_length=42):
        self.max_line_count = max_line_count
        self.max_line_length = max_line_length

    def iterate_result(self, result: dict):
        block = SubtitleBlock(self.max_line_count, self.max_line_length)
        for segment in result["segments"]:
            for word_timed in segment["words"]:  # .word, .start, .end
                if block.is_complete_before(word_timed):
                    yield from block.do_yield(self.format_timestamp)
                    block = SubtitleBlock(self.max_line_count, self.max_line_length)
                block.add_word(word_timed)
        # Flush the final, possibly partial, block.
        yield from block.do_yield(self.format_timestamp)

    def write_result(self, result: dict, file: TextIO):
        for i, (start, end, text) in enumerate(self.iterate_result(result), start=1):
            print(f"{i}\n{start} --> {end}\n{text}\n", file=file, flush=True)
-
I am reading the BBC Subtitle Guidelines mentioned in this thread, and wanted to point out a key aspect of the maximum line length recommendation.
Also, as we already know
So fixing line length to 37 characters, for example, is not correct in general for video consumed online.

The other implication is that ideally we would be able to segment sentences multilingually. I had a quick look at this in spaCy and NLTK. In the end, reliably segmenting sentences integrated with whisper means introducing a dependency on an NLP library.

https://www.bbc.co.uk/accessibility/forproducts/guides/subtitles/
-
@brainwane Is the latest whisper (with line length and number of lines options) solving this problem for you?

If not, I have a preliminary approach which separates the text into individual sentences and chooses line breaks grammatically where possible (broadly following the Netflix guidelines). It also realigns the subtitle timing. There is still a lot of testing to do on it. If you have a word-timestamped .json (English only for now), I can generate an .srt if you want to give it a try.
-
I'm an absolute novice, so sorry for the stupidity of this question, but... I'm seeing in the thread that Whisper appears to have been updated with the functionality to specify maximum line widths. I've been using the Colab example (because I have no idea how to code or anything); does THAT have the new functionality in it? And if so, how do I use that function?

At the moment I just run all of this in LibriSpeech.ipynb. Is there a line of code I can add that will make it export with max line widths as specified? Again, sorry if all of this is intensely stupid.

! pip install git+https://github.com/openai/whisper.git

import os
try:
import torch
from tqdm.notebook import tqdm
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

!whisper "episode.mp3" --model medium
-
Okay, here's a json file:
-
This piece of code is working fine for me.

import whisper
from whisper.utils import get_writer

audio = './audio.mp3'

# load_model takes the model name as its first argument (`name`), not `model`.
model = whisper.load_model('small')
# Word timestamps are required for the line width and line count options below.
result = model.transcribe(audio=audio, language='en', word_timestamps=True, task="transcribe")

# Set VTT line count and line width
word_options = {
    "highlight_words": False,
    "max_line_count": 1,
    "max_line_width": 42
}

vtt_writer = get_writer(output_format='vtt', output_dir='./')
vtt_writer(result, audio, word_options)
-
@glangford are you still working on that approach that would split lines grammatically where possible?

I'm using Whisper for my studies and, regrettably, when using --word_timestamps True --max_line_width 42 --max_line_count 2 I notice a lot of truncated sentences in the output, where the next line contains just the last one or two words of a sentence before a full stop. If you found a solution to this kind of problem, I would be really grateful to hear it.
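Until something grammar-aware is available, one cheap workaround for stranded words is to choose the break point by score instead of filling the first line greedily. The sketch below is only a heuristic written for this thread, not glangford's approach and not anything whisper does: `split_balanced`, the punctuation set and the bonus of 10 are arbitrary assumptions.

```python
from typing import List


def split_balanced(text: str, width: int = 42) -> List[str]:
    """Split `text` into at most two lines, preferring balance and punctuation."""
    text = " ".join(text.split())  # normalise whitespace
    if len(text) <= width:
        return [text]
    best, best_score = None, None
    for i, ch in enumerate(text):
        if ch != " ":
            continue
        first, second = text[:i], text[i + 1:]
        if len(first) > width or len(second) > width:
            continue  # this break would leave a line over the limit
        # Lower is better: difference in line lengths, minus an arbitrary
        # bonus when the first line ends on punctuation.
        score = abs(len(first) - len(second))
        if first[-1] in ".,;:?!":
            score -= 10
        if best_score is None or score < best_score:
            best, best_score = [first, second], score
    # Fall back to the unsplit text when no single break fits both lines.
    return best if best is not None else [text]
```

It only balances the two lines inside one cue. Words that were already pushed into the next cue would need the cue boundaries themselves to be moved, which needs the word timestamps.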
-
Hi, from version v20231105 there is an extra option called --max_words_per_line to set a maximum number of words per subtitle line. You can check the PR for more details and run your own tests.

In my experience, subtitles generated using this option are more pleasant than the results I obtained using --max_line_width. Depending on the length of the words, subtitle lines can be longer or shorter, but you can always expect at most that number of words per line.

Additionally, --max_words_per_line respects the end of segments, so when a sentence finishes and there is a small gap before the next one starts, subtitle lines won't join the end of one with the start of the next, and they will spend less time hanging on the screen. That means that finishing lines can have fewer words than the number you set, like:
Code example
Still, if you want a better prediction of subtitle length, --max_line_width is the better choice. Note that you need word_timestamps=True for these options to work.
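To make the behaviour described above concrete, here is a standalone sketch of grouping by word count without crossing segment boundaries. It is an illustration only, not whisper's implementation: `cues_by_word_count` is a made-up name, and the input is assumed to be a transcription result whose segments each carry a "words" list with "word", "start" and "end" keys.

```python
from typing import Dict, Iterator, List, Tuple


def cues_by_word_count(result: Dict, max_words: int = 5) -> Iterator[Tuple[float, float, str]]:
    """Yield (start, end, text) cues of at most `max_words` words each.

    Cues never span two segments, so the last cue of a segment may be shorter.
    """
    for segment in result["segments"]:
        words: List[Dict] = segment["words"]
        for i in range(0, len(words), max_words):
            chunk = words[i:i + max_words]
            # Word strings are assumed to carry their own leading space.
            text = "".join(w["word"] for w in chunk).strip()
            yield chunk[0]["start"], chunk[-1]["end"], text
```

With `max_words=3`, a four-word segment produces one cue of three words and one of a single word, and the next segment starts a fresh cue.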
-
Keeping subtitles to 42 characters per line seems too few. Because some speakers talk fast, it will be difficult to read the subtitles.
-
The original teletext standard was a max of 32 chars per line.
42 chars, including white space chars is a good option and works well for TV, both broadcast and online.
-
It's very cool that Whisper can emit a .srt subtitle file!

For English, it's best to keep subtitles to 42 characters per line, as Amara.org suggests and as Netflix also suggests (see those pages for suggestions for several other languages). That ensures that the subtitles will render reasonably well on most displays.

The .srt I just got from a Whisper run had some lines that were 50-60 characters long, which is longer than some displays will render well; some of the beginning or end of the line is likely to be cut off. For example, in this .srt:

the second line is 56 characters long and the fourth is 59 characters long.

I'd love for write_srt to break up lines appropriately, so that instead of always emitting a single line for a timestamp, it sometimes breaks up the subtitle into 2-3 lines.

whisper/whisper/utils.py
Line 63 in 02b7430

When deciding where to split a line of text, per Amara's guidance,

I don't know as much about the characters-per-line conventions for .vtt files but perhaps a similar approach could be used to improve those as well.