added alignment.locate() and updated all docstrings
-added `alignment.locate()` to find where specific words/phrases are spoken in audio without transcribing it; significantly faster than transcribing and then calling `result.WhisperResult.find()`
-updated all docstrings to follow more common conventions (to ease future documentation generation)
-renamed the parameter `original_spit` to `original_split` for `alignment.align()`
-the parameters `time_scale`, `input_sr`, `demucs_output`, `demucs_device` are deprecated for all functions and methods, except `input_sr`, which is not deprecated for `non_whisper.transcribe_any()`
-fixed `alignment.align()` not working when `text` is an instance of `result.WhisperResult` that has words but no tokens
-added the method `to_display_str()` to `result.Segment` as a consistent way to format a segment for printing when `verbose=True` in all transcription functions that use it
-improved efficiency of segment splitting for `alignment.align()` when `original_split=True`; significantly faster, especially with extremely long `text`.
-added parameters: `demucs`, `demucs_options`, `only_voice_freq` to `alignment.refine()`
-refactored the audio preprocessing in most transcription functions into `audio.prep_audio()`
-the parameter `demucs` now also accepts an instance of a Demucs model instead of a bool; the model can be loaded with `audio.load_demucs_model()`
-removed `_is_whisper_repo_version` from `utils.py` so that `result.py` does not depend on Whisper
-added `utils.format_timestamp()` and `utils.make_safe()` from `whisper.utils.py`
-added `utils.safe_print()`, a wrapper for printing content returned by `utils.make_safe()`
-changed the parameter `audio` so that it is always expected to be 16kHz if `audio` is a `torch.Tensor` or `numpy.ndarray`
-added the parameter `demucs_options` to `whisper_word_level.load_faster_whisper.faster_transcribe` so that `demucs_options` can be used with faster-whisper
-set `action="extend"` for all CLI keyword arguments that take multiple values,
 allowing, for example, `-o` to be used like `-o 1.srt -o 2.srt 3.srt` instead of only `-o 1.srt 2.srt 3.srt`
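The last change relies on the `"extend"` action of Python's `argparse` (Python 3.8+). The following is a minimal stand-alone sketch of that behavior. It uses a hypothetical one-option parser, not stable-ts's actual CLI definition, and compares `"extend"` with the default `"store"` action:

```python
import argparse


def build_parser(action: str) -> argparse.ArgumentParser:
    """Hypothetical parser with a single multi-value -o option (not stable-ts's real CLI)."""
    parser = argparse.ArgumentParser(prog="demo")
    # default=None so the result holds only command-line values;
    # with "extend", a list default would be copied and then extended.
    parser.add_argument("-o", "--output", nargs="+", action=action, default=None)
    return parser


argv = ["-o", "1.srt", "-o", "2.srt", "3.srt"]
extended = build_parser("extend").parse_args(argv).output
stored = build_parser("store").parse_args(argv).output
print(extended)  # ['1.srt', '2.srt', '3.srt']
print(stored)    # ['2.srt', '3.srt']: each -o replaces the earlier values
```

With `"store"`, only the values from the last `-o` survive. That is why repeating the flag did not accumulate outputs before this change.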
jianfch committed Oct 21, 2023
1 parent 83ae509 commit a777206
Showing 15 changed files with 1,660 additions and 864 deletions.
60 changes: 38 additions & 22 deletions README.md
@@ -49,9 +49,9 @@ stable-ts audio.mp3 -o audio.srt
</details>

Parameters:
[load_model()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/whisper_word_level.py#L858-L883),
[transcribe()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/whisper_word_level.py#L74-L227),
[transcribe_minimal()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/whisper_word_level.py#L677-L699)
[load_model()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/whisper_word_level.py#L985-L1014),
[transcribe()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/whisper_word_level.py#L74-L211),
[transcribe_minimal()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/whisper_word_level.py#L650-L723)

<details>
<summary>faster-whisper</summary>
@@ -62,7 +62,7 @@ model = stable_whisper.load_faster_whisper('base')
result = model.transcribe_stable('audio.mp3')
```
Parameters:
[transcribe_stable()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/whisper_word_level.py#L772-L796),
[transcribe_stable()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/whisper_word_level.py#L835-L912),

</details>

@@ -75,9 +75,10 @@ result.to_ass('audio.ass') #ASS
result.to_tsv('audio.tsv') #TSV
```
Parameters:
[to_srt_vtt()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/text_output.py#L261-L297),
[to_ass()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/text_output.py#L393-L440),
[to_tsv()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/text_output.py#L329-L359)
[to_srt_vtt()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/text_output.py#L260-L302),
[to_ass()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/text_output.py#L406-L459),
[to_tsv()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/text_output.py#L334-L372)
[save_as_json()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/text_output.py#L522-L531)
<br /><br />
There are word-level and segment-level timestamps. All output formats support them.
They also support both levels simultaneously, except TSV.
@@ -172,7 +173,7 @@ stable-ts audio.mp3 --align text.txt --language en
</details>

Parameters:
[align()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/alignment.py#L27-L84)
[align()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/alignment.py#L56-L153)

#### Adjustments
Timestamps are adjusted after the model predicts them.
@@ -185,7 +186,7 @@ Note: both results are required to have word timestamps and matching words.
result.adjust_by_result(new_result)
```
Parameters:
[adjust_by_result()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L710-L723)
[adjust_by_result()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L754-L765)

### Refinement
Timestamps can be further improved with `refine()`.
@@ -211,7 +212,7 @@ stable-ts result.json --refine -o audio.srt --refine_option "audio=audio.mp3"
</details>

Parameters:
[refine()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/alignment.py#L246-L316)
[refine()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/alignment.py#L348-L419)


### Regrouping Words
@@ -241,19 +242,34 @@ result0.reset()
```
Any regrouping algorithm can be expressed as a string. Please feel free to share your strings [here](https://github.com/jianfch/stable-ts/discussions/162)
#### Regrouping Methods
- [regroup()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L1145-L1195)
- [split_by_gap()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L881-L893)
- [split_by_punctuation()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L934-L945)
- [split_by_length()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L997-L1018)
- [merge_by_gap()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L905-L923)
- [merge_by_punctuation()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L957-L975)
- [merge_all_segments()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L982-L984)
- [clamp_max()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L1038-L1055)
- [lock()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L1096-L1114)
- [regroup()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L1226-L1277)
- [split_by_gap()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L923-L937)
- [split_by_punctuation()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L981-L995)
- [split_by_length()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L1055-L1084)
- [merge_by_gap()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L948-L970)
- [merge_by_punctuation()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L1007-L1028)
- [merge_all_segments()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L1035-L1042)
- [clamp_max()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L1105-L1127)
- [lock()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L1167-L1190)

### Locating Words
There are two ways to locate words.
The first way approximates the time at which the words are spoken,
then transcribes a few seconds around each approximate time as needed.
This is also the faster way to locate words.
```python
import stable_whisper

model = stable_whisper.load_model('base')
matches = model.locate('audio.mp3', 'are', 'English')
for match in matches:
    print(match.to_display_str())
# verbose=True does the same thing as this for-loop.
```
Parameters:
[locate()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/alignment.py#L728-L824)

The second way allows you to locate words with regular expressions,
but it requires the audio to be fully transcribed first.
```python
result = model.transcribe('audio.mp3')
# Find every sentence that contains "and"
matches = result.find(r'[^.]+and[^.]+\.')
# print all matches, if there are any
@@ -272,7 +288,7 @@ for match in matches:
f'end: {match.end}\n')
```
Parameters:
[find()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L1232-L1248)
[find()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/result.py#L1314-L1330)

### Tips
- do not disable word timestamps with `word_timestamps=False` for reliable segment timestamps
@@ -318,7 +334,7 @@ stable_whisper.encode_video_comparison(
)
```
Parameters:
[encode_video_comparison()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/video_output.py#L29-L91)
[encode_video_comparison()](https://github.com/jianfch/stable-ts/blob/main/stable_whisper/video_output.py#L29-L73)

#### Multiple Files with CLI
Transcribe multiple audio files then process the results directly into SRT files.
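The `result.find()` example in the README changes uses the pattern `r'[^.]+and[^.]+\.'`. This sketch assumes `find()` applies the pattern to the transcript text with Python's standard `re` semantics, and it uses a made-up string in place of a real transcript. It shows what the pattern actually captures, alongside a hypothetical whole-word variant:

```python
import re

# Pattern from the README's result.find() example: non-period characters,
# the letters "and", more non-period characters, then a period.
pattern = re.compile(r'[^.]+and[^.]+\.')
# Hypothetical variant that only matches "and" as a whole word
whole_word = re.compile(r'[^.]*\band\b[^.]*\.')

text = "I like apples and pears. It rains. Sand is dry."
print(pattern.findall(text))     # ['I like apples and pears.', ' Sand is dry.']
print(whole_word.findall(text))  # ['I like apples and pears.']
```

Each match starts right after the previous period, so it can include leading whitespace. The original pattern also matches "and" inside words such as "Sand". The `\b` word boundaries in the variant rule that out.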
2 changes: 1 addition & 1 deletion stable_whisper/__init__.py
@@ -5,4 +5,4 @@
from .stabilization import visualize_suppression
from .non_whisper import transcribe_any
from ._version import __version__
from .utils import _is_whisper_repo_version, _required_whisper_ver, _COMPATIBLE_WHISPER_VERSIONS
from .utils import _required_whisper_ver, _COMPATIBLE_WHISPER_VERSIONS
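This diff drops the `_is_whisper_repo_version` import. According to the commit message, the commit also copies `utils.format_timestamp()` and `utils.make_safe()` from `whisper.utils.py` and adds a `utils.safe_print()` wrapper. The sketch below models the first two on Whisper's versions. `safe_print()` and `display_line()` are hypothetical stand-ins. The second one produces a `[start --> end] text` line in the style of Whisper's verbose output, and the exact format of `Segment.to_display_str()` may differ. None of this is stable-ts's actual code:

```python
import sys
from typing import Optional, TextIO


def format_timestamp(seconds: float, always_include_hours: bool = False,
                     decimal_marker: str = ".") -> str:
    """Format seconds as [HH:]MM:SS.mmm (modeled on whisper.utils.format_timestamp)."""
    assert seconds >= 0, "non-negative timestamp expected"
    milliseconds = round(seconds * 1000.0)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    hours_marker = f"{hours:02d}:" if always_include_hours or hours > 0 else ""
    return f"{hours_marker}{minutes:02d}:{secs:02d}{decimal_marker}{milliseconds:03d}"


def make_safe(text: str, encoding: Optional[str] = None) -> str:
    """Replace characters the encoding cannot represent with '?' (modeled on whisper.utils.make_safe)."""
    encoding = encoding or sys.getdefaultencoding()
    if encoding.lower().replace("-", "").replace("_", "") == "utf8":
        return text  # UTF-8 can encode every code point, so no round-trip is needed
    return text.encode(encoding, errors="replace").decode(encoding)


def safe_print(text: str, encoding: Optional[str] = None, file: Optional[TextIO] = None) -> None:
    """Hypothetical wrapper: print text after passing it through make_safe()."""
    print(make_safe(text, encoding), file=file)


def display_line(start: float, end: float, text: str) -> str:
    """Hypothetical verbose-style line: [start --> end] text."""
    return f"[{format_timestamp(start)} --> {format_timestamp(end)}] {text.strip()}"


safe_print(display_line(1.0, 2.5, " café au lait"), encoding="ascii")
# prints: [00:01.000 --> 00:02.500] caf? au lait
```

Copying small helpers like these into the package's own `utils.py` is one way to let modules such as `result.py` work without importing Whisper at all.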
2 changes: 1 addition & 1 deletion stable_whisper/_version.py
@@ -1 +1 @@
__version__ = "2.12.3"
__version__ = "2.13.0"
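Starting with this release, `input_sr` is deprecated and `audio` passed as a `torch.Tensor` or `numpy.ndarray` is always assumed to be 16kHz. Audio held in memory at another rate therefore has to be resampled before it is passed in. The following is a minimal sketch that uses linear interpolation via `np.interp`. `resample_linear` is a hypothetical helper, not part of stable-ts. It applies no anti-aliasing filter, so a dedicated resampler such as `torchaudio.functional.resample` is likely the better choice for real use:

```python
import numpy as np

TARGET_SR = 16_000  # rate that array/tensor input is assumed to have, per the commit notes


def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Resample 1-D (mono) audio by linear interpolation.

    Hypothetical helper for illustration only: there is no anti-aliasing
    filter, so downsampling can alias high frequencies.
    """
    if audio.ndim != 1:
        raise ValueError("expected mono audio with shape (num_samples,)")
    if orig_sr == target_sr:
        return audio.astype(np.float32, copy=False)
    n_out = int(round(len(audio) * target_sr / orig_sr))
    # Time (in seconds) of each input and output sample
    t_in = np.arange(len(audio)) / orig_sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, audio).astype(np.float32)


# One second of a 440 Hz tone sampled at 44.1kHz
sr = 44_100
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
audio_16k = resample_linear(tone, sr)
print(audio_16k.shape)  # (16000,)
```

The resulting float32 array could then be passed as `audio`, for example `model.transcribe(audio_16k)`.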