Add support for HLS WebVTT subtitles #6106

Open
yadayada opened this issue Jun 27, 2015 · 10 comments

Comments

@yadayada

JSON dumps of CICGC URLs already include a link to sliced English subtitles in WebVTT format that could easily be downloaded using ffmpeg. It would be nice if the ComCarCoff extractor was able to detect these.

@yadayada
Author

Any URL should do. As far as I can tell, any episode includes subs.

For example, take
http://comediansincarsgettingcoffee.com/bill-maher-the-comedy-team-of-smug-and-arrogant

Using --dump-json you'll find a subtitle URL in the format object with id 398, e.g.
http://content-ause2.uplynk.com/a1a924b17afe47a0b86c0c4bd085fc4c/sub4.m3u8?ad=crackle_live&pbs=c97ce4e47cdf453eaad38536c7ceb4e1
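A minimal sketch of how such an entry could be picked out of a `--dump-json` dump. The file-name heuristic (`sub*.m3u8`) is an assumption taken from the single example URL above, and `find_subtitle_playlists` is a hypothetical helper, not part of youtube-dl.

```python
import json
import posixpath
from urllib.parse import urlparse


def find_subtitle_playlists(info):
    """Return (format_id, url) pairs whose URL looks like a subtitle playlist.

    Heuristic (an assumption): the playlist file name starts with "sub"
    and ends with ".m3u8", as in the example URL above.
    """
    matches = []
    for fmt in info.get("formats", []):
        url = fmt.get("url", "")
        name = posixpath.basename(urlparse(url).path)
        if name.startswith("sub") and name.endswith(".m3u8"):
            matches.append((fmt.get("format_id"), url))
    return matches


# Stand-in for the output of `youtube-dl --dump-json URL`;
# a real dump has many more fields and formats.
sample_dump = json.dumps({
    "formats": [
        {"format_id": "397", "url": "http://example.com/video/d.m3u8"},
        {"format_id": "398",
         "url": "http://content-ause2.uplynk.com/a1a924b17afe47a0b86c0c4bd085fc4c/"
                "sub4.m3u8?ad=crackle_live&pbs=c97ce4e47cdf453eaad38536c7ceb4e1"},
    ]
})

subtitle_formats = find_subtitle_playlists(json.loads(sample_dump))
```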

@fstirlitz
Contributor

I got some code to do it. There are two problems:

  • Since the URL points to an m3u8 manifest, it has to be post-processed to obtain offline-viewable subtitles. My code just injects a FFmpegSubtitlesConvertorPP (always converting to WebVTT) unless one's already been added.
  • FFmpeg (git snapshot as of 2015-06-23) cannot actually download playable subtitles. The downloaded WebVTT mixes up timestamps from multiple video fragments. The .vtt fragment files contain X-TIMESTAMP-MAP tags that can be used to synchronise the subtitles against the video, but FFmpeg doesn't make use of them.

Also, FYI:

I suppose the short-term option is to write a parser for HLS WebVTT streams…
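For illustration, here is a rough sketch of what such a parser could do with the X-TIMESTAMP-MAP tags. It is not the code from the pull request. It assumes each segment carries a header of the form `X-TIMESTAMP-MAP=MPEGTS:<90 kHz ticks>,LOCAL:<WebVTT timestamp>`, that the caller knows the video's starting timestamp (`video_start_ms`, a hypothetical parameter), and it ignores MPEG-TS timestamp wraparound. All function names are made up for this example.

```python
import re

TS_RE = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")
MAP_RE = re.compile(r"^X-TIMESTAMP-MAP=(.*)$", re.M)
CUE_RE = re.compile(r"^(\S+)[ \t]+-->[ \t]+(\S+)(.*)$")


def parse_ts(text):
    """Convert a WebVTT timestamp ([hh:]mm:ss.ttt) to milliseconds."""
    m = TS_RE.fullmatch(text)
    if m is None:
        raise ValueError("bad timestamp: %r" % text)
    h, mi, s, ms = m.groups()
    return ((int(h or 0) * 60 + int(mi)) * 60 + int(s)) * 1000 + int(ms)


def format_ts(ms):
    """Convert milliseconds back to hh:mm:ss.ttt."""
    h, rem = divmod(ms, 3600000)
    mi, rem = divmod(rem, 60000)
    s, ms = divmod(rem, 1000)
    return "%02d:%02d:%02d.%03d" % (h, mi, s, ms)


def segment_offset_ms(segment_text):
    """Milliseconds to add to this segment's cue times (MPEGTS/90 - LOCAL)."""
    m = MAP_RE.search(segment_text)
    if m is None:
        return 0  # no mapping present: leave cue times untouched
    fields = dict(part.split(":", 1) for part in m.group(1).strip().split(","))
    return int(fields["MPEGTS"]) // 90 - parse_ts(fields["LOCAL"])


def merge_segments(segments, video_start_ms=0):
    """Merge WebVTT segments into one file with cue times relative to the video."""
    seen = set()
    cues = []
    for text in segments:
        text = text.replace("\r\n", "\n")
        shift = segment_offset_ms(text) - video_start_ms
        for block in re.split(r"\n{2,}", text):
            lines = block.strip("\n").split("\n")
            for i, line in enumerate(lines):
                m = CUE_RE.match(line)
                if m:
                    break
            else:
                continue  # header, NOTE or STYLE block: no timing line
            start = parse_ts(m.group(1)) + shift
            end = parse_ts(m.group(2)) + shift
            payload = "\n".join(lines[i + 1:])
            key = (start, end, payload)
            if key in seen:
                continue  # cue repeated across a segment boundary
            seen.add(key)
            cues.append((start, end, m.group(3), payload))
    cues.sort(key=lambda cue: cue[0])
    out = ["WEBVTT", ""]
    for start, end, settings, payload in cues:
        out.append("%s --> %s%s" % (format_ts(start), format_ts(end), settings))
        out.append(payload)
        out.append("")
    return "\n".join(out)


seg1 = ("WEBVTT\nX-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000\n\n"
        "00:00:01.000 --> 00:00:03.000\nHello\n")
seg2 = ("WEBVTT\nX-TIMESTAMP-MAP=MPEGTS:1260000,LOCAL:00:00:00.000\n\n"
        "00:00:00.500 --> 00:00:02.000\nWorld\n")
merged = merge_segments([seg1, seg2], video_start_ms=10000)
```

In this made-up example the second segment's cue, which starts at 00:00:00.500 locally, lands at 00:00:04.500 in the merged file, because its segment is mapped 4 seconds after the assumed video start.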

@remitamine
Collaborator

I found this on abc7news as well, so I wrote a function that downloads the subtitle segments and converts them to a normal WebVTT file with the real time of each cue, which ffmpeg can then convert to ASS or SRT. The problem now is in the _extract_m3u8_formats function: it returns an array of video formats, but the m3u8 variant playlist does not always contain only video; sometimes it also contains subtitle or audio parts.
I think it shouldn't return a list of formats; it should return a dictionary containing the formats and, if there are any, the subtitles. Creating a new function just to extract the subtitle URLs from the variant playlist is not a good solution.
Changing this would break compatibility with the extractors that use the function, but that can be solved with simple changes in each of those extractors.
I know I can find the subtitle URL in the formats array, in m3u8_media of the first format, but I don't think that's the right place.
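To make the proposal concrete, here is a minimal sketch of a master-playlist parser that returns formats and subtitles together. It is not the actual `_extract_m3u8_formats` code; `split_master_playlist`, the result layout, and the `"und"` fallback language are assumptions for illustration. It relies only on the standard `#EXT-X-MEDIA` and `#EXT-X-STREAM-INF` tags.

```python
import re
from urllib.parse import urljoin

# Attribute lists look like KEY=value,KEY="quoted, value",...
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attributes(text):
    """Parse an HLS attribute list into a dict, dropping surrounding quotes."""
    return {key: value.strip('"') for key, value in ATTR_RE.findall(text)}


def split_master_playlist(manifest, base_url):
    """Return {"formats": [...], "subtitles": {lang: [...]}} for a master playlist."""
    result = {"formats": [], "subtitles": {}}
    pending = None  # attributes of the last #EXT-X-STREAM-INF, awaiting its URI line
    for line in manifest.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-MEDIA:"):
            attrs = parse_attributes(line[len("#EXT-X-MEDIA:"):])
            if attrs.get("TYPE") == "SUBTITLES" and "URI" in attrs:
                lang = attrs.get("LANGUAGE", "und")  # assumed fallback
                result["subtitles"].setdefault(lang, []).append({
                    "url": urljoin(base_url, attrs["URI"]),
                    "ext": "vtt",
                })
        elif line.startswith("#EXT-X-STREAM-INF:"):
            pending = parse_attributes(line[len("#EXT-X-STREAM-INF:"):])
        elif line and not line.startswith("#") and pending is not None:
            result["formats"].append({
                "url": urljoin(base_url, line),
                "bandwidth": int(pending.get("BANDWIDTH", 0)),
            })
            pending = None
    return result


# Made-up playlist for demonstration only.
sample_manifest = """#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",URI="sub4.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=800000,CODECS="avc1.4d401e,mp4a.40.2",SUBTITLES="subs"
video/low.m3u8
"""
parsed = split_master_playlist(sample_manifest, "http://example.com/path/master.m3u8")
```

Returning both kinds of entries from one pass avoids downloading and parsing the variant playlist a second time just to collect subtitle URLs.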

@remitamine
Collaborator

I see that @fstirlitz has made a pull request and has done great work, so I will put the code I wrote in a gist; maybe he can benefit from it.
m3u8 webvtt download function

@remitamine
Collaborator

Support for subtitle extraction (TTML) will be added in the next version.

remitamine added a commit that referenced this issue Mar 30, 2016
previously extraction has been delegated to crackle to extract more info
and subtitles #6106 but some of the episodes can't be extracted using
crackle #8995.
@remitamine remitamine reopened this Mar 30, 2016
@dstftw dstftw changed the title from "ComCarCoff subtitle support" to "Add support for HLS WebVTT subtitles" Jan 27, 2018
@ytdl-org ytdl-org deleted a comment from Sopor Feb 10, 2018
@AndnixSH

AndnixSH commented Jan 29, 2019

When will it be possible to download subtitles from dplay.dk?

I want to watch videos in MPV+SVP4, using the youtube-dl.exe from rg3.github.io (the pre-built youtube-dl bundled with SVP4 did not work correctly with dplay), so that I can watch 30fps videos at 60fps.

@ngdio

ngdio commented Feb 14, 2019

@AndnixSH you should open a separate issue for that

@ghost

ghost commented Dec 3, 2019

I might have the same issue with dplay.dk: subs could not be found.
This is a 4-year-old issue. When is it going to be fixed?

@ghost

ghost commented Feb 16, 2020

Anyone?

@xybydy

xybydy commented Mar 27, 2021

wow 5 years and it's still open

pukkandan added a commit to yt-dlp/yt-dlp that referenced this issue Apr 28, 2021
Authored by fstirlitz
Modified from: ytdl-org/youtube-dl#6144

Closes: #73
Fixes:
ytdl-org/youtube-dl#6106
ytdl-org/youtube-dl#14977
ytdl-org/youtube-dl#21438
ytdl-org/youtube-dl#23609
ytdl-org/youtube-dl#28132

Might also fix (untested):
ytdl-org/youtube-dl#15424
ytdl-org/youtube-dl#18267
ytdl-org/youtube-dl#23899
ytdl-org/youtube-dl#24375
ytdl-org/youtube-dl#24595
ytdl-org/youtube-dl#27899

Related:
ytdl-org/youtube-dl#22379
ytdl-org/youtube-dl#24517
ytdl-org/youtube-dl#24886
ytdl-org/youtube-dl#27215

Notes:
* The functions `extractor.common._extract_..._formats` are still kept for compatibility
* Only some extractors have currently been moved to using `_extract_..._formats_and_subtitles`
* Direct subtitle manifests (without a master) are not supported and are wrongly identified as containing video formats
* AES support is untested
* The fragmented TTML subtitles extracted from DASH/ISM are valid, but are unsupported by `ffmpeg` and most video players
    * Their XML fragments can be dumped using `ffmpeg -i in.mp4 -f data -map 0 -c copy out.ttml`.
        Once the unnecessary headers are stripped out of this, it becomes a valid self-contained ttml file
    * The ttml subs downloaded from DASH manifests can also be directly opened with <https://github.com/SubtitleEdit>
* Fragmented WebVTT files extracted from DASH/ISM are also unsupported by most tools
    * Unlike the ttml files, the XML fragments of these cannot be dumped using `ffmpeg`
    * The WebVTT subs extracted from DASH can be parsed by <https://github.com/gpac/gpac>
    * But the validity of those extracted from ISM is untested
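
The first two notes describe a compatibility pattern that can be sketched roughly as follows. This is a toy stand-in, not the real yt-dlp code: the class `ExtractorSketch` and the hard-coded results are hypothetical, and only the two method names follow the `_extract_..._formats` / `_extract_..._formats_and_subtitles` naming mentioned above.

```python
class ExtractorSketch:
    """Toy stand-in for an extractor base class (hypothetical)."""

    def _extract_m3u8_formats_and_subtitles(self, m3u8_url):
        # Hypothetical fixed result; a real implementation would download
        # and parse the playlist at m3u8_url.
        formats = [{"url": m3u8_url, "format_id": "hls-800"}]
        subtitles = {"en": [{"url": m3u8_url.replace("master", "sub4"), "ext": "vtt"}]}
        return formats, subtitles

    def _extract_m3u8_formats(self, m3u8_url):
        # Old entry point kept for extractors that have not been migrated:
        # same call, subtitles discarded.
        formats, _subtitles = self._extract_m3u8_formats_and_subtitles(m3u8_url)
        return formats


ie = ExtractorSketch()
legacy_formats = ie._extract_m3u8_formats("http://example.com/master.m3u8")
formats, subtitles = ie._extract_m3u8_formats_and_subtitles("http://example.com/master.m3u8")
```

With this shape, unmigrated extractors keep working unchanged, while migrated ones can pass the second return value on as subtitles.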
@ghost ghost mentioned this issue Sep 14, 2021
3 tasks
nixxo pushed a commit to nixxo/yt-dlp that referenced this issue Nov 22, 2021