Al Jazeera seems broken with the current version of youtube-dl #27779

palmer-eldritch · 2021-01-12T00:02:26Z

Checklist

I'm reporting a broken site support issue
I've verified that I'm running youtube-dl version 2021.01.08
I've checked that all provided URLs are alive and playable in a browser
I've checked that all URLs and arguments with special characters are properly quoted or escaped
I've searched the bugtracker for similar bug reports including closed ones
I've read bugs section in FAQ

Verbose log

$ youtube-dl --verbose https://www.aljazeera.com/program/101-east/2021/1/8/mongolia-from-the-steppe-to-the-slum
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://www.aljazeera.com/program/101-east/2021/1/8/mongolia-from-the-steppe-to-the-slum']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.01.08
[debug] Git HEAD: 806b30d6
[debug] Python version 3.9.0 (CPython) - Linux-5.9.0-5-amd64-x86_64-with-glibc2.31
[debug] exe versions: ffmpeg 4.3.1, ffprobe 4.3.1, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[generic] mongolia-from-the-steppe-to-the-slum: Requesting header
WARNING: Falling back on generic information extractor.
[generic] mongolia-from-the-steppe-to-the-slum: Downloading webpage
[generic] mongolia-from-the-steppe-to-the-slum: Extracting information
ERROR: Unsupported URL: https://www.aljazeera.com/program/101-east/2021/1/8/mongolia-from-the-steppe-to-the-slum
Traceback (most recent call last):
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/YoutubeDL.py", line 803, in wrapper
    return func(self, *args, **kwargs)
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/YoutubeDL.py", line 824, in __extract_info
    ie_result = ie.extract(url)
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/extractor/common.py", line 532, in extract
    ie_result = self._real_extract(url)
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/extractor/generic.py", line 3467, in _real_extract
    raise UnsupportedError(url)
youtube_dl.utils.UnsupportedError: Unsupported URL: https://www.aljazeera.com/program/101-east/2021/1/8/mongolia-from-the-steppe-to-the-slum

Description

I tried downloading a video from Al Jazeera using youtube-dl and got an "Unsupported URL" error. I think the extractor might be broken, since its last update was in 2017 and the site has most likely changed since then.

I'm a Python developer and took a quick look at the code of the extractor and the InfoExtractor class it's based on. But I'm very unfamiliar with youtube-dl's internals, so I thought opening a bug report might be an easier way to solve the problem: people familiar with the structure of the project could probably troubleshoot the issue much faster than I could if I had to dig deep into youtube-dl to understand how it works.

As shown in the log, the URL of the video is https://www.aljazeera.com/program/101-east/2021/1/8/mongolia-from-the-steppe-to-the-slum, which definitely doesn't match the _VALID_URL regex in the extractor. I naively tried adding "|program" to the regex, but I suspect that much more than the URL format has changed. As I said, I don't understand how youtube-dl extractors work, so someone on the team is probably better placed to fix this quickly; doing it myself would mean digging into youtube-dl's code, and I'm not sure I could find the personal time for that.
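To illustrate the mismatch, here is a minimal sketch using only Python's `re` module. The two patterns are simplified, hypothetical stand-ins for the extractor's `_VALID_URL` (the real pattern in `youtube_dl/extractor/aljazeera.py` may differ); the two URLs are the ones from this report. Under these assumptions, adding `program` to the alternation alone is not enough, because the new URLs no longer end in `.html`.

```python
import re

# ASSUMPTION: simplified stand-in for the extractor's _VALID_URL.
# The real pattern in youtube_dl/extractor/aljazeera.py may differ.
OLD_STYLE = (
    r'https?://(?:www\.)?aljazeera\.com/(?:programmes|video)/'
    r'.*?/(?P<id>[^/]+)\.html'
)

# The naive change described above: add "program" to the alternation.
NAIVE = (
    r'https?://(?:www\.)?aljazeera\.com/(?:programmes|program|video)/'
    r'.*?/(?P<id>[^/]+)\.html'
)

# URLs taken from this report.
old_url = ('http://www.aljazeera.com/programmes/the-slum/2014/08/'
           'deliverance-201482883754237240.html')
new_url = ('https://www.aljazeera.com/program/101-east/2021/1/8/'
           'mongolia-from-the-steppe-to-the-slum')


def matches(pattern, url):
    """Return True if the pattern matches from the start of the URL."""
    return re.match(pattern, url) is not None


for name, pattern in (('OLD_STYLE', OLD_STYLE), ('NAIVE', NAIVE)):
    for url in (old_url, new_url):
        print(name, matches(pattern, url), url)
```

Even a pattern that accepts the new URL would only route it to the extractor; the page itself still has to contain whatever the extractor searches for.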

If someone on the team were willing to look into it, I'd be very glad, because I believe the Al Jazeera extractor is broken as it stands today. It can't download new videos, and when I tried the URL from the extractor's own test, all I got was:

$ youtube-dl --verbose http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.01.08
[debug] Git HEAD: 806b30d6
[debug] Python version 3.9.0 (CPython) - Linux-5.9.0-5-amd64-x86_64-with-glibc2.31
[debug] exe versions: ffmpeg 4.3.1, ffprobe 4.3.1, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[AlJazeera] deliverance-201482883754237240: Downloading webpage
ERROR: Unable to extract brightcove id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/YoutubeDL.py", line 803, in wrapper
    return func(self, *args, **kwargs)
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/YoutubeDL.py", line 824, in __extract_info
    ie_result = ie.extract(url)
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/extractor/common.py", line 532, in extract
    ie_result = self._real_extract(url)
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/extractor/aljazeera.py", line 31, in _real_extract
    brightcove_id = self._search_regex(
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/extractor/common.py", line 1010, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
youtube_dl.utils.RegexNotFoundError: Unable to extract brightcove id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

BTW, I'm using the master version of youtube-dl in a virtualenv with Python 3.9.0.

Thank you for taking the time to look at this bug report and, hopefully, for putting in the work to update the Al Jazeera extractor. And even if you don't, thank you all anyway for the work you put into this great project.


october262 · 2021-01-12T04:26:32Z

For this link, https://www.aljazeera.com/program/101-east/2021/1/8/mongolia-from-the-steppe-to-the-slum, I just used the Firefox add-on called The Stream Detector to grab the master.m3u8 stream and download the video.
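One possible way to do the download step of this workaround, sketched in Python: hand the captured master.m3u8 URL to ffmpeg and copy the streams into a single file. The comment above does not say which tool was used, so ffmpeg is an assumption here; the m3u8 URL below is a placeholder rather than the real stream address, and `build_ffmpeg_command` and `download` are hypothetical helper names.

```python
import subprocess


def build_ffmpeg_command(m3u8_url, output_path):
    """Build an ffmpeg argument list that copies an HLS stream into one file.

    '-c copy' copies the audio and video streams without re-encoding.
    """
    return ['ffmpeg', '-i', m3u8_url, '-c', 'copy', output_path]


def download(m3u8_url, output_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(m3u8_url, output_path), check=True)


# ASSUMPTION: placeholder URL; substitute the master.m3u8 address captured
# in the browser.
MASTER_M3U8 = 'https://example.com/path/to/master.m3u8'

command = build_ffmpeg_command(
    MASTER_M3U8, 'mongolia-from-the-steppe-to-the-slum.mp4')
print(' '.join(command))
```

Calling `download(MASTER_M3U8, 'mongolia-from-the-steppe-to-the-slum.mp4')` would start the transfer; the sketch only prints the command so it can be inspected first.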

remitamine closed this as completed in 26499ba Jan 17, 2021

ThirumalaiK pushed a commit to ThirumalaiK/youtube-dl that referenced this issue Jan 28, 2021

[aljazeera] fix extraction(closes ytdl-org#20911)(closes ytdl-org#27779)

e3b5ee3
