Al Jazeera seems broken with the current version of youtube-dl #27779

Closed
6 tasks done
palmer-eldritch opened this issue Jan 12, 2021 · 1 comment

Comments

@palmer-eldritch

Checklist

  • I'm reporting a broken site support issue
  • I've verified that I'm running youtube-dl version 2021.01.08
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar bug reports including closed ones
  • I've read bugs section in FAQ

Verbose log

$ youtube-dl --verbose https://www.aljazeera.com/program/101-east/2021/1/8/mongolia-from-the-steppe-to-the-slum
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://www.aljazeera.com/program/101-east/2021/1/8/mongolia-from-the-steppe-to-the-slum']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.01.08
[debug] Git HEAD: 806b30d6
[debug] Python version 3.9.0 (CPython) - Linux-5.9.0-5-amd64-x86_64-with-glibc2.31
[debug] exe versions: ffmpeg 4.3.1, ffprobe 4.3.1, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[generic] mongolia-from-the-steppe-to-the-slum: Requesting header
WARNING: Falling back on generic information extractor.
[generic] mongolia-from-the-steppe-to-the-slum: Downloading webpage
[generic] mongolia-from-the-steppe-to-the-slum: Extracting information
ERROR: Unsupported URL: https://www.aljazeera.com/program/101-east/2021/1/8/mongolia-from-the-steppe-to-the-slum
Traceback (most recent call last):
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/YoutubeDL.py", line 803, in wrapper
    return func(self, *args, **kwargs)
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/YoutubeDL.py", line 824, in __extract_info
    ie_result = ie.extract(url)
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/extractor/common.py", line 532, in extract
    ie_result = self._real_extract(url)
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/extractor/generic.py", line 3467, in _real_extract
    raise UnsupportedError(url)
youtube_dl.utils.UnsupportedError: Unsupported URL: https://www.aljazeera.com/program/101-east/2021/1/8/mongolia-from-the-steppe-to-the-slum

Description

I've tried downloading a video from Al Jazeera using youtube-dl and got an "Unsupported URL" error. I think the extractor might be broken, since it was last updated in 2017 and the site has most likely changed since then.
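The log above shows the dispatch step: no site-specific extractor claimed the URL, so youtube-dl fell back on the generic extractor, which then gave up. Below is a simplified model of that step, assuming extractors are tried in order by matching the URL against each one's pattern, with a catch-all generic extractor last. The extractor list and both patterns are hypothetical stand-ins, not youtube-dl's actual code.

```python
import re

# Hypothetical, simplified extractor table: (name, URL pattern).
# The AlJazeera pattern is an assumption modeled on the 2014 test URL.
EXTRACTORS = [
    ("AlJazeera", r"https?://(?:www\.)?aljazeera\.com/(?:programmes|video)/.*?/(?P<id>[^/]+)\.html"),
    ("Generic", r".*"),  # catch-all, tried last
]


def pick_extractor(url):
    """Return the name of the first extractor whose pattern matches url."""
    for name, pattern in EXTRACTORS:
        if re.match(pattern, url):
            return name
    return None


new_url = "https://www.aljazeera.com/program/101-east/2021/1/8/mongolia-from-the-steppe-to-the-slum"
old_url = "http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html"
print(pick_extractor(new_url))  # Generic
print(pick_extractor(old_url))  # AlJazeera
```

Under these assumptions the new-style URL only reaches the generic fallback, which matches the "Falling back on generic information extractor" warning in the log.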

I'm a Python developer and took a quick look at the code of the extractor and the InfoExtractor class it's based on. But I'm very unfamiliar with youtube-dl's internals, so I thought opening a bug report might be an easier way to solve the problem: you have people familiar with the structure of the project who could troubleshoot the issue much faster than I could if I had to dig deep into youtube-dl to understand how it works.

As shown in the log, the URL of the video is https://www.aljazeera.com/program/101-east/2021/1/8/mongolia-from-the-steppe-to-the-slum, which definitely doesn't match the _VALID_URL regex in the extractor. I naively tried adding "|program" to the regex, but I suspect much more has changed than just the URL format. As I said, I don't understand how youtube-dl extractors work, so I thought someone on the team would be better placed to fix this quickly; doing it myself would mean digging into youtube-dl's code, and I'm not sure I could find the personal time for that.
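A quick regex check suggests why adding "|program" alone is unlikely to be enough. It assumes the old pattern accepted only `programmes` or `video` paths ending in `.html` (an assumption modeled on the 2014 test URL, not a quote of the extractor); under that assumption the new URL still fails, because it has no `.html` suffix. `NEW` is a hypothetical pattern for the `program/<show>/<year>/<month>/<day>/<slug>` layout.

```python
import re

# Assumed shape of the old pattern: programmes/video pages ending in ".html".
OLD = r"https?://(?:www\.)?aljazeera\.com/(?:programmes|video)/.*?/(?P<id>[^/]+)\.html"
# The naive fix from the report: add "program" as another alternative.
NAIVE = r"https?://(?:www\.)?aljazeera\.com/(?:programmes|video|program)/.*?/(?P<id>[^/]+)\.html"
# Hypothetical pattern for the new layout: program/<show>/<yyyy>/<m>/<d>/<slug>.
NEW = r"https?://(?:www\.)?aljazeera\.com/program/[^/]+/\d{4}/\d{1,2}/\d{1,2}/(?P<id>[^/?#&]+)"

url = "https://www.aljazeera.com/program/101-east/2021/1/8/mongolia-from-the-steppe-to-the-slum"

for label, pattern in [("old", OLD), ("naive", NAIVE), ("new", NEW)]:
    m = re.match(pattern, url)
    # Print the captured video id, or None when the pattern does not match.
    print(label, m.group("id") if m else None)
```

Even with a matching pattern, extraction can still fail once the page markup changes, which is what the second log below shows.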

If someone on the team is willing to look into it, I'd be super glad, because I believe the Al Jazeera extractor, as it stands today, is broken. It can't download new videos, and when I tried the URL from the extractor's test, all I got was:

$ youtube-dl --verbose http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.01.08
[debug] Git HEAD: 806b30d6
[debug] Python version 3.9.0 (CPython) - Linux-5.9.0-5-amd64-x86_64-with-glibc2.31
[debug] exe versions: ffmpeg 4.3.1, ffprobe 4.3.1, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[AlJazeera] deliverance-201482883754237240: Downloading webpage
ERROR: Unable to extract brightcove id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/YoutubeDL.py", line 803, in wrapper
    return func(self, *args, **kwargs)
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/YoutubeDL.py", line 824, in __extract_info
    ie_result = ie.extract(url)
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/extractor/common.py", line 532, in extract
    ie_result = self._real_extract(url)
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/extractor/aljazeera.py", line 31, in _real_extract
    brightcove_id = self._search_regex(
  File "/home/youen/.pyenv/versions/youtube-dl/lib/python3.9/site-packages/youtube_dl-2021.1.8-py3.9.egg/youtube_dl/extractor/common.py", line 1010, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
youtube_dl.utils.RegexNotFoundError: Unable to extract brightcove id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
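This second traceback fails inside `_search_regex`, which raises `RegexNotFoundError('Unable to extract %s' % _name)` when its pattern finds nothing in the downloaded page. The function below is a simplified local stand-in for that behaviour, not youtube-dl's implementation; the `RenderPagesVideo(...)` marker and both HTML snippets are hypothetical, chosen only to show how a redesigned page makes the lookup fail.

```python
import re


class RegexNotFoundError(Exception):
    """Local stand-in for the error named in the traceback."""


def search_regex(pattern, webpage, name):
    """Return group 1 of the first match, or raise with the traceback's message."""
    m = re.search(pattern, webpage)
    if m is None:
        raise RegexNotFoundError("Unable to extract %s" % name)
    return m.group(1)


# Hypothetical markup: an old-style page that embeds an id, and a redesigned one that doesn't.
old_page = "<script>RenderPagesVideo('1234567890');</script>"
new_page = "<div id='video-player'></div>"
pattern = r"RenderPagesVideo\('(.+?)'"  # hypothetical marker

print(search_regex(pattern, old_page, "brightcove id"))  # 1234567890
try:
    search_regex(pattern, new_page, "brightcove id")
except RegexNotFoundError as e:
    print(e)  # Unable to extract brightcove id
```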

BTW, I'm using the master version of youtube-dl in a virtualenv with Python 3.9.0.

Thank you for taking the time to look at this bug report, and hopefully for putting in the work to update the Al Jazeera extractor. And even if you don't, thank you all anyway for the work you put into this great project.

@october262

For this link, https://www.aljazeera.com/program/101-east/2021/1/8/mongolia-from-the-steppe-to-the-slum, I just used the Firefox add-on The Stream Detector to grab the master.m3u8 stream and download the video.
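An HLS master playlist such as master.m3u8 lists the available renditions, each on an `#EXT-X-STREAM-INF` line carrying a `BANDWIDTH` attribute, followed by that variant's URI. The sketch below picks the highest-bandwidth variant from a made-up playlist; it is a minimal illustration that ignores other attributes and tags, not how any particular tool chooses streams.

```python
import re

# A made-up master playlist with three renditions.
MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
high/index.m3u8
"""


def best_variant(playlist):
    """Return the URI of the variant stream with the highest BANDWIDTH."""
    lines = [ln.strip() for ln in playlist.splitlines() if ln.strip()]
    best_uri, best_bw = None, -1
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:") and i + 1 < len(lines):
            # Require ':' or ',' before BANDWIDTH so AVERAGE-BANDWIDTH is not matched.
            m = re.search(r"[:,]BANDWIDTH=(\d+)", line)
            uri = lines[i + 1]  # the variant URI follows its STREAM-INF tag
            if m and not uri.startswith("#") and int(m.group(1)) > best_bw:
                best_uri, best_bw = uri, int(m.group(1))
    return best_uri


print(best_variant(MASTER))  # high/index.m3u8
```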
