[motherless] Fixed the broken uploader_id in the extractor. #31243

Merged: 6 commits, Oct 10, 2022

xiyue077
Contributor

@xiyue077 xiyue077 commented Sep 19, 2022

Apparently motherless.com has replaced the "thumb-member-username" item with "media-meta-member", so this PR updates the extractor accordingly.

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Fix #29626 and #31128.

The current motherless.com extractor is unable to extract the uploader_id. For example:

$ ./youtube-dl -v https://motherless.com/060351D
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://motherless.com/060351D']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 2.7.16 (CPython) - Linux-4.19.0-16-amd64-x86_64-with-debian-10.12
[debug] exe versions: none
[debug] Proxy map: {}
[Motherless] 060351D: Downloading webpage
ERROR: Unable to extract uploader_id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "./youtube-dl/youtube_dl/YoutubeDL.py", line 816, in wrapper
    return func(self, *args, **kwargs)
  File "./youtube-dl/youtube_dl/YoutubeDL.py", line 837, in __extract_info
    ie_result = ie.extract(url)
  File "./youtube-dl/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "./youtube-dl/youtube_dl/extractor/motherless.py", line 131, in _real_extract
    webpage, 'uploader_id')
  File "./youtube-dl/youtube_dl/extractor/common.py", line 1021, in _html_search_regex
    res = self._search_regex(pattern, string, name, default, fatal, flags, group)
  File "./youtube-dl/youtube_dl/extractor/common.py", line 1012, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
RegexNotFoundError: Unable to extract uploader_id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

The cause is that the website replaced "thumb-member-username" with "media-meta-member", so the extractor is changed accordingly:

--- a/youtube_dl/extractor/motherless.py
+++ b/youtube_dl/extractor/motherless.py
@@ -127,7 +127,7 @@ class MotherlessIE(InfoExtractor):
 
         comment_count = webpage.count('class="media-comment-contents"')
         uploader_id = self._html_search_regex(
-            r'"thumb-member-username">\s+<a href="/m/([^"]+)"',
+            r'"media-meta-member">\s+<a href="/m/([^"]+)"',
             webpage, 'uploader_id')
 
         categories = self._html_search_meta('keywords', webpage, default=None)
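
To illustrate why the old pattern stopped matching, here is a minimal standalone sketch using only Python's `re` module. The two regexes are copied from the diff above; the HTML fragments, the `some_user` name, and the `find_uploader_id` helper are hypothetical stand-ins, and the real page markup may differ.

```python
import re

# Hypothetical page fragments (assumption): only the class names come from this PR.
OLD_HTML = '<div class="thumb-member-username">\n  <a href="/m/some_user">some_user</a></div>'
NEW_HTML = '<span class="media-meta-member">\n  <a href="/m/some_user">some_user</a></span>'

# Patterns taken verbatim from the diff.
OLD_RE = r'"thumb-member-username">\s+<a href="/m/([^"]+)"'
NEW_RE = r'"media-meta-member">\s+<a href="/m/([^"]+)"'


def find_uploader_id(pattern, html):
    """Hypothetical helper: return the first capture group, or None if no match."""
    match = re.search(pattern, html)
    return match.group(1) if match else None


# The old pattern finds nothing in the new markup, which is what
# surfaced as "Unable to extract uploader_id" in the extractor.
print(find_uploader_id(OLD_RE, NEW_HTML))  # None
print(find_uploader_id(NEW_RE, NEW_HTML))  # some_user
```

In the extractor itself the search goes through `self._html_search_regex`, which raises `RegexNotFoundError` when nothing matches, as the traceback above shows.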

Testing after the change:

$ ./youtube-dl -v https://motherless.com/060351D
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://motherless.com/060351D']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 2.7.16 (CPython) - Linux-4.19.0-16-amd64-x86_64-with-debian-10.12
[debug] exe versions: none
[debug] Proxy map: {}
[Motherless] 060351D: Downloading webpage
[debug] Default format spec: best/bestvideo+bestaudio
[debug] Invoking downloader on u'https://cdn5-videos.motherlessmedia.com/videos/060351D.mp4'
[download] Destination: Skydiving pussy - hilarious with sound on-060351D.mp4
[download] 100% of 2.24MiB in 00:06

xuminic and others added 5 commits September 20, 2022 00:32
Apparently the motherless.com has the "thumb-member-username" item replaced by "media-meta-member", so updating the extractor accordingly.
@dirkf dirkf changed the title [patch_motherless] Fixed the broken uploader_id in the extractor. [motherless] Fixed the broken uploader_id in the extractor. Oct 10, 2022
@dirkf dirkf merged commit 82e4eca into ytdl-org:master Oct 10, 2022
github-actions bot added a commit to hellopony/youtube-dl that referenced this pull request Oct 11, 2022
* https://github.com/ytdl-org/youtube-dl:
  [netease] Get netease music download url through player api (ytdl-org#31235)
  [Common:JWPlayer] Fix x1000 scaling error
  [utils] Sanitize look-alike Unicode glyphs in non-ID filename fields when --restrict-filenames
  [JSInterp] Improve separation logic
  [ZDF] Overhaul ZDF extractors * pull some yt-dlp changes into ZDFBaseIE._extract_format() * add test cases from yt-dlp to ZDFIE * fix crash in ZDFIE._extract_mobile() when object had no `formitaeten` * improve title extraction in ZDFChannelIE (remove trailing station ident) * avoid extracting non-video playlist items (fixes ytdl-org#31149)
  [test] Implement string "lambda x: condition(x)" as an expected value
  [motherless] Fixed the broken uploader_id in the extractor (ytdl-org#31243)
@Vangelis66

Vangelis66 commented Oct 11, 2022

Fix #31273

Pardon me, but how is this PR a fix for issue 31273?
That one pertains to ADN (AnimationDigitalNetwork), while the service in question here is "motherless" ...

Unless I'm missing something, this PR fixes #29626 and its (recent) duplicate #31128 😉 ...

@xiyue077
Contributor Author

I never meant it to be 31273. Must be a typo :bump
I remember I did a search before opening the PR. How could I have missed #29626? :bumpagain

@dirkf
Contributor

dirkf commented Oct 12, 2022

I suspect JS autofilling ... Anyhow this extractor now has no open issues!

@Vangelis66

Thank you both 😄 ; for correctness' sake, could either one of you remove the reference/link to (wrong) #31273 from the description of this PR (first post) ?

@xiyue077
Contributor Author

done.

@Vangelis66

done.

Thanks 😄 , but you can leave #31243 out 😉 ; "Fix" inside the description is meant to reference/link open issues that will be "fixed" by said PR, while #31243 is the "id" of the PR itself, not of an open issue... Sorry for being that pedantic... 😺

@xiyue077
Contributor Author

done again.
No worries, Vangelis66. It's just a few twitches of the fingers.

alxlive pushed a commit to alxlive/youtube-dl that referenced this pull request Feb 27, 2023
…31243)

* Fixed the broken uploader_id in the extractor.
* Make uploader_id RE looser
* Fix uploader_id in test Motherless_3
* Fix group pagination
* # coding: utf-8

Co-authored-by: Andy Xuming <xuminic@gmail.com>
Co-authored-by: dirkf <fieldhouse@gmx.net>
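
The commit list above mentions making the uploader_id RE looser, but the page does not show the resulting pattern. As a rough idea of what "looser" could mean, the sketch below accepts either class name, optional whitespace, and extra attributes on the link. `LOOSE_RE`, the sample fragments, and `loose_uploader_id` are all assumptions for illustration, not the regex that was merged.

```python
import re

# Assumed pattern, NOT the merged one: tolerates either class name,
# zero or more whitespace characters, and extra attributes before href.
LOOSE_RE = (
    r'"(?:thumb-member-username|media-meta-member)"[^>]*>\s*'
    r'<a\b[^>]*\bhref="/m/([^"]+)"'
)

# Hypothetical fragments covering the variations the pattern should survive.
SAMPLES = [
    '<div class="thumb-member-username">\n <a href="/m/some_user">x</a></div>',
    '<span class="media-meta-member"><a href="/m/some_user">x</a></span>',
    '<span class="media-meta-member" id="u">\n<a class="n" href="/m/some_user">x</a></span>',
]


def loose_uploader_id(html):
    """Hypothetical helper: return the uploader id, or None if nothing matches."""
    match = re.search(LOOSE_RE, html)
    return match.group(1) if match else None


for sample in SAMPLES:
    print(loose_uploader_id(sample))
```

A looser pattern like this trades strictness for resilience: it keeps working through small markup changes, at the cost of a slightly higher chance of matching an unintended element.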
Development

Successfully merging this pull request may close these issues.

Motherless ERROR: Unable to extract uploader_id