[pornhub] Fix like and dislike count extraction #27356

Closed

Conversation

Contributor

@imba-tjd imba-tjd commented Dec 9, 2020

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Related to #27234

In my limited testing, the page structure has become:

<div class="votes-fav-wrap">
	<div class="js-voteUp icon-wrapper tooltipTrig"
         data-title="顶一下"
            >
        <i class="thumbs-up"></i>
		<span class="votesUp" data-rating="4614">5K</span>
	</div>

	<div class="js-voteDown icon-wrapper tooltipTrig"
         data-title="踩一下"
            >
        <i class="thumbs-down"></i>
        <span class="votesDown" data-rating="703">703</span>
    </div>
</div>
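
To make the effect of this markup concrete, here is a minimal sketch using only Python's `re` module. It runs the old like-count pattern and the patterns proposed in this PR against a condensed copy of the snippet above. The `search_count` helper is hypothetical, written for this illustration; it is not youtube-dl's `_extract_count`.

```python
import re

# Markup condensed from the snippet above (tooltip attributes omitted).
HTML = '''
<div class="votes-fav-wrap">
  <div class="js-voteUp icon-wrapper tooltipTrig">
    <i class="thumbs-up"></i>
    <span class="votesUp" data-rating="4614">5K</span>
  </div>
  <div class="js-voteDown icon-wrapper tooltipTrig">
    <i class="thumbs-down"></i>
    <span class="votesDown" data-rating="703">703</span>
  </div>
</div>
'''

# Old pattern: captures the visible text, which must consist of digits, commas or dots.
OLD_LIKE = r'<span[^>]+class="votesUp"[^>]*>([\d,\.]+)</span>'
# Patterns proposed in this PR: capture the exact count from data-rating.
NEW_LIKE = r'<span class="votesUp" data-rating="(\d+)">'
NEW_DISLIKE = r'<span class="votesDown" data-rating="(\d+)">'


def search_count(pattern, html):
    """Hypothetical helper: first captured group as an int, or None if no match."""
    m = re.search(pattern, html)
    if m is None:
        return None
    # Drop thousands separators such as "4,614" before converting.
    return int(re.sub(r'[,.]', '', m.group(1)))


old_like = search_count(OLD_LIKE, HTML)        # visible text is "5K", so no match
new_like = search_count(NEW_LIKE, HTML)        # exact value from data-rating
new_dislike = search_count(NEW_DISLIKE, HTML)

print(old_like, new_like, new_dislike)
```

The old pattern fails here because the abbreviated text "5K" is not purely numeric, while `data-rating` carries the unabbreviated value.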

@JChris246
Contributor

Yeah, my bad, the video I tested with didn't have a K in its count. You should still keep the regex relaxed along with your changes.

@imba-tjd
Contributor Author

imba-tjd commented Dec 9, 2020

IMO relaxing the pattern doesn't make much sense, because the match depends on class="votesUp" anyway.

If you insist on relaxing it, I'd prefer class="votesUp"\s+data-rating="(\d+)", i.e. dropping the leading <span.

@JChris246
Contributor

I'm just quoting their docs on regexes: https://github.com/ytdl-org/youtube-dl#make-regular-expressions-relaxed-and-flexible 🤷‍♀️

@imba-tjd
Contributor Author

imba-tjd commented Dec 9, 2020

Yeah, I know. I read and thought about it. Let the maintainer decide.

@@ -354,9 +354,9 @@ def add_video_url(video_url):
 view_count = self._extract_count(
     r'<span class="count">([\d,\.]+)</span> [Vv]iews', webpage, 'view')
 like_count = self._extract_count(
-    r'<span[^>]+class="votesUp"[^>]*>([\d,\.]+)</span>', webpage, 'like')
+    r'<span class="votesUp" data-rating="(\d+)">', webpage, 'like')
Contributor Author

Suggested change
-    r'<span class="votesUp" data-rating="(\d+)">', webpage, 'like')
+    r'class="votesUp".+?data-rating="(\d+)"', webpage, 'like')

 dislike_count = self._extract_count(
-    r'<span[^>]+class="votesDown"[^>]*>([\d,\.]+)</span>', webpage, 'dislike')
+    r'<span class="votesDown" data-rating="(\d+)">', webpage, 'dislike')
Contributor Author

Suggested change
-    r'<span class="votesDown" data-rating="(\d+)">', webpage, 'dislike')
+    r'class="votesDown".+?data-rating="(\d+)"', webpage, 'dislike')

Collaborator

@dstftw dstftw left a comment

  1. Do not remove old patterns.
  2. New patterns must be relaxed similarly.
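
One way to satisfy both review points is to try a relaxed new pattern first and keep the old pattern as a fallback. The sketch below is an illustration with plain `re`, not the code that was eventually committed. The `extract_count` helper, the relaxed form of the new pattern, and the `old_markup` sample (an assumed older page layout) are all hypothetical.

```python
import re

# Relaxed new pattern first (exact count from data-rating),
# then the old pattern, kept unchanged as a fallback.
LIKE_PATTERNS = (
    r'<span[^>]+class="votesUp"[^>]+data-rating="(\d+)"',
    r'<span[^>]+class="votesUp"[^>]*>([\d,\.]+)</span>',
)


def extract_count(patterns, html):
    """Hypothetical helper: try each pattern in order, return an int or None."""
    for pattern in patterns:
        m = re.search(pattern, html)
        if m:
            # Strip thousands separators such as "4,614" before converting.
            return int(re.sub(r'[,.]', '', m.group(1)))
    return None


new_markup = '<span class="votesUp" data-rating="4614">5K</span>'
old_markup = '<span class="votesUp">4,614</span>'  # assumed older layout

new_count = extract_count(LIKE_PATTERNS, new_markup)
old_count = extract_count(LIKE_PATTERNS, old_markup)
missing = extract_count(LIKE_PATTERNS, '<span class="other">1</span>')

print(new_count, old_count, missing)
```

Ordering matters here: the `data-rating` pattern goes first so that the exact value wins whenever the attribute is present.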

@@ -354,9 +354,9 @@ def add_video_url(video_url):
 view_count = self._extract_count(
     r'<span class="count">([\d,\.]+)</span> [Vv]iews', webpage, 'view')
 like_count = self._extract_count(
-    r'<span[^>]+class="votesUp"[^>]*>([\d,\.]+)</span>', webpage, 'like')
+    r'<span class="votesUp" data-rating="(\d+)">', webpage, 'like')
Contributor Author

Suggested change
-    r'<span class="votesUp" data-rating="(\d+)">', webpage, 'like')
+    r'<span[^>]+class="votesUp"[^>]+data-rating="(\d+)"[^>]*>', webpage, 'like')

 dislike_count = self._extract_count(
-    r'<span[^>]+class="votesDown"[^>]*>([\d,\.]+)</span>', webpage, 'dislike')
+    r'<span class="votesDown" data-rating="(\d+)">', webpage, 'dislike')
Contributor Author

Suggested change
-    r'<span class="votesDown" data-rating="(\d+)">', webpage, 'dislike')
+    r'<span[^>]+class="votesDown"[^>]+data-rating="(\d+)"[^>]*>', webpage, 'dislike')
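
The like-count patterns discussed in this thread differ in how much markup drift they tolerate. The following sketch compares them against a few variants of the span. Only the `current` variant comes from the page snippet quoted in the description; the other three are hypothetical changes invented for this comparison.

```python
import re

# Like-count patterns proposed at various points in this thread.
PATTERNS = {
    'strict': r'<span class="votesUp" data-rating="(\d+)">',
    'no_span': r'class="votesUp"\s+data-rating="(\d+)"',
    'lazy': r'class="votesUp".+?data-rating="(\d+)"',
    'relaxed': r'<span[^>]+class="votesUp"[^>]+data-rating="(\d+)"[^>]*>',
}

# 'current' mirrors the quoted page markup; the rest are hypothetical drift.
VARIANTS = {
    'current': '<span class="votesUp" data-rating="4614">5K</span>',
    'attr_between': '<span class="votesUp" id="x" data-rating="4614">5K</span>',
    'attr_before': '<span id="x" class="votesUp" data-rating="4614">5K</span>',
    'line_break': '<span class="votesUp"\n      data-rating="4614">5K</span>',
}


def matches(pattern, html):
    """True if the pattern finds the expected rating in the given markup."""
    m = re.search(pattern, html)
    return m is not None and m.group(1) == '4614'


# For each pattern, list the variants it still handles.
results = {
    name: [variant for variant, html in VARIANTS.items() if matches(pattern, html)]
    for name, pattern in PATTERNS.items()
}

for name, handled in results.items():
    print('%-8s %s' % (name, ', '.join(handled)))
```

In this comparison only the `[^>]+` form survives all four variants. The lazy `.+?` form fails across a line break because `.` does not match a newline unless `re.DOTALL` is set, and since it is not confined to a single tag it could also run on into a later element on the same line.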

@dstftw dstftw closed this in 4f1dc14 Dec 26, 2020
ThirumalaiK pushed a commit to ThirumalaiK/youtube-dl that referenced this pull request Jan 28, 2021
3 participants