Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[motherless] Fix broken download on recent videos and other values extraction #27450

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 14 additions & 5 deletions youtube_dl/extractor/motherless.py
Original file line number Diff line number Diff line change
Expand Up @@ -85,18 +85,27 @@ def _real_extract(self, url):
or 'http://cdn4.videos.motherlessmedia.com/videos/%s.mp4?fs=opencloud' % video_id)
age_limit = self._rta_search(webpage)
view_count = str_to_int(self._html_search_regex(
(r'>(\d+)\s+Views<', r'<strong>Views</strong>\s+([^<]+)<'),
(r'>([\d,.]+)\s+Views<', # 1,234,567 Views
r'<strong>Views</strong>\s+([^<]+)<'),
webpage, 'view count', fatal=False))
like_count = str_to_int(self._html_search_regex(
(r'>(\d+)\s+Favorites<', r'<strong>Favorited</strong>\s+([^<]+)<'),
(r'>([\d,.]+)\s+Favorites<', # 1,234 Favorites
r'<strong>Favorited</strong>\s+([^<]+)<'),
webpage, 'like count', fatal=False))

upload_date = self._html_search_regex(
(r'class=["\']count[^>]+>(\d+\s+[a-zA-Z]{3}\s+\d{4})<',
r'class=["\']count[^>]+>(\d+[hd])\s+[aA]go<', # 20h/1d ago
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not mix different scenarios in single regex. Upload date should be extracted first. If this fails it should fallback on ago pattern extraction.

r'<strong>Uploaded</strong>\s+([^<]+)<'), webpage, 'upload date')
if 'Ago' in upload_date:
days = int(re.search(r'([0-9]+)', upload_date).group(1))
upload_date = (datetime.datetime.now() - datetime.timedelta(days=days)).strftime('%Y%m%d')
relative = re.match(r'(\d+)([hd])$', upload_date)
if relative:
delta = int(relative.group(1))
unit = relative.group(2)
if unit == 'h':
delta_t = datetime.timedelta(hours=delta)
else: # unit == 'd'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in assert not in comment.

delta_t = datetime.timedelta(days=delta)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DRY 105, 107.

upload_date = (datetime.datetime.now() - delta_t).strftime('%Y%m%d')
else:
upload_date = unified_strdate(upload_date)

Expand Down