[iTunes] Add new extractor (Closes #2097) #9590
youtube_dl/extractor/itunes.py
Outdated
webpage = self._download_webpage(sanitized_Request(self._html_search_regex(
    r'<string>\s*(https?://itunes.apple.com/[^>]+)</string>', self._download_webpage(
        request, display_id), 'iTunes url'), headers={'User-Agent': self._USER_AGENT}), display_id)
Avoid such cumbersome code. It's unreadable.
It seems that without an iTunes User-Agent, the response from https://itunes.apple.com/us/itunes-u/uc-davis-symphony-orchestra/id403834767 already contains everything we need. If you're going to keep the current approach, use
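For illustration only, here is a minimal sketch of how the nested statement above could be split into named steps. It is not the code that ended up in the PR. The `download` parameter is a hypothetical callable standing in for `self._download_webpage` combined with `sanitized_Request`, and `find_itunes_url` and `fetch_webpage` are made-up helper names; only the regular expression is taken from the reviewed code.

```python
import re

# Pattern copied from the reviewed code: the iTunes URL inside a <string> element.
ITUNES_URL_RE = r'<string>\s*(https?://itunes.apple.com/[^>]+)</string>'


def find_itunes_url(page):
    """Return the first iTunes URL found in the page, or None if there is none."""
    match = re.search(ITUNES_URL_RE, page)
    return match.group(1) if match else None


def fetch_webpage(download, request, user_agent):
    """Same three steps as the nested statement, one per line.

    `download` is a hypothetical callable: download(target, headers=None) -> str.
    """
    # Step 1: fetch the intermediate page.
    intermediate_page = download(request)
    # Step 2: pull the real iTunes URL out of it.
    itunes_url = find_itunes_url(intermediate_page)
    # Step 3: fetch the real page while sending the custom User-Agent.
    return download(itunes_url, headers={'User-Agent': user_agent})
```

Each intermediate value gets a name, so a failure at any step is easier to locate than in the single nested expression.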
youtube_dl/extractor/itunes.py
Outdated
# coding: utf-8
from __future__ import unicode_literals

import datetime
No longer used.
@yan12125 You're right. I'll change the approach to not spoof the user agent.
webpage = self._download_webpage(url, display_id)

video_infos = re.findall(r'var\s+__desc_popup_d_\d+\s*=\s*({[^><]+});', webpage)
html_entries = re.findall(r'<tr\s+[^>]*role="row"[^>]+>', webpage)
get_element_by_attribute() may be useful.
It returns the content between <tr> and </tr>, but not the attributes inside the opening tag itself, like <tr preview-duration="485000">
Don't
@yan12125 Sorry, I didn't know that.
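To make the distinction concrete, here is a small sketch of reading attributes from the opening `<tr ...>` tags rather than the content between the tags. The row pattern is the one from the reviewed code; the `SAMPLE` markup, the `ATTR_RE` pattern, and the `row_attributes` helper are assumptions made up for this example and are not part of the PR.

```python
import re

# Hypothetical markup for illustration; the real iTunes page may differ.
SAMPLE = '''
<tr role="row" preview-duration="485000"><td>Lecture 1</td></tr>
<tr role="row" preview-duration="120000"><td>Lecture 2</td></tr>
'''

# Same pattern as the reviewed code: capture whole opening <tr ...> tags.
ROW_TAG_RE = r'<tr\s+[^>]*role="row"[^>]+>'
# Assumed helper pattern: name="value" pairs inside a single tag.
ATTR_RE = r'([\w-]+)="([^"]*)"'


def row_attributes(html):
    """Return one dict of attributes per matching opening <tr> tag."""
    return [dict(re.findall(ATTR_RE, tag)) for tag in re.findall(ROW_TAG_RE, html)]


rows = row_attributes(SAMPLE)
durations = [row['preview-duration'] for row in rows]
```

Only the opening tags are captured, so the `<td>` cells between `<tr>` and `</tr>` never appear in the result.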
d4bf26f to 5762b67
class iTunesIE(InfoExtractor):
    _VALID_URL = r'https?://itunes\.apple\.com/[a-z]{2}/[a-z0-9-]+/(?P<display_id>[a-z0-9-]+)?/(?:id)?(?P<id>[0-9]+)'
It should be the following, because the (valid) URL variations of https://itunes.apple.com/us/itunes-u/uc-davis-symphony-orchestra/id403834767 don't match otherwise:
_VALID_URL = r'https?://itunes\.apple\.com/[a-z]{2}?/?[a-z0-9-]+/?(?P<display_id>[a-z0-9-]+)?/(?:id)?(?P<id>[0-9]+)'
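A quick way to compare the two patterns is to run both against a few URLs. In the sketch below, the first URL is the one quoted in this thread. The other two are assumed variations (one without the country code, one without the display name) picked for illustration; the thread does not say which variations were meant, and they have not been checked against the live site. `extract_id` is a hypothetical helper.

```python
import re

ORIGINAL = r'https?://itunes\.apple\.com/[a-z]{2}/[a-z0-9-]+/(?P<display_id>[a-z0-9-]+)?/(?:id)?(?P<id>[0-9]+)'
PROPOSED = r'https?://itunes\.apple\.com/[a-z]{2}?/?[a-z0-9-]+/?(?P<display_id>[a-z0-9-]+)?/(?:id)?(?P<id>[0-9]+)'

URLS = [
    # Quoted in the review thread.
    'https://itunes.apple.com/us/itunes-u/uc-davis-symphony-orchestra/id403834767',
    # Assumed variation: no country code.
    'https://itunes.apple.com/itunes-u/uc-davis-symphony-orchestra/id403834767',
    # Assumed variation: no display name.
    'https://itunes.apple.com/us/itunes-u/id403834767',
]


def extract_id(pattern, url):
    """Return the numeric id captured by the pattern, or None if it does not match."""
    match = re.match(pattern, url)
    return match.group('id') if match else None


original_ids = [extract_id(ORIGINAL, url) for url in URLS]
proposed_ids = [extract_id(PROPOSED, url) for url in URLS]
```

With these inputs the original pattern matches only the first URL, while the proposed pattern yields the same id for all three.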
The extractor only works for free content, such as most podcasts; it does not download the 30-second previews of paid songs.