-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[hketv] Add new extractor #18696
[hketv] Add new extractor #18696
Conversation
youtube_dl/extractor/hketv.py
Outdated
webpage = self._download_webpage(url, video_id) | ||
|
||
file_id = self._html_search_regex(r'post_var\["file_id"\] = ([0-9]+);', webpage, 'file ID') | ||
curr_url = self._html_search_regex(r'post_var\["curr_url"\] = "(.+?)";', webpage, 'curr URL') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relax regexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the regex to r'post_var\["file_id"\]\s*=\s*(.+?);'
for example.
Hope that is relax enough.
youtube_dl/extractor/hketv.py
Outdated
elif label == 'SD': | ||
h = 360 | ||
w = h * width // height | ||
urlh = self._downloader.urlopen(urljoin(_APPS_BASE_URL, fmt.get('file'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
urljoin
replaced with +
.
Or did you mean the use of self._downloader.urlopen()
? That was intentionally done so that the download link of the video could be downloaded without geo-restriction, i.e. without proxy, and without the PHPSESSID cookie. (The geo-restriction is done within their PHP script only.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, may I keep using self._downloader.urlopen() for the MPEG-4 video file? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, what for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that no cookie is needed with the link given by -g
. The link given by urljoin(_APPS_BASE_URL, fmt.get('file'))
looks something like this:
which requires the correct cookies to work. It gives a "HTTP/1.1 302 Found" redirecting to a final URL that looks like https://video1.hkedcity.net/streaming/b3848d0a3296b85d67d88b749642c29a/5c3738d1/channels/etv/201507/20150716151039_989507_720.mp4 which requires no cookie. This final URL is easier for me to copy-and-paste to other programs or even to stream to other devices.
That said, I have decided to take your advice and not "fully resolve" to the final URL because the intermediate URL, with the appropriate cookies saved with --cookies
, remains valid for many hours, whereas the fully resolved URL would be "410 Gone" within one or two hours. Also, I realized self._downloader.urlopen
was being run twice in the for fmt in fmts:
loop, leading to quite a few seconds of delay. (The HKETV server seems rather slow in responding to this request.)
Sorry, this recent push is incorrect and incomplete; I will push again soon. |
youtube_dl/extractor/hketv.py
Outdated
}), | ||
headers={'Content-Type': 'application/x-www-form-urlencoded'}, | ||
fatal=False) | ||
if emotion.get('result'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emotion
is not guaranteed to be a dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Line removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Line removed.
As in
like_count = int_or_none(try_get(emotion, lambda x: x['data']['emotion_data'][0]['count']))
alone will work just fine without the extraneous if emotion.get('result'):
check.
youtube_dl/extractor/hketv.py
Outdated
h = 720 | ||
elif label == 'SD': | ||
h = 360 | ||
w = h * width // height |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaks on uninitialized h
. Breaks on None width and height.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Indeed, some older videos were uploaded without width and height information, as in they are set to "0". Changed to the following:
for fmt in fmts:
label = fmt.get('label')
w = None
h = None
if label == 'HD':
h = 720
elif label == 'SD':
h = 360
if h:
if width and height:
w = h * width // height
else:
w = h * 4 // 3
youtube_dl/extractor/hketv.py
Outdated
elif label == 'SD': | ||
h = 360 | ||
w = h * width // height | ||
urlh = self._downloader.urlopen(_APPS_BASE_URL + fmt.get('file')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaks on missing file key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Sorry for my oversight. I have restructured that code a little bit to check that fmt.get('file') is not None before continuing with appending to formats.
youtube_dl/extractor/hketv.py
Outdated
elif label == 'SD': | ||
h = 360 | ||
w = h * width // height | ||
urlh = self._downloader.urlopen(urljoin(_APPS_BASE_URL, fmt.get('file'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, what for?
Thank you for your second round of thorough review. I apologize for taking so long to complete the revision as I had been busy, and also it took me a while to think through the various issues you pointed out in your review. I think I have finally fixed all the issues that you pointed out, and additionally added an additional test case of an 2007 HKETV video which has no subtitle, has width and height set as "0", and used to fail with my earlier code due to insufficient error checking. Thank you indeed for correctly pointing out these issues in your review! |
Cherry-picked, thanks. |
Thank you for your review, improvement and cherry-pick! |
Please follow the guide below
x
into all the boxes [ ] relevant to your pull request (like that [x])Before submitting a pull request make sure you have:
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?
Description of your pull request and other information
The Educational Bureau of the Government of Hong Kong has been producing HKETV (香港教育電視), i.e. Educational Television (ETV, eTV), for students from kindergarten to secondary schools in Hong Kong for over 40 years. Their website is at https://etv.edb.gov.hk/ .
This new extractor pulls information from the website through https://apps.hkedcity.net/media/play/handler.php which has recently (mid-2018?) become geo-restricted, i.e. viewable in Hong Kong only. The final URL to the MPEG-4 video looks something like https://video1.hkedcity.net/streaming/79ea4e9feec5e8291a7ffc72632ce19e/5c28ab09/channels/etv/201810/20181024230027_355992_720.mp4 which is not geo-restricted, but due to URL (Token) signing, the URL expires after an hour or so.
Note that the official ETV Android app and iOS app are not currently geo-restricted, though they have not been studied in depth for this extractor.
Please do not hesitate to let me know if you have any questions.
Cheers,
Anthony