Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update video_id regexp to account for recent abc.com changes (fixes #25197 and #25207) #25216

Closed
wants to merge 2 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion youtube_dl/extractor/go.py
Original file line number Diff line number Diff line change
Expand Up @@ -138,7 +138,7 @@ def _real_extract(self, url):
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)',
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not touch old patterns.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not minding my own business, but it looks like your requested change was implemented by @tmthywynn8 a couple of months ago. I'm not that familiar with the GitHub workflow, but it looks like this is just waiting for your approval.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I made the necessary changes in commit 783d296 and requested another review to make sure that I fulfilled the requirements. I'm not sure what the procedure is suppose to be, so maybe requesting another review was the wrong thing to do

Copy link

@tony-izzo tony-izzo Aug 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure yet but I think removing the original pattern (by replacing it with the new pattern) fixed the problem, and putting it back has re-broken it. My suspicion is that both patterns are matched on the page in question but one returns an invalid ID. Or perhaps a valid ID that nevertheless yields an error when the code later tries to retrieve it.

If I'm right, undoing the requested change will get it working again. But that's not really a solution, and again I'm not sure yet.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took me so long to get to this, but I just tested #25216 and it works for me as-is. So my theory above is completely wrong.

r'\bvideoIdCode["\']\s*:\s*["\'](vdka\w+)'
), webpage, 'video id', default=video_id)
if not site_info:
brand = self._search_regex(
Expand Down