-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[utils] Fix handling of comments in js_to_json #28446
Conversation
Fixes #28335 and many other related issues am thinking, is good fix |
CI is failing on flake8 issues: diff --git i/youtube_dl/utils.py w/youtube_dl/utils.py
index d7162e401..30830af42 100644
--- i/youtube_dl/utils.py
+++ w/youtube_dl/utils.py
@@ -4110,6 +4110,7 @@ def js_to_json(code):
[0-9]+(?=\s*:)
''', fix_kv, code)
+
def qualities(quality_ids):
""" Get a numeric quality value out of a list of possible values """
def q(qid): The other failures stem from the fact that the |
Input:
Output from the cleanup function:
This is not valid JSON. |
https://www.python.org/dev/peps/pep-0008/#blank-lines Fixes: E302 expected 2 blank lines, found 1
While this results in wrong values, it is the previous behaviour and what the tests are currently expecting.
@dstftw can you please review/remove the pending-fixes label? |
# Remove all comments first, including all whitespace leading up to them. | ||
# This regular expression is based on this Stack Overflow answer: | ||
# https://stackoverflow.com/a/25735600 | ||
code = re.sub(r'("(?:[^"\\]|\\[\s\S])*"|\'(?:[^\'\\]|\\[\s\S])*\')|[ \t]*//.*|[ \t]*/\*(?:[^*]|\*(?!/))*\*/', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Use re.X to lay out the RE
- Supposing
[ \t]
is meant to be space and not newline, use(?:(?!\n)\s)
- Use re.s to avoid
[\s\S]
and re.M to match$
at an end of line
code = re.sub(r'("(?:[^"\\]|\\[\s\S])*"|\'(?:[^\'\\]|\\[\s\S])*\')|[ \t]*//.*|[ \t]*/\*(?:[^*]|\*(?!/))*\*/', | |
code = re.sub( | |
r'''(?xmi) | |
("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')| | |
(?:(?!\n)\s)* | |
(?: | |
//.*$| | |
/\*(?:[^*]|\*(?!/))*\*/ | |
) | |
''', |
# Remove all ! characters | ||
code = code.replace('!', '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a different problem? !0 for true, etc?
def fix_kv(m): | ||
v = m.group(0) | ||
if v in ('true', 'false', 'null'): | ||
return v | ||
elif v.startswith('/*') or v.startswith('//') or v.startswith('!') or v == ',': | ||
elif v == ',': | ||
return "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def fix_kv(m): | |
v = m.group(0) | |
if v in ('true', 'false', 'null'): | |
return v | |
elif v.startswith('/*') or v.startswith('//') or v.startswith('!') or v == ',': | |
elif v == ',': | |
return "" | |
INVERTS = {'true': 'false, 'false': 'true', 'null': 'true'} | |
def js_not(x): | |
if isinstance(x, compat_basestring): | |
return INVERTS.get(x) or '' | |
try: | |
return not json.dumps(x) | |
except BaseException: | |
return '' | |
def fix_kv(m): | |
v = m.group(0) | |
i = 0 | |
while v[i] == '!': | |
i = i + 1 | |
v = v[i:] | |
if v in INVERTS: | |
return v if i % 2 == 0 else js_not(v) | |
elif v in ('undefined', 'void 0'): | |
return 'null' if i == 0 else 'true' if i % 2 == 1 else 'false' | |
elif v == ',': | |
return '' |
The tests pass with d9d07a9:In [1]: import json
In [2]: from youtube_dl.utils import js_to_json
In [3]: inp = '''{
...: foo: "value",
...: // bar: { nested:'x' },
...: bar: { nested:'x' },
...: chaff: "something"
...: }'''
In [4]: json.loads(js_to_json(inp)) == json.loads('''{
...: "foo": "value",
...:
...: "bar": { "nested":"x" },
...: "chaff": "something"
...: }''')
Out[4]: True
In [5]: inp = '''{
...: id: "player_prog",
...: googleCast: true,
...: //extraSettings: { googleCastReceiverAppId:'1A6F2224', skin:'s3', skinAccentCol
...: or: '0073FF'},
...: extraSettings: { googleCastReceiverAppId:'1A6F2224'},
...: mediaType: "video",
...: }'''
In [6]: json.loads(js_to_json(inp)) == json.loads('''{
...: "id": "player_prog",
...: "googleCast": true,
...: "extraSettings": { "googleCastReceiverAppId":"1A6F2224"},
...: "mediaType": "video"
...: }''')
Out[6]: True |
Please follow the guide below
x
into all the boxes [ ] relevant to your pull request (like that [x])Before submitting a pull request make sure you have:
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?
Description of your pull request and other information
js_to_json()
removes commas that appear right before a comment line, creating invalid JSON.This pull request fixes this by making the conversion a two-step process. It first removes all comments and then the actual JS to JSON fix-up happens. This also somewhat simplifies the conversion as comment handling is no longer needed in the second step.
Doing it in one step makes it hard to deal properly with both commas right before comments and trailing commas at the end of a dictionary or array. A two-step approach is easy to follow and works.
The regular expression used for comment removal is not thrown off by comments within string literals. It is based on this Stack Overflow answer: https://stackoverflow.com/a/25735600
Additional tests in test/test_utils.py based on issue #23785 are also provided.
Closes #23785.
This Pull Request is a rebase of the PR provided (and approved) at #23877. As such:
Closes #23877 .