-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JSInterpreter] Improve performance #30643
Conversation
Pull out some constant regex expressions and change a loop over assignment operators into a regex alternative match
8e45ccf
to
200ce6c
Compare
What about (optionally, if installed on the system) an external interpreter. Thus, youtube-dl could be simple and do only the basics, but if intensive users want more features or more performance, then then could ensure that suitable tools are available to youtube-dl. |
At the moment PhantomJS, which was used by some extractors and is a potential dependency of of yt-dlp, similar to what you suggest, is unmaintained ("discontinued" is what Wikipedia says); owing to undisciplined feature creep a JS interpreter that worked last year will fail next year (for year in (2015, 2018, 2020)). I had a look for other Python JS implementations, but didn't come up with anything that was (a) supported (b) compatible with Unlicense (c) compatible with yt-dl platforms; in fact I don't think any of those were satisfied. So I think it's best is to stick with the "toy" interpreter and extend it when required. |
Good points. I suppose a dependency is only a good idea when the benefit it brings exceeds the cost of using it. Pity that something so obvious is lacking in the libre world. Reminds me of If only the interpreters of browsers were available as standalone modules. At least with rolling-your-own, you have control without scope-creep. Kudos for improving the ‘toy’.
That certainly was the case, for a while. The source repo seems to have been revived (no longer archived), with relatively recent (mid-2020) commits. As usual, Wikipedia is lacking in factual accuracy. |
Pukkandan did the hard part. |
Please follow the guide below
Before submitting a pull request make sure you have:
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?
Description of your pull request and other information
The limited JS interpreter used to decode YT signature URLs was enhanced in PR #30188, back-porting yt-dlp PR 1437, and merged from PR #30184 in order to respond to the n parameter challenge introduced by YT.
In #30641, a specific embedded application of yt-dl identified significantly reduced extraction performance compared with previous versions and even the stopgap n-parameter response.
A test application was profiled. After pulling out some constant regex expressions and changing a loop over assignment operators into a regex alternative match, the execution time (Py 2.7) went from 22s to 18s. The number of function calls was down by almost 1M, nearly 25% (application extracting 42 YT pages).
The regex optimisations apply to expressions that used f-strings in the yt-dlp version, which are said to be faster than formatting operations available to yt-dl; however, the loop optimisation, and one already made in the back-port, could equally apply to the yt-dlp version.
Currently constants used in the
JSInterpreter
class include both global and class vars:_OPERATORS
,_ASSIGN_OPERATORS
,_NAME_RE
,_MATCHING_PARENS
_EXPR_SPLIT_RE
,_VARNAME_RE
,_ARRAY_REF_RE
,_FN_CALL_RE
,_MEMBER_REF_RE
,_FN_NAME_RE
,_FN_DEF_RE
,_ASSIGN_EXPR_RE
.As to whether the constant regex expressions used for parsing should be class or global vars, probably the following heuristics should apply: