Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JSInterpreter] Improve performance #30643

Closed
wants to merge 3 commits into from

Conversation

dirkf
Copy link
Contributor

@dirkf dirkf commented Feb 17, 2022

Please follow the guide below


Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

The limited JS interpreter used to decode YT signature URLs was enhanced in PR #30188, back-porting yt-dlp PR 1437, and merged from PR #30184 in order to respond to the n parameter challenge introduced by YT.

In #30641, a specific embedded application of yt-dl identified significantly reduced extraction performance compared with previous versions and even the stopgap n-parameter response.

A test application was profiled. After pulling out some constant regex expressions and changing a loop over assignment operators into a regex alternative match, the execution time (Py 2.7) went from 22s to 18s. The number of function calls was down by almost 1M, nearly 25% (application extracting 42 YT pages).

The regex optimisations apply to expressions that used f-strings in the yt-dlp version, which are said to be faster than formatting operations available to yt-dl; however, the loop optimisation, and one already made in the back-port, could equally apply to the yt-dlp version.

Currently constants used in the JSInterpreter class include both global and class vars:

  • globals: _OPERATORS, _ASSIGN_OPERATORS, _NAME_RE, _MATCHING_PARENS
  • class vars: _EXPR_SPLIT_RE, _VARNAME_RE, _ARRAY_REF_RE, _FN_CALL_RE, _MEMBER_REF_RE, _FN_NAME_RE, _FN_DEF_RE, _ASSIGN_EXPR_RE.

As to whether the constant regex expressions used for parsing should be class or global vars, probably the following heuristics should apply:

  • constants that are characteristics of the JS language should be _vars of the module or the class
  • constants that are specific to the internal workings of the JS interpreter should be __vars of the class.

Pull out some constant regex expressions and change a loop over assignment operators into a regex alternative match
@dirkf dirkf marked this pull request as ready for review February 17, 2022 00:29
@dirkf dirkf marked this pull request as draft February 17, 2022 00:30
@Lee-Carre
Copy link

What about (optionally, if installed on the system) an external interpreter. Thus, youtube-dl could be simple and do only the basics, but if intensive users want more features or more performance, then then could ensure that suitable tools are available to youtube-dl.

@dirkf
Copy link
Contributor Author

dirkf commented Feb 20, 2022

At the moment PhantomJS, which was used by some extractors and is a potential dependency of of yt-dlp, similar to what you suggest, is unmaintained ("discontinued" is what Wikipedia says); owing to undisciplined feature creep a JS interpreter that worked last year will fail next year (for year in (2015, 2018, 2020)).

I had a look for other Python JS implementations, but didn't come up with anything that was (a) supported (b) compatible with Unlicense (c) compatible with yt-dl platforms; in fact I don't think any of those were satisfied.

So I think it's best is to stick with the "toy" interpreter and extend it when required.

@Lee-Carre
Copy link

I think it's best is to stick with the "toy" interpreter and extend it when required.

Good points. I suppose a dependency is only a good idea when the benefit it brings exceeds the cost of using it.

Pity that something so obvious is lacking in the libre world. Reminds me of wget; highly depended upon, but widely neglected (though, maintained enough to be useful).

If only the interpreters of browsers were available as standalone modules.

At least with rolling-your-own, you have control without scope-creep.

Kudos for improving the ‘toy’.

PhantomJS, which was used by some extractors and is a potential dependency of of yt-dlp, similar to what you suggest, is unmaintained ("discontinued" is what Wikipedia says)

That certainly was the case, for a while. The source repo seems to have been revived (no longer archived), with relatively recent (mid-2020) commits.

As usual, Wikipedia is lacking in factual accuracy.

@dirkf
Copy link
Contributor Author

dirkf commented Feb 21, 2022

Kudos for improving the ‘toy’.

Pukkandan did the hard part.

@dirkf dirkf closed this Jun 30, 2024
@dirkf dirkf deleted the df-jsinterp-speedup-patch branch June 30, 2024 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants