[JSInterpreter] Improve performance #30643

dirkf · 2022-02-17T00:25:00Z

Please follow the guide below

Before submitting a pull request make sure you have:

Searched the bugtracker for similar pull requests
Read adding new extractor tutorial
Read youtube-dl coding conventions and adjusted the code to meet them
Covered the code with tests (note that PRs without tests will be REJECTED)
Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Bug fix
Improvement
New extractor
New feature

Description of your pull request and other information

The limited JS interpreter used to decode YT signature URLs was enhanced in PR #30188, back-porting yt-dlp PR 1437, and merged from PR #30184 in order to respond to the n parameter challenge introduced by YT.

In #30641, a specific embedded application of yt-dl identified significantly reduced extraction performance compared with previous versions and even the stopgap n-parameter response.

A test application was profiled. After pulling out some constant regex expressions and changing a loop over assignment operators into a regex alternative match, the execution time (Py 2.7) went from 22s to 18s. The number of function calls was down by almost 1M, nearly 25% (application extracting 42 YT pages).

The regex optimisations apply to expressions that used f-strings in the yt-dlp version, which are said to be faster than formatting operations available to yt-dl; however, the loop optimisation, and one already made in the back-port, could equally apply to the yt-dlp version.

Currently constants used in the JSInterpreter class include both global and class vars:

globals: _OPERATORS, _ASSIGN_OPERATORS, _NAME_RE, _MATCHING_PARENS
class vars: _EXPR_SPLIT_RE, _VARNAME_RE, _ARRAY_REF_RE, _FN_CALL_RE, _MEMBER_REF_RE, _FN_NAME_RE, _FN_DEF_RE, _ASSIGN_EXPR_RE.

As to whether the constant regex expressions used for parsing should be class or global vars, probably the following heuristics should apply:

constants that are characteristics of the JS language should be _vars of the module or the class
constants that are specific to the internal workings of the JS interpreter should be __vars of the class.

Pull out some constant regex expressions and change a loop over assignment operators into a regex alternative match

Py 2.6 FTW

Lee-Carre · 2022-02-20T19:39:09Z

What about (optionally, if installed on the system) an external interpreter. Thus, youtube-dl could be simple and do only the basics, but if intensive users want more features or more performance, then then could ensure that suitable tools are available to youtube-dl.

dirkf · 2022-02-20T22:58:04Z

At the moment PhantomJS, which was used by some extractors and is a potential dependency of of yt-dlp, similar to what you suggest, is unmaintained ("discontinued" is what Wikipedia says); owing to undisciplined feature creep a JS interpreter that worked last year will fail next year (for year in (2015, 2018, 2020)).

I had a look for other Python JS implementations, but didn't come up with anything that was (a) supported (b) compatible with Unlicense (c) compatible with yt-dl platforms; in fact I don't think any of those were satisfied.

So I think it's best is to stick with the "toy" interpreter and extend it when required.

Lee-Carre · 2022-02-21T16:03:26Z

I think it's best is to stick with the "toy" interpreter and extend it when required.

Good points. I suppose a dependency is only a good idea when the benefit it brings exceeds the cost of using it.

Pity that something so obvious is lacking in the libre world. Reminds me of wget; highly depended upon, but widely neglected (though, maintained enough to be useful).

If only the interpreters of browsers were available as standalone modules.

At least with rolling-your-own, you have control without scope-creep.

Kudos for improving the ‘toy’.

PhantomJS, which was used by some extractors and is a potential dependency of of yt-dlp, similar to what you suggest, is unmaintained ("discontinued" is what Wikipedia says)

That certainly was the case, for a while. The source repo seems to have been revived (no longer archived), with relatively recent (mid-2020) commits.

As usual, Wikipedia is lacking in factual accuracy.

dirkf · 2022-02-21T19:49:05Z

Kudos for improving the ‘toy’.

Pukkandan did the hard part.

[JSInterp] Improve performance

200ce6c

Pull out some constant regex expressions and change a loop over assignment operators into a regex alternative match

dirkf force-pushed the df-jsinterp-speedup-patch branch from 8e45ccf to 200ce6c Compare February 17, 2022 00:29

dirkf marked this pull request as ready for review February 17, 2022 00:29

dirkf marked this pull request as draft February 17, 2022 00:30

dirkf mentioned this pull request Feb 17, 2022

Multi-threading slowdown for YouTube #30641

Open

5 tasks

dirkf added 2 commits February 19, 2022 01:30

[JSInterp] Use Counter instead of dict comprehension

2241472

Py 2.6 FTW

[JSInterp] Introduce JSError with some exception tweaks

d42e900

dirkf mentioned this pull request Feb 21, 2022

Download webpage after executing JavaScript scripts #30661

Closed

3 tasks

dirkf force-pushed the master branch from 01bf89e to 4c6fba3 Compare August 26, 2022 07:51

dirkf closed this Jun 30, 2024

dirkf deleted the df-jsinterp-speedup-patch branch June 30, 2024 17:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[JSInterpreter] Improve performance #30643

[JSInterpreter] Improve performance #30643

dirkf commented Feb 17, 2022

Lee-Carre commented Feb 20, 2022

dirkf commented Feb 20, 2022

Lee-Carre commented Feb 21, 2022

dirkf commented Feb 21, 2022

[JSInterpreter] Improve performance #30643

[JSInterpreter] Improve performance #30643

Conversation

dirkf commented Feb 17, 2022

Please follow the guide below

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

What is the purpose of your pull request?

Description of your pull request and other information

Lee-Carre commented Feb 20, 2022

dirkf commented Feb 20, 2022

Lee-Carre commented Feb 21, 2022

dirkf commented Feb 21, 2022