For a very small WARC like https://github.com/openzim/warc2zim/blob/main/tests/data-special/qsl.net-encoding-alias.warc.gz, it takes more than 2 minutes to build the ZIM.
A flamegraph shows that most of the time is spent in `rewrite_html` (expected, since the HTML page in this WARC is huge), but within it most of the time is spent in the `inspect.signature` function.
This signature information should be cached, since it will not change during a warc2zim execution.
A quick change (tbc in a PR) confirms that caching this information brings timings back to a reasonable level (less than 20 seconds, with much of that time spent parsing the HTML, which is expected since the HTML is huge).
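As a rough illustration of the idea (not the actual warc2zim change), the sketch below memoizes `inspect.signature` per callable with `functools.cache` (Python 3.9+). The names `cached_signature`, `call_with_supported_kwargs`, and `rewrite_link` are hypothetical and exist only for this example.

```python
import functools
import inspect
from collections.abc import Callable
from typing import Any


@functools.cache
def cached_signature(func: Callable[..., Any]) -> inspect.Signature:
    """Return inspect.signature(func), computing it only once per callable."""
    return inspect.signature(func)


def call_with_supported_kwargs(func: Callable[..., Any], **available: Any) -> Any:
    """Call func with only the keyword arguments its signature declares."""
    params = cached_signature(func).parameters
    supported = {name: value for name, value in available.items() if name in params}
    return func(**supported)


# Hypothetical rewriting rule, standing in for whatever callables get inspected.
def rewrite_link(tag: str, attr_value: str) -> str:
    return f"{tag}:{attr_value}"


# The signature is introspected on the first call only; later calls hit the cache.
for _ in range(3):
    call_with_supported_kwargs(rewrite_link, tag="a", attr_value="x", base_href="/")
```

This assumes the inspected callables are hashable and that their signatures stay fixed for the lifetime of the process, which matches the observation above that the information does not change during an execution.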