WebAssembly amphtml-validator is much slower on GCE e2-micro instance #37585

Closed
derat opened this issue Feb 5, 2022 · 2 comments

derat commented Feb 5, 2022

Description

I run the command-line amphtml-validator program on a Compute Engine e2-micro instance. I just started receiving the error added by #34213 for #36110:

The native JavaScript AMPHTML Validator (validator.js) has been turned down. If you are seeing this error, update your tooling to instead load the API compatible WebAssembly AMPHTML Validator (validator_wasm.js) instead.

After upgrading to 1.0.35, the validator is much slower, to the point where it's no longer suitable for my workflow. Validating https://raw.githubusercontent.com/ampproject/amphtml/master/validator/testdata/feature_tests/minimum_valid_amp.html takes more than 5 seconds of wall time:

% time /home/node/bin/amphtml-validator --format=json - <minimum_valid_amp.html
{"-":{"status":"PASS","errors":[]}}
/home/node/bin/amphtml-validator --format=json - < minimum_valid_amp.html  7.02s user 0.34s system 140% cpu 5.251 total
% time /home/node/bin/amphtml-validator --format=json - <minimum_valid_amp.html
{"-":{"status":"PASS","errors":[]}}
/home/node/bin/amphtml-validator --format=json - < minimum_valid_amp.html  7.54s user 0.31s system 139% cpu 5.612 total

I'm not sure how to run the old non-WebAssembly version for comparison (since it returns errors now), but I believe it was typically able to validate much more complicated pages in around a second on the same instance.

Now that amphtml-validator is wrapping a validator that's apparently written in C++, would a natively-compiled version of amphtml-validator be a possibility for the future, or are there too many additional dependencies on Node.js?

Alternately, are there any suggestions for getting the new validator to run faster on slow VM instances? I'd prefer to stick with the Debian-maintained version of Node.js, but I could switch to e.g. Node.js 16 if there's reason to believe that Wasm performance will be significantly better there.

Reproduction Steps

  1. Create an e2-micro GCE instance using Debian 11.2.
  2. Install the nodejs 12.22.5~dfsg-2~11u1 and npm 7.5.2+ds-2 Debian packages.
  3. Run npm install amphtml-validator.
  4. Pass a trivial file to the amphtml-validator program via stdin.

Relevant Logs

No response

Browser(s) Affected

No response

OS(s) Affected

No response

Device(s) Affected

No response

AMP Version Affected

No response

derat added the Type: Bug label Feb 5, 2022

antiphoton (Member) commented

Invoking the amphtml-validator CLI once per HTML file is inefficient, because every invocation has to load the WebAssembly module, and that load is the time-consuming step. Either option below loads the module once for many files.

  • Option 1
    The amphtml-validator command-line interface accepts multiple input HTML files in a single invocation:

    amphtml-validator --format=json minimum_valid_amp.html amp_geo.html amp_gist.html
  • Option 2
    amphtml-validator exports a getInstance function, so you can load the validator once and reuse the instance across multiple files:

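    const fs = require('fs');
    const {getInstance} = require('amphtml-validator');

    // Top-level await is not available in CommonJS, so wrap the calls in an async function.
    (async () => {
      // Load the WebAssembly validator once and reuse it for every file.
      const validator = await getInstance();
      console.log(validator.validateString(fs.readFileSync('minimum_valid_amp.html', 'utf8')));
      console.log(validator.validateString(fs.readFileSync('amp_geo.html', 'utf8')));
      console.log(validator.validateString(fs.readFileSync('amp_gist.html', 'utf8')));
    })();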
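The instance-reuse idea in Option 2 extends to an arbitrary list of files. The following is a minimal sketch, not part of the package: `validateAll` and its `readFile` parameter are hypothetical names introduced here for illustration. It assumes only that the validator object exposes `validateString` and that each result carries `status` and `errors`, as in the JSON output shown in the issue description.

```javascript
const fs = require('fs');

// Validate every path with a single, already-loaded validator instance.
// `validator` is any object with a validateString(html) method, for example
// the instance that getInstance() resolves to.
// `readFile` is injectable (hypothetical parameter) so the helper can be
// exercised without touching the file system.
function validateAll(validator, paths, readFile = (p) => fs.readFileSync(p, 'utf8')) {
  const results = {};
  for (const path of paths) {
    const result = validator.validateString(readFile(path));
    // Keep only the fields that also appear in the CLI's JSON output.
    results[path] = {status: result.status, errors: result.errors};
  }
  return results;
}

// Possible usage, assuming the amphtml-validator package is installed:
//   const {getInstance} = require('amphtml-validator');
//   getInstance().then((validator) => {
//     console.log(JSON.stringify(validateAll(validator, process.argv.slice(2))));
//   });
```

Passing the instance in as an argument keeps the expensive WebAssembly load outside the loop, so the startup cost is paid once regardless of how many files are checked.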

derat (Author) commented Feb 7, 2022

Thanks for the reply! I wasn't aware that multiple files could be passed to amphtml-validator, but I can confirm that on the GCE instance I was using before, it takes essentially the same amount of wall time (~5.7s) to validate 27 "real" files in a single run as it does to validate minimum_valid_amp.html. I'll find a way to process everything in a single call.
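For anyone batching files into a single call as described above, the combined JSON output then needs to be split back into per-file results. The sketch below is one way to do that. `summarize` is a hypothetical helper name, and it assumes that a multi-file run with `--format=json` prints one JSON object keyed by input name, in the same shape as the single-file `{"-":{"status":"PASS","errors":[]}}` output shown in the issue description.

```javascript
// Split the JSON printed by a batched amphtml-validator run into passing and
// failing input names. Assumes (not verified here) that the output is a single
// object of the form {"<input name>": {"status": "...", "errors": [...]}}.
function summarize(jsonOutput) {
  const report = JSON.parse(jsonOutput);
  const passed = [];
  const failed = [];
  for (const [name, result] of Object.entries(report)) {
    (result.status === 'PASS' ? passed : failed).push(name);
  }
  return {passed, failed};
}

// Possible usage with Node's child_process module:
//   const {execFileSync} = require('child_process');
//   const out = execFileSync('amphtml-validator',
//       ['--format=json', 'a.html', 'b.html'], {encoding: 'utf8'});
//   console.log(summarize(out));
// If the CLI exits with a non-zero status when a file fails, execFileSync
// throws; handling that case is left out of this sketch.
```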
