Static checker for the spec? #10483

Open
jmdyck opened this issue Jul 15, 2024 · 5 comments

Comments

@jmdyck
Contributor

jmdyck commented Jul 15, 2024

What is the issue with the HTML Standard?

While I imagine that the build process detects certain kinds of errors in the source file, it clearly fails to detect a lot too.

I've been looking at the feasibility of developing a static checker for the HTML spec. It seems to me that chunks of the spec (in particular, the algorithms) are formulaic enough that they might be amenable to parsing, followed by static analysis. This could potentially detect:

  • typos
  • incorrect or missing markup
  • argument/parameter mismatches (e.g., missing args, extra args, out-of-order args)
  • variable errors (use-before-definition, defined with no use)
  • "type" mismatches

(The idea is roughly similar to what I've done for the ECMAScript spec over at my ecmaspeak-py repo.)
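
To make the "variable errors" item concrete, here is a minimal sketch in Python of what such a check might look like. It is not taken from ecmaspeak-py or from any existing WHATWG tooling. It assumes (purely for illustration) that an algorithm has already been split into step strings, that variables are marked up as <var>name</var>, and that the only defining form is "Let <var>name</var> be ...". The function name check_variables is hypothetical.

```python
import re

# Assumption: variables are marked up as <var>name</var>.
VAR_RE = re.compile(r"<var>([^<]+)</var>")
# Assumption: the only defining form is "Let <var>name</var> be ...".
LET_RE = re.compile(r"\b[Ll]et <var>([^<]+)</var> be\b")


def check_variables(params, steps):
    """Report variable problems for one algorithm.

    params: names that are defined on entry (the algorithm's parameters).
    steps:  the algorithm's steps, in order, as HTML strings.
    Returns a list of (step_number_or_None, name, message) tuples.
    """
    defined = set(params)
    used = set()
    problems = []
    for number, step in enumerate(steps, start=1):
        # Map the position of each defining occurrence to its name.
        definitions = {m.start(1): m.group(1) for m in LET_RE.finditer(step)}
        for m in VAR_RE.finditer(step):
            name = m.group(1)
            if m.start(1) in definitions:
                continue  # this occurrence is the definition itself
            if name in defined:
                used.add(name)
            else:
                problems.append((number, name, "used before definition"))
        # Definitions take effect only after the step's own uses are checked.
        defined.update(definitions.values())
    for name in sorted(defined - used):
        problems.append((None, name, "defined but never used"))
    return problems


example_steps = [
    "Let <var>url</var> be the result of parsing <var>input</var>.",
    "If <var>ulr</var> is failure, then return.",
    "Let <var>unused</var> be null.",
    "Return <var>url</var>.",
]
for problem in check_variables(["input"], example_steps):
    print(problem)
```

On the example, the misspelled "ulr" is reported as used before definition and "unused" as defined but never used. A real checker would need to recognize many more defining forms (loop variables, "given" parameters, and so on), which is where consistent phrasing in the spec would help.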

So, two questions:

  • Has this idea come up before? If so, I'd be interested to see the discussion. (I looked in issues and PRs (open and closed) for anything like this, but didn't find anything. I particularly combed through the labels "spec tooling" and "meta".)

  • Would the editors be amenable if I were to suggest tweaks to the spec to make this more feasible? I don't think it would be anything major; I would mostly just identify cases where the spec uses multiple different ways to express the same thing, and have the editors pick the approved way(s) to express it.

@domenic
Member

domenic commented Jul 16, 2024

Awesome to see you over here. Given your recent PRs, and how you've improved the JS spec, I was hoping you'd post something like this :)

There are some previous discussions on small aspects of this in https://github.com/whatwg/html-build/issues and https://github.com/whatwg/meta/issues . E.g. whatwg/meta#190 or whatwg/html-build#89 . But nothing too comprehensive.

We would definitely be amenable to making the spec more consistent and accepting PRs that do so.

If we were to make this a more official/required part of the CI process, I'd have a few small things to raise:

  • How does this integrate with our existing lint.sh? Can it replace or subsume it?
  • Right now Python is not a required dependency for html-build. Indeed, we've been trying to write new parts of the build tooling in Rust. Ideally any new tooling would be part of that Rust pipeline. But, having tooling is better than not having tooling, so if Python is your preference, go for it!
  • HTML is very special in various ways. But, some proposals are probably applicable across all WHATWG specs. If there were a way to factor those shared checks out and run them on all WHATWG specs, that would be a nice bonus. (Or at least all WHATWG specs where the editor wants to be consistent with HTML.)
  • Since you mentioned the algorithms: for non-HTML specs, where we use Bikeshed, we try to add <div algorithm>/</div> wrappers (which Bikeshed converts to <div class="algorithm">/</div>). This has two consequences: 1) Bikeshed checks every <var> is used at least once, to catch typos; 2) Bikeshed includes a script which lets you click on a variable to highlight other uses of it. It'd be awesome to do at least 2) for HTML, and maybe 1), but nobody has taken the time to wrap up the algorithms.

@annevk
Member

annevk commented Jul 16, 2024

This sounds great! The lint rules for non-HTML are in https://github.com/whatwg/whatwg.org/blob/main/resources.whatwg.org/build/deploy.sh FWIW.

@tabatkins
Contributor

tabatkins commented Jul 16, 2024

> This has two consequences: 1) Bikeshed checks every var is used at least once, to catch typos; 2) Bikeshed includes a script which lets you click on a variable to highlight other uses of it. It'd be awesome to do at least 2) for HTML, and maybe 1), but nobody has taken the time to wrap up the algorithms.

2 is done by an isolated JS script, too, which should be very easy to drop in: https://github.com/speced/bikeshed/blob/main/bikeshed/stylescript/var-click-highlighting.js (and https://github.com/speced/bikeshed/blob/main/bikeshed/stylescript/var-click-highlighting.css for the supporting CSS)

1 is done in Python, but the algo is fairly simple and should be an easy port to JS or Rust: https://github.com/speced/bikeshed/blob/main/bikeshed/unsortedJunk.py#L134
(all the h.* methods are over in the dom module)
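
For a rough idea of what a port of check 1 could involve, here is a minimal sketch in Python. It is not Bikeshed's code and does not use Bikeshed's dom helpers. It assumes algorithms are wrapped in <div algorithm> and treats any <var> that appears only once inside a wrapper as a likely typo. The names VarCounter and single_use_vars are hypothetical, and nested algorithm wrappers are not handled.

```python
from collections import Counter
from html.parser import HTMLParser


class VarCounter(HTMLParser):
    """Count <var> occurrences inside each <div algorithm> wrapper."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # <div> nesting depth inside the current algorithm; 0 = outside
        self.in_var = False
        self.text = []        # text pieces of the <var> being read
        self.current = None   # Counter for the algorithm being read
        self.algorithms = []  # one Counter per algorithm, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1
            elif any(name == "algorithm" for name, _ in attrs):
                self.div_depth = 1
                self.current = Counter()
                self.algorithms.append(self.current)
        elif tag == "var" and self.div_depth > 0:
            self.in_var = True
            self.text = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "var" and self.in_var:
            self.in_var = False
            # Normalize whitespace so line-wrapped names compare equal.
            name = " ".join("".join(self.text).split())
            self.current[name] += 1

    def handle_data(self, data):
        if self.in_var:
            self.text.append(data)


def single_use_vars(html):
    """Return, per algorithm, the sorted names of variables seen exactly once."""
    parser = VarCounter()
    parser.feed(html)
    parser.close()
    return [sorted(n for n, count in c.items() if count == 1)
            for c in parser.algorithms]


sample = """
<div algorithm>
<p>To frob a <var>widget</var>:</p>
<ol>
<li>Let <var>count</var> be <var>widget</var>'s size.</li>
<li>Return <var>cuont</var>.</li>
</ol>
</div>
<p>Outside any algorithm: <var>stray</var></p>
"""
print(single_use_vars(sample))
```

In the sample, both "count" and the misspelled "cuont" appear once, so both are flagged; "stray" is ignored because it sits outside any wrapper. That dependence on the wrappers is why the algorithms need to be wrapped up first.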

@jmdyck
Contributor Author

jmdyck commented Jul 17, 2024

> If we were to make this a more official/required part of the CI process,

I don't know. I'm expecting (based on my experience with the ECMAScript spec) that this tool will produce a lot of 'false positives', which presumably is not something you want in a CI process.

>   • How does this integrate with our existing lint.sh? Can it replace or subsume it?

Possibly.

At first I thought no, because lint.sh greps through the full text of the spec, whereas I'm thinking that my checker would focus only on those chunks where it has a hope of 'understanding' what the text is saying, and skip everything else. But I suppose instead of skipping those other chunks entirely, it could at least do a grep for certain patterns like lint.sh does.
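
As a rough illustration of that fallback, here is a minimal Python sketch of a grep-style pass over a chunk of text that the checker does not otherwise understand. The two patterns are hypothetical examples chosen for illustration; they are not copied from lint.sh, and the function name grep_lint is an assumption.

```python
import re

# Hypothetical patterns for illustration only; not the rules in lint.sh.
PATTERNS = [
    (re.compile(r"[ \t]+$"), "trailing whitespace"),
    (re.compile(r"\b(\w+) \1\b", re.IGNORECASE), "repeated word"),
]


def grep_lint(text):
    """Return (line_number, message) for every pattern hit in text."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for pattern, message in PATTERNS:
            if pattern.search(line):
                findings.append((line_number, message))
    return findings


chunk = "Let x be the the result.\nReturn x. \nDone."
for finding in grep_lint(chunk):
    print(finding)
```

A checker along these lines could run the structured analysis on chunks it can parse and this kind of line-by-line pass on everything else, so that no part of the source goes entirely unchecked.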

>   • Right now Python is not a required dependency for html-build. Indeed, we've been trying to write new parts of the build tooling in Rust. Ideally any new tooling would be part of that Rust pipeline. But, having tooling is better than not having tooling, so if Python is your preference, go for it!

I do like Python, but over the course of the ecmaspeak project, I've become somewhat frustrated with it, and the reasons would probably apply to this project too.

I'm interested in Rust, but the learning curve looks steep, and I'm doubtful that this is a good first Rust project.

I've used Crystal for some preliminary work, but I imagine you'd be even less keen on adding that to the build pipeline.

>   • [...] If there were a way to factor those shared checks out and run them on all WHATWG specs, that would be a nice bonus. (Or at least all WHATWG specs where the editor wants to be consistent with HTML.)

I agree, but I think it'll be hard enough just to target the HTML spec. Applicability to other specs would be farther down the road.

>   • for non-HTML specs, [...], we try to add <div algorithm>/</div> wrappers [...], but nobody has taken the time to wrap up the algorithms [in the HTML spec].

I could certainly start by wrapping up the algorithms in the HTML spec, as that would make them easier to find for all my subsequent work. But it raises a bunch of questions, which I think I'll pose in a separate issue.

@jmdyck
Contributor Author

jmdyck commented Jul 23, 2024

> I could certainly start by wrapping up the algorithms in the HTML spec, as that would make them easier to find for all my subsequent work. But it raises a bunch of questions, which I think I'll pose in a separate issue.

In order to figure out all the questions, I started to do the work, and just kept going. So it'll probably be a draft PR rather than an issue. But something else has come up, so it'll probably be a few weeks before I get back to it.


4 participants