Add new tidy check to ensure that rustdoc DOM IDs are all declared as expected #86178

GuillaumeGomez · 2021-06-09T20:37:24Z

We need to declare the IDs in a map in rustdoc to prevent the possibility to have conflicts when generating DOM. Until now, it was checked manually, which was very "unsafe", as the second commit of this PR proves it (second commit was moved into #86819 and merged).

cc @jsha
r? @Mark-Simulacrum

src/tools/tidy/src/rustdoc_html_ids.rs

Mark-Simulacrum

I would strongly prefer to avoid parsing free-form Rust code; rather something like the feature gate declarations which are in an ad-hoc but simple(r) DSL seem preferable, along with the requisite // START-DOM-ID-DECLS or so section, rather than looking for the function name.

I'm wary that the test as-is seems to reach into rustdoc's HTML generation and broadly speaking feels pretty invasive in terms of trying to match things, but at the same time feels pretty unlikely to be "good enough" to actually close out this problem. Do we have cases where not having this tidy check led to problems (e.g., issues, discussions, etc.)? Maybe some broader discussion amongst the rustdoc team that indicates that both a) this is a real problem and b) this solution is going to solve it sufficiently and not cause more pain than necessary?

src/tools/tidy/src/rustdoc_html_ids.rs

Mark-Simulacrum · 2021-06-20T22:21:06Z

The second commit of this PR adds a few declarations in the id map, but it's not clear to me what value that has precisely -- it doesn't seem like the generated HTML today had any problems, so that fix is not fully clear to me as to its purpose (especially when viewed in comparison to the declaration of the id "help" as ignored, rather than adding it to the map)...

GuillaumeGomez · 2021-06-20T23:04:21Z

It's prevention. It doesn't have issues in the std/core/test docs because we check them, but it very likely has issues in some other crates. The goal of this ID map is to replace duplicated IDs because you can't have the same ID used twice on the same page (it's invalid). My other comment above explain a bit more.

I'll update to fix the comment.

Mark-Simulacrum · 2021-06-20T23:18:47Z

I understand that in theory this prevents bugs, but if we've never encountered a bug due to this I'm a little hesitant to start parsing arbitrary Rust code in tidy via regexes to try to catch it.

cc @rust-lang/rustdoc

jyn514 · 2021-06-20T23:44:14Z

I am not super familiar with this part of the code, but I think I lean towards Mark's perspective, if we've never had bugs related to this it seems silly to spend lots of effort on it.

Manishearth · 2021-06-21T02:46:15Z

I also lean towards Mark's perspective. Parsing rust code this way seems rather invasive and I don't think that's how we should achieve this, and I'm unsure if this check is necessary in the first place.

GuillaumeGomez · 2021-06-23T14:07:30Z

I understand that in theory this prevents bugs, but if we've never encountered a bug due to this I'm a little hesitant to start parsing arbitrary Rust code in tidy via regexes to try to catch it.

We did, quite a few times. It's just that I sent PRs to add the ID in the ID map and that stopped there. But it happened. I don't have enough time to start writing checks if I don't think they're useful. ;)

However, maybe the approach here isn't the right one. If you have another idea on how to enforce it, I'd be very happy to hear it! (Parsing rust source code with regex is really not great...)

GuillaumeGomez · 2021-06-23T14:27:43Z

Updated/rebased: a new ID was created and wasn't added to the ID map (deref-methods). I can provide more explanations why having the same ID present more than once in the same page is a problem:

It's an anchor issue. For a long time, we detected them in the std when the html check we have detected that a page has the same ID more than once. We still do, but it actually doesn't cover everything, far from it. From time to time, someone tells me they found a conflict in IDs. In such case, I simply add the ID to the map and then it's fixed when the version is available to the user.

However, it happened quite a lot of time. At some point we even had two ID maps which were kinda working together but it was a huge mess. I simplified things a bit recently. Then @jsha started reworking the front-end a lot and I realized that IDs have been modified/created (and forgotten to be added to this ID map, creating new potential issues). So to prevent this problem to go unnoticed until someone comes to me directly, I wrote this check.

Now I completely agree that parsing Rust (and now HTML too because of the template) with regex is very suboptimal to say the least, but I didn't find a better idea... We need this test, but like I said, if you have a better idea on how to check the IDs, I'd be very happy to hear about it!

jyn514 · 2021-07-03T19:58:45Z

However, maybe the approach here isn't the right one. If you have another idea on how to enforce it, I'd be very happy to hear it! (Parsing rust source code with regex is really not great...)

https://github.com/rust-lang/rust/pull/86178/files#r649044525 seems like a good idea to me, we could use JSON or something something so it's easier to parse. Parsing a 15 line file once per run doesn't seem too expensive.

GuillaumeGomez · 2021-07-03T22:04:00Z

But this is half the problem like I said: we still need to check in the rustdoc source code that IDs are actually used, and that IDs in the rustdoc source code are declared inside the map. Making their declaration better is only a small part of the problem.

jyn514 · 2021-07-03T22:12:26Z

@GuillaumeGomez I don't know how to test all the IDs are used, but you can check they're in the map by forcing all the IDs to come from the map, something like this:

// pretend this has once_cell or something
static HTML_IDS: &[&str] = parse_json(include_str!("src/librustdoc/html_ids.json"));
write_str!(buffer, HTML_IDS[html_id::TRAIT_IMPLEMENTATIONS_LIST]);

GuillaumeGomez · 2021-07-03T22:30:11Z

But unfortunately that doesn't work inside the tera template files. Also, it doesn't prevent someone to use id=... (we can always try to catch them in reviews, but it's not the best fail-safe...).

jyn514 · 2021-07-05T02:15:31Z

Does your PR currently work for tera files?

Also, it doesn't prevent someone to use id=... (we can always try to catch them in reviews, but it's not the best fail-safe...).

Sure, but it's a lot easier to catch "you used an id without going through the html_ids module" than "the hard-coded ID you used needs to be updated in html::markdown" IMO. In particular my idea will catch refactors from Rust to JS and the other way around automatically without needing review.

GuillaumeGomez · 2021-07-05T08:15:27Z

Does your PR currently work for tera files?

Yes. I simply added the .html files to the extensions. Since it's a regex, it doesn't care much...

Sure, but it's a lot easier to catch "you used an id without going through the html_ids module" than "the hard-coded ID you used needs to be updated in html::markdown" IMO. In particular my idea will catch refactors from Rust to JS and the other way around automatically without needing review.

But that would still require this regex to check it. ;)

Also, we currently don't check in js because there is simply no way to do so...

… r=jyn514 Clean up rustdoc IDs I cherry-picked the commit from rust-lang#86178. It adds missing rustdoc IDs (for the HTML) and remove unused ones. cc `@camelid` r? `@jyn514`

bors · 2021-07-07T06:27:59Z

☔ The latest upstream changes (presumably #86920) made this pull request unmergeable. Please resolve the merge conflicts.

Mark-Simulacrum

I am still a little uncertain that this is worthwhile to pursue, but if we do, I would like some comments a the top of the tidy file added describing the failure modes and how we're handling them. For example: new way of generating IDs is added to rustdoc. How do we catch that? (Or, if we don't, that is useful to document too).

I think some discussion of the goals would also be useful. At least, embedding

We need to declare the IDs in a map in rustdoc to prevent the possibility to have conflicts when generating DOM. Until now, it was checked manually, which was very "unsafe", as the second commit of this PR proves it.

into the tidy check top-level doc comment would already be useful, but expanding on what is unsafe about it (i.e., what we're trying to prevent) and how this mitigates that would be good, I think. Part of the problem is that the rustdoc ID map is not documented anywhere (that I can find quickly) which means that there's no basis for understanding its purpose without doing some digging around and having background knowledge of HTML's ID "limitations".

src/tools/tidy/src/rustdoc_html_ids.rs

Mark-Simulacrum · 2021-08-01T14:40:42Z

src/tools/tidy/src/rustdoc_html_ids.rs

+                check_id(path, trimmed.split('"').skip(1).next().unwrap(), ids, line_nb, bad);
+                is_checking_small_section_header = None;
+            }
+        } else if trimmed.starts_with("write_small_section_header(") {


Similarly, it is currently too easy for this function to be renamed or argument order changed -- there's no comments that I can see indicating it's special. I think we should at least have comments to the definition. But still, parsing arbitrary Rust code seems worrisome (easy to get wrong).

@GuillaumeGomez Could you add a doc-comment to write_small_section_header() saying that calls to it are parsed by tidy?

Also, this check seems somewhat fragile since it will only check calls that don't have any non-whitespace characters before them. Could you instead use something like this?

Suggested change

} else if trimmed.starts_with("write_small_section_header(") {

} else if trimmed.contains("write_small_section_header(") && !trimmed.contains("fn write_small_section_header(") {

Adding the doc comment and updating the check.

GuillaumeGomez · 2021-09-15T12:35:12Z

Until now, it was checked manually, which was very "unsafe", as the second commit of this PR proves it.

This part of the PR description seems to be out of date...? I don't see any changes to the contents of the ID map.

Good point, I updated the PR description.

camelid · 2021-09-15T17:16:19Z

We need to declare the IDs in a map in rustdoc to prevent the possibility to have conflicts when generating DOM.

Can you explain this part to me? In what situations can we have ID conflicts? IIUC, all the IDs in init_id_map are used in static HTML generation, so they should only appear once. Why do they need to be tracked in the IdMap?

I would expect that only "dynamic" IDs (e.g., #tymethod.foo on a struct page since there could be multiple trait impls with the function foo) would need to be in the IdMap, and they can't be in init_id_map because we don't know them ahead-of-time.

GuillaumeGomez · 2021-09-15T17:28:00Z

It's when we generate HTML from markdown (for titles). In order to avoid IDs to be duplicated, we list the ones used by rustdoc (the "static" ones, not the "dynamic" ones). Basically, the ones that don't go through our ID map if we don't put them ourselves.

camelid · 2021-09-15T17:38:28Z

It's when we generate HTML from markdown (for titles). In order to avoid IDs to be duplicated, we list the ones used by rustdoc (the "static" ones, not the "dynamic" ones). Basically, the ones that don't go through our ID map if we don't put them ourselves.

Ah, that makes sense, I forgot about headings 👍

camelid · 2021-09-15T17:50:13Z

I just realized that #85833 (comment) is a real-world example of the fragility of the write_small_section_header( check. I'm not sure what to do about this PR; it seems like we should have some kind of tooling to prevent ID conflicts, but this approach doesn't seem robust enough.

GuillaumeGomez · 2021-09-15T17:54:26Z

Yep... I think it's better than nothing while we look for a better idea.

GuillaumeGomez · 2021-09-29T09:11:59Z

ping

camelid · 2021-10-30T00:35:35Z

I think this check might not work with some of the IDs that were landed in the scrape-examples PR. Closing and re-opening (to re-run CI) to see for sure.

GuillaumeGomez · 2021-11-08T15:13:23Z

@camelid CI passed but just in case I rebased on master.

… expected

bors · 2021-12-05T21:49:23Z

☔ The latest upstream changes (presumably #91356) made this pull request unmergeable. Please resolve the merge conflicts.

camelid · 2021-12-10T21:24:44Z

I'm going to close this PR for the reasons I listed on Zulip. Sorry for taking so long to follow up on this.

GuillaumeGomez added T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Jun 9, 2021

rust-highfive assigned Mark-Simulacrum Jun 9, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 9, 2021

the8472 reviewed Jun 10, 2021

View reviewed changes

src/tools/tidy/src/rustdoc_html_ids.rs Show resolved Hide resolved

Mark-Simulacrum reviewed Jun 20, 2021

View reviewed changes

src/tools/tidy/src/rustdoc_html_ids.rs Outdated Show resolved Hide resolved

src/tools/tidy/src/rustdoc_html_ids.rs Outdated Show resolved Hide resolved

Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 20, 2021

GuillaumeGomez force-pushed the rustdoc-id-tidy-check branch from 07e0162 to 6584dbf Compare June 23, 2021 14:22

GuillaumeGomez force-pushed the rustdoc-id-tidy-check branch from 6584dbf to 708711b Compare June 23, 2021 14:30

GuillaumeGomez mentioned this pull request Jul 2, 2021

Clean up rustdoc IDs #86819

Merged

GuillaumeGomez force-pushed the rustdoc-id-tidy-check branch from 708711b to 03ed00e Compare July 7, 2021 09:29

GuillaumeGomez removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jul 12, 2021

crlf0710 added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 31, 2021

Mark-Simulacrum reviewed Aug 1, 2021

View reviewed changes

camelid added the T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) label Sep 13, 2021

GuillaumeGomez force-pushed the rustdoc-id-tidy-check branch from 46d9dcd to 0cf8877 Compare September 15, 2021 12:31

camelid added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 15, 2021

GuillaumeGomez removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Sep 29, 2021

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 19, 2021

camelid closed this Oct 30, 2021

camelid reopened this Oct 30, 2021

Move HTML ID list creation into a macro

1c4e49a

GuillaumeGomez force-pushed the rustdoc-id-tidy-check branch from 0cf8877 to 9375fde Compare November 8, 2021 15:13

This comment has been minimized.

Sign in to view

GuillaumeGomez and others added 2 commits November 8, 2021 16:31

Add new tidy check to ensure that rustdoc DOM IDs are all declared as…

af66036

… expected

Improve code and add documentation

802a293

GuillaumeGomez force-pushed the rustdoc-id-tidy-check branch from 9375fde to 802a293 Compare November 8, 2021 15:32

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 5, 2021

camelid closed this Dec 10, 2021

GuillaumeGomez mentioned this pull request Dec 17, 2021

Implement namespacing for doc comments IDs #92043

Closed

GuillaumeGomez deleted the rustdoc-id-tidy-check branch August 19, 2024 12:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add new tidy check to ensure that rustdoc DOM IDs are all declared as expected #86178

Add new tidy check to ensure that rustdoc DOM IDs are all declared as expected #86178

GuillaumeGomez commented Jun 9, 2021 •

edited

Loading

Mark-Simulacrum left a comment

Mark-Simulacrum commented Jun 20, 2021

GuillaumeGomez commented Jun 20, 2021

Mark-Simulacrum commented Jun 20, 2021

jyn514 commented Jun 20, 2021

Manishearth commented Jun 21, 2021

GuillaumeGomez commented Jun 23, 2021

GuillaumeGomez commented Jun 23, 2021

jyn514 commented Jul 3, 2021

GuillaumeGomez commented Jul 3, 2021

jyn514 commented Jul 3, 2021

GuillaumeGomez commented Jul 3, 2021

jyn514 commented Jul 5, 2021 •

edited

Loading

GuillaumeGomez commented Jul 5, 2021

bors commented Jul 7, 2021

Mark-Simulacrum left a comment

Mark-Simulacrum Aug 1, 2021

camelid Sep 13, 2021

GuillaumeGomez Sep 15, 2021

GuillaumeGomez commented Sep 15, 2021

camelid commented Sep 15, 2021 •

edited

Loading

GuillaumeGomez commented Sep 15, 2021

camelid commented Sep 15, 2021

camelid commented Sep 15, 2021 •

edited

Loading

GuillaumeGomez commented Sep 15, 2021

GuillaumeGomez commented Sep 29, 2021

camelid commented Oct 30, 2021

GuillaumeGomez commented Nov 8, 2021

This comment has been minimized.

bors commented Dec 5, 2021

camelid commented Dec 10, 2021 •

edited

Loading

	} else if trimmed.starts_with("write_small_section_header(") {
	} else if trimmed.contains("write_small_section_header(") && !trimmed.contains("fn write_small_section_header(") {

Add new tidy check to ensure that rustdoc DOM IDs are all declared as expected #86178

Add new tidy check to ensure that rustdoc DOM IDs are all declared as expected #86178

Conversation

GuillaumeGomez commented Jun 9, 2021 • edited Loading

Mark-Simulacrum left a comment

Choose a reason for hiding this comment

Mark-Simulacrum commented Jun 20, 2021

GuillaumeGomez commented Jun 20, 2021

Mark-Simulacrum commented Jun 20, 2021

jyn514 commented Jun 20, 2021

Manishearth commented Jun 21, 2021

GuillaumeGomez commented Jun 23, 2021

GuillaumeGomez commented Jun 23, 2021

jyn514 commented Jul 3, 2021

GuillaumeGomez commented Jul 3, 2021

jyn514 commented Jul 3, 2021

GuillaumeGomez commented Jul 3, 2021

jyn514 commented Jul 5, 2021 • edited Loading

GuillaumeGomez commented Jul 5, 2021

bors commented Jul 7, 2021

Mark-Simulacrum left a comment

Choose a reason for hiding this comment

Mark-Simulacrum Aug 1, 2021

Choose a reason for hiding this comment

camelid Sep 13, 2021

Choose a reason for hiding this comment

GuillaumeGomez Sep 15, 2021

Choose a reason for hiding this comment

GuillaumeGomez commented Sep 15, 2021

camelid commented Sep 15, 2021 • edited Loading

GuillaumeGomez commented Sep 15, 2021

camelid commented Sep 15, 2021

camelid commented Sep 15, 2021 • edited Loading

GuillaumeGomez commented Sep 15, 2021

GuillaumeGomez commented Sep 29, 2021

camelid commented Oct 30, 2021

GuillaumeGomez commented Nov 8, 2021

This comment has been minimized.

bors commented Dec 5, 2021

camelid commented Dec 10, 2021 • edited Loading

GuillaumeGomez commented Jun 9, 2021 •

edited

Loading

jyn514 commented Jul 5, 2021 •

edited

Loading

camelid commented Sep 15, 2021 •

edited

Loading

camelid commented Sep 15, 2021 •

edited

Loading

camelid commented Dec 10, 2021 •

edited

Loading