
Consider passing -Awarnings (or similar) to avoid false alarms from lint *reporting* #1819

pnkfelix opened this issue Feb 20, 2024 · 7 comments

pnkfelix (Member) commented Feb 20, 2024

See rust-lang/rust#117772 (comment), quoted here to sidestep GitHub's collapsing of comments:

> On a closer inspection, the results for ripgrep, regex and html5ever are quite misleading. Those crates actually have redundant imports, so the lint reporting machinery (lint level checking, diagnostic printing) is invoked now, and that's what is causing regressions, not the check_unused pass itself.
>
> The unused-warnings regression is legitimate, but it's a stress test for imports specifically.
>
> I think we can merge this after addressing #117772 (comment).

There were a number of benchmarks that started hitting a lint and thus started exercising lint reporting machinery that was previously uninvoked.

Now: If a lint check is expensive to exercise, then we should measure that as part of benchmarking.

But if a new lint fires and causes new diagnostic code paths to run, that sounds like noise to me.

It would be better, IMO, if we prevented the diagnostic machinery from running at all in such cases, except in isolated benchmarks that are themselves dedicated to exercising the efficiency of the diagnostic machinery.
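(For concreteness, a rough sketch of what I have in mind, assuming the benchmark harness could pass extra flags through to rustc, e.g. via RUSTFLAGS; I haven't worked out the actual plumbing in rustc-perf:)

```sh
# Allow all lints so that lint *reporting* never runs during the measured build.
# Caveat: this might also let the compiler skip lint-related work entirely,
# which would defeat the purpose (see the follow-up comment below).
RUSTFLAGS="-Awarnings" cargo build --release

# A narrower variant would allow only the specific lint in question, e.g.:
#   RUSTFLAGS="-A unused_imports" cargo build --release
```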

What do others think?

pnkfelix (Member, Author) commented:

(If the -A flag itself triggers micro-optimizations that cause a given lint check to be skipped entirely, then that would defeat the purpose of what I'm after here; in that scenario I would instead propose a new -Z flag that silences the diagnostic reporting alone, without any other effects.)

Kobzol (Contributor) commented Feb 21, 2024

I don't think we do any short-circuiting on allowed lints, although there have been some unsuccessful attempts at that in the past, IIRC.

Is it such a big problem in practice that sometimes new lints cause the diagnostics to be triggered? It probably only happens in the stress tests.

Disabling the output completely could potentially hide regressions or something bad happening in the output itself.

nnethercote (Contributor) commented:

I'm inclined to keep our compilation pipeline as vanilla as possible, i.e. no special flags.

> Now: If a lint check is expensive to exercise, then we should measure that as part of benchmarking.
>
> But if a new lint fires and causes new diagnostic code paths to run, that sounds like noise to me.

I'm not sure I understand the difference between these two cases. But if a new lint is added and it causes a benchmark to slow down, I think that's a useful signal. We may decide that the slowdown is worth it. I wouldn't want the slowdown to be hidden.

pnkfelix (Member, Author) commented Mar 13, 2024

I guess the case I'm worried about is our benchmarks getting a significant hit due to diagnostic I/O overhead.

Here's my mental model: if a lint is added, and it's firing on someone's code, and the added overhead is roughly proportional to the amount of diagnostic output generated (and the amount of extra output is roughly proportional to the number of violations of the lint in the underlying code, perhaps in post-macro-expansion form), then I would consider that cost amortized: yes, the compilation has slowed down, but that extra cost is presumably being "paid off" by the value of whatever the lint is flagging about the underlying code.

On the other hand: if a lint is added, and the actual act of running the check is costly, so that there is an overhead paid even for code that produces no output from the lint, then that cost should not be considered "paid off" in the manner described above.


So, here's a throwaway thought: could we pipe the stderr output to /dev/null during benchmarking? Or have a -Z flag with a similar effect, but solely for the lint diagnostic output? Could that help address the distinction I'm trying to draw above?

It really is a throwaway thought, and I'm tempted to just close this issue, since I can tell that my ideas here are not fully formed. And I agree with @nnethercote that this isn't "noise" in the usual sense that we use it for spurious regressions...

Kobzol (Contributor) commented Mar 13, 2024

Hmm, now that you mention it, why don't we already pipe stdout/stderr to /dev/null? 🤔 @Mark-Simulacrum, do you remember any justification for this? It seems like that could also slightly reduce noise. If the benchmark fails, we could re-run it with stdout/stderr preserved to find the compilation error.
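(Roughly, a sketch of what such a wrapper could look like; this is just an illustration, not how the rustc-perf collector is actually invoked:)

```sh
#!/bin/sh
# Hypothetical benchmark wrapper: discard compiler output during the measured
# run, and only on failure re-run the build with output visible so the
# compilation error is still available for debugging.
if ! cargo build --release >/dev/null 2>&1; then
    echo "benchmark build failed; re-running to capture the error" >&2
    cargo build --release
fi
```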

Mark-Simulacrum (Member) commented:

It makes it super painful to debug an actual failure if we swallow output, especially since that means debugging can't be done purely from the logs visible on the public page.

Mark-Simulacrum (Member) commented:

Oh, I stopped reading too soon. Anyway, I think it would be OK to re-run to actually get the output, but it feels like needless complexity, and I'd be worried about losing coverage (e.g., imagine TTY checking starts happening on every character in some slow way). But I'm not too worried about it.

Overall, historically we've just ignored these one-off bumps. They do affect the historical/long-term metrics views, but those are far less important than the commit-by-commit comparison anyway.
