[RFC] feat(linter): support persisted rule state #5799

DonIsaac · 2024-09-16T03:17:45Z

Provides a way for rules to maintain state across invocations of run, run_on_symbol, and run_once without forcing those functions to take &mut self.

Note

I really do not like this approach. It's way too loose w.r.t Rust's type checker, and solutions requiring Any are red flags to me. I'd much prefer rules to create state ad-hoc inside of run_once and then iterate over nodes themselves.

I also didn't test this code at all. For now, this PR is meant to be a conversation starter.

graphite-app · 2024-09-16T03:17:50Z

Your org has enabled the Graphite merge queue for merging into main

Add the label “0-merge” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge. Or use the label “hotfix” to add to the merge queue as a hot fix.

You must have a Graphite account and log in to Graphite in order to use the merge queue. Sign up using this link.

DonIsaac · 2024-09-16T03:18:03Z

[RFC] feat(linter): support persisted rule state #5799 👈
perf(linter): move shared context info to ContextHost #5795 : 1 other dependent PR (#5797 )
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @DonIsaac and the rest of your teammates on Graphite

codspeed-hq · 2024-09-16T03:25:14Z

CodSpeed Performance Report

Merging #5799 will not alter performance

_{Comparing don/09-15-feat_linter_support_persisted_rule_state (32c4acd) with main (bb95306)}

Summary

✅ 29 untouched benchmarks

overlookmotel · 2024-09-16T18:37:01Z

I haven't delved fully into this but...

I really do not like this approach.

Don't do it then! You are the chief of the linter, it's your call.

Also, it seems my original issue #5204 which started this all was based on a fundamental misunderstanding (I didn't realize that Rules are shared across threads). So it may be that what I suggested was just a terrible idea! Please don't bend over backwards doing stuff you're not comfortable with just to satisfy my ill-informed nonsense.

Side note: What's with the "no hash hasher" crate? This seems quite a weird idea. Maybe I'm missing something, but isn't it a recipe for hashmap collisions?

FxHash is surprisingly cheap for hashing integers. Check out how simple it is:

https://github.com/rust-lang/rustc-hash/blob/eb049a8209f58003957c34477a2d8d2729f6b633/src/lib.rs#L96-L101
https://github.com/rust-lang/rustc-hash/blob/eb049a8209f58003957c34477a2d8d2729f6b633/src/lib.rs#L167-L190

That is just 3 CPU ops to hash a u32 or u64.

DonIsaac · 2024-09-16T21:53:40Z

I haven't delved fully into this but...

I really do not like this approach.

Don't do it then! You are the chief of the linter, it's your call.

Also, it seems my original issue #5204 which started this all was based on a fundamental misunderstanding (I didn't realize that Rules are shared across threads). So it may be that what I suggested was just a terrible idea! Please don't bend over backwards doing stuff you're not comfortable with just to satisfy my ill-informed nonsense.

Apologies for never responding with my thoughts. I do not enjoy writing and my thoughts about it could be quite long :P I'll get them down for you though.

Side note: What's with the "no hash hasher" crate? This seems quite a weird idea. Maybe I'm missing something, but isn't it a recipe for hashmap collisions?

FxHash is surprisingly cheap for hashing integers. Check out how simple it is:

https://github.com/rust-lang/rustc-hash/blob/eb049a8209f58003957c34477a2d8d2729f6b633/src/lib.rs#L96-L101 https://github.com/rust-lang/rustc-hash/blob/eb049a8209f58003957c34477a2d8d2729f6b633/src/lib.rs#L167-L190

That is just 3 CPU ops to hash a u32 or u64.

Its a crate that hashes small primitives using the identity function. It's supposed to replace a sparse array. There's a chance for collision if your have duplicate keys. In this case, all rules are guaranteed to have a unique ID, so this won't be a problem.

overlookmotel · 2024-09-17T09:24:56Z

I do not enjoy writing and my thoughts about it could be quite long :P I'll get them down for you though.

You don't need to explain yourself for me. I just wanted to make sure I hadn't pushed you in a direction you didn't really want to go in.

Its a crate that hashes small primitives using the identity function. It's supposed to replace a sparse array. There's a chance for collision if your have duplicate keys. In this case, all rules are guaranteed to have a unique ID, so this won't be a problem.

As far as I can see, that's not quite true...

nohash_hasher::IntMap is a wrapper around std::collections::HashMap, which is itself a wrapper around hashbrown.

If the keys are u32, for example, it doesn't allocate u32::MAX + 1 buckets (4 billion!). It'll instead allocate 1 << power buckets (power depending on how many entries in the hash map), and keys are converted to bucket index with key % (1 << power) (i.e. take power bottom bits). If power is 8, then capacity is 256, so keys 0, 256, 512, etc will share the same bucket (collision).

So the question isn't "are the keys unique?" but "are the bottom bits of keys unique?"

In addition, hashbrown uses the top 7 bits of hash to speed up searches, so you want those bits to have good entropy. If your keys don't exceed 1 << (32 - 7) (33 million), and you use an identity hash function, then the top 7 bits of every hash will be 0. So this hobbles part of what makes hashbrown fast.

There will probably be some use cases where the pattern of keys being used may fit well with this scheme (you'd hope so, that crate has 13 million downloads!). But in many cases, it won't - the cost of hashing with a cheap hash function like FxHash will be less than the cost of frequent collisions.

I see the repo for nohash_hasher was archived in 2022. Perhaps identity hash function made sense before std switched to using hashbrown as its underlying hash table implementation a few years ago, but now it's less advantageous. Perhaps - that's just a wild guess.

TLDR: I can't say for sure whether the pattern of keys here is suitable for nohash_hasher, but quite possibly not. I'd recommend at least testing FxHashMap and benchmarking the difference.

This was referenced Sep 16, 2024

perf(linter): move shared context info to ContextHost #5795

Merged

perf(linter): remove redundant clone of diagnostics in context #5797

Merged

DonIsaac mentioned this pull request Sep 16, 2024

perf(linter): shrink size of DisableDirectives #5798

Merged

github-actions bot added the A-linter Area - Linter label Sep 16, 2024

DonIsaac changed the title ~~feat(linter): support persisted rule state~~ [RFC] feat(linter): support persisted rule state Sep 16, 2024

DonIsaac marked this pull request as ready for review September 16, 2024 03:21

Boshen changed the base branch from don/09-15-perf_linter_move_shared_context_info_to_contexthost_ to graphite-base/5799 September 16, 2024 04:22

Boshen force-pushed the graphite-base/5799 branch from ecaa30e to e413cad Compare September 16, 2024 04:35

Boshen force-pushed the don/09-15-feat_linter_support_persisted_rule_state branch from a8b8745 to 58cc4f0 Compare September 16, 2024 04:35

Boshen changed the base branch from graphite-base/5799 to main September 16, 2024 04:36

Boshen force-pushed the don/09-15-feat_linter_support_persisted_rule_state branch from 58cc4f0 to 19dce4e Compare September 16, 2024 04:36

feat(linter): support persisted rule state

32c4acd

DonIsaac force-pushed the don/09-15-feat_linter_support_persisted_rule_state branch from 19dce4e to 32c4acd Compare September 16, 2024 12:39

DonIsaac closed this Sep 19, 2024

Boshen deleted the don/09-15-feat_linter_support_persisted_rule_state branch October 14, 2024 03:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] feat(linter): support persisted rule state #5799

[RFC] feat(linter): support persisted rule state #5799

DonIsaac commented Sep 16, 2024 •

edited

Loading

graphite-app bot commented Sep 16, 2024

DonIsaac commented Sep 16, 2024 •

edited

Loading

codspeed-hq bot commented Sep 16, 2024 •

edited

Loading

overlookmotel commented Sep 16, 2024 •

edited

Loading

DonIsaac commented Sep 16, 2024

overlookmotel commented Sep 17, 2024 •

edited

Loading

[RFC] feat(linter): support persisted rule state #5799

[RFC] feat(linter): support persisted rule state #5799

Conversation

DonIsaac commented Sep 16, 2024 • edited Loading

Note

graphite-app bot commented Sep 16, 2024

Your org has enabled the Graphite merge queue for merging into main

DonIsaac commented Sep 16, 2024 • edited Loading

codspeed-hq bot commented Sep 16, 2024 • edited Loading

CodSpeed Performance Report

Merging #5799 will not alter performance

Summary

overlookmotel commented Sep 16, 2024 • edited Loading

DonIsaac commented Sep 16, 2024

overlookmotel commented Sep 17, 2024 • edited Loading

DonIsaac commented Sep 16, 2024 •

edited

Loading

DonIsaac commented Sep 16, 2024 •

edited

Loading

codspeed-hq bot commented Sep 16, 2024 •

edited

Loading

overlookmotel commented Sep 16, 2024 •

edited

Loading

overlookmotel commented Sep 17, 2024 •

edited

Loading