Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] feat(linter): support persisted rule state #5799

Closed

Conversation

DonIsaac
Copy link
Contributor

@DonIsaac DonIsaac commented Sep 16, 2024

Provides a way for rules to maintain state across invocations of run, run_on_symbol, and run_once without forcing those functions to take &mut self.

Note

I really do not like this approach. It's way too loose w.r.t Rust's type checker, and solutions requiring Any are red flags to me. I'd much prefer rules to create state ad-hoc inside of run_once and then iterate over nodes themselves.

I also didn't test this code at all. For now, this PR is meant to be a conversation starter.

Copy link

graphite-app bot commented Sep 16, 2024

Your org has enabled the Graphite merge queue for merging into main

Add the label “0-merge” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge. Or use the label “hotfix” to add to the merge queue as a hot fix.

You must have a Graphite account and log in to Graphite in order to use the merge queue. Sign up using this link.

Copy link
Contributor Author

DonIsaac commented Sep 16, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @DonIsaac and the rest of your teammates on Graphite Graphite

@github-actions github-actions bot added the A-linter Area - Linter label Sep 16, 2024
@DonIsaac DonIsaac changed the title feat(linter): support persisted rule state [RFC] feat(linter): support persisted rule state Sep 16, 2024
@DonIsaac DonIsaac marked this pull request as ready for review September 16, 2024 03:21
Copy link

codspeed-hq bot commented Sep 16, 2024

CodSpeed Performance Report

Merging #5799 will not alter performance

Comparing don/09-15-feat_linter_support_persisted_rule_state (32c4acd) with main (bb95306)

Summary

✅ 29 untouched benchmarks

@Boshen Boshen changed the base branch from don/09-15-perf_linter_move_shared_context_info_to_contexthost_ to graphite-base/5799 September 16, 2024 04:22
@Boshen Boshen force-pushed the don/09-15-feat_linter_support_persisted_rule_state branch from a8b8745 to 58cc4f0 Compare September 16, 2024 04:35
@Boshen Boshen changed the base branch from graphite-base/5799 to main September 16, 2024 04:36
@Boshen Boshen force-pushed the don/09-15-feat_linter_support_persisted_rule_state branch from 58cc4f0 to 19dce4e Compare September 16, 2024 04:36
@DonIsaac DonIsaac force-pushed the don/09-15-feat_linter_support_persisted_rule_state branch from 19dce4e to 32c4acd Compare September 16, 2024 12:39
@overlookmotel
Copy link
Contributor

overlookmotel commented Sep 16, 2024

I haven't delved fully into this but...

I really do not like this approach.

Don't do it then! You are the chief of the linter, it's your call.

Also, it seems my original issue #5204 which started this all was based on a fundamental misunderstanding (I didn't realize that Rules are shared across threads). So it may be that what I suggested was just a terrible idea! Please don't bend over backwards doing stuff you're not comfortable with just to satisfy my ill-informed nonsense.

Side note: What's with the "no hash hasher" crate? This seems quite a weird idea. Maybe I'm missing something, but isn't it a recipe for hashmap collisions?

FxHash is surprisingly cheap for hashing integers. Check out how simple it is:

https://github.com/rust-lang/rustc-hash/blob/eb049a8209f58003957c34477a2d8d2729f6b633/src/lib.rs#L96-L101
https://github.com/rust-lang/rustc-hash/blob/eb049a8209f58003957c34477a2d8d2729f6b633/src/lib.rs#L167-L190

That is just 3 CPU ops to hash a u32 or u64.

@DonIsaac
Copy link
Contributor Author

I haven't delved fully into this but...

I really do not like this approach.

Don't do it then! You are the chief of the linter, it's your call.

Also, it seems my original issue #5204 which started this all was based on a fundamental misunderstanding (I didn't realize that Rules are shared across threads). So it may be that what I suggested was just a terrible idea! Please don't bend over backwards doing stuff you're not comfortable with just to satisfy my ill-informed nonsense.

Apologies for never responding with my thoughts. I do not enjoy writing and my thoughts about it could be quite long :P I'll get them down for you though.

Side note: What's with the "no hash hasher" crate? This seems quite a weird idea. Maybe I'm missing something, but isn't it a recipe for hashmap collisions?

FxHash is surprisingly cheap for hashing integers. Check out how simple it is:

https://github.com/rust-lang/rustc-hash/blob/eb049a8209f58003957c34477a2d8d2729f6b633/src/lib.rs#L96-L101 https://github.com/rust-lang/rustc-hash/blob/eb049a8209f58003957c34477a2d8d2729f6b633/src/lib.rs#L167-L190

That is just 3 CPU ops to hash a u32 or u64.

Its a crate that hashes small primitives using the identity function. It's supposed to replace a sparse array. There's a chance for collision if your have duplicate keys. In this case, all rules are guaranteed to have a unique ID, so this won't be a problem.

@overlookmotel
Copy link
Contributor

overlookmotel commented Sep 17, 2024

I do not enjoy writing and my thoughts about it could be quite long :P I'll get them down for you though.

You don't need to explain yourself for me. I just wanted to make sure I hadn't pushed you in a direction you didn't really want to go in.

Its a crate that hashes small primitives using the identity function. It's supposed to replace a sparse array. There's a chance for collision if your have duplicate keys. In this case, all rules are guaranteed to have a unique ID, so this won't be a problem.

As far as I can see, that's not quite true...

nohash_hasher::IntMap is a wrapper around std::collections::HashMap, which is itself a wrapper around hashbrown.

If the keys are u32, for example, it doesn't allocate u32::MAX + 1 buckets (4 billion!). It'll instead allocate 1 << power buckets (power depending on how many entries in the hash map), and keys are converted to bucket index with key % (1 << power) (i.e. take power bottom bits). If power is 8, then capacity is 256, so keys 0, 256, 512, etc will share the same bucket (collision).

So the question isn't "are the keys unique?" but "are the bottom bits of keys unique?"

In addition, hashbrown uses the top 7 bits of hash to speed up searches, so you want those bits to have good entropy. If your keys don't exceed 1 << (32 - 7) (33 million), and you use an identity hash function, then the top 7 bits of every hash will be 0. So this hobbles part of what makes hashbrown fast.

There will probably be some use cases where the pattern of keys being used may fit well with this scheme (you'd hope so, that crate has 13 million downloads!). But in many cases, it won't - the cost of hashing with a cheap hash function like FxHash will be less than the cost of frequent collisions.

I see the repo for nohash_hasher was archived in 2022. Perhaps identity hash function made sense before std switched to using hashbrown as its underlying hash table implementation a few years ago, but now it's less advantageous. Perhaps - that's just a wild guess.

TLDR: I can't say for sure whether the pattern of keys here is suitable for nohash_hasher, but quite possibly not. I'd recommend at least testing FxHashMap and benchmarking the difference.

@DonIsaac DonIsaac closed this Sep 19, 2024
@Boshen Boshen deleted the don/09-15-feat_linter_support_persisted_rule_state branch October 14, 2024 03:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-linter Area - Linter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants