ast: Derive `Hash` and `PartialEq` on AST node types #5283

overlookmotel · 2024-07-16T07:34:04Z

Continuation from #3819

We should derive Hash and PartialEq on all AST node types.

Why?

The reasons we need to codegen them are (copied from #3819 (comment)):

`Hash`

We want Hash to skip span fields, and also symbol_id etc. Skipping Hash is currently achieved by a dummy Hash impl on Span type:

oxc/crates/oxc_span/src/span.rs

Lines 309 to 313 in 1b91d40

    
           impl Hash for Span { 
        
               fn hash<H: Hasher>(&self, _state: &mut H) { 
        
                   // hash to nothing so all ast spans can be comparable with hash 
        
               } 
        
           }

I saw this as a bit of a hack, and thought it would be more "proper" solution to implement Hash on all AST types with the span field explicitly skipped. That would be laborious and error-prone to do by hand, but much easier with codegen.

`PartialEq`

The reason why we are implementing Hash on AST types is for (if I remember right) comparing expressions in a lint rule to identify pointless if branches e.g. if (x) {} else if (x) { /* unreachable */ }. But actually Hash is not ideal for this use - PartialEq would be more efficient.

But it's hard to implement PartialEq because, again, we need to skip span fields, and the no-op hash hack isn't viable here, so we can't use #[derive(PartialEq)].

So I thought we could do that with codegen.

Use different traits?

Instead of implementing Hash and PartialEq, should we be more explicit about what these trait implementations do, and call them HashContent and PartialEqContent?

PartialEq could then also be implemented on AST nodes for when you want to == check if 2 AST nodes are the same including their spans. And when you compare 2 nodes without spans, you have to do node1.eq_content(&node2) so it's explicit that spans aren't included in the comparison.

I'm not sure if this is a good idea or not.

Other useful info

rzvxa provided some useful background on how Rust's internal derive macros work: #3819 (comment)

The text was updated successfully, but these errors were encountered:

Boshen · 2024-08-28T01:18:51Z

This stupid code is affecting Rolldown to use Span in hashmaps:

impl Hash for Span {
    fn hash<H: Hasher>(&self, _state: &mut H) {
        // hash to nothing so all ast spans can be comparable with hash
    }
}

Moving to the main repo as this is a real issue.

Boshen · 2024-08-29T09:20:20Z

Once this is done, we can replace this hash comparison of expressions to use PartialEq instead.

oxc/crates/oxc_linter/src/ast_util.rs

Lines 10 to 14 in 23e8456

    
           pub fn calculate_hash<T: Hash>(t: &T) -> u64 { 
        
               let mut hasher = FxHasher::default(); 
        
               t.hash(&mut hasher); 
        
               hasher.finish() 
        
           }

overlookmotel · 2024-08-29T10:03:29Z

@Boshen What do you think about my thought above under "Use different traits?" section? I think it's confusing to implement Hash and PartialEq on types and for them to skip hashing/comparing some of the fields - that's not what those traits usually do.

I suggest:

Create new trait PartialEqContent which is like PartialEq but skips Span, ScopeId, SymbolId and ReferenceId.
Implement PartialEqContent on all AST types.
Replace calculate_hash with code that uses PartialEqContent to compare.
Remove the #[derive(Hash)] and custom Hash imps from all AST types (probably we don't need Hash impls any more, if calculate_hash is only current user of it).
#[derive(Hash)] normally on Span.

If we want to, later on we could also implement HashContent, but right now it's not needed, so we can skip it for now.

Boshen · 2024-08-29T10:21:50Z

👍 for removing confusing Hash.

PartialEq means partial equal right? But either choice is fine.

overlookmotel · 2024-08-30T16:03:21Z

I find "partial" confusing, but it seems to mean a different kind of "partial" from "these 2 values are equal considering only part of them".

From PartialEq docs:

This trait allows for comparisons using the equality operator, for types that do not have a full equivalence relation. For example, in floating point numbers NaN != NaN, so floating point types implement PartialEq but not Eq. Formally speaking, when Rhs == Self, this trait corresponds to a partial equivalence relation.

I take this as meaning that PartialEq to all practical purposes means "equal", except from some weird edge cases.

rzvxa · 2024-09-03T13:03:10Z

I gave the definition of Partial equivalence relation a read, It states

In mathematics, a partial equivalence relation (often abbreviated as PER, in older literature also called restricted equivalence relation[1]) is a homogeneous binary relation that is symmetric and transitive. If the relation is also reflexive, then the relation is an equivalence relation.

Given that our equivalence relation between AST nodes is both symmetric and transitive I wonder maybe PartialEq is the right trait to describe it after all. Considering that 2 AST nodes are partially equal with respect to these 2 relations we are not that far off the mark with the actual definition.

My math is pretty rusty but here's what I understand.

Here are the properties of a "PER"

A. $\displaystyle R$ is an equivalence relation on the subset $\displaystyle Y=\{x\in X\mid x\,R\,x\}\subseteq X$.
B. difunctional: the relation is the set $\displaystyle \{(a,b)\mid fa=gb\}$ for two partial functions $\displaystyle f,g:X\rightharpoonup Y$ and some indicator set $\displaystyle Y$
C. right and left Euclidean: For $\displaystyle a,b,c\in X$, $\displaystyle aRb$ and $\displaystyle aRc$ implies $\displaystyle bRc$ and similarly for left Euclideanness $\displaystyle bRa$ and $\displaystyle cRa$ imply $\displaystyle bRc$
D. quasi-reflexive: If $\displaystyle x,y\in X$ and $\displaystyle xRy$, then $\displaystyle xRx$ and $\displaystyle yRy$.

Here's my reasoning These aren't proof or anything.

A. For a set of AST nodes there is always a $\displaystyle Y$ that is subset and equal to the original AST nodes set and every element has this partial relationship to itself.
B. If we have 2 functions f and g and they result in the same output it doesn't matter which one we use for mapping a collection to another
C. If we have 3 AST nodes that are elements of $\displaystyle X$ and a PartialEq b, a PartialEq c then b PartialEq c,b PartialEq a and b PartialEq c
D. If x PartialEq y then x PartialEq x and y PartialEq y

With that in mind, I feel we have both requirements of being a Partial Equivalence Relation which are:

if $\displaystyle aRb$, then $\displaystyle bRa$ (symmetry)
if $\displaystyle aRb$ and $\displaystyle bRc$, then $\displaystyle aRc$ (transitivity)

You may argue that in most of our use cases, we are reflexive as well but that might be stretching the definitions a bit too far.

Boshen · 2024-09-03T13:24:05Z

😵

My take on this is what IdentifierA == IdentifierB means to the user, is it intuitive enough to think they are comparing values only.

rzvxa · 2024-09-03T13:38:23Z

Give it some thought and let me know which direction you want this to go. Implementing it using our new oxc_ast_tools::Derive trait should be easy enough either way. The OP was written before generate_derive which now resolves this concern with #[generate_derive(PartialEq)].

But it's hard to implement PartialEq because, again, we need to skip span fields, and the no-op hash hack isn't viable here, so we can't use #[derive(PartialEq)].

If we want to go with PartialEq we can do it like this as well:

/// 2 AST nodes are partially equal even when they don't share the same code span.
trait AstEq<T>: PartialEq<T> { }

This would work as a self-documenting super trait so we can describe our custom derive behavior with explicit docs comments and a custom name so we can use it in the nominal contexts if we ever want to not use the PartialEq trait in our generic parameters.

Boshen · 2024-09-03T13:42:27Z

Shall we go with PartialEqContent and HashContent to remove confusion and ambiguity?

rzvxa · 2024-09-03T13:48:15Z

Shall we go with PartialEqContent and HashContent to remove confusion and ambiguity?

So It'd be something like this for comparison?

if a.content_eq(b) {
    // ....
}
// -------------
if !a.content_ne(b) {
    // ...
}

Or is it still a super trait of PartialEq to provide the == operator?

Boshen · 2024-09-03T14:04:58Z

Shall we go for

if a.content_eq(b) {
    // ....
}
// -------------
if !a.content_ne(b) {
    // ...
}

I had nightmares when I worked with with operator overloading in C++ 😁

overlookmotel · 2024-09-03T17:38:53Z

A. $\displaystyle R$ is an equivalence relation on the subset $\displaystyle Y=\{x\in X\mid x\,R\,x\}\subseteq X$.
B. difunctional: the relation is the set $\displaystyle \{(a,b)\mid fa=gb\}$ for two partial functions $\displaystyle f,g:X\rightharpoonup Y$ and some indicator set $\displaystyle Y$
C. right and left Euclidean: For $\displaystyle a,b,c\in X$, $\displaystyle aRb$ and $\displaystyle aRc$ implies $\displaystyle bRc$ and similarly for left Euclideanness $\displaystyle bRa$ and $\displaystyle cRa$ imply $\displaystyle bRc$
D. quasi-reflexive: If $\displaystyle x,y\in X$ and $\displaystyle xRy$, then $\displaystyle xRx$ and $\displaystyle yRy$.

Holy flipping christ! Well I can't argue with that 🥲

Concerning the super trait:

If we make ContentEq a super-trait of PartialEq, then node1 == node2 would return false for two NullLiterals with different spans, right? And node1.content_eq(node2) would return true?

If that's correct, I suggest that we don't implement PartialEq on AST types, as then it makes it too easy to do node1 == node2 when what you really mean to do is use content_eq.

Obviously x == y is much nicer syntax, but the reason I suggested not using PartialEq is that (speaking in "common sense" terms, not formal logic), we all understand what == usually means, so it'd be potentially surprising/confusing if there's a "but it's not quite like that" in this case.

Incidentally, I agree with your choice of ContentEq for the trait name, rather than PartialContentEq. If I've understood all this correctly, formally speaking, the relation between AST nodes we're talking about is a full equivalence relation (so like Eq) rather than partial equivalence (PartialEq). So we get a simpler, shorter name, and get to be formally correct. Bonza!

rzvxa · 2024-09-03T23:13:20Z

How are things like this working when our span hash implementation is noop?

oxc/tasks/rulegen/src/main.rs

Line 395 in c984219

    
           if let Some(group_comment) = self.expression_to_group_comment_map.get(&arg.span()) {

rzvxa · 2024-09-03T23:14:52Z

If that's correct, I suggest that we don't implement PartialEq on AST types, as then it makes it too easy to do node1 == node2 when what you really mean to do is use content_eq.

I agree, We should avoid implementing PartialEq where it doesn't make sense.

Boshen · 2024-09-04T00:52:46Z

How are things like this working when our span hash implementation is noop?

oxc/tasks/rulegen/src/main.rs

Line 395 in c984219

if let Some(group_comment) = self.expression_to_group_comment_map.get(&arg.span()) {

It's stored as an array in the hashmap, it'll still use PartialEq from Span 😅

We have a few cases in Rolldown as well.

rzvxa · 2024-09-04T01:04:25Z

It's stored as an array in the hashmap, it'll still use PartialEq from Span 😅

I suspected that it was working because of PartialEq

We have a few cases in Rolldown as well.

Right now I'm trying to remove unnecessary hash implementations in the AST, The final result should implement a normal hash trait for the Span type.

Part of #5283

… from `Span` (#5451) closes #5283 Also removes the noop Hash implementation on `Span` in favor of a real implementation.

rzvxa · 2024-09-05T16:44:18Z

@Boshen Didn't Graphite implement closing issues after their respective PR is merged?

Boshen · 2024-09-05T16:46:38Z

@Boshen Didn't Graphite implement closing issues after their respective PR is merged?

Nope, it's something else ... they close referenced PRs from PRs 🤷

rzvxa · 2024-09-05T16:54:50Z

😆 I thought that was a bug since they've added this feature recently. It doesn't make sense to close a PR because a comment from it is mentioned with the word "fixes".

Boshen transferred this issue from oxc-project/backlog Aug 28, 2024

Boshen changed the title ~~Derive Hash and PartialEq on AST node types~~ ast: Derive Hash and PartialEq on AST node types Aug 28, 2024

Boshen assigned rzvxa Aug 29, 2024

rzvxa mentioned this issue Sep 3, 2024

feat(ast): add ContentEq trait. #5427

Merged

Boshen pushed a commit that referenced this issue Sep 4, 2024

feat(ast): add ContentEq trait. (#5427)

23285f4

Part of #5283

rzvxa mentioned this issue Sep 4, 2024

feat(ast): add ContentHash trait; remove noop Hash implementation from Span #5451

Merged

Boshen pushed a commit that referenced this issue Sep 5, 2024

feat(ast): add ContentHash trait; remove noop Hash implementation…

b2ec6e9

… from `Span` (#5451) closes #5283 Also removes the noop Hash implementation on `Span` in favor of a real implementation.

rzvxa added a commit that referenced this issue Sep 5, 2024

feat(ast): add ContentHash trait; remove noop Hash implementation…

90facd3

… from `Span` (#5451) closes #5283 Also removes the noop Hash implementation on `Span` in favor of a real implementation.

Dunqing mentioned this issue Sep 5, 2024

feat(ast)!: add ThisExpression variants to JSXElementName and JSXMemberExpressionObject (#5466) #5471

Closed

rzvxa closed this as completed Sep 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ast: Derive `Hash` and `PartialEq` on AST node types #5283

ast: Derive `Hash` and `PartialEq` on AST node types #5283

overlookmotel commented Jul 16, 2024 •

edited

Loading

Boshen commented Aug 28, 2024

Boshen commented Aug 29, 2024

overlookmotel commented Aug 29, 2024 •

edited

Loading

Boshen commented Aug 29, 2024

overlookmotel commented Aug 30, 2024

rzvxa commented Sep 3, 2024 •

edited

Loading

Boshen commented Sep 3, 2024

rzvxa commented Sep 3, 2024

Boshen commented Sep 3, 2024

rzvxa commented Sep 3, 2024

Boshen commented Sep 3, 2024

overlookmotel commented Sep 3, 2024

rzvxa commented Sep 3, 2024

rzvxa commented Sep 3, 2024

Boshen commented Sep 4, 2024 •

edited

Loading

rzvxa commented Sep 4, 2024

rzvxa commented Sep 5, 2024

Boshen commented Sep 5, 2024

rzvxa commented Sep 5, 2024

ast: Derive Hash and PartialEq on AST node types #5283

ast: Derive Hash and PartialEq on AST node types #5283

Comments

overlookmotel commented Jul 16, 2024 • edited Loading

Why?

Hash

PartialEq

Use different traits?

Other useful info

Boshen commented Aug 28, 2024

Boshen commented Aug 29, 2024

overlookmotel commented Aug 29, 2024 • edited Loading

Boshen commented Aug 29, 2024

overlookmotel commented Aug 30, 2024

rzvxa commented Sep 3, 2024 • edited Loading

Boshen commented Sep 3, 2024

rzvxa commented Sep 3, 2024

Boshen commented Sep 3, 2024

rzvxa commented Sep 3, 2024

Boshen commented Sep 3, 2024

overlookmotel commented Sep 3, 2024

rzvxa commented Sep 3, 2024

rzvxa commented Sep 3, 2024

Boshen commented Sep 4, 2024 • edited Loading

rzvxa commented Sep 4, 2024

rzvxa commented Sep 5, 2024

Boshen commented Sep 5, 2024

rzvxa commented Sep 5, 2024

ast: Derive `Hash` and `PartialEq` on AST node types #5283

ast: Derive `Hash` and `PartialEq` on AST node types #5283

overlookmotel commented Jul 16, 2024 •

edited

Loading

`Hash`

`PartialEq`

overlookmotel commented Aug 29, 2024 •

edited

Loading

rzvxa commented Sep 3, 2024 •

edited

Loading

Boshen commented Sep 4, 2024 •

edited

Loading