AST Node ID #4188

Boshen · 2024-07-11T03:02:27Z

Blocking issues:

Related discussions:

Track AST node between passes backlog#30

Previous attempts:

Research:

TODO
rustc
other compilers

The text was updated successfully, but these errors were encountered:

overlookmotel · 2024-07-11T08:50:37Z

I'm planning to look into this question as part of review of semantic. I've suggested a solution in the meantime on rolldown/rolldown#1578 to unblock them.

rzvxa · 2024-07-12T17:43:56Z

Another use case for AST node ID. We can remove these methods and free up a few CPU cycles and some memory while building the semantics.

oxc/crates/oxc_semantic/src/builder.rs

Lines 263 to 276 in c65198f

    
           fn record_ast_nodes(&mut self) { 
        
               self.ast_nodes_records.push(Vec::new()); 
        
           } 
        
           fn retrieve_recorded_ast_nodes(&mut self) -> Vec<AstNodeId> { 
        
               self.ast_nodes_records.pop().expect("there is no ast nodes record to stop.") 
        
           } 
        
           fn record_ast_node(&mut self) { 
        
               if let Some(records) = self.ast_nodes_records.last_mut() { 
        
                   records.push(self.current_node_id); 
        
               } 
        
           }

Boshen · 2024-08-05T14:00:24Z

Rolldown needs to implement inject plugin https://esbuild.github.io/api/#inject, which requires an identifiable import statement to be inserted into the AST.

overlookmotel · 2024-08-05T17:55:48Z

I have suggested to them that they could use either:

Index of the statement in program.body.
Memory address of the ImportDeclaration.

fn addr_of_import(decl: &Box<ImportDeclaration>) -> usize {
    &**decl as *const _ as usize
}

Either of the above will work as a unique identifier, and neither depends on span.

I'm not saying that we shouldn't add AstNodeId to all AST types - personally I'm undecided at present. But just that it's not the only way to solve Rolldown's use case.

Boshen · 2024-08-06T00:40:44Z

I have suggested to them that they could use either:

Index of the statement in program.body.

Memory address of the ImportDeclaration.
fn addr_of_import(decl: &Box<ImportDeclaration>) -> usize {
    &**decl as *const _ as usize
}
Either of the above will work as a unique identifier, and neither depends on span.

I'm not saying that we shouldn't add AstNodeId to all AST types - personally I'm undecided at present. But just that it's not the only way to solve Rolldown's use case.

I may need to try it out myself at this point. I'll ask for a walk through to better understand the problem today.

Boshen · 2024-08-10T13:31:35Z

Let's decide on this one next week. I'm a little bit worn out on discussing about this but constantly coming up with hacks.

Boshen · 2024-08-12T10:43:02Z

I talked to @overlookmotel and here's what we agreed on:

To minimize code change and support Rolldown's use case as a start point, we should

add id: NodeId to all Statements (only structs not enums)
make ast builder stateful ... this is going to be hard because we made it copy and passed in all over the place *it needs to become a singleton attach to the allocator?) :-/
make rolldown work by following this PR: chore: showing usages of Span of module declarations rolldown/rolldown#1875)

@rzvxa It seems like this task is going to be on you again, feel free to ask questions to clear uncertainties.

overlookmotel · 2024-08-12T11:00:43Z

I'm going to think about implementation over the course of this week and come back with a proposal.

A useful first step would be to gather the use cases we have for Node IDs. Can anyone who is aware of such needs please comment below?

For other discussion, we have a thread on Discord: https://discord.com/channels/1079625926024900739/1272506817238405160

Boshen · 2024-08-12T14:31:12Z

AST node ids are super useful for caching data in a multi-pass compiler architecture.

A few points:

create indexes / cache / out-of-band storage https://rustc-dev-guide.rust-lang.org/hir.html#out-of-band-storage-and-the-crate-type
create a query system https://rustc-dev-guide.rust-lang.org/query.html
uniquely identify - A shorthand property ({ foo }) creates two nodes for foo with same span, we need to differentiate them

Concrete examples:

The type synthesizer prototype I wrote 2 years ago requires caching type for expressions
Our minifier will create out-of-band-storage for data in a previous ast pass and use it in a later pass

Dunqing · 2024-08-12T14:35:05Z

Example: #4767

DonIsaac · 2024-08-12T15:04:12Z

I talked to @overlookmotel and here's what we agreed on:

To minimize code change and support Rolldown's use case as a start point, we should

add id: NodeId to all Statements (only structs not enums)

make ast builder stateful ... this is going to be hard because we made it copy and passed in all over the place *it needs to become a singleton attach to the allocator?) :-/

make rolldown work by following this PR: chore: showing usages of Span of module declarations rolldown/rolldown#1875)

@rzvxa It seems like this task is going to be on you again, feel free to ask questions to clear uncertainties.

I attempted making AstBuilder stateful while we working on #4367. The idea was to record the number of nodes/symbols/scopes encountered while parsing. It is, in fact, very painful. I saw three possible implementations, and tried the first two of them:

Store an Rc<RefCell<Data>>. This is safe and doesn't require new lifetimes, but it makes it impossible to implement Copy and adds reference counting overhead. This becomes painful very quickly.
Store a reference to &'s RefCell<Data>. This removes Rc overhead, but introduces a new lifetime parameter which makes this an even more painful refactor.
Use ManuallyDrop<NonNull<UnsafeCell<Data>>> This uses unsafe rust, but it 1) doesn't add Rc overhead, 2) does not introduce any new lifetime parameters, and 3) preserves Copy. It should be sound since the parser is run synchronously and code mutating Data does so at the call site and never stores a mutable reference. However, it makes figuring out when to drop Data painful.

DonIsaac · 2024-08-12T15:11:46Z

I'm going to think about implementation over the course of this week and come back with a proposal.

A useful first step would be to gather the use cases we have for Node IDs. Can anyone who is aware of such needs please comment below?

For other discussion, we have a thread on Discord: https://discord.com/channels/1079625926024900739/1272506817238405160

Lint rules use AstNodeId to

Iterate up a node's parents, then match on kind
Generalize logic that does not require knowledge about what kind of node is being used.

I'm currently in favor of a single AstNodeId approach, but my opinion isn't tightly held. I'm mostly concerned insofar as those two use cases are satisfied.

As a thought, if we want to store statement/expression-specific information, we could discriminate using an enum:

// nodes.rs in oxc_semantic
enum SpecificNodeData<'a> {
  Statement(StatementNodeData<'a>),
  Expression(ExpressionNodeData<'a>)
}

struct AstNodes<'a> {
  // ...
  specific_data: IndexVec<AstNodeId, SpecificNodeData<'a>>
}

rzvxa · 2024-08-12T15:48:25Z

If the goal is to add it to all the types that have an AstKind variant. We can use it to eliminate all the existing Cell<Option<ID_TYPE>> fields in the AST. We can get those via SOA. And there is no longer a need for a dedicated AstNode type. We can get rid of it since now AstKind is all we need.

I propose we do the following:

// since we no longer need the struct that had the same name before
trait AstNode {
    fn ast_node_id(&self) -> AstNodeId;
}

impl From<AstNode> for AstNodeId {
    fn from(node: &AstNodeId) -> AstNodeId {
        node.ast_node_id()
    }
}

impl<'a> AstNode for Statement<'a> {
    fn ast_node_id(&self) -> AstNodeId {
        self.node_id
    }
}

impl<'a> AstNode for AstKind<'a> {
    fn ast_node_id(&self) -> AstNodeId {
        match self {
            // kinds => kind.ast_node_id(),
        }
    }
}

Then we can rewrite all the methods in the AstNodes SOA to be like this:

// from:
    #[inline]
    pub fn kind(&self, ast_node_id: AstNodeId) -> AstKind<'a> {
        self.nodes[ast_node_id].kind
    }

// to:
    #[inline]
    pub fn kind(&self, ast_node_id: impl Into<AstNodeId>) -> AstKind<'a> {
        self.kinds[ast_node_id.into()]
    }

On the subject of specific ast IDs, We can use something like this to provide access to both types.

// we put this in our structs
struct AstKindId<T> {
    id: AstNodeId,
    _marker: PhantomData<T>,
}

impl<T> AstNode for AstKindId<T> {
    fn ast_node_id(&self) -> AstNodeId {
        self.id
    }
}

Then we can use this generic type to do all sorts of type-safe stuff like this:

pub fn reference_id(&self, ident_ref_id: AstKindId<IdentifierReference>) -> ReferenceId
    // ...
}

I'm not saying this is better, Both approaches have their pros and cons, But having a conversion from "more specific -> less specific" is possible, So we aren't losing that much, especially if we use the AstNode trait which abstracts all of these details away.

I'm looking forward to seeing what are your thoughts on it.

overlookmotel · 2024-08-12T22:56:38Z

Thanks all for input into this.

@rzvxa Will come back on your proposals for APIs. But right now I'm still at the stage of trying to figure out what are all the use cases we want to support. I know you've brought some up previously - one related to CFG, if I remember right, but I think there were more. Could you please remind me?

In particular, are there any things we want to do which are currently impossible, or extremely painful/expensive without Node IDs?

overlookmotel · 2024-08-12T22:58:21Z

Lint rules use AstNodeId to

Iterate up a node's parents, then match on kind

Generalize logic that does not require knowledge about what kind of node is being used.

@DonIsaac Do I understand right that both these things are already possible with what we have now? Is there anything in linter which is currently blocked but having node IDs stored inline in the AST types would make possible?

overlookmotel · 2024-08-12T23:06:53Z

Just to throw one of my own ideas into the mix (which may or may not be a good idea):

If each AST type had node ID stored inline, Spans could be moved out of the AST into a separate side array indexed by the ID. It strikes me that Spans are pretty much only used for diagnostics (which is a cold path) and when compiling source maps, but otherwise they're almost never accessed - they just need to be maintained throughout transform process for purpose of source maps. Therefore, they're a good candidate to move out of the AST itself.

This would also allow us to add UTF-16 spans and/or line/column spans without bloating the AST (#959) - they'd also go in side arrays.

This may not be a good idea because it makes the AST less ergonomic to use (especially for those consuming Oxc as a library), but it does also have some positive effects.

DonIsaac · 2024-08-12T23:24:50Z

Lint rules use AstNodeId to

Iterate up a node's parents, then match on kind

Generalize logic that does not require knowledge about what kind of node is being used.

@DonIsaac Do I understand right that both these things are already possible with what we have now? Is there anything in linter which is currently blocked but having node IDs stored inline in the AST types would make possible?

Correct, these are all currently possible. However, I don't think they'd be possible with an ExpressionId/StatementId as we were talking about on discord.

The main thing that is not possible right now is walking back up the AST in contexts where an AST node or an ID is not available. It's possible to work around this, but it would be quite helpful to have one on hand in each node.

overlookmotel · 2024-08-12T23:27:17Z

Thanks for coming back Don. Can you give an example of where walking back up the AST from a node (ancestors) is useful, and what is the current workaround?

DonIsaac · 2024-08-13T01:32:58Z

For examples of iterating up the AST, see no_unused_vars/usage.rs. Since iteration occurs over AstNodes, we have access to an AstNodeId on each loop iteration.

For examples of where upwards traversal could be made cleaner with AstNodeIds in each struct, see no_useless_spread.rs. On my phone so I can't elaborate too much further on how/why, but you should find a redundant match on AstKind between the first check function and the second two.

rzvxa · 2024-08-13T10:02:09Z

In particular, are there any things we want to do which are currently impossible, or extremely painful/expensive without Node IDs?

Here's my take on this, I think the root of our issue is that we can't walk down the tree using AstNodeId. If we get to walk down while keeping access to the AstNodeIds of our descendants we can do anything we want including walking back up. I've tried to work around it in #3800 but that's just too much redundant information to make it fast.
So currently we all use hacky stuff to get over this limitation, I personally had to use CFG to access this information in the no_fallthrough rule which to be honest is painful to look at:

oxc/crates/oxc_linter/src/rules/eslint/no_fallthrough.rs

Lines 443 to 477 in a226962

    
           let (tests, exit) = graph 
        
               .edges_directed(node.cfg_id(), Direction::Outgoing) 
        
               .fold((Vec::new(), None), |(mut conds, exit), it| { 
        
                   let target = it.target(); 
        
                   if !matches!(it.weight(), EdgeType::Normal) { 
        
                       (conds, exit) 
        
                   } else if cfg 
        
                       .basic_block(target) 
        
                       .instructions() 
        
                       .iter() 
        
                       .any(|it| matches!(it.kind, InstructionKind::Condition)) 
        
                   { 
        
                       let is_empty = graph 
        
                           .edges_directed(target, Direction::Outgoing) 
        
                           .filter(|it| matches!(it.weight(), EdgeType::Jump)) 
        
                           .exactly_one() 
        
                           .ok() 
        
                           .and_then(|it| { 
        
                               cfg.basic_block(it.target()) 
        
                                   .instructions() 
        
                                   .first() 
        
                                   .and_then(|it| it.node_id) 
        
                                   .map(|id| ctx.nodes().parent_kind(id)) 
        
                                   .and_then(|it| match it { 
        
                                       Some(AstKind::SwitchCase(case)) => Some(case), 
        
                                       _ => None, 
        
                                   }) 
        
                           }) 
        
                           .is_some_and(|it| it.consequent.is_empty() || it.consequent.iter().exactly_one().is_ok_and(|it| matches!(it, Statement::BlockStatement(b) if b.body.is_empty()))); 
        
                       conds.push((target, is_empty)); 
        
                       (conds, exit) 
        
                   } else { 
        
                       (conds, Some(target)) 
        
                   } 
        
               });

There is also the need for Semantic::record_ast_node in our hot path which we can eliminate if we have access to the ast_node_id.

In my opinion, the most valuable thing is that we can now keep a bunch of secondary data all over the place - where we currently don't have access to AstNode - without paying for it upfront. Even if we have to use a hashmap, Hashing a number is much easier than a whole struct.

overlookmotel · 2024-08-13T17:06:20Z

The type synthesizer prototype I wrote 2 years ago requires caching type for expressions

@Boshen Can I just clarify: Is that really type for expressions, or is it type of bindings? i.e. could that info be in an array indexed by SymbolId?

overlookmotel · 2024-08-13T17:13:39Z

I attempted making AstBuilder stateful while we working on #4367. The idea was to record the number of nodes/symbols/scopes encountered while parsing.

I'm a bit confused why this is problematic in parser. All ParserImpl's methods have access to &mut ParserImpl and ParserImpl contains an AstBuilder. So can't methods access a &mut AstBuilder via self.ast without the need for RefCells and suchlike?

DonIsaac · 2024-08-15T14:18:02Z

I don't think so, but I'm not certain. AstBuilder is Copy, and there are parts of the code that rely on this behavior. Copying a mutable reference would violate rust borrow checker invariants. However, I never tested this; we could definitely try it.

overlookmotel · 2024-08-15T14:20:48Z

Yeah it was me that made AstBuilder Copy. It was a good idea at the time! But we can switch it back to how it was before and pass &mut AstBuilders around. As you say, a &mut cannot be copied, but it can be re-borrowed.

close: #3943 ## Further improvements There is a double visit here. We need to collect all react hooks calling in `Function` and `ArrowFunctionExpression`. If we want to remove this implementation, we may wait for #4188. https://github.com/oxc-project/oxc/blob/d797e595d286c613848b773c256bd43124ad1981/crates/oxc_transformer/src/react/refresh.rs#L744-L947 ## Tests All tests copy from https://github.com/facebook/react/blob/main/packages/react-refresh/src/__tests__/ReactFresh-test.js There are still 4 tests that have not been passed **1. refresh/can-handle-implicit-arrow-returns/input.jsx** Related to #4767. transform correct, just output doesn't match the expected output **2. refresh/registers-identifiers-used-in-jsx-at-definition-site/input.jsx** **3. refresh/registers-identifiers-used-in-react-create-element-at-definition-site/input.jsx** Blocked by #4746 **4. refresh/supports-typescript-namespace-syntax/input.tsx** oxc transforms ts to js first, so probably we can ignore this case. If we really want to pass this test, we also need to turn off `TypeScript` plugin. ## What's next? ### Options: 1. Support transform `refresh_reg` and `refresh_sig` options to `MemberExpression`. Currently `import.meta.xxxx` still is an `Identifier` 2. Support `emit_full_signatures` option ### Other NAPI, testing in `monitor-oxc`, etc..

overlookmotel · 2024-09-11T01:08:36Z

I have finally written up my thoughts: #5688

Side note: #5674 gets rid of Semantic::record_ast_node and demonstrates that we don't need node IDs stored on AST types for this use case. -1% perf regression on semantic benchmarks, but I'm not sure if that's spurious or not - you'd think it should be faster as it's less code. (The unsafe code could also be removed if we do oxc-project/backlog#109).

Boshen · 2024-09-11T07:38:11Z

Continue in #5689

Boshen added A-ast Area - AST A-transformer Area - Transformer / Transpiler labels Jul 11, 2024

Boshen added this to the Transformer Milestone 2 milestone Jul 11, 2024

Boshen self-assigned this Jul 11, 2024

Boshen mentioned this issue Jul 11, 2024

Track AST node between passes oxc-project/backlog#30

Closed

Boshen changed the title ~~AST Node ID~~ AST Node ID (Research Phase) Jul 11, 2024

This comment was marked as resolved.

Sign in to view

Dunqing mentioned this issue Aug 8, 2024

traverse: support for getting AstNode by AstNodeId #4746

Closed

Boshen removed this from the Transformer Milestone 2 milestone Aug 8, 2024

This was referenced Aug 9, 2024

traverse: insert new statements(If any) after calling each walk_statement in walk_statements #4767

Closed

feat(transformer): support react fast refresh #4587

Merged

Boshen added the P-high Priority - High label Aug 11, 2024

Boshen changed the title ~~AST Node ID (Research Phase)~~ AST Node ID Aug 12, 2024

rzvxa mentioned this issue Aug 13, 2024

refactor(ast)!: Change order of fields in CallExpression #4859

Merged

rzvxa added the blocker label Aug 13, 2024

Boshen assigned overlookmotel and unassigned Boshen Aug 23, 2024

Boshen added this to the Transformer Milestone 2 milestone Aug 29, 2024

overlookmotel mentioned this issue Sep 11, 2024

RFC: AST node IDs #5688

Closed

Boshen closed this as completed Sep 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AST Node ID #4188

AST Node ID #4188

Boshen commented Jul 11, 2024 •

edited

Loading

overlookmotel commented Jul 11, 2024

This comment was marked as resolved.

rzvxa commented Jul 12, 2024

Boshen commented Aug 5, 2024

overlookmotel commented Aug 5, 2024 •

edited

Loading

Boshen commented Aug 6, 2024 •

edited by overlookmotel

Loading

Boshen commented Aug 10, 2024

Boshen commented Aug 12, 2024

overlookmotel commented Aug 12, 2024

Boshen commented Aug 12, 2024

Dunqing commented Aug 12, 2024

DonIsaac commented Aug 12, 2024 •

edited

Loading

DonIsaac commented Aug 12, 2024

rzvxa commented Aug 12, 2024 •

edited

Loading

overlookmotel commented Aug 12, 2024

overlookmotel commented Aug 12, 2024 •

edited

Loading

overlookmotel commented Aug 12, 2024 •

edited

Loading

DonIsaac commented Aug 12, 2024

overlookmotel commented Aug 12, 2024

DonIsaac commented Aug 13, 2024

rzvxa commented Aug 13, 2024 •

edited

Loading

overlookmotel commented Aug 13, 2024

overlookmotel commented Aug 13, 2024 •

edited

Loading

DonIsaac commented Aug 15, 2024

overlookmotel commented Aug 15, 2024

overlookmotel commented Sep 11, 2024

Boshen commented Sep 11, 2024

AST Node ID #4188

AST Node ID #4188

Comments

Boshen commented Jul 11, 2024 • edited Loading

Blocking issues:

Related discussions:

Previous attempts:

Research:

overlookmotel commented Jul 11, 2024

This comment was marked as resolved.

rzvxa commented Jul 12, 2024

Boshen commented Aug 5, 2024

overlookmotel commented Aug 5, 2024 • edited Loading

Boshen commented Aug 6, 2024 • edited by overlookmotel Loading

Boshen commented Aug 10, 2024

Boshen commented Aug 12, 2024

overlookmotel commented Aug 12, 2024

Boshen commented Aug 12, 2024

Dunqing commented Aug 12, 2024

DonIsaac commented Aug 12, 2024 • edited Loading

DonIsaac commented Aug 12, 2024

rzvxa commented Aug 12, 2024 • edited Loading

overlookmotel commented Aug 12, 2024

overlookmotel commented Aug 12, 2024 • edited Loading

overlookmotel commented Aug 12, 2024 • edited Loading

DonIsaac commented Aug 12, 2024

overlookmotel commented Aug 12, 2024

DonIsaac commented Aug 13, 2024

rzvxa commented Aug 13, 2024 • edited Loading

overlookmotel commented Aug 13, 2024

overlookmotel commented Aug 13, 2024 • edited Loading

DonIsaac commented Aug 15, 2024

overlookmotel commented Aug 15, 2024

overlookmotel commented Sep 11, 2024

Boshen commented Sep 11, 2024

Boshen commented Jul 11, 2024 •

edited

Loading

overlookmotel commented Aug 5, 2024 •

edited

Loading

Boshen commented Aug 6, 2024 •

edited by overlookmotel

Loading

DonIsaac commented Aug 12, 2024 •

edited

Loading

rzvxa commented Aug 12, 2024 •

edited

Loading

overlookmotel commented Aug 12, 2024 •

edited

Loading

overlookmotel commented Aug 12, 2024 •

edited

Loading

rzvxa commented Aug 13, 2024 •

edited

Loading

overlookmotel commented Aug 13, 2024 •

edited

Loading