-
Notifications
You must be signed in to change notification settings - Fork 46.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[compiler] Stay in SSA form through entire pipeline #30573
[compiler] Stay in SSA form through entire pipeline #30573
Conversation
[ghstack-poisoned]
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
ghstack-source-id: b3c84364193cc58d405edda235252336d0aefab5 Pull Request resolved: #30573
[ghstack-poisoned]
ghstack-source-id: d15a711d164cac86e5c80148c759b259ce45a810 Pull Request resolved: #30573
[ghstack-poisoned]
ghstack-source-id: b6ec82a461d95283dea3b040e81f100b29c7cc4e Pull Request resolved: #30573
[ghstack-poisoned]
ghstack-source-id: 9962038ebeeb650d5a07146308c9ee1cfb26853b Pull Request resolved: #30573
[ghstack-poisoned]
ghstack-source-id: 7197375fff0345a5b4d2520ef863488bb1dc1d15 Pull Request resolved: #30573
[ghstack-poisoned]
ghstack-source-id: 1b653587414b4103f33f415ab6574e8dbedd90b1 Pull Request resolved: #30573
[ghstack-poisoned]
ghstack-source-id: 27d08cc1ee0cd4459078e25415da0c54518759ee Pull Request resolved: #30573
[ghstack-poisoned]
ghstack-source-id: e54a9d44a89a11cd9d65939da17222d85be05712 Pull Request resolved: #30573
[ghstack-poisoned]
ghstack-source-id: 1b4da810d5086a6b90d43effcfedb2ecbb15e573 Pull Request resolved: #30573
Nice find, i'll add a fixture for that and fix |
This PR updates to use SSA form through the entire compilation pipeline. This means that in both HIR form and ReactiveFunction form, `Identifier` instances map 1:1 to `IdentifierId` values. If two identifiers have the same IdentifierId, they are the same instance. What this means is that all our passes can use this more precise information to determine if two particular identifiers are not just the same variable, but the same SSA "version" of that variable. However, some parts of our analysis really care about program variables as opposed to SSA versions, and were relying on LeaveSSA to reset identifiers such that all Identifier instances for a particular program variable would have the same IdentifierId (though not necessarily the same Identifier instance). With LeaveSSA removed, those analysis passes can now use DeclarationId instead to uniquely identify a program variable. Note that this PR surfaces some opportunties to improve edge-cases around reassigned values being declared/reassigned/depended-upon across multiple scopes. Several passes could/should use IdentifierId to more precisely identify exactly which values are accessed - for example, a scope that reassigns `x` but doesn't use `x` prior to reassignment doesn't have to take a dependency on `x`. But today we take a dependnecy. My approach for these cases was to add a "TODO LeaveSSA" comment with notes and the name of the fixture demonstrating the difference, but to intentionally preserve the existing behavior (generally, switching to use DeclarationId when IdentifierId would have been more precise). Beyond updating passes to use DeclarationId instead of Identifier/IdentifierId, the other change here is to extract out the remaining necessary bits of LeaveSSA into a new pass that rewrites InstructionKind (const/let/reassign/etc) based on whether a value is actually const or has reassignments and should be let. [ghstack-poisoned]
This PR updates to use SSA form through the entire compilation pipeline. This means that in both HIR form and ReactiveFunction form, `Identifier` instances map 1:1 to `IdentifierId` values. If two identifiers have the same IdentifierId, they are the same instance. What this means is that all our passes can use this more precise information to determine if two particular identifiers are not just the same variable, but the same SSA "version" of that variable. However, some parts of our analysis really care about program variables as opposed to SSA versions, and were relying on LeaveSSA to reset identifiers such that all Identifier instances for a particular program variable would have the same IdentifierId (though not necessarily the same Identifier instance). With LeaveSSA removed, those analysis passes can now use DeclarationId instead to uniquely identify a program variable. Note that this PR surfaces some opportunties to improve edge-cases around reassigned values being declared/reassigned/depended-upon across multiple scopes. Several passes could/should use IdentifierId to more precisely identify exactly which values are accessed - for example, a scope that reassigns `x` but doesn't use `x` prior to reassignment doesn't have to take a dependency on `x`. But today we take a dependnecy. My approach for these cases was to add a "TODO LeaveSSA" comment with notes and the name of the fixture demonstrating the difference, but to intentionally preserve the existing behavior (generally, switching to use DeclarationId when IdentifierId would have been more precise). Beyond updating passes to use DeclarationId instead of Identifier/IdentifierId, the other change here is to extract out the remaining necessary bits of LeaveSSA into a new pass that rewrites InstructionKind (const/let/reassign/etc) based on whether a value is actually const or has reassignments and should be let. ghstack-source-id: 4b9b1d5f83b4421f6a49ed3ccf5c46da01cfa5f6 Pull Request resolved: #30573
Thanks @mofeiZ for saving me some time debugging, that was it! I updated that pass to use a new helper which only extends the range if the new range has a non-zero start |
This PR updates to use SSA form through the entire compilation pipeline. This means that in both HIR form and ReactiveFunction form, `Identifier` instances map 1:1 to `IdentifierId` values. If two identifiers have the same IdentifierId, they are the same instance. What this means is that all our passes can use this more precise information to determine if two particular identifiers are not just the same variable, but the same SSA "version" of that variable. However, some parts of our analysis really care about program variables as opposed to SSA versions, and were relying on LeaveSSA to reset identifiers such that all Identifier instances for a particular program variable would have the same IdentifierId (though not necessarily the same Identifier instance). With LeaveSSA removed, those analysis passes can now use DeclarationId instead to uniquely identify a program variable. Note that this PR surfaces some opportunties to improve edge-cases around reassigned values being declared/reassigned/depended-upon across multiple scopes. Several passes could/should use IdentifierId to more precisely identify exactly which values are accessed - for example, a scope that reassigns `x` but doesn't use `x` prior to reassignment doesn't have to take a dependency on `x`. But today we take a dependnecy. My approach for these cases was to add a "TODO LeaveSSA" comment with notes and the name of the fixture demonstrating the difference, but to intentionally preserve the existing behavior (generally, switching to use DeclarationId when IdentifierId would have been more precise). Beyond updating passes to use DeclarationId instead of Identifier/IdentifierId, the other change here is to extract out the remaining necessary bits of LeaveSSA into a new pass that rewrites InstructionKind (const/let/reassign/etc) based on whether a value is actually const or has reassignments and should be let. [ghstack-poisoned]
This PR updates to use SSA form through the entire compilation pipeline. This means that in both HIR form and ReactiveFunction form, `Identifier` instances map 1:1 to `IdentifierId` values. If two identifiers have the same IdentifierId, they are the same instance. What this means is that all our passes can use this more precise information to determine if two particular identifiers are not just the same variable, but the same SSA "version" of that variable. However, some parts of our analysis really care about program variables as opposed to SSA versions, and were relying on LeaveSSA to reset identifiers such that all Identifier instances for a particular program variable would have the same IdentifierId (though not necessarily the same Identifier instance). With LeaveSSA removed, those analysis passes can now use DeclarationId instead to uniquely identify a program variable. Note that this PR surfaces some opportunties to improve edge-cases around reassigned values being declared/reassigned/depended-upon across multiple scopes. Several passes could/should use IdentifierId to more precisely identify exactly which values are accessed - for example, a scope that reassigns `x` but doesn't use `x` prior to reassignment doesn't have to take a dependency on `x`. But today we take a dependnecy. My approach for these cases was to add a "TODO LeaveSSA" comment with notes and the name of the fixture demonstrating the difference, but to intentionally preserve the existing behavior (generally, switching to use DeclarationId when IdentifierId would have been more precise). Beyond updating passes to use DeclarationId instead of Identifier/IdentifierId, the other change here is to extract out the remaining necessary bits of LeaveSSA into a new pass that rewrites InstructionKind (const/let/reassign/etc) based on whether a value is actually const or has reassignments and should be let. ghstack-source-id: 69afdaee5fadf3fdc98ce97549da805f288218b4 Pull Request resolved: #30573
Latest diff is P1515916566, looks like the fbt issues are fixed, i went through the majority of the remaining differences manually. They are nearly all due to ternaries or optionals. There were a couple where deps changed and i debugged them and confirmed that the previous results were unnecessarily including a dependency that wasn't actually reactive. |
This PR updates to use SSA form through the entire compilation pipeline. This means that in both HIR form and ReactiveFunction form, `Identifier` instances map 1:1 to `IdentifierId` values. If two identifiers have the same IdentifierId, they are the same instance. What this means is that all our passes can use this more precise information to determine if two particular identifiers are not just the same variable, but the same SSA "version" of that variable. However, some parts of our analysis really care about program variables as opposed to SSA versions, and were relying on LeaveSSA to reset identifiers such that all Identifier instances for a particular program variable would have the same IdentifierId (though not necessarily the same Identifier instance). With LeaveSSA removed, those analysis passes can now use DeclarationId instead to uniquely identify a program variable. Note that this PR surfaces some opportunties to improve edge-cases around reassigned values being declared/reassigned/depended-upon across multiple scopes. Several passes could/should use IdentifierId to more precisely identify exactly which values are accessed - for example, a scope that reassigns `x` but doesn't use `x` prior to reassignment doesn't have to take a dependency on `x`. But today we take a dependnecy. My approach for these cases was to add a "TODO LeaveSSA" comment with notes and the name of the fixture demonstrating the difference, but to intentionally preserve the existing behavior (generally, switching to use DeclarationId when IdentifierId would have been more precise). Beyond updating passes to use DeclarationId instead of Identifier/IdentifierId, the other change here is to extract out the remaining necessary bits of LeaveSSA into a new pass that rewrites InstructionKind (const/let/reassign/etc) based on whether a value is actually const or has reassignments and should be let. ghstack-source-id: 69afdaee5fadf3fdc98ce97549da805f288218b4 Pull Request resolved: #30573
I forgot to clean this up before landing #30573. [ghstack-poisoned]
Resubmission of #30079 -- core logic unchanged, but needed to rebase past #30573 ### Quick background #### Temporaries The compiler currently treats temporaries and named variables (e.g. `x`) differently in this pass. - named variables may be reassigned (in fact, since we're running after LeaveSSA, a single named identifier's IdentifierId may map to multiple `Identifier` instances -- each with its own scope and mutable range) - temporaries are replaced with their represented expressions during codegen. This is correct (mostly correct, see #29878) as we're careful to always lower the correct evaluation semantics. However, since we rewrite reactive scopes entirely (to if/else blocks), we need to track temporaries that a scope produces in `ReactiveScope.declarations` and later promote them to named variables. In the same example, $4, $5, and $6 need to be promoted: $2 ->`t0`, $5 ->`t1`, and $6 ->`t2`. ```js [1] $2 = LoadGlobal(global) foo [2] $3 = LoadLocal bar$1 scope 0: [3] $4 = Call $2(<unknown> $3) scope 1: [4] $5 = Object { } scope 2: [5] $6 = Object { a: $4, b: $5 } [6] $8 = StoreLocal Const x$7 = $6 ``` #### Dependencies `ReactiveScope.dependencies` records the set of (read-only) values that a reactive scope is dependent on. This is currently limited to just variables (named variables from source and promoted temporaries) and property-loads. All dependencies we record need to be hoistable -- i.e. reordered to just before the ReactiveScope begins. Not all PropertyLoads are hoistable. In this example, we should not evaluate `obj.a.b` without before creating x and checking `objIsNull`. ```js // reduce-reactive-deps/no-uncond.js function useFoo({ obj, objIsNull }) { const x = []; if (isFalse(objIsNull)) { x.push(obj.a.b); } return x; } ``` While other memoization strategies with different constraints exist, the current compiler requires that `ReactiveScope.dependencies` be re-orderable to the beginning of the reactive scope. But.. `PropertyLoad`s from null values will throw `TypeError`. This means that evaluating hoisted dependencies should throw if and only if the source program throws. (It is also a bug if source throws and compiler output does not throw. See facebook/react-forget#2709) --- ### Rough high level overview 1. Pass 1 Walk over instructions to gather every temporary used outside of its defining scope (same as ReactiveFunction version). These determine the sidemaps we produce, as temporaries used outside of their declaring scopes get promoted to named variables later (and are not considered hoistable rvals). 2. Pass 2 (collectHoistablePropertyLoads) Walk over instructions to generate sidemaps: a. temporary identifier -> named variable and property path (e.g. `$3 -> {obj: props, path: ["a", "b"]}`) b. block -> accessed variables and properties (e.g. `bb0 -> [ {obj: props, path: ["a", "b"]} ]`) c. Walk over control flow graph to understand the set of object and property paths that can be read by each basic block. This analysis: - relies on post-dominator trees - traverses the CFG from entry (producing the set of variables/paths unconditionally evaluated *before* a block). - traverses the CFG from exit (producing the set of variables/paths unconditionally evaluated *after* a block). 4. Pass 3: (collectDependencies) Walks over instructions again to record dependencies and declarations, using the previously produced sidemaps. We do not record any control-flow here 5. Merge every scope's recorded dependencies with the set of hoistable PropertyLoads Tested by syncing internally and (1) checking compilation output differences ([internal link](https://www.internalfb.com/intern/everpaste/?handle=GPCfUBt_HCoy_S4EAJDVFJyJJMR0bsIXAAAB)), running internally e2e tests ([internal link](https://fburl.com/sandcastle/cs5mlkxq)) --- ### Followups: 1. Rewrite function expression deps This change produces much more optimal output as the compiler now uses the function CFG to understand which variables / paths are assumed to be non-null. However, it may exacerbate [this function-expr hoisting bug](https://github.com/facebook/react/blob/main/compiler/packages/babel-plugin-react-compiler/src/__tests__/fixtures/compiler/bug-invalid-hoisting-functionexpr.ts). A short term fix here is to simply call some form of `collectNonNullObjects` on every function expression to find hoistable variable / paths. In the longer term, we should refactor out `FunctionExpression.deps`. 2. Enable optional paths (a) don't count optional load temporaries as dependencies (e.g. `collectOptionalLoadRValues(...)`). (b) record optional paths in both collectHoistablePropertyLoads and dependency collection ghstack-source-id: 51e0759c68c931c19fee02665bd37a7274a45572 Pull Request resolved: #30894
Resubmission of #30079 -- core logic unchanged, but needed to rebase past #30573 ### Quick background #### Temporaries The compiler currently treats temporaries and named variables (e.g. `x`) differently in this pass. - named variables may be reassigned (in fact, since we're running after LeaveSSA, a single named identifier's IdentifierId may map to multiple `Identifier` instances -- each with its own scope and mutable range) - temporaries are replaced with their represented expressions during codegen. This is correct (mostly correct, see #29878) as we're careful to always lower the correct evaluation semantics. However, since we rewrite reactive scopes entirely (to if/else blocks), we need to track temporaries that a scope produces in `ReactiveScope.declarations` and later promote them to named variables. In the same example, $4, $5, and $6 need to be promoted: $2 ->`t0`, $5 ->`t1`, and $6 ->`t2`. ```js [1] $2 = LoadGlobal(global) foo [2] $3 = LoadLocal bar$1 scope 0: [3] $4 = Call $2(<unknown> $3) scope 1: [4] $5 = Object { } scope 2: [5] $6 = Object { a: $4, b: $5 } [6] $8 = StoreLocal Const x$7 = $6 ``` #### Dependencies `ReactiveScope.dependencies` records the set of (read-only) values that a reactive scope is dependent on. This is currently limited to just variables (named variables from source and promoted temporaries) and property-loads. All dependencies we record need to be hoistable -- i.e. reordered to just before the ReactiveScope begins. Not all PropertyLoads are hoistable. In this example, we should not evaluate `obj.a.b` without before creating x and checking `objIsNull`. ```js // reduce-reactive-deps/no-uncond.js function useFoo({ obj, objIsNull }) { const x = []; if (isFalse(objIsNull)) { x.push(obj.a.b); } return x; } ``` While other memoization strategies with different constraints exist, the current compiler requires that `ReactiveScope.dependencies` be re-orderable to the beginning of the reactive scope. But.. `PropertyLoad`s from null values will throw `TypeError`. This means that evaluating hoisted dependencies should throw if and only if the source program throws. (It is also a bug if source throws and compiler output does not throw. See facebook/react-forget#2709) --- ### Rough high level overview 1. Pass 1 Walk over instructions to gather every temporary used outside of its defining scope (same as ReactiveFunction version). These determine the sidemaps we produce, as temporaries used outside of their declaring scopes get promoted to named variables later (and are not considered hoistable rvals). 2. Pass 2 (collectTemporariesSidemap) Walk over instructions to generate a sidemap of temporary identifier -> named variable and property path (e.g. `$3 -> {obj: props, path: ["a", "b"]}`) 2. Pass 2 (collectHoistablePropertyLoads) a. Build a sidemap of block -> accessed variables and properties (e.g. `bb0 -> [ {obj: props, path: ["a", "b"]} ]`) b. Propagate "non-nullness" i.e. variables and properties for which we can safely evaluate `PropertyLoad`. A basic block can unconditionally read from identifier X if any of the following applies: - the block itself reads from identifier X - all predecessors of the block read from identifier X - all successors of the block read from identifier X 4. Pass 3: (collectDependencies) Walks over instructions again to record dependencies and declarations, using the previously produced sidemaps. We do not record any control-flow here 5. Merge every scope's recorded dependencies with the set of hoistable PropertyLoads Tested by syncing internally and (1) checking compilation output differences ([internal link](https://www.internalfb.com/intern/everpaste/?handle=GPCfUBt_HCoy_S4EAJDVFJyJJMR0bsIXAAAB)), running internally e2e tests ([internal link](https://fburl.com/sandcastle/cs5mlkxq)) --- ### Followups: 1. Rewrite function expression deps This change produces much more optimal output as the compiler now uses the function CFG to understand which variables / paths are assumed to be non-null. However, it may exacerbate [this function-expr hoisting bug](https://github.com/facebook/react/blob/main/compiler/packages/babel-plugin-react-compiler/src/__tests__/fixtures/compiler/bug-invalid-hoisting-functionexpr.tsx). A short term fix here is to simply call some form of `collectNonNullObjects` on every function expression to find hoistable variable / paths. In the longer term, we should refactor out `FunctionExpression.deps`. 2. Enable optional paths (a) don't count optional load temporaries as dependencies (e.g. `collectOptionalLoadRValues(...)`). (b) record optional paths in both collectHoistablePropertyLoads and dependency collection ghstack-source-id: 470ab175a901772567139bd0bb93b3e3c71c9dc7 Pull Request resolved: #30894
Resubmission of #30079 -- core logic unchanged, but needed to rebase past #30573 ### Quick background #### Temporaries The compiler currently treats temporaries and named variables (e.g. `x`) differently in this pass. - named variables may be reassigned (in fact, since we're running after LeaveSSA, a single named identifier's IdentifierId may map to multiple `Identifier` instances -- each with its own scope and mutable range) - temporaries are replaced with their represented expressions during codegen. This is correct (mostly correct, see #29878) as we're careful to always lower the correct evaluation semantics. However, since we rewrite reactive scopes entirely (to if/else blocks), we need to track temporaries that a scope produces in `ReactiveScope.declarations` and later promote them to named variables. In the same example, $4, $5, and $6 need to be promoted: $2 ->`t0`, $5 ->`t1`, and $6 ->`t2`. ```js [1] $2 = LoadGlobal(global) foo [2] $3 = LoadLocal bar$1 scope 0: [3] $4 = Call $2(<unknown> $3) scope 1: [4] $5 = Object { } scope 2: [5] $6 = Object { a: $4, b: $5 } [6] $8 = StoreLocal Const x$7 = $6 ``` #### Dependencies `ReactiveScope.dependencies` records the set of (read-only) values that a reactive scope is dependent on. This is currently limited to just variables (named variables from source and promoted temporaries) and property-loads. All dependencies we record need to be hoistable -- i.e. reordered to just before the ReactiveScope begins. Not all PropertyLoads are hoistable. In this example, we should not evaluate `obj.a.b` without before creating x and checking `objIsNull`. ```js // reduce-reactive-deps/no-uncond.js function useFoo({ obj, objIsNull }) { const x = []; if (isFalse(objIsNull)) { x.push(obj.a.b); } return x; } ``` While other memoization strategies with different constraints exist, the current compiler requires that `ReactiveScope.dependencies` be re-orderable to the beginning of the reactive scope. But.. `PropertyLoad`s from null values will throw `TypeError`. This means that evaluating hoisted dependencies should throw if and only if the source program throws. (It is also a bug if source throws and compiler output does not throw. See facebook/react-forget#2709) --- ### Rough high level overview 1. Pass 1 Walk over instructions to gather every temporary used outside of its defining scope (same as ReactiveFunction version). These determine the sidemaps we produce, as temporaries used outside of their declaring scopes get promoted to named variables later (and are not considered hoistable rvals). 2. Pass 2 (collectTemporariesSidemap) Walk over instructions to generate a sidemap of temporary identifier -> named variable and property path (e.g. `$3 -> {obj: props, path: ["a", "b"]}`) 2. Pass 2 (collectHoistablePropertyLoads) a. Build a sidemap of block -> accessed variables and properties (e.g. `bb0 -> [ {obj: props, path: ["a", "b"]} ]`) b. Propagate "non-nullness" i.e. variables and properties for which we can safely evaluate `PropertyLoad`. A basic block can unconditionally read from identifier X if any of the following applies: - the block itself reads from identifier X - all predecessors of the block read from identifier X - all successors of the block read from identifier X 4. Pass 3: (collectDependencies) Walks over instructions again to record dependencies and declarations, using the previously produced sidemaps. We do not record any control-flow here 5. Merge every scope's recorded dependencies with the set of hoistable PropertyLoads Tested by syncing internally and (1) checking compilation output differences ([internal link](https://www.internalfb.com/intern/everpaste/?handle=GPCfUBt_HCoy_S4EAJDVFJyJJMR0bsIXAAAB)), running internally e2e tests ([internal link](https://fburl.com/sandcastle/cs5mlkxq)) --- ### Followups: 1. Rewrite function expression deps This change produces much more optimal output as the compiler now uses the function CFG to understand which variables / paths are assumed to be non-null. However, it may exacerbate [this function-expr hoisting bug](https://github.com/facebook/react/blob/main/compiler/packages/babel-plugin-react-compiler/src/__tests__/fixtures/compiler/bug-invalid-hoisting-functionexpr.tsx). A short term fix here is to simply call some form of `collectNonNullObjects` on every function expression to find hoistable variable / paths. In the longer term, we should refactor out `FunctionExpression.deps`. 2. Enable optional paths (a) don't count optional load temporaries as dependencies (e.g. `collectOptionalLoadRValues(...)`). (b) record optional paths in both collectHoistablePropertyLoads and dependency collection ghstack-source-id: e1fe5d80076df15c5277c6761138f45f267e11dd Pull Request resolved: #30894
Resubmission of #30079 -- core logic unchanged, but needed to rebase past #30573 ### Quick background #### Temporaries The compiler currently treats temporaries and named variables (e.g. `x`) differently in this pass. - named variables may be reassigned (in fact, since we're running after LeaveSSA, a single named identifier's IdentifierId may map to multiple `Identifier` instances -- each with its own scope and mutable range) - temporaries are replaced with their represented expressions during codegen. This is correct (mostly correct, see #29878) as we're careful to always lower the correct evaluation semantics. However, since we rewrite reactive scopes entirely (to if/else blocks), we need to track temporaries that a scope produces in `ReactiveScope.declarations` and later promote them to named variables. In the same example, $4, $5, and $6 need to be promoted: $2 ->`t0`, $5 ->`t1`, and $6 ->`t2`. ```js [1] $2 = LoadGlobal(global) foo [2] $3 = LoadLocal bar$1 scope 0: [3] $4 = Call $2(<unknown> $3) scope 1: [4] $5 = Object { } scope 2: [5] $6 = Object { a: $4, b: $5 } [6] $8 = StoreLocal Const x$7 = $6 ``` #### Dependencies `ReactiveScope.dependencies` records the set of (read-only) values that a reactive scope is dependent on. This is currently limited to just variables (named variables from source and promoted temporaries) and property-loads. All dependencies we record need to be hoistable -- i.e. reordered to just before the ReactiveScope begins. Not all PropertyLoads are hoistable. In this example, we should not evaluate `obj.a.b` without before creating x and checking `objIsNull`. ```js // reduce-reactive-deps/no-uncond.js function useFoo({ obj, objIsNull }) { const x = []; if (isFalse(objIsNull)) { x.push(obj.a.b); } return x; } ``` While other memoization strategies with different constraints exist, the current compiler requires that `ReactiveScope.dependencies` be re-orderable to the beginning of the reactive scope. But.. `PropertyLoad`s from null values will throw `TypeError`. This means that evaluating hoisted dependencies should throw if and only if the source program throws. (It is also a bug if source throws and compiler output does not throw. See facebook/react-forget#2709) --- ### Rough high level overview 1. Pass 1 Walk over instructions to gather every temporary used outside of its defining scope (same as ReactiveFunction version). These determine the sidemaps we produce, as temporaries used outside of their declaring scopes get promoted to named variables later (and are not considered hoistable rvals). 2. Pass 2 (collectTemporariesSidemap) Walk over instructions to generate a sidemap of temporary identifier -> named variable and property path (e.g. `$3 -> {obj: props, path: ["a", "b"]}`) 2. Pass 2 (collectHoistablePropertyLoads) a. Build a sidemap of block -> accessed variables and properties (e.g. `bb0 -> [ {obj: props, path: ["a", "b"]} ]`) b. Propagate "non-nullness" i.e. variables and properties for which we can safely evaluate `PropertyLoad`. A basic block can unconditionally read from identifier X if any of the following applies: - the block itself reads from identifier X - all predecessors of the block read from identifier X - all successors of the block read from identifier X 4. Pass 3: (collectDependencies) Walks over instructions again to record dependencies and declarations, using the previously produced sidemaps. We do not record any control-flow here 5. Merge every scope's recorded dependencies with the set of hoistable PropertyLoads Tested by syncing internally and (1) checking compilation output differences ([internal link](https://www.internalfb.com/intern/everpaste/?handle=GPCfUBt_HCoy_S4EAJDVFJyJJMR0bsIXAAAB)), running internally e2e tests ([internal link](https://fburl.com/sandcastle/cs5mlkxq)) --- ### Followups: 1. Rewrite function expression deps This change produces much more optimal output as the compiler now uses the function CFG to understand which variables / paths are assumed to be non-null. However, it may exacerbate [this function-expr hoisting bug](https://github.com/facebook/react/blob/main/compiler/packages/babel-plugin-react-compiler/src/__tests__/fixtures/compiler/bug-invalid-hoisting-functionexpr.tsx). A short term fix here is to simply call some form of `collectNonNullObjects` on every function expression to find hoistable variable / paths. In the longer term, we should refactor out `FunctionExpression.deps`. 2. Enable optional paths (a) don't count optional load temporaries as dependencies (e.g. `collectOptionalLoadRValues(...)`). (b) record optional paths in both collectHoistablePropertyLoads and dependency collection ghstack-source-id: e1fe5d80076df15c5277c6761138f45f267e11dd Pull Request resolved: #30894
Resubmission of #30079 -- core logic unchanged, but needed to rebase past #30573 ### Quick background #### Temporaries The compiler currently treats temporaries and named variables (e.g. `x`) differently in this pass. - named variables may be reassigned (in fact, since we're running after LeaveSSA, a single named identifier's IdentifierId may map to multiple `Identifier` instances -- each with its own scope and mutable range) - temporaries are replaced with their represented expressions during codegen. This is correct (mostly correct, see #29878) as we're careful to always lower the correct evaluation semantics. However, since we rewrite reactive scopes entirely (to if/else blocks), we need to track temporaries that a scope produces in `ReactiveScope.declarations` and later promote them to named variables. In the same example, $4, $5, and $6 need to be promoted: $2 ->`t0`, $5 ->`t1`, and $6 ->`t2`. ```js [1] $2 = LoadGlobal(global) foo [2] $3 = LoadLocal bar$1 scope 0: [3] $4 = Call $2(<unknown> $3) scope 1: [4] $5 = Object { } scope 2: [5] $6 = Object { a: $4, b: $5 } [6] $8 = StoreLocal Const x$7 = $6 ``` #### Dependencies `ReactiveScope.dependencies` records the set of (read-only) values that a reactive scope is dependent on. This is currently limited to just variables (named variables from source and promoted temporaries) and property-loads. All dependencies we record need to be hoistable -- i.e. reordered to just before the ReactiveScope begins. Not all PropertyLoads are hoistable. In this example, we should not evaluate `obj.a.b` without before creating x and checking `objIsNull`. ```js // reduce-reactive-deps/no-uncond.js function useFoo({ obj, objIsNull }) { const x = []; if (isFalse(objIsNull)) { x.push(obj.a.b); } return x; } ``` While other memoization strategies with different constraints exist, the current compiler requires that `ReactiveScope.dependencies` be re-orderable to the beginning of the reactive scope. But.. `PropertyLoad`s from null values will throw `TypeError`. This means that evaluating hoisted dependencies should throw if and only if the source program throws. (It is also a bug if source throws and compiler output does not throw. See facebook/react-forget#2709) --- ### Rough high level overview 1. Pass 1 Walk over instructions to gather every temporary used outside of its defining scope (same as ReactiveFunction version). These determine the sidemaps we produce, as temporaries used outside of their declaring scopes get promoted to named variables later (and are not considered hoistable rvals). 2. Pass 2 (collectTemporariesSidemap) Walk over instructions to generate a sidemap of temporary identifier -> named variable and property path (e.g. `$3 -> {obj: props, path: ["a", "b"]}`) 2. Pass 2 (collectHoistablePropertyLoads) a. Build a sidemap of block -> accessed variables and properties (e.g. `bb0 -> [ {obj: props, path: ["a", "b"]} ]`) b. Propagate "non-nullness" i.e. variables and properties for which we can safely evaluate `PropertyLoad`. A basic block can unconditionally read from identifier X if any of the following applies: - the block itself reads from identifier X - all predecessors of the block read from identifier X - all successors of the block read from identifier X 4. Pass 3: (collectDependencies) Walks over instructions again to record dependencies and declarations, using the previously produced sidemaps. We do not record any control-flow here 5. Merge every scope's recorded dependencies with the set of hoistable PropertyLoads Tested by syncing internally and (1) checking compilation output differences ([internal link](https://www.internalfb.com/intern/everpaste/?handle=GPCfUBt_HCoy_S4EAJDVFJyJJMR0bsIXAAAB)), running internally e2e tests ([internal link](https://fburl.com/sandcastle/cs5mlkxq)) --- ### Followups: 1. Rewrite function expression deps This change produces much more optimal output as the compiler now uses the function CFG to understand which variables / paths are assumed to be non-null. However, it may exacerbate [this function-expr hoisting bug](https://github.com/facebook/react/blob/main/compiler/packages/babel-plugin-react-compiler/src/__tests__/fixtures/compiler/bug-invalid-hoisting-functionexpr.tsx). A short term fix here is to simply call some form of `collectNonNullObjects` on every function expression to find hoistable variable / paths. In the longer term, we should refactor out `FunctionExpression.deps`. 2. Enable optional paths (a) don't count optional load temporaries as dependencies (e.g. `collectOptionalLoadRValues(...)`). (b) record optional paths in both collectHoistablePropertyLoads and dependency collection ghstack-source-id: 6c62db14213284bc61869b8db57198546c399f7f Pull Request resolved: #30894
Resubmission of #30079 -- core logic unchanged, but needed to rebase past #30573 ### Quick background #### Temporaries The compiler currently treats temporaries and named variables (e.g. `x`) differently in this pass. - named variables may be reassigned (in fact, since we're running after LeaveSSA, a single named identifier's IdentifierId may map to multiple `Identifier` instances -- each with its own scope and mutable range) - temporaries are replaced with their represented expressions during codegen. This is correct (mostly correct, see #29878) as we're careful to always lower the correct evaluation semantics. However, since we rewrite reactive scopes entirely (to if/else blocks), we need to track temporaries that a scope produces in `ReactiveScope.declarations` and later promote them to named variables. In the same example, $4, $5, and $6 need to be promoted: $2 ->`t0`, $5 ->`t1`, and $6 ->`t2`. ```js [1] $2 = LoadGlobal(global) foo [2] $3 = LoadLocal bar$1 scope 0: [3] $4 = Call $2(<unknown> $3) scope 1: [4] $5 = Object { } scope 2: [5] $6 = Object { a: $4, b: $5 } [6] $8 = StoreLocal Const x$7 = $6 ``` #### Dependencies `ReactiveScope.dependencies` records the set of (read-only) values that a reactive scope is dependent on. This is currently limited to just variables (named variables from source and promoted temporaries) and property-loads. All dependencies we record need to be hoistable -- i.e. reordered to just before the ReactiveScope begins. Not all PropertyLoads are hoistable. In this example, we should not evaluate `obj.a.b` without before creating x and checking `objIsNull`. ```js // reduce-reactive-deps/no-uncond.js function useFoo({ obj, objIsNull }) { const x = []; if (isFalse(objIsNull)) { x.push(obj.a.b); } return x; } ``` While other memoization strategies with different constraints exist, the current compiler requires that `ReactiveScope.dependencies` be re-orderable to the beginning of the reactive scope. But.. `PropertyLoad`s from null values will throw `TypeError`. This means that evaluating hoisted dependencies should throw if and only if the source program throws. (It is also a bug if source throws and compiler output does not throw. See facebook/react-forget#2709) --- ### Rough high level overview 1. Pass 1 Walk over instructions to gather every temporary used outside of its defining scope (same as ReactiveFunction version). These determine the sidemaps we produce, as temporaries used outside of their declaring scopes get promoted to named variables later (and are not considered hoistable rvals). 2. Pass 2 (collectTemporariesSidemap) Walk over instructions to generate a sidemap of temporary identifier -> named variable and property path (e.g. `$3 -> {obj: props, path: ["a", "b"]}`) 2. Pass 2 (collectHoistablePropertyLoads) a. Build a sidemap of block -> accessed variables and properties (e.g. `bb0 -> [ {obj: props, path: ["a", "b"]} ]`) b. Propagate "non-nullness" i.e. variables and properties for which we can safely evaluate `PropertyLoad`. A basic block can unconditionally read from identifier X if any of the following applies: - the block itself reads from identifier X - all predecessors of the block read from identifier X - all successors of the block read from identifier X 4. Pass 3: (collectDependencies) Walks over instructions again to record dependencies and declarations, using the previously produced sidemaps. We do not record any control-flow here 5. Merge every scope's recorded dependencies with the set of hoistable PropertyLoads Tested by syncing internally and (1) checking compilation output differences ([internal link](https://www.internalfb.com/intern/everpaste/?handle=GPCfUBt_HCoy_S4EAJDVFJyJJMR0bsIXAAAB)), running internally e2e tests ([internal link](https://fburl.com/sandcastle/cs5mlkxq)) --- ### Followups: 1. Rewrite function expression deps This change produces much more optimal output as the compiler now uses the function CFG to understand which variables / paths are assumed to be non-null. However, it may exacerbate [this function-expr hoisting bug](https://github.com/facebook/react/blob/main/compiler/packages/babel-plugin-react-compiler/src/__tests__/fixtures/compiler/bug-invalid-hoisting-functionexpr.tsx). A short term fix here is to simply call some form of `collectNonNullObjects` on every function expression to find hoistable variable / paths. In the longer term, we should refactor out `FunctionExpression.deps`. 2. Enable optional paths (a) don't count optional load temporaries as dependencies (e.g. `collectOptionalLoadRValues(...)`). (b) record optional paths in both collectHoistablePropertyLoads and dependency collection ghstack-source-id: 2507f6ea751dce09ad1dccd353ae6fc7cf411582 Pull Request resolved: #30894
Resubmission of #30079 -- core logic unchanged, but needed to rebase past #30573 ### Quick background #### Temporaries The compiler currently treats temporaries and named variables (e.g. `x`) differently in this pass. - named variables may be reassigned (in fact, since we're running after LeaveSSA, a single named identifier's IdentifierId may map to multiple `Identifier` instances -- each with its own scope and mutable range) - temporaries are replaced with their represented expressions during codegen. This is correct (mostly correct, see #29878) as we're careful to always lower the correct evaluation semantics. However, since we rewrite reactive scopes entirely (to if/else blocks), we need to track temporaries that a scope produces in `ReactiveScope.declarations` and later promote them to named variables. In the same example, $4, $5, and $6 need to be promoted: $2 ->`t0`, $5 ->`t1`, and $6 ->`t2`. ```js [1] $2 = LoadGlobal(global) foo [2] $3 = LoadLocal bar$1 scope 0: [3] $4 = Call $2(<unknown> $3) scope 1: [4] $5 = Object { } scope 2: [5] $6 = Object { a: $4, b: $5 } [6] $8 = StoreLocal Const x$7 = $6 ``` #### Dependencies `ReactiveScope.dependencies` records the set of (read-only) values that a reactive scope is dependent on. This is currently limited to just variables (named variables from source and promoted temporaries) and property-loads. All dependencies we record need to be hoistable -- i.e. reordered to just before the ReactiveScope begins. Not all PropertyLoads are hoistable. In this example, we should not evaluate `obj.a.b` without before creating x and checking `objIsNull`. ```js // reduce-reactive-deps/no-uncond.js function useFoo({ obj, objIsNull }) { const x = []; if (isFalse(objIsNull)) { x.push(obj.a.b); } return x; } ``` While other memoization strategies with different constraints exist, the current compiler requires that `ReactiveScope.dependencies` be re-orderable to the beginning of the reactive scope. But.. `PropertyLoad`s from null values will throw `TypeError`. This means that evaluating hoisted dependencies should throw if and only if the source program throws. (It is also a bug if source throws and compiler output does not throw. See facebook/react-forget#2709) --- ### Rough high level overview 1. Pass 1 Walk over instructions to gather every temporary used outside of its defining scope (same as ReactiveFunction version). These determine the sidemaps we produce, as temporaries used outside of their declaring scopes get promoted to named variables later (and are not considered hoistable rvals). 2. Pass 2 (collectTemporariesSidemap) Walk over instructions to generate a sidemap of temporary identifier -> named variable and property path (e.g. `$3 -> {obj: props, path: ["a", "b"]}`) 2. Pass 2 (collectHoistablePropertyLoads) a. Build a sidemap of block -> accessed variables and properties (e.g. `bb0 -> [ {obj: props, path: ["a", "b"]} ]`) b. Propagate "non-nullness" i.e. variables and properties for which we can safely evaluate `PropertyLoad`. A basic block can unconditionally read from identifier X if any of the following applies: - the block itself reads from identifier X - all predecessors of the block read from identifier X - all successors of the block read from identifier X 4. Pass 3: (collectDependencies) Walks over instructions again to record dependencies and declarations, using the previously produced sidemaps. We do not record any control-flow here 5. Merge every scope's recorded dependencies with the set of hoistable PropertyLoads Tested by syncing internally and (1) checking compilation output differences ([internal link](https://www.internalfb.com/intern/everpaste/?handle=GPCfUBt_HCoy_S4EAJDVFJyJJMR0bsIXAAAB)), running internally e2e tests ([internal link](https://fburl.com/sandcastle/cs5mlkxq)) --- ### Followups: 1. Rewrite function expression deps This change produces much more optimal output as the compiler now uses the function CFG to understand which variables / paths are assumed to be non-null. However, it may exacerbate [this function-expr hoisting bug](https://github.com/facebook/react/blob/main/compiler/packages/babel-plugin-react-compiler/src/__tests__/fixtures/compiler/bug-invalid-hoisting-functionexpr.tsx). A short term fix here is to simply call some form of `collectNonNullObjects` on every function expression to find hoistable variable / paths. In the longer term, we should refactor out `FunctionExpression.deps`. 2. Enable optional paths (a) don't count optional load temporaries as dependencies (e.g. `collectOptionalLoadRValues(...)`). (b) record optional paths in both collectHoistablePropertyLoads and dependency collection ghstack-source-id: 2507f6ea751dce09ad1dccd353ae6fc7cf411582 Pull Request resolved: #30894
Stack from ghstack (oldest at bottom):
This PR updates to use SSA form through the entire compilation pipeline. This means that in both HIR form and ReactiveFunction form,
Identifier
instances map 1:1 toIdentifierId
values. If two identifiers have the same IdentifierId, they are the same instance. What this means is that all our passes can use this more precise information to determine if two particular identifiers are not just the same variable, but the same SSA "version" of that variable.However, some parts of our analysis really care about program variables as opposed to SSA versions, and were relying on LeaveSSA to reset identifiers such that all Identifier instances for a particular program variable would have the same IdentifierId (though not necessarily the same Identifier instance). With LeaveSSA removed, those analysis passes can now use DeclarationId instead to uniquely identify a program variable.
Note that this PR surfaces some opportunties to improve edge-cases around reassigned values being declared/reassigned/depended-upon across multiple scopes. Several passes could/should use IdentifierId to more precisely identify exactly which values are accessed - for example, a scope that reassigns
x
but doesn't usex
prior to reassignment doesn't have to take a dependency onx
. But today we take a dependnecy.My approach for these cases was to add a "TODO LeaveSSA" comment with notes and the name of the fixture demonstrating the difference, but to intentionally preserve the existing behavior (generally, switching to use DeclarationId when IdentifierId would have been more precise).
Beyond updating passes to use DeclarationId instead of Identifier/IdentifierId, the other change here is to extract out the remaining necessary bits of LeaveSSA into a new pass that rewrites InstructionKind (const/let/reassign/etc) based on whether a value is actually const or has reassignments and should be let.