Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce WASM JITs for interpreter opcodes, do_jit_call, and interp_entry wrappers #76477

Merged
merged 3 commits into from
Nov 15, 2022

Conversation

kg
Copy link
Member

@kg kg commented Oct 1, 2022

This PR introduces a "jiterpreter" just-in-time WASM compiler for interpreter opcodes along with a set of specialized WASM JIT compilers to improve the performance of interp_entry and do_jit_call. The result is significantly improved performance for pure-interpreter workloads along with measurable speedups for interp->aot and aot->interp transitions in mixed mode AOT applications.

The jiterpreter can be thought of as an additional layer on top of interpreter tiering, and works by inserting "trace entry points" into interpreter opcode streams. Once an entry point is reached often enough, the JIT kicks in to compile as many following interpreter opcodes into WASM as possible. Once a jitted 'trace' is available, when the interpreter reaches the entry point it can invoke the trace to natively execute dozens of interpreter opcodes at once without loop dispatch overhead. Any trace entry point that fails to JIT is replaced with a special NOP.

See https://paper.dropbox.com/doc/Jiterpreter-For-WebAssembly--BqLn88GjC3z4GDuE09vamtw4Ag-mJ5TRc7xF1PNAGMgwU49K for a more detailed overview of the jiterpreter design.

Due to browser limitations it is not possible to JIT entire methods at once, so entry points have to be placed at key locations within a given method, like immediately after a backwards branch target (for loop bodies). The jiterpreter has a simple single-pass compiler, so when it encounters an opcode or complex pattern it can't handle it 'bails out' to return control to the interpreter. Complex control flow like exceptions, calls and returns are also handled by bailing out to the interpreter. Traces share the interpreter's state and locals instead of maintaining their own, which allows seamless bailouts and maintains existing GC/thread safety characteristics.

The jiterpreter infrastructure is also used to on-demand compile specialized wrappers for interp_entry and do_jit_call. All the critical reflective/dynamic work (like enumerating parameter types) is done at compile time to produce a small, branchless dispatcher that is able to set up an interpreter stack frame or invoke native code with fewer copies than the generic wrappers currently used. To determine which do_jit_call sites should be optimized, we maintain a hit counter much like we do for traces, and only JIT the most frequently executed ones. For interp_entry we optimize all of them.

An additional optimization in this PR leverages the jiterpreter infrastructure to optimize all do_jit_call execution by generating a dedicated function (either in WASM or JS) to trap exceptions and dispatch native calls. The current implementation uses mono_llvm_cpp_catch_exception to perform an indirect call bounced through JavaScript, which adds a measurable amount. By using specialized code we can perform a direct call using WASM Exception Handling (if available) or using a specialized JavaScript function that doesn't do any indirection or table lookups. This optimization by itself speeds up System.Runtime.Tests in AOT mode by about 2 seconds. You can find the function template in the new do-jit-call.wat and do-jit-call.wasm files.

The jiterpreter's features are configurable using a set of new runtime options, the key ones being:

  • --jiterpreter-enable-traces
    Enables the insertion of trace entry points in interpreter functions and JIT compilation for those entry points.
  • --jiterpreter-enable-interp-entry
    Enables the generation of specialized interp_entry wrappers. Note that this option is not retroactive, and these are typically all generated at app startup.
  • --jiterpreter-enable-jit-call
    Enables the generation of specialized do_jit_call wrappers.
  • --jiterpreter-enable-stats
    Enables dumping detailed statistics on the jiterpreter, including how many WASM functions were jitted, which interpreter opcodes caused the most failures, and how much time was spent. If statistics are enabled they are automatically dumped at app exit, but they can be manually dumped using an INTERNAL api.

This PR also contains some scaffolding changes and interpreter optimizations that were necessary in order to finish it and test it on benchmark scenarios:

  • Added interpreter opcodes for Math.Min/Max/Abs (without these lots of compute is impossible to JIT)
  • Updated a couple of the sample csprojs to make it easy to turn AOT+trimming on or off at build time
  • Lowered browser-bench's default heap size (the default can cause OOM during runs, partly due to a memory leak in Chrome)
  • Adds a simple raytracer sample that is engineered to transition in and out of the interpreter frequently when AOT'd, to serve as a useful benchmark for various parts of the PR
  • Updated the 'cwraps' typescript API to produce useful error messages when an exported C function is missing
  • Replaced mono_wasm_array_length C API with a new safe mono_wasm_array_length_ref variant
  • Added an optimization to remove the linker placeholders that currently sit between the runtime's C and TypeScript glue (see startup.ts)

@kg kg added NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) arch-wasm WebAssembly architecture area-Codegen-JIT-mono area-Codegen-Interpreter-mono labels Oct 1, 2022
@ghost ghost assigned kg Oct 1, 2022
@ghost
Copy link

ghost commented Oct 1, 2022

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

This (draft) PR introduces a 'jiterpreter' which on-the-fly compiles 'traces' of interpreter opcodes into WebAssembly so that the interpreter can transition into and out of them for closer-to-native performance when AOT isn't possible. See https://paper.dropbox.com/doc/Jiterpreter-For-WebAssembly--BqLn88GjC3z4GDuE09vamtw4Ag-mJ5TRc7xF1PNAGMgwU49K for a detailed overview.

The main changes can be broken down into:

  • A set of interpreter opcodes that control the generation and execution of jitted WASM
  • Support C APIs and runtime wrappers used by the jitted WASM
  • Instrumentation and a custom tiering system used to decide when to jit a method to WASM (in the long run this may be removed in favor of just using the interpreter's existing tiering system)
  • A single-pass compiler that can translate a large subset of mono interpreter opcodes into wasm opcodes

This PR also contains a few small optimizations that significantly improve the performance of the interpreter (and as a result, the jiterpreter) on some compute-heavy test cases like raytracers. One of the raytracer test cases is currently included in the PR (but will probably be removed).

While the PR is not 'ready', review for the runtime/interpreter internals bits would be appreciated, there are definitely parts that feel like they need cleanup and suggestions on how to do them right would be great. At present this is able to pass the full System.Runtime test suite in the console with performance roughly at parity with it disabled (including the time spent actually compiling WASM), and for the raytracer test cases I'm using the performance improvements are in the neighborhood of 11x native -> 6x native. The biggest current limitation is the lack of support for backward branches and calls - I don't expect to ever add support for calls, but ideally I can either add backward branches or introduce a mechanism for turning loop/try bodies into their own traces. The jiterpreter also currently ignores debugger opcodes, but it's possible to implement them. The biggest performance problem is that calling through function pointers to a dynamically linked function in browsers is just plain slow, so traces have to be relatively large to overcome that.

Author: kg
Assignees: -
Labels:

NO-MERGE, arch-wasm, area-Codegen-JIT-mono, area-Codegen-Interpreter-mono

Milestone: -

@build-analysis build-analysis bot mentioned this pull request Oct 6, 2022
2 tasks
@kg kg changed the title Introduce a WASM JIT for interpreter opcodes Introduce WASM JITs for interpreter opcodes and interp_entry wrappers Oct 7, 2022
@kg
Copy link
Member Author

kg commented Oct 7, 2022

PR description updated to cover the new 'jiterpreter trampolines', which are jitted replacements for the AOT interp_entry wrappers. The now-sabotaged raytracer sample in the PR forces aot/interp transitions during execution due to runtime generics and delegates, so it shows wildly different timings in each of the 4 possible execution modes (regular AOT, AOT+trampolines, regular interpreter, interpreter+jiterpreter+trampolines). There are opportunities to JIT other wrappers to further improve the performance of this test in AOT mode, I think - it spends a lot of time on overhead in various places instead of actually doing work, and before I sabotaged it it was MUCH faster in AOT.

Comparison timings follow (best time selected from a set of runs in each case). Note that if you are running it for measurement, do not open devtools until all your runs are done, because devtools deoptimize JS and wasm code. If you want to profile it, start the profiler and then reload the page while the profiler is running.

/// aot, no optimizations
Rendering finished in 7191 ms
// jiterpreter produced 0 traces from 0 candidates, and 0 trampolines
// time spent: 0ms generating, 0ms compiling wasm


/// aot with traces and trampolines enabled
Rendering finished in 6863 ms
// jiterpreter produced 0 traces from 2 candidates, and 2 trampolines
// time spent: 3ms generating, 2ms compiling wasm


/// interpreter, no optimizations
Rendering finished in 4868 ms
// jiterpreter produced 0 traces from 0 candidates, and 0 trampolines
// time spent: 0ms generating, 0ms compiling wasm


/// interpreter with traces and trampolines enabled
Rendering finished in 4070 ms
// jiterpreter produced 3 traces from 5 candidates, and 0 trampolines
// time spent: 6ms generating, 0ms compiling wasm

@kg kg force-pushed the wasm-jiterpreter branch from 83b97cc to 76cb2cc Compare October 11, 2022 21:31
@kg
Copy link
Member Author

kg commented Oct 13, 2022

I reworked the size threshold logic used to select where to insert traces, and implemented a C-side heuristic that is able to make fairly accurate guesses about where traces should go which reduces the overhead introduced by the jiterpreter when it fails to JIT a method. The timings for the toy raytracer testcase are still roughly as above, but the addition of the heuristic means less time is spent generating code in the runtime test suite, so the jiterpreter is now a measurable perf improvement for it. Some summary timings for fresh runs of the toy raytracer, pavel's more complex raytracer sample (w/o SIMD), and the system.runtime test suite (which currently is incompatible with AOT):

name interpreter (msec) JITerpreter (msec) % of baseline AOT (msec) AOT+JIT (msec) % of baseline
System.Runtime.Tests 148743 128629 86.47% N/A N/A N/A
Sabotaged raytracer 9321 6681 71.67% 12345 11645 94.32%
Pavel's raytracer 20462 13534 66.14% 2267 N/A N/A
browser-bench large JSON serialize 71.27 58.73 82.40% 8.03 8.00 N/A
browser-bench large JSON deserialize 102.55 79.25 77.27% 14.47 14.60 N/A

(edited to add more timings)
(edit 2: updated system.runtime test timings because a crash fix made them slower and I implemented more opcodes)

@kg kg force-pushed the wasm-jiterpreter branch 2 times, most recently from 8d16571 to 9b93f14 Compare November 8, 2022 11:47
@lewing
Copy link
Member

lewing commented Nov 8, 2022

@BrzVlad @vargaz is there more review needed here or are things in good shape?

@pavelsavara
Copy link
Member

I would love this to be optional part of dotnet.js behind a feature flag. It may be in next PR tho.

@kg
Copy link
Member Author

kg commented Nov 9, 2022

I would love this to be optional part of dotnet.js behind a feature flag. It may be in next PR tho.

With the options interface in this PR, we can set the default to off and then you use a runtime flag to turn it on. See the .withRuntimeOptions(["--jiterpreter-enable-stats"]) in browser-bench, there are flags to enable/disable parts of the jiterpreter as well.

@pavelsavara
Copy link
Member

With the options interface in this PR, we can set the default to off and then you use a runtime flag to turn it on.

I mean, to link it out via rollup.
Or even better, this would be great candidate to make it separate ES6 module and dynamically import() it only when the config flag is set.

@kg
Copy link
Member Author

kg commented Nov 10, 2022

Here are some updated timings for various scenarios, since some of the performance characteristics have changed since the previous timings (a few key reasons are that batching the JIT operation can cause slower code to run for a period of time - this slows down console tests but not most real applications, and that I had to revert the inlining threshold).

System.Runtime.Tests now passes with AOT enabled, but while the jiterpreter is active for it, the fact that the suite runs synchronously means that the improvement is small enough to be within the margin of error, so I omitted it along with a couple others that were ~100%. (Nothing relevant got slower)

name baseline (ms) jiterpreter (ms) relative time (%)
Json, large serialize (AOT) 8.1799 7.6687 93.75%
Json, large deserialize (AOT) 14.3750 14.1799 98.64%
JSInterop, LegacyExportInt (AOT) 8.4975 7.9102 93.09%
JSInterop, JSExportInt (AOT) 6.4096 5.3799 83.94%
JSInterop, JSImportInt (AOT) 0.7562 0.6748 89.24%
mixed-mode raytracer (AOT) 12751 9861 77.34%
System.Runtime.Tests (interp) 138651.7 129579.6 93.46%
Json, small serialize (interp) 0.2517 0.2093 83.15%
Json, small deserialize (interp) 0.3567 0.2901 81.33%
Json, large serialize (interp) 72.5972 57.2889 78.91%
Json, large deserialize (interp) 97.6667 79.5303 81.43%
JSInterop, LegacyExportInt (interp) 4.8936 4.8633 99.38%
JSInterop, JSExportInt (interp) 3.6925 3.2976 89.31%
mixed-mode raytracer (interp) 9271 7024 75.76%
pavel's raytracer (interp) 21353 13810 64.67%

@kg
Copy link
Member Author

kg commented Nov 10, 2022

The latest commit repurposes the little toy .wasm module used for WASM EH support detection to function as an optimized replacement for mono_llvm_cpp_catch_exception when performing JIT calls. If WASM EH is available, it will be used to directly invoke jit_call_cb, reducing the cost of that to a single indirect function call (no WASM-JS-WASM transition overhead). If WASM EH isn't available, a specialized JS function will be invoked instead, which is still a bit cheaper than mono_llvm_cpp_catch_exception because we are able to skip the extra indirect function call thanks to knowing that the call target is always jit_call_cb.

I verified that all do_jit_call paths are faster with this optimization in browser-bench, but the most convincing numbers come from System.Runtime.Tests in AOT mode:
mono_llvm_cpp_catch_exception: 43.646669s
specialized js: 43.251835s
specialized wasm eh: 42.549922s (97.5% of baseline)

This new optimization works even if you disable the rest of the jiterpreter.

@kg
Copy link
Member Author

kg commented Nov 10, 2022

My current plan for landing this is to disable the jiterpreter JIT features (traces, jit_call wrappers, and interp_entry wrappers) and only land it with the new optimized jit call exception handling path enabled, since it provides a good speed improvement and is lower risk. That lets us flush out any unexpected issues with JITing wasm this way along with the parts of the scaffolding that are always enabled (like the options engine, feature detection, etc). Then in a followup PR I can flip the jiterpreter JITs to enabled so that it's easy to revert that part without having to change tons of files.

@kg kg removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Nov 11, 2022
const int mtl_l = strlen(mtl);

if (!strcmp (option, "jiterpreter-enable-traces"))
jiterpreter_traces_enabled = TRUE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to add these as options to utils/options-def.h. It would auto generate the argument parsing code, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I saw while looking at the header I wouldn't be able to use it to do this, but I'll give it another try.

@vargaz
Copy link
Contributor

vargaz commented Nov 14, 2022

The mono changes look ok to me otherwise.

Jiterpreter traces work (yay alignment)
Use function pointers to invoke jiterpreter traces
Implement binops, basic arithmetic, and some unary ops
cknull
Strlen
ldloca_s + print statistics at exit
getchr, track bailout count
Use a JS dispatcher for traces because emscripten addFunction is bad. Use unchecked memory setters in traces. Enable tracing for more code.
Checkpoint ldfld
Checkpoint stfld
Checkpoint: F32 arithmetic
Add enable flag
Move back to addFunction because while the overhead of adding the trace functions is higher, the invoke performance is better
Checkpoint: Traces continue past brfalse/brtrue if the branch is not taken
Checkpoint wasm generation
Generates dummy traces
Checkpoint: Works again
Optimized trace generator
Traces work again
Fixed codegen for memmove and ldloca
Use wasm memmove instruction
Implement cknull
Implement brfalse and brtrue
Refactor null checks
stloc refactoring
Basic ldfld/stfld support
Checkpoint ldind (broken)
ldind works
ldfld_o
stfld_o
ldfld_vt
Unary math intrinsics and ldlen/ldelem_ref
Fix math intrinsics, add some i8 operations
Working (u|i)32 -> i64 promotion
stind
Add fp remainder operation, adjust thresholds to avoid making tests way slower
Fix remainder codegen, implement mono_ldptr
ldelema1
ldelema1 error handling
ldelem for ints and floats
Checkpoint trace enter optimization (broken)
Generate moves instead of a memmove for small constant-sized memmoves
More accurate trace length measurement, higher trace length requirement for better perf
Check in simple raytracer sample code
De-inline intersection test because the branches cause lots of trace bailouts
add_iN_imm and mul_iN_imm
ldc_i8 for small constants
64-bit relops and a few 64-bit superinsns
Checkpoint relop branches
Fixed relop branches
Specify alignment of ldloc/stloc where possible
Fix imm add/mul opcodes and improve debugging
Pass locals to traces instead of frame. Hard-code data items into the trace instead of loading them.
Pass method name to prepare_jiterpreter, include method name in trace name, add flag to trace method name when aborts happen. Add Min and Max interpreter opcodes
Try to keep interpreter tiering working
Add abs interp opcode, add missing float32 interp opcodes. Fix codegen for f32 pow/min/etc. Don't generate additional jiterpreter enter opcodes when inlining.
Fix a warning
Fix more warnings, don't pass ip to traces
Checkpoint work on static fields; fix unbalanced stack on stfld
Static fields work
Support for generating trace entry points at the target of all backward branches.
When bailing out for a branch, add the displacement to the bailout target.
Restructure the simple raytracer
Categorize bailouts and count each type separately when counting is active
Integer divide and remainder
Implement MUL_OVF_I4 and MUL_OVF_UN_I4. Fix the getelema failure check being inverted.
Add support for more precise tracking of what method calls cause traces to abort
Implement INITOBJ
Implement STIND_REF
i4 to u1/u2 conversions
Add i4->i2 and i4->i1 conversions
Fix min and max interp opcodes
Detect when a trace is approaching the browser wasm size limit (4kb) and abort it so that it doesn't fail to compile
Debugging and logging improvements
Optimize generated module boilerplate for size
Rework branch patch point system to be less sloppy
Handle different branch depths
Checkpoint new approach to branches
Fix ldc failures unbalancing the stack. Fix some branch problems
Implement GETCHR and STRLEN
Fully qualified trace names
Move some stuff into a new file
Optimize some null checks to not perform a double memory load
Cache i4 mul/div/rem operands in locals instead of re-loading them from memory after range checks
Less aggressive 'a branch can never skip past this' abort for calls
Fix math operand caching
Fix ldlen
Improved error handling and instrumentation
u4 -> i4 with underflow check
Shifts with immediates
ldind with constant offset
Re-enable stelem
Add ldlen_span, fix incorrect base in trace error messages
newobj_vt_inlined and newobj_inlined
ldc_i8 for full value range
ldobj_vt and cpobj_vt
stobj_vt
br(true|false)_i8_s and ldftn_addr
Fix local index assignment. implement unsigned i8 div and rem
intrins_span_ctor
Improved instrumentation for estimated opcode importance, implement more intrinsics
If a method has the Intrinsic attribute, treat it as if AggressiveInlining were set in the interpreter. This makes System.Numerics.Vectors types much faster.
Fix remainder opcodes. Implement castclass and isinst
Implement INTRINS_MEMORYMARSHAL_GETARRAYDATAREF. Add partial support for safepoint branches and with-immediate relop branches
ldelem_vt
Replace JS remainder implementation with a wrapper around C fmod. Implement atan2.
Fix casing so aot-cross can be located during build of samples
Fix i32/i64 type mismatch for an opcode. Rebase damage fixes.
When rejecting a trace, set its abort reason to trace-too-small. Implement r8 conditional branches. Implement ldind_offset
stfld_vt
Checkpoint: Easier enable/disable for jiterpreter, mess up the raytracer sample so that it causes AOT to bail into the interp
Implement MINT_UNBOX, fix counters
Optimize some jiterpreter entry points by moving them into interp.c so they can be inlined
Don't call stackval_from_data for pointers
Checkpoint backwards branch support
Backwards branch entry points seem to work now
Rename various jiterpreter entry points and make sure they don't use external APIs or volatile
Add some missing indirect opcodes and rework the indirect code
Fix negative constant indirect offsets
Implement MINT_BOX
Adjust thresholds and add some missing offset opcodes
Code cleanup

Move various jiterpreter code to its own .c file

Checkpoint

Checkpoint

Add heuristic filter to avoid inserting trace enter opcodes in places where they are unlikely to produce a trace

Fix crash at startup with jiterpreter turned off

Branch block heuristic fixes

When estimating trace lengths, don't count opcodes like nop and dummy_use

Lower trace length threshold since it ignores dummy opcodes now, adjust test benchmark to be slower, prevent leb being linked out when jit wrappers are enabled

Repair rebase damage

eslint and wasi build fixes

--amend

C4206 workaround

Diff cleanup

Diff cleanup

Fix jiterp opdefs

Checkpoint: Move most jiterpreter options into c so they can be accessed from everywhere

Move the opcode abort counts into C and update them in the heuristic abort logic as well. More options improvements

Revert inlining limit to see if it fixes the crash; fix osx build

Manually enable jiterpreter stats in browser-bench

lint fix; implement ld_delegate_method_ptr since it shows up in browser bench wrappers

Disable jiterpreter and wrapper jit by default if threading is enabled. Update comments.

Checkpoint code cleanup

Code cleanup

Code and project file cleanup

Implement castclass_interface and isinst_interface

Implement localloc and newarr. Pass frame pointer to traces.

CI build fix

Rework call target tracking so it's more useful. Move jiterpreter APIs to INTERNAL.

Implement add_ovf for i4 and u4, implement leave_s, improve usefulness of hottest failed traces output

Implement LDTSFLDA, improve trace list filtering

Implement BOX_VT

Unify overflow check and castclass/isinst helpers to reduce number of imports. Implement more conversions and cast/isinst variants

Unify overflow conversions and implement a few more. Implement i8 immediate comparison branches. Fix overflow in bailout counting

Checkpoint: add jit for specialized do_jit_call trampolines

Checkpoint

Optimized out wasmtable.get overhead when invoking jit call wrappers

Revert test changes

Checkpoint repairing merge damage

Temporarily disable getitem_span since it changed

Implement getitem_span and getitem_localspan

CI build fixes

Address PR feedback

Address PR feedback

Address PR feedback

Address PR feedback

Refactor import section generation; checkpoint non-indirect jit call wrappers

Use direct call instead of indirect call for jit call wrappers.

Optimize some of the __linker_exports generated wrappers to use .call instead of .apply so there is less runtime overhead on C-to-JS calls

Improved linker export wrappers for lower c-to-js overhead

Replace linker export wrappers with the actual export functions at startup to remove overhead

Remove debug print and fix infinitely growing import list

Revert lib.js changes since the optimized wrapper complexity is not needed anymore

lint fix for CI build

Fix computed goto non-wasm interpreter build due to jiterpreter opcodes

Implement LEAVE and BR alongside the short versions

Implement support for wasm exception handling in jit_call wrappers

Aligned memops in a few key places

Remove MINT_NEWARR support since it appears to break NDPin test and it's not terribly important to have. Fix instrumented method support

Remove instrumented method

JIT do_jit_call wrappers in groups if possible instead of one at a time.
If multiple do_jit_call cinfos share the same target function, share a single wrapper
Improve error message for missing lazy cwraps

Lower initial heap size for browser-bench because the default of 512mb causes OOM during the appstart tests

Also generate groups of interp_entry wrappers in one go instead of generating one module each

Repair merge damage

Fix name collision when generating multiple interp entry wrappers at once

Inline stackval_from_data logic for common types into interp_entry wrappers
Optimize jitcall hit counter

Fix jitcall counter continuing to increment even if jit fails
Detect whether the runtime supports WASM EH and if not, disable it automatically
Improve formatting of log messages
Code cleanup

Fix crash due to not handling the unbox flag for the this-reference in interp_entry wrappers
Rename some of the jiterpreter configuration bools
Add a missing jiterpreter config switch
Don't rely on atob for the wasm EH support probe, just convert a hex literal by hand instead

Clean up some options, adjust some thresholds, and add a couple new comments

Code cleanups, rearrangement, and add comments

Add support for optimized JS and WASM EH based implementations of mono_llvm_cpp_catch_exception
Formatting improvements
@kg kg force-pushed the wasm-jiterpreter branch from 86457a5 to 39b87ee Compare November 15, 2022 19:45
// multiple times and waste some work. At present this is unavoidable because
// control flow means we can end up with two traces covering different subsets
// of the same method in order to handle loops and resuming
gboolean should_generate = enabled && should_generate_trace_here(bb, td->last_ins);
Copy link
Member

@BrzVlad BrzVlad Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code around here operates on the wrong assumption that interp instructions are linked together from the first all the way to the last instruction in the method which is wrong. Instructions are linked together only within the same basic block. td->last_ins is pretty much useless and I should probably get rid of it at some point since it is bad design (after an interp_add_ins we can access the added instruction from td->last_ins so we set data into it)

// A preceding trace may have been in a branch block, but we only care whether the current
// trace will have a branch block opened, because that determines whether calls and branches
// will unconditionally abort the trace or not.
gboolean inside_branch_block = FALSE;
Copy link
Member

@BrzVlad BrzVlad Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This serves no purpose currently. Once we hit a branching opcode the bblock ends and we stop the iteration. If you want to continue probing the trace after a branch you should continue with bb->next_bb and restart with first_ins from there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this used to do something but it doesn't anymore. I'll figure out whether it's dead code to be removed or whether I need to fix it, since the heuristic is working pretty well as-is.


// Keep this file in sync with mintops.def. The order and values need to match exactly.

export const enum MintOpcode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mintops.def changes very often. Can't we do anything to simply include it ? Or have this automatically generated ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could probably include it at build time and text parse it at runtime, but that would break the typescript type checking (which has caught quite a few bugs). I could probably also figure out some way to generate this during the build process. I'll look into it.

Note that if the mintops change is actually changing the implementation of an opcode, that'll break the jiterpreter unless it's also updated. Do you have thoughts on how that should be handled? The easiest solution is to disable the opcode until someone who knows typescript can update it and re-enabled, but it means that modifying the interp requires potentially modifying the jiterp.

Copy link
Member

@lambdageek lambdageek Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be possible for the runtime (src/mono/mono) build to create a .ts file using some C preprocessor steps (or, worst case, using an MSBuild task). and then the wasm build (src/mono/wasm/runtime) can just consume the .ts file from elsewhere.

Note that if the mintops change is actually changing the implementation of an opcode, that'll break the jiterpreter unless it's also updated. Do you have thoughts on how that should be handled?

You mean if the semantics of an opcode change? I think we should probably just handle it ad-hoc and disable the opcode.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-wasm WebAssembly architecture area-Codegen-meta-mono
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants