Context
A large chunk of the time spent in the fetch-decode-execute loop of the interpreter is consumed by the decoder.
A potential way to speed up execution is to cache the result of decoding each instruction into a format that can be used more directly.
One obstacle to this idea is that the cache lookup either has to be very efficient, or its cost has to be amortized over many instructions.
Solving this problem could make the emulator significantly faster.
Possible solutions
The standard way to solve this problem is to use tracing.
In a nutshell, while decoding instructions in a never-before-seen flow, the emulator would create a trace of decoded instructions and store it in the cache.
Initially, these traces would contain only the instructions between branches.
More advanced versions could perhaps even include branches.
The cache lookup would happen only on taken branch instructions, and would therefore be potentially amortized over many instructions.
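To make the tracing idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three-opcode toy ISA, the `encode`/`decode` helpers, and the names `build_trace` and `run_traced` are hypothetical and do not come from the emulator. Traces here end at the first branch, so in this simplified version the lookup happens after every branch, taken or not.

```python
from typing import Dict, List, Tuple

# Hypothetical toy ISA for illustration only (not the emulator's real ISA).
Decoded = Tuple[int, int, int]  # (opcode, register, signed immediate)
HALT, ADDI, BNEZ = 0, 1, 2


def encode(op: int, reg: int, imm: int) -> int:
    """Pack an instruction into one word; imm is a signed 8-bit value."""
    return (op << 16) | (reg << 8) | (imm & 0xFF)


def decode(word: int) -> Decoded:
    """Stand-in for the expensive decoder: unpack and sign-extend."""
    imm = word & 0xFF
    if imm >= 0x80:
        imm -= 0x100
    return (word >> 16) & 0xFF, (word >> 8) & 0xFF, imm


def build_trace(code: List[int], pc: int) -> List[Decoded]:
    """Decode from pc up to and including the first non-ADDI instruction."""
    trace: List[Decoded] = []
    while True:
        entry = decode(code[pc])
        trace.append(entry)
        if entry[0] != ADDI:  # a branch, HALT, or unknown opcode ends the trace
            return trace
        pc += 1


def run_traced(code: List[int]) -> Tuple[List[int], int, int]:
    """Run the program; return (registers, cache lookups, instructions executed)."""
    cache: Dict[int, List[Decoded]] = {}  # trace start pc -> decoded trace
    regs = [0] * 4
    pc = 0
    lookups = 0
    executed = 0
    while True:
        lookups += 1  # one lookup per trace, not per instruction
        trace = cache.get(pc)
        if trace is None:
            trace = build_trace(code, pc)
            cache[pc] = trace
        for op, reg, imm in trace:
            executed += 1
            if op == HALT:
                return regs, lookups, executed
            if op == ADDI:
                regs[reg] += imm
                pc += 1
            elif op == BNEZ:
                # imm is a branch offset counted in instructions
                pc += imm if regs[reg] != 0 else 1
            else:
                raise ValueError(f"unknown opcode {op}")


# r0 counts down from 5; r1 gains 2 on every loop iteration.
PROGRAM = [
    encode(ADDI, 0, 5),
    encode(ADDI, 1, 2),
    encode(ADDI, 0, -1),
    encode(BNEZ, 0, -2),
    encode(HALT, 0, 0),
]
```

Calling `run_traced(PROGRAM)` performs 6 cache lookups for 17 executed instructions, which is the amortization described above. The gain grows with the length of the straight-line code between branches.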
An alternative proposed by @edubart is to mirror the address space with a decoded counterpart.
Perhaps each decoded instruction takes a fixed multiple of the size of the corresponding non-decoded instruction, so the mapping between the two is very efficient.
The loop would therefore always try the decoded version first and, if an invalid decode is detected (say, 0), it performs the decode, saves the result, and proceeds from the saved decode.
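A minimal sketch of the mirrored approach, again in Python and under assumptions: the toy ISA, the `encode`/`decode` helpers, and `run_mirrored` are hypothetical names, and `None` stands in for the "invalid decode" marker (the 0 suggested above). The mirror has one slot per instruction word, so the mapping from pc to decoded entry is a plain index.

```python
from typing import List, Optional, Tuple

# Hypothetical toy ISA for illustration only (not the emulator's real ISA).
Decoded = Tuple[int, int, int]  # (opcode, register, signed immediate)
HALT, ADDI, BNEZ = 0, 1, 2


def encode(op: int, reg: int, imm: int) -> int:
    """Pack an instruction into one word; imm is a signed 8-bit value."""
    return (op << 16) | (reg << 8) | (imm & 0xFF)


def decode(word: int) -> Decoded:
    """Stand-in for the expensive decoder: unpack and sign-extend."""
    imm = word & 0xFF
    if imm >= 0x80:
        imm -= 0x100
    return (word >> 16) & 0xFF, (word >> 8) & 0xFF, imm


def run_mirrored(code: List[int]) -> Tuple[List[int], int]:
    """Run the program; return (registers, number of decoder calls)."""
    # Decoded counterpart of the code: same length, every slot starts invalid.
    mirror: List[Optional[Decoded]] = [None] * len(code)
    regs = [0] * 4
    pc = 0
    decodes = 0
    while True:
        entry = mirror[pc]  # always try the decoded version first
        if entry is None:  # invalid decode: decode now and save the result
            entry = decode(code[pc])
            mirror[pc] = entry
            decodes += 1
        op, reg, imm = entry
        if op == HALT:
            return regs, decodes
        if op == ADDI:
            regs[reg] += imm
            pc += 1
        elif op == BNEZ:
            # imm is a branch offset counted in instructions
            pc += imm if regs[reg] != 0 else 1
        else:
            raise ValueError(f"unknown opcode {op}")


# r0 counts down from 5; r1 gains 2 on every loop iteration.
PROGRAM = [
    encode(ADDI, 0, 5),
    encode(ADDI, 1, 2),
    encode(ADDI, 0, -1),
    encode(BNEZ, 0, -2),
    encode(HALT, 0, 0),
]
```

With `run_mirrored(PROGRAM)`, each of the 5 instruction words is decoded exactly once even though 17 instructions execute. The sketch does not handle writes to code memory; a real emulator would have to invalidate the matching mirror entries whenever the underlying instructions change.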
It is hard to tell which approach is best.
The mirrored address space seems simpler to implement, but also more constrained in what it can do.