Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

significantly improve instruction printing efficiency #34

Merged
merged 95 commits into from
Jun 24, 2024

Conversation

iximeow
Copy link
Owner

@iximeow iximeow commented Jun 24, 2024

this is where much of iximeow/yaxpeax-arch#7 originated.

std::fmt as a primary writing mechanism has.. some limitations:

and some more interesting more fundamental limitations - writing to a T: fmt::Write means implementations don't know if it's possible to write bytes in reverse order (useful for printing digits) or if it's OK to write too many bytes and then only advance len by the correct amount (useful for copying variable-length-but-short strings like register names). these are both perfectly fine to a String or Vec, less fine to do to a file descriptor like stdout.

at the same time, Colorize and traits depending on it are very broken, for reasons described in yaxpeax-arch.

so, this adapts yaxpeax-x86 to use the new DisplaySink type for writing, with optimizations where appropriate and output spans for certain kinds of tokens - registers, integers, opcodes, etc. it's not a perfect replacement for Colorize-to-ANSI-supporting-outputs but it's more flexible and i think can be made right.

along the way this completes the move of safer_unchecked out to yaxpeax-arch (ty @5225225 it's still so useful), cleans up some docs, and comes with a few new test cases.

because of the major version bump of yaxpeax-arch, and because this removes most functionality of the Colorize impl - it prints the correct words, just without coloring - this is itself a major version bump to 2.0.0. yay! this in turn is a good point to change the Opcode enums from being tuple-like to struct-like, and i've done so in 1b8019d.

full notes in CHANGELOG ofc. this is notes for myself when i'm trying to remember any of this in two years :)

for mem size labels: add one new "BUG" entry at the start of the array
  so `mem_size` does not need to be adjusted before being used to look
  up a string from the `MEM_SIZE_STRINGS` array. it's hard to measure
  the direct benefit of this, but it shrinks codegen size by a bit and
  simplfies a bit of assembly....

for segment reporting changes: stos/scas/lods do not actually need
  special segment override logic. instead, set their use of `es` when
  decoded, if appropriate. this is potentially ambiguous; in non-64bit
  modes the sequence `26aa` would decode as `stos` with explicit `es`
  prefix. this is now identical to simply decoding `aa`, which now also
  reports that there is an explicit `es` prefix even though there is no
  prefix on tne instruction.

  on the other hand, the prefix-reported segment now more accurately
  describes the memory selector through which memory accesses will
  happen. seems ok?
it is almost always the case that self.prefixes.segment == Segment::DS,
meaning testing for it first avoids checking
`self.operands[op].is_memory()` later. this overall avoids a few
instructions in the typical path, rather than checking `is_memory()`
first (which would always be true in the places this function is called
from)
testing against six opcodes to see if we should print rep or repnz is a
bit absurd. they are relatively rare instructions, so this is a long
sequence of never-taken tests. we can avoid the whole thing in the
common case by testing if there is any kind of rep prefix at all.
the match on opcode should have been dce, match on operands would only matter if there was a bug
this reduces a `slice::contains` to a single bit test, and regroups
prefix printing to deduplicate checks of the `rep` prefix

seemingly this reduces instruction counts by about 1%, cycles by 0.3% or
so.
the reasoning for *why* `visit_operand` is better here lives as doc
comments on `visit_operand` itself: it avoids going from scattered
operand details to `enum Operand` only to deconstruct the enum again.
instead, branch arms can get codegen'd directly against `struct
Instruction` layout.
write_2 will never actually be used, but im adapting it into contextualize in a... better way
`name()` returning a `[u8; 2]` is nice when there is a specializing and
unrolling write implementation, whereas `&str` might not consistently
unroll into a simple 2-byte copy (rather than loop). it'll look a little
more reasonable soon, hopefully..
it turns out that yaxpeax-arch's notion of colorization has been broken from the start for systems that do markup without inline sequences (e.g. windows/cmd.exe before vt100 support)
if mem_size is ever out of bounds thats a severe bug on its own
fix 32-bit 66-prefixed ff /2 call not having 16-bit operands

fix momentary regression in rendering `call` instructions to string
this is also checked by a new fuzz target
ffi/ still needs... much more work
@iximeow iximeow merged commit 24b33d5 into no-gods-no- Jun 24, 2024
1 check passed
@5225225
Copy link
Contributor

5225225 commented Jun 25, 2024

Actually, I believe safer_unchecked is not needed anymore, this should be checked by default in debug builds nowadays.

fn main() {
    let slice = &[1];
    unsafe {
        let _ = slice.get_unchecked(2);
    }
}
   Compiling playground v0.0.1 (/playground)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.47s
     Running `target/debug/playground`
thread 'main' panicked at library/core/src/panicking.rs:220:5:
unsafe precondition(s) violated: slice::get_unchecked requires that the index is within the slice
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread caused non-unwinding panic. aborting.

see: rust-lang/rust#120594 (before that, the checks existed, but were optimized out unless you rebuilt the stdlib since the std is built in release mode)

The checking in the std is also a lot more complete than just some bounds checks. Still no replacement for miri, but tells you when you're doing something blatantly wrong without a huge compatibility/perf hit.

@iximeow
Copy link
Owner Author

iximeow commented Jun 25, 2024

oh! well that's great. serves me right for using an old rust for testing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants