significantly improve instruction printing efficiency #34

iximeow · 2024-06-24T22:37:38Z

this is where much of iximeow/yaxpeax-arch#7 originated.

std::fmt as a primary writing mechanism has.. some limitations:

memcpy-style copies of small fixed length arrays become memcpy, unless done via a loop with ops::Index rust-lang/rust#92993 (comment)
small non-fixed-size bytewise copy is transformed to much slower memcpy llvm/llvm-project#87440
improve codegen of fmt_num to delete unreachable panic rust-lang/rust#122770

and some more interesting more fundamental limitations - writing to a T: fmt::Write means implementations don't know if it's possible to write bytes in reverse order (useful for printing digits) or if it's OK to write too many bytes and then only advance len by the correct amount (useful for copying variable-length-but-short strings like register names). these are both perfectly fine to a String or Vec, less fine to do to a file descriptor like stdout.
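both tricks can be sketched in safe Rust against a plain `String` or `Vec<u8>`. `push_hex` and `push_short` are hypothetical names for illustration, not yaxpeax-arch API, and the real `DisplaySink` implementations may differ. this sketch fills a scratch buffer backwards and copies it once; writing backwards straight into the destination needs access to the sink's spare capacity, which a generic `T: fmt::Write` cannot offer.

```rust
/// Hypothetical helper: format `v` as lowercase hex by filling a scratch
/// buffer from the end (least significant digit first), then appending once.
fn push_hex(out: &mut String, mut v: u64) {
    let mut scratch = [0u8; 16]; // a u64 has at most 16 hex digits
    let mut i = scratch.len();
    loop {
        i -= 1;
        let d = (v & 0xf) as u8;
        scratch[i] = if d < 10 { b'0' + d } else { b'a' + (d - 10) };
        v >>= 4;
        if v == 0 {
            break;
        }
    }
    // only ASCII hex digits were written, so this is valid UTF-8
    out.push_str(core::str::from_utf8(&scratch[i..]).unwrap());
}

/// Hypothetical helper: append a name of at most 8 bytes by always copying
/// 8 bytes, then keeping only the first `len` of them.
fn push_short(out: &mut Vec<u8>, padded: &[u8; 8], len: usize) {
    assert!(len <= padded.len());
    let start = out.len();
    out.extend_from_slice(padded); // fixed-size copy, may write too much
    out.truncate(start + len); // advance by the real length only
}
```

neither helper makes sense for a file descriptor: bytes already written to stdout cannot be truncated away afterwards.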

at the same time, Colorize and traits depending on it are very broken, for reasons described in yaxpeax-arch.

so, this adapts yaxpeax-x86 to use the new DisplaySink type for writing, with optimizations where appropriate and output spans for certain kinds of tokens - registers, integers, opcodes, etc. it's not a perfect replacement for Colorize-to-ANSI-supporting-outputs but it's more flexible and i think can be made right.

along the way this completes the move of safer_unchecked out to yaxpeax-arch (ty @5225225 it's still so useful), cleans up some docs, and comes with a few new test cases.

because of the major version bump of yaxpeax-arch, and because this removes most functionality of the Colorize impl - it prints the correct words, just without coloring - this is itself a major version bump to 2.0.0. yay! this in turn is a good point to change the Operand enums from being tuple-like to struct-like, and i've done so in 1b8019d.

full notes in CHANGELOG ofc. this is notes for myself when i'm trying to remember any of this in two years :)

for mem size labels: add one new "BUG" entry at the start of the array so `mem_size` does not need to be adjusted before being used to look up a string from the `MEM_SIZE_STRINGS` array. it's hard to measure the direct benefit of this, but it shrinks codegen size by a bit and simplifies a bit of assembly.

for segment reporting changes: stos/scas/lods do not actually need special segment override logic. instead, set their use of `es` when decoded, if appropriate. this is potentially ambiguous; in non-64bit modes the sequence `26aa` would decode as `stos` with explicit `es` prefix. this is now identical to simply decoding `aa`, which now also reports that there is an explicit `es` prefix even though there is no prefix on the instruction. on the other hand, the prefix-reported segment now more accurately describes the memory selector through which memory accesses will happen. seems ok?
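the mem size label change above (the leading "BUG" entry) can be sketched like this. the table contents and the `mem_size_label` helper are hypothetical; the real `MEM_SIZE_STRINGS` array is longer and its entries may differ. the sketch assumes `mem_size` is a small nonzero value.

```rust
// Hypothetical table: index 0 is a sentinel that should never be printed,
// so a nonzero `mem_size` indexes the array directly.
const MEM_SIZE_STRINGS: [&str; 5] = ["BUG", "byte", "word", "BUG", "dword"];

/// Hypothetical lookup helper for a memory operand size label.
fn mem_size_label(mem_size: u8) -> &'static str {
    // no `mem_size - 1` adjustment before indexing
    MEM_SIZE_STRINGS[mem_size as usize]
}
```

seeing "BUG" in disassembly output then points at a decoder problem rather than silently printing a neighbouring label.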

it is almost always the case that self.prefixes.segment == Segment::DS, meaning testing for it first avoids checking `self.operands[op].is_memory()` later. this overall avoids a few instructions in the typical path, rather than checking `is_memory()` first (which would always be true in the places this function is called from)

testing against six opcodes to see if we should print rep or repnz is a bit absurd. they are relatively rare instructions, so this is a long sequence of never-taken tests. we can avoid the whole thing in the common case by testing if there is any kind of rep prefix at all.
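a minimal sketch of that reordering, under assumptions: the `Opcode` and `Prefixes` types below are simplified stand-ins, and the set of opcodes checked is illustrative rather than the exact six in yaxpeax-x86.

```rust
// Simplified stand-in types, not the yaxpeax-x86 definitions.
#[derive(Clone, Copy)]
enum Opcode {
    Movs,
    Cmps,
    Scas,
    Stos,
    Lods,
    Ins,
    Add,
}

struct Prefixes {
    rep: bool,
    repnz: bool,
}

/// Hypothetical helper: decide which prefix text (if any) to print.
fn rep_prefix_text(prefixes: &Prefixes, opcode: Opcode) -> &'static str {
    // common case: no rep-style prefix at all, so skip every opcode comparison
    if !(prefixes.rep || prefixes.repnz) {
        return "";
    }
    // rare case: only now pay for the chain of opcode tests
    let is_string_op = matches!(
        opcode,
        Opcode::Movs | Opcode::Cmps | Opcode::Scas | Opcode::Stos | Opcode::Lods | Opcode::Ins
    );
    if !is_string_op {
        return "";
    }
    if prefixes.rep { "rep " } else { "repnz " }
}
```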

the reasoning for *why* `visit_operand` is better here lives as doc comments on `visit_operand` itself: it avoids going from scattered operand details to `enum Operand` only to deconstruct the enum again. instead, branch arms can get codegen'd directly against `struct Instruction` layout.
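the shape of that idea, as a sketch only: the trait, the `Instruction` layout, and the `Printer` below are all hypothetical and far smaller than the real `OperandVisitor` in yaxpeax-x86.

```rust
/// Hypothetical visitor trait: one method per kind of operand.
trait OperandVisitor {
    type Out;
    fn visit_reg(&mut self, reg: u8) -> Self::Out;
    fn visit_imm(&mut self, imm: i64) -> Self::Out;
}

/// Hypothetical flat layout: operand details live in plain fields,
/// tagged by a small spec byte, with no intermediate `enum Operand`.
struct Instruction {
    spec: u8,
    reg: u8,
    imm: i64,
}

impl Instruction {
    /// Dispatch straight from the stored fields to the visitor method.
    fn visit_operand<V: OperandVisitor>(&self, v: &mut V) -> V::Out {
        match self.spec {
            0 => v.visit_reg(self.reg),
            _ => v.visit_imm(self.imm),
        }
    }
}

/// Hypothetical visitor that renders operands as text.
struct Printer;

impl OperandVisitor for Printer {
    type Out = String;
    fn visit_reg(&mut self, reg: u8) -> String {
        format!("r{}", reg)
    }
    fn visit_imm(&mut self, imm: i64) -> String {
        format!("{:#x}", imm)
    }
}
```

each match arm reads the fields it needs and calls one visitor method, so no enum value is built and then taken apart again.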

5225225 · 2024-06-25T02:23:18Z

Actually, I believe safer_unchecked is not needed anymore, this should be checked by default in debug builds nowadays.

fn main() {
    let slice = &[1];
    unsafe {
        let _ = slice.get_unchecked(2);
    }
}

   Compiling playground v0.0.1 (/playground)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.47s
     Running `target/debug/playground`
thread 'main' panicked at library/core/src/panicking.rs:220:5:
unsafe precondition(s) violated: slice::get_unchecked requires that the index is within the slice
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread caused non-unwinding panic. aborting.

see: rust-lang/rust#120594 (before that, the checks existed, but were optimized out unless you rebuilt the stdlib since the std is built in release mode)

The checking in the std is also a lot more complete than just some bounds checks. Still no replacement for miri, but tells you when you're doing something blatantly wrong without a huge compatibility/perf hit.

iximeow · 2024-06-25T03:08:58Z

oh! well that's great. serves me right for using an old rust for testing

iximeow added 30 commits April 2, 2024 00:29

lets see how a visitor for operands works out here...

8b79d59

less write, more write_str

ed4f238

display: remove some pointless checks

050bc1c

the match on opcode should have been dce, match on operands would only matter if there was a bug

use a bit of Opcode to indicate rep/repne applicability

214da3d

this reduces a `slice::contains` to a single bit test, and regroups prefix printing to deduplicate checks of the `rep` prefix. seemingly this reduces instruction counts by about 1%, cycles by 0.3% or so.
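as a sketch of the bit-test idea: the encoding below is hypothetical (the constant values and which bit is used are assumptions, not the yaxpeax-x86 `Opcode` layout).

```rust
// Hypothetical encoding: one bit of the opcode's numeric value marks
// "a rep/repne prefix is meaningful for this opcode".
const REP_OK: u16 = 0x8000;
const MOVS: u16 = REP_OK | 0x01;
const STOS: u16 = REP_OK | 0x02;
const ADD: u16 = 0x03;

/// Single bit test, independent of how many opcodes accept the prefix.
fn can_take_rep(opcode: u16) -> bool {
    opcode & REP_OK != 0
}

/// The slower shape this replaces: a linear search through a list.
fn can_take_rep_slow(opcode: u16) -> bool {
    [MOVS, STOS].contains(&opcode)
}
```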

commit unshippable wildly unsafe asm-filled printing code

ead58f6

write_2 will never actually be used, but im adapting it into contextualize in a... better way

move to shared (safe) impl of RelativeBranchPrinter

2df5d55

remove branch better handled elsewhere

6f03fac

use less of core::fmt, write by hand

7ab69f6

`name()` returning a `[u8; 2]` is nice when there is a specializing and unrolling write implementation, whereas `&str` might not consistently unroll into a simple 2-byte copy (rather than loop). it'll look a little more reasonable soon, hopefully..
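a rough illustration of why the array return type helps, with hypothetical names (`name`, `write_fixed2`, and the register table are made up for this sketch): the copy length is part of the type, so the writer never has to consult a runtime length.

```rust
/// Hypothetical register-name lookup returning exactly two bytes.
fn name(reg: u8) -> [u8; 2] {
    const NAMES: [[u8; 2]; 4] = [*b"ax", *b"cx", *b"dx", *b"bx"];
    NAMES[(reg & 3) as usize]
}

/// Hypothetical fixed-size write: the length (2) is known at compile time,
/// unlike a `&str` whose length is only known at runtime.
fn write_fixed2(out: &mut Vec<u8>, bytes: [u8; 2]) {
    out.extend_from_slice(&bytes);
}
```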

might be an ok way to redesign colorization....

0399548

it turns out that yaxpeax-arch's notion of colorization has been broken from the start for systems that do markup without inline sequences (e.g. windows/cmd.exe before vt100 support)

add token spans for some registers

1f18a96

enough infrastructure to avoid bounds checks, at incredible user cost

0e99d94

figuring out how to handle short variable-size strings

2ac7935

helper to clear BigEnoughString

00dc2b6

a few more accurate hints

4af752a

less integer formatting in operands

49f5472

move away from fmt for visit_i64 and displacements too

758ddc6

move non-avx512 operand printing away from fmt

4142a4a

mem size strings are all 7b or less

53012e2

write_fixed_size impls for string and BigEnoughString

166695d

actually use small-string specializations when available

4fb6542

looks like that becomes memcpy, not ideal

754e0da

avoid intermediate buffer and copy of hex-formatted ints

9314571

slightly more centralized hex formatting

bebba5a

write_fixed_size really should always be inlined...

514586f

visit_disp is called in only two places, is tiny..

0717863

use get_kinda_unchecked for mem size strings

afc361c

if mem_size is ever out of bounds thats a severe bug on its own

iximeow added 27 commits June 23, 2024 12:53

adapt the rest of formatting changes to protected_mode

b121313

adapt OperandVisitor and related to real_mode

2252844

fix inlining attributes re. profiling flag in protected_mode

949aa2e

normalize imports, pull safer_unchecked from yaxpeax-arch

f70232d

adapt protected-mode display to real mode

dc500de

forward long deprecation allowances as appropriate

2ac46a9

add additional call test cases

2002347

fix 32-bit 66-prefixed ff /2 call not having 16-bit operands
fix momentary regression in rendering `call` instructions to string

stale file

4225510

InstructionTextBuffer for all three modes, adjust fuzzer to match

9d9bb9b

fuzz caught negation bug

1fdd243

another fuzz bug

24d5384

last vestiges of initial perf experiments

bc4abf8

cfg_attr wants feature, not features plural

0a5e948

remove yaxpeax-x86 safer_unchecked.rs, it is now in yaxpeax-arch

09dcfca

fix several sources of dead code warnings in various crate configs

25b9a53

nightly correctly remarked that == on fat pointers is ambiguous

577b8e8

update yaxpeax-arch to 0.3.1, fix fuzz target warnings

238d65c

note yaxpeax-arch version bump in changelog

0e35363

remove selects_cs(), cs() now does the right thing

b8a294d

rename most operand variants, make them structy rather than tupley

1b8019d

one more stray docs error

f4ae2ed

consistently enter register/number/opcode spans

ddde47c

justify the current max instruction length

dd8bd5c

this is also checked by a new fuzz target

bump cargo version to 2.0.0, not quite releasing yet

42f29e3

bench: fetch from fork updated for yaxpeax-x86 2.0.0

016583f

add missing feature flag to real-mode ffi library

6a5ea10

ffi/ still needs... much more work

document one more stray unsafe

24b33d5

iximeow merged commit 24b33d5 into no-gods-no- Jun 24, 2024
1 check passed